We launched new forums in March 2019—join us there. In a hurry for help with your website? Get Help Now!
    • 3749
    • 24,544 Posts
    @jason,

    I hope these are simple questions. smiley

    I’m starting to work on re-doing the migration of my old site to 0.9.7 (and making some changes to the info at the Bobs Guides page).

    The character sets and collations don’t match (the old site is in Latin1, the new one is UTF-8).

    I exported the data only, but there’s still character set and collation info in the SQL dump files.

    1. I’m wondering if it’s better to just delete the lines that specify the charset and collation, or to edit them to match the new DB.

    2. I’m wondering how likely it is that my troubles were caused by charset issues.

    3. The old site is on a php4 machine and the new one, php5. I’m assuming that moving just the data should make this a non-issue?

    Bob


    Bob
      Did I help you? Buy me a beer
      Get my Book: MODX:The Official Guide
      MODX info for everyone: http://bobsguides.com/modx.html
      My MODX Extras
      Bob's Guides is now hosted at A2 MODX Hosting
    • I wish I had better news, but this is one of the most difficult questions in modern web development (next to identity/single sign-on). Here’s some articles that discuss the problem.

      http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL
      http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html

      I try and avoid the issue by always starting out with utf8, but i know that does not solve your problem. I’ll try and dig up some more ways to solve this tomorrow, but essentially, there is no easy answer.
        • 3749
        • 24,544 Posts
        Quote from: OpenGeek at Feb 17, 2008, 07:40 AM

        I wish I had better news, but this is one of the most difficult questions in modern web development (next to identity/single sign-on). Here’s some articles that discuss the problem.

        http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL
        http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html

        I try and avoid the issue by always starting out with utf8, but i know that does not solve your problem. I’ll try and dig up some more ways to solve this tomorrow, but essentially, there is no easy answer.

        Thanks. I’ve struggled with this before and have been pretty careful to use UTF-8 on my new installs, but most of my early MODx installs used the defaults at the remote host and ended up as Latin1.

        I hadn’t thought about the need to convert the characters in the data. I’m reading up on various methods for doing that.

        Unfortunately, all my reading so far makes me *very* hesitant to convert any of the old production sites’ databases.

        I did find this interesting and useful info in the MySQL docs. It clarified some things I had wondered about:

        The database character set and collation are used as default values if the table character set and collation are not specified in CREATE TABLE statements. The database character set also is used by LOAD DATA INFILE. The character set and collation have no other purposes.

        And here’s a fairly clear piece on conversion issues and where to look for the current settings:

        http://wiki.refbase.net/index.php/Troubleshooting#MySQL_migration_and_character_set_problems

        The attached image shows the mess that exists at my oldest remote site (the one I was trying to migrate to 0.9.7):


        Maybe there should be some reference to character sets on the "moving sites to a new server" page?


        Bob
          Did I help you? Buy me a beer
          Get my Book: MODX:The Official Guide
          MODX info for everyone: http://bobsguides.com/modx.html
          My MODX Extras
          Bob's Guides is now hosted at A2 MODX Hosting
          • 3749
          • 24,544 Posts
          After further delving, I found that the iconv() function is included in php5, and this code will (in theory) do the conversion:

          <?php
            // Simple file translation.
          
          $FileToconvert = "fname";
          $FileConverted = "fname2";
          
          echo "Converting $FileToconvert ...";
          
          file_put_contents($FileConverted, iconv("ISO-8859-1","UTF-8",file_get_contents($FileToconvert))); 
          
          echo "File converted in $FileConverted"; 
          ?>


          I ran it (in PhpEd) on the site-contents and site-templates SQL dump files. I did a byte-by-byte comparison with Windows Comp and found no differences, and the files are the same size, so apparently, all the characters I have in those files are expressed the same way in both charsets.

          The iconv() function does *not* change the default charset and collation statements in the .sql files, but I’m assuming that those would be ignored for a table that already exists, so it seems less likely that charset encoding caused any of my problems.

          When I have time, I’ll try to look at whether the conversion from php4 to php5 was an issue. I didn’t use any compatibility mode on either the export or the import that could have been an issue.

          Bob
            Did I help you? Buy me a beer
            Get my Book: MODX:The Official Guide
            MODX info for everyone: http://bobsguides.com/modx.html
            My MODX Extras
            Bob's Guides is now hosted at A2 MODX Hosting
            • 3749
            • 24,544 Posts
            I spoke too soon about the charset issue. I ran iconv on the snippets, chunks, and categories tables and the converted files were significantly larger, so importing the unconverted tables may have corrupted my DB after all.

            I’ve created a snippet that generates the SQL to (in theory) safely convert an existing MODx DB to UTF-8. If it works (or can be made to work) it may take some of the charset issues off the table.

            http://modxcms.com/forums/index.php/topic,22960.0.html

            It’s untested until I can create a decent sandbox for it.

            Bob


              Did I help you? Buy me a beer
              Get my Book: MODX:The Official Guide
              MODX info for everyone: http://bobsguides.com/modx.html
              My MODX Extras
              Bob's Guides is now hosted at A2 MODX Hosting
              • 7231
              • 4,205 Posts
              I used iconv on a db file to convert from UTF8 to iso and it worked great. However I used the unix function not the php one (not sure if it make a difference). One thig to notice is that this will only convert text fields, varchar will not be converted with iconv (as far as I know).
                [font=Verdana]Shane Sponagle | [wiki] Snippet Call Anatomy | MODx Developer Blog | [nettuts] Working With a Content Management Framework: MODx

                Something is happening here, but you don&#39;t know what it is.
                Do you, Mr. Jones? - [bob dylan]
                • 7231
                • 4,205 Posts
                The iconv() function does *not* change the default charset and collation statements in the .sql files, but I’m assuming that those would be ignored for a table that already exists, so it seems less likely that charset encoding caused any of my problems.
                In the case I described above I changed these manually directly in the sql file before importing back to the db.
                  [font=Verdana]Shane Sponagle | [wiki] Snippet Call Anatomy | MODx Developer Blog | [nettuts] Working With a Content Management Framework: MODx

                  Something is happening here, but you don&#39;t know what it is.
                  Do you, Mr. Jones? - [bob dylan]