    • 22098
    • 218 Posts
    Hello,

    we have a lot of multi-language websites running, and it seems that the UTF-8 encoding handling changes between MODx releases. Version 0.9.6.1 P2 worked all right, 0.9.6.2 didn’t work correctly for all characters, and 0.9.6.3 behaves differently but still isn’t OK.

    We have a test string we’re using:

    ŤťŮůČčĎďĚěŇňŘř | Iñtërnâtiônàlizætiøn

    and we test this by entering it in the page description field. We have configured MODx encoding to UTF-8 in the configuration settings.
    In 0.9.6.1 p2 this string is saved and recalled correctly. In later versions this string is not saved and/or recalled correctly.

    Olaf
    • I’ve tested with this and cannot get any of the fields to fail. I suspect what you are seeing is that the tables actually use the MySQL default charset (Latin1) but you’re using the SET CHARACTER SET method and trying to force UTF-8. The solution is to switch the connection method to SET NAMES, or actually force the db to be created with the proper connection methods. 0.9.6.1 P2 used the SET NAMES method, which does internal conversions between charsets but can make for extra migration effort if and when you ever need to use true UTF-8 data.
        Ryan Thrash, MODX Co-Founder
        Follow me on Twitter at @rthrash or catch my occasional unofficial thoughts at thrash.me
        • 22098
        • 218 Posts
        Quote from: rthrash at Feb 11, 2009, 12:41 PM

        I’ve tested with this and cannot get any of the fields to fail. I suspect what you are seeing is that the tables actually use the MySQL default charset (Latin1) but you’re using the SET CHARACTER SET method and trying to force UTF-8. The solution is to switch the connection method to SET NAMES, or actually force the db to be created with the proper connection methods. 0.9.6.1 P2 used the SET NAMES method, which does internal conversions between charsets but can make for extra migration effort if and when you ever need to use true UTF-8 data.

        I’m simply using the MODx CMS editor; I am not using any SET CHARACTER SET or SET NAMES. Where should I do this, i.e. where can I change the connection method?

        Olaf
        • This is done during upgrades or installation.
            Ryan Thrash, MODX Co-Founder
            • 22098
            • 218 Posts
            Quote from: rthrash at Feb 11, 2009, 12:47 PM

            This is done during upgrades or installation.

            And where can I change it afterwards?
              • 22098
              • 218 Posts
              One of our developers mentioned that, due to a bug in some versions of MySQL, it’s advised to always use SET NAMES when using UTF-8 on a MySQL database, because sometimes it’s not set automatically.

              Anyway, the strange thing is that the version which works correctly has this in the config:

              $database_connection_charset = '';

              while the version that doesn’t work ok has:

              $database_connection_charset = 'utf8';
              $database_connection_method = 'SET NAMES';

              UPDATE:

              After changing the config of a version that wasn’t working correctly from

              $database_connection_charset = 'utf8';

              to

              $database_connection_charset = '';

              it worked fine! It didn’t make any difference whether I commented the SET NAMES line out or not.
              Strange, I would think that it should be the other way around?

              Olaf
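One possible explanation for the result above can be modelled outside MySQL. The following is only a sketch under stated assumptions, not MODx or MySQL code: Python's `latin-1` codec stands in for MySQL's `latin1` (which is really closer to cp1252), an empty `$database_connection_charset` is assumed to leave the connection at the server default of latin1, and both function names are hypothetical.

```python
TEST_STRING = "ŤťŮůČčĎďĚěŇňŘř | Iñtërnâtiônàlizætiøn"


def roundtrip_declared_utf8(text: str) -> str:
    """Model: connection charset 'utf8', column charset latin1.

    The server is told the client sends UTF-8, so it converts each
    character to latin1 for storage. Characters with no latin1
    equivalent are replaced by '?', and that loss is permanent.
    """
    stored = text.encode("latin-1", errors="replace")  # lossy conversion on write
    return stored.decode("latin-1")                    # converted back on read


def roundtrip_undeclared(text: str) -> str:
    """Model: connection charset left empty (assumed to mean latin1).

    The CMS still sends UTF-8 bytes, but the server treats them as
    latin1 and stores them byte for byte. Reading returns the same
    bytes, which the page then interprets as UTF-8 again.
    """
    sent = text.encode("utf-8")            # what the CMS actually transmits
    as_latin1 = sent.decode("latin-1")     # server's view: one character per byte
    stored = as_latin1.encode("latin-1")   # stored unchanged in the latin1 column
    return stored.decode("utf-8")          # browser decodes the bytes as UTF-8


print(roundtrip_declared_utf8(TEST_STRING))
print(roundtrip_undeclared(TEST_STRING))
```

In this model the first call returns question marks for the Czech characters while the Western European ones survive, and the second call returns the string intact. If the model is right, the empty setting only appears to work: the table would hold UTF-8 bytes labelled as latin1, which is the kind of data that needs extra care when migrating to real UTF-8 tables later.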
              • You should be able to re-run the installer in upgrade mode. These lines should match how your DB is actually set up, which you can find out from phpMyAdmin I think:
                $database_connection_charset = 'utf8';
                $database_connection_method = 'SET CHARACTER SET';


                If you’ve used SET NAMES in the past, I suspect you need to match your connection charset to your actual DB environment, or upgrade to a DB version that doesn’t exhibit the bug. Proper UTF-8 data is going to be increasingly important in Revolution ... mandatory, in fact.
                  Ryan Thrash, MODX Co-Founder
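For readers comparing the two connection methods: as far as the MySQL manual describes them, both statements set the client and result charsets to the one you name, but only SET NAMES also sets the connection charset to it, while SET CHARACTER SET takes the connection charset from the default database. The sketch below merely models that documented difference; `session_charsets` is a hypothetical helper, not a MySQL or MODx function.

```python
def session_charsets(method: str, charset: str, database_charset: str) -> dict:
    """Return the session variables each statement is documented to set."""
    if method == "SET NAMES":
        # All three variables follow the named charset.
        connection = charset
    elif method == "SET CHARACTER SET":
        # The connection charset follows the default database instead.
        connection = database_charset
    else:
        raise ValueError(f"unknown connection method: {method!r}")
    return {
        "character_set_client": charset,
        "character_set_results": charset,
        "character_set_connection": connection,
    }


# A database created with the MySQL default charset, accessed with utf8:
print(session_charsets("SET NAMES", "utf8", "latin1"))
print(session_charsets("SET CHARACTER SET", "utf8", "latin1"))
```

With a latin1 database, the second call reports latin1 as the connection charset even though utf8 was requested, which is one reason the config lines need to match how the database was actually created.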
                  • 22098
                  • 218 Posts
                  I’ve tested some more... it seems that MODx always gives a table collation of "latin1_swedish_ci". Setting $database_connection_charset to empty makes everything work all right, but setting it to utf8 while the table collation is different goes wrong for some characters.
                  By setting it to $database_connection_charset = 'latin1_swedish_ci'; everything goes OK again too.
                  I guess if the MODx default table collation were UTF-8, setting $database_connection_charset = 'utf8' would work fine too.

                  Olaf
                  • That means your tables themselves were actually created with latin1_swedish_ci collations, which happens to be the MySQL default. A mismatch in the MODx configuration would indeed cause the problems you’re seeing. SET NAMES uses internal translation routines to translate between charsets to give you quasi-UTF-8. And now that I’ve stated that, someone who actually understands this stuff can correct me.
                      Ryan Thrash, MODX Co-Founder
                      • 22098
                      • 218 Posts
                      Quote from: rthrash at Feb 11, 2009, 02:00 PM

                      That means your tables themselves were actually created with latin1_swedish_ci collations, which happens to be the MySQL default. A mismatch in the MODx configuration would indeed cause the problems you’re seeing. SET NAMES uses internal translation routines to translate between charsets to give you quasi-UTF-8. And now that I’ve stated that, someone who actually understands this stuff can correct me.

                      Our developers looked into it some more. It seems that when installing MODx with the installer, you can choose the character set encoding, and it’s set correctly in the $database_connection_charset variable in config.php, but the table creation itself doesn’t force the chosen collation on the tables, so they always get the default MySQL collation (latin1_swedish_ci). So it seems the CREATE TABLE statements in the installation script have to be extended with a character set and collation setting (http://dev.mysql.com/doc/refman/5.0/en/charset-table.html). If this is done correctly, the MODx tables will have a correct UTF-8 collation, the config.php setting will correspond to UTF-8 as well, and everything will work OK.

                      Olaf
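The change Olaf proposes for the installer could look roughly like this. This is a sketch only: `with_table_charset` and the `example_content` table are hypothetical and are not taken from the MODx installer; the only thing borrowed from MySQL is the `DEFAULT CHARACTER SET ... COLLATE ...` table option syntax described on the manual page linked above.

```python
def with_table_charset(create_sql: str, charset: str = "utf8",
                       collation: str = "utf8_general_ci") -> str:
    """Append explicit charset and collation table options to a CREATE TABLE."""
    # Drop trailing whitespace and a trailing semicolon before appending options.
    statement = create_sql.rstrip().rstrip(";").rstrip()
    return f"{statement} DEFAULT CHARACTER SET {charset} COLLATE {collation};"


# Hypothetical table definition, standing in for one of the installer's statements.
example = """CREATE TABLE `example_content` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `description` VARCHAR(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM"""

print(with_table_charset(example))
```

With the options stated explicitly, new tables would no longer silently inherit latin1_swedish_ci from the server default, so the charset chosen in the installer and the one written to config.php would describe the same thing.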