    • 17931
    • 24 Posts
    I’ve been having some trouble with character encodings when reading the MODx site content directly from the database (rather than through the manager/frontend), and I think I have sorted it out.

    There were several causes (for example, my own terminal was not UTF8 capable), but more importantly, I found that the variable
    $database_connection_charset in manager/includes/config.inc.php is blank after installation.

    This leads MODx to use the default system encoding for the connection, and when this encoding is different from the one chosen for the MODx database, the data is inserted into the database with the system encoding rather than the MODx encoding.
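    A minimal sketch for checking whether an installation is affected. It assumes the setting appears in config.inc.php as a quoted PHP assignment ending in a semicolon; the helper name and the sample strings are hypothetical, not taken from MODx itself.

```python
import re

# Assumed shape of the assignment in config.inc.php; adjust if your file differs.
CHARSET_LINE = re.compile(
    r"""\$database_connection_charset\s*=\s*(['"])(.*?)\1\s*;"""
)

def connection_charset(config_text):
    """Return the configured connection charset, or None if it is blank or missing."""
    match = CHARSET_LINE.search(config_text)
    if match is None:
        return None
    value = match.group(2).strip()
    return value or None

# Hypothetical snippets standing in for the real file contents.
blank = "<?php\n$database_connection_charset = '';\n"
fixed = "<?php\n$database_connection_charset = 'utf8';\n"

print(connection_charset(blank))  # None
print(connection_charset(fixed))  # utf8
```

    In practice the text would be read from manager/includes/config.inc.php; a result of None means the connection falls back to the system default.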

    For example, my system was latin1 but MODx was UTF8. Everything looked normal from the manager/frontend, but there was no way I could display diacritical characters in the MySQL client (using UTF8 encoding, naturally). What happened was that the data was stored in latin1 encoding! The only reason everything was OK in the manager/frontend was that the data was being sent and retrieved using the same encoding.
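    The effect can be reproduced outside the database. This is only a sketch of one plausible model, assuming the application sends UTF-8 bytes and the server reads them as latin1 before storing them in a UTF-8 column; Python's latin-1 codec (ISO-8859-1) stands in for MySQL's latin1 here.

```python
original = "João"

# The application sends UTF-8 bytes over the connection.
sent_bytes = original.encode("utf-8")

# Assumed server behaviour: the connection is labelled latin1, so each byte is
# read as one latin1 character and that text is what ends up in the UTF-8 column.
stored_text = sent_bytes.decode("latin-1")

# Same latin1 connection on the way out: the conversion is undone, so the
# manager and frontend get the original text back.
via_latin1_connection = stored_text.encode("latin-1").decode("utf-8")

# A client with a UTF-8 connection sees what is really stored.
via_utf8_connection = stored_text

print(via_latin1_connection)  # João
print(via_utf8_connection)    # JoÃ£o
```

    Both directions through the same mislabelled connection cancel out, which is why the problem stays hidden until something else reads the tables.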

    After toying around with several installations, I traced the culprit to the $database_connection_charset variable. I have seen this topic
    (http://modxcms.com/forums/index.php/topic,10312.0.html) and this topic (http://modxcms.com/forums/index.php/topic,15170.msg99529.html#msg99529), which led me to think that this problem should not exist anymore. If so, this seems like a bug in the installation process. By the way, during installation I was never asked for the charset, only the collation. Maybe that is related.

    Should I file a bug?

    regards,
    joão

      • 6726
      • 7,075 Posts
      Thanks for reporting, as you say there are several reports about this.

      The testing team has spotted this; it has to do with how the db charset is determined (based on the db collation). A bug is already filed and we are currently working on fixing it.

      In the meantime, when you upgrade make sure you choose [tt]Advanced Upgrade Install (edit database config)[/tt] and specify the correct info.
        • 24495
        • 407 Posts
        I’m not a programmer or a specialist in this problem, but it seems to me that we should have a "final" solution for problems with UTF8. Today I read this message (in German): http://modxcms.com/forums/index.php/topic,11846.msg146453.html#msg146453
        Skilled developers should read the code posted there; maybe it helps a bit.
        • That’s copied from an English thread, and we’ve yet to truly identify the full scope of the problem, but I think it’s going to end up being one aggravated by various PHP versions and their handling of UTF-8 content in general.
            • 17931
            • 24 Posts
            So it’s really a bug in the installation process, right?

            I would suggest that a note about $database_connection_charset be added to the installation instructions until a new release fixes this, since it can have nasty effects.

            @davidm: I have searched for the bug report, but did not find it. Which one is it?
              • 24495
              • 407 Posts
              There seem to be a notable number of problems with the db charset, especially with UTF-8. What about a sub-forum below "Core Code" or "Release Support" for all problems and hints regarding this?

              Btw: I transferred a web site developed with MODx and iso-8859-1 to another db/web package at the same host. Same configuration on both packages. After the transfer I converted the db and all tables in it to utf8. Then I used the db-convert script (http://blog.dopefreshtight.de/artikel/von-iso-8859-1-zu-utf-8-in-php-und-mysql/) to convert the content. After this I corrected all unconverted content and chunks and some minor things to utf-8 compliant characters.
              Yesterday I put the additional lines into the manager files as described here: http://modxcms.com/forums/index.php/topic,11846.msg146453.html#msg146453. After that, all umlauts and special characters were corrupted. I don’t know why, but I corrected all the mistakes manually (2 hours). I hope that was it for good.
                • 17931
                • 24 Posts
                "Yesterday I’ve put the additional lines into the manager files like described here http://modxcms.com/forums/index.php/topic,11846.msg146453.html#msg146453. After that all umlauts and special characters were corrupted. Don’t know why but have corrected all mistakes manually (2 hours). Hope that was it for all time."
                Yes, that’s what happened to me as well. As I said, I think the data was stored in latin1, i.e. ISO-8859-1(5), because the default system encoding ended up being used for the connection. This happens even though the db tables and columns are declared as UTF-8. I was lucky because the site content is small, so fixing it took me only a few minutes in mysql-query-browser (I have only built container pages so far, so I only had to edit the page titles).
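                For larger sites, manual editing does not scale. Below is a hedged sketch of how affected strings could be repaired in bulk, assuming the damage is exactly "UTF-8 bytes read as latin1" and nothing else; the function name is hypothetical, and this is not the conversion script linked earlier in the thread.

```python
def undo_double_encoding(text):
    """Reverse a UTF-8-read-as-latin1 mix-up; return text unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside latin1, or the bytes are not
        # valid UTF-8, so it was probably not damaged in this particular way.
        return text

print(undo_double_encoding("JoÃ£o"))  # João
print(undo_double_encoding("João"))   # João (already correct, left alone)
```

                Text that legitimately contains a sequence such as "Ã£" would be altered as well, so a backup and a spot check on a copy of the data are advisable before running anything like this against real content.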

                By the way - isn’t this a mysql flaw? Shouldn’t it detect the incoming encoding (based on @@character_set_database vs @@character_set_connection), warn the user, something?
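                One likely reason the server cannot warn: in a single-byte encoding like latin1, any byte sequence is valid text, so UTF-8 input never triggers a decoding error. A small illustration, with Python's latin-1 codec (ISO-8859-1) as an assumed stand-in for MySQL's latin1:

```python
utf8_bytes = "ção".encode("utf-8")

# Every byte value 0-255 maps to some character in ISO-8859-1, so decoding
# never fails and gives no signal that the bytes were really UTF-8.
as_latin1 = utf8_bytes.decode("latin-1")
all_bytes_ok = len(bytes(range(256)).decode("latin-1")) == 256

print(as_latin1)     # Ã§Ã£o
print(all_bytes_ok)  # True
```

                The reverse mistake is easier to catch, because most latin1 byte sequences with accented characters are not valid UTF-8.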

                "Seems to be a notable amount of problems with db charset especially with UTF-8. What about a sub-forum below "Core Code" or "Release Support" for all problems and hints regarding this?"
                Yes, that should be a subforum, especially to keep people in the language subforums from each trying to reinvent the wheel. Maybe one called "Language / Encoding" (or something to that effect) under the "Support" forum, rather than the "Development" forum, where "Internationalization" is. I suggest this because this is not a developer problem; it’s an actual user problem.

                Regardless, I definitely think there should be a visible instruction/warning during the installation process until a new MODx release fixes this.