    • 22851
    • 805 Posts
    I have a MODx 0.9.6 installation that I am trying to upgrade to MODx 0.9.6.1p2. It uses a UTF-8 encoded MySQL database with the utf8_unicode_ci collation. The installation of 0.9.6.1p2 completes successfully, but afterwards all Japanese text is displayed as if it were being interpreted using an 8-bit encoding. I have checked the database and all the tables are still marked as utf8_unicode_ci, so I am guessing that this must be a config problem. I have looked in config.inc.php. This contains the correct database details, including

    $database_connection_charset = 'utf8';

    so that’s okay. I am not sure what else to do to diagnose and correct this problem.

    Please can someone here point me in the right direction?

    Thank you,

    Paul
      YAMS: Yet Another Multilingual Solution for MODx
      YAMS Forums | Latest: YAMS 1.1.9 | YAMS Documentation
      Please consider donating if you appreciate the time and effort spent developing and supporting YAMS.
    • Encoding is still a bit of a black art to me, and a lot of others, I suspect. Where are the Japanese characters being killed? In the Manager? In the database? On the frontend web pages?
        Studying MODX in the desert - http://sottwell.com
        Tips and Tricks from the MODX Forums and Slack Channels - http://modxcookbook.com
        Join the Slack Community - http://modx.org
        • 22851
        • 805 Posts
        The characters appear incorrectly in the manager and on the frontend web pages. The database is fine. I know that because I just tried restoring it from the 0.9.6 backup and I get the same problem.

        Paul
        • Do a view source on the pages in your browser, and see what charset they’re claiming to be. Also might try running a web page through the W3C validator and see if it is OK with the charsets declared in the headers and the meta tags.
            • 22851
            • 805 Posts
            The meta tag in the content says:
            <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" ></meta>

            I checked the response headers sent by the server as well. They say

            Content-Type: application/xhtml+xml; charset=UTF-8

            as well. It is as if each single byte coming out of the database is being treated as a separate Unicode character and then converted into a UTF-8 byte sequence, so the text is effectively double encoded.
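
            To make the double-encoding idea concrete, here is a minimal Python sketch (not MODx code). It assumes the bytes are being misread as Latin-1, which is only a guess at the 8-bit charset involved, and the function name double_encode is hypothetical.

```python
def double_encode(text: str, wrong_charset: str = "latin-1") -> bytes:
    """Simulate double encoding: UTF-8 bytes misread as an 8-bit charset,
    then encoded to UTF-8 a second time."""
    raw = text.encode("utf-8")            # correct UTF-8 bytes, as stored in the database
    misread = raw.decode(wrong_charset)   # each byte wrongly treated as one character
    return misread.encode("utf-8")        # re-encoded: still valid UTF-8, but garbled


garbled = double_encode("活")
print(garbled)                  # the doubled byte sequence
print(garbled.decode("utf-8"))  # what a browser would display
```

            The result is still well-formed UTF-8, which would explain why the headers, the meta tag and a validator all look fine while the visible text is wrong.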

            Paul
              • 22851
              • 805 Posts
              The test site I am upgrading isn’t live, so I had to submit the source of the page content to the W3C validator, rather than getting it to request the page itself and so be able to check the headers etc. The source validates anyway. The validator is only checking the XML structure. "æ´»æƒ…å ±ã‚’" is perfectly acceptable content as far as it is concerned.

              Paul
              • I had in mind that it might give a warning about a character set mismatch. You won’t get that from a submitted file, since there are no HTTP headers being sent. I don’t know what else to suggest. I’ve had similar things happen, and never did get them completely sorted out, but not with a non-European language.
                  • 22851
                  • 805 Posts
                  I downloaded a webpage and opened the page in an editor. It identifies the source as being UTF-8 without BOM. However, the content appears incorrectly. So, it is UTF-8 encoded... it’s just that some additional transformation has been made to the original UTF-8 that preserves its UTF-8-ness. That supports my double-encoding theory.
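
                  One way to test that theory on the downloaded page is to try reversing the suspected transformation. This is only a sketch: it assumes the wrong intermediate charset is Latin-1 (it could just as well be something like Windows-1252), and undo_double_encoding is a hypothetical helper name.

```python
def undo_double_encoding(text: str, wrong_charset: str = "latin-1") -> str:
    """Try to reverse one round of double encoding.

    If the text cannot be mapped back to bytes in the assumed charset, or
    those bytes are not valid UTF-8, the text is returned unchanged.
    """
    try:
        return text.encode(wrong_charset).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(undo_double_encoding("æ´»"))  # garbled input, should come back as Japanese
print(undo_double_encoding("活"))    # already correct, left alone
```

                  If garbled text from the page comes back as readable Japanese, that points to double encoding rather than damaged data.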

                  Is it possible that MODx incorrectly thinks that the database is supplying it with ASCII (or some other 8-bit encoding) and so is converting it to UTF-8? Or perhaps it is something to do with collations? Maybe it is assuming the more standard(?) utf8_general_ci instead of utf8_unicode_ci?

                  Paul
                    • 22851
                    • 805 Posts
                    Ah ha... I may have found it. In my first post I said:

                    This contains the correct database details, including

                    $database_connection_charset = 'utf8';

                    so, that’s okay.

                    ... is that really okay? Firstly, should it be 'utf-8'? I think not in this case, but I should investigate. Secondly, I checked the config files for my freshly installed 0.9.6.1p2 sites, and they have

                    $database_connection_charset = '';

                    and they seem to be working fine. So, I am about to redo the upgrade, having changed my config file to use '' instead of 'utf8', and see what happens. I’ll report back...

                    Paul
                      • 22851
                      • 805 Posts
                      OK. I restored my 0.9.6 install and I noticed that the original config already says:

                      $database_connection_charset = '';

                      So it’s actually MODx which, during the installation process, is changing the config.inc.php file to say

                      $database_connection_charset = 'utf8';

                      So, I thought I would change it to

                      $database_connection_charset = 'utf8';

                      before doing the upgrade to 0.9.6.1p2, to see if that made a difference to the end result. It doesn’t. :( I get the same problem, and after the install the config file says

                      $database_connection_charset = 'utf8';

                      which I think is correct anyway. I’ve run out of ideas here.

                      Paul