-
- 218 Posts
Hello,
We have a lot of multi-language websites running, and it seems that the UTF-8 encoding handling changes between MODx releases. Version 0.9.6.1 P2 seems to work all right, 0.9.6.2 didn't work correctly for all characters, and 0.9.6.3 behaves differently but still not correctly.
This is the test string we use:
ŤťŮůČčĎďĚěŇňŘř | Iñtërnâtiônàlizætiøn
We test it by entering it in the page description field. We have set the MODx encoding to UTF-8 in the configuration settings.
In 0.9.6.1 p2 this string is saved and recalled correctly. In later versions this string is not saved and/or recalled correctly.
Olaf
I've tested with this and cannot get any of the fields to fail. I suspect what you are seeing is that the tables actually use the MySQL default character set (latin1), while you're using the SET CHARACTER SET method to try to force UTF-8. The solution is to switch the connection method to SET NAMES, or to make sure the database is actually created with the proper character set and connection method. 0961p2 used the SET NAMES method, which does internal conversions between charsets but can mean extra migration effort if and when you ever need to use true UTF-8 data.
Ryan Thrash, MODX Co-Founder
Follow me on Twitter at @rthrash or catch my occasional unofficial thoughts at thrash.me
This is done during upgrades or installation.
Ryan Thrash, MODX Co-Founder
One of our developers mentioned that, because of a bug in some MySQL versions, it's advisable to always use SET NAMES when using UTF-8 on a MySQL database, since the charset sometimes isn't set automatically.
Anyway, the strange thing is that the version which works correctly has this in the config:
$database_connection_charset = '';
while the version that doesn’t work ok has:
$database_connection_charset = 'utf8';
$database_connection_method = 'SET NAMES';
UPDATE:
After changing the config of one of the incorrectly working versions from
$database_connection_charset = 'utf8';
to
$database_connection_charset = '';
it worked fine! It made no difference whether I commented out the SET NAMES line or not.
Strange, I would have thought it would be the other way around.
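The behaviour described above (a blank charset works, and the SET NAMES line then makes no difference) is consistent with a connect routine that joins the two settings into one statement and skips it when the charset is empty. MODx's actual connect code isn't shown in this thread, so this PHP sketch is hypothetical: `buildCharsetStatement()` is an illustrative name, not a MODx function.

```php
<?php
// Hypothetical helper, not MODx code: turns the two config values into the
// statement a connect routine might send right after connecting.
function buildCharsetStatement(string $method, string $charset): ?string
{
    // Empty charset: nothing is sent, so the connection keeps the server's
    // default character set (latin1 on a stock MySQL 5.0/5.1 install).
    if ($charset === '') {
        return null;
    }
    // Otherwise e.g. "SET NAMES utf8" or "SET CHARACTER SET utf8".
    return $method . ' ' . $charset;
}

var_dump(buildCharsetStatement('SET NAMES', 'utf8')); // string(14) "SET NAMES utf8"
var_dump(buildCharsetStatement('SET NAMES', ''));     // NULL
```

Under that assumption, a blank charset means neither the connection nor the latin1 tables convert anything: the UTF-8 bytes are stored byte for byte and come back unchanged, whatever the method is set to.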
Olaf
I've tested some more... it seems that MODx always ends up with a table collation of "latin1_swedish_ci". With $database_connection_charset left empty everything works, but setting it to utf8 while the table collation is latin1 breaks some characters.
Setting it to $database_connection_charset = 'latin1_swedish_ci'; makes everything work again too.
I guess that if the MODx default table collation were UTF-8, setting $database_connection_charset = 'utf8' would work fine too.
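The "some characters" match the test string: the Czech letters (Ť, ů, Č, ...) do not exist in latin1, while every character in "Iñtërnâtiônàlizætiøn" does. A minimal PHP sketch, assuming the server converts from the connection charset to the column charset on insert (unconvertible characters typically become '?'), and using `mb_convert_encoding()` from the mbstring extension as a stand-in for that server-side conversion:

```php
<?php
// Sketch only: mb_convert_encoding() stands in for MySQL's charset
// conversion. MySQL's latin1 is really closer to Windows-1252, but that
// code page lacks these Czech letters too.
$test = 'ŤťŮůČčĎďĚěŇňŘř | Iñtërnâtiônàlizætiøn';

// Case 1: connection charset utf8, column latin1. The text is converted
// from UTF-8 to latin1; characters latin1 cannot hold are replaced by '?'.
$stored   = mb_convert_encoding($test, 'ISO-8859-1', 'UTF-8');
$readBack = mb_convert_encoding($stored, 'UTF-8', 'ISO-8859-1');
echo $readBack, PHP_EOL; // Czech letters are lost, the second half survives

// Case 2: empty connection charset, so both sides are latin1 and nothing
// is converted. Every byte is a valid latin1 character, so the UTF-8 bytes
// are kept verbatim: reading them as latin1 and writing them back as
// latin1 returns exactly the original bytes.
$asLatin1 = mb_convert_encoding($test, 'UTF-8', 'ISO-8859-1'); // bytes seen as latin1 chars
$restored = mb_convert_encoding($asLatin1, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $test); // bool(true)
```

The `$asLatin1` value is also roughly what you would see if such a table were later converted to utf8 without first fixing the data, which is the extra migration effort mentioned earlier in the thread.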
Olaf