<![CDATA[ Changing character sets [split from Re: Migrating to 0.9.7] - My Forums]]> https://forums.modx.com/thread/?thread=32128 <![CDATA[Changing character sets [split from Re: Migrating to 0.9.7]]]> https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174051 The iconv() function does *not* change the default charset and collation statements in the .sql files, but I’m assuming that those would be ignored for a table that already exists, so it seems less likely that charset encoding caused any of my problems.In the case I described above I changed these manually directly in the sql file before importing back to the db.]]> dev_cw Feb 18, 2008, 03:11 PM https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174051 <![CDATA[Changing character sets [split from Re: Migrating to 0.9.7]]]> https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174050 dev_cw Feb 18, 2008, 03:07 PM https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174050 <![CDATA[Changing character sets [split from Re: Migrating to 0.9.7]]]> https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174049
I’ve created a snippet that generates the SQL to (in theory) safely convert an existing MODx DB to UTF-8. If it works (or can be made to work) it may take some of the charset issues off the table.

http://modxcms.com/forums/index.php/topic,22960.0.html

It’s untested until I can create a decent sandbox for it.

Bob


]]>
BobRay Feb 18, 2008, 03:04 PM https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174049
<![CDATA[Changing character sets [split from Re: Migrating to 0.9.7]]]> https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174048
<?php
  // Simple file translation.

$FileToconvert = "fname";
$FileConverted = "fname2";

echo "Converting $FileToconvert ...";

file_put_contents($FileConverted, iconv("ISO-8859-1","UTF-8",file_get_contents($FileToconvert))); 

echo "File converted in $FileConverted"; 
?>


I ran it (in PhpEd) on the site-contents and site-templates SQL dump files. I did a byte-by-byte comparison with Windows Comp and found no differences, and the files are the same size, so apparently, all the characters I have in those files are expressed the same way in both charsets.

The iconv() function does *not* change the default charset and collation statements in the .sql files, but I’m assuming that those would be ignored for a table that already exists, so it seems less likely that charset encoding caused any of my problems.

When I have time, I’ll try to look at whether the conversion from php4 to php5 was an issue. I didn’t use any compatibility mode on either the export or the import that could have been an issue.

Bob
]]>
BobRay Feb 17, 2008, 02:53 PM https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174048
<![CDATA[Changing character sets [split from Re: Migrating to 0.9.7]]]> https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174047 Quote from: OpenGeek at Feb 17, 2008, 07:40 AM

I wish I had better news, but this is one of the most difficult questions in modern web development (next to identity/single sign-on). Here’s some articles that discuss the problem.

http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL
http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html

I try and avoid the issue by always starting out with utf8, but i know that does not solve your problem. I’ll try and dig up some more ways to solve this tomorrow, but essentially, there is no easy answer.

Thanks. I’ve struggled with this before and have been pretty careful to use UTF-8 on my new installs, but most of my early MODx installs used the defaults at the remote host and ended up as Latin1.

I hadn’t thought about the need to convert the characters in the data. I’m reading up on various methods for doing that.

Unfortunately, all my reading so far makes me *very* hesitant to convert any of the old production sites’ databases.

I did find this interesting and useful info in the MySQL docs. It clarified some things I had wondered about:

The database character set and collation are used as default values if the table character set and collation are not specified in CREATE TABLE statements. The database character set also is used by LOAD DATA INFILE. The character set and collation have no other purposes.

And here’s a fairly clear piece on conversion issues and where to look for the current settings:

http://wiki.refbase.net/index.php/Troubleshooting#MySQL_migration_and_character_set_problems

The attached image shows the mess that exists at my oldest remote site (the one I was trying to migrate to 0.9.7):


Maybe there should be some reference to character sets on the "moving sites to a new server" page?


Bob]]>
BobRay Feb 17, 2008, 12:33 PM https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174047
<![CDATA[Changing character sets [split from Re: Migrating to 0.9.7]]]> https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174046
http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL
http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html

I try and avoid the issue by always starting out with utf8, but i know that does not solve your problem. I’ll try and dig up some more ways to solve this tomorrow, but essentially, there is no easy answer.]]>
opengeek Feb 17, 2008, 01:40 AM https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174046
<![CDATA[Changing character sets [split from Re: Migrating to 0.9.7]]]> https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174045
I hope these are simple questions. smiley

I’m starting to work on re-doing the migration of my old site to 0.9.7 (and making some changes to the info at the Bobs Guides page).

The character sets and collations don’t match (the old site is in Latin1, the new one is UTF-8).

I exported the data only, but there’s still character set and collation info in the SQL dump files.

1. I’m wondering if it’s better to just delete the lines that specify the charset and collation, or to edit them to match the new DB.

2. I’m wondering how likely it is that my troubles were caused by charset issues.

3. The old site is on a php4 machine and the new one, php5. I’m assuming that moving just the data should make this a non-issue?

Bob


Bob
]]>
BobRay Feb 16, 2008, 11:59 PM https://forums.modx.com/thread/32128/changing-character-sets-split-from-re-migrating-to-0-9-7#dis-post-174045