MySQL latin1 → utf8 (Wordpress upgrade)
Spurred on by mass hacking, I’ve updated my old (version 2.0.3) Wordpress install to something a little newer.
I decided to err on the side of caution and upgrade a copy of the live DB first - and I’m glad I did. I saw lots of problems with accents and symbols getting munged in the upgrade - well, that’s what I thought.
In reality, the symbols were being screwed in the backup-and-restore process. Words such as “naïve”, £-signs and various typographic quotes were obviously getting messed up through character-encoding issues.
After a bit of sleuthing, I found the output from mysqldump wasn’t valid. The problem was caused by a combination of default connection settings in a .my.cnf, and the fact that the old Wordpress install was storing utf-8 characters inside a latin1 database.
My MySQL default settings are:
[client]
default-character-set=utf8
Because we’re pulling latin1 data over a utf-8 connection, MySQL starts doing character-set conversions, and screws up a bunch of text that it thinks is latin1, but in reality is actually already in utf-8.
The fix for the backup process was to override my default settings with latin1:
$ mysqldump --default-character-set=latin1 --opt -h db.example.com -u user -ppassword schema > db-backup-latin1-20080707.sql
That worked fine, and the next step was reloading it into the DB as utf-8. This required a little bit of string replacement using a command-line utility bundled with MySQL. If you’re going to do this yourself, watch what you type: I somehow typed in “lastin1″ halfway through and lost an hour or so trying to figure out what went wrong. Anyhow, here’s the command-line:
$ replace "CHARSET=latin1" "CHARSET=utf8" "SET NAMES latin1" "SET NAMES utf8" < db-backup-latin1-20080707.sql > db-backup-utf8-20080707.sql
You should now be able to blat / restore / overwrite your DB and ensure all tables are in the appropriate character set, ready for a smooth wordpress upgrade.
$ mysql --opt -h db.example.com -u user -ppassword schema < db-backup-utf8-20080707.sql
August 14th, 2008 at 12:00 pm
Finally, a sollution that worked! Thanks, mate!
August 22nd, 2008 at 9:28 am
Hi,
can I’ve your support to convert latin1 DB into utf8?
thanks
August 22nd, 2008 at 11:19 am
Hi Dam,
The mysqldump, replace and mysql restore commands aren’t specific to Wordpress, so should work for any MySQL databaase where utf8 data’s accidentally been stored in a latin1 table.
Ash
September 12th, 2008 at 1:30 pm
[...] - bookmarked by 2 members originally found by shleyswt on 2008-09-10 MySQL latin1 → utf8 (Wordpress upgrade) http://hexmen.com/blog/2008/07/mysql-latin1-utf8-wordpress-upgrade/ - bookmarked by 1 members [...]
November 19th, 2008 at 11:25 am
What if I don’t have access to mysqldump but only phpMyAdmin?
Thanks
November 19th, 2008 at 11:56 am
@Paamayim
I’m not a big phpMyAdmin user, but assume you can export & import using the GUI, so (hopefully) it’s just a matter of reading around. I guess you may need to make use of phpMyAdmin’s transformations..?
Drupal has some info on exporting and importing.
Let us know if you figure it out.
November 19th, 2008 at 12:31 pm
Yes, exporting is pretty straightforward with phpMyAdmin, but it lacks of charset options, and I’m not sure if they would be useful in this case. I didn’t know about “transformations”, but it seems to be some painfull stuff.
By the way, I exported the tabels of my interest, opened the sql text file in notepad++ or any other advanced texte editor, than modified charset=latin1 to utf8 option in create statemente, and then, -by hand-
made search-replace of screwed up charachters and finally reimported the whole zipped file.
November 19th, 2008 at 3:02 pm
@Paamayim
When all else fails, brute-force is always an option…
(masochist!)
November 30th, 2008 at 3:31 pm
LOL… thanks for this. I was just about to do a HUGE, manual search and replace of my database when I searched on the off chance - and found this post.