MySQL latin1 → utf8 (Wordpress upgrade)

Spurred on by mass hacking, I’ve updated my old (version 2.0.3) Wordpress install to something a little newer.

I decided to err on the side of caution and upgrade a copy of the live DB first - and I’m glad I did. I saw lots of problems with accents and symbols getting munged in the upgrade - well, that’s what I thought.

In reality, the symbols were being screwed in the backup-and-restore process. Words such as “naïve”, £-signs and various typographic quotes were obviously getting messed up through character-encoding issues.

After a bit of sleuthing, I found the output from mysqldump wasn’t valid. The problem was caused by a combination of default connection settings in a .my.cnf, and the fact that the old Wordpress install was storing utf-8 characters inside a latin1 database.

My MySQL default settings are:


[client]
default-character-set=utf8

Because we’re pulling latin1 data over a utf-8 connection, MySQL starts doing character-set conversions, and screws up a bunch of text that it thinks is latin1, but in reality is actually already in utf-8.

The fix for the backup process was to override my default settings with latin1:


$ mysqldump --default-character-set=latin1 --opt -h db.example.com -u user -ppassword schema > db-backup-latin1-20080707.sql

That worked fine, and the next step was reloading it into the DB as utf-8. This required a little bit of string replacement using a command-line utility bundled with MySQL. If you’re going to do this yourself, watch what you type: I somehow typed in “lastin1″ halfway through and lost an hour or so trying to figure out what went wrong. Anyhow, here’s the command-line:


$ replace "CHARSET=latin1" "CHARSET=utf8" "SET NAMES latin1" "SET NAMES utf8" < db-backup-latin1-20080707.sql > db-backup-utf8-20080707.sql

You should now be able to blat / restore / overwrite your DB and ensure all tables are in the appropriate character set, ready for a smooth wordpress upgrade.


$ mysql --opt -h db.example.com -u user -ppassword schema < db-backup-utf8-20080707.sql

9 Responses to “MySQL latin1 → utf8 (Wordpress upgrade)”

  1. Marko Says:

    Finally, a sollution that worked! Thanks, mate!

  2. dam Says:

    Hi,

    can I’ve your support to convert latin1 DB into utf8?

    thanks

  3. Ash Says:

    Hi Dam,

    The mysqldump, replace and mysql restore commands aren’t specific to Wordpress, so should work for any MySQL databaase where utf8 data’s accidentally been stored in a latin1 table.

    Ash

  4. Bookmarks about Mysql Says:

    [...] - bookmarked by 2 members originally found by shleyswt on 2008-09-10 MySQL latin1 → utf8 (Wordpress upgrade) http://hexmen.com/blog/2008/07/mysql-latin1-utf8-wordpress-upgrade/ - bookmarked by 1 members [...]

  5. Paamayim Says:

    What if I don’t have access to mysqldump but only phpMyAdmin?

    Thanks

  6. Ash Says:

    @Paamayim

    I’m not a big phpMyAdmin user, but assume you can export & import using the GUI, so (hopefully) it’s just a matter of reading around. I guess you may need to make use of phpMyAdmin’s transformations..?

    Drupal has some info on exporting and importing.

    Let us know if you figure it out.

  7. Paamayim Says:

    Yes, exporting is pretty straightforward with phpMyAdmin, but it lacks of charset options, and I’m not sure if they would be useful in this case. I didn’t know about “transformations”, but it seems to be some painfull stuff.

    By the way, I exported the tabels of my interest, opened the sql text file in notepad++ or any other advanced texte editor, than modified charset=latin1 to utf8 option in create statemente, and then, -by hand- :( made search-replace of screwed up charachters and finally reimported the whole zipped file.

  8. Ash Says:

    @Paamayim

    When all else fails, brute-force is always an option…

    (masochist!) ;-)

  9. Web Proxy Blog Says:

    LOL… thanks for this. I was just about to do a HUGE, manual search and replace of my database when I searched on the off chance - and found this post.

Leave a Reply