Spurred on by mass hacking, I’ve updated my old (version 2.0.3) WordPress install to something a little newer.

I decided to err on the side of caution and upgrade a copy of the live DB first – and I’m glad I did. I saw lots of problems with accents and symbols getting munged in the upgrade – well, that’s what I thought.

In reality, the symbols were being screwed in the backup-and-restore process. Words such as “naïve”, £-signs and various typographic quotes were obviously getting messed up through character-encoding issues.

After a bit of sleuthing, I found the output from mysqldump wasn’t valid. The problem was caused by a combination of default connection settings in a .my.cnf, and the fact that the old WordPress install was storing utf-8 characters inside a latin1 database.

My MySQL default settings are:


[client]
default-character-set=utf8

Because we’re pulling latin1 data over a utf-8 connection, MySQL starts doing character-set conversions, and screws up a bunch of text that it thinks is latin1, but in reality is actually already in utf-8.

The fix for the backup process was to override my default settings with latin1:


$ mysqldump --default-character-set=latin1 --opt -h db.example.com -u user -ppassword schema > db-backup-latin1-20080707.sql

That worked fine, and the next step was reloading it into the DB as utf-8. This required a little bit of string replacement using a command-line utility bundled with MySQL. If you’re going to do this yourself, watch what you type: I somehow typed in “lastin1” halfway through and lost an hour or so trying to figure out what went wrong. Anyhow, here’s the command-line:


$ replace "CHARSET=latin1" "CHARSET=utf8" "SET NAMES latin1" "SET NAMES utf8" < db-backup-latin1-20080707.sql > db-backup-utf8-20080707.sql

You should now be able to blat / restore / overwrite your DB and ensure all tables are in the appropriate character set, ready for a smooth wordpress upgrade.


$ mysql --opt -h db.example.com -u user -ppassword schema < db-backup-utf8-20080707.sql

Posted in Tip

11 thoughts on “MySQL latin1 → utf8 (WordPress upgrade)

  1. Hi Dam,

    The mysqldump, replace and mysql restore commands aren’t specific to WordPress, so should work for any MySQL databaase where utf8 data’s accidentally been stored in a latin1 table.

    Ash

  2. Yes, exporting is pretty straightforward with phpMyAdmin, but it lacks of charset options, and I’m not sure if they would be useful in this case. I didn’t know about “transformations”, but it seems to be some painfull stuff.

    By the way, I exported the tabels of my interest, opened the sql text file in notepad++ or any other advanced texte editor, than modified charset=latin1 to utf8 option in create statemente, and then, -by hand- 🙁 made search-replace of screwed up charachters and finally reimported the whole zipped file.

Comments are closed.