As to why we must convert MySQL database's character set from whatever utf8 to utf8mb4, please refer to the article "How to support full Unicode in MySQL databases".

I rewrite the manual steps specified in that article here, with additional information about using long indexes. Then I will introduce the automation tool. With the Ruby tool, you don't have to go through all the miserable manual steps.

 

Prerequisites:

1 - Backup the database.

2 - Upgrade MySQL server to newer version than 5.5.3.

3 - Long indexes:

If you want to use long indexes that can index utf8mb4 VARCHAR columns that are longer than 191 characters, you must have the following parameters set:

innodb_file_per_table : ON
innodb_large_prefix : ON
innodb_file_format : Barracuda 
innodb_file_format_max : Barracuda

Also, each table's ROW_FORMAT in the database must be DYNAMIC. 

From MySQL console, you can verify those parameters with the following commands:

SHOW VARIABLES LIKE "%innodb_file_per_table%";
SHOW VARIABLES LIKE "%innodb_file_format%";
SHOW VARIABLES LIKE "%innodb_large_prefix%";

From a Rails console, you can verify them using the following commands:

connection = ActiveRecord::Base.connection
connection.execute('SHOW VARIABLES LIKE "%innodb_file_per_table%"').to_a
connection.execute('SHOW VARIABLES LIKE "%innodb_file_format%"').to_a
connection.execute('SHOW VARIABLES LIKE "%innodb_large_prefix%"').to_a

 

In Rails console, if you connect with SUPERUSER privilege, you can set those parameters like this:

connection = ActiveRecord::Base.connection
connection.execute('set global innodb_file_per_table = `ON`')
connection.execute('set global innodb_file_format = `Barracuda`')
connection.execute('set global innodb_file_format_max = `Barracuda`')
connection.execute('set global innodb_large_prefix = `ON`')
connection.execute('commit')

 

Of course if you login MySQL console with the SUPERUSER privilege, you can set those parameters too.

 

Manual conversion steps:

1- Convert the whole database, each table, each column: 

# 1 - Convert the database:
ALTER DATABASE #{database_name} CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;   # 2 - Convert each table: For all the tables in the system,  you must do these 2 commands: ALTER TABLE #{table_name} ROW_FORMAT=DYNAMIC; ALTER TABLE #{table_name} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;   # 3 - Convert all VARCHAR/TEXT columns in all the tables in the system. You must run one command, select the correct one depends on the type/length of each column:

# For VARCHAR columns that have lengths between 1 and 255: ALTER TABLE #{table_name} CHANGE #{column_name} #{column_name]} VARCHAR(#{column_length}) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci # For TEXT columns that have the lengths smaller than 65535/4: ALTER TABLE #{table_name} CHANGE #{column_name} #{column_name]} TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci # For TEXT columns that have the lengths greater than 65535/4: ALTER TABLE #{table_name} CHANGE #{column_name} #{column_name]} MEDIUMTEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci

# For TEXT columns that have the lengths smaller than 65535/4: ALTER TABLE #{table_name} CHANGE #{column_name} #{column_name]}  TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci # For TEXT columns that have the lengths greater than 65535/4: ALTER TABLE #{table_name} CHANGE #{column_name} #{column_name]}  MEDIUMTEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci  

 

2- Modify connection, client, server character set:

In your application code, set the connection character set to utf8mb4. This can be done by simply replacing any variants of SET NAMES utf8 with SET NAMES utf8mb4. If your old SET NAMES statement specified the collation, make sure to change that as well, e.g. SET NAMES utf8 COLLATE utf8_unicode_ci becomes SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci.

Make sure to set the client and server character set as well. I have the following in my MySQL configuration file (/etc/my.cnf):

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

You can easily confirm these settings work correctly:

mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)

 

3- Repair and optimize all tables:

After upgrading the MySQL server and making the necessary changes explained above, make sure to repair and optimize all databases and tables.  If you  don't perform those repairing and optimizing, you may run into some weird bugs where UPDATE statements don't  have any effect, even though no errors were thrown.

You could run the following MySQL queries for each table you want to repair and optimize:

# For each table
REPAIR TABLE table_name;
OPTIMIZE TABLE table_name;

Luckily, this can easily be done in one go using the command-line mysqlcheck utility:

$ mysqlcheck -u root -p --auto-repair --optimize --all-databases

 

This will prompt for the root user’s password, after which all tables in all databases will be repaired and optimized.

Automate all the manual steps with Ruby program:

I wrote a Ruby script as an ActiveRecord migration script that automated  all the manual steps above. There is also a test script that can be run with `rails runner` to verify the correctness of the conversion. Of course you can modify them into Ruby programs that can be called in stand-alone Ruby environment without using Rails.

Here is the repo of the script: convert_mysql_to_utf8mb4