MySQL replication has become quite sturdy over the years, but there are some things it still does not like. One of this is changing the table definition on a slave, without changing it on the master or changing the table definition on the master and disabling binary logging for the respective session. Both scenarios will cause any slave to stop, because he cannot convert to the data type he has for this column.
I once observed a similar scenario on a customers set up. They wanted to save time when changing the ID column of several tables, and executed a script that worked more or less parallel. This would have been no problem, if they did the change via the replication, which would eventually do the change on all connected servers. The replication was still running when they started the change an there was a considerable load on the database. The result was, that all slaves had stopped their work.
First we need to find the reason, why the slave stopped. “SHOW SLAVE STATUS\G” will show you an error like this:
Column 0 of table 'test.table' cannot be converted from type 'int' to type 'bigint(20) unsigned'
This error means, that there is a mismatch between the definitions for the first column of the table “table”. You can find the difference by issuing “SHOW CREATE TABLE table\G” on the master and on the slave. In order for the replication to work, both definitions have to be same.
But what if the definitions are the same and the replication is still not working? The statement in the binary log expects a certain data type of the field it should write to. If the data types do not match, the slave throws an error and stops its work.
If you want to simulate this error, you can do it quite easy. Set up a standard replication (Master-Slave), then create a new table, where you can test on.
USE test; CREATE TABLE table ( id INT UNSIGNED NOT NULL AUTO_INCREMENT, data VARCHAR(50) DEFAULT NULL, ts TIMESTAMP NOT NULL, PRIMARY KEY(id) ) ENGINE = InnoDB;
Then continue with the altering of the table definition. Do it on the master, so you have the same scenario.
USE test; SET SESSION sql_log_bin = 0; ALTER TABLE table CHANGE `id` `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;
The IO-Thread will keep going, but the SQL-Thread will stop. The problem here is, that the slave is not willing to accept the changed data type. Some may say this is bad behaviour, but IMHO it is the correct behaviour regarding data integrity.
It is easy to solve the problem. Just get into the MySQL-CLI and enter the following commands:
SET GLOBAL slave_type_conversions = 'all_lossy, all_non_lossy'; STOP SLAVE; START SLAVE;
What does this do? The problem here is, that the replication does not want to convert data to a different data type.
This will allow the slave to get back in sync with its master. After he is done, issue the following commands in the MySQL-CLI:
SET GLOBAL slave_type_conversions = ''; STOP SLAVE; START SLAVE;
Now check your replication again. You should see, that the slave is catching up with the master. Depending on how much has changed since the replication failed, this can take from seconds to hours.