I like to test backups with a restore. Database restores work pretty well most of the time, as long as the backup file is not corrupt, binary logs can be found, etc. There is a plethora of blog posts on this and software to make it easy. I have some control over that. However, I do not always have control over the server backups.
Server backups are SUPER easy in AWS. I love them. Snapshots may cause problems taking them but the restore – piece of cake. Now I’m in a Rackspace Cloud (RAX) and I’m not sure how this backing up is working? So I push and push to do a restore. Of course I first upgrade the database so we can go back to the old version. I stop mysql and tell the SysAdmin to start the restore.
First off, the restore on this small server took OVER THREE HOURS???!!! WTF? really? And I used to complain about the 15 min AWS would take… scheesh.
Everything seems to be working but it’s not – only I don’t know this. A week later I go to do the update to the Master/Slave pair (the slave is the one that I tested the upgrade and them RAX restore on). The Master server is going just as planned, everything is so easy. But the slave is throwing errors right and left. Databases can’t be dropped, then after they are created data can’t be loaded into the mysql tables. Really, it’s a mess.
I panic on the drop database mytest; when it fails and leaves behind a couple of ibd files. I know those are special and I’m leery of just running a rm -r mytest. I try a bunch of things but in the end I run the rm -r mytest, then go into mysql and create database mytest; drop database mytest; I’m good to go.
The mysql database was a bigger problem. It was such a mess that I just removed everything. I mean EVERYTHING and then ran: mysql_install_db
After that I started it up with the cd /usr ; /usr/bin/mysqld_safe &
Then I needed the mysql user data. Not to worry, I already had that but the number of columns did not match as they were 5.5 and this server had 5.6 installed. Again no big thing I just went to a server that was still 5.5 and ran this:
mysql -BNr mysql -e “desc user” | cut -f1 > user-cols
Then I added “,” at the end of each line and wrapped it in “( )” and added that into the sql file. Still didn’t work so I added INSERT IGNORE. ran that against the DB: mysql < mysql-user.sql
I was left with a couple of root users without passwords so I cleaned that up. Then I was good to go. Ran the schema import mysql < mysql-schema.sql and that ran fine. yea.
FINALLY I could run the mysql < mysql-data.sql
That one takes awhile. The final steps will be reset master; show master status; and then grabbing what I need to start the slave.
I did panic for a few seconds but all the problems reminded me to breath deeply, carry a big stick and beat it all into submission. Happily this is the dev servers and not prod as the time to do this took a lot longer than it should have.
Looking back at things I should have run a DB restore of the same day as the server restore and then rebuilt the slave data.