Keep This Place CLEAN!

The pt-query-digest tool

First off, a DISCLAIMER: always test these tools to be sure your servers can handle them. I try never to run Percona Toolkit tools directly on the production database server; with a little thought there are ways around it. Also remember that passwords passed on the command line will show up in ps output.

I've always used Maatkit's mk-query-digest tool, but I think Percona Toolkit has made it even better. It's one of my favorite MySQL tools. The documentation is lacking for all the really great things you can do with it, but its ability to warm up the cache on a passive master or a new slave is priceless.

Generally startups do not hire a DBA in the beginning. Let's face it: it's usually one or two engineers and, hopefully, a marketing or product person. The first hires tend to be on the business end, plus more engineers, because everyone thinks DBAs aren't really necessary early on. Then the database begins to bottleneck as the site gets more popular. They'll hire a backend engineer to fix some of the queries first, but eventually the DBA comes in, along with the architect. Fair enough.

So while the backend re-architecture begins, the shoring up of the current system has to happen. This usually involves more servers. Perhaps in the beginning there was a master, a backup slave, and a couple of read slaves. The CEO is trying to keep the budget down and is freaking out about server costs, because database servers are more expensive than web or app servers. But the system needs help, and to keep it up, more servers are thrown at the problem. This is where pt-query-digest really shines.

First, I always add a passive master (which is nothing more than a slave that no one reads from). It's insurance: just in case there's a hardware failure, a fully caught-up slave is ready to take the fallen master's place. Or is it?

Generally the first thing I do is upgrade the version of MySQL so that the app can get prepared for the new architecture. I've never had this involve just one upgrade, but it's good practice: you want to be sure the app can handle the newer versions, which are faster and have fewer bugs, so the minor upgrades come first. This means I need two extra servers (OK, I can get by with one, but I usually need an extra read slave by now). The first step is a mysqldump and an import into a new slave running the newer version of MySQL. You don't strictly have to do this for some upgrades, unless you're also cleaning up charsets; and when you're not working with a MySQL specialist, you almost ALWAYS have to clean up charsets. That means a mysqldump, then importing the .sql file. But does your new slave have all the data the master has?
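The dump-and-import step looks roughly like this. This is a sketch, not gospel: the host names are placeholders, and the sed line is a deliberately crude latin1-to-utf8 cleanup that you must verify against your actual schema and data before trusting it.

```shell
# Dump from an existing slave (not the master) to spare production.
# --single-transaction keeps InnoDB tables consistent without locking.
mysqldump --single-transaction --all-databases \
  -h old_slave -u myuser -p > dump.sql

# Hypothetical charset cleanup: rewrite latin1 table defaults as utf8.
# Crude -- verify against your schema first.
sed -i 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g' dump.sql

# Import into the new slave running the newer MySQL version.
mysql -h new_slave -u myuser -p -f < dump.sql
```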

That means you need to run pt-table-checksum first, which means running it at a slow time on your master. Once you are sure that your new slave has the same data as your master, you are ready to use it. Or are you?
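For reference, a minimal pt-table-checksum run looks something like the following; the host and database names are placeholders, --max-load throttles the tool if the master gets busy, and percona.checksums is the tool's default result table.

```shell
# Point at the master during a quiet period.
pt-table-checksum h=the_master -u myuser --ask-pass \
  --databases mydb --max-load Threads_running=25

# Then, on the new slave, look for tables that differ from the master:
mysql -e "SELECT db, tbl FROM percona.checksums
          WHERE master_cnt <> this_cnt OR master_crc <> this_crc
             OR ISNULL(master_crc) <> ISNULL(this_crc)
          GROUP BY db, tbl"
```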

Of course you're not! That's when I run pt-query-digest. I choose a slave to grab the read queries from, and on the NEW SLAVE (this is important!) I run this command:

/usr/local/bin/pt-query-digest --no-report --processlist h=my_running_slave -u myuser -psecret --filter '$event->{fingerprint} =~ m/^select/' --print | mysql -f > /dev/null &

Well, actually I script it, with a variable that runs the digest for a certain number of minutes or hours and then kills it. I usually have a ~/.my.cnf for this user so I don't have to put the user and password on the command line; the version above is just for documentation. I highly suggest you take security precautions.
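The ~/.my.cnf I mean is just the standard client section. With it in place (and chmod 600), the -u/-p flags can be dropped from the command above so nothing sensitive shows up in ps:

```
[client]
user     = myuser
password = secret
```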

After I run this cache warmup for a couple of hours (I'm usually sleeping, to be honest), I make sure that replication is alive and well and that no writes got into this slave. I check how long a typical query takes on this server, and I check one of the gnarly queries to see if it too is speedy. If all is well, I put the slave into the read pool and watch it like a hawk.
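Those sanity checks are quick; on the new slave, they amount to something like this (the last line is a placeholder for whatever your app's worst-offending query actually is):

```shell
# Replication alive and caught up?
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running|Seconds_Behind_Master'

# Make sure no writes can sneak in.
mysql -e "SHOW GLOBAL VARIABLES LIKE 'read_only'"

# Time a known gnarly query (placeholder -- substitute your own).
time mysql mydb -e "SELECT ..."
```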

Once all the slaves are upgraded, including the passive master, it's time to make the passive master the active master. Just do the same as you would to a slave, but this time you have to grab the queries from the master. That makes it a bit stickier, since you need to watch that you're not overloading your master. I tend to do it in spurts: 2-5 minutes every 15 minutes or so. I have to be awake for this to keep an eye on it; after the re-architecture I won't have to (well, I just set up a Nagios alert that will wake me up if things go wrong, and sleep with my laptop). Voila! I will do this a number of times, as re-architecture takes time.
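The spurts can be scripted with a little wrapper like this sketch; the cycle count, timings, and host name are all assumptions, so tune them to your master's load:

```shell
#!/bin/sh
# spurt_loop CYCLES SPURT_SECS REST_SECS CMD...
# Runs CMD in the background for SPURT_SECS, kills it, rests for
# REST_SECS, and repeats CYCLES times.
spurt_loop() {
    cycles=$1; spurt=$2; rest=$3; shift 3
    i=0
    while [ "$i" -lt "$cycles" ]; do
        "$@" &                     # start the warmup pipeline
        pid=$!
        sleep "$spurt"
        kill "$pid" 2>/dev/null    # end this spurt (it may have finished)
        sleep "$rest"
        i=$((i + 1))
    done
}

# Example: 3-minute spurts every 15 minutes, eight times, pulling
# queries from the (placeholder) master:
# spurt_loop 8 180 720 sh -c \
#   'pt-query-digest --no-report --processlist h=the_master \
#      --filter "$event->{fingerprint} =~ m/^select/" --print \
#    | mysql -f > /dev/null'
```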

The other cool thing to do with pt-query-digest is to use it with tcpdump. I found the docs for the tcpdump side weren't all that helpful either. I love tcpdump for those servers that are so overloaded I can't log slow queries. Usually I create a dump file that I offload to another server to analyze. Here's how I get my dump file:

/usr/sbin/tcpdump -i eth0 port 3306 -s 65535 -xnq -tttt > tcpdump.out
/usr/sbin/tcpdump -i eth0 port 3306 -s 0 -xnq -tttt > tcpdump.out

You hit <ctrl>-c to kill this and it writes to the tcpdump.out file. I've used both of the above and couldn't tell whether one captured more of the slow queries than the other (on modern tcpdump they should be equivalent: -s 0 means "capture full packets," the same as the 65535 snaplen). tcpdump will truncate super long queries, but it still gives you a feel for the query. I usually talk to an engineer who can give me the full query so I can see if there's any sort of ORDER BY or GROUP BY.
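Once the dump file is offloaded, the analysis side is just a matter of telling pt-query-digest the input is a capture rather than a slow log (the report filename here is mine, not anything standard):

```shell
# On the other server: digest the capture instead of a slow log.
pt-query-digest --type tcpdump tcpdump.out > digest_report.txt
```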

