Author Archives: dbchick

About dbchick

Geeky but eccentric, totally into MySQL and all things MySQL. Also into old-school photography, interesting food (OK, SF foodie!), and of course cats! Looking for a kickass MySQL DBA job in the city of San Francisco. I know there's a fun startup out there looking for a DBA!

Performance Schema

Most DBAs wanted something like performance schema back in 4.0. Percona started adding statistics to information_schema in its 5.x releases. I never turned them on. I'm not sure if it was due to being too busy, not knowing what the performance hit would be, or just not knowing about them. I admit it: I didn't turn them on. But when I went to Mark Leith's talks at the Percona conference in Santa Clara I wanted to use those new statistics. Clearly I was going to have to upgrade to MySQL 5.6 to get the full monty.

First thing I did when I upgraded to 5.6 (Percona Server) was to turn on performance_schema (P_S) full tilt, even though I'd been warned there would be a performance hit. But hey, I was turning it full on on these big beasts. I hadn't even dreamed of servers this big, but here I was. I figured the application would only use about 40% of them, so what was 5% more? WRONG! First of all, having never used servers of this size, I didn't know I needed to tune them, and neither did my sysadmin. They were NEVER tuned. After they went into production I had to stop performance_schema from collecting anything because I needed that 10%+ back. I also had to do some other crazy stuff I do NOT recommend, as it isn't replication safe. No one seemed to care but me.
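For the record, here's a minimal sketch of what "stop collecting" looks like without a restart (this assumes 5.6; disabling the plugin entirely still needs performance_schema=OFF in my.cnf and a restart, and these UPDATEs are local to the server, they don't replicate):

# stop P_S from collecting anything without restarting mysqld
mysql -e "UPDATE performance_schema.setup_consumers SET ENABLED = 'NO';"
mysql -e "UPDATE performance_schema.setup_instruments SET ENABLED = 'NO', TIMED = 'NO';"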

However, I did turn P_S full on on the testing server. I wanted to see if it would help me slay some easy, low-hanging-fruit sorts of problems. The first thing I found was that to really use it in a meaningful way I needed to install the views and triggers that make up the sys schema. I hate views and triggers because they make upgrading more challenging and they can slow down the server. I wondered if this was where the majority of the performance hit came from. Ha! No, that was an additional hit on top of the 10% I was already experiencing. So I only installed Mark Leith's sys package when I actually wanted to use it. The views are very helpful if you're following one of Mark's slideshows or blog posts about the sys schema, and they're fairly easy to install and uninstall.
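For reference, installing and uninstalling the sys views on 5.6 is roughly this (assuming Mark Leith's github repo and the 5.6 install file name; check the repo's README if the layout has changed):

# install the sys schema views, functions and procedures on a 5.6 server
git clone https://github.com/MarkLeith/mysql-sys.git
cd mysql-sys
mysql -u root -p < sys_56.sql
# and to pull it back out when you're done
mysql -u root -p -e "DROP DATABASE IF EXISTS sys;"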

I ran some queries that I got from those blog posts, but interestingly nothing was really new. I'd already looked at the DDL of the largest tables, I'd seen the queries, and I knew what was wrong. And just as my experience told me, fixing the DDL and the queries boosted performance. I put P_S away while I was busting my butt on other problems. But I longed for something like innotop so I could just glance at it throughout the day. Ah ha! I was not the only one. ps-top to the rescue.

I stumbled on a blog post by someone else who needed/wanted a ps-top program. I'm self-taught, so I learned UNIX from man pages (back when they seemed better written). I've compiled some gnarly programs, since back then we didn't have things like apt-get or yum. So what if I'd never compiled a Go app before? I can read documentation. But being an early adopter means the documentation isn't written for people like me (those who have never touched Go before). Luckily I worked with the creator, who improved the documentation, and I got ps-top to compile and run. Suddenly P_S was coming in handy with each new release. I could see the IO suddenly spike on a new table, or the IO wait time. It was really handy for a new app that was being built.
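If you want to try it, the build is roughly this (a sketch assuming an older Go toolchain with GOPATH set and that the repo still lives under Simon Mudd's github; check the ps-top README for the current install steps):

# fetch and build ps-top with the Go toolchain
go get -u github.com/sjmudd/ps-top/ps-top
# the binary lands in $GOPATH/bin; run it against a server with P_S enabled
$GOPATH/bin/ps-top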

I can't wait to use MySQL 5.7 and see how P_S has changed (rumor has it there is less of a performance hit and even MORE status). But in the meantime, here are some cool URLs I've collected and want to be able to come back to (in no particular order).



Prepared Statements and pt-query-digest

I know I keep writing about pt-query-digest. It's one of the best tools for MySQL. Its complexity is a blessing and a curse, and while the documentation is good, it's always too general. I write these posts so I can remember what I did and share it with others.

First off, prepared statements – WHO USES THESE? If you read blog posts like this one:
4 Things to Know About MySQL Prepared Statements

or this one
Re-factoring some internals of prepared statements in 5.7

you would wonder why anyone would use prepared statements pre-5.7 (some of us wonder why we can't just use 5.7 now; when will Oracle release this beauty to GA??!!). But as a DBA you have to show why, and without performance_schema's monitoring of memory (which comes in 5.7), the weirdness that is prepared statements pre-5.7 is scary. But let's just look at their performance. So, off to pt-query-digest.

My slow log was not verbose and long_query_time was set to 10, which means anything in my slow log should be looked at. This was on a staging server where P_S (performance_schema) is fully turned on, something I will not do in production until maybe 5.7, if it is truly the lean machine the blog posts tout. I forgot to turn long_query_time down to 0 until after it had been running all morning.
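The fix is a one-liner once you remember it; a minimal sketch (assumes you have SUPER, and note existing connections keep their old session value until they reconnect):

# log every query, not just the 10-second-plus ones
mysql -e "SET GLOBAL long_query_time = 0;"
mysql -e "SET GLOBAL slow_query_log = ON;"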

I did the usual

pt-query-digest --report-format=all ./slow.log > slow-log-report.txt

NOTHING related to the prepared statements showed up in the report except under this:

    # MISC 0xMISC              104.5362  2.4% 29940  0.0035   0.0 <435 ITEMS>

Adding --limit 100% added in ALL the queries on ALL the databases, which is not what I wanted. My prepared statements live in one schema and run as one user. I chose to filter based on user. I'm pretty sure I could have filtered on schema (or is it db?), but user seemed the easiest and the docs had an example.

pt-query-digest --limit 100% --filter '($event->{user} || "") =~ m/myuser/' slow.log.org > qt-newdb.txt
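If you'd rather filter on the schema, the attribute pt-query-digest exposes is db, so something like the line below should work (a sketch I haven't verified; myschema is a placeholder, and it only helps if the slow log actually records the db for each event):

pt-query-digest --limit 100% --filter '($event->{db} || "") =~ m/myschema/' slow.log.org > qt-newdb.txt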

Percona Toolkit 2.2 

Re-visit of warming up a cache with pt-query-digest

The only thing I do NOT like about the Percona Toolkit is that they willy-nilly change the arguments you can give it, AND I can never find what to use instead. Really, Percona, this is annoying. I'll start this post with the deets on the server and the version of pt-query-digest:
Server: Ubuntu 14.04.1 LTS
pt-query-digest version:   2.2.13

pt-query-digest --user myuser --password SECRET --processlist h=10.1.0.118 --filter '$event->{arg} =~ m/^select/i' --no-report --interval 0.01 --run-time 10m --output=slowlog | mysql -f

What’s changed:

  1. filter – fingerprint in this syntax was incorrect; {arg} works. I double-checked and the output is only SELECTs.
  2. New args: interval, run-time, output

    – interval is great for grabbing the faster queries.
    – run-time lets you choose how long to grab those queries, so you can tailor the time to warming up your cache. I'm not sure how long this will take at my current place, but at previous places it took 30 minutes.
    – output is what's used instead of print! It lets you choose your output format. I use slowlog as it's the format to replay against a cold server.

Whew, that was painful.

dstat – benchmark stats in one place

I've been using dstat for a while now and I figured I should blog about it. I'm always surprised when I meet techies who benchmark but don't know about dstat. How can that be?? It's like iostat, vmstat, and a bunch of the other tools we all use, rolled into one. OK, I hear ya: when you have a "swiss army knife" of a tool, using it can be a challenge, so I thought I'd blog about the ways I'm using it.

First you may want to look at man page sorts of things:

http://dag.wiee.rs/home-made/dstat/

http://www.onaxer.com/2011/03/25/disk-io-performance-tuning-benchmarking-tool/

Because I'm using dstat for benchmarking, biz folks are gonna look at the results, and they all want graphs. The cool thing about dstat is that you can output to a csv file. Graphite, a commonly used monitoring tool, uses csv files. But I don't have access to a Graphite server, so I'll make my graphs with another tool. Here's what my dstat wrapper looks like (it also runs hdparm):

system-check.sh

#!/bin/bash

#####################################
## VARIABLES
#####################################
DELAY=3      # seconds between dstat samples
COUNT=10     # samples per dstat run
SLEEP=10     # pause between loop iterations
MAX=10       # number of loop iterations
NOW=`date +"%Y%m%d%H%M"`
STATS_DIR=/data/reports/dstat
OUTPUT_FILE=${STATS_DIR}/${NOW}-dstat.csv
#DSTAT="dstat -t --top-io --top-bio -cl"
DSTAT="dstat -t --top-io --top-bio -cl --disk-util"
HDPARM_LOG=/tmp/${NOW}-hdparm.log

## check if stats dir exists
if [ ! -d ${STATS_DIR} ]; then
    #echo "no ${STATS_DIR}"
    mkdir -p ${STATS_DIR}
fi

## collect dstat samples to csv, and hdparm timings to a log, MAX times
for i in `seq 1 ${MAX}` ; do
    (${DSTAT} --output ${OUTPUT_FILE} $DELAY $COUNT)
    (echo `date` >> ${HDPARM_LOG})
    (/sbin/hdparm -Tt /dev/xvda >> ${HDPARM_LOG})
    sleep $SLEEP
done

exit 0

#################
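Usage is nothing fancy; I kick it off in the background right before the benchmark and pull the newest csv afterwards (the path matches the STATS_DIR in the script above):

# run the stats collection in the background while the benchmark runs
nohup ./system-check.sh > /dev/null 2>&1 &
# afterwards, grab the newest csv and feed it to whatever graphing tool you like
ls -t /data/reports/dstat/*-dstat.csv | head -1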

Percona Live 2014 part 2


From Somarts Work MORE! #6 © erin o’neill 2014


I did a LOT of networking and missed some talks. But the highlights were: Ryan Lowe on R, Jeremy Cole on his Journey to the Core II (smartly, I had gone to Peter Zaitsev's InnoDB Architecture and Performance Optimization tutorial first), Benchmarking Databases for Scale (Percona), Performance Monitoring at Scale with Yoshinori, RDS with Laine Campbell of Blackbird, and the Lightning Talks, which were fun. My favorites were "What not to say to a DBA" by Gillian Gunson, "Relay Log – the Unsung Hero" by Vishnu Rao (Flipkart), and Shlomi Noach's lightning talk (damn, he is one funny dude!).

There were so many talks to go to and too much work. I missed ones I wanted to go to. I love this conference. Can’t wait for next year.

 

(PS: no booth babes this year either!)

 

Percona Live 2014 part 1

This year the Percona Live conference started with a free Open Source Appreciation Day. There were two tracks and I went to the OpenStack track, where JAY PIPES was speaking!!! OMFG! He is a great speaker who pretty much disappeared into Drizzle land only to be spit out into OpenStack land. He is still a great speaker. Yea, Jay Pipes! My takeaway was that Trove is OpenStack's answer to RDS (Jay did not speak about Trove per se; his talk was titled "Tales from the Field: Backend Data Storage in OpenStack Clouds").

The conference always starts off with tutorials, and this year I had a pass for that. I went to two tutorials. The first was the MySQL Break/Fix Lab, led mostly by René Cannaò (Blackbird, formerly PalominoDB), then Gillian Gunson (Blackbird, formerly PalominoDB), with Alkin Tezuysal (Blackbird, formerly PalominoDB) taking the shortest slot. I wish this had been intermediate rather than beginner, but I wanted to support them. I had great fun tweeting about the session. And it was a great beginner session; I wish I'd gone to it in my first year of being a MySQL DBA.

Then I went to InnoDB Architecture and Performance Optimization with Peter Zaitsev (Percona). This was an advanced talk on the internals, which is not something I know a lot about. But I learned at Oracle's MySQL Connect that some of it does sink in and help with things like tuning and indexes. I was also taking it to prep for Jeremy Cole's session later in the week. Turns out Jeremy was there to prep for his talk too! AND near the end they had a lively discussion where Jeremy politely told Peter he was wrong about one thing, and Peter paid attention! It was very cool to see two very smart people have a discussion rather than some ego-fueled bullshit session. We all benefited from it greatly.

That was the end of day 2 and I was already pretty tired. How can that be? I had 3 more days of this.

When things fail – Server Restores and databases


I like to test backups with a restore. Database restores work pretty well most of the time, as long as the backup file is not corrupt, binary logs can be found, etc. There is a plethora of blog posts on this and software to make it easy. I have some control over that. However, I do not always have control over the server backups.

Server backups are SUPER easy in AWS. I love them. Snapshots may cause problems while they're being taken, but the restore is a piece of cake. Now I'm on Rackspace Cloud (RAX) and I'm not sure how its backups work, so I push and push to do a restore. Of course I first upgrade the database so the restore can take us back to the old version. I stop mysql and tell the sysadmin to start the restore.

First off, the restore on this small server took OVER THREE HOURS???!!! WTF? Really? And I used to complain about the 15 minutes AWS would take... sheesh.

Everything seems to be working, but it's not; only I don't know this yet. A week later I go to do the upgrade on the Master/Slave pair (the slave is the one I tested the upgrade, and then the RAX restore, on). The Master server goes just as planned; everything is so easy. But the slave is throwing errors right and left. Databases can't be dropped, and then after they are created data can't be loaded into the mysql tables. Really, it's a mess.

I panic when drop database mytest; fails and leaves behind a couple of .ibd files. I know those are special and I'm leery of just running rm -r mytest. I try a bunch of things, but in the end I run rm -r mytest against the datadir, then go into mysql and run create database mytest; drop database mytest; and I'm good to go.
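For my own notes, the sequence that finally worked was roughly this (mytest is my test schema, the datadir path here is an assumption for illustration, and this is a last resort, not something to do on a server you care about):

# the orphaned .ibd files kept DROP DATABASE from finishing, so clean up by hand
rm -r /var/lib/mysql/mytest
# then recreate and drop the database so the data dictionary agrees with the filesystem
mysql -e "CREATE DATABASE mytest; DROP DATABASE mytest;"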

The mysql database was a bigger problem. It was such a mess that I just removed everything. I mean EVERYTHING. Then I ran mysql_install_db.

After that I started it up with: cd /usr ; /usr/bin/mysqld_safe &

Then I needed the mysql user data. Not to worry, I already had that, but the number of columns did not match because the dump was from 5.5 and this server had 5.6 installed. Again no big thing; I just went to a server that was still 5.5 and ran this:
mysql -BNr mysql -e "desc user" | cut -f1 > user-cols

Then I added a "," at the end of each line, wrapped the whole thing in "( )", and added that into the sql file. It still didn't work, so I changed the statement to INSERT IGNORE and ran that against the DB: mysql < mysql-user.sql
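If I ever have to do that massaging again, something like this should spit out the column list ready to paste into the INSERT (paste is standard coreutils; the file names are just the ones I used above):

# dump the 5.5 mysql.user column names, one per line
mysql -BNr mysql -e "desc user" | cut -f1 > user-cols
# join them with commas and wrap in parentheses for the INSERT column list
echo "($(paste -sd, user-cols))"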

I was left with a couple of root users without passwords, so I cleaned that up. Then I was good to go. I ran the schema import, mysql < mysql-schema.sql, and that ran fine. Yea.

FINALLY I could run mysql < mysql-data.sql

That one takes a while. The final steps will be reset master; show master status; and then grabbing what I need to start the slave.
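The slave setup itself is the standard dance, roughly like this (the host, user, log file and position below are all placeholders; use whatever SHOW MASTER STATUS actually reports, and remember RESET MASTER throws away the binlogs, so only do it when nothing else depends on them):

# on the master: reset the binlogs and note the starting coordinates
mysql -h master.example.com -e "RESET MASTER; SHOW MASTER STATUS\G"
# on the rebuilt slave: point it at those coordinates and start replication
mysql -e "CHANGE MASTER TO MASTER_HOST='master.example.com', MASTER_USER='repl', MASTER_PASSWORD='SECRET', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=120;"
mysql -e "START SLAVE; SHOW SLAVE STATUS\G"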

I did panic for a few seconds, but all the problems reminded me to breathe deeply, carry a big stick, and beat it all into submission. Happily these are the dev servers and not prod, as this took a lot longer than it should have.

Looking back at it, I should have run a DB restore from the same day as the server restore and then rebuilt the slave data.