rmoff

March 11, 2011

Getting good quality I/O throughput data

Filed under: AWR, io, oracle, performance, visualisation — rmoff @ 11:33

This post expands on one I made last year here about sampling frequency (of I/O throughput, but it’s a generic concept).
The background to this is my analysis of the performance and capacity of our data warehouse on Oracle 11g.

Before I get too boring, here’s the fun bit:

Pork Pies per Hour (PP/h)

Jim wants to enter a championship pork-pie eating competition. He’s timed himself practising and over the course of an hour he eats four pork pies. So we might say that his Pork Pies per Hour (PP/h) rate is 4.

The competition lasts for thirty minutes. The world champion can eat eight pork pies in thirty minutes. Does Jim stand a chance?
(let’s pretend he has an insatiable appetite, isn’t going to get full, and ignore any other smart-ass factors)

If his consumption rate was 4 PP/h, and he only has half an hour, then we would predict he’ll consume 4 * 0.5 = 2 pork pies. So Jim’s going to get his ass beat.

Or is he?

What if the rate of 4 PP/h masks the fact that the hour broke down as follows?

  • First 15 minutes: he ate one pork pie
  • 15 – 30 minutes: had a beer
  • 30 – 45 minutes: gobbled down three pork pies
  • 45 – 60 minutes: he had another beer and gently sweated pork fumes

If that were the case and we had sampled every fifteen minutes, we’d see this:

So the figure we actually want to know, the maximum rate at which he can consume pork pies, is exposed only when we sample at an appropriate frequency.

Mega bites to Mega bytes

Enough of pork pies, and back to the tasty subject of I/O throughput. The point I am trying to make is that without an appropriate sampling frequency, the data that we collect becomes less useful.

You can always process your data further to derive conclusions from it over a longer term. Five second samples are going to be fairly unintelligible if considered unprocessed over the timespan of a year.

But what you can’t do is add back in the detail that you lost by sampling too infrequently in the first place. Once that moment’s passed, it’s gone.

Because this kind of metric is a rate (something happening per unit of time), each sample is inherently an average over the sampling period. The problem with this is that it can mask peaks in the throughput. Those peaks may (or may not) be hitting a ceiling in your system, a ceiling that an average figure will make you think you are comfortably clear of.

System metrics for Oracle are available through AWR, which typically samples every hour. OS-level tools may sample more frequently, but in the context of capacity planning and analysis, periods are often going to be tens of minutes, or hourly and upwards.
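
As an aside, it's easy to check what your AWR snapshot interval actually is, and to change it if you want finer-grained history (at the cost of more storage). This isn't from the original analysis, just a quick sketch against the standard data dictionary:

-- How often is AWR snapshotting, and how long is it keeping the data?
SELECT snap_interval, retention
FROM   dba_hist_wr_control;

-- Reduce the snapshot interval, e.g. to 15 minutes (the parameter is in minutes)
BEGIN
  DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(interval => 15);
END;
/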

The following illustrates the effect of averaging I/O throughput figures.

The data is the average I/O throughput, sampled every five seconds (through this method). Note that this is already an average, but in the context of hourly samples (for example) we will have to live with five seconds as the starting point.

This first graph shows the original data, with a five minute average drawn over it. For each, the maximum is noted.
It can be seen that the maximum the I/O throughput hit was a shade over 3GB/s. That was the real maximum we were driving through the I/O pipe over the period (ignoring that it’s itself a 5-second average). Now look at the maximum of the 5 minute average: we’ve lost c. 240MB/s from our maximum, which is now 2.7GB/s.

In the second graph the original sample is shown, with a 30 minute average. It’s easy to see the effect that averaging the data has: the peaks and troughs are smoothed out, giving a more even line. But is this what we want? Our apparent maximum I/O, based on a 30 minute average, has now almost halved!
Apparently we only needed 1.6GB/s of I/O throughput during this time period. The graph clearly shows that this is a false statement. But what about now?

Same graph as before, but without the context of the original sample. Given this data (which is what you’ll have if you collect I/O throughput data sampled every 30 minutes), how would you know what the maximum throughput during that period was? It is impossible to know!

Pushing this point further, the same 30 minute average, over an extended period:

What’s the maximum throughput that was required during this period? When was the I/O throughput approaching capacity?
You can’t know from the averaged data alone!

Here’s the same extended period, with the original 5 second samples. This just proves the point: the 30 minute samples have obliterated the peaks, particularly around 04:00 – 06:00.
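
For what it's worth, the same smoothing effect can be demonstrated straight from the database rather than from flat-file samples. Here's a rough sketch (my own, not part of the original analysis) that rolls the one-minute samples in V$SYSMETRIC_HISTORY up into 30-minute buckets and compares the average with the peak in each; the view only holds around an hour of history, so you'll only get a couple of buckets, but the shape of the calculation is the point:

-- Average vs peak read throughput per 30-minute bucket, from one-minute samples
SELECT TRUNC(begin_time, 'HH24')
         + FLOOR(TO_CHAR(begin_time, 'MI') / 30) * 30 / 1440 AS bucket_start,
       ROUND(AVG(value) / 1024 / 1024, 1)                    AS avg_mb_per_sec,
       ROUND(MAX(value) / 1024 / 1024, 1)                    AS max_mb_per_sec
FROM   v$sysmetric_history
WHERE  metric_name = 'Physical Read Total Bytes Per Sec'
AND    group_id    = 2  -- the 60-second system metric group, if memory serves
GROUP  BY TRUNC(begin_time, 'HH24')
            + FLOOR(TO_CHAR(begin_time, 'MI') / 30) * 30 / 1440
ORDER  BY bucket_start;

The gap between avg_mb_per_sec and max_mb_per_sec in each bucket is exactly the detail that the 30 minute average in the graphs above throws away.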

So what?

To be able to plan for a system’s I/O capacity we need to know more than how much I/O it transferred over a relatively long period of time. We need to know what the biggest demand placed on the system was, otherwise we risk unseen bottlenecks. To make this useful, we also need to understand whether these big demands were prolonged peaks or not. Particularly in a DW/BI environment, load is generally going to be sporadic. Sure, we run regular batches at predictable times, and may look to cache reports at fixed times, but they’re the exception, not the rule.

If a system has a limit of 3GB/s, and we hit that 3GB/s for a few seconds, what is the implication of that? We need to transfer the same amount of data, so hitting the limit means that the transfer is going to take longer. But a few seconds here and there may not matter — it all comes down to the context.

A report that runs for five minutes which hits I/O throughput limit for a second or two isn’t as much of a concern as one where the I/O hits the limit for minutes on end. There’s plenty written about system capacity and scalability, and it should be clear that if a system is hitting a capacity limit (whether it’s I/O, CPU, or whatever) for prolonged periods then the overall stability is going to suffer. Maybe that five minute report which spends four minutes on bottlenecked I/O doesn’t bother the user, but what about the report that’s supposed to run in a few seconds which is sat waiting for I/O at the same time?

(Ed: I’m pretty sure that I’m straying into the realm of knees and such, but need (knee’d?) to go off and read some more about it first to be sure)

I love twitter

In writing this, I’ve been struggling with some of the concepts which I can instinctively feel but don’t have the understanding to properly articulate. Particularly, if my assertion is that long sample periods are not a good idea, what is a good sample period? It’s no use being a smart-ass and rubbishing the data we collect if I can’t explain how we should collect it.

So, I turned to twitter. Twitter is awesome. (I think of Chet almost every time I say this because he was one of the main guys who convinced me it was as good as the fuss made out. Check out his presentation all about it here).

and back the answers came:

twitter is teh awesome

Amazingly helpful stuff, and focussed on my specific question. Sure, Google has changed our lives when it comes to finding the answers to questions. But (a) there is a lot of crap written on the internet (Blunder On So, anyone?), and (b) you will often find generally interesting things in the area you’re interested in, but for the more specific, uncommon questions it’s unlikely you’ll get a direct hit.

Here, I had industry-leading figures directly answering my specific question! I love twitter. And I really appreciate experts taking the time to share their knowledge, experience and expertise.

What next?

Nothing that I’m writing here is new, but I enjoy writing things down to clarify my thoughts.

Courtesy of the good folk of twitter, I have some great links to follow up and digest.

March 9, 2011

Comparing methods for recording I/O – V$SYSSTAT vs HP MeasureWare

Filed under: io, oracle, rrdtool, unix — rmoff @ 13:38

I wrote last year about Graphing I/O data using gnuplot and Oracle V$SYSSTAT, using a script from Kevin Closson in his article How To Produce Raw, Spreadsheet-Ready Physical I/O Data With PL/SQL. Good For Exadata, Good For Traditional Storage.
Here I’ve got a simple comparison of the data recorded through this script (in essence, Oracle’s V$SYSSTAT), and directly on the OS through HP’s MeasureWare. It’s graphed out with my new favourite tool, rrdtool:

Happily for me, the data marries up almost exactly. One might say that it ought to. But it’s always good to have explicit proof of such a supposition.

October 26, 2010

Graphing I/O data using gnuplot and Oracle V$SYSSTAT

Filed under: io, oracle, unix — rmoff @ 15:42

Continuing in the beard-scratching theme of Unix related posts (previously – awk), here’s a way to graph out the I/O profile of your Oracle database via the Oracle metrics in gv$sysstat, and gnuplot. This is only the system I/O as observed by Oracle, so for belts & braces (or to placate a cynical sysadmin ;-)) you may want to cross-reference it with something like sar.

First, a pretty picture of what you can get:

Example gnuplot output of Oracle I/O

Why would you want to do this when you’ve got a pretty GUI in EM/Grid and flash graphs? Because the data in EM is averaged out over a relatively large sample (one minute for recent history, and whatever your AWR sample rate is for older history) and is thus lower than the point-in-time I/O being driven through your server. My previous post (The danger of averages) illustrates this. For identifying bottlenecks or capacity planning, you need to know how much I/O throughput you really use.

Consider this rough example. Using V$SYSMETRIC_HISTORY you can get an average over the last minute. For the first 30 seconds of this sampled minute you ran a query consuming I/O at 100 MB/s. For the last 30 seconds there was no system activity. V$SYSMETRIC_HISTORY shows a figure of 50 MB/s, as this is the average over the minute. You look at your system’s I/O profile in EM and see 50 MB/s. Your hardware guys tell you that the system has a capacity of 100 MB/s throughput, so you think you’re well in the clear for capacity, when in reality you’re already hitting it.

Now, what happens when two instances of this same query, each requiring 100 MB/s, run concurrently? It will take twice as long, because they’ll have to share the throughput available and thus get ~50 MB/s each. It might be that this is acceptable, and that the hardware cost of increasing I/O throughput capacity isn’t justifiable for making the queries run faster. But it’s important to be aware of the bottlenecks in a system so that they can be mitigated and considered in any capacity planning.
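
To turn that rough example into something you can actually run (my own sketch, not part of the original post), compare the overall average for the last hour with the single busiest minute, using the same one-minute samples that EM is drawing on:

-- Overall average vs busiest single minute over the last hour of samples.
-- A big gap between the two means the average is hiding your peaks.
SELECT ROUND(AVG(value) / 1024 / 1024, 1) AS avg_mb_per_sec,
       ROUND(MAX(value) / 1024 / 1024, 1) AS peak_minute_mb_per_sec
FROM   v$sysmetric_history
WHERE  metric_name = 'Physical Read Total Bytes Per Sec'
AND    group_id    = 2;  -- the 60-second system metric group

Bear in mind that even the "peak" here is itself a one-minute average, so the true instantaneous peak will be higher still, which is rather the point.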

So, anyway, back to the point of this post:

Kevin Closson’s written a neat script here which will write out IO metrics from gv$sysstat to a flat file on the Oracle host. The output looks like this:

2010-10-26-09:09:58|1|1|0|
2010-10-26-09:10:03|0|0|0|
2010-10-26-09:10:08|51|51|0|
2010-10-26-09:10:13|87|87|0|
2010-10-26-09:10:19|108|108|0|
2010-10-26-09:10:24|118|118|0|
2010-10-26-09:10:29|116|117|0|
2010-10-26-09:10:34|451|454|0|
2010-10-26-09:10:39|692|694|0|
2010-10-26-09:10:44|894|895|2|
2010-10-26-09:10:49|875|879|1|
2010-10-26-09:10:54|990|990|2|
2010-10-26-09:10:59|922|920|1|
2010-10-26-09:11:04|768|765|2|

I wanted a quick way to visualise the data, and also to plot it out once it was over the number of rows that Excel will chart at once (32k I think). gnuplot was the answer, but it’s a bit of a sod to get running as a fire-and-forget, so here’s some help.
Here’s how it works:

  • For ease of use it copies the file generated by the above script to your local machine. If you don’t want to do that then remove the scp line.
  • It uses sed to convert bar (|) characters to Space ( ), as gnuplot requires whitespace separated columns. tr would probably do the job too.
  • It uses an inline input to gnuplot, but you could move this to a separate config file if you wanted.
  • It plots the graph on screen (assuming you have configured X), and also writes it to a png file.
# Download the file from a remote server
# Prompts for password, or use ssh key authentication to make it seamless
scp user@remotehost:/tmp/io.DBINSTANCE.log .
# Convert bar (|) to Space ( )
sed -e 's/|/ /g' io.DBINSTANCE.log > io.log
# Plot a graph (remove --persist if you don't want the window displayed)
gnuplot --persist <<EOF
set title "Read I/O MB/s\nSampled every 5 seconds"
set xdata time
set timefmt "%Y-%m-%d-%H:%M:%S"
set format x "%d %b\n%H:%M"
set ylabel "MB/s"
# You can set the y range to a specific constant based on your IO capacity
set yrange [0:3000]
set xlabel "Date/Time"
unset key
set grid
plot "io.log" using 1:3 with boxes fs
set terminal png font "courier, 10" size 1200,800
set output "io.png"
replot
EOF

This was written on cygwin, and presumably should work on any ‘nix system.


Ob. OBIEE: An alternative to gnuplot would be to graph the data in OBIEE using the text file as source data to a hacked together RPD 🙂 :

September 14, 2010

The danger of averages – Measuring I/O throughput

Filed under: io, oracle — rmoff @ 12:39

This query, based on AWR snapshots on sys.wrh$_sysstat, includes in its metrics the I/O read throughput for a given snapshot duration.
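
The query itself is linked rather than reproduced here. A rough approximation of the same idea (my own sketch against DBA_HIST_SYSSTAT rather than sys.wrh$_sysstat directly, ignoring instance restarts and RAC) looks something like this:

-- Average read MB/s per AWR snapshot interval, derived from the cumulative
-- 'physical read total bytes' counter. An approximation, not the post's query.
SELECT s.end_interval_time,
       ROUND( (st.value - LAG(st.value) OVER (ORDER BY s.snap_id))
              / 1048576   -- bytes to MB
              / ((CAST(s.end_interval_time AS DATE)
                  - CAST(s.begin_interval_time AS DATE)) * 86400)   -- snapshot length in seconds
            , 3) AS total_read_mbps
FROM   dba_hist_sysstat st
       JOIN dba_hist_snapshot s
         ON  st.snap_id         = s.snap_id
         AND st.dbid            = s.dbid
         AND st.instance_number = s.instance_number
WHERE  st.stat_name = 'physical read total bytes'
ORDER  BY s.snap_id;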

However it’s important to realise the huge limitation of this figure: it’s an average. It completely shoots you in the foot if you’re looking at capacity requirements.

Consider this real-life example extracted from the above query:

Timestamp					Total Read MBPS
===========================================
14-SEP-10 05.15.12.660      113.748
14-SEP-10 06.00.40.953      202.250
14-SEP-10 06.45.52.750       34.649
14-SEP-10 07.30.03.394       10.953
14-SEP-10 08.15.15.243       57.833
14-SEP-10 09.00.27.180       30.177

So, it looks like early in the morning we’re using about 200 MB/s throughput, and by about 9am somewhere around 30-50 MB/s?

Let’s have a look at V$SYSMETRIC_HISTORY (which gives numbers every minute for the last hour) for the samples corresponding to the last AWR sample in the above resultset (08:15 – 09:00):
[io2.sql]

BEGIN_TIME           VALUE_MB 
-------------------  -------- 
2010-09-14-08:14:55       0.7 
2010-09-14-08:15:54       0.0 
2010-09-14-08:16:54       0.1 
2010-09-14-08:17:55       0.0 
2010-09-14-08:18:54       0.0 
2010-09-14-08:19:55     318.5 
2010-09-14-08:20:54     258.5 
2010-09-14-08:21:54     183.6 
2010-09-14-08:22:55      24.8 
2010-09-14-08:23:54       0.0 
2010-09-14-08:24:55       0.0 
2010-09-14-08:25:54       0.0 
2010-09-14-08:26:54       0.0 
2010-09-14-08:27:55       0.0 
2010-09-14-08:28:54       0.0 
2010-09-14-08:29:55       0.0 
2010-09-14-08:30:54       0.0 
2010-09-14-08:31:54       0.1 
2010-09-14-08:32:55       0.0 
2010-09-14-08:33:54       0.0 
2010-09-14-08:34:55       0.0 
2010-09-14-08:35:54       0.1 
2010-09-14-08:36:54       3.0 
2010-09-14-08:37:54       0.0 
2010-09-14-08:38:54       0.0 
2010-09-14-08:39:55       0.0 
2010-09-14-08:40:54       0.0 
2010-09-14-08:41:55       0.1 
2010-09-14-08:42:54       0.0 
2010-09-14-08:43:54       0.0 
2010-09-14-08:44:55       0.0 
2010-09-14-08:45:54       0.0 
2010-09-14-08:46:55       0.1 
2010-09-14-08:47:54     156.9 
2010-09-14-08:48:54     413.1 
2010-09-14-08:49:55       0.1 
2010-09-14-08:50:54       0.0 
2010-09-14-08:51:55       3.1 
2010-09-14-08:52:54       0.0 
2010-09-14-08:53:54       0.0 
2010-09-14-08:54:55       0.0 
2010-09-14-08:55:54       0.0 
2010-09-14-08:56:55       0.0 
2010-09-14-08:57:54       0.0 
2010-09-14-08:58:54       0.0 
2010-09-14-08:59:55       0.6 

(METRIC_ID = 2093 “Physical Read Total Bytes Per Sec”)
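
(io2.sql isn't reproduced in the post, but a minimal sketch of the kind of query that would produce the listing above, using that metric name, might be:)

-- One-minute read throughput figures for the last hour, in MB/s
SELECT TO_CHAR(begin_time, 'YYYY-MM-DD-HH24:MI:SS') AS begin_time,
       ROUND(value / 1024 / 1024, 1)                AS value_mb
FROM   v$sysmetric_history
WHERE  metric_name = 'Physical Read Total Bytes Per Sec'
AND    group_id    = 2   -- the 60-second system metric group
ORDER  BY begin_time;
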
If you average out the numbers in this 45-minute sample, you get 30MB/s – which ties in with what AWR shows too. But it is clearly wrong to say that the IO throughput for the period is 30 MB/s. In terms of capacity the system is at times utilising over 400 MB/s – albeit for a short period of time:

v$sysmetric_history

(As a side note, this is a good illustration of why a bar chart is more appropriate here than a line graph, which is what I initially opted for. A line chart joins the data points, giving an incorrect impression of the value at points in between samples. A bar chart shows what the value was when we sampled it, and only then. For planning capacity, it’s important to consider only what we know to be true.)

For estimating something like disk space requirements, an average per time slice extrapolated up may work, because the peaks will balance out the troughs, which is kind of the point of an average. But when thinking about capacity and the size of pipes a system requires, an average can wildly distort things. It’s the peaks that are important, because it’s those that will bottleneck the system.

This script from Kevin Closson is good for recording granular IO throughput over a period of time.
