Histograms

Histogram as bar graphIn Curt Cotner’s recent blog post, What’s new in statement-level performance metrics, he wrote about the popularity among customers of Optim Performance Manager (OPM)’s ability to create and visualize histograms of statement costs. In this post, I’m going to take a look at histograms in general, examine some histograms that are new in DB2 for LUW as of version 9.5, and provide some resources that can show you how to make use of histograms yourself.

The following includes excerpts from my three-part article on DB2 for LUW Workload Manager histograms on developerWorks. You can find the full set of articles in the Smart Data Administration e-Kit. The articles are behind a login/registration screen, so if you have a developerWorks ID, it is just a matter of logging in and answering a small number of survey questions. Otherwise, registration is free, easy, and worth a minute of your time, considering all of the good information on developerWorks for registered users.

Here are links to each article:

Here is an embedded SlideShare of a slimmed-down version of the presentation I gave to the Michigan DB2 Users Group in March of 2009. It is based on the three articles above but is a little more visual. You really need to see it with the speaker notes to get the most out of it, so I recommend you download the ppt file and look at it in PowerPoint with the speaker notes on the screen.

What is a histogram?

A histogram is a graphical display of tabulated frequencies. To draw a histogram, first draw a horizontal line to represent the measurement axis and then mark the boundaries of adjacent class intervals.

Drawing a histogram - step 1

Above each class interval, draw a rectangle whose height is proportional to the frequency of that interval. Thus, if interval A contains twice as many observations as interval B, the rectangle above interval A would have twice the height of the rectangle above interval B.

Drawing a histogram - part 2

Say you ran ten queries and collected each query’s response time. You got the following times (in seconds): 8.1, 2.0, 7.9, 0.6, 6.2, 9.8, 4.0, 5.7, 9.6, and 6.3. How would you turn this into a histogram? You would start by picking class intervals. Let’s say you pick four: 0 seconds to just under 3 seconds, 3 to just under 6, 6 to just under 9, and 9 to just under 12. Then you just count how many of data points fell into each interval:

Class Interval (seconds) Count
0 to just under 3 2
3 to just under 6 2
6 to just under 9 4
9 to just under 12 2

If we graph the above table, we get:

Response time histogram example

Statement Cost Histograms

A statement, such as a SELECT, has an estimated cost that tells you how expensive its access plan is to execute. Such cost estimations are used by DB2’s cost-based optimizer to choose between access plans for the same statement. The unit of measurement is called timerons, which is an IBM-invented unit that is a weighted combination of I/O costs and CPU costs. It is not directly related to the execution time of the query, but, on average, a statement with a large cost will have a higher execution time than a statement with a small cost. It mainly depends on how accurate the estimate was, which is affected by the accuracy of the statistics on the tables and indexes that are referenced by the statement. Keeping the statistics on these tables and indexes up-to-date by frequent execution of the RUNSTATS command makes a big difference here.

The estimated cost histogram used in the DB2 for LUW workload manager lets you see the distributions of these estimated costs across all the DML statements from a particular application, or a particular user, or across all the applications and users on the system.

Other Histograms in DB2 for LUW Workload Manager

The DB2 for LUW Workload Manager offers five other histograms with the same flexibility over how they are collected and aggregated as it provides for the estimated cost histogram. The five histograms are: activity inter-arrival time, activity lifetime, activity queue time, activity execution time, and request execution time. “Activity” is a new term used in DB2 for LUW to represent both statements (DML, including CALLs, and DDL) and loads. Using activity inter-arrival time, for example, you can tell how much time occurred between one load and the next. Using activity lifetime, you can get an idea of how long a query took to return a response to the user. Activity queue time lets you know how much time your statements and loads spend queued by the Workload Manager and activity execution time tells you how much time they spent in the DB2 engine. Finally, the request execution time histogram tells you about the execution times of all requests, both those that came from user applications and those initiated inside DB2 itself. My slides and whitepapers linked above go into more detail on what each of these histograms mean.

Interested in seeing more blog posts about DB2 for LUW Workload Manager? Please let me know in the comments. In the next post, coverage of db2top features returns.