Performance, SQL Server

The Buffer Cache Hit Ratio counter doesn’t do what you think it does

Buffer Cache Hit Ratio – The Argument Winner?

The Buffer Cache Hit Ratio performance counter is a fairly easy metric to understand when performance tuning SQL Server. The relevant Technet page tells us that it:

– “Indicates the percentage of pages found in the buffer cache without having to read from disk” – Technet

Given that we want as much of our data found in the buffer cache (aka RAM) as possible, we want this metric to be as close to 100% as we can get it. This fact is so simple to understand that I’ve mistakenly used the BCHR to shut down discussions about memory pressure more than once. As in….

Dude. The buffer hit ratio is 99.999 percent! There can’t possibly be a problem with memory on this server! BOOM! You just got dealt with…” – Me, before I realised I was a muppet

You might be surprised to find however that it is quite possible to have a hit ratio approaching 100% whilst still needing to read significant amounts of data from disk. How can this be? Surely if I’m finding all the pages I need in RAM, I wouldn’t need to read any from disk?

Read Ahead’s Sometimes Don’t Count!

This conundrum belies a slight nuance in the way the BCHR number is generated. It obscures the fact that SQL Server’s Read Ahead mechanism will pre-fetch pages into the buffer pool before the execution engine looks for them.

As long as the page is pre-fetched before the execution engine needs the page – we don’t get a cache miss. This despite the fact that the query most definitely did result in substantial amounts of physical reads of data that was not in the buffer pool.

Jonathan Kehayias, author of the excellent “Troubleshooting SQL Server: A Guide for the Accidental DBA” did an equally excellent break down of this aspect of the BHCR number here:

https://www.simple-talk.com/sql/database-administration/great-sql-server-debates-buffer-cache-hit-ratio/

Is the Buffer Cache Hit Ratio number meaningless?

I’ve heard a few DBAs argue that the Buffer Cache Hit Ratio is useless because it obscures the “pre-fetch factor”. Although it’s certainly true that the ratio will tend to give a higher number than we might reasonably expect given what’s going on with the server – if it heads south this still tells us quite a lot. It tells us:

  1. Not only is our server experiencing memory pressure
  2. The workload is overwhelming not just memory, but also the IO subsystems ability to prefetch data before it’s needed – something a healthy system is actually pretty good at doing

So, the stock advice that a BCHR of <95% indicates memory pressure is absolutely true. We just need to bear in mind that a number > 95% doesn’t indicate all is well.

For what it’s worth I still check the Buffer Cache Hit Ratio as a matter of habit, but my preferred performance counters for spotting memory churn are now Page Life Expectancy and Page Reads/sec.

Update: 

For what it’s worth I had a quick look at one of our production servers to see if I could see this fact in action. The screenshot below does hint at the phenomenon, though not as well as Jonathan’s custom built demo.

Cache Hit Ratio

In it we can see that the Buffer Cache Hit Ratio (the red line) tending to be at or close to 100% despite relatively constant physical read activity (the blue line).

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s