Thursday, December 22, 2011

Where is my IO?


We are all aware of sys.dm_io_virtual_file_stats and how useful this DMV is, but to be honest I could not find a single place with a working script that leverages this DMV and tells me whether IO is currently an issue or not!

Well, I think the wait is over. I tried to put something together and came up with this. Feel free to correct it, but it works for me.

Anything more than 25 ms is worth a look (that's the Microsoft threshold); practically, anything more than 80-100 ms is something you should dig into.

This gives you the real-time picture rather than cumulative figures since SQL Server last started. You can change the WAITFOR DELAY value to widen or narrow the delta window.


-- Snapshot 1: cumulative stalls per read, per write, and per total IO for each database file.
SELECT DB_NAME(database_id) AS [Database Name],
       file_id,
       io_stall_read_ms,
       num_of_reads,
       CAST(io_stall_read_ms / (1.0 + num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
       io_stall_write_ms,
       num_of_writes,
       CAST(io_stall_write_ms / (1.0 + num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
       io_stall_read_ms + io_stall_write_ms AS [io_stalls],
       num_of_reads + num_of_writes AS [total_io],
       CAST((io_stall_read_ms + io_stall_write_ms) / (1.0 + num_of_reads + num_of_writes) AS NUMERIC(10,1)) AS [avg_io_stall_ms]
INTO #iostats_first
FROM sys.dm_io_virtual_file_stats(NULL, NULL);


WAITFOR DELAY '00:05:00';

-- Snapshot 2: the same query again after the delay, into a second temp table.
SELECT DB_NAME(database_id) AS [Database Name],
       file_id,
       io_stall_read_ms,
       num_of_reads,
       CAST(io_stall_read_ms / (1.0 + num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
       io_stall_write_ms,
       num_of_writes,
       CAST(io_stall_write_ms / (1.0 + num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
       io_stall_read_ms + io_stall_write_ms AS [io_stalls],
       num_of_reads + num_of_writes AS [total_io],
       CAST((io_stall_read_ms + io_stall_write_ms) / (1.0 + num_of_reads + num_of_writes) AS NUMERIC(10,1)) AS [avg_io_stall_ms]
INTO #iostats_second
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- Delta between the two snapshots: stalls and average stall per IO during the sample interval.
SELECT f.[Database Name],
       f.file_id,
       (s.io_stall_read_ms - f.io_stall_read_ms) AS [io_stall_read_ms],
       CASE (s.num_of_reads - f.num_of_reads)      -- guard the denominator to avoid divide-by-zero
            WHEN 0 THEN 0
            ELSE (s.io_stall_read_ms - f.io_stall_read_ms) / (s.num_of_reads - f.num_of_reads)
       END AS [avg_read_stall_ms],
       (s.io_stall_write_ms - f.io_stall_write_ms) AS [io_stall_write_ms],
       CASE (s.num_of_writes - f.num_of_writes)
            WHEN 0 THEN 0
            ELSE (s.io_stall_write_ms - f.io_stall_write_ms) / (s.num_of_writes - f.num_of_writes)
       END AS [avg_write_stall_ms],
       (s.io_stalls - f.io_stalls) AS [io_stalls],
       CASE (s.total_io - f.total_io)
            WHEN 0 THEN 0
            ELSE (s.io_stalls - f.io_stalls) / (s.total_io - f.total_io)
       END AS [avg_io_stall_ms]
FROM #iostats_first f
JOIN #iostats_second s
  ON f.[Database Name] = s.[Database Name]
 AND f.file_id = s.file_id
ORDER BY [avg_io_stall_ms] DESC;

DROP TABLE #iostats_first;
DROP TABLE #iostats_second;

Wednesday, October 19, 2011

RAID 50 for database server!!! Good or Bad idea!


Just in case you wonder whether RAID 50 is any good for a database server.

Dell says RAID 50 is great. Well, it might be... but not really for a SQL database server. My view, and Microsoft's, is that RAID 10 is the way to go because of its excellent read and write performance.


RAID 50 is excellent for reads but only "good" for random writes.


By the way, RAID 50 is a vendor-specific configuration where multiple RAID 5 arrays are striped together. It is still doing parity, now across two levels, so how can it be great for writes? Not sure; maybe Dell has done something else as well. In terms of performance it sits between RAID 10 and RAID 5.
 

According to EqualLogic, the RAID level characteristics are:
 

Table: RAID Level Characteristic Comparison

Workload Requirement                                        | RAID 10   | RAID 50   | RAID 5            | RAID 6
Capacity                                                    | Average   | Good      | Excellent         | Good
Availability                                                | Excellent | Good      | Average           | Excellent
Sequential reads                                            | Excellent | Excellent | Excellent         | Excellent
Sequential writes                                           | Good      | Good      | Good              | Good
Random reads                                                | Excellent | Excellent | Excellent         | Excellent
Random writes                                               | Excellent | Good      | Average           | Average
Performance impact of drive failure or RAID reconstruction | Minimal   | Moderate  | Moderate to heavy | Heavy



Random writes are "good" in RAID 50 but "excellent" in RAID 10.

Monday, September 12, 2011

Why I say leverage SQL Server data and backup compression in 2008 R2??

If we look back, one thing you will surely notice is that CPU power has increased and keeps getting better every year.

If you compare that growth with other components like disk speed, memory and networking, the CPU always stands out.

When we talk about SQL Server performance, it does not stop at disk speed, throughput or any other individual component; it is a mix of everything.

In SQL Server 2008 R2, compression is one great way to take load off the disk and memory. It's simple: data and backup compression mean less space on disk and more data in cache. There is a small trade-off in CPU for doing the compression, but today's two-socket Intel processors give you plenty of processing power to spare.

So why not go ahead, try it yourself and see how much benefit you can get from this feature.

I won't suggest you blindly start compressing everything :) , but measure your workload, focus on the tables and indexes that are IO intensive, and analyse whether compressing those objects will give you real benefits in terms of IO and memory.
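As a rough sketch of that workflow (the schema, table, database and path names here are placeholders, not from the original post), you can ask SQL Server to estimate the savings before committing to anything:

-- Estimate PAGE compression savings for a hypothetical table dbo.SalesOrderDetail
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'SalesOrderDetail',
     @index_id         = NULL,          -- all indexes
     @partition_number = NULL,          -- all partitions
     @data_compression = 'PAGE';

-- If the estimate looks good, rebuild the indexes with compression enabled
ALTER INDEX ALL ON dbo.SalesOrderDetail REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Backup compression is a separate choice made at backup time
BACKUP DATABASE AdventureWorks
TO DISK = 'D:\Backups\AdventureWorks.bak'
WITH COMPRESSION;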

Tuesday, September 6, 2011

Thickness of arrow between operators in execution plans

In SQL Server, if you have ever analysed execution plans in SSMS (I am sure you have :) ), one thing you might have noticed is the thickness of the arrows between operators.

The thickness of an arrow has been used by many DBAs as an estimate of the size of the data flowing through it.

To correct the myth/confusion: the thickness of the arrow represents only the row count, not the actual size of the data.

This means you can see a thin arrow representing 4 rows, but behind it could be XML data gigabytes in size.

So do not rely on arrow thickness alone when picking out the expensive operators and the expensive data flows between them.
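Here is a minimal sketch of that point (the table and data are made up): four rows give a thin arrow, yet the payload behind them can be huge:

-- 4 rows, but each row can carry a very large XML payload
CREATE TABLE dbo.BigXmlDemo (id INT PRIMARY KEY, payload XML);

-- (imagine each payload is tens or hundreds of MB in a real system)
INSERT INTO dbo.BigXmlDemo (id, payload)
VALUES (1, '<doc/>'), (2, '<doc/>'), (3, '<doc/>'), (4, '<doc/>');

SET STATISTICS IO ON;
SELECT id, payload FROM dbo.BigXmlDemo;   -- the plan arrow reflects only the 4 rows
SET STATISTICS IO OFF;
-- Compare the "logical reads" output with the arrow: width tracks rows, not bytes moved.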





enjoy 

Monday, August 15, 2011

How and where to find latest and greatest updates for SQL Server

As a DBA, one of your prime duties is to make sure your SQL Server builds are not out of date. There are many places that can give you the SQL Server release calendar details.

The one I trust is :


It is a safe source as they also provide the matrix of build numbers and version numbers.

Enjoy :)  

Sunday, July 31, 2011

Power management and OLTP workload

This article is especially for those who run an OLTP DBMS on Windows.

Power management is very commonly used these days. Basically, when a processor has power management enabled, its clock speed varies with the load.

In Windows Server 2008 and above, you manage this through the Windows power plan (go to Power Options and check the settings).

By default Windows uses the "Balanced" power plan. This means that if your Intel processor (5500, 5600 series...) has Turbo Boost technology, it won't use it.

So for database servers on Windows, make sure this setting is set to "High performance".

You should also check whether there are any hardware power management settings, which can override the software power settings explained above.

If there are, ask your hardware guy to get into the BIOS and change it to the high-performance mode for your database server.

OLTP workloads by nature need fast processing, so don't forget to check this option, especially if you are running Windows Server 2008.
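If you want to check the active plan from a query window, here is a rough sketch (it assumes xp_cmdshell is enabled on that instance; otherwise just run powercfg from a normal command prompt on the server):

-- Show the active Windows power plan (requires xp_cmdshell to be enabled)
EXEC xp_cmdshell 'powercfg /GETACTIVESCHEME';
-- You want to see "High performance"; "Balanced" means the plan needs changing.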

Sunday, July 24, 2011

Is_Auto_Close_On, a killer option

If you are wondering why I call this a "killer option", here is the answer:

If you turn this option on, then as soon as the last connection to the database is closed, all its resources (CPU, memory, IO) are freed up and the database is shut down.

When a new connection is made, SQL Server has to start the database up again, reallocate those resources and run through its startup checks before it can serve the request.

This constant closing and reopening can cause huge performance issues on a production database during the day.

So please check that you have not turned this option on by mistake.
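A quick sketch to find and fix it (the database name is a placeholder):

-- Find databases with AUTO_CLOSE turned on
SELECT name, is_auto_close_on
FROM sys.databases
WHERE is_auto_close_on = 1;

-- Turn it off for an offending database (name is hypothetical)
ALTER DATABASE MyProductionDb SET AUTO_CLOSE OFF;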


Sunday, July 17, 2011

Merge Replication bug in SQL Server 2008 R2

Sometimes when I see this kind of fatal error, I start wondering what on earth the QA department at Microsoft is doing.

Last week I encountered this error while viewing conflicts in merge replication:

"GUID is neither a data column nor a data relation for table summary" 

I searched everywhere on the internet for this error but had no luck. I searched all the great books on replication, but again no luck.

Ultimately I decided to approach Microsoft.

And... guess what! The reason for this error was the publication name.

If you have a dot (.) in your publication name, for example "MyPublication1.2.3", then you won't be able to view or resolve any conflicts.

The Microsoft engineer came back to me and said the only way to resolve it is to drop the publication and recreate it with a name that has no dot in it.

Yes, another huge outage for my client, but the moral of the story is: when you test something like replication in your lab, name your publication and configure every other setting exactly as you plan to in production.

Puneet

Wednesday, May 11, 2011

Is an every-morning health check absolutely necessary??

Somebody recently got me thinking: is an every-morning database health check absolutely necessary, or should we only react when we get alerts or warnings?

I think it varies from organisation to organisation. Some organisations these days talk big about "VISIBILITY", and no doubt we have processes like ITIL to accomplish those visibility goals...

But what exactly does visibility mean here? Does it mean there should be something like a morning news bulletin where the DBA tells all the developers and other stakeholders what happened to the databases in the last 24 hours?

Can that be annoying, or is it something that will be appreciated?

I think it all depends on your environment. When things are newly set up and everybody expects something from every specialist, it all looks good.

My personal view is that the checks are important regardless of whether anybody is keen to look at them or not. But again, that's personal preference.

But if you want to put in place a process for distributing daily DBA reports... better ask your boss first and make sure people will appreciate it :)

Or maybe just create a space on SharePoint and upload the reports there directly, without actually sending email.

What do you say? I would love to hear your opinion on the everyday DBA report.

Wednesday, March 16, 2011

TempDB and Allocation Page Contention in simple words

We all know how important tempdb is, especially for SQL Server 2005 and onwards. It is vital to make sure your tempdb is always in a healthy state.

One of the most common issues you may come across when monitoring tempdb is allocation page contention.

I am sure this may be new terminology for those who do not deal much with SQL Server internals. Let me explain the concept in very simple words and show exactly what you can look at to make sure your tempdb is not showing this symptom.

When a user creates a temporary table in a script, what happens? The temporary table needs to be assigned space too, right? :)

SQL Server has a mechanism to do this, and that is where three concepts come into the picture: PFS, GAM and SGAM pages. So what happens when SQL Server sees CREATE TABLE #temp... in your application?

1. It reads the SGAM page and looks for free space in a mixed extent (a group of 8 pages that can be shared by more than one object).

2. While doing this, the engine puts an exclusive latch on the SGAM page, then moves to the PFS page to find a free page for the object.

3. It then places a latch on the PFS page as well, and that latch stays until the page is allocated to the table.


So that is how your temporary table gets space in the database. Now the first question: what is the issue here?

The issue arises when tempdb gets overloaded with allocation requests and sessions start queuing up on those latches.

The second question: how do you find out whether there is allocation page contention?

The answer is very simple: check our favourite wait DMVs for PAGELATCH waits on resource 2:1:1 (sys.dm_os_wait_stats shows the overall PAGELATCH waits, and sys.dm_os_waiting_tasks shows the resource being waited on). Page 2:1:1 is the first PFS page in tempdb, and too many PAGELATCH waits on this resource mean there is too much allocation page contention.
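A minimal sketch of that check:

-- Sessions currently waiting on latches for tempdb allocation pages
-- (resource_description is db_id:file_id:page_id; db_id 2 = tempdb)
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%';
-- Page 1 of a file is the PFS page and page 3 the SGAM;
-- lots of waits on 2:1:1 or 2:1:3 point to allocation page contention.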

Finding the issue is all well and good, but you only get the medal if you fix it :) . So the last question: how do you fix allocation page contention?

Create multiple tempdb data files, as in the sketch below, and follow the tempdb best-practice guidance :) (papers are available from Microsoft).
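A rough sketch of adding extra, equally sized data files (file names, sizes and paths are placeholders):

-- Add equally sized tempdb data files (paths and sizes are placeholders)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 4GB, FILEGROWTH = 512MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf', SIZE = 4GB, FILEGROWTH = 512MB);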

cheers
Puneet

Tuesday, March 8, 2011

Duplicate Indexes......Good or Bad!!!

It is not uncommon to come across a large table with a lot of indexes. But have you ever checked whether any duplicate indexes exist on that table?

For example, if I have a table, say "Customer", and define one non-clustered index on FirstName and then another index on (FirstName, LastName), FirstName is covered twice; the first index is redundant and not required.

The disadvantage of a duplicate index is extra overhead (extra pages to maintain), which means every insert, update and delete takes more time.

Little things like this easily go unnoticed.

So go today and find out how many duplicate indexes are in your databases :) — something like the sketch below can get you started.
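Here is a minimal sketch (it only compares the leading key column, so treat the output as candidates to review rather than indexes to drop blindly):

-- Indexes on the same table that share the same leading key column
;WITH key1 AS (
    SELECT i.object_id, i.index_id, i.name AS index_name,
           c.name AS leading_column
    FROM sys.indexes i
    JOIN sys.index_columns ic
      ON ic.object_id = i.object_id AND ic.index_id = i.index_id
     AND ic.key_ordinal = 1                -- leading key column only
    JOIN sys.columns c
      ON c.object_id = ic.object_id AND c.column_id = ic.column_id
    WHERE i.type > 0                       -- ignore heaps
)
SELECT OBJECT_NAME(a.object_id) AS table_name,
       a.leading_column,
       a.index_name AS index_1,
       b.index_name AS index_2
FROM key1 a
JOIN key1 b
  ON a.object_id = b.object_id
 AND a.leading_column = b.leading_column
 AND a.index_id < b.index_id
ORDER BY table_name, a.leading_column;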

Have fun !

Tuesday, February 22, 2011

count(*) everywhere.......

It is not uncommon to come across this keyword all over a T-SQL environment. But do we really know what happens behind the scenes when we use SELECT COUNT(*)...?

One of the main reasons I have seen developers use COUNT(*) is to check for data existence. This is a very resource-intensive way to do it. Why? Because COUNT(*) has to scan all the qualifying rows in the table.

So what's the alternative? There is another keyword you can use: EXISTS. EXISTS does not need to scan everything; it can stop at the first record that matches.

Such a small thing can improve the performance of your system a lot.

You can check this yourself.

Add SET STATISTICS IO ON and SET STATISTICS TIME ON statements and compare the output.

You will see the logical reads are far lower when you use EXISTS compared to COUNT(*), as in the sketch below.
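A minimal sketch of the comparison (table and column names are placeholders):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Existence check via COUNT(*): counts every matching row
IF (SELECT COUNT(*) FROM dbo.Orders WHERE CustomerId = 42) > 0
    PRINT 'Customer has orders';

-- Existence check via EXISTS: can stop at the first matching row
IF EXISTS (SELECT 1 FROM dbo.Orders WHERE CustomerId = 42)
    PRINT 'Customer has orders';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;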



Tuesday, February 15, 2011

# Temporary tables and "select into"

This is again one of those myths where developers think that using SELECT INTO is much faster than an INSERT INTO ... SELECT command.


It is not!!! In fact, INSERT INTO is faster in a lot of scenarios.



Overall we should avoid using SELECT INTO, especially when selecting into temporary tables and when dealing with a large result set. This is because SELECT INTO locks system tables.
 
When you run SELECT INTO #temp, you are locking allocation-related system objects in the tempdb database. This can prevent other users from creating temporary tables until you are done.
 
Imagine inserting into a temp table a subset of a table that has millions of rows; those system tables stay locked until the insert finishes...
 
Please use a simple CREATE TABLE plus INSERT statement instead of SELECT INTO, as in the sketch below.
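A minimal sketch of that pattern (table and column names are placeholders):

-- Instead of:  SELECT OrderId, Total INTO #orders FROM dbo.Orders WHERE ...
-- Create the temp table first, then insert into it:
CREATE TABLE #orders (OrderId INT NOT NULL, Total MONEY NOT NULL);

INSERT INTO #orders (OrderId, Total)
SELECT OrderId, Total
FROM dbo.Orders
WHERE OrderDate >= '20110101';

DROP TABLE #orders;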

Wednesday, January 26, 2011

UNIQUEIDENTIFIER and a primary key constraint..

I would like to share a tip that can affect performance in a big way.
 
This is about the UNIQUEIDENTIFIER data type... although I suspect it won't be a favourite for many after reading this.
 
When you choose a variable-length column as your clustered index key, you add extra overhead bytes to every row.
 
Now the interesting point: a UNIQUEIDENTIFIER is 16 bytes wide, four times the size of an INT, and that width is carried in every row and repeated in every non-clustered index (which stores the clustered key).
 
Now you can imagine what I am trying to tell you :) ... Of course you are performing unnecessary extra IO by carrying those extra bytes around.
 
Also, a NEWID()-generated uniqueidentifier is not sequential, which means SQL Server has to work much harder when you use it as a clustered index key (a clustered index works best with sequential inserts).
 
In a nutshell, UNIQUEIDENTIFIER is a big data type, and when you insert large numbers of rows it causes heavy fragmentation; consequently the size of your clustered index grows huge as well.
 
Please use a data type like INT and keep your clustered index narrow, sequential (uniqueidentifier is not) and made of fixed-size columns.
 
Believe me this will make your life much easier :)
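A minimal sketch of the two designs (table names are placeholders); the second keeps the clustered key narrow and ever-increasing:

-- Wide, random clustered key: 16-byte GUID, inserts land all over the index
CREATE TABLE dbo.CustomerGuid (
    CustomerId UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED,
    Name       VARCHAR(100)     NOT NULL
);

-- Narrow, sequential clustered key: 4-byte ever-increasing INT
CREATE TABLE dbo.CustomerInt (
    CustomerId INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Name       VARCHAR(100)      NOT NULL
);

If you genuinely need a GUID key (merge replication, for example), a NEWSEQUENTIALID() default at least keeps the inserts roughly sequential.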
 

Does a backup record changes happening while the backup job is running?

I was at a client site and a very simple, though very smart, question was thrown at me by one of the developers: "Does a backup record changes that happen while the backup process is running?"

The answer is "Yes". You can do a small exercise: insert many rows into a table so the backup takes a while, kick off the database backup, then open another session and run a few DDL statements against the same database.

Once the backup is finished, restore the database from the .bak file you just took and you will see the new objects (tables, in my experiment) were captured by the backup, even though I ran those statements after the backup had started.

:) :) try it yourself
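A rough sketch of that exercise (database name, file paths and logical file names are placeholders):

-- Session 1: start a backup that runs long enough to race against
BACKUP DATABASE MyTestDb
TO DISK = 'D:\Backups\MyTestDb.bak'
WITH INIT;

-- Session 2: while the backup above is still running
CREATE TABLE dbo.AddedDuringBackup (id INT);

-- Afterwards: restore to a new name and look for dbo.AddedDuringBackup
RESTORE DATABASE MyTestDb_Check
FROM DISK = 'D:\Backups\MyTestDb.bak'
WITH MOVE 'MyTestDb'     TO 'D:\Data\MyTestDb_Check.mdf',
     MOVE 'MyTestDb_log' TO 'D:\Data\MyTestDb_Check_log.ldf';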

Wednesday, January 12, 2011

Is recompilation good or bad?

I am assuming you all know what recompilation means here. In simple words, for those who don't: SQL Server recompiles a statement (statement-level recompilation, from SQL Server 2005 onwards) or a procedure when it decides the plan sitting in cache is no longer usable or beneficial, or when the plan has aged out of cache.
Recompilation is useful when a new execution strategy is required to process the query, but since recompilation costs extra CPU cycles, recompiling a stored procedure too often can be expensive.
You have to make sure you know the cause of the recompilations. There can be various reasons; a few of them are:
· Schema changes to temporary tables defined inside the body of the procedure
· General DDL changes to the referenced tables
· Changes in table statistics caused by heavy insert activity or shifts in data distribution
· Forced recompilation via the RECOMPILE hint
· Changes to SET options

If you know the cause of too much recompilation (I say "too much" because occasional recompilation is not the concern here), then you should look to rectify that cause. For example, if frequent recompilation is happening because of changes in data distribution or statistics, you can use the KEEPFIXED PLAN hint to avoid it, as in the sketch below.
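A minimal sketch of that hint (procedure, table and column names are placeholders); use it only once you have confirmed statistics-driven recompiles really are the problem:

-- Prevent recompiles triggered purely by statistics updates on dbo.Orders
CREATE PROCEDURE dbo.GetRecentOrders
    @CustomerId INT
AS
BEGIN
    SELECT OrderId, OrderDate, Total
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
      AND OrderDate >= DATEADD(DAY, -30, GETDATE())
    OPTION (KEEPFIXED PLAN);
END;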
If you are not dealing with large data sets, consider table variables instead of temporary tables. Unlike temporary tables, table variables do not have statistics, so you avoid the recompilations that temporary tables can cause.
Using a brilliant tool like Profiler (the SP:Recompile and SQL:StmtRecompile events), you can capture the statements causing recompilation. Once you know which statements are recompiling, you can dig into the cause and work on it.

So next time your trace shows a lot of recompiles, you know what to look for :)

Tuesday, January 11, 2011

Why is my buffer cache hit ratio high but page life expectancy very low?

Buffer cache hit ratio and page life expectancy are two important counters that help identify SQL Server memory issues. I have often seen scenarios where these two counters seem to contradict each other.

Recently I was monitoring a production SQL Server using Spotlight and was constantly getting page life expectancy alerts, but when I peeked at the buffer cache hit ratio counter it was 99.6%. This happens especially on 32-bit SQL Server.

So the question that came to my mind was: "Is there really a memory bottleneck?"

The answer is "Yes". What this means is that logical reads are still being satisfied from the buffer pool, but the pages are not staying there long enough.

I would trust the page life expectancy alert over buffer cache hit ratio. Buffer cache hit ratio alone is not enough to conclude whether there is a memory bottleneck.

In this kind of scenario I would look at IO waits as well.
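A minimal sketch for pulling both counters straight from SQL Server instead of relying on a monitoring tool:

-- Page life expectancy (seconds a page is expected to stay in the buffer pool)
SELECT object_name, counter_name, cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND object_name LIKE '%Buffer Manager%';

-- Buffer cache hit ratio has to be derived from its base counter
SELECT CAST(100.0 * r.cntr_value / NULLIF(b.cntr_value, 0) AS NUMERIC(5,2)) AS buffer_cache_hit_ratio_pct
FROM sys.dm_os_performance_counters r
JOIN sys.dm_os_performance_counters b
  ON b.object_name = r.object_name
WHERE r.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base'
  AND r.object_name LIKE '%Buffer Manager%';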