Sunday, October 17, 2010

Which counters should I use to troubleshoot my SQL Server hardware resources?

Performance Monitor is one of the best tools you can use to see how your SQL Server is coping with its hardware resources.
If you type and run perfmon.exe from your Windows OS, you have already taken a first-rate step into your monitoring process.
The question that normally comes to mind is: which counters should we use to monitor resource usage for such a powerful application as SQL Server?
Whether you are a DBA or an operating-system engineer helping a DBA, it is always handy to keep a user-defined data collector set for monitoring your SQL Server resources.
You can use this set to create your benchmark, or to troubleshoot performance from time to time.
I normally keep this collector set handy, so that whenever I visit a client site I can just upload these counters and start collecting data if required. I keep the collection interval at the default of 15 seconds.
I have grouped the performance counters by hardware resource here. From a hardware perspective we think of memory, CPU and storage (IO); I will not go into networking here. Each of these counters can give you quick insight into any potential bottleneck.
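As a minimal sketch of such a user-defined collector set, the same counters can also be created from the Windows command line with logman (the set name, output path and counter list below are illustrative, not prescriptive):

```
logman create counter SQLBaseline -si 15 -o C:\PerfLogs\SQLBaseline ^
  -c "\Memory\Available Bytes" "\Memory\Pages/sec" ^
     "\Processor(_Total)\% Processor Time" ^
     "\PhysicalDisk(_Total)\Avg. Disk sec/Read" ^
     "\SQLServer:Buffer Manager\Page life expectancy"

logman start SQLBaseline
```

Running `logman stop SQLBaseline` later closes the log file so you can open it in Performance Monitor.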
MEMORY
Object | Counter | Threshold
Memory | Available Bytes | Set according to your system
Memory | Pages/sec | < 50
SQL Server: Buffer Manager | Checkpoint Pages/sec | < 30 pages per second
SQL Server: Buffer Manager | Buffer cache hit ratio | >= 90% (normally it should be 99-100% for a healthy system)
SQL Server: Buffer Manager | Page Life Expectancy | > 300 sec
SQL Server: Buffer Manager | Lazy Writes/sec | < 20
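The SQL Server-specific counters above can also be read from inside the engine via sys.dm_os_performance_counters, which is handy when you cannot attach Performance Monitor remotely. A small sketch:

```sql
-- Buffer Manager counters as exposed inside the engine
-- (object_name carries the instance prefix, e.g. 'SQLServer:Buffer Manager')
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN ('Page life expectancy',
                       'Lazy writes/sec',
                       'Checkpoint pages/sec');
```

Note that the "per second" counters in this DMV are cumulative, so you need two snapshots a known interval apart to compute an actual rate.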



CPU
Object | Counter | Threshold
Processor(_Total) | % Processor Time | < 80%
Processor(_Total) | % Privileged Time | < 10%
System | Context Switches/sec | < 1000
SQL Server: SQL Statistics | Batch Requests/sec | Set according to your system and compare with the benchmark value
SQL Server: SQL Statistics | SQL Compilations/sec | > 100 is a concern


STORAGE
Object | Counter | Threshold
PhysicalDisk | Avg. Disk sec/Read | <= 10 ms is good; > 10 ms and <= 20 ms is tolerable; > 20 ms is very bad
PhysicalDisk | Avg. Disk sec/Write | <= 10 ms is good; > 10 ms and <= 20 ms is tolerable; > 20 ms is very bad
PhysicalDisk | % Disk Time | < 85%


enjoy! :) :)

Tuesday, October 12, 2010

Issue with replicating spatial data types in SQL Server 2008 and R2

A few weeks ago one of my clients had an issue replicating the geometry data type on SQL Server 2008 R2.
If you set up a very simple merge replication between two instances and try to synchronize a geometry data type, replication fails with the error below:

The Merge Agent failed because the schema of the article at the Publisher does not match the schema of the article at the Subscriber. This can occur when there are pending DDL changes waiting to be applied at the Subscriber. Restart the Merge Agent to apply the DDL changes and synchronize the subscription. (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147199398)
Get help: http://help/MSSQL_REPL-2147199398
Error converting data type varchar to geometry. (Source: MSSQLServer, Error number: 8114)
Get help: http://help/8114

I spent quite some time resolving this issue, but in the end I approached Microsoft SQL engineers. They confirmed this as a bug in the current version and gave me a workaround.

Below is the reply from Microsoft; applying this fix resolved the above issue:


Error and Symptom:
------------------------------
We got the below detailed error message in the profiler trace.
exec sp_MSadd_repl_error 3109,0,5,N'USER-PC\SQLEXPSP1',N'8114',N'Error converting data type varchar to geometry.',50,1,N'
SQL Merge Agent encountered an error.

This error occurs on the subscriber side (SQLEXPSP1) while the merge agent is applying the changes to the subscription. The data type in the published article is geometry, but it seems to be converted to varchar for some unknown reason. A conversion error then occurs when the merge agent tries to convert the varchar back to geometry to match the schema on the subscriber.

Reproduce and Analysis:
-----------------------------------------
1.       Repro: I spent quite some time reproducing the same error in my local labs. This error can always be reproduced in SQL Server 2008 and 2008 R2. Below is the call captured from the subscriber profiler trace. (You could also capture a similar trace if you run Profiler against the SQLEXPSP1 instance.)
exec [dbo].[MSmerge_ins_sp_8F98318B5A5D4AD92A55CAA41E884BDE]
'7E976B77-94C4-DF11-9867-000FFEE4266F',
2,
0x881E2A55A4CA01000000FF,
0x00,
2005,
'',   ----this is the null char
'7E976B77-94C4-DF11-9867-000FFEE4266F',
0,
0x00,
90
2.       Cause: This SP failed with the error "Error converting data type varchar to geometry." The error occurs because the procedure gets a null char for the 6th input parameter, which cannot be converted to a geometry value. By definition, this SP expects geometry data here.
3.       Conclusion: I involved our production team for further clarification. And they confirmed this is a known issue in current builds of SQL Server 2008 SP1 and 2008R2. This is intended to be fixed in next Service Pack release.

Workaround:
------------------------
To work around this issue, we can use the below modified script to manually create the articles with two modified parameters.
exec sp_addmergearticle
    @publication = N'GeoMetryPub', @article = N'SpatialTable',
    @source_owner = N'dbo', @source_object = N'SpatialTable',
    @type = N'table', @description = N'', @creation_script = N'',
    @pre_creation_cmd = N'drop', @schema_option = 0x000000000C034FD1,
    @identityrangemanagementoption = N'auto', @pub_identity_range = 10000,
    @identity_range = 1000, @threshold = 80, @destination_owner = N'dbo',
    @force_reinit_subscription = 1, @column_tracking = N'false',
    @subset_filterclause = N'', @vertical_partition = N'false',
    @verify_resolver_signature = 1, @allow_interactive_resolver = N'false',
    @fast_multicol_updateproc = N'true', @check_permissions = 0,
    @subscriber_upload_options = 0, @delete_tracking = N'true',
    @compensate_for_errors = N'false', @stream_blob_columns = N'false',
    @partition_options = 0

The @schema_option was 0x000000010C034FD1, and we changed it to 0x000000000C034FD1.
The @stream_blob_columns was N'true' and we changed it to N'false'.

For more information about the sp_addmergearticle script, we can refer to

http://msdn.microsoft.com/en-us/library/ms174329.aspx
Note: You may need to export the articles' creation script before dropping them. Then modify the schema_option and stream_blob_columns parameters and execute the script, then re-sync the replication. It should work after the modification.
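As an alternative sketch (I have not verified it against this particular bug), an existing article's options can sometimes be changed in place with sp_changemergearticle instead of dropping and re-adding the article; note that changing these properties invalidates the snapshot and reinitializes the subscriptions:

```sql
-- Publication/article names reuse the ones from the script above
EXEC sp_changemergearticle
    @publication = N'GeoMetryPub', @article = N'SpatialTable',
    @property = N'schema_option', @value = N'0x000000000C034FD1',
    @force_invalidate_snapshot = 1, @force_reinit_subscription = 1;

EXEC sp_changemergearticle
    @publication = N'GeoMetryPub', @article = N'SpatialTable',
    @property = N'stream_blob_columns', @value = N'false',
    @force_invalidate_snapshot = 1, @force_reinit_subscription = 1;
```

After this, regenerate the snapshot and re-sync, just as with the drop-and-recreate approach.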
Enjoy :) :)  and thanks to Microsoft for their excellent support here.
--Puneet

Monday, October 11, 2010

What comes before analysing execution plans and indexes?

Now I know which query is most expensive in terms of CPU and execution time. What's next?? Many of us will jump into the execution plan and start looking at the key operators immediately.
Before we go there, we should consider outer factors which can also affect the query performance. By outer factors I do not mean "hardware resources", but components within the SQL Server RDBMS itself.
These are:

Database Object Statistics
Any object referenced inside the query has statistics, which the optimizer uses to decide on an execution plan; all execution plans are based upon these statistics.
This means that if the database object statistics are not up to date, the execution plan will be inefficient as well.
You can check the statistics of a table and its indexes by running the classic DBCC SHOW_STATISTICS command.
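A minimal sketch (the table and index names here are hypothetical):

```sql
-- Show the statistics header, density vector and histogram for one index
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerId');

-- If the statistics turn out to be stale, refresh them
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerId WITH FULLSCAN;
```

The "Updated" column in the header tells you when the statistics were last refreshed, which is usually the first thing to check.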


Index defragmentation
If the database objects referenced inside the query are heavily fragmented, the query will read more pages than required.
Make sure your indexes are not heavily fragmented. In SQL Server 2005 and onwards you can check this by using the DMV sys.dm_db_index_physical_stats.
I normally look for any fragmentation level above 30% and also look at the number of pages in the index. If an index with a high page count is heavily fragmented, that is definitely going to affect performance.
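The 30%-and-page-count check above can be sketched as a single query against the current database:

```sql
-- List indexes that are more than 30% fragmented and large enough
-- (page_count) for the fragmentation to actually matter
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30
  AND ips.page_count > 1000            -- small indexes rarely matter
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

The 1000-page cutoff is my own rule of thumb, not a hard limit; adjust it for your system.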


Connection level options
Setting connection-level options to values different from the defaults inside the query can cause operations like recompilation. Make sure your connection-level options are set properly inside the body
of the T-SQL.
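As an illustration (a sketch, not an exhaustive check): the SET options a cached plan was compiled under can be inspected through sys.dm_exec_plan_attributes, and two sessions running the same query with different options will get separate plans:

```sql
-- Inspect the set_options attribute of each cached plan; differing values
-- for the same query text mean the plan was compiled more than once
SELECT cp.plan_handle, pa.attribute, pa.value
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute = 'set_options';
```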

Tuesday, October 5, 2010

First three things to do when MS SQL Server performance takes a hit

I have experienced this so many times at client sites: a manager comes to me and says that their xyz system is really slow right now,
or that the xyz system is running like a dog. Can you please look at it!!
The first thing I do after hearing this is respond with something like "Ohh! let me have a look" or "Ouch". Now what's next?
In this article I would like to demonstrate what should come next, after somebody complains and leaves.
SQL Server 2005 and later are blessed with so many tools and built-in system objects which can take you in the right direction.
One such feature is "WAIT STATISTICS". There is a very useful DMV you can use to find out what your SQL Server is waiting on, and that is

"sys.dm_os_wait_stats "


Make sure you calculate the delta between two snapshots of sys.dm_os_wait_stats taken at an interval of 5 seconds.
Check the top wait types and take that as a direction towards the next right path. At least you will come to know whether processes are waiting long on IO, CPU, memory, internal latches, etc.
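The two-snapshot delta can be sketched like this (using the 5-second interval mentioned above):

```sql
-- Snapshot 1
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
INTO #waits_1
FROM sys.dm_os_wait_stats;

WAITFOR DELAY '00:00:05';

-- Snapshot 2
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
INTO #waits_2
FROM sys.dm_os_wait_stats;

-- Delta: what the server actually waited on during those 5 seconds
SELECT w2.wait_type,
       w2.wait_time_ms        - w1.wait_time_ms        AS wait_time_ms_delta,
       w2.signal_wait_time_ms - w1.signal_wait_time_ms AS signal_wait_ms_delta,
       w2.waiting_tasks_count - w1.waiting_tasks_count AS waiting_tasks_delta
FROM #waits_2 AS w2
JOIN #waits_1 AS w1 ON w1.wait_type = w2.wait_type
WHERE w2.wait_time_ms - w1.wait_time_ms > 0
ORDER BY wait_time_ms_delta DESC;

DROP TABLE #waits_1; DROP TABLE #waits_2;
```

You will still want to filter out the benign idle waits (e.g. SLEEP_TASK, LAZYWRITER_SLEEP) before drawing conclusions.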

For example, if you notice that PAGEIOLATCH_xx waits are high, you can be sure there is more physical IO going on than your storage system can digest.

Similarly, CXPACKET waits can lead you to the conclusion that a lot of parallel processes are running, which could be a source of inefficiency.

High percentage of signal wait indicates that processors are not keeping up.

Overall, analysing the results of the query above will take you to the next right path, which I am going to explain now.

FILE STATISTICS
------------------------------
This is the second thing I look at. These statistics come from the DMF "sys.dm_io_virtual_file_stats".
This DMF shows you the actual reads and writes performed against the disk, accumulated since the instance started.
Here too it is important to take two snapshots and calculate the rate, to get an actual idea of how much IO is performed on a particular drive or database.
I will leave this exercise to you :). Please contact me if you need assistance with it.
What you should be interested in knowing is io_stall (in milliseconds) per IO operation. Anything more than 15 is a concern for the disk subsystem.
Once I know that IO is an issue, I can drill down further to look into which queries are expensive in terms of IO.
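A sketch of the per-file stall calculation (the numbers are cumulative since instance start; for a rate, diff two snapshots as described above):

```sql
-- Average stall per IO, per database file
SELECT DB_NAME(vfs.database_id)                                      AS database_name,
       mf.physical_name,
       vfs.num_of_reads + vfs.num_of_writes                          AS total_io,
       vfs.io_stall / NULLIF(vfs.num_of_reads + vfs.num_of_writes, 0) AS avg_stall_ms_per_io
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
ORDER BY avg_stall_ms_per_io DESC;
```

The NULLIF guards against a division-by-zero error on files that have seen no IO yet.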

TOP Queries in terms of CPU, Duration and IO
----------------------------------
SQL Server gives you a handy real-time view of the top expensive queries in terms of IO and CPU. You can either use the reports in SSMS or leverage
sys.dm_exec_sql_text(sql_handle) and sys.dm_exec_query_stats to find the expensive queries.
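One way to combine those two DMVs, as a sketch (here ordered by CPU; swap the ORDER BY column for duration or reads):

```sql
-- Top 10 statements by total CPU, with per-execution averages (times in microseconds)
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time  / qs.execution_count AS avg_cpu_us,
    qs.total_elapsed_time / qs.execution_count AS avg_duration_us,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
        (CASE WHEN qs.statement_end_offset = -1
              THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset END
         - qs.statement_start_offset) / 2 + 1)  AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```

Keep in mind this only covers plans still in cache, so a recent restart or memory pressure can hide older offenders.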

Following the above path can guide you through the rest of the troubleshooting journey. From here you can either start collecting a few performance counters or even run a trace to dig deeper.