Thursday, August 25, 2016

Azure Tutorial: Predicting Gas Prices Using Azure Machine Learning Studio

What if you could predict whether your stock of choice would rise or fall during the next month? Or whether your favorite football team would win or lose their next match? How can you make such predictions? Perhaps machine learning can provide part of the answer. Cortana, the digital personal assistant powered by Bing that comes with Windows Phone 8.1, accurately predicted 15 out of 16 matches in the 2014 FIFA World Cup.
In this Azure tutorial, we will explore Azure Machine Learning features and capabilities through solving one of the problems that we face in our everyday lives.
Azure Machine Learning tutorial
From the machine learning developer’s point of view, problems can be divided into two groups: those that can be solved using standard methods, and those that cannot. Unfortunately, most real-life problems belong to the second group. This is where machine learning comes into play. The basic idea is to use machines to find meaningful patterns in historical data and use those patterns to solve the problem.

The Problem

Gas prices are probably one of the items already in most people’s budgets. A constant increase or decrease can influence the prices of other goods and services as well. There are a lot of factors that can influence gas prices, from weather conditions to political decisions and administrative fees, to totally unpredictable factors such as natural disasters or wars.
The plan for this Azure machine learning tutorial is to investigate some accessible data and find correlations that can be exploited to create a prediction model.

Azure Machine Learning Studio

Azure Machine Learning Studio is a web-based integrated development environment (IDE) for developing data experiments. It is closely tied to the rest of Azure’s cloud services, which simplifies development and deployment of machine learning models and services.

Creating the Experiment

There are five basic steps to creating a machine learning experiment. We will examine each of these steps while developing our own prediction model for gas prices.

Obtaining the Data

Gathering data is one of the most important steps in this process. The relevance and cleanliness of the data are the basis for creating good prediction models. Azure Machine Learning Studio provides a number of sample data sets. Another great collection of datasets can be found at archive.ics.uci.edu/ml/datasets.html.
After collecting the data, we need to upload it to the Studio through their simple data upload mechanism:
Azure Machine Learning Studio
Once uploaded, we can preview the data. The following picture shows part of our data that we just uploaded. Our goal here is to predict the price under the column labeled E95.
Machine Learning Studio
Our next step is to create a new experiment by dragging and dropping modules from the panel on the left into the working area.
Azure Machine Learning

Preprocessing Data

Preprocessing involves adjusting the available data to our needs. The first module that we will use here is “Descriptive Statistics”. It computes statistical summaries of the available data. Besides the “Descriptive Statistics” module, one of the commonly used modules is “Clean Missing Data”. The aim of this step is to give meaning to missing (null) values by replacing them with some other value or by removing them entirely.

Defining Features

Another module applied at this step in our tutorial is the “Filter Based Feature Selection” module. This module determines the features of the dataset that are most relevant to the results that we want to predict. In this case, as you can see in the picture below, the four most relevant features for “E95” values are “EDG BS”, “Oil”, “USD/HRK”, and “EUR/USD”.
filter based feature selection
Since “EDG BS” is another “output” value that cannot be used for making predictions, we will select only two of the remaining important features: the price of oil and the exchange rate in the USD/HRK column.
A sample of the dataset after processing is shown below:
azure studio dataset

Choosing and Applying an Algorithm

Our next step is to split the available data using the “Split” module. The first part of the data (in our case 80%) will be used to train the model and the rest is used to score the trained model.
azure split module
The following steps are the most important in the entire Azure machine learning process. The “Train Model” module accepts two input parameters. The first is the raw training data, and the other is the learning algorithm. Here, we will be using the “Linear Regression” algorithm. The output of the “Train Model” module is one of the input parameters of the “Score Model” module; the other is the rest of the available data. “Score Model” adds a new column, “Scored Labels”, to our dataset. The values under the “Scored Labels” column are close to their corresponding E95 values when the applied learning algorithm works well with the available data.
train model module
The “Evaluate Model” module gives us an evaluation of the trained model expressed in statistical values. If we look at the “Coefficient of Determination” (roughly 0.80 here), we can conclude that the model explains around 80% of the variation in the E95 price.
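For reference, the coefficient of determination (R²) compares the model’s squared prediction errors against the total variance of the observed prices, so a value of about 0.80 means the model accounts for roughly 80% of the variation in E95 rather than guaranteeing a “correct” price 80% of the time:

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

Here y_i are the observed E95 prices, \hat{y}_i are the scored labels, and \bar{y} is the mean observed price.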
Now it is worth trying the “Neural Network Regression” module. We will need to add new “Train Model” and “Score Model” modules and connect their output to the existing “Evaluate Model” module.
neural network regression module
The “Neural Network Regression” module requires a bit more configuration. Since this is the most important module of the entire experiment, it is where we should focus our efforts, tweaking and experimenting with the settings and selection of the appropriate learning algorithm as a whole.
In this case, the “Evaluate Model” module gives us a comparison of our two trained models. Again, based on the Coefficient of Determination, we see that the neural network provides slightly less accurate predictions.
azure evaluate model
At this point we can save the selected trained models for future use.
azure trained model
Once we have a trained model, we can proceed with creating a scoring experiment. That can be done by creating a new experiment from scratch or by using the Azure Machine Learning Studio helper: simply select the trained model and click on “Create Scoring Experiment”. The new modules that we need here are “Web service input” and “Web service output”. We will add a “Project Columns” module to select our input and output values. The input values are Oil and USD/HRK, and the output is the predicted value under the “Scored Labels” column of the “Score Model” output.
The picture below shows our scoring experiment after these few adjustments and after connecting the “Web service input” and “Web service output” modules accordingly.
scoring experiment
Another nifty helper feature comes into play at this point. With “Publish Web Service” you can create a simple web service hosted on Azure’s cloud infrastructure.
publish web service

Predicting New Data

Finally, we can test our prediction web service using a simple test form.
azure test form
machine learning test form

Conclusion

Through this simple machine learning tutorial we have shown how to create a fully functional prediction web service. Azure Machine Learning Studio, integrated into the Azure platform, can be a very powerful tool for creating data experiments. Besides Machine Learning Studio, there are other machine learning solutions such as Orange and Tiberius. Whichever development environment you prefer, I encourage you to explore machine learning and find your inner data scientist.
This article was written by Ivan Matec, a Toptal freelance developer.

Wednesday, August 17, 2016

Eight Reasons Why the Microsoft Stack Is Still a Viable Choice

Life was great for developers 10 years ago. Companies were happy with going 100 percent Microsoft for their development projects. With ASP.NET on the frontend, .NET middle-tier and SQL Server on the backend, things worked very well for the most part. When they didn’t, developers just accepted that as something that came with the territory. Microsoft was all but running the show. Then, at the end of the last decade, Microsoft’s 800-pound gorilla status started to unravel. Maybe it was due to the introduction of the iPhone and Microsoft missing the shift to mobile, or maybe it was because of the proliferation of open source projects, but things changed, and today those same companies need to be persuaded that going with the Microsoft stack is a good idea. This article presents eight reasons in favor of sticking with the Microsoft software stack.

Reason #1: .NET is Still One of the Best

Introduced more than 10 years ago, the .NET Framework is feature-rich and thoroughly battle-tested. While it was commonplace to have to combine native development with managed code in the early days of .NET, the vast majority of development tasks are supported out of the box today. Even companies such as Oracle have released components that are 100 percent .NET managed code (e.g., the ODP.NET managed driver) to interface with their products. The .NET API is consistent, well documented and used by millions.
The knowledge-base available via MSDN, StackOverflow and thousands of forums and blogs is massive. In my years of developing in .NET, I cannot recall an instance where I would get stuck for long on a framework bug; each time, someone had already experienced, researched and posted an answer, not always the answer I was hoping for, but still something that moved me forward. With the upcoming 2015 release, .NET Core will be open-source and available on non-Windows systems.

Reason #2: ASP.NET has Evolved

microsoft stack
Looking back at the traditional web-to-database Microsoft stack from 10 years ago, it’s interesting to see which parts survived the test of time and which parts faded away. While the back-end of the Microsoft stack remained pretty much unchanged (we still use the same set of patterns and components, such as Dependency Injection, Tasks, Linq, EF or ADO), the front-end, the ASP.NET piece, saw a fundamental shift from “do it the Microsoft way” (i.e., Web Forms) to “do it your way and use ASP.NET as a platform.” Today, ASP.NET is an MVC-based framework featuring robust infrastructure for authentication, bundling and routing that integrates with many non-Microsoft technologies such as Bootstrap and AngularJS. ASP.NET sites look nice on a wide range of form-factors, from phones to desktops, and its Web API capabilities make exposing web services a breeze. The framework has been open-source for a number of years, so if you get stuck on a problem, the source is available on GitHub. ASP.NET has changed, and changed for the better.

Reason #3: Simplicity of Web API and Power of WCF

web and microsoft stack
My all-time favorite quote is from Alan Kay, who said, “Simple things should be simple; complex things should be possible.” When Windows Communication Foundation (WCF) first came out in 2006, it was anything but simple; behaviors, endpoints, and bindings were overwhelming. So Microsoft released Web API, an easy-to-use framework that makes exposing HTTP web services a piece of cake. With a few lines of configuration, your API turns into a secure, “industry-standard” web service.
If your use case does not fit the “standard” mold, and you need full control over how your API is exposed over the wire, you can always fall back on WCF. With its myriad of configuration options and hooks, WCF lets you custom-serialize your data, log, intercept, route messages, use peer-to-peer and queuing, and much, much more. Web API, together with WCF, delivers on both tenets of Kay’s quote: if you need a simple web service, you are done in minutes with Web API; if your service requirements are complex, “all” is possible with WCF. These two technologies provide comprehensive coverage of service scenarios and come prepackaged with the .NET framework.

Reason #4: SQL Server is as Solid as Ever

For many years, it seemed like the tidal wave of new development languages, frameworks and patterns came through the front and middle tiers and spared the database back-end. After all, the good old “SELECT” is still as much in use today as it was 20 years ago. I suppose this is due to the fact that many companies view their data as the core of their business, and keeping the integrity of that core far outweighs the excitement of trying “something new” at the database layer.
SQL Server excels at its primary role of data keeper, with a myriad of features for transactions, referential integrity, backups, mirroring and replication, but what sets SQL Server apart from the competition is how well it integrates with the rest of the Microsoft stack. For rapid development, there is the Entity Framework, currently in version 6, which has passed adolescence and delivers well on its promise of streamlining data access. If you need computing power, the .NET Framework can be loaded in-process with SQL Server, meaning you can embed .NET code as stored procedures, functions or aggregates without sacrificing performance. Pair that with the fact that SQL Server 2014 comes with in-memory tables, and you can come up with some pretty slick real-time solutions that could not be made fast enough solely with SQL and regular tables. After years in the industry, SQL Server is still at the top of my list of RDBMSs.
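As a minimal sketch of the in-memory tables mentioned above, here is what a memory-optimized table can look like in SQL Server 2014; the table and column names are hypothetical, and the database is assumed to already contain a MEMORY_OPTIMIZED_DATA filegroup:

-- Hypothetical cache table; assumes the database already has a
-- MEMORY_OPTIMIZED_DATA filegroup (SQL Server 2014 or later).
CREATE TABLE dbo.SessionCache
(
    SessionId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload   NVARCHAR(2000) NOT NULL,
    CreatedAt DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Because the rows live in memory and are accessed through lock- and latch-free structures, tables like this are what make the “pretty slick real-time solutions” above feasible.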

Reason #5: It’s Easily Testable

So many times, working in corporate IT, I saw software turning into untouchable black boxes because there were no tests, and nobody wanted to mess with the code for fear of “breaking something else”. Then I worked on systems that had thousands of tests, and it was a great feeling to be able to tell the business, “yes, we can make these changes,” years after the software had been released. The Microsoft stack is designed with testability in mind. ASP.NET MVC has hooks for dependency injection, and in version 5, dependency injection will be included in the framework itself. In the middle tier, it’s a similar story: we use dependency injection to decouple the implementation from the interface, which lets us swap production types with mocks at test time. Even on the database side, there are SQL Server Data Tools that come with templates for testing against the stored procedure layer. Testing is an inseparable part of the software development process today, and the Microsoft stack comes well equipped for this new reality.

Reason #6: Elaborate Support Ecosystem

When it comes to support, it’s nice to have a range of options, starting with community forums and ending with an actual live human being working on-site on your server. The online ecosystem for Microsoft products is one of the largest in the industry. After all, Microsoft was started by Bill Gates, a software developer himself, who saw the wide adoption by developers as the key to proliferation of Microsoft products. That meant providing these developers with lots and lots of support.
Microsoft was among the first to encourage its employees to blog about the technology they were working on, and while the rest of the industry has certainly caught up, the amount and the quality of instructional videos, guides and articles coming directly from Microsoft today is still very impressive. That layer of quality online content is supplemented by a large number of community-based support ecosystems such as StackOverflow, which are not as consistent when it comes to content quality, but are, nevertheless, far more helpful than not.
Lastly, there is always the option to pick up the phone and call Microsoft support. I rarely had to use it, but there were a handful of production emergencies when having Microsoft devs analyze core dumps saved the day. The range of support options is clearly a factor in favor of going with the Microsoft stack.

Reason #7: Microsoft Sticks to Its Products

A few years back, choosing Microsoft Silverlight as the front-end for an application seemed like a valid choice, but that’s no longer the case. With the mobile trend in full swing and JavaScript frameworks dominating the front-end space, Silverlight is no longer a feasible option; nevertheless, it’s still supported by Microsoft through 2021. Microsoft sticks to its guns, which is good for those of us who have to make technology choices without having a magic eight ball to tell us what technology trend will dominate the software landscape in the future. Going with the Microsoft stack ensures that time and money is invested into technology that will be supported even if it falls out of favor with the industry.

Reason #8: Visual Studio Umbrella

A decade ago, I was spending about 50 percent of my time working in Visual Studio and about 50 percent in other tools. Today, the split is overwhelmingly in favor of Visual Studio. Microsoft’s vision for Visual Studio to be a one-stop solution for hosting IDEs is coming to fruition with many Microsoft and non-Microsoft products offering some level of integration with Visual Studio. From database development with SQL Server Data Tools to writing iPad and Android apps with Xamarin, Visual Studio provides a familiar developer experience with a consistent user interface. The same can be said about working with Microsoft Azure, a cloud platform encompassing a variety of services from database hosting to mobile services.
Visual Studio abstracts away the complexities of distributed cloud infrastructure, making the experience of developing cloud applications consistent with that of developing applications not hosted in the cloud. All the pieces seem to fit together nicely under the umbrella of Visual Studio, making the overall development process very efficient.

Microsoft Stack - The Best of Both Worlds

Today, there are far more choices for writing quality software compared to 10 years ago. That is certainly a good thing because competition forces big players, such as Google, Apple, Amazon and Microsoft, to continue to innovate and not get complacent. While Microsoft has been pushed from the top of the mountain by the tech evolution of the past decade, the company has shown that it’s willing to adapt and is attuned to realities of the current technological trends. ASP.NET embraced other technologies and methodologies, many of them open source, with the original Web Forms fading into history. The .NET framework continues to evolve, breaking new frontiers with libraries for multi-threading and many-core computing. With the imminent 2015 release, the core of the framework will be open-source and portable to non-Windows platforms, which is a step in the direction of inclusiveness and transparency.
These welcome improvements come from a company that has long-established processes for releasing software that’s tested, documented and supported. Going with the Microsoft stack brings the excitement of working with modern languages and frameworks plus the stability of being backed by a software giant with decades of experience in the development industry. This is why I am recommending the Microsoft stack today.
This article was written by , a Toptal freelance developer.

Thursday, July 21, 2016

How to Tune Microsoft SQL Server for Performance

To retain its users, any application or website must run fast. For mission-critical environments, a couple of milliseconds of delay in getting information can create big problems. As database sizes grow day by day, we need to fetch data from, and write data back to, the database as fast as possible. To make sure all operations are executing smoothly, we have to tune our database server for performance.
In this article I will describe a step-by-step procedure for basic performance tuning on one of the top database servers in the market: Microsoft SQL Server (SQL Server, for short).

#1 Finding The Culprits

As with any other software, we need to understand that SQL Server is a complex computer program. If we have a problem with it, we need to discover why it is not running as we expect.
sql server performance
From SQL Server we need to pull and push data as fast and as accurately as possible. If there are issues, there are a couple of basic causes, and the first two things to check are:
  • The hardware and installation settings, which may need correcting since SQL Server’s needs are specific
  • Whether we have provided correct T-SQL code for SQL Server to execute
Even though SQL Server is proprietary software, Microsoft has provided a lot of ways to understand it and use it efficiently.
If the hardware is OK and the installation has been done properly, but SQL Server is still running slowly, then first we need to find out whether there are any software-related errors. To check what is happening, we need to observe how different threads are performing. This is achieved by calculating the wait statistics of different threads. SQL Server uses a thread for every user request, and a thread is nothing but another program inside our complex program called SQL Server. It is important to note that these are not operating system threads of the server on which SQL Server is installed; they are SQLOS threads, SQLOS being a pseudo operating system inside SQL Server.
Wait statistics can be calculated using the sys.dm_os_wait_stats Dynamic Management View (DMV), which aggregates information about the waits encountered by threads, grouped by wait type. There are many scripts online to query this view, but my favorite is Paul Randal’s script because it is easy to understand and has all the important parameters for observing wait statistics:
WITH [Waits] AS
    (SELECT
        [wait_type],
        [wait_time_ms] / 1000.0 AS [WaitS],
        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
        [signal_wait_time_ms] / 1000.0 AS [SignalS],
        [waiting_tasks_count] AS [WaitCount],
        100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
        ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
    FROM sys.dm_os_wait_stats
    WHERE [wait_type] NOT IN (
        N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',
        N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE', N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT',
        N'CLR_SEMAPHORE', N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE', N'DBMIRROR_WORKER_QUEUE',
        N'DBMIRRORING_CMD', N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE', N'EXECSYNC', N'FSAGENT',
        N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX', N'HADR_CLUSAPI_CALL',
        N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',
        N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE', N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE',
        N'ONDEMAND_TASK_QUEUE', N'PWAIT_ALL_COMPONENTS_INITIALIZED', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
        N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP', N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
        N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
        N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY', N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',
        N'SLEEP_SYSTEMTASK', N'SLEEP_TASK', N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',
        N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
        N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS', N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',
        N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE',
        N'XE_DISPATCHER_JOIN', N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT')
    AND [waiting_tasks_count] > 0
    )
SELECT
    MAX ([W1].[wait_type]) AS [WaitType],
    CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],
    CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S],
    CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],
    MAX ([W1].[WaitCount]) AS [WaitCount],
    CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],
    CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S],
    CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],
    CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX ([W1].[Percentage]) < 95; -- percentage threshold
GO
When we execute this script, we need to concentrate on the top rows of the result, because they are sorted first and represent the wait types that account for the most wait time.
We need to understand wait types so we can make the correct decisions. To learn about different wait types, we can go to the excellent Microsoft documentation.
Let’s take an example where we have too many PAGEIOLATCH_XX waits. This means a thread is waiting for data pages to be read from disk into the buffer, which is nothing but a memory block. We must be sure we understand what’s going on. This does not necessarily mean a poor I/O subsystem or not enough memory; increasing I/O capacity and memory might make the symptom go away, but only temporarily. To find a permanent solution we need to see why so much data is being read from the disk: What types of SQL commands are causing this? Are we reading too much data instead of reading less data by using filters, such as WHERE clauses? Are too many data reads happening because of table scans or index scans? Can we convert them to index seeks by implementing or modifying existing indexes? Are we writing SQL queries that are misunderstood by the SQL Optimizer (another program inside our SQL Server program)?
We need to think from different angles and use different test cases to come up with solutions. Each of the above wait types needs a different solution. A database administrator needs to research them thoroughly before taking any action. But most of the time, finding problematic T-SQL queries and tuning them will solve 60 to 70 percent of the problems.
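For example, if PAGEIOLATCH_XX waits dominate, one quick way to see which cached queries drive the most reads from disk is to sort sys.dm_exec_query_stats by physical reads. This is only a sketch; the numbers that count as “too much” will vary from system to system:

-- Which cached queries have read the most pages from disk?
SELECT TOP 10
    qs.total_physical_reads,
    qs.execution_count,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_physical_reads DESC;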

#2 Finding Problematic Queries

As mentioned above, the first thing we can do is search for problematic queries. The following T-SQL code will find the 20 worst-performing queries:
SELECT TOP 20
    total_worker_time/execution_count AS Avg_CPU_Time
    ,execution_count
    ,total_elapsed_time/execution_count AS Avg_Run_Time
    ,total_elapsed_time
    ,(SELECT
          SUBSTRING(text, statement_start_offset/2 + 1, statement_end_offset)
      FROM sys.dm_exec_sql_text(sql_handle)
     ) AS Query_Text
FROM sys.dm_exec_query_stats
ORDER BY Avg_CPU_Time DESC
We need to be careful with the results; even though a query can have the highest average run time, if it runs only once, its total effect on the server is low compared to a query which has a medium average run time and runs many times a day.
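To account for that, it can be worth running a variation of the query above that ranks by cumulative cost instead of the per-execution average, for example:

-- Same idea as above, but ranked by total CPU consumed across all executions.
SELECT TOP 20
    total_worker_time,
    execution_count,
    total_worker_time/execution_count AS Avg_CPU_Time
FROM sys.dm_exec_query_stats
ORDER BY total_worker_time DESC;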

#3 Fine Tuning Queries

Fine-tuning T-SQL queries is an important concept. The fundamental thing to understand is how well we can write T-SQL queries and implement indexes, so that the SQL optimizer can find an optimized plan to do what we want it to do. With every new release of SQL Server, we get a more sophisticated optimizer that will cover for our mistakes in writing non-optimized SQL queries, and will also fix bugs related to the previous optimizer. But, no matter how intelligent the optimizer may be, if we can’t tell it what we want (by writing proper T-SQL queries), the SQL optimizer won’t be able to do its job.
SQL Server uses advanced search and sorting algorithms. If we are good at search and sorting algorithms, then most of the time we can guess why SQL Server is taking particular action. The best book for learning more and understanding such algorithms is The Art of Computer Programming by Donald Knuth.
When we examine queries that need to be fine-tuned, we need to use the execution plan of those queries so that we can find out how SQL Server is interpreting them.
I can’t cover all the aspects of the execution plan here, but on a basic level I can explain the things we need to consider.
  • First we need to find out which operators take most of the query cost.
  • If an operator is taking a lot of the cost, we need to learn the reason why. Most of the time, scans will cost more than seeks. We need to examine why a particular scan (table scan or index scan) is happening instead of an index seek. We can often solve this problem by implementing proper indexes on table columns, as sketched just after this list, but as with any complex program, there is no fixed solution. For example, if the table is small, then scans are faster than seeks.
  • There are approximately 78 operators, which represent the various actions and decisions of the SQL Server execution plan. We need to study them in-depth by consulting the Microsoft documentation, so that we can understand them better and take proper action.
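As a minimal sketch of the indexing idea from the list above (the table, column and index names here are hypothetical), a covering non-clustered index can let the optimizer replace a table scan with an index seek for a selective filter:

-- Hypothetical example: support "WHERE OrderDate >= ..." filters with a seek.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    INCLUDE (CustomerId, TotalAmount);

After creating or modifying such an index, comparing the before-and-after execution plans shows whether the scan actually became a seek.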

#4 Execution Plan Re-use

Even if we implement proper indexes on tables and write good T-SQL code, if the execution plan is not re-used, we will have performance issues. After fine-tuning the queries, we need to make sure that the execution plan can be re-used whenever possible. Most of the CPU time spent on calculating the execution plan can be eliminated if we re-use the plan.
We can use the query below to find out how many times an execution plan is re-used, where usecounts represents how many times the plan has been re-used:
SELECT [ecp].[refcounts]
    , [ecp].[usecounts]
    , [ecp].[objtype]
    , DB_NAME([est].[dbid]) AS [db_name]
    , [est].[objectid]
    , [est].[text] AS [query_text]
    , [eqp].[query_plan]
FROM sys.dm_exec_cached_plans ecp
CROSS APPLY sys.dm_exec_sql_text ( ecp.plan_handle ) est
CROSS APPLY sys.dm_exec_query_plan ( ecp.plan_handle ) eqp
The best way to re-use execution plans is by implementing parameterized stored procedures. When we are not in a position to implement stored procedures, we can use sp_executesql, which executes a T-SQL statement where the only change between runs is the parameter values. SQL Server will most likely reuse the execution plan that it generated on the first execution.
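A minimal sketch of parameterized execution with sp_executesql (the table and parameter names are made up for illustration):

-- The plan compiled for this statement can be re-used for different @name values.
EXEC sp_executesql
    N'SELECT name FROM dbo.Customers WHERE name = @name',
    N'@name NVARCHAR(50)',
    @name = N'sri';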
Again, as with any complex computer program, there is no fixed solution. Sometimes it is better to compile the plan again.
Let’s examine the following two example queries:
  • select name from table where name = 'sri';
  • select name from table where name = 'pal';
Let us assume we have a non-clustered index on the name column, half of the table has the value sri, and only a few rows have the value pal in the name column. For the first query, SQL Server will use a table scan because half of the table has the same value. But for the second query, it is better to use an index seek because only a few rows have the value pal.
Even though the queries are similar, the same execution plan may not be a good solution for both. Most of the time each case will be different, so we need to carefully analyze everything before we decide. If we don’t want to re-use the execution plan, we can always use the “recompile” option in stored procedures.
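Asking for a fresh plan can be done per statement or per procedure; here is a hedged sketch with hypothetical object names:

-- Per statement: compile a plan for the current parameter value only.
DECLARE @name NVARCHAR(50) = N'pal';
SELECT name FROM dbo.Customers WHERE name = @name OPTION (RECOMPILE);
GO

-- Per procedure: never cache a plan for this stored procedure.
CREATE PROCEDURE dbo.GetCustomerByName @name NVARCHAR(50)
WITH RECOMPILE
AS
    SELECT name FROM dbo.Customers WHERE name = @name;
GO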
Keep in mind that even after using stored procedures or sp_executesql, there are times when the execution plan won’t be re-used. They are:
  • When indexes used by the query change or are dropped
  • When the statistics, structure or schema of a table used by the query changes
  • When we use the “recompile” option
  • When there are a large number of insertions, updates or deletes
  • When we mix DDL and DML within a single query

#5 Removing Unnecessary Indexes

After fine-tuning the queries, we need to check how the indexes are used. Index maintenance requires lots of CPU and I/O. Every time we insert data into a database, SQL Server also needs to update the indexes, so it is better to remove them if they are not used.
sql server performance
SQL Server provides the sys.dm_db_index_usage_stats DMV to find index usage statistics. When we run the T-SQL code below, we get usage statistics for different indexes. If we find indexes that are not used at all, or are used rarely, we can drop them to gain performance.
SELECT
    OBJECT_NAME(IUS.[OBJECT_ID]) AS [OBJECT NAME],
    DB_NAME(IUS.database_id) AS [DATABASE NAME],
    I.[NAME] AS [INDEX NAME],
    USER_SEEKS,
    USER_SCANS,
    USER_LOOKUPS,
    USER_UPDATES
FROM SYS.DM_DB_INDEX_USAGE_STATS AS IUS
INNER JOIN SYS.INDEXES AS I
    ON I.[OBJECT_ID] = IUS.[OBJECT_ID]
    AND I.INDEX_ID = IUS.INDEX_ID
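If an index shows no (or almost no) seeks, scans and lookups but a large number of updates, dropping it is a one-liner; the names below are hypothetical, and any such change should first be verified over a representative workload period, since usage counters reset when the instance restarts:

-- Only after confirming the index is unused across a full business cycle.
DROP INDEX IX_Orders_OrderDate ON dbo.Orders;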

#6 SQL Server Installation And Database Setup

When setting up a database, we need to keep data and log files separately. The main reason for this is that writing and accessing data files is not sequential, whereas writing and accessing log files is sequential. If we put them on the same drive we can’t use them in an optimized way.
When we purchase a Storage Area Network (SAN), the vendor may give us some recommendations on how to set it up, but this information is not always helpful. We need to have a detailed discussion with our hardware and networking people about keeping data and log files separate and laid out in an optimized way.
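A minimal sketch of what that separation looks like at database-creation time, with hypothetical drive letters, file names and sizes:

-- Data on one physical drive (D:), log on another (L:).
CREATE DATABASE SalesDB
ON PRIMARY
    (NAME = SalesDB_Data, FILENAME = 'D:\SQLData\SalesDB.mdf', SIZE = 10GB, FILEGROWTH = 1GB)
LOG ON
    (NAME = SalesDB_Log, FILENAME = 'L:\SQLLogs\SalesDB.ldf', SIZE = 2GB, FILEGROWTH = 512MB);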

#7 Don’t Overload SQL Server

The primary task of any database administrator is to make sure the production server runs smoothly and serves customers as well as possible. To make this happen we need to maintain separate databases (if possible, on separate machines) for the following environments:
  • Production
  • Development
  • Testing
  • Analytical
For the production database we need the full recovery model, and for the other databases, the simple recovery model is enough.
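Switching the recovery model per environment is a simple setting, sketched here with hypothetical database names:

ALTER DATABASE SalesDB_Prod SET RECOVERY FULL;    -- production: allows point-in-time restore
ALTER DATABASE SalesDB_Dev  SET RECOVERY SIMPLE;  -- development/testing: minimal log maintenance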
Testing on a production database will put lots of load on the transaction log, indexes, CPU and I/O. That’s why we need to use separate databases for production, development, testing and analyzing. If possible, use separate machines for each database, because it will decrease the load on the CPU and I/O.

#8 Transaction log, tempdb and memory

The log file needs to have enough free space for normal operations, because an autogrow operation on a log file is time-consuming and can force other operations to wait until it completes. To find out the log file size for each database and how much of it is used, we can use DBCC SQLPERF(logspace).
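Checking log usage and pre-growing the log during a maintenance window might look like this (the database and logical file names are hypothetical):

DBCC SQLPERF(logspace);   -- log size and percent used for every database

-- Pre-size the log so routine activity does not trigger autogrow.
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_Log, SIZE = 4GB);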
The best way to set up tempdb is to put it on a separate disk. We need to keep the initial size as big as we can afford, because when tempdb hits an autogrow situation, performance will decrease.
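A sketch of pre-sizing tempdb’s primary data file (the sizes are placeholders and should be tuned to the workload; tempdev is the default logical name of that file):

-- tempdb is re-created at every restart, so set a generous initial size once.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 8GB, FILEGROWTH = 1GB);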
As mentioned before, we need to make sure that SQL server runs on a separate machine, preferably one without any other application on it. We need to keep some memory for the operating system, plus some more if it is part of a cluster, so in most cases around 2GB should do.

Conclusion

The procedures and suggestions discussed here are for basic performance tuning only. If we follow these steps, we may, on average, get around 40 to 50 percent improvement in performance. To do advanced SQL Server performance tuning, we would need to dig much deeper into each of the steps covered here.
This post originally appeared on the Toptal Engineering blog.