Thursday, July 28, 2016

MySQL Master-Slave Replication on the Same Machine

MySQL replication is a process that enables data from one MySQL database server (the master) to be copied automatically to one or more MySQL database servers (the slaves). It is usually used to spread read access on multiple servers for scalability, although it can also be used for other purposes such as for failover, or analyzing data on the slave in order not to overload the master.
As the master-slave replication is a one-way replication (from master to slave), only the master database is used for the write operations, while read operations may be spread on multiple slave databases. What this means is that if master-slave replication is used as the scale-out solution, you need to have at least two data sources defined, one for write operations and the second for read operations.
MySQL master-slave replication
MySQL developers usually work on only one machine and tend to have their whole development environment on that one machine, with the logic that they are not dependent on a network or internet connection. If a master-slave replication is needed because, for example, they need to test replication in a development environment before deploying changes elsewhere, they have to create it on the same machine. While the setup of a single MySQL instance is fairly simple, we need to make some extra effort to setup a second, and then a master-slave replication.
For this step-by-step tutorial, I’ve chosen Ubuntu Linux as the host operating system, and the provided commands are for that operating system. If you want to setup your MySQL master-slave replication on some other operating system, you will need to make modifications for its specific commands. However, general principles of setting up the MySQL master-slave replication on the same machine are the same for all operating systems.
MySQL master-slave replication

Installation of the first MySQL instance

If you already have one instance of MySQL database installed on your machine, you can skip this step.
The easiest way to install MySQL on the Ubuntu is to run the following command from a terminal prompt:
sudo apt-get install mysql-server
During the installation process, you will be prompted to set a password for the MySQL root user.

Setting up mysqld_multi

In order to manage two MySQL instances on the same machine efficiently, we need to use mysqld_multi.
First step in setting up mysqld_multi is the creation of two separate [mysqld] groups in the existing my.cnffile. Default location of my.cnf file on the Ubuntu is /etc/mysql/. So, open my.cnf file with your favorite text editor, and rename existing [mysqld] group to [mysqld1]. This renamed group will be used for the configuration of the first MySQL instance and will be also configured as a master instance. As in MySQL master-slave replication each instance must have its own unique server-id, add the following line in [mysqld1] group:
server-id = 1
Since we need a separate [mysqld] group for the second MySQL instance, copy the [mysqld1] group with all current configurations, and paste it below in the same my.cnf file. Now, rename the copied group to [mysqld2], and make the following changes in the configuration for the slave:
server-id           = 2
port                = 3307
socket              = /var/run/mysqld/mysqld_slave.sock
pid-file            = /var/run/mysqld/
datadir             = /var/lib/mysql_slave
log_error           = /var/log/mysql_slave/error_slave.log
relay-log           = /var/log/mysql_slave/relay-bin
relay-log-index     = /var/log/mysql_slave/relay-bin.index
master-info-file    = /var/log/mysql_slave/
relay-log-info-file = /var/log/mysql_slave/
read_only           = 1
To setup the second MySQL instance as a slave, set server-id to 2, as it must be different to the master’s server-id.
Since both instances will run on the same machine, set port for the second instance to 3307since it has to be different from the port used for the first instance, which is 3306 by default.
In order to enable this second instance to use the same MySQL binaries, we need to set different values for socketpid-filedatadir and log_error.
We also need to enable relay-log in order to use the second instance as a slave (parameters relay-logrelay-log-index and relay-log-info-file), as well as to set master-info-file.
Finally, in order to make the slave instance read-only, parameter read_only is set to 1. You should be careful with this option since it doesn’t completely prevent changes on the slave. Even when the read_only is set to 1, updates will be permitted only from users who have the SUPER privilege. MySQL has recently introduced the new parameter super_read_only to prevent SUPER users making changes. This option is available with version 5.7.8.
Apart from the [mysqld1] and [mysqld2] groups, we also need to add a new group [mysqld_multi] to the my.cnf file:
mysqld     = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin
user       = multi_admin
password   = multipass
Once we install the second MySQL instance, and we start up both, we will give appropriate privileges to the multi_admin user in order to be able to shut down MySQL instances.

Create new folders for the second MySQL instance

In the previous step we prepared the configuration file for the second MySQL instance. In that configuration file two new folders are used. The following Linux commands should be used in order to create those folders with appropriate privileges:
mkdir -p /var/lib/mysql_slave
chmod --reference /var/lib/mysql /var/lib/mysql_slave
chown --reference /var/lib/mysql /var/lib/mysql_slave
mkdir -p /var/log/mysql_slave
chmod --reference /var/log/mysql /var/log/mysql_slave
chown --reference /var/log/mysql /var/log/mysql_slave

Additional security settings in AppArmor

In some Linux environments, AppArmor security settings are needed in order to run the second MySQL instance. At least, they are required on Ubuntu.
To properly set-up AppArmor, edit /etc/apparmor.d/usr.sbin.mysqld file with your favorite text editor, add the following lines:
/var/lib/mysql_slave/ r,
/var/lib/mysql_slave/** rwk,
/var/log/mysql_slave/ r,
/var/log/mysql_slave/* rw,
/var/run/mysqld/ rw,
/var/run/mysqld/mysqld_slave.sock w,
/run/mysqld/ rw,
/run/mysqld/mysqld_slave.sock w,
After you save the file, reboot the machine in order for these changes to take effect.

Installation of the second MySQL instance

Several different approaches may be followed for the installation of the second MySQL instance. The approach presented in this tutorial uses the same MySQL binaries as the first, with separate data files necessary for the second installation.
Since we have already prepared the configuration file and the necessary folders and security changes in the previous steps, the final installation step of the second MySQL instance is the initialization of the MySQL data directory.
Execute the following command in order to initialize new MySQL data directory:
mysql_install_db --user=mysql --datadir=/var/lib/mysql_slave
Once MySQL data directory is initialized, you can start both MySQL instances using the mysqld_multi service:
mysqld_multi start
Set the root password for the second MySQL instance by using the mysqladmin with the appropriate host and port. Keep in mind, if host and port are not specified, mysqladmin will connect to the first MySQL instance by the default:
mysqladmin --host= --port=3307 -u root password rootpwd
In the example above I set the password to “rootpwd”, but using a more secure password is recommended.

Additional configuration of mysqld_multi

At the end of the “Setting up mysqld_multi” section, I wrote that we will give appropriate privileges to the multi_admin user later on, so now is the time for that. We need to give this user appropriate privileges in both instances, so let’s first connect to the first instance:
mysql --host= --port=3306 -uroot -p
Once logged in, execute the following two commands:
mysql> GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost' IDENTIFIED BY 'multipass';
Exit from the MySQL client, and connect to the second instance:
mysql --host= --port=3307 -uroot -p
Once logged in, execute the same two commands as above:
mysql> GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost' IDENTIFIED BY 'multipass';
Exit from the MySQL client.

Start both MySQL instances automatically on boot

The final step of setting up mysqld_multi is the installation of the automatic boot script in the init.d.
To do that, create new file named mysqld_multi in /etc/init.d, and give it appropriate privileges:
cd /etc/init.d
touch mysqld_multi
chmod +x /etc/init.d/mysqld_multi
Open this new file with your favorite text editor, and copy the following script:

# Provides:       scriptname
# Required-Start: $remote_fs $syslog
# Required-Stop:  $remote_fs $syslog
# Default-Start:  2 3 4 5
# Default-Stop:   0 1 6
# Short-Description: Start daemon at boot time
# Description:    Enable service provided by daemon.
if test -x $bindir/mysqld_multi
    echo "Can't execute $bindir/mysqld_multi";
case "$1" in
    'start' )
     "$mysqld_multi" start $2
    'stop' )
     "$mysqld_multi" stop $2
    'report' )
     "$mysqld_multi" report $2
    'restart' )
     "$mysqld_multi" stop $2
     "$mysqld_multi" start $2
     echo "Usage: $0 {start|stop|report|restart}" >&2
Add mysqld_multi service to the default runlevels with the following command:
update-rc.d mysqld_multi defaults
Reboot your machine, and check that both MySQL instances are running by using the following command:
mysqld_multi report

Setup master-slave replication

Now, when we have two MySQL instances running on the same machine, we will setup the first instance as a master, and the second as a slave.
One part of the configuration was already performed in the chapter “Setting up mysqld_multi”. The only remaining change in the my.cnf file is to set binary logging on the master. To do this, edit my.cnf file with the following changes and additions in the [mysqld1] group:
log_bin                     = /var/log/mysql/mysql-bin.log
innodb_flush_log_at_trx_commit  = 1
sync_binlog                 = 1
binlog-format               = ROW
Restart the master MySQL instance in order for these changes to take effect:
mysqld_multi stop 1
mysqld_multi start 1
In order for the slave to connect to the master with the correct replication privileges, a new user should be created on the master. Connect to the master instance using the MySQL client with the appropriate host and port:
mysql -uroot -p --host= --port=3306
Create a new user for replication:
mysql> CREATE USER 'replication'@'%' IDENTIFIED BY 'replication';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'%';
Exit from the MySQL client.
Execute the following command in order to create a dump of the master data:
mysqldump -uroot -p --host= --port=3306 --all-databases --master-data=2 > replicationdump.sql
Here we use the option --master-data=2 in order to have a comment containing a CHANGE MASTER statement inside the backup file. That comment indicates the replication coordinates at the time of the backup, and we will need those coordinates later for the update of master information in the slave instance. Here is the example of that comment:
-- Position to start replication or point-in-time recovery from

Import the dump you created in the previous step into the slave instance:
mysql -uroot -p --host= --port=3307 < replicationdump.sql
Finally, in order for slave instance to connect to the master instance, the master information on the slave needs to be updated with the appropriate connection parameters.
Connect to the slave instance using the MySQL client with the appropriate host and port:
mysql -uroot -p --host= --port=3307
Execute the following command in order to update the master information (take the replication coordinates from the dump file replicationdump.sql, as explained above):
 -> MASTER_USER='replication',
 -> MASTER_PASSWORD='replication',
 -> MASTER_LOG_FILE='mysql-bin.000001',
Execute the following command in order to start the slave:
Execute the following command in order to verify the replication is up and running:
Congratulations. Your MySQL master-slave replication on the same machine is now successfully set up.
MySQL master-slave replication

Wrap Up

Having a master-slave replication configured in your development environment is useful if you need it for a scale-out solution in the production environment. This way, you will also have separate data sources configured for write and read operations so you can test locally that everything works as expected before further deployment.
Additionally, you may want to have several slave instances configured on the same machine to test the load balancer that distributes the read operations to several slaves. In that case, you may use this same manual to setup other slave instances by repeating all the same steps.
This article wsa written by Ivan Bojovic, a Toptal SQL developer.

Thursday, July 21, 2016

How to Tune Microsoft SQL Server for Performance

To retain its users, any application or website must run fast. For mission critical environments, a couple of milliseconds delay in getting information might create big problems. As database sizes grow day by day, we need to fetch data as fast as possible, and write the data back into the database as fast as possible. To make sure all operations are executing smoothly, we have to tune our database server for performance.
In this article I will describe a step-by-step procedure for basic performance tuning on one of the top database servers in the market: Microsoft SQL Server (SQL Server, for short).

#1 Finding The Culprits

As with any other software, we need to understand that SQL Server is a complex computer program. If we have a problem with it, we need to discover why it is not running as we expect.
sql server performance
From SQL Server we need to pull and push data as fast and as accurately as possible. If there are issues, a couple of basic reasons, and the first two things to check, are:
  • The hardware and installation settings, which may need correcting since SQL Server needs are specific
  • If we have provided the correct T-SQL code for SQL Server to implement
Even though SQL Server is proprietary software, Microsoft has provided a lot of ways to understand it and use it efficiently.
If the hardware is OK and the installation has been done properly, but the SQL Server is still running slowly, then first we need to find out if there are any software related errors. To check what is happening, we need to observe how different threads are performing. This is achieved by calculating wait statistics of different threads. SQL server uses threads for every user request, and the thread is nothing but another program inside our complex program called SQL Server. It is important to note that this thread is not an operating system thread on which SQL server is installed; it is related to the SQLOS thread, which is a pseudo operating system for the SQL Server.
Wait statistics can be calculated using sys.dm_os_wait_stats Dynamic Management View (DMV), which gives additional information about its current state. There are many scripts online to query this view, but my favorite is  Paul Randal’s script because it is easy to understand and has all the important parameters to observe wait statistics:
WITH [Waits] AS
pe], [wait_t
yime_ms] / 1000.0 AS [WaitS],
([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
[signal_wait_time_ms] / 1000.0 AS [SignalS],
[wait_time_ms] / SUM ([wait_time_ms]
[waiting_tasks_count] AS [WaitCount], 100.0 *) OVER() AS [Percentage],
[wait_time_ms] DESC) AS [RowNum] FROM sys.dm_os_wait_stats
entage]) AS DECIMAL (5,2)) AS [Percentage], CAST ((MAX ([W1].[Wa
CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S], CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S], MAX ([W1].[WaitCount]) AS [WaitCount], CAST (MAX ([W1].[Per citS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S], CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S], CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S]
FROM [Waits] AS [W1] INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum] GROUP BY [W1].[RowNum] HAVING SUM ([W2].[Percentage]) - MAX ([W1].[Percentage]) < 95; -- percentage threshold
When we execute this script, we need to concentrate on the top rows of the result because they are set first and represent the maximum wait type.
We need to understand wait types so we can make the correct decisions. To learn about different wait types, we can go to the excellent Microsoft documentation.
Let’s take an example where we have too much PAGEIOLATCH_XX. This means a thread is waiting for data page reads from the disk into the buffer, which is nothing but a memory block. We must be sure we understand what’s going on. This does not necessarily mean a poor I/O subsystem or not enough memory, and increasing the I/O subsystem and memory will solve the problem, but only temporarily. To find a permanent solution we need to see why so much data is being read from the disk: What types of SQL commands are causing this? Are we reading too much data instead of reading less data by using filters, such as where clauses? Are too many data reads happening because of table scans or index scans? Can we convert them to index seeks by implementing or modifying existing indexes? Are we writing SQL queries that are misunderstood by SQL Optimizer (another program inside our SQL server program)?
We need to think from different angles and use different test cases to come up with solutions. Each of the above wait type needs a different solution. A database administrator needs to research them thoroughly before taking any action. But most of the time, finding problematic T-SQL queries and tuning them will solve 60 to 70 percent of the problems.

#2 Finding Problematic Queries

As mentioned above, first thing we can do is to search problematic queries. The following T-SQL code will find the 20 worst performing queries:
total_worker_time/execution_count AS Avg_CPU_Time
me/execution_count as AVG_Run_Time ,total_elapsed_t
,total_elapsed_t iime ,(SELECT
,statement_start_offset/2+1,statement_end_offset ) FROM
SUBSTRING(tex t_exec_sql_text(sql_handle) ) AS Query_Text FROM sys.dm_exec_query_stats
We need to be careful with the results; even though a query can have a maximum average run time, if it runs only once, the total effect on the server is low compared to a query which has a medium average run time and runs lots of times in a day.

#3 Fine Tuning Queries

The fine-tuning of a T-SQL query is an important concept. The fundamental thing to understand is how well we can write T-SQL queries and implement indexes, so that the SQL optimizer can find an optimized plan to do what we wanted it to do. With every new release of SQL Server, we get a more sophisticated optimizer that will cover our mistakes in writing not optimized SQL queries, and will also fix any bugs related to the previous optimizer. But, no matter how intelligent the optimizer may be, if we can’t tell it what we want (by writing a proper T-SQL queries), the SQL optimizer won’t do be able to do its job.
SQL Server uses advanced search and sorting algorithms. If we are good at search and sorting algorithms, then most of the time we can guess why SQL Server is taking particular action. The best book for learning more and understanding such algorithms is The Art of Computer Programming by Donald Knuth.
When we examine queries that need to be fine-tuned, we need to use the execution plan of those queries so that we can find out how SQL server is interpreting them.
I can’t cover all the aspects of the execution plan here, but on a basic level I can explain the things we need to consider.
  • First we need to find out which operators take most of the query cost.
  • If the operator is taking a lot of cost, we need to learn the reason why. Most of the time, scans will take up more cost than seeks. We need to examine why a particular scan (table scan or index scan) is happening instead of an index seek. We can solve this problem by implementing proper indexes on table columns, but as with any complex program, there is no fixed solution. For example, if the table is small then scans are faster than seeks.
  • There are approximately 78 operators, which represent the various actions and decisions of the SQL Server execution plan. We need to study them in-depth by consulting the Microsoft documentation, so that we can understand them better and take proper action.

#4 Execution Plan Re-use

Even if we implement proper indexes on tables and write good T-SQL code, if the execution plan is not reused, we will have performance issues. After fine-tuning the queries, we need to make sure that the execution plan may be re-used when necessary. Most of the CPU time will be spent on calculating execution plan that can be eliminated, if we re-use the plan.
We can use the query below to find out how many times execution plan is re-used, where usecountsrepresents how many times the plan is re-used:
SELECT [ecp].[refcounts]
, [ecp].[usecounts]
, [ecp].[objtype] , id]) AS [db_name] , [est].[objectid]
[query_plan] FROM sys.dm_exec
, [est].[text] as [query_ext] , [eqp] ._cached_plans ecp
exec_sql_text ( ecp.plan_handle ) est CROSS APPLY
__exec_query_plan ( ecp.plan_handle ) eqp
The best way to re-use the execution plan is by implementing parameterized stored procedures. When we are not in a position to implement stored procedures, we can use sp_executesql, which can be used instead to execute T-SQL statements when the only change to the SQL statements are parameter values. SQL Server most likely will reuse the execution plan that it generated in the first execution.
Again, as with any complex computer program, there is no fixed solution. Sometimes it is better to compile the plan again.
Let’s examine following two example queries:
  • select name from table where name = 'sri';
  • select name from table where name = 'pal';
Let us assume we have a non-clustered index on the name column and half of the table has value sri and few rows have pal in the name column. For the first query, SQL Server will use the table scan because half of the table has the same values. But for the second query, it is better to use the index scan because only few rows have pal value.
Even though queries are similar, the same execution plan may not be good solution. Most of the time it will be a different case, so we need to carefully analyze everything before we decide. If we don’t want to re-use the execution plan, we can always use the “recompile” option in stored procedures.
Keep in mind that even after using stored procedures or sp_executesql, there are times when the execution plan won’t be re-used. They are:
  • When indexes used by the query change or are dropped
  • When the statistics, structure or schema of a table used by the query changes
  • When we use the “recompile” option
  • When there are a large number of insertions, updates or deletes
  • When we mix DDL and DML within a single query

#5 Removing Unnecessary Indexes

After fine-tuning the queries, we need to check how the indexes are used. Index maintenance requires lots of CPU and I/O. Every time we insert data into a database, SQL Server also needs to update the indexes, so it is better to remove them if they are not used.
sql server performance
SQL server provides us dm_db_index_usage_stats DMV to find index statistics. When we run the T-SQL code below, we get usage statistics for different indexes. If we find indexes that are not used at all, or used rarely, we can drop them to gain performance.

#6 SQL Server Installation And Database Setup

When setting up a database, we need to keep data and log files separately. The main reason for this is that writing and accessing data files is not sequential, whereas writing and accessing log files is sequential. If we put them on the same drive we can’t use them in an optimized way.
When we purchase Storage Area Network (SAN), a vendor may give us some recommendations on how to setup it up, but this information is not always helpful. We need to have a detailed discussion with our hardware and networking guys on how to keep data and log files separately and in an optimized way.

#7 Don’t Overload SQL Server

The primary task of any database administrator is to make sure the production server runs smoothly and serves customers as well as possible. To make this happen we need to maintain separate databases (if possible, on separate machines) for the following environments:
  • Production
  • Development
  • Testing
  • Analytical
For a production database we need a database with full recovery mode, and for other databases, a simple recovery mode is enough.
Testing on a production database will put lots of load on the transaction log, indexes, CPU and I/O. That’s why we need to use separate databases for production, development, testing and analyzing. If possible, use separate machines for each database, because it will decrease the load on the CPU and I/O.

#8 Transaction log, tempdb and memory

Log file needs to have enough free space for normal operations because an autogrow operation on a log file is time-consuming and could force other operations to wait until it is completed. To find out the log file size for each database and how much it is used, we can use DBCC SQLPERF(logspace).
The best way to set up tempdb is to put it on separate disk. We need to keep the initial size as big as we can afford because when it reaches an autogrow situation, performance will decrease.
As mentioned before, we need to make sure that SQL server runs on a separate machine, preferably one without any other application on it. We need to keep some memory for the operating system, plus some more if it is part of a cluster, so in most cases around 2GB should do.


The procedures and suggestions discussed here are for basic performance tuning only. If we follow these steps, we may, on average, get around 40 to 50 percent improvement in performance. To do advanced SQL Server performance tuning, we would need to dig much deeper into each of the steps covered here.
This post originally appeared in Toptal Engineering blog