Author: Ramesh Narayan Jangid
Viewers: 424
Categories: PHP Tutorials, PHP Performance
You often need to perform specific database server configuration changes to increase those applications' performance.
Read this article to have a complete set of steps for optimizing MySQL performance.
In this article you will learn:
1. Introduction to Storing and Retrieve Application Data in MySQL Servers
2. Introduction to MySQL Database Server
3. What is an INDEX
4. What Happens When a Column is Indexed
5. What Are Composite Indexes
6. How to Setup a Primary Key
7. How to Improve the Performance of Heavy Data Access Applications
8. How to Setup MySQL Replication
1. Introduction to Storing and Retrieve Application Data in MySQL Servers
When it comes to managing data, some people usually think of Excel.
When this comes to a website of dynamic nature, data is usually stored in a database.
The websites can also use Excel files or simple files in the file system to store data of the sites' applications.
Database servers implement algorithms for retrieving data efficiently depending on the queries they execute.
A lot of Open Source database servers are available in the market. You can search on Google for these open source databases and learn about them.
In this article, I will discuss one of the most commonly used databases for supporting websites programmed in PHP. Its MySQL!
2. Introduction to MySQL Database Server
MySQL is used for storing and retrieving data. It implements features to perform operations for getting the desired results.
MySQL supports Structured Query Language (SQL) for performing query operations.
I will not go deep into the Language (SQL) syntax. I will just continue the discussion about the features that can help applications scale website when a MySQL database is in use.
The very basic information one should know is databases saves the provided data on the server machine file system.
Whenever a user creates a database, a folder is created in the file system at a specific path. The folder name depends on the database name specified during its creation.
Tables inside the database are saved as files in the database folder. The file name of table is basically the name of the table that we ask to create.
The data retrieval process uses SQL SELECT queries to access the data is from these database table files in the file system.
This is the basics about how data is saved and retrieved.
As I mentioned above, the database server provides features to support database applications.
Table indexes are one of the most commonly known features to improve database access performance.
3. What is an INDEX
Indexes are created using the values of the columns of a table.
4. What Happens When a Column is Indexed
For example, we have two MySQL servers of the same configuration, each containing a table with a million records.
Suppose the table has around 18 columns, and the last one is is_deleted, which can have values 'Yes'/'No' (a string).
Let us execute a SELECT query on both servers like the one below.
SELECT * FROM tbl_passport WHERE is_deleted = 'No';
Both servers will take approximately the same time (say X) to return the query results.
During the execution of the SELECT query, a database server fetches each record from the table tbl_passport file in the file system to perform checks for values from the column is_deleted with the value 'No.'
When a match occurs, the record is added to the result set to be returned to the PHP script.
This operation is done a million times (in our case) or (in other words) till the end of the file is reached.
So, the computer disk has a file access operation to fetch the whole file. In other words, if this file is of a gigabyte size, then a process will access a gigabyte of data.
Now let's create a table index on column is_deleted on server number 2 and execute the above query on both servers again. In this case, the time taken by server number 2 will be less than X seconds than the server.
Why does that happen? When we create an index on the is_deleted column, the table file is scanned for the values of the is_deleted column throughout the table file, and the respective is_deleted values are grouped and mapped with the location of the table rows where their separate table records are stored in the database table file.
The generated mappings are saved in another file system file and loaded into RAM as and when required.
Next time the same query is executed, the table index file is loaded, and operations are performed for getting database row positions.
Once the database row positions are retrieved, the MySQL server fetches the database table rows by going directly to the row position instead of scanning the complete table file.
To briefly explain better, a row is the location of the data saved on the database table files. The row asks for the file system to get data from the file position; instead of scanning the whole file positions set in the file sequentially, the file has saved the data.
5. What Are Composite Indexes
An index can include multiple columns of a table. The indexes are in sorted given order, usually in ascending order. If it is necessary to set an index in descending order, that is also possible. The same applies to the composite index as well.
One can have one column in ascending order and the other in descending order and so on. It is also possible to use a specific column in multiple indexes as well.
Please check the performance of the query using the EXPLAIN query. The syntax of queries to create indexes is available on https://dev.mysql.com/doc/refman/8.0/en/create-index.html .
6. How to Setup a Primary Key
Primary keys are unique indexes. They are often used to create auto-increment integer table columns. A VARCHAR column can also be a primary key. Multiple columns combined can also be used to define a primary key.
7. How to Improve the Performance of Heavy Data Access Applications
Many times Websites struggle to return data very quickly from the database server due to a huge number of database client accesses.
One thing that you can do to improve the data access speed is to use a separate database on the DB server for each client.
This way the table data files will be lighter and queries will be faster. Still, if you find out that the table's access is slow despite the move to a separate database, you can try partitioning the table.
Partitions are done based on column values or a set of range values of the column. When we partition a table based on the value or set of range values, MySQL splits the original table data file into multiple data files.
That can be faster in terms of data access, thus affecting the performance of the queries. To demonstrate an example, let us refer to our query to the table with a million records.
SELECT * FROM tbl_passport WHERE is_deleted = 'No';
Let us partition this table on the basis of column is_deleted. The values to partition are 'Yes' or 'No'
So, 2 files will be generated.
Suppose the values 'Yes' and 'No' are equally distributed, the resulting table files split into 2 different files that will be faster to access.
When the select query condition is set for is_deleted='No', the database server will fetch from the file meant for value 'No' and so on thereby reducing query load as compared to the non-partitioned table.
Note that if we execute the query without the is_deleted column in the condition clause or specify different values of the column on which the partition was made, this will force the MySQL server to perform data access operations on 2 files.
In this case, database table partitioning can be an expensive operation. So the rule of thumb is to always use a condition to use specific values for columns considering how the partitions were defined.
You may find more about the table partition creation syntax here: https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html
8. How to Setup MySQL Replication
Replication is a feature to improve the performance of the accesses to database servers with many simultaneous accesses.
The replications are of 2 types.
-
Master - Slave
-
Master - Master
Master is the server on which the SQL INSERT, UPDATE, and DELETE queries are executed. This means that the master server is only used for writing operations to the database table files.
The Slave is the server on which the SELECT queries are executed. This means that there will be only read operations on the database table files.
One can have multiple Master servers and multiple Slaves as well to support their website. You need to take care when we have multiple Master servers.
In that case, you need to avoid using the AUTO-INCREMENT option for the primary key to avoid that inserted rows in different master servers and may be adding table rows with the same primary key value.
Note that when a query is sent to the cluster for results, it is executed on a single server and not on all of them. In fact, the insert query is executed on only one of the Master servers because the server where the query is executed will propagate to the other servers regardless of which Master server insert happened.
Suppose we have 2 Master servers and an insert query is executed on each of them for the same table.
Both the servers will try to increment the primary key values by 1 and there are chances that both can have the same id values.
This can break the setup while replication. To avoid this possibility, a database server can have a configuration file to set different settings on each server about how the auto-increment values should behave.
So:
Master 1:
auto_increment_offset: 1
Master 2:
auto_increment_increment: 2
auto_increment_offset: 2
auto_increment_increment: How much to increment the next value.
auto_increment_offset: To start the value from. auto_increment_increment: 2
The configuration above will make the Master 1 server generate odd id field values and even id field values on Master 2, generating unique values for each master server during replication. Depending on the requirements of your application, we can have the following:
One Master server and multiple slaves
or
Multiple master servers and multiple Slave
Usually, multiple Masters are used for the heavy-loaded servers that execute many insert queries on the site.
Multiple Slave serves are used for the heavily loaded servers that execute many select queries on the site.
Note: data replications between servers happen asynchronously by default. You should make this a synchronous operation so the SELECT queries executed to access a specific table is available immediately after the execution of INSERT queries. In synchronous setups, INSERT SQL queries take time compared to SELECT SQL queries since it takes time to replicate the data to the rest of the servers.
You need to be a registered user or login to post a comment
Login Immediately with your account on:
Comments:
No comments were submitted yet.