Monday, September 1, 2014

How Does SQL Server work & perform DML Operation?

In this article, I am giving a quick overview about SQL Server Database Engine. After completing this article you will understand:

What is SQL Server Database Engine?
What is the use of Master Data file and Transaction Log file?
How does SQL Server Perform DML Operation?

Please give your valuable suggestions and feedback to improve this article.

What is SQL Server Database Engine?

In SQL Server, Database Engine is the core service for storing and processing and securing data in either a relational format or xml documents.

Whenever you create a new database SQL Server creates two data file.

1.       Primary Data File (Master Data File or .mdf file)
2.       Transaction Data File (Log File or .ldf file)

SQL Server can also have secondary data file along with Primary Data File. Generally .ndf file extension is used for secondary data file.

What is the use of Master data file and Transaction Log file?

Primary Data File

SQL Server stores our data in 8KB chunks called data pages. SQL Server gathers all 8KB data pages into a single file which is generally called as Master Data File (Primary Data File). Master Data File has .mdf file extension. The extension for Master Data File is not fixed, it can be anything, and SQL Server will automatically recognize its Data file if you give different file extension. Disk I/O operations are performed at the page level. SQL Server reads and write whole data pages.

When data is written into the disk an entire row of it must fit within the 8KB data Page. It is possible to share a Data Page among multiple rows, but a row can’t span multiple pages.

Let’s understand this with one simple example.

A Data page size is 8KB, if I convert 8KB into bytes it becomes 1024*8 = 8192 Bytes. Out of 8192 Bytes 132 bytes will be used by the data page to store system information about the data page i.e Page Number, the amount of free space on the page, allocation id of the object etc.

Available space in a Data page to store data will be 8192-132 byes = 8060 Bytes.

Suppose we create a table, which has only one column for which data type is Int. As you know Int data type needs 4 bytes space to store the information. Which means a data page can contain 8060/4 = 2015 rows.

Transaction Log File

The transaction log file is a critical component of the database. It contains the transaction log that records all transaction and the database modification made by each transaction. Everything that happens in the SQL Server  occurs only through the Transaction log for example In case of DML operation first data modification gets records in transaction file, if transaction gets committed then it goes to Data pages and becomes permanent part of database.. Generally .ldf file extension is used for the Transaction log file, but it can be anything. SQL Server database engine will automatically recognize the transaction log file if other file extension will be used.

In the event of system failure, the transaction log might be required to bring your database back to a consistent state. SQL server has automated recover mode that kicks when you start you database engine after failure. The transaction log should never be deleted or moved unless you fully understand the ramifications of doing this.

How does SQL Server Perform DML Operation?

When SQL Server is told to do something using SQL, then SQL Server internal component Query Optimizer looks at the query and creates best query execution plan. SQL Server query optimizer has several techniques to create best execution after looking into SQL query. Using the best query execution plan SQL Server executes the query.

When we perform any DML operation in that case SQL Server finds the required data page from the disk and reads it into the memory and makes the change into it.
SQL Server never writes back the data page into the disk just after the data modification made in memory; it keeps changed data page in the memory. It is because SQL Server waits for some more time thinking that there might be more changes coming up and it’s faster to keep them in memory for a bit. When memory gets full or close to full at that time SQL server automatically writes the data pages into disk. But keeping data pages in memory puts data at disk, in case of power failure or system failure we may lose our important data. To come up from this problem SQL Server uses Transaction log. Transaction log keeps track of data modification that has been made and not written into the disk. In case of powered failure or system failure SQL Server automatically recovers the data from the transaction log.

Let’s understand the above concept with example

In the below image on the top left we have SQL Server memory, which currently doesn't have anything. On the top right we have disk drive where files are physically stored and it is permanent. In case of power failure or system failure data stored in the disk will not be lost. At the bottom we have the Transaction log file which is also stored in the disk. In below Image as you can see we don’t have any data page in memory and there is no log information in Transaction log file. There are three data pages available in disk drive.
Now lets suppose SQL server need to make any data modification, in that case SQL Server grabs the required data page from the disk and reads it into the memory. In below image you can see one data page has been read into the memory.
After reading the data page into Memory, SQL Server makes necessary changes that can be anything Insert, update, delete. In below image you can see data modification has been made into the data page which is present in the memory. Orange mark indicates modification has been successfully made in the data page.

Just after making the data modification in the data page in Memory, SQL Server quickly records the same data modification in the Transaction log file. In below image you can see same data modification (Orange) has been logged in Transaction log file.
SQL Server never writes back the data page into the disk instantly just after the data modification made in memory.It keeps changed data page in the memory and in the Transaction log.SQL Server waits for some more time thinking that there might be more changes coming up for that data page and it’s faster to keep them in memory for a bit. Below Images shows that.
                                   
Now suppose SQL Server needs to make another data modification. Once again it grabs the required data page from the disk and reads it into the memory. As you can see from below image another data page has been read into Memory.
After reading the second data page into Memory, SQL Server makes necessary changes that can be anything Insert, update, delete. In below image you can see data modification has been made into the data page which is present in the memory. Green triangle mark indicates modification has been successfully made in the second data page. Just after making the data modification in the data page in Memory, SQL Server quickly records the same data modification in the Transaction log file. In below image you can see same data modification (Green mark) has been logged in Transaction log file, which is also present in the disk.
Suppose SQL Server is running out of memory, so it will write the data page in Disk and frees Memory. As you can see from the below Image first data page (Orange) has been written into the disk.
Once data page moves into the disk, in Transaction log it will be marked that data page has been written into the disk and in future in system failure or crises there is no need to write the same data modification in to the disk. In below image you can see in Transaction log Orange has been marked.
Now assume that something went wrong i.e system failure, power failure occurs. In that case whatever our memory has will be lost. From both the below Images you can notice Memory is becoming empty.


In the event of system failure in above case, the transaction log might be required to bring your database back to a consistent state. SQL server has automated recover mode that kicks when you start you database engine after failure. Once SQL Server starts again it looks into the Transaction log and check whether it needs to write any data modification into the disk which was not written. In above case it will find one record (Green) that needs to be modified into the disk drive. It will write the data modification into the disk and marks in Transaction log that this record has been written into the disk. From the below image you can clearly see that, even after the system failure SQL Server has successfully recovered the data modification and saved it into the disk.


I hope you enjoyed the article, don't forget to share this on facebook. Please leave your comments below to make this article better.

No comments: