Why is restarting the SQL Server instance delayed?
In this article we will look in detail at some of the most common reasons why a SQL Server instance may be slow to restart and what we can do to improve this time on a case-by-case basis.
Using very large Transaction Log with large fragmentation in VLFs
All transactions made to a database in SQL Server are recorded in a file called Transaction Log, its purpose is to record the transactions so that we have the possibility to rollback transactions and recover the databases after a drop of the SQL Server instance. The size of the transaction log is not constant, as we execute more and heavier transactions it increases. The way it grows is by creating small files called VLF (Virtual Log Files) every time the transaction log grows with manual or auto-growth, new VLFs are created.
When we have many VLFs in the transaction log of our database, fragmentation is created and performance drops, especially in the area of recovery, such as when starting the SQL Server instance.
The solution to this problem is to do SHRINK
the transaction log file to the smallest possible size that will allow us to increase it manually to the size we think it should be, without letting it do many small auto-growths of the file again which will lead us back to the original problem.
Use Full Recovery Model without frequent Transaction Log Backup
A very common phenomenon is to see databases that have FULL selected as the recovery model, not to take transaction log backup or not frequent enough. When a database is in FULL recovery model, the transaction log that records all transactions, is not emptied until we have taken a transaction log backup. This is in contrast to the SIMPLE recovery model which empties as soon as a transaction is completed (committed) and done CHECKPOINT
with its data in the datafiles.
This happens because with the FULL recovery model we have chosen, we ask SQL Server to be able to use the backups we have taken to do Point in Time Recovery, that is, to be able to go back at any point in time.
As a result, as long as we have not taken a transaction log backup, the size of the transaction log grows endlessly on the disk, creating more and more VLFs (Virtual Log Files).
The solution in this case to keep the VLFs and the transaction log from increasing respectively is to have frequent transaction log backups (every 5 to 60 minutes).
Restart SQL Server Instance during Kill and Rollback operation
Usually SQL Server knows better than we do and there are no shortcuts. If we go and do KILL
a session that executes a very large query (eg bulk deletes) without having done so first COMMIT
, will perform ROLLBACK
of the entire transaction through the transaction log and this process will take as long as its execution. If we now go and force restart the entire instance because we think it will save us time, the instance will not start until the ROLLBACK
of the transaction. So instead of solving the problem we will make it bigger.
In this case we just wait for the ROLLBACK
and we make sure that in large transactions we have loops with COMMIT
save some records so that time-consuming recovery is not needed in case we terminate this session.
Using Query Store
The data resulting from its use Query Store are first recorded in memory and at regular intervals are transferred to disk, how often this happens is defined by the parameter DATA_FLUSH_INTERVAL_SECONDS
. The default value for this process is 15 minutes. So when restarting the instance or a failover in an Always On Availability Group, it will first wait for this information to be written from memory to disk so that the process can proceed.
The solution to this problem for versions of SQL Server prior to SQL Server 2019, is to enable it trace flag 7745, with which it will not wait for the data to be written from the memory to the disk but will immediately proceed to terminate the instance. This results in losing the last few minutes of Query Store data but will speed up the process.
Using In-Memory Tables (Hekaton)
When we use a table that has all its data loaded in memory (In-Memory) with the parameter DURABILITY=SCHEMA_AND_DATA
(which is the default value), we define that we want to keep the data on the disk as well so that after restarting the instance this data is not lost but is returned to memory. As a result, when restarting the instance, we will have to wait for this data to be loaded from disk to memory.
To improve the wait it helps to use the command CHECKPOINT
before the restart, so that all the data has been written to the datafiles and there is no need to recover from the transaction log first first they are loaded back into memory.