What are Columnstore Indexes and when do we use them in SQL Server

What are Columnstore Indexes and when do we use them in SQL Server
What are Columnstore Indexes and when do we use them in SQL Server

In this article we will analyze the difference of a table that is physically stored through Clustered Rowstore Index by being through Clustered Columnstore Index. What exactly are Columnstore Indexes, when should they be used and what are their pros and cons?

What are Columnstore Indexes

The Columnstore indexes is a technology that allows to store the data of a table per column as opposed to the "classic" Rowstore indexes which are stored per row of table entries. Because each column has the same type of data, it offers up to x10x greater compression and x10x faster data reading.

How Columnstore Indexes work

During its creation Columnstore Index records are divided into groups of 1,048,576 records (102,400 minimum) called Row Groups, then each column that has the Row Group you split and compress into groups called Column Segments. Since the minimum for a Row Group is 102,400 records when there are records left over, the above records go into a special space you call Deltastore. In the Deltastore space the data is still Row based and does not have the compression of the Column Store.

What are Columnstore Indexes and when do we use them in SQL Server
docs.microsoft.com

What are the advantages of Columnstore Indexes?

  • Columnstore Indexes, as we mentioned at the beginning, because they divide the data into groups of columns, provide high compression of the data that leads to requiring less storage space.
  • The high compression combined with the fact that queries on Columnstore Indexes run on batch mode (optimized for multiple records) instead of row mode they provide faster reading of the data.
  • Queries that need only specific columns from the tables require reduced I/O in relation to Rowstore.
  • The analytical queries that they use aggregate functions such as SUM, AVG and GROUPING are performed in minimal times.

What are the disadvantages of Columnstore Indexes

  • Updating the tables in OLTP workloads with DML like insert, update and delete lags a lot compared to Rowstore.
  • Each column must fit on a single page eg a field that is varchar(max) is not supported and if we want to use it we should keep only the first 4000 characters (LEFT(column,4000)).
  • The search specific records usually takes more time.

The example

We have a table with a classic Clustered Rowstore Index at at ID. Before we do anything we enable statistics for timings and I/O:

set statistics time,io on;
go

SELECT * FROM Comments;
What are Columnstore Indexes and when do we use them in SQL Server
01

We make a second table copy of the same. Because it has a field whose size exceeds the one page (8k) it must have to create the Columnstore Index. To fit on one page we keep only the first 4000 characters:

*The order is to make the table with the entries shuffled by ID and date. We will see why below.

Then we create it Clustered Columnstore Index:

--drop table comments_columnstore
select id,CreationDate,PostId,left(Text,4000) as Text,score,userid
into Comments_Columnstore
from Comments order by score desc; 
GO

create clustered columnstore index idx_comments_clmstore on [dbo].[Comments_Columnstore]; 
GO

Reading his records Rowstore table for a month, we see that it takes 7 seconds:

SELECT id,CreationDate,PostId,Text,score
	FROM Comments
where
1=1
and CreationDate between '2010-01-01' and '2010-02-01'
order by CreationDate;
What are Columnstore Indexes and when do we use them in SQL Server
02

Accordingly to Columnstore table only takes 5 seconds. Also we can see in the messages that this time he does Segment reads:

SELECT id,CreationDate,PostId,Text,score
	FROM Comments_Columnstore
where
1=1
and CreationDate between '2010-01-01' and '2010-02-01'
order by CreationDate;
What are Columnstore Indexes and when do we use them in SQL Server
03

But because the Row Groups they are divided according to how the entries are in the table and we passed them mixed up, if we want to sort them by date we will have to make Clustered Rowstore Index on the table extinguishing the Columnstore and then rebuild the Columnstore by deleting the Rowstore.

We do this as in Columnstore Indexes we cannot yet define which field to order them with Segment Row Groups.

If we do this and run the query again we will see that from 5 seconds it dropped to 1.5 seconds:

create clustered index idx_comments_clmstore on [dbo].[Comments_Columnstore](CreationDate) with (drop_Existing = on);
go

create clustered columnstore index idx_comments_clmstore on [dbo].[Comments_Columnstore]  with (drop_Existing = on);
go

SELECT id,CreationDate,PostId,Text,score
	FROM Comments_Columnstore
where
1=1
and CreationDate between '2010-01-01' and '2010-02-01'
order by CreationDate;
What are Columnstore Indexes and when do we use them in SQL Server
04

If we run aggregate function to Rowstore table we will see that it will take 5 seconds:

SELECT score,count(id) as Count
	FROM Comments
group by score;
What are Columnstore Indexes and when do we use them in SQL Server
05

But if we run it on Columnstore table we will see what it will do only 10 milliseconds:

SELECT score,count(id) as Count
	FROM Comments_Columnstore
group by score;
What are Columnstore Indexes and when do we use them in SQL Server
06

If we do 100,000 inserts to Rowstore table will make 41 seconds:

declare @i int
set @i = 1

WHILE @i < 100000
BEGIN
insert into comments(CreationDate,PostId,Score,Text,Userid)
values ('2010-09-06 08:09:52.330',35314,3,'Test Comment',3);
set @i=@i+1;
END;
What are Columnstore Indexes and when do we use them in SQL Server
07

To Columnstore but the 100,000 table inserts they will take 54 seconds.

declare @i int
set @i = 1

WHILE @i < 100000
BEGIN
insert into comments_columnstore(CreationDate,PostId,Score,Text,Userid)
values ('2010-09-06 08:09:52.330',35314,3,'Test Comment',3);
set @i=@i+1;
END;
What are Columnstore Indexes and when do we use them in SQL Server
08

Accordingly if we do delete to Rowstore panel will do 5 seconds:

delete from comments where Text = 'Test Comment';
What are Columnstore Indexes and when do we use them in SQL Server
09

While in Columnstore table it delete it will take almost 10 seconds (ie twice the time):

delete from comments_columnstore where Text = 'Test Comment';
What are Columnstore Indexes and when do we use them in SQL Server
10

When to use a Columnstore Index

The Columnstore Indexes have been created primarily for use in Data warehouses (OLAP). They are made in Fact tables and to the very old Dimensional tables. Using them dramatically improves performance as a large amount of data is required.

Also with creation Nonclustered Columnstore Index in Rowstore tables we can increase the performance of analytical queries that use aggregate functions such as SUM, AVG and GROUPING are performed in minimal times.

On the website columnscore.com you'll find a nice quiz with questions by Brent Ozar about when we do or don't use a Columnstore Index.

Sources:

Share it

Leave a reply