How to Text Search Documents Stored in Database Table in SQL Server Using Full-Text Index
In SQL Server we have the possibility to store files such as docx, xlsx, XML, etc. in table fields. To search for words in these fields, a full table scan would have to be performed, but there is also the solution of creating one Full-Text Index. With the Full-Text Index we can search for words and phrases in table fields that are text, varchar, varbinary, char, image, etc. much more efficiently.
To activate the Full-Text Index service
In SQL Server the Full-Text Index service is installed as a separate feature and runs as a separate service.
To see if it is already installed and activated we run the following queries, if the value is 0 it means that it has not been installed / activated:
SELECT SERVERPROPERTY('IsFullTextInstalled'); GO SELECT is_fulltext_enabled FROM sys.databases WHERE database_id = DB_ID();
If it is not installed, we should proceed to install the feature.
So we open it ISO of the SQL Server installation, we choose Installation and New SQL Server stand-alone installation or add features to an existing installation.
In the Installation Type we choose Add features to an existing instance of SQL Server:
And finally to Feature Selection, we select it Full-Text and Semantic Extraction for Search:
Then we go to SQL Server Configuration Manager, SQL Server Services and we select right click and Start.
But there is one issue, by default it does not support all document types for Office like docx, xlsx, pptx and we should install it Microsoft Office 2010 Filter Pack.
We download the Microsoft Office 2010 Filter Pack from here and we install it.
After the installation is complete, run the following query that will pass the changes:
EXEC sp_fulltext_service 'update_languages'; EXEC sp_fulltext_service 'load_os_resources', 1; EXEC sp_fulltext_service 'restart_all_fdhosts';
Then by running the following query we can see that it has also added these document types to the supported ones:
SELECT * FROM sys.fulltext_document_types;
How do we load the files into a table field
We had seen you another article detailing how we can load files into the varbinary field of the table and how to export them back to a folder.
To begin with, we will have to make the table in which we will insert them. In this table we will define an identity field with sequential numbering, the file name, the file type and finally a type field varbinary(max) which will contain the entire file in binary form.
--drop table docs; create table mig_db.dbo.docs ( id int identity(1,1), docname nvarchar(200), filetype varchar(100), document varbinary(max) ); alter table mig_db.dbo.docs add constraint pk_docs primary key clustered(id);
All we have to do now is load in the array with insert one by one the files we want. The field that will contain the file is loaded as blob (binary large object) directly with the known function openrowset, as shown in the following examples:
insert into mig_db.dbo.docs select N'doc1.docx','.docx',bulkcolumn from openrowset (bulk N'c:\SQL2019\doc1.docx', single_blob) as lob; insert into mig_db.dbo.docs select N'doc2.docx','.docx',bulkcolumn from openrowset (bulk N'c:\SQL2019\doc2.docx', single_blob) as lob; insert into mig_db.dbo.docs select N'doc3.doc','.doc',bulkcolumn from openrowset (bulk N'c:\SQL2019\doc3.doc', single_blob) as lob; insert into mig_db.dbo.docs select N'excel1.xlsx','.xlsx',bulkcolumn from openrowset (bulk N'c:\SQL2019\excel1.xlsx', single_blob) as lob; insert into mig_db.dbo.docs select N'έντυπο.doc','.doc',bulkcolumn from openrowset (bulk N'c:\SQL2019\έντυπο.doc', single_blob) as lob;
After we finish loading, if we select the table we will see these files in binary format:
select * from mig_db..docs;
How we build the Full-Text Index
First we have to make one fulltext catalog which is where the full-text indexes will be stored. In this we set it to ignore tones.
create fulltext catalog document_catalogue with accent_sensitivity = off;
In the next step we create the fulltext index. We define this docname with language 'Greek' since the file can have a Greek name. We also define the field varbinary containing the file with the parameter column type in the field containing the document type.
In the index as a key we define its name primary key that we made when creating the table and its name fulltext catalog that we want to be stored.
By choice change_tracking auto we set the fulltext index to be updated automatically whenever changes are made:
create fulltext index on dbo.docs ( docname language 'Greek', document type column filetype) key index pk_docs on document_catalogue with change_tracking auto, stoplist=system;
With the parameter stoplist we can touch words like 'and', 'the', 'that' etc. which do not help in the search. In addition to the system which is the default option, we can make a custom one where we add words we don't need:
create fulltext stoplist trialstoplist from system stoplist; alter fulltext stoplist trialstoplist ADD 'αλλά' Language 'Greek'; alter fulltext index on docs set stoplist trialstoplist;
How we search the Full-Text Index
With the function contains by setting the varbinary field containing the file and the word we want, we can search the files in which it exists:
select * from docs where contains(document,'dataplatform');
We can search in the same way in the filename field, which works even without the accent:
select * from docs where contains(docname,'εντυπο');
If we want the file to contain 2 different words, just put 'ANDbetween them:
select * from docs where contains(document,'SQL AND Δοκιμαστικό');
But if we want to search for a whole sentence it should be inside quotes "" :
select * from docs where contains(document,'"SQL Server"');
In the following file we see a text with a name and a surname which may not be connected as they are scattered within the file:
If we try in the example again with 'AND' will bring us a result as there is no limit to the distance between the words:
select * from docs where contains(document,'Giorgos and Gewrgiou');
If we would like to limit this and these words to have a distance between them of less than 3 words, it is done with the function near:
select * from docs where contains(document,'near((Giorgos, Gewrgiou),3)');
As we can see this time it returned no results:
In the next example we have a text that has the phrase SQL Server:
If we use the function again contains but with the similar word 'servers', will not return results:
select * from docs where contains(document,'servers');
But we can use the function freetext which with the same word will also search for similar ones and will bring this document:
select * from docs where freetext(document,'servers');
By using the function freetexttable and containstable we can also have a ranking in the results according to the frequency with which the word is mentioned and its weight:
select * from freetexttable(dbo.docs, document,'dataplatform.gr') as ftt join dbo.docs d on ftt.[key]=d.id order by ftt.rank desc;