In a multi-tenant data insert scenario, a GUID is a great data type for a primary key. Since it is globally unique, you do not need to rely on any range allocation or custom logic. However, if you index these columns, you will end up with frequent index fragmentation and hence poor performance. In this post, I will benchmark index fragmentation with popular sequential GUID algorithms.
If the indexed values are not inserted in sequence, the index becomes fragmented. Out-of-order inserts cause page splits, leaving pages partially full and out of logical order, so SQL Server has to touch more pages to find the same records. This slows down both read and write operations as the data grows.
SQL Server addresses this with the NewSequentialId() function. It still produces a GUID, but it guarantees that successive values are generated in ascending order. Since every entry added to the index is in ascending order, the index does not fragment frequently; the more ordered the inserts into a B-tree are, the better it performs.
While the NewSequentialId() function solves the problem, it has some limitations:
- The function cannot be invoked from an inline statement or stored procedure; it can only be invoked from a column's default value constraint.
- There is no equivalent .NET method.
There are quite a lot of algorithms floating around in the developer community that can generate a sequential GUID. In this post, I will benchmark two of the most commonly referenced algorithms in terms of performance, memory footprint, DTU usage and the amount of fragmentation.
Benchmark
Algorithm Options
Option 1 (GUID using NewGuid): In this option, we create a GUID in the application and insert it into the table. We will use this as the baseline for the comparison. We already know that this option ends up in heavy fragmentation; it is no different from calling the NewId() function in SQL Server.
Option 2 (Sequential GUID using SQL's NewSequentialId): In this option, we let SQL Server create the sequential GUID using the built-in function.
Option 3 (Sequential GUID based on UuidCreateSequential): In this option, we use an algorithm that is equivalent to NewSequentialId. It calls UuidCreateSequential, the same Windows system function that NewSequentialId relies on. This algorithm was posted by David Browne (a fellow MS employee) and his original post can be found at https://blogs.msdn.microsoft.com/dbrowne/2012/07/03/how-to-generate-sequential-guids-for-sql-server-in-net/
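For reference, here is a minimal sketch of what that approach looks like, adapted from the shape of the linked post; the P/Invoke call and the byte reordering (so the value also sorts sequentially under SQL Server's uniqueidentifier ordering) are the essence, and you should compare against the original before lifting it into production. It requires using System.Runtime.InteropServices;.

[DllImport("rpcrt4.dll", SetLastError = true)]
private static extern int UuidCreateSequential(out Guid guid);

// Generates a GUID via the Windows UuidCreateSequential API and reorders the
// bytes so SQL Server sorts successive values in ascending order.
public static Guid NewSequentialID()
{
    Guid guid;
    UuidCreateSequential(out guid);

    byte[] s = guid.ToByteArray();
    byte[] t = new byte[16];

    t[3] = s[0]; t[2] = s[1]; t[1] = s[2]; t[0] = s[3];   // first block, reversed
    t[5] = s[4]; t[4] = s[5];                             // second block, reversed
    t[7] = s[6]; t[6] = s[7];                             // third block, reversed
    for (int i = 8; i < 16; i++)
    {
        t[i] = s[i];                                      // remaining bytes unchanged
    }

    return new Guid(t);
}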
Option 4 (GUID based on NHibernate's GuidCombGenerator): In this option, we use the algorithm NHibernate uses to generate GUIDs for some of its features. Many developers claim to use this algorithm to overcome the issue we are discussing. The source code can be found at https://github.com/nhibernate/nhibernate-core/blob/3087d48640eb64f573c0011e9a9b1567afe6adde/src/NHibernate/Id/GuidCombGenerator.cs
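Roughly, the "comb" approach keeps the random part of a regular GUID and overwrites its last six bytes with a timestamp, which SQL Server weighs most heavily when sorting uniqueidentifier values, so successive values trend upward. A simplified sketch of the idea (see the linked NHibernate source for the exact implementation):

// Combines Guid.NewGuid() randomness with a timestamp in the last six bytes.
public static Guid GenerateGuidComb()
{
    byte[] guidArray = Guid.NewGuid().ToByteArray();

    DateTime baseDate = new DateTime(1900, 1, 1);
    DateTime now = DateTime.UtcNow;

    // Days since the base date, and milliseconds since midnight scaled to the
    // ~3.33 ms resolution of SQL Server's datetime type.
    TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = now.TimeOfDay;

    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333));

    // Reverse to big-endian so the most significant bytes come first.
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);

    // Copy the timestamp into the last 6 bytes of the GUID.
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);

    return new Guid(guidArray);
}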
Scenario
Table: The table will have two uniqueidentifier columns (Id, AnotherId) and an nvarchar column (Value).
Index: A clustered index on Id and a non-clustered index on AnotherId and Value columns.
Transaction: We will insert 1 million non-null records in batches of 1,000 records using inline INSERT statements. Every batch creates a new connection.
Code
Below is the code I am using to insert the records. It is straightforward, so I will spare you a line-by-line explanation.
CREATE TABLE TableWithRegularGuid
(
    Id UNIQUEIDENTIFIER NOT NULL,
    AnotherId UNIQUEIDENTIFIER NOT NULL,
    [Value] NVARCHAR(400) NOT NULL
);
CREATE CLUSTERED INDEX IX_TableWithRegularGuid_Id ON [dbo].[TableWithRegularGuid] ([Id]);
CREATE NONCLUSTERED INDEX IX_TableWithRegularGuid_AnotherId_Value ON [dbo].[TableWithRegularGuid] ([AnotherId], [Value]);

CREATE TABLE TableWithNewSequentialIdAsDefault
(
    Id UNIQUEIDENTIFIER NOT NULL DEFAULT (NewSequentialId()),
    AnotherId UNIQUEIDENTIFIER NOT NULL DEFAULT (NewSequentialId()),
    [Value] NVARCHAR(400) NOT NULL
);
CREATE CLUSTERED INDEX IX_TableWithNewSequentialIdAsDefault_Id ON [dbo].[TableWithNewSequentialIdAsDefault] ([Id]);
CREATE NONCLUSTERED INDEX IX_TableWithNewSequentialIdAsDefault_AnotherId_Value ON [dbo].[TableWithNewSequentialIdAsDefault] ([AnotherId], [Value]);

CREATE TABLE TableWithExtendedUuidCreateSequential
(
    Id UNIQUEIDENTIFIER NOT NULL,
    AnotherId UNIQUEIDENTIFIER NOT NULL,
    [Value] NVARCHAR(400) NOT NULL
);
CREATE CLUSTERED INDEX IX_TableWithExtendedUuidCreateSequential_Id ON [dbo].[TableWithExtendedUuidCreateSequential] ([Id]);
CREATE NONCLUSTERED INDEX IX_TableWithExtendedUuidCreateSequential_AnotherId_Value ON [dbo].[TableWithExtendedUuidCreateSequential] ([AnotherId], [Value]);

CREATE TABLE TableWithNhibernateGuidComb
(
    Id UNIQUEIDENTIFIER NOT NULL,
    AnotherId UNIQUEIDENTIFIER NOT NULL,
    [Value] NVARCHAR(400) NOT NULL
);
CREATE CLUSTERED INDEX IX_TableWithNhibernateGuidComb_Id ON [dbo].[TableWithNhibernateGuidComb] ([Id]);
CREATE NONCLUSTERED INDEX IX_TableWithNhibernateGuidComb_AnotherId_Value ON [dbo].[TableWithNhibernateGuidComb] ([AnotherId], [Value]);
public void InsertDataToTestTable(string tableName, Func<Guid> guidGeneratorFunc = null)
{
    int numberOfRecords = 1000000;
    int numberOfSteps = numberOfRecords / 1000; // Split into batches of 1000 records.

    for (int stepIndex = 0; stepIndex < numberOfSteps; stepIndex++)
    {
        // Build one multi-row INSERT statement per batch. When no generator is
        // supplied, the Id columns are omitted and their default constraints apply.
        StringBuilder commandText = new StringBuilder($"INSERT INTO {tableName} (");
        commandText.Append(guidGeneratorFunc == null ? string.Empty : " Id, AnotherId, ");
        commandText.Append("[Value] ) VALUES");

        Random rnd = new Random(stepIndex);
        for (int recordIndex = 0; recordIndex < numberOfRecords / numberOfSteps; recordIndex++)
        {
            commandText.Append((recordIndex == 0) ? string.Empty : ",");
            commandText.Append("(");
            commandText.Append(guidGeneratorFunc == null
                ? string.Empty
                : $"'{guidGeneratorFunc.Invoke()}', '{guidGeneratorFunc.Invoke()}', ");
            commandText.Append($"'{rnd.Next()}')");
        }

        // Commit the batch to the database over a fresh connection.
        using (SqlConnection con = new SqlConnection("Server=xxx.database.windows.net,1433;DataBase=xxx;user Id = xxxx;Password=xxxx;Trusted_Connection=false;"))
        {
            con.Open();
            using (SqlCommand cmd = new SqlCommand(commandText.ToString(), con))
            {
                cmd.CommandType = CommandType.Text;
                cmd.ExecuteNonQuery();
            }
        }

        Console.WriteLine($"{(stepIndex + 1) * 1000} records written to {tableName}");
    }
}

// Caller
DateTime startTime = DateTime.Now;
Console.WriteLine($"Starting Option 1");
new GuidTester().InsertDataToTestTable("TableWithRegularGuid", () => Guid.NewGuid());
Console.WriteLine($"Completed in {(DateTime.Now - startTime).TotalSeconds}s");

Console.WriteLine($"Starting Option 2");
startTime = DateTime.Now;
new GuidTester().InsertDataToTestTable("TableWithNewSequentialIdAsDefault");
Console.WriteLine($"Completed in {(DateTime.Now - startTime).TotalSeconds}s");

Console.WriteLine($"Starting Option 3");
startTime = DateTime.Now;
new GuidTester().InsertDataToTestTable("TableWithExtendedUuidCreateSequential", () => GuidTester.NewSequentialID());
Console.WriteLine($"Completed in {(DateTime.Now - startTime).TotalSeconds}s");

Console.WriteLine($"Starting Option 4");
startTime = DateTime.Now;
new GuidTester().InsertDataToTestTable("TableWithNhibernateGuidComb", () => GuidTester.GenerateGuidComb());
Console.WriteLine($"Completed in {(DateTime.Now - startTime).TotalSeconds}s");

Console.ReadLine();
Metrics
Application Memory: The amount of memory used by the code. This will help us identify whether the custom algorithms create any memory pressure. We will also look at traces for memory leaks.
Application CPU: The amount of CPU consumed.
DTU Consumption: The amount of DTU consumed by the database. For this benchmark, I allocated the database a maximum of 100 eDTUs. The database is hosted in Azure and the console application runs on my local machine, with a latency of about 220 ms to the Azure data center.
Pages: The number of pages for the indexes at the end of the operation.
Fragmentation %: The percentage of fragmentation on the indexes.
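For completeness, the Pages and Fragmentation % figures can be read from the sys.dm_db_index_physical_stats DMV after the inserts finish. A minimal sketch (the connection string is a placeholder, as above):

// Prints index name, leaf-level page count and fragmentation % for a given table
// using the sys.dm_db_index_physical_stats DMV.
public void PrintIndexStats(string tableName)
{
    const string query = @"
        SELECT i.name, ps.page_count, ps.avg_fragmentation_in_percent
        FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(@tableName), NULL, NULL, 'LIMITED') ps
        JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id;";

    using (SqlConnection con = new SqlConnection("Server=xxx.database.windows.net,1433;DataBase=xxx;user Id = xxxx;Password=xxxx;Trusted_Connection=false;"))
    {
        con.Open();
        using (SqlCommand cmd = new SqlCommand(query, con))
        {
            cmd.Parameters.AddWithValue("@tableName", tableName);
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"{reader.GetString(0)}: {reader.GetInt64(1)} pages, {reader.GetDouble(2):F2}% fragmented");
                }
            }
        }
    }
}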
Outcome
[table id=1 /]
Verdict
The verdict from the above options is clear. Options 2 and 3 offer almost the same outcome, with a stark difference from Option 1. Option 2 puts more pressure on DTU consumption, since SQL Server is responsible for creating the Id; Option 3 offloads that burden to the application tier.
My personal preference would always be Option 2. When you have different tenants consuming the data tier, keeping the GUID generation logic inside the data tier avoids bad inserts from code you have no control over. However, it is not the best fit everywhere: scenarios where you need to insert related data into multiple tables are not as easy, because the application does not know the generated Id up front.
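Not part of this benchmark, but one way to soften that limitation with Option 2 is to read the server-generated Id back in the same round trip using an OUTPUT clause. A minimal sketch, assuming the TableWithNewSequentialIdAsDefault schema from earlier:

// Inserts one row and reads back the server-generated sequential Id
// via the OUTPUT clause, so related rows can reference it.
public Guid InsertAndReturnId(SqlConnection con, string value)
{
    const string query =
        "INSERT INTO TableWithNewSequentialIdAsDefault ([Value]) " +
        "OUTPUT INSERTED.Id " +
        "VALUES (@value);";

    using (SqlCommand cmd = new SqlCommand(query, con))
    {
        cmd.Parameters.AddWithValue("@value", value);
        return (Guid)cmd.ExecuteScalar();
    }
}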
I would also advise not to mix Option 2 or 3 with Option 1 or 4. Because Options 2 and 3 keep the page density close to 100%, the ids generated by Option 1 or 4 could heavily rearrange the indexes, and you may lose the benefit of the sequential GUIDs.
In my solutions, I use a combination of Options 2 and 3. For any third-party tenants, I either retain the Id generation logic in the data tier or hand back a new Id as a replacement for the id generated by the third party. This way, my database remains happy ever after.