What is partitioning
Partitioning is essentially the grouping of data, by its meaning or value, onto separate physical structures, which brings retrieval times down to acceptable levels.
Although the feature can also serve archiving needs, it is most often used to speed up reporting and query performance.
Why use partitioning
The most common use of partitioning is improving the performance of full table scans. One of the most obvious causes of an unavoidable full table scan is a query that touches more than a certain fraction of a table's data. (There are other methods for avoiding full table scans or making them faster, but they are outside the scope of this article, which focuses on the partitioning option of Microsoft SQL Server.)
Beyond performance, partitioning also helps with archiving in large tables, especially log tables that accumulate data on a daily, monthly, or even yearly basis. Many installations keep the older data of such a table (say, rows older than 6 months) on cheaper SATA disks and on compressed partitions.
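Archiving is not demonstrated later in this article, but the usual mechanism behind such setups is partition switching, where an old partition is moved out of the live table as a metadata-only operation. A minimal sketch, assuming a hypothetical partitioned log table dbo.t_log and an empty archive table dbo.t_log_archive created with the same column layout on the filegroup that holds the partition being switched out:

-- Assumed objects: dbo.t_log is partitioned (e.g. by month) and dbo.t_log_archive
-- has an identical structure and lives on the same filegroup as partition 1 of dbo.t_log.
-- Only metadata changes, so the switch is nearly instantaneous regardless of row count.
ALTER TABLE dbo.t_log
    SWITCH PARTITION 1 TO dbo.t_log_archive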
How to use partitioning
I used a Windows Server 2003 Standard Edition operating system with SQL Server 2005 Enterprise Edition (note that SQL Server Standard Edition does not support partitioning). The recommended hardware requirements were 1 GB of RAM, 1 CPU, and at least 10 GB of disk space, and I installed my virtual machine with this configuration.
First we need a comparison SQL query; I used the standard AdventureWorks sample database that ships with SQL Server.
select cat.name as category_name, subcat.name as subcategory_name, prd.*
from AdventureWorks.Production.Product prd
LEFT JOIN AdventureWorks.Production.ProductSubCategory subcat
    ON (prd.ProductSubcategoryID = subcat.ProductSubcategoryID)
LEFT JOIN AdventureWorks.Production.ProductCategory cat
    ON cat.ProductCategoryID = subcat.ProductCategoryID
where cat.name is not null
  and subcat.name is not null
order by cat.Name, subcat.name
I decided to build a structure with four different filegroups, one per category, so that a single query can hit all four data files simultaneously. To achieve this I will first create four filegroups and add one file to each of them, as follows (this corresponds to creating tablespaces on an Oracle database and adding one datafile to each tablespace).
In the real world I would want to spread the data files over four separate drives to increase disk throughput, but I do not have that kind of test environment for this test case.
ALTER DATABASE AdventureWorks ADD FILEGROUP [FG1]
ALTER DATABASE AdventureWorks ADD FILEGROUP [FG2]
ALTER DATABASE AdventureWorks ADD FILEGROUP [FG3]
ALTER DATABASE AdventureWorks ADD FILEGROUP [FG4]

ALTER DATABASE AdventureWorks ADD FILE
    (NAME = N'FG1_F1', FILENAME = N'C:\FG1_F1.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 5MB)
    TO FILEGROUP [FG1]
ALTER DATABASE AdventureWorks ADD FILE
    (NAME = N'FG2_F1', FILENAME = N'C:\FG2_F1.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 5MB)
    TO FILEGROUP [FG2]
ALTER DATABASE AdventureWorks ADD FILE
    (NAME = N'FG3_F1', FILENAME = N'C:\FG3_F1.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 5MB)
    TO FILEGROUP [FG3]
ALTER DATABASE AdventureWorks ADD FILE
    (NAME = N'FG4_F1', FILENAME = N'C:\FG4_F1.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 5MB)
    TO FILEGROUP [FG4]
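If you want to double-check that the filegroups and files were created as intended, the sys.filegroups and sys.database_files catalog views can be queried, for example:

-- list the filegroups of the AdventureWorks database
select name, type_desc from AdventureWorks.sys.filegroups

-- list the data files and the filegroups they belong to
select f.name as file_name, fg.name as filegroup_name, f.physical_name
from AdventureWorks.sys.database_files f
left join AdventureWorks.sys.filegroups fg on fg.data_space_id = f.data_space_id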
I decided to define the partitioning strategy over the "category" column, whose values are fairly evenly distributed (you can verify this by aggregating the data with a group by and count, as shown below).
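For example, a quick check of that distribution on the source data:

-- row counts per category; roughly equal counts make "category" a reasonable partitioning column
select cat.name as category_name, count(*) as product_count
from AdventureWorks.Production.Product prd
LEFT JOIN AdventureWorks.Production.ProductSubCategory subcat
    ON prd.ProductSubcategoryID = subcat.ProductSubcategoryID
LEFT JOIN AdventureWorks.Production.ProductCategory cat
    ON cat.ProductCategoryID = subcat.ProductCategoryID
where cat.name is not null
group by cat.name
order by cat.name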
The first thing we need is a partition function, which forms the basis for the partition scheme. Next we create the partition scheme based on that function and the desired filegroups. Finally I will create the table on the partition scheme, which means it uses the partition function and all of the filegroups.
-- create the partition function
CREATE PARTITION FUNCTION [f_part_category] (char(20))
AS RANGE LEFT FOR VALUES ('Bikes', 'Components', 'Clothing', 'Accessories');

-- check the partition function
select * from sys.partition_functions

name                 function_id   type_desc
------------------   -----------   ---------------
f_part_category      65539         RANGE

(1 row(s) affected)

-- create the partition scheme
CREATE PARTITION SCHEME s_part_category
AS PARTITION f_part_category TO ([FG1], [FG2], [FG3], [FG4], [FG4]);

-- check the partition scheme
select * from sys.partition_schemes

name                 data_space_id     type   type_desc
------------------   ---------------   ----   -----------------
s_part_category      65601             PS     PARTITION_SCHEME

(1 row(s) affected)

CREATE TABLE [dbo].[t_product_part](
    [category_name] [char](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [subcategory_name] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [ProductID] [int] NOT NULL,
    [Name] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [ProductNumber] [nvarchar](25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [MakeFlag] [bit] NOT NULL,
    [FinishedGoodsFlag] [bit] NOT NULL,
    [Color] [nvarchar](15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [SafetyStockLevel] [smallint] NOT NULL,
    [ReorderPoint] [smallint] NOT NULL,
    [StandardCost] [money] NOT NULL,
    [ListPrice] [money] NOT NULL,
    [Size] [nvarchar](5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [SizeUnitMeasureCode] [nchar](3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [WeightUnitMeasureCode] [nchar](3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [Weight] [decimal](8, 2) NULL,
    [DaysToManufacture] [int] NOT NULL,
    [ProductLine] [nchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [Class] [nchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [Style] [nchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [ProductSubcategoryID] [int] NULL,
    [ProductModelID] [int] NULL,
    [SellStartDate] [datetime] NOT NULL,
    [SellEndDate] [datetime] NULL,
    [DiscontinuedDate] [datetime] NULL,
    [rowguid] [uniqueidentifier] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
) ON s_part_category (category_name)
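Even before loading any data you can verify which partition a given category value will be routed to by calling the $PARTITION function for f_part_category; the boundary values themselves are visible in sys.partition_range_values. For example (run in the AdventureWorks database, like the statements above):

-- which partition number will each category land in?
select 'Accessories' as category_name, $PARTITION.f_part_category('Accessories') as partition_number
union all
select 'Bikes', $PARTITION.f_part_category('Bikes')
union all
select 'Clothing', $PARTITION.f_part_category('Clothing')
union all
select 'Components', $PARTITION.f_part_category('Components')

-- boundary values of the partition function
select prv.boundary_id, prv.value
from sys.partition_functions pf
join sys.partition_range_values prv on prv.function_id = pf.function_id
where pf.name = 'f_part_category'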
Now it is time to test the performance. Since the data volume is small, we can compare the execution plans of the query before and after partitioning.
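The insert below reads from a view named master.dbo.v_products2, whose definition is not shown in this article; presumably it simply wraps the comparison query from the beginning (minus the ORDER BY, which is not allowed in a view). A sketch of what it might look like, purely as an assumption:

-- assumed definition of master.dbo.v_products2: the comparison query wrapped in a view
USE master
GO
CREATE VIEW dbo.v_products2 AS
select cat.name as category_name, subcat.name as subcategory_name, prd.*
from AdventureWorks.Production.Product prd
LEFT JOIN AdventureWorks.Production.ProductSubCategory subcat
    ON prd.ProductSubcategoryID = subcat.ProductSubcategoryID
LEFT JOIN AdventureWorks.Production.ProductCategory cat
    ON cat.ProductCategoryID = subcat.ProductCategoryID
where cat.name is not null and subcat.name is not null
GO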
insert into t_product_part select * from master.dbo.v_products2
select * from t_product_part
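After the load it is worth checking how the rows were distributed across the partitions; the sys.partitions catalog view shows the row count per partition. For example:

-- rows per partition of the partitioned table
select p.partition_number, p.rows
from sys.partitions p
where p.object_id = OBJECT_ID('dbo.t_product_part')
  and p.index_id in (0, 1)   -- heap or clustered index
order by p.partition_number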
[Screenshot: execution plan before partitioning]

[Screenshot: execution plan after partitioning]
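To repeat the before/after comparison yourself, one option is to turn on I/O and timing statistics and run the original join against the base tables next to a plain scan of the partitioned table, for example:

SET STATISTICS IO ON
SET STATISTICS TIME ON

-- before: the original join against the non-partitioned base tables
select cat.name as category_name, subcat.name as subcategory_name, prd.*
from AdventureWorks.Production.Product prd
LEFT JOIN AdventureWorks.Production.ProductSubCategory subcat
    ON prd.ProductSubcategoryID = subcat.ProductSubcategoryID
LEFT JOIN AdventureWorks.Production.ProductCategory cat
    ON cat.ProductCategoryID = subcat.ProductCategoryID
where cat.name is not null and subcat.name is not null

-- after: the same data served from the partitioned table
select * from t_product_part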
Conclusion
Effective I/O usage by reading from more than one data file simultaneously.
A remarkable increase in I/O performance.
About 2.3 times faster I/O with the sample data, even with all files on a single drive.