Tuesday, February 23, 2010

Testing the SQL Server 2005 partitioning option

This post is a version of my previous "SQL Server Partitioning Option" presentation, which I originally prepared for a friend. The presentation covers what partitioning is, why to use table partitions, how to use partitions, and their advantages. All chapters are based on a single test case: partitioning a table and examining I/O performance before and after the partition operation.

What is partitioning
Partitioning is essentially the distribution of a table's data across different physical structures according to its meaning or value, which brings retrieval times down to acceptable levels.

Although the feature can also serve archiving needs, it is most often used to speed up reporting and query performance.

Why to use partitioning
The most common use of partitioning is improving full table scan performance. A full table scan becomes unavoidable when a query accesses more than a certain share of the table's data. (There are other methods to avoid full table scans or to make them faster, but they are out of scope here, since this article concerns only the partitioning option of Microsoft SQL Server.)

Besides performance considerations, partitioning also helps with archiving in large tables, especially log tables that accumulate on a daily, monthly, or even yearly basis. Many setups keep the older portion of such a table (say, rows older than 6 months) on cheaper SATA disks and on compressed partitions.

How to use partitioning
I used a Windows Server 2003 Standard Edition operating system with SQL Server 2005 Enterprise Edition (note that SQL Server 2005 Standard Edition does not support partitioning). The recommended hardware was 1 GB of RAM, 1 CPU, and a minimum of 10 GB of disk space, so I installed my virtual machine with this configuration.

First we need a comparison query; I used the standard built-in AdventureWorks sample database.


select
    cat.name as category_name,
    subcat.name as subcategory_name,
    prd.*
from AdventureWorks.Production.Product prd
    left join AdventureWorks.Production.ProductSubcategory subcat
        on prd.ProductSubcategoryID = subcat.ProductSubcategoryID
    left join AdventureWorks.Production.ProductCategory cat
        on cat.ProductCategoryID = subcat.ProductCategoryID
where
    cat.name is not null and
    subcat.name is not null
order by
    cat.name, subcat.name


Now I decided to build a structure with four different filegroups, one per category, so that a single query can read from all four data files simultaneously. To achieve this I will first create four filegroups and add one file to each of them, as follows (the equivalent, on an Oracle database, of creating tablespaces and adding one datafile to each tablespace).

In the real world I would spread the data files over four separate physical drives to increase disk throughput, but I don't have that kind of test environment for this test case.


ALTER DATABASE AdventureWorks ADD FILEGROUP [FG1]
ALTER DATABASE AdventureWorks ADD FILEGROUP [FG2]
ALTER DATABASE AdventureWorks ADD FILEGROUP [FG3]
ALTER DATABASE AdventureWorks ADD FILEGROUP [FG4]

ALTER DATABASE AdventureWorks
ADD FILE 
(NAME = N'FG1_F1',
FILENAME = N'C:\FG1_F1.ndf',
SIZE = 5MB,
MAXSIZE = 100MB,
FILEGROWTH = 5MB)
TO FILEGROUP [FG1]

ALTER DATABASE AdventureWorks
ADD FILE 
(NAME = N'FG2_F1',
FILENAME = N'C:\FG2_F1.ndf',
SIZE = 5MB,
MAXSIZE = 100MB,
FILEGROWTH = 5MB)
TO FILEGROUP [FG2]

ALTER DATABASE AdventureWorks
ADD FILE 
(NAME = N'FG3_F1',
FILENAME = N'C:\FG3_F1.ndf',
SIZE = 5MB,
MAXSIZE = 100MB,
FILEGROWTH = 5MB)
TO FILEGROUP [FG3]

ALTER DATABASE AdventureWorks
ADD FILE 
(NAME = N'FG4_F1',
FILENAME = N'C:\FG4_F1.ndf',
SIZE = 5MB,
MAXSIZE = 100MB,
FILEGROWTH = 5MB)
TO FILEGROUP [FG4]
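
As a quick sanity check (my addition; these catalog views exist in SQL Server 2005 and later), you can confirm that the filegroups and their files were created:

-- list the filegroups of AdventureWorks and the file attached to each
USE AdventureWorks;
SELECT fg.name AS filegroup_name,
       f.name  AS logical_file_name,
       f.physical_name
FROM sys.filegroups fg
JOIN sys.database_files f
    ON f.data_space_id = fg.data_space_id
ORDER BY fg.name;

Each of FG1 through FG4 should appear with its single .ndf file.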


I decided to define the partition strategy over the "category" column, whose values are fairly evenly distributed (you can verify this by aggregating the data with GROUP BY and COUNT).
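
The distribution check mentioned above can be done with a query like the following sketch, built on the same joins as the comparison query:

-- row counts per category; roughly even counts make category a reasonable partition key
SELECT cat.name AS category_name, COUNT(*) AS row_count
FROM AdventureWorks.Production.Product prd
JOIN AdventureWorks.Production.ProductSubcategory subcat
    ON prd.ProductSubcategoryID = subcat.ProductSubcategoryID
JOIN AdventureWorks.Production.ProductCategory cat
    ON cat.ProductCategoryID = subcat.ProductCategoryID
GROUP BY cat.name
ORDER BY row_count DESC;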

The first thing we need is a partition function, which forms the basis of the partition scheme. Then we create the partition scheme on top of the partition function and the desired filegroups. Finally, I will create the table on the partition scheme, which means it uses the partition function and all of the filegroups.


-- create the partition function
-- (boundary values should be listed in ascending order; otherwise the
--  engine sorts them itself, which changes which filegroup each category maps to)
CREATE PARTITION FUNCTION [f_part_category] (char(20))
AS RANGE LEFT FOR VALUES ('Accessories', 'Bikes', 'Clothing', 'Components');

-- check the partition function
select * from sys.partition_functions

name                function_id   type_desc       
------------------  ------------- --------------- 
f_part_category     65539         RANGE           

(1 row(s) affected)
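
Once the function exists, you can also test which partition number a given value maps to with the built-in $PARTITION function (a quick check I would add; with RANGE LEFT, each boundary value belongs to the partition on its left):

-- returns the 1-based partition number each sample category value falls into
SELECT $PARTITION.f_part_category('Accessories') AS accessories_partition,
       $PARTITION.f_part_category('Bikes')       AS bikes_partition,
       $PARTITION.f_part_category('Components')  AS components_partition;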

-- create the partition scheme
-- (four boundary values produce five partitions, hence FG4 is listed twice)
CREATE PARTITION SCHEME s_part_category
AS PARTITION f_part_category
TO ([FG1], [FG2], [FG3], [FG4], [FG4]);

-- check the partition scheme
select * from sys.partition_schemes

name                 data_space_id      type  type_desc                  
-------------------- -----------------  ----  -----------------
s_part_category      65601              PS    PARTITION_SCHEME           

(1 row(s) affected)

CREATE TABLE [dbo].[t_product_part](
[category_name] [char](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[subcategory_name] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ProductID] [int] NOT NULL,
[Name] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ProductNumber] [nvarchar](25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[MakeFlag] [bit] NOT NULL,
[FinishedGoodsFlag] [bit] NOT NULL,
[Color] [nvarchar](15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[SafetyStockLevel] [smallint] NOT NULL,
[ReorderPoint] [smallint] NOT NULL,
[StandardCost] [money] NOT NULL,
[ListPrice] [money] NOT NULL,
[Size] [nvarchar](5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[SizeUnitMeasureCode] [nchar](3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[WeightUnitMeasureCode] [nchar](3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Weight] [decimal](8, 2) NULL,
[DaysToManufacture] [int] NOT NULL,
[ProductLine] [nchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Class] [nchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Style] [nchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ProductSubcategoryID] [int] NULL,
[ProductModelID] [int] NULL,
[SellStartDate] [datetime] NOT NULL,
[SellEndDate] [datetime] NULL,
[DiscontinuedDate] [datetime] NULL,
[rowguid] [uniqueidentifier] NOT NULL,
[ModifiedDate] [datetime] NOT NULL
) ON s_part_category (category_name)


Now it is time to test the performance. As the table is not heavily loaded, we can compare the execution plans of the queries before and after partitioning.


insert into t_product_part select * from master.dbo.v_products2

select * from t_product_part
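
After loading, the row distribution across the partitions can be verified with the sys.partitions catalog view, and the I/O cost of the comparison query can be measured with SET STATISTICS IO (both are sketches using the object names created above):

-- rows per partition of the newly loaded table
SELECT p.partition_number, p.rows
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.t_product_part')
ORDER BY p.partition_number;

-- report logical reads for the test query in the Messages tab
SET STATISTICS IO ON;
SELECT * FROM t_product_part;
SET STATISTICS IO OFF;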

BEFORE PARTITIONING


AFTER PARTITIONING


Conclusion
Partitioning gave effective I/O usage by reading from more than one data file simultaneously, and a remarkable increase in I/O performance: about 2.3 times faster with the sample data, even with all the files on a single drive.


