SQL Server Analysis Services partitions – Create and Manage

Posted by Alin D on May 12, 2011

Partitions are portions of a SQL Server Analysis Services measure group that hold some or all of the measure group’s data.

When a measure group is first created, it contains a single partition corresponding to all the data in your fact table or view. Additional partitions need to be created for any measure group with more than 20 million rows.

Since a majority of corporate databases have far more than 20 million rows in fact tables, you should know how to create partitions and also be aware of good partition design practices.

ou can define partitions using the Business Intelligence Development Studio (BIDS). On the partitions’ tab within your project, simply click the New Partition link near a measure group to open the Partition Wizard. (I won’t cover the steps of the Partition Wizard here because it is fairly simple to follow).

An alternative method to creating new partitions is through XMLA scripts, which is what BIDS does behind the scenes.

You can script an existing partition in SQL Server Management Studio (SSMS) by right-clicking a partition and then choosing Script Partition as CREATE to open a new query window. You will need to edit certain properties such as the partition identifier, its name and the query used for populating the partition.

Here is a sample XMLA for a partition:

<Create xmlns=“http://schemas.microsoft.com/analysisservices/2003/engine“>
<ParentObject>
< DatabaseID>Adventure Works DW 2008</DatabaseID>
< CubeID>Adventure Works</CubeID>
<MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
</ParentObject>
<ObjectDefinition>
<Partition xmlns:xsd=“http://www.w3.org/2001/XMLSchema“xmlns:xsi=“http://www.w3.org/2001/XMLSchema-instance“xmlns:ddl2=“http://schemas.microsoft.com/analysisservices/2003/engine/2“xmlns:ddl2_2=“http://schemas.microsoft.com/analysisservices/2003/engine/2/2“xmlns:ddl100_100=“http://schemas.microsoft.com/analysisservices/2008/engine/100/100“>
<ID>Internet_Sales_2001</ID>
<Name>Internet_Sales_2001</Name>
<Source xsi:type=“QueryBinding“>
<DataSourceID>Adventure Works DW</</#0000FF>DataSourceID>
<QueryDefinition>
SELECT
[dbo].[FactInternetSales].[ProductKey],
[dbo].[FactInternetSales].[OrderDateKey],
[dbo].[FactInternetSales].[DueDateKey],
[dbo].[FactInternetSales].[ShipDateKey],
[dbo].[FactInternetSales].[CustomerKey],
[dbo].[FactInternetSales].[PromotionKey],
[dbo].[FactInternetSales].[CurrencyKey],
[dbo].[FactInternetSales].[SalesTerritoryKey],
[dbo].[FactInternetSales].[SalesOrderNumber],
[dbo].[FactInternetSales].[SalesOrderLineNumber],
[dbo].[FactInternetSales].[RevisionNumber],
[dbo].[FactInternetSales].[OrderQuantity],
[dbo].[FactInternetSales].[UnitPrice],
[dbo].[FactInternetSales].[ExtendedAmount],
[dbo].[FactInternetSales].[UnitPriceDiscountPct],
[dbo].[FactInternetSales].[DiscountAmount],
[dbo].[FactInternetSales].[ProductStandardCost],
[dbo].[FactInternetSales].[TotalProductCost],
[dbo].[FactInternetSales].[SalesAmount],
[dbo].[FactInternetSales].[TaxAmt],
[dbo].[FactInternetSales].[Freight],
[dbo].[FactInternetSales].[CarrierTrackingNumber],
[dbo].[FactInternetSales].[CustomerPONumber]
FROM [dbo].[FactInternetSales]
WHERE OrderDateKey <= ‘20011231’
</QueryDefinition>
</Source>
<StorageMode>Molap</StorageMode>
<ProcessingMode>Regular</ProcessingMode>
<ProactiveCaching>
<SilenceInterval>-PT1S</SilenceInterval>
<Latency>-PT1S</Latency>
<SilenceOverrideInterval>-PT1S</SilenceOverrideInterval>
<ForceRebuildInterval>-PT1S</ForceRebuildInterval>
<AggregationStorage>MolapOnly</AggregationStorage>
<Source xsi:type=“ProactiveCachingInheritedBinding“>
<NotificationTechnique>Server</NotificationTechnique>
</Source>
</ProactiveCaching>
<EstimatedRows

>1013</EstimatedRows>
<AggregationDesignID>Internet Sales 1</AggregationDesignID>
</Partition>
</ObjectDefinition>
</Create>

Note that when defining an effective partition, specifying the source of the data it will hold is perhaps the most important part.

As a rule of thumb, your partitions should contain between five and 20 million rows of fact data. Furthermore, you should avoid partitioning files greater than 500 MB in size. Partition files are in your Analysis Services installation folder under datadatabase_namecube_namemeasure_group_name.

You can also bind a partition to a table, view or a SQL query. If a relational data warehouse has multiple tables holding fact data, you should bind partitions to such tables as long as each table size is constrained as advised above. If you have a single, large fact table, you could write SQL queries for each Analysis Services partition to retrieve only part of this data.

Views provide a nice alternative for partition binding, especially when testing cubes. For example, if a fact table has millions of rows, processing is going to take a long time. For testing the solution, you don’t necessarily need to load all the data. Instead, create a view that selects only a portion of rows from the large table(s).

Later, when you’re ready to deploy your solution into production, alter your partition(s) definition so that they are bound to appropriate tales, queries or views.

How do you decide what data to include in each partition? SQL Server Analysis Services uses partitions to speed up MDX queries. Each partition contains an XML file that defines the range of dimension member identifiers in a given partition. When an MDX query is submitted, the Analysis Services engine decides what partition files to scan based on the values in each partition’s XML file.

The XML file is created when you process the partition and the file can be found in each partition folder (the file name is info.xml). Don’t try to edit this file – the dimension key references are internal values that cannot be retrieved from SQL Server Analysis Services.

If the data requested by an MDX query is spread across all partitions in your measure group then Analysis Services has no choice but to read every single partition. To see how every partition in the measure group is read, record a SQL Profiler trace when you run such a query. If the requested data is contained in a single, small partition, your query will only have to scan a single partition file.

Reading a single 500 MB file will invariably beat scanning through 200 files of the same size. However, if you have 200 partitions to read, Analysis Services could scan some of them in parallel, and the query won’t necessarily be 200 times slower without proper partitioning.

For best MDX query performance, you should tailor partition design to the pattern of common queries. Most SQL Server Analysis Services solutions start with measure groups partitioned using a date or periodicity dimension, each partition spanning one month’s or one day’s data.

This is a reasonable approach if your queries are typically focused on a given month or several months. But what if your queries examine data across all months and are specific to product categories? In that case, partitioning only by month won’t be optimal.

If you have 10 years worth of data partitioned by month — which is not unusual — each query would have to examine 120 partitions. In this case, query performance could improve if you further partition data by product category dimension.

For example, dealership sales cube users may only be interested in comparing sales across time for a particular product category – cars, trucks or motorcycles, for example. For such cubes, you could create partitions for each month and each product category.

Like any other SQL Server Analysis Services object, partitions have a multitude of properties. Perhaps one of the most frequently discussed is partition slice. This property defines a portion of the measure group data that Analysis Services expects to be exposed by the partition.

Most Analysis Services literature suggests that this property does not have to be set for partitions that use Multidimensional OLAP (MOLAP) storage. While in most situations Analysis Services is smart enough to figure out what dimension members are included in each partition by examining data IDs in info.xml files, to be safe you should always set the partition slice, regardless of the storage mode of your partition.

Partition slices are defined through MDX. This is an example of what a slice definition would look like for a 2001 partition:

<Slice>[Date].[Calendar].[Calendar Year]. &[2001] </Slice>

To further partition data by product categories, a slice definition would look like this:

<Slice> ([Date].[Calendar].[Calendar Year]. & [2001], [Product].[Product Categories].[Category]. & [1]) </Slice>

If you don’t specify a slice for a given dimension, SQL Server Analysis Services assumes that any member of that dimension can be found in the partition.

For example, say you specify a month and product category in partition slice but do not specify the store. Queries that examine sales data by store, but do not include any filters for date or product, may have to search through every partition.

You can also customize the storage mode for each partition. MOLAP storage mode is optimal for data retrieval — but it copies your relational data. If you prefer to leave your data in the relational format without making its copy, then you can use Relational OLAP (ROLAP) mode for infrequently accessed partitions. For example, most recent partitions can utilize MOLAP storage while historical partitions can use ROLAP.

SQL Server Analysis Services has an upper limit on the number of partitions — 2^31-1 = 2,147,483,647 — but cubes that have this many partitions are rare. Don’t be afraid to create as many partitions as needed.

Occasionally, partitions may need to be merged. For example, if a majority of your queries focus on recent data and historical queries are infrequent, you may have a separate partition for each product line and for each day for 30 or 60 days.

Once the data is stale and seldom accessed, you could combine historical partitions into weekly or monthly partitions. To merge partitions using SSMS, right-click on a partition and choose the Merge Partitions option.

This is what the XMLA for merging 2001 and 2002 partitions looks like:

<MergePartitions
xmlns=“http://schemas.microsoft.com/analysisservices/2003/engine“>
<Sources>
<Source>
<DatabaseID>Adventure Works DW 2008</DatabaseID>
<CubeID>Adventure Works</CubeID>
<MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
<PartitionID>Internet_Sales_2002</PartitionID>
</Source>
</Sources>
<Target>
<DatabaseID>Adventure Works DW 2008</DatabaseID>
<CubeID>Adventure Works</CubeID>
<MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
<PartitionID>Internet_Sales_2001</PartitionID>
</Target>
</MergePartitions>

Be aware that you can copy aggregation design from one partition to another, which I will discuss in more detail in a future article. For now know that if you’re happy with your current aggregation design, you can assign it to a newly created or an existing partition.

If a partition has 500,000 or more estimated rows (you can set estimated numbers of rows in BIDS) and you haven’t defined any aggregations for this partition, then BIDS 2008 warns that your application performance can be improved by adding aggregations.

Partitions reduce the time it takes to process your measure group because each partition only loads a portion of the entire fact table and view. Remember that during processing, SQL Server Analysis Services modifies the SQL query, defining the partition before sending it over to the relational data source.

For example, earlier I showed the definition for Internet Sales 2001 partition within the Internet Sales measure group of an Adventure Works 2008 database.

The query that Analysis Services sends to SQL Server while processing this partition is considerably different from the original query:

SELECT

[dbo_FactInternetSales].[dbo_FactInternetSalesSalesAmount0_0] AS[dbo_FactInternetSalesSalesAmount0_0],
[dbo_FactInternetSales].[dbo_FactInternetSalesOrderQuantity0_1] AS[dbo_FactInternetSalesOrderQuantity0_1],
[dbo_FactInternetSales].[dbo_FactInternetSalesExtendedAmount0_2] AS[dbo_FactInternetSalesExtendedAmount0_2],
[dbo_FactInternetSales].[dbo_FactInternetSalesTaxAmt0_3] AS[dbo_FactInternetSalesTaxAmt0_3],
[dbo_FactInternetSales].[dbo_FactInternetSalesFreight0_4] AS[dbo_FactInternetSalesFreight0_4],
[dbo_FactInternetSales].[dbo_FactInternetSalesUnitPrice0_5] AS[dbo_FactInternetSalesUnitPrice0_5],
[dbo_FactInternetSales].[dbo_FactInternetSalesTotalProductCost0_6] AS[dbo_FactInternetSalesTotalProductCost0_6],
[dbo_FactInternetSales].[dbo_FactInternetSalesProductStandardCost0_7] AS[dbo_FactInternetSalesProductStandardCost0_7],
[dbo_FactInternetSales].[dbo_FactInternetSales0_8] AS[dbo_FactInternetSales0_8],
[dbo_FactInternetSales].[dbo_FactInternetSalesPromotionKey0_9] AS[dbo_FactInternetSalesPromotionKey0_9],
[dbo_FactInternetSales].[dbo_FactInternetSalesSalesTerritoryKey0_10] AS[dbo_FactInternetSalesSalesTerritoryKey0_10],
[dbo_FactInternetSales].[dbo_FactInternetSalesProductKey0_11] AS[dbo_FactInternetSalesProductKey0_11],
[dbo_FactInternetSales].[dbo_FactInternetSalesCustomerKey0_12] AS[dbo_FactInternetSalesCustomerKey0_12],
[dbo_FactInternetSales].[dbo_FactInternetSalesCurrencyKey0_13] AS[dbo_FactInternetSalesCurrencyKey0_13],
[dbo_FactInternetSales].[dbo_FactInternetSalesOrderDateKey0_14] AS[dbo_FactInternetSalesOrderDateKey0_14],
[dbo_FactInternetSales].[dbo_FactInternetSalesShipDateKey0_15] AS[dbo_FactInternetSalesShipDateKey0_15],
[dbo_FactInternetSales].[dbo_FactInternetSalesDueDateKey0_16] AS[dbo_FactInternetSalesDueDateKey0_16]
FROM
(
SELECT
[SalesAmount] AS [dbo_FactInternetSalesSalesAmount0_0],
[OrderQuantity] AS [dbo_FactInternetSalesOrderQuantity0_1],
[ExtendedAmount] AS [dbo_FactInternetSalesExtendedAmount0_2],
[TaxAmt] AS [dbo_FactInternetSalesTaxAmt0_3],
[Freight] AS [dbo_FactInternetSalesFreight0_4],
[UnitPrice] AS [dbo_FactInternetSalesUnitPrice0_5],
[TotalProductCost] AS [dbo_FactInternetSalesTotalProductCost0_6],
[ProductStandardCost] AS[dbo_FactInternetSalesProductStandardCost0_7],
1 AS [dbo_FactInternetSales0_8],
[PromotionKey] AS [dbo_FactInternetSalesPromotionKey0_9],
[SalesTerritoryKey] AS [dbo_FactInternetSalesSalesTerritoryKey0_10],
[ProductKey] AS [dbo_FactInternetSalesProductKey0_11],
[CustomerKey] AS [dbo_FactInternetSalesCustomerKey0_12],
[CurrencyKey] AS [dbo_FactInternetSalesCurrencyKey0_13],
[OrderDateKey] AS [dbo_FactInternetSalesOrderDateKey0_14],
[ShipDateKey] AS [dbo_FactInternetSalesShipDateKey0_15],
[DueDateKey] AS [dbo_FactInternetSalesDueDateKey0_16]
FROM
(
SELECT
[dbo].[FactInternetSales].[ProductKey],
[dbo].[FactInternetSales].[OrderDateKey],
[dbo].[FactInternetSales].[DueDateKey],
[dbo].[FactInternetSales].[ShipDateKey],
[dbo].[FactInternetSales].[CustomerKey],
[dbo].[FactInternetSales].[PromotionKey],
[dbo].[FactInternetSales].[CurrencyKey],
[dbo].[FactInternetSales].[SalesTerritoryKey],
[dbo].[FactInternetSales].[SalesOrderNumber],
[dbo].[FactInternetSales].[SalesOrderLineNumber],
[dbo].[FactInternetSales].[RevisionNumber],
[dbo].[FactInternetSales].[OrderQuantity],
[dbo].[FactInternetSales].[UnitPrice],
[dbo].[FactInternetSales].[ExtendedAmount],
[dbo].[FactInternetSales].[UnitPriceDiscountPct],
[dbo].[FactInternetSales].[DiscountAmount],
[dbo].[FactInternetSales].[ProductStandardCost],
[dbo].[FactInternetSales].[TotalProductCost],
[dbo].[FactInternetSales].[SalesAmount],
[dbo].[FactInternetSales].[TaxAmt],
[dbo].[FactInternetSales].[Freight],
[dbo].[FactInternetSales].[CarrierTrackingNumber],
[dbo].[FactInternetSales].[CustomerPONumber]
FROM [dbo].[FactInternetSales]
WHERE OrderDateKey <= ‘20011231’
) AS [FactInternetSales]
)
AS [dbo_FactInternetSales]

Why should you care what query is sent to SQL Server (or another RDBMS) during partition processing? Because any kind of query hints or SET options that may be valid in a regular SQL statement, might not be supported for partition query definition.

For example, BIDS would allow us to append the SET NOCOUNT ON statement to the beginning of the partition query. If we add this option, however, SQL Server Analysis Services will report a syntax error and fail processing.

You can customize the partition’s processing mode, which defines whether aggregations are calculated during partition processing or by a lazy aggregation thread after processing is complete.

Lastly, you could use storage location property to store data in default or alternate location. This property may come in handy if the disk where partition data is normally stored reaches its storage capacity.

This entry was posted on May 12, 2011 at 13:09 and is filed under SQL. Tagged: Business Intelligence, data warehouse, fail processing, http, MDX, Measure, OLAP, partition processing, RDBMS, Sql, SQL Server Management Studio, www.w3.org/2001/XMLSchema-instance, XML. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Windows Management and Scripting

A wealth of tutorials Windows Operating Systems SQL Server and Azure

Pages

Top Posts & Pages

Latest Articles

Spam Blocked

Follow Blog via Email

RSS Feed

SCCM Tools

Twitter Updates

Alin D