
Working with Tables, Constraints, and Indexes in SQL Server 2008

Posted by Alin D on October 13, 2010

Tables store data. For each column in the table, you must select a built-in or a user-defined data type. Indexes are created on tables to maximize query performance. Constraints are associated with table columns and define the rules to which data in a particular column or columns must adhere. Constraints can also define relationships between tables, like the necessity to have a Customer entity associated with every Order entity. These are known as FOREIGN KEY constraints. In this section we’ll provide you with details about tables, constraints, and indexes.

Working with Tables and Views

Tables are the database objects that store data in a SQL Server database. Tables are structured as columns and rows, like a spreadsheet. The columns define the type and length of data they can store. Every table must have at least one column. Column names must be unique within a table; that is, you cannot have a ProductName column appear twice in the Product table. Tables store the underlying data within the .mdf and .ndf data files as pages and extents. Columns are sometimes associated with constraints; for example, PRIMARY KEY, UNIQUE, or DEFAULT. Types of constraints will be explained later in this chapter. You can also mark columns with the following special attributes:

  • Identity columns Values for these columns are generated automatically in sequence by incrementing every time a row is added. Usually, values 1, 2, 3, n are used, but you can define your own seed (starting value) and increment value for the IDENTITY column.
  • Computed columns These columns do not store any data; instead, they define a formula that calculates the column value at query time.
  • Timestamp columns These columns are used as a mechanism for version-stamping table rows and tracking changes.
  • UniqueIdentifier columns These columns store globally unique identifiers (GUIDs). GUID values are used for replication and are guaranteed to be unique. GUID values can be generated using the NEWID() built-in function.
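As a minimal sketch (the table and column names here are hypothetical, not taken from the examples below), all four attributes can be combined in a single table definition:

CREATE TABLE Product
(ProductID int IDENTITY(1,1) PRIMARY KEY,    -- identity column: seed 1, increment 1
ProductName varchar(50) NOT NULL,
UnitPrice money NOT NULL,
Quantity int NOT NULL,
StockValue AS (UnitPrice * Quantity),        -- computed column, calculated at query time
RowGuid uniqueidentifier DEFAULT NEWID(),    -- GUID column populated by NEWID()
RowVer rowversion);                          -- version-stamping column
GO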

When defining columns for a new or existing table, you can specify column nullability. A column is said to be nullable if it allows storing null (empty) values. You can choose to mark a column as not nullable. If anyone attempts to insert a NULL value into such a column, an error will be raised, and the INSERT operation will fail. To mark a column as nullable, use the NULL keyword when defining the column in the CREATE TABLE or ALTER TABLE statements. To mark a column as not nullable, use the NOT NULL keywords. By default, columns are nullable. Columns designated as primary keys cannot be nullable.

Creating Tables

Tables can be created and modified using the SQL Server Management Studio table designer or the CREATE TABLE and ALTER TABLE statements. To access the SQL Server Management Studio graphical table designer, in Object Explorer expand the database in which you wish to create the table. Then, right-click Tables and click New Table. To modify an existing table, right-click it and then click Design. The table designer shows the columns that will be created for your table at the top and the properties of the selected column in the Column Properties pane, usually located at the bottom of the screen. Figure 1 shows the use of SQL Server Management Studio to create a new table.

Figure 1. Using the SQL Server Management Studio Table Designer


To create a table using DDL, use the CREATE TABLE statement along with the syntax shown in Example 1.

Example 1. Create TABLE Statement—Syntax
CREATE TABLE [database_name].[schema_name].table_name
(column1_name data_type [NULL | NOT NULL] [PRIMARY KEY] [IDENTITY],
column2_name data_type [NULL | NOT NULL],
[<computed_column_definition>]
);


In this statement, the table_name is the name of the table you wish to create. When defining columns, you can specify whether or not they will allow NULL values. You can also state that a column will be designated as the PRIMARY KEY for the table and whether it will contain automatically incrementing values, known as IDENTITY columns. The computed_column_definition is the formula for a calculated column. When defining columns, you must designate a data type, like varchar or int, and in some cases a length.

Table 1 summarizes built-in data types that are available to you when you are defining columns.

Table 1. Built-In Data Types
  • Numeric: tinyint, smallint, int, bigint, smallmoney, money, decimal, float, real
  • Character: char, nchar, varchar, nvarchar, text, ntext
  • Dates and times: date, time, datetime, datetime2, datetimeoffset, smalldatetime
  • Other: binary, varbinary, image, bit, cursor, xml, timestamp, rowversion, uniqueidentifier, hierarchyid, sql_variant


Some of the data types shown in the table also allow you to specify the length or precision of the data stored in the column you are creating. For example, a column of type char(1000) allows you to store up to 1,000 characters per row. A column of type decimal(10) allows you to store numbers of up to 10 digits in total, while decimal(10,5) allows numbers of up to 10 digits with up to 5 of those digits to the right of the decimal point. Variable-length data types, like varchar, nvarchar, and varbinary, consume only the space that the characters stored in the column take up. Their fixed-length equivalents, char, nchar, and binary, consume a fixed amount of space regardless of the amount of actual data contained in the column. Data types prefixed with "n" (nvarchar and nchar) store Unicode text and can be used to store characters from multiple languages in one column.
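As a quick illustration of length, precision, and scale (the variable names below are hypothetical):

DECLARE @Total decimal(10,5) = 12345.67891;  -- 10 digits in total, 5 to the right of the decimal point
DECLARE @Name nvarchar(50) = N'Müller';      -- Unicode, variable length, up to 50 characters
DECLARE @Code char(10) = 'AB';               -- fixed length: always consumes 10 characters of storage
SELECT @Total AS Total, @Name AS Name, @Code AS Code;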

Computed Columns

Computed columns are used when you need a column whose value can be calculated from other data in the table. As a best practice, you should never physically store a value in a table that can be derived from other data within the table. If you do so, you will be consuming disk space unnecessarily. You will also open yourself up to logic errors when the base columns are updated but the column storing the derived value is not, or vice versa. SQL Server offers the computed column feature to overcome these challenges. Use computed columns when you don't want to manually calculate a column's value in every query and when you wish to simplify your query syntax.

A computed column is based on a formula that can reference the values of noncomputed columns in the same row of the same table. Scalar functions can also be used in the definition of a computed column. A computed column cannot reference data in other tables or contain a subquery.

By default, computed columns are virtual, and the values in them are not stored on disk. The values are calculated every time the computed column is queried. When it is more efficient to store the values of computed columns on disk, you can use the PERSISTED keyword to instruct SQL Server to physically store the column values. PERSISTED columns are updated when the columns they are based on are updated, or when new rows are inserted into the table. Only computed columns using deterministic functions can be marked as PERSISTED. A function is deterministic when it is guaranteed to return the same value if you pass it the same parameters. For example, the AVG function is deterministic, while the GETDATE function is not. Whether or not the computed column is persisted, you can never write directly to it. Persisted computed columns can be indexed, which is useful if you need to search and sort by the values of this column.
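A minimal sketch of a persisted computed column, using a hypothetical OrderLines table (the table and column names are illustrative): the LineTotal formula is deterministic, so it can be marked PERSISTED and indexed.

CREATE TABLE OrderLines
(OrderLineID int PRIMARY KEY,
Quantity int NOT NULL,
UnitPrice money NOT NULL,
-- Stored on disk and kept up to date whenever Quantity or UnitPrice changes
LineTotal AS (Quantity * UnitPrice) PERSISTED);
GO
-- Because the column is persisted, it can also be indexed
CREATE NONCLUSTERED INDEX IX_OrderLines_LineTotal ON OrderLines(LineTotal);
GO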

Creating User-Defined Data Types

Sometimes you need to create your own data types that are based on the built-in data types introduced earlier. Custom data types are also known as user-defined data types. User-defined data types are especially useful when you must store data with the same length or precision over and over again. For example, you can create a new user-defined data type to represent people's names. This user-defined data type can be based on nvarchar(50) and cannot contain nulls. The data type can then be bound to any column that is to contain people's names, and those columns will be consistent throughout the database. Create your user-defined data types in the Model system database, so that they are automatically inherited by all new databases you create. User-defined data types are created using the CREATE TYPE statement. The syntax is shown in Example 2:

Example 2. Create TYPE Statement—Syntax
CREATE TYPE [schema_name. ]type_name
{FROM base_type([precision],[scale])
  [NULL | NOT NULL]
}


Example 3 shows the syntax used to create a user-defined data type named PersonName and to create a table that contains two columns of type PersonName.

Example 3. Using the Create TYPE Statement
CREATE TYPE PersonName
FROM varchar(50)
NOT NULL;
GO
CREATE TABLE TeamMembers
(MemberId int PRIMARY KEY,
MemberName PersonName,
ManagerName PersonName);
GO


SQL Server does not provide an ALTER TYPE statement for alias data types; to change the definition of a user-defined data type, you must drop it and re-create it. The DROP TYPE statement should be used to remove the user-defined data types you no longer need in the database. You cannot remove user-defined types from the database while there are tables with columns based on these types. If you attempt to use the DROP TYPE statement to remove a data type that is in use, you will get an error message similar to: “Msg 3732, Level 16, State 1, Line 1. Cannot drop type ‘PersonName’ because it is being referenced by object ‘TeamMembers’. There may be other objects that reference this type.”
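Continuing Example 3, a minimal sketch of the cleanup order: the dependent table must be dropped (or its columns changed) before the type itself can be removed.

DROP TABLE TeamMembers;
GO
DROP TYPE PersonName;
GO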

Working with Constraints

Constraints are data validation rules that are bound to a column or a set of columns in a table. Constraints can also be used to enforce a relationship between two entities represented as two tables. The available types of constraints are as follows:

  • CHECK constraints These constraints validate the integrity of data in a column by checking it against a valid comparison. For example, you can use a CHECK constraint to ensure that no one in your Employees table has a Birth Date earlier than 01/01/1880. You can also use a CHECK constraint to validate that an e-mail address is always at least seven characters long.
  • Primary Key constraints PRIMARY KEY constraints represent the unique identifier column that will enforce the uniqueness of each row. For example, you can designate the CustomerID column as the PRIMARY KEY for the Customers table. If you get two customers that have the same values in the Name column and other columns, but represent different people, you will use the PRIMARY KEY to distinguish between them. It is a best practice to always have a PRIMARY KEY in each table and to use surrogate PRIMARY KEYs that have no meaning to the application.
  • Unique constraints These constraints are similar to PRIMARY KEY constraints, except that you can have more than one unique constraint per table. For example, you can designate that the combination of FirstName, LastName, and TelephoneNumber is unique in the Customers table and that the EMailAddress column can only contain unique values.
  • FOREIGN KEY constraints These constraints enforce a relationship between two tables. For example, you can use a FOREIGN KEY constraint to specify that any row in the Orders table must have a corresponding row in the Customers table and that the tables are linked through the CustomerID column, which is included in both tables. Once this FOREIGN KEY constraint is enforced, you cannot delete a row from the Customers table that has related rows in the Orders table.
  • Default constraints Also known as “defaults,” the DEFAULT constraints specify a default value to be inserted into a column if no value is inserted. Defaults can be bound to a column that is defined as NULL or NOT NULL. An example of a default is to use the value “Not Applicable” for the ProductColor every time someone adds a product to the Products table without specifying a color.

When you attempt to insert, delete, or modify data in a table that will result in a constraint violation, the statement will roll back. DML statements, like INSERT, UPDATE, DELETE, or MERGE, always succeed or fail as a whole. For example, if you were inserting 1,000 records into a table, but one violated a PRIMARY KEY or UNIQUE constraint, all 1,000 rows would roll back and nothing would be inserted. If a DELETE statement violated a FOREIGN KEY constraint, even on one row, the entire DELETE statement would fail and nothing would be deleted. You will never receive a partial result set from a DML statement. Example 4 shows the syntax used for working with constraints.

Test Day Tip

Remember that DML statements commit as a whole or not at all. A constraint violation will cause the entire statement to fail and roll back.


Example 4. Working with Constraints
CREATE TABLE Stars
(StarID int PRIMARY KEY,
StarName varchar(50) Unique,
SolarMass decimal(10,2) CHECK(SolarMass > 0),
StarType varchar(50) DEFAULT 'Orange Giant');
GO
INSERT Stars (StarID, StarName, SolarMass)
VALUES (1, 'Pollux', 1.86);
INSERT Stars (StarID, StarName, SolarMass, StarType)
VALUES (2, 'Sun', 1, 'Yellow dwarf');
SELECT * FROM Stars
-- Results:
-- StarID      StarName   SolarMass  StarType
-- ----------- ---------- ---------- ----------
-- 1           Pollux     1.86       Orange Giant
-- 2           Sun        1.00       Yellow dwarf
INSERT Stars (StarID, StarName, SolarMass, StarType)
VALUES (2, 'Deneb', 6, 'White supergiant');
-- Results:
-- Msg 2627, Level 14, State 1, Line 1
-- Violation of PRIMARY KEY constraint 'PK__Stars__06ABC647542C7691'.
-- Cannot insert duplicate key in object 'dbo.Stars'.
-- The statement has been terminated.
INSERT Stars (StarID, StarName, SolarMass, StarType)
VALUES (3, 'Deneb', -6, 'White supergiant');
-- Results:
-- Msg 547, Level 16, State 0, Line 1
-- The INSERT statement conflicted with the CHECK constraint
-- "CK__Stars__SolarMass__58F12BAE". The conflict occurred in database
-- "AdventureWorks", table "dbo.Stars", column 'SolarMass'.
-- The statement has been terminated.

INSERT Stars (StarID, StarName, SolarMass, StarType)
VALUES (3, 'Deneb', 6, 'White supergiant');
SELECT * FROM Stars
-- Results:
-- StarID      StarName   SolarMass  StarType
-- ----------- ---------- ---------- ----------
-- 1           Pollux     1.86       Orange Giant
-- 2           Sun        1.00       Yellow dwarf
-- 3           Deneb      6.00       White supergiant
DROP TABLE Stars
GO


Enforcing Referential Integrity through FOREIGN KEY Constraints

A FOREIGN KEY constraint creates a relationship between two tables, based on a value held in a column or multiple columns. One of the tables participating in the relationship contains a PRIMARY KEY used in the relationship. The value of the primary key column can only appear once in this table. You can also use a UNIQUE constraint instead of a PRIMARY KEY constraint to define a relationship. Sometimes the table holding the PRIMARY KEY is referred to as the parent table or the “one” in a one-to-many relationship. The second table also has a column that contains the same values as the PRIMARY KEY column in the parent table. In the second table these values can repeat many times. This table is referred to as the child table, or the “many” table in a one-to-many relationship.

Consider this simple example. The Customers table may contain the columns CustomerID, CompanyName, and StreetAddress. The CustomerID column is the PRIMARY KEY column and contains unique values. The Orders table contains an OrderID column, a CustomerID column, and an OrderAmount column. The CustomerID column in the Orders table identifies the customer who has placed the order. To find out which customer a particular order belongs to, you look up that order's CustomerID in the Customers table; to find all orders for a particular customer, you search the Orders table for that CustomerID. The CustomerID column in the Orders table is known as a FOREIGN KEY because it is the key of a foreign entity, that is, an entity defined in another table.

By creating a FOREIGN KEY constraint between two tables, their relationship is formalized. The rules of the constraint are applied to the relationship. By default, you cannot delete a parent record, if there are related child records in the child table referenced by a FOREIGN KEY constraint. You can also explicitly specify an action to take when the parent record is deleted or the key value of the parent record is updated. To do this, use the ON UPDATE and ON DELETE optional clauses when creating a FOREIGN KEY constraint. The following actions are available:

  • NO ACTION This is the default action. No special action is taken, and if the FOREIGN KEY is violated, the statement rolls back.
  • CASCADE Propagate the update or delete action to child rows. If you delete a parent row that participates in a cascading relationship, all child rows will be deleted. If you change a key value of a parent row, the corresponding child rows will also change.
  • SET NULL Set the values of the FOREIGN KEY column to null for all related records.
  • SET DEFAULT Set the values of the FOREIGN KEY column to its default values for all related records.
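The sketch below (using hypothetical Customers and Orders tables, not the example that follows) shows where the ON DELETE and ON UPDATE clauses are placed; here a deleted customer leaves its orders with a NULL CustomerID, while a changed key value cascades to the child rows.

CREATE TABLE Customers
(CustomerID int PRIMARY KEY,
CompanyName varchar(100) NOT NULL);
GO
CREATE TABLE Orders
(OrderID int PRIMARY KEY,
-- Column must be nullable for ON DELETE SET NULL to work
CustomerID int NULL
    CONSTRAINT FK_Orders_Customers
    FOREIGN KEY REFERENCES Customers(CustomerID)
    ON DELETE SET NULL
    ON UPDATE CASCADE,
OrderAmount money NOT NULL);
GO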

Configuring & Implementing…: Null FOREIGN KEYs and Self-Referencing FOREIGN KEYs

Columns marked as FOREIGN KEYs can contain null values. However, this practice is not recommended because when a FOREIGN KEY consists of two or more columns and contains null values, the constraint cannot be verified, and the integrity of your data cannot be guaranteed.

It is also possible for a FOREIGN KEY constraint to reference columns in the same table. This is known as a self-reference, and when querying a table in this arrangement it is referred to as a self-join. An example of a self-reference is a Generations table containing names of people with the columns PersonID, PersonName, and MotherID. The mother is also a person stored in the Generations table, and therefore, you can create a FOREIGN KEY relationship from the MotherID (FOREIGN KEY column) referencing PersonID (PRIMARY KEY column).
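A minimal sketch of the Generations table described above, with the MotherID FOREIGN KEY referencing the PersonID PRIMARY KEY in the same table:

CREATE TABLE Generations
(PersonID int PRIMARY KEY,
PersonName varchar(50) NOT NULL,
-- Self-referencing FOREIGN KEY: a mother must already exist as a person in this table
MotherID int NULL
    CONSTRAINT FK_Generations_Mother
    FOREIGN KEY REFERENCES Generations(PersonID));
GO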


FOREIGN KEY constraints are frequently used by queries to join the parent and child tables. For this reason, it is recommended that you create a nonclustered index on every FOREIGN KEY contained in a table.
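For example, a nonclustered index on the CustomerID FOREIGN KEY column of the hypothetical Orders table discussed earlier might look like this:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON Orders(CustomerID);
GO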

Example 5 creates two tables and links them by a FOREIGN KEY constraint.

Example 5. Working with FOREIGN KEY Constraints
CREATE TABLE Team(
TeamID int PRIMARY KEY,
TeamName varchar(50));
GO
CREATE TABLE TeamMember(
TeamMemberID int PRIMARY KEY,
FullName varchar(100),
TeamID int CONSTRAINT FK_Team_TeamMember
FOREIGN KEY REFERENCES dbo.Team(TeamID));
GO
INSERT Team VALUES (1, 'Development'), (2, 'Testing'), (3, 'Management');
INSERT TeamMember VALUES (1, 'Valentine', 1), (2, 'Bryant', 1), (3, 'Shane', 1),
(4, 'Keith', 3)
SELECT Team.TeamID, TeamName, FullName FROM
Team LEFT JOIN TeamMember ON
Team.TeamID = TeamMember.TeamID ;
GO
-- Results:
-- TeamID      TeamName    FullName
-- ----------- ----------- -----------
-- 1           Development Valentine
-- 1           Development Bryant
-- 1           Development Shane
-- 2           Testing     NULL
-- 3           Management  Keith
DELETE FROM Team WHERE TeamID = 2;
GO
-- Results:
-- (1 row(s) affected)
DELETE FROM Team WHERE TeamID = 3;
GO
-- Results:
-- Msg 547, Level 16, State 0, Line 1
-- The DELETE statement conflicted with the REFERENCE constraint
-- "FK_Team_TeamMember". The conflict occurred in database "AdventureWorks",
-- table "dbo.TeamMember", column 'TeamID'.
-- The statement has been terminated.
ALTER TABLE TeamMember
DROP CONSTRAINT FK_Team_TeamMember;
GO
ALTER TABLE TeamMember
ADD CONSTRAINT FK_Team_TeamMember
FOREIGN KEY(TeamID) REFERENCES dbo.Team(TeamID)
ON DELETE CASCADE;
GO
DELETE FROM Team WHERE TeamID = 3;
GO
-- Results:
-- (1 row(s) affected)
DROP TABLE TeamMember;
GO
DROP TABLE Team;
GO



Head of the Class…: Using CHECK and NOCHECK options

When you add FOREIGN KEY or CHECK constraints to an existing table that already contains data, by default all data in the table is validated against the constraint. Depending on the amount of data in your table, this may take a long time. If you wish to instruct SQL Server not to validate data integrity, and just create the constraint, you can specify WITH NOCHECK. The constraint will apply to new and modified rows, but the existing rows will not be examined. Data integrity is not checked when reenabling a previously disabled constraint.

It is not recommended that you specify WITH NOCHECK as this may result in erratic behavior when updating existing data. For example, you may be updating a column that has nothing to do with the constraint, and you may receive constraint violation errors. Only use WITH NOCHECK in situations that explicitly require it.

You must understand that constraints that were defined WITH NOCHECK are not used by the query optimizer to create efficient query plans. To allow the query optimizer to use these constraints, you must revalidate them using the ALTER TABLE statement with the WITH CHECK CHECK CONSTRAINT ALL clause.
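As a sketch, using the TeamMember table from Example 5, the following statement revalidates all constraints on the table and marks them as trusted so the query optimizer can use them again:

ALTER TABLE TeamMember WITH CHECK CHECK CONSTRAINT ALL;
GO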


Creating Indexes

An index is a lookup structure created on a table to improve the performance of sorting and querying. Indexes are created on a particular column or columns and store the data values for this column or columns in order. When the raw underlying table data is stored in no particular order, this situation is referred to as a heap. The heap is composed of multiple pages, with each page containing multiple table rows. When the raw underlying data is stored in order, sorted by a column or columns, this situation is referred to as a clustered index. For example, if you have a table named Customer, with a clustered index on the FullName column, the rows in this table will be stored in order, sorted by the full name. This means that when you are searching for a particular full name, the query optimizer component can execute the query more efficiently by performing an index lookup rather than a table scan. Only one clustered index is allowed per table; usually this is created on the column designated as the PRIMARY KEY.

You can also create additional nonclustered indexes on a table that is stored either as a heap or as a clustered index. A nonclustered index is a separate lookup structure that stores index values in order, and with each index value, it stores a pointer to the data page containing the row with this index value. Nonclustered indexes speed up data retrieval. It makes sense to create nonclustered indexes on all frequently searched fields in a table. The trade-off with indexes is write performance. Every time a new row is inserted, the index must also be updated. When writing data to a table with nonclustered indexes, sometimes the pages within the table have to be rearranged to make room for the new values. In addition, indexes are storage structures that take up disk space. Indexes are created using the CREATE INDEX statement. Example 6 shows the syntax for creating an index.

Example 6. CREATE INDEX Statement—Syntax
CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
ON table_or_view ( column1 [ ASC | DESC ], column2, ...n)
[ INCLUDE (additional_column_name, ...n) ]
[ WHERE filter_clause]
[ WITH ( option [ ,...n ] ) ]


The CREATE INDEX statement creates a clustered or nonclustered index on a specified column or columns. You can choose to create the index as UNIQUE, which will enforce a UNIQUE constraint on the index columns. A filter_clause can be specified to create indexes only on a subset of data that meets specific criteria. This is useful for a very large table, where creating an index on all values of a particular column will be impractical. Table 2 summarizes index options that can be used with the CREATE INDEX statement.
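A minimal sketch of a filtered index with an included column, based on the Stars table used throughout this article (the index name, filter threshold, and option value are arbitrary):

CREATE NONCLUSTERED INDEX IX_Stars_HeavyStars
ON Stars(SolarMass)
INCLUDE (StarName)            -- StarName is stored at the leaf level but is not a key column
WHERE SolarMass > 10          -- only rows matching the filter are indexed
WITH (FILLFACTOR = 80);
GO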

Table 2. Index Options
  • PAD_INDEX = ON | OFF  When this option is ON, free space is allocated in each page of the index, allowing new values to be inserted without rearranging a large amount of data. The amount of free space allocated is specified by the FILLFACTOR parameter. When this option is OFF, enough free space for one row is reserved in every page during index creation.
  • FILLFACTOR = fill factor percentage  Specifies the percentage of each page that should be filled up with data. For example, a fill factor of 80 means 20% of each page will be left empty and available for new data. The fill factor is used only when you create or rebuild an index.
  • SORT_IN_TEMPDB = ON | OFF  Specifies whether the data should be sorted in the tempdb database instead of the current database. This may give performance advantages if the tempdb database is stored on a different disk from the current database.
  • IGNORE_DUP_KEY = ON | OFF  Specifies that duplication errors should be ignored when creating unique indexes.
  • STATISTICS_NORECOMPUTE = ON | OFF  Specifies that optimization statistics should not be updated at this time.
  • DROP_EXISTING = ON | OFF  Specifies that the existing index with the same name should be dropped and then re-created. This equates to an index rebuild.
  • ONLINE = ON | OFF  Specifies that the underlying table should remain online and accessible by users while the index is being built. This option is only available in SQL Server 2008 Enterprise or Developer edition.
  • ALLOW_ROW_LOCKS = ON | OFF  Specifies whether locks should be held on each row, as necessary.
  • ALLOW_PAGE_LOCKS = ON | OFF  Specifies whether locks should be held on each page, as necessary.
  • MAXDOP = max_degree_of_parallelism  Specifies the maximum number of processors that are to be used during the build or rebuild operation.
  • DATA_COMPRESSION = NONE | ROW | PAGE  Applies data compression at the row or page level of the index.


Example 7 creates a clustered index (by star name) and a nonclustered index (by star type) on the Stars table we created in the previous example. Figure 2 shows how Ix_Star_Name can be created using the SQL Server Management Studio interface.

Example 7. Working with Indexes
--Create the table, specifying that the PRIMARY KEY index is to be created
--as nonclustered
CREATE TABLE Stars
(StarID int PRIMARY KEY NONCLUSTERED,
StarName varchar(50) Unique,
SolarMass decimal(10,2) CHECK(SolarMass > 0),
StarType varchar(50) DEFAULT 'Orange Giant');
GO
CREATE CLUSTERED INDEX Ix_Star_Name
ON Stars(StarName)
WITH (PAD_INDEX = ON,
FILLFACTOR = 70,
ONLINE = ON);
GO
CREATE NONCLUSTERED INDEX Ix_Star_Type
ON Stars (StarType)
WITH (PAD_INDEX = ON,
FILLFACTOR = 90);
GO


Figure 2. Creating an Index Using SQL Server Management Studio


When you are creating a PRIMARY KEY constraint, an index on the column(s) designated as the PRIMARY KEY will be created automatically. This index will be clustered by default, but this can be overridden when creating the index by specifying the PRIMARY KEY NONCLUSTERED option. As a best practice, it is recommended that you accept the default of the clustered PRIMARY KEY column, unless you have a specific reason to designate another column as the clustered index key. Usually, the automatically created index is named PK_TableName_<Unique Number>, but this can be changed at any time by renaming the index. For example, a newly created Stars table with a PRIMARY KEY of StarID automatically gets an index with a name like PK__Stars__06ABC647542C7691, as seen in the earlier error message.
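To avoid the auto-generated name altogether, you can name the PRIMARY KEY constraint, and therefore its index, explicitly when you create the table; a minimal sketch of the definition:

CREATE TABLE Stars
(StarID int CONSTRAINT PK_Stars PRIMARY KEY CLUSTERED,  -- index will be named PK_Stars
StarName varchar(50));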

Exam Warning

Remember that when creating a table, a unique index will be automatically created on the columns designated as the PRIMARY KEY. If you wish to avoid the long rebuild time associated with building a clustered index, or if you wish to create the clustered index on a column different from the PRIMARY KEY, you must explicitly specify the PRIMARY KEY NONCLUSTERED option. The PRIMARY KEY will always be unique.


Working with Full–Text Indexes

Standard indexes are great when used with the simple WHERE clause of the SELECT statement. An index will greatly reduce the time it will take you to locate rows where the indexed column is equal to a certain value, or when this column starts with a certain value. However, standard indexes are inadequate for fulfilling more complex text-based queries. For example, creating an index on StarType will not help you find all rows where the StarType column contains the word “giant,” but not the word “supermassive”.

To fulfill these types of queries, you must use full-text indexes. Full-text indexes are complex structures that consolidate the words used in a column and their relative weight and position, and link these words with the database page containing the actual data. Full-text indexes are built using a dedicated component of SQL Server 2008—the Full-Text Engine. In SQL Server 2005 and earlier, the Full-Text Engine was its own service, known as full-text search. In SQL Server 2008, the Full-Text Engine is part of the database engine (running as the SQL Server Service).

Full-text indexes can be stored on a separate filegroup. This can deliver performance improvements if this filegroup is hosted on a separate disk from the rest of the database. Only one full-text index can be created on a table, and it requires a unique, single-column, non-nullable index (typically the PRIMARY KEY) to serve as its key. Full-text indexes must be based on columns of type char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, or varbinary(max). You must specify a type column when creating a full-text index on image, varbinary, or varbinary(max) columns. The type column stores the file extension (.docx, .pdf, .xlsx) of the document stored in the indexed column.

Example 8 amends the Stars table to include a Description column and creates a full-text index on this column. The FREETEXT function allows us to search on any of the words specified using the full-text index. This yields a similar user experience as using an Internet search engine.

Example 8. Creating and Using a Full-Text Index
ALTER TABLE Stars
ADD Description ntext DEFAULT 'No description specified' NOT NULL ;
GO
CREATE FULLTEXT CATALOG FullTextCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON Stars (Description)
KEY INDEX PK__Stars__06ABC6465F9E293D;
GO
UPDATE Stars SET Description = 'Deneb is the brightest star in the
constellation Cygnus and one of the vertices of the Summer Triangle. It is
the 19th brightest star in the night sky, with an apparent magnitude of 1.25.
A white supergiant, Deneb is also one of the most luminous stars known. It
is, or has been, known by a number of other traditional names, including
Arided and Aridif, but today these are almost entirely forgotten. Courtesy
Wikipedia.'
WHERE StarName = 'Deneb';
UPDATE Stars SET Description = 'Pollux, also cataloged as Beta Geminorum,
is an orange giant star approximately 34 light-years away in the constellation
of Gemini (the Twins). Pollux is the brightest star in the constellation
(brighter than Castor, Alpha Geminorum). As of 2006, Pollux was confirmed to
have an extrasolar planet orbiting it. Courtesy Wikipedia.'
WHERE StarName = 'Pollux';
GO
SELECT StarName
FROM Stars
WHERE FREETEXT (Description, 'planet orbit, giant');
GO
-- Results:
-- StarName
-- --------------------------------------------------
-- Pollux



Partitioning Data

When working with large databases, query performance often becomes an issue, even if your indexing strategy is spot-on. If you have decided that indexing is not enough to produce your desired result, your next step can be data partitioning. Data partitioning separates a database into multiple filegroups containing one or more files. These filegroups are placed on different disks, enabling parallel read and write operations, thus significantly improving performance. Approach a partitioning strategy by separating different tables and indexes into different filegroups and placing them on separate disks. As a guide, always separate large, frequently accessed tables that are in a FOREIGN KEY relationship, so that they can be scanned in parallel when performing a join.

If the desired performance is not achieved by simple partitioning, this is usually due to very large single tables. You can employ a horizontal or vertical partitioning technique to split a single large table into multiple smaller tables. Queries that access this table will run quicker, and performance of maintenance tasks, such as backup and index rebuild, will also be improved.

Horizontal Partitioning

Horizontal partitioning splits a table into several smaller tables by separating out clusters of rows, based on a partitioning function. The structure of the smaller tables will remain the same as the structure of the initial table, but the smaller tables will contain fewer rows. For example, if you have a very large table that has 100 million rows, you can partition it into 10 tables containing 10 million rows each. Date columns are often a good choice for horizontal partitioning. For example, a table could be partitioned historically by year—each year stored in a smaller table. Thus, if a query requires data for specific dates, only one smaller table needs to be scanned.

Analyze the data and how your users are accessing this data in order to derive the best horizontal partitioning strategy. Aim to partition the tables so that the majority of the queries can be satisfied from as few smaller tables as possible. To join smaller tables together, UNION queries are required, and these can degrade performance.
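A minimal sketch of this approach, using hypothetical Orders2009 and Orders2010 tables split by year and a view that stitches them back together with UNION ALL (names and date ranges are illustrative):

CREATE TABLE Orders2009
(OrderID int PRIMARY KEY,
OrderDate date NOT NULL CHECK (OrderDate >= '20090101' AND OrderDate < '20100101'),
OrderAmount money NOT NULL);
CREATE TABLE Orders2010
(OrderID int PRIMARY KEY,
OrderDate date NOT NULL CHECK (OrderDate >= '20100101' AND OrderDate < '20110101'),
OrderAmount money NOT NULL);
GO
CREATE VIEW OrdersAll
AS
SELECT OrderID, OrderDate, OrderAmount FROM Orders2009
UNION ALL
SELECT OrderID, OrderDate, OrderAmount FROM Orders2010;
GO
-- Queries for a single year can go straight to the smaller table;
-- queries that span years use the OrdersAll view.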

Vertical Partitioning

Unlike horizontal partitioning, vertical partitioning separates different columns of a single table into multiple tables. The resultant smaller tables have the same number of rows as the initial table, but the structure is different. Two types of vertical partitioning are available:

  • Normalization Normalization is the process of applying logical database design techniques to reduce data duplication. This is achieved mainly by identifying logical relationships within your data and implementing multiple tables related by FOREIGN KEY constraints.
  • Row splitting This technique separates some columns from a larger table into another table or tables. Essentially, each logical row in a table partitioned using row splitting is stored across two tables. To maintain integrity between the tables, use a FOREIGN KEY constraint when both the primary and FOREIGN KEY participants are unique. This is known as a one-to-one relationship.

If implemented correctly, vertical partitioning reduces the time it takes to scan data. Use row splitting to separate frequently used and rarely accessed columns into separate tables, and eliminate overhead. The drawback of vertical partitioning is the processing time and resources it takes to perform the joins, when needed.
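A minimal sketch of row splitting, using hypothetical CustomerCore and CustomerDetail tables that share the CustomerID key in a one-to-one relationship; the rarely accessed, wide columns live in the detail table.

CREATE TABLE CustomerCore
(CustomerID int PRIMARY KEY,
CompanyName varchar(100) NOT NULL);
GO
CREATE TABLE CustomerDetail
-- PRIMARY KEY on both sides of the FOREIGN KEY makes the relationship one-to-one
(CustomerID int PRIMARY KEY
    CONSTRAINT FK_CustomerDetail_CustomerCore
    FOREIGN KEY REFERENCES CustomerCore(CustomerID),
Notes varchar(max) NULL,
LogoImage varbinary(max) NULL);
GO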
