Thursday, September 25, 2014

Primary Index

Primary Index


Every table must have at least one column as the Primary Index. The Primary Index is defined when the table is created.There are two reasons you might pick a different Primary Index then your Primary Key. They are (1) for Performance reasons and (2) known access paths.

Primary Index Rules
Rule 1: One Primary Index per table.
Rule 2: A Primary Index value can be unique or non-unique.
Rule 3: The Primary Index value can be NULL.
Rule 4: The Primary Index value can be modified.
Rule 5: The Primary Index of a populated table cannot be modified.
Rule 6: A Primary Index has a limit of 64 columns.

Two Types of Primary Indexes (UPI or NUPI)

Unique Primary Index(UPI)
A Unique Primary Index (UPI) is unique and cannot have any duplicates.
If you try and insert a row with a Primary Index value that is already in the table, the row will be rejected. An UPI enforces UNIQUENESS for a column.
 
A Unique Primary Index (UPI) will always spread the rows of the table evenly amongst the AMPs. UPI access is always a one-AMP operation. 
 
We have selected EMP_NO to be our Primary Index. Because we have designated EMP_NO to be a Unique Primary Index, there can be no duplicate employee numbers in the table.         
TeradataWiki-Teradata UPI

Non-Unique Primary Index (NUPI) 
A Non-Unique Primary Index (NUPI) means that the values for the selected column can be non-unique.Duplicate values can exist.

A Non-Unique Primary Index will almost never spread the table rows evenly.
An All-AMP operation will take longer if the data is unevenly distributed. You might pick a NUPI over an UPI because the NUPI column may be more effective for query access and joins.

We have selected LAST_NAME to be our Primary Index. Because we have designated LAST_NAME to be a Non-Unique Primary Index we are anticipating that there will be individuals in the table with the same last name.
TeradataWiki-Teradata UPI

Multi-Column Primary Indexes: 
Teradata allows more than one column to be designated as the Primary Index. It is still only one Primary Index, but it is merely made up by combining multiple columns together. Teradata allows up to 64 combined columns to make up the one Primary Index required for a table.

On the following page you can see we have designated First_Name and Last_Name combined to make up the Primary Index.
 This is often done for two reasons:
 (1) To get better data distribution among the AMPs
 (2) Users often use multiple keys consistently to query
           
SQL Syntax for Creating a Primary Index

 Creating a Unique Primary Index  
The SQL syntax to create a Unique Primary Index is:
CREATE TABLE sample_1
(col_a INT
,col_b INT
,col_c INT)
UNIQUE PRIMARY INDEX (col_b);

Creating a Non-Unique Primary Index 
The SQL syntax to create a Non-Unique Primary Index is:

CREATE TABLE sample_2
(col_x INT
,col_y INT
,col_z INT)
PRIMARY INDEX (col_x);


Data distribution using Primary Index 
When a user submits an SQL request against a table using a Primary Index, the request becomes a one-AMP operation, which is the most direct and efficient way for the system to find a row. The process is explained below.

Teradata Primary Index

Hashing Process  
1.The primary index value goes into the hashing algorithm.
2.The output of the hashing algorithm is the row hash value.
3.The hash map points to the specific AMP where the row resides.
4.The PE sends the request directly to the identified AMP.
5.The AMP locates the row(s) on its vdisk.
6.The data is sent over the BYNET to the PE, and the PE sends the answer set on to the client application.


Duplicate Row Hash Values 

It is possible for the hashing algorithm to end up with the same row hash value for two different rows.

There are two ways this could happen:

  • Duplicate NUPI values: If a Non-Unique Primary Index is used, duplicate NUPI values will produce the same row hash value. 
  • Hash synonym: Also called a hash collision, this occurs when the hashing algorithm calculates an identical row hash value for two different Primary Index values. H

To differentiate each row in a table, every row is assigned a unique Row ID. The Row ID is the combination of the row hash value and a uniqueness value.


Row ID = Row Hash Value + Uniqueness Value 

The uniqueness value is used to differentiate between rows whose Primary Index values generate identical row hash values. In most cases, only the row hash value portion of the Row ID is needed to locate the row.
TeradataWiki-Teradata Row Hash

When each row is inserted, the AMP adds the row ID, stored as a prefix of the row.
The first row inserted with a particular row hash value is assigned a uniqueness value of 1. The uniqueness value is incremented by 1 for any additional rows inserted with the same row hash value.

Below Diagrams will show how data resides in AMP.

Even Distribution with an UPI 
TeradataWiki-Teradata


Uneven Distribution with a NUPI 
TeradataWiki-Teradata

No comments:

Post a Comment