Partitioned Primary Index
Partitioned Primary Index (PPI) is an indexing mechanism in Teradata Database.
- PPI is used to improve performance for large tables when you submit queries that specify a range constraint.
- PPI allows you to reduce the number of rows to be processed by using partition elimination.
- PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints
The Order_Table spread across the AMPs.Notice that January and February dates are mixed on every AMP in what is a random order. This is because the Primary Index is Order_Number.
When we apply Range Query, that means it uses the keyword BETWEEN.
The BETWEEN keyword in Teradata means find everything in the range BETWEEN this date and this other date. We had no indexes on Order_Date so it is obvious the PE will command the AMPs to do a Full Table Scan. To avoid full table scan, we will Partition the table.
After Partitioned Table,
The example of AMPs on the top of the page. This table is not partitioned.
The example of AMPs on the bottom of the page. This table is partitioned.
Each AMP always sorts its rows by the Row-ID in order to do a Binary Search on Primary Index queries.
Notice that the rows on an AMP don‘t change AMPs because the table is partitioned. Remember it is the Primary Index alone that will determine which AMP gets a row. If the table is partitioned then the AMP will sort its rows by the partition.
Now we are running our Range Query on our Partitioned Table,each AMP only reads from one partition. The Parsing Engine will not to do full table scan. It instructs the AMPs to each read from their January Partition. You Partition a Table when you CREATE the Table.
A Partitioned Table is designed to eliminate a Full Table Scan, especially on Range Queries.
Types of partitioning:
RANGE_N Partitioning
Below is the example for RANGE_N Partition by day.
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTERGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY RANGE_N
(ORDER_DATE BETWEEN
DATE '2012-01-01' AND DATE '2012-12-31'
EACH INTERVAL '7' DAY);
Case_N Partitioning
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTERGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY CASE_N
(ORDER_TOTAL < 1000,
ORDER_TOTAL < 2000,
ORDER_TOTAL < 5000,
ORDER_TOTAL < 10000,
ORDER_TOTAL < 20000,
NO CASE, UNKNOWN);
The UNKNOWN Partition is for an Order_Total with a NULL value. The NO CASE Partition is for partitions that did not meet the CASE criteria.
For example, if an Order_Total is greater than 20,000 it wouldn‘t fall into any of the partitions so it goes to the NO CASE partition.
Multi-Level Partitioning:
You can have up to 15 levels of partitions within partitions.
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTERGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY (RANGE_N
(ORDER_DATE BETWEEN
DATE '2012-01-01' AND DATE '2012-12-31'
EACH INTERVAL '1' DAY)
CASE_N (ORDER_TOTAL < 5000,
ORDER_TOTAL < 10000,
ORDER_TOTAL < 15000,
ORDER_TOTAL < 20000,
NO CASE, UNKNOWN));
Character Based Partitioning(New Feature V13.10) :
There are four new data types available for Character Based PPI. They are CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC.
Example for RANGE Based Character PPI
CREATE TABLE EMP_TBL
(
EMP_NO INTEGER NOT NULL,
DEPT_NO INTEGER,
FIRST_NAME CHAR(20),
LAST_NAME VARCHAR(20),
SALARY DECIMAL(10,2),
) PRIMARY INDEX(EMP_NO)
PARTITION BY RANGE_N
(LAST NAME BETWEEN ( 'A ','B ','C ','D ','E ','F ','G ','H ',
'I ','J ','K ','L ','M ','N ','O ','P ','Q ','R ','S ','T ',
'U ','V ','W ','X ','Y ','Z ' AND 'ZZ',UNKNOWN));
Example for CASE Based Character PPI
CREATE TABLE PRODUCT_TBL
(PRODUCT_ID INTEGER NOT NULL
,PRODUCT_NAME CHAR(30)
,PRODUCT_COST DECIMAL(10,2)
,PRODUCT_DESCRIPTION VARCHAR(100)
)PRIMARY INDEX(PRODUCT_ID)
PARTITION BY CASE_N
(PRODUCT_NAME < 'Apples'
PRODUCT_NAME < 'Bananas'
PRODUCT_NAME < 'Cantaloupe'
PRODUCT_NAME < 'Grapes'
PRODUCT_NAME < 'Lettuce'
PRODUCT_NAME < 'Mangos'
PRODUCT_NAME >='Mangos' and <='Tomatoes');
Ex-Query: Find all Products between Apples and Grapes?
Ans: SELECT * FROM PRODUCT_TBL WHERE PRODUCT_NAME BETWEEN 'Apples' and 'Grapes';
Partitioning Rules:
- A table can have up to 65,535 Partitions.
- Partitioning never determines which AMP gets row.
- Partitioning determines how an AMP will sort the row on its own.
- Table can have up to 15 levels of partitions.
- A table cannot have an UPI as the Primary Index if the Partition table does not include PI.
- Total 3 forms of Partitioning Simple, RANGE and CASE.
Advantages of partitioned tables:
- They provide efficient searches by using partition elimination at the various levels or combination of levels.
- They reduce the I/O for range constraint queries
- They take advantage of dynamic partition elimination
- They provide multiple access paths to the data, and an MLPPI provides even more partition elimination and more partitioning expression choices, (i.e., you can use last name or some other value that is more readily available to query on.)
- The Primary Index may be either a UPI or a NUPI; a NUPI allows local joins to other similar entities
- Row hash locks are used for SELECT with equality conditions on the PI columns.
- Partitioned tables allow for fast deletes of data in a partition.
- They allow for range queries without having to use a secondary index.
- Specific partitions maybe archived or deleted.
- May be created on Volatile tables; global temp tables, base tables, and non-compressed join indexes.
- May replace a Value Ordered NUSI for access.
Disadvantages of partitioned tables:
•Rows in a partitioned table are 2 bytes longer.
•Access via the Primary Index may take longer.
•Full table joins to a NPPI table with the same PI may take longer.
•Rows in a partitioned table are 2 bytes longer.
•Access via the Primary Index may take longer.
•Full table joins to a NPPI table with the same PI may take longer.
No comments:
Post a Comment