FASTLOAD
Fastload, the name itself telling that loads data in a fast way.That means it loads huge amount of data from flat file into EMPTY tables.
Manily FastLoad was developed to load millions of rows into empty Teradata tables so it is fast.
FastLoad will create a Teradata session for each AMP in order to maximize parallel processing.This gives awesome performance in loading data.
There are more reasons why FastLoad is so fast. Below are limitations of Fast load.
1) No Secondary Indexes are allowed on the Target Table: Usually UPI and NUPI are used in Teradata to distribute the rows evenly across the AMPs.Secondary indexes are stored in a subtable block and many times on a different AMP from the data row.
2)No Referential Integrity is allowed: The Referential Integrity defined on a table would take more system checking to prevent referential constraints.
3)No Triggers are allowed at load time: Fast load focused on data load with high speed. So triggers not allowed.
4)Duplicate Rows (in Multi-Set Tables) are not supported: Multiset tables are allowed duplicate data. Fastload can load the data into multiset tables but duplicate rows are discarded.
5)No AMPs may go down (i.e., go offline) while FastLoad is processing: The down AMP must be repaired before the load process can be restarted
6)No more than one data type conversion is allowed per column: Data type conversion cause high resource utilization on the system
Fastload requires mainly three components
Log table
Log table stores the processing record information during load.This table contains one row for every FastLoad running on the system
Empty Target table
As mentioned earlier target tables should be empty.
Error tables(two)
Each FastLoad requires two error tables. These are automatically created during run. These will populated only errors occurred during the load.
The first error table is for any translation errors or constraint violations
For example, if a column is defined as integer but the data from source the data is coming in CHAR format.i.e wrong data.
The second error table is for errors caused by duplicate values for Unique Primary Indexes.
FastLoad Has Two Phases:
FastLoad divides its job into two phases, both designed for speed.
Phase 1 or Acquisition Phase
The primary purpose of phase 1 is to get data from host computer into Teradata System.
The data moves in 64 K blocks and is stored in worktables on the AMPs.
The data is not stored in the correct AMP.
Phase 2 or Application Phase.
Once the data is moved from the server, each AMP will hash its worktable rows.
Each each row transfers to the worktables where they permanently resides
Rows of a table are stored on the disks in data blocks
Simple fastload Script:
LOGON TDDB/USERNAME,PWD;
CREATE TABLE EMPDB.EMP_TABLE
( EMP_NUM INTEGER
,DEPT_NUM SMALLINT
,FIRST_NAME CHAR(20)
,LAST_NAME VARCHAR(20)
,SALARY DECIMAL(8,2)
UNIQUE PRIMARY INDEX(EMP_NUM);
DEFINE EMP_NUM (INTEGER)
,DEPT_NUM (SMALLINT)
,FIRST_NAME (CHAR(20))
,LAST_NAME (VARCHAR(20))
,SALARY (DECIMAL(8,2))
FILE=C:\TEMP\EMP_FILE.txt;
BEGIN LOADING EMPDB.EMP_TABLE
ERRORILES EMPDB.EMP_ERR1, EMPDB.EMP_ERR2
CHECKPOINT 10000000;
INSERT INTO EMPDB.EMP_TABLE
VALUES
( :EMP_NUM
,:DEPT_NUM
,:FIRST_NAME
,:LAST_NAME
,:SALARY );
END LOGOFF;
LOGOFF;
when can we RESTART fastlaod and cannot?
You Cannot RESTART FastLoad if
You Can RESTART FastLoad
Below are fast load commands used for creating fastload scripts:
AXSMOD to specify an access module (e.g., OLE-DB provider) that provides data to the FastLoad utility on network-attached client systems.
SESSIONS max min to specify the number of sessions. max = maximum number of sessions that will be logged on.
ERRLIMIT to control a runaway error condition, such as a mis-definition of the input data.
Specify the maximum number of error records you want to occur before the system issues an ERROR and terminates the load.
TENACITY to specify the number of hours FastLoad will try to establish a connection. The default is no tenacity.
The statement must be placed before LOGON.
SLEEP to specify the number of minutes FastLoad waits before retrying a logon.
The default is 6 minutes. The statement must be placed before LOGON.
Manily FastLoad was developed to load millions of rows into empty Teradata tables so it is fast.
FastLoad will create a Teradata session for each AMP in order to maximize parallel processing.This gives awesome performance in loading data.
There are more reasons why FastLoad is so fast. Below are limitations of Fast load.
1) No Secondary Indexes are allowed on the Target Table: Usually UPI and NUPI are used in Teradata to distribute the rows evenly across the AMPs.Secondary indexes are stored in a subtable block and many times on a different AMP from the data row.
2)No Referential Integrity is allowed: The Referential Integrity defined on a table would take more system checking to prevent referential constraints.
3)No Triggers are allowed at load time: Fast load focused on data load with high speed. So triggers not allowed.
4)Duplicate Rows (in Multi-Set Tables) are not supported: Multiset tables are allowed duplicate data. Fastload can load the data into multiset tables but duplicate rows are discarded.
5)No AMPs may go down (i.e., go offline) while FastLoad is processing: The down AMP must be repaired before the load process can be restarted
6)No more than one data type conversion is allowed per column: Data type conversion cause high resource utilization on the system
Fastload requires mainly three components
Log table
Log table stores the processing record information during load.This table contains one row for every FastLoad running on the system
Empty Target table
As mentioned earlier target tables should be empty.
Error tables(two)
Each FastLoad requires two error tables. These are automatically created during run. These will populated only errors occurred during the load.
The first error table is for any translation errors or constraint violations
For example, if a column is defined as integer but the data from source the data is coming in CHAR format.i.e wrong data.
The second error table is for errors caused by duplicate values for Unique Primary Indexes.
FastLoad Has Two Phases:
FastLoad divides its job into two phases, both designed for speed.
Phase 1 or Acquisition Phase
The primary purpose of phase 1 is to get data from host computer into Teradata System.
The data moves in 64 K blocks and is stored in worktables on the AMPs.
The data is not stored in the correct AMP.
Phase 2 or Application Phase.
Once the data is moved from the server, each AMP will hash its worktable rows.
Each each row transfers to the worktables where they permanently resides
Rows of a table are stored on the disks in data blocks
Simple fastload Script:
- Logging onto Teradata
- Defining the Teradata table that you want to load (target table)
- Defining the INPUT data file
- Telling the system to start loading
- Telling the system to to insert data into final target.
- End session.
LOGON TDDB/USERNAME,PWD;
CREATE TABLE EMPDB.EMP_TABLE
( EMP_NUM INTEGER
,DEPT_NUM SMALLINT
,FIRST_NAME CHAR(20)
,LAST_NAME VARCHAR(20)
,SALARY DECIMAL(8,2)
UNIQUE PRIMARY INDEX(EMP_NUM);
DEFINE EMP_NUM (INTEGER)
,DEPT_NUM (SMALLINT)
,FIRST_NAME (CHAR(20))
,LAST_NAME (VARCHAR(20))
,SALARY (DECIMAL(8,2))
FILE=C:\TEMP\EMP_FILE.txt;
BEGIN LOADING EMPDB.EMP_TABLE
ERRORILES EMPDB.EMP_ERR1, EMPDB.EMP_ERR2
CHECKPOINT 10000000;
INSERT INTO EMPDB.EMP_TABLE
VALUES
( :EMP_NUM
,:DEPT_NUM
,:FIRST_NAME
,:LAST_NAME
,:SALARY );
END LOGOFF;
LOGOFF;
when can we RESTART fastlaod and cannot?
You Cannot RESTART FastLoad if
- The Error Tables are DROPPED
- The Target Table is DROPPED
- The Target Table is CREATED
You Can RESTART FastLoad
- The Error Tables are NOT DROPPED in the script
- The Target Table is NOT DROPPED in the script
- The Target Table is NOT CREATED in the script
- You have defined a checkpoint
Below are fast load commands used for creating fastload scripts:
AXSMOD to specify an access module (e.g., OLE-DB provider) that provides data to the FastLoad utility on network-attached client systems.
SESSIONS max min to specify the number of sessions. max = maximum number of sessions that will be logged on.
ERRLIMIT to control a runaway error condition, such as a mis-definition of the input data.
Specify the maximum number of error records you want to occur before the system issues an ERROR and terminates the load.
TENACITY to specify the number of hours FastLoad will try to establish a connection. The default is no tenacity.
The statement must be placed before LOGON.
SLEEP to specify the number of minutes FastLoad waits before retrying a logon.
The default is 6 minutes. The statement must be placed before LOGON.
very informative blog and useful article thank you for sharing with us , keep posting learn more Big Data Hadoop Online Course Hyderabad
ReplyDelete