Teradata Notepad: Refresh Your knowledge

What are the different types of locks in Teradata?

Exclusive Lock:
Exclusive locks are only applied to databases or tables, never to rows.
They are the most restrictive type of lock; all other users are locked out.
Exclusive locks are used rarely, most often when structural changes are being made to the database.

Read Lock:
Read locks are used to ensure consistency during read operations.
Several users may hold concurrent read locks on the same data, during which no modification of the data is permitted.

Write Lock:Write locks enable users to modify data while locking out all other users except readers not concerned about data consistency (Access lock readers).
Until a Write lock is released, no new read or write locks are allowed.

Access Lock:Access locks can be specified by users who are not concerned about data consistency.
The use of an access lock allows for reading data while modifications are in process. Access locks are designed for decision support on large tables that are updated only by small single row changes. Access locks are sometimes called stale read locks, i.e. you may get stale data that hasnot been updated

Lock types are automatically applied based on the SQL command:
SELECT -->applies a Read lock
UPDATE --> applies a Write lock
CREATE TABLE --> applies an Exclusive lock

=========================================================================

What is multiple statement processing?
Multiple statement processing increases the performance when loading into large tables. All statements are sent to parser simultaneously and all statements are executed parallel.
In sql Assistance, once you place all queries and press F5. All will be executed parallel. Make sure each sql statement ends semicolon(;).

=========================================================================

What is TENACITY and SLEEP ?
TENACITY to specify the number of hours FastLoad will try to establish a connection. The default is no tenacity. The statement must be placed before LOGON.

SLEEP to specify the number of minutes FastLoad waits before retrying a logon.The default is 6 minutes. The statement must be placed before LOGON.

=========================================================================

What is soft Referential Integrity and Batch Referential Integrity?
Soft Referential Integrity:
It provides a mechanism to allow user-specified Referential Integrity (RI) constraints that are not
enforced by the database.
Enables optimization techniques such as Join Elimination.

Batch Referential Integrity:
Tests an entire insert, delete, or update batch operation for referential integrity. If insertion, deletion, or update of any row in the batch violates referential integrity, then parsing engine software rolls back the entire batch and returns an abort message.

=========================================================================

What Teradata statistics ?

Statistics collection is essential for the optimal performance of the Teradata query optimizer.

The query optimizer relies on statistics to help it determine the best way to access data.

Statistics also help the optimizer ascertain how many rows exist in tables being queried and predict how many rows will qualify for given conditions.

Lack of statistics, or out-dated statistics, might result in the optimizer choosing a less-than-optimal method for accessing data tables.

Points:

1: Once a collect stats is done on the table(on index or column) where is this information stored so that the optimizer can refer this?

Ans: Collected statistics are stored in DBC.TVFields or DBC.Indexes. However, you cannot query these two tables.

2: How often collect stats has to be made for a table that is frequently updated?

Answer: You need to refresh stats when 5 to 10% of table's rows have changed. Collect stats could be pretty resource consuming for large tables. So it is always advisable to schedule the job at off peak period and normally after approximately 10% of data changes.

3: Once a collect stats has been done on the table how can i be sure that the optimizer is considering this before execution ? i.e; until the next collect stats has been done will the optimizer refer this?

Ans: Yes, optimizer will use stats data for query execution plan if available. That's why stale stats is dangerous as that may mislead the optimizer.

=========================================================================

What is a HOT AMP?

When the workload is not distributed across all the AMPs, only a few AMPs end up overburdened with the work. This is a hot AMP condition.

This typically occurs when the volume of data you are dealing with is high and

(a). You are trying to retrieve the data in a TERADATA table which is not well distributed across the AMPs on the system (bad Primary Index)

(b). When you are trying to join on column with highly non unique values

(c). When you apply the DISTINCT operator on a column with highly non unique values

=========================================================================

What are different SPACES available in Teradata?

There are mainly 3 types of spaces available.

Perm Space

Temp Space

Spool space

Perm Space :All databases have a defined upper limit of permanent space. Permanent space is used for storing the data rows of tables. Perm space is not pre-allocated. It represents a maximum limit.

Spool Space :All databases also have an upper limit of spool space. If there is no limit defined for a particular database or user, limits are inherited from parents.

Theoretically, a user could use all unallocated pace in the system for their query. Spool space is temporary space used to hold intermediate query results or formatted answer sets to queries. Once the query is complete, the spool space is released.

Example: You have a database with total disk space of 100GB. You have 10GB of user data and an additional 10GB of overhead. What is the maximum amount of spool space available for queries?

Answer: 80GB. All of the remaining space in the system is available for spool

Temp Space :The third type of space is temporary space. Temp space is used for Global and Volatile temporary tables, and these results remain available to the user until the session is terminated. Tables created in temp space will survive a restart.

=========================================================================

How Does Teradata Store Rows?

Teradata uses hash partitioning and distribution to randomly and evenly distribute data across all AMPs.

The rows of every table are distributed among all AMPs - and ideally will be evenly distributed among all AMPs.

Each AMP is responsible for a subset of the rows of each table.

Evenly distributed tables result in evenly distributed workloads.

=========================================================================

Which one will take care when an AMP goes down?

Down amp recovery journal will start when AMP goes down to restore the data for the down amp.

Fall back is like it has redundant data,if one amp goes down in the cluster also it wont affect your queries.the query will use data from fall back rows.the down amp wont be updated use the data from fall back.

=========================================================================

Which one will take care when a NODE goes down?

In the event of node failure, all virtual processors can migrate to another available node in the clique.

All nodes in the clique must have access to the same disk arrays

=========================================================================

What is the use of EXPLIN plan?

The EXPLAIN facility allows you to preview how Teradata will execute a requested query. It returns a summary of the steps the Teradata RDBMS would perform to execute the request.

EXPLAIN also discloses the strategy and access method to be used, how many rows will be involved, and its cost in minutes and seconds. Use EXPLAIN to evaluate a query performance

and to develop an alternative processing strategy that may be more efficient. EXPLAIN works on any SQL request. The request is fully parsed and optimized, but not run. The complete plan is returned to the user in readable English statements.

EXPLAIN provides information about locking, sorting, row selection criteria, join strategy and conditions, access method, and parallel step processing. EXPLAIN is useful for performance tuning, debugging, pre-validation of requests, and for technical training.

=========================================================================

What COALESCE function?

The newer ANSI standard COALESCE can also convert a NULL to a zero. However, it can convert a NULL value to any data value as well.
The COALESCE searches a value list, ranging from one to many values, and returns the first Non-NULL value it finds.
At the same time, it returns a NULL if all values in the list are NULL.

To use the COALESCE, the SQL must pass the name of a column to the function. The data in the column is then compared for a NULL. Although one column name is all that is required, normally more than one column is normally passed to it. Additionally, a literal value, which is never NULL, can be returned to provide a default value if all of the previous column values are NULL

The syntax for the COALESCE follows:

SELECT COALESCE ( [, ] )
,( COALESCE([,] ) )
FROM GROUP BY 1 ;
=========================================================================

When will you create PPI and when will you create secondary indexes?

Partitioned Primary Indexes are Created so as to divide the table onto partitions based on Range or Values as Required. This is effective for Larger Tables partitioned on the Date and integer columns. There is no extra Overhead on the System (no Spl Tables Created ect )

Secondary Indexes are created on the table for an alternate way to access data. This is the second fastest method to retrieve data from a table next to the primary index. Sub tables are created.

PPI and secondary indexes do not perform full table scans but they access only a defined st of data in the AMP's.

=========================================================================
When you chose primary index and when will you choose secondary index?
Primary index will be chosen at the time of table creation. This will help us in data distribution, data retrieval and join operations.
Secondary indexes can be created and dropped at any time. They are used as an alternate path to access data other than the primary index.

Diff between database and user?
Both may own objects such as tables, views, macros, procedures, and functions. Both users and databases may hold privileges.
However, only users may log on, establish a session with the Teradata Database, and submit requests.

A user performs actions where as a database is passive. Users have passwords and startup strings; databases do not. Users can log on to the Teradata Database, establish sessions, and submit SQL statements; databases cannot.

Creator privileges are associated only with a user because only a user can log on and submit a CREATE statement. Implicit privileges are associated with either a database or a user because each can hold an object and an object is owned by the named space in which it resides

=========================================================================
When a PI is not mentioned on a table, how will Teradata consider the PI for that table?

If you don't specify a PI at table create time then Teradata must chose one. For instance, if the DDL is ported from another database that uses a Primary Key instead of a Primary Index, the CREATE TABLE contains a PRIMARY KEY (PK) constraint. Teradata is smart enough to know that Primary Keys must be unique and cannot be null. So, the first level of default is to use the PRIMARY KEY column(s) as a UPI.
If the DDL defines no PRIMARY KEY, Teradata looks for a column defined as UNIQUE. As a second level default, Teradata uses the first column defined with a UNIQUE constraint as a UPI.
If none of the above attributes are found, Teradata uses the first column defined in the table as a NON-UNIQUE PRIMARY INDEX (NUPI).

=========================================================================
What is covered query in Teradata?

If a SELECT query covers all the columns that are defined in the JOIN INDEX as join columns, such type of queries are called as COVERED query.
Multi-Column NUSI Columns used as a Covered Query
=========================================================================
What is the use of WITH CHECK OPTION in Teradata?
In Teradata, the additional key phase: WITH CHECK OPTION, indicates that the WHERE clause conditions should be applied during the execution of an UPDATE or DELETE against the view. This is not a concern if views are not used for maintenance activity due to restricted privileges.

=========================================================================
What is identity column?
Teradata has a concept of identity columns on their tables beginning around V2R6.x. These columns differ from Oracle's sequence concept in that the number assigned is not guaranteed to be sequential. The identity column in Teradata is simply used to guaranteed row-uniqueness.
Example:
CREATE MULTISET TABLE MyTable
(
ColA INTEGER GENERATED BY DEFAULT AS IDENTITY
(START WITH 1
INCREMENT BY 20)
ColB VARCHAR(20) NOT NULL
)
UNIQUE PRIMARY INDEX pidx (ColA);
Granted, ColA may not be the best primary index for data access or joins with other tables in the data model. It just shows that you could use it as the PI on the table.

=========================================================================
How to implement UPSERT logic in Teradata using SQL?

We have MERGE-INTO option available in Teradata data which works as an UPSERT logic in teradata. If the condition matches then update else insert.

Example: MERGE into dept_table1 as Taregt

USING (SELECT dept_no, dept_name, budget

FROM dept_table where dept_no = 20) Source

ON (Target.dept_no = 20)

WHEN MATCHED then

UPDATE set dept_name = ëBeing Renamedí

WHEN NOT MATCHED then

INSERT (dept_no, dept_name, budget)

VALUES (source.dept_no, source.dept_name, source.budget);

=========================================================================
What is value ordered NUSI?
When we define a value ordered NUSI on a column the rows in the secondary subtable get sorted based on the secondary index value. The columns should be of integer or date type.
This is used for range queries and to avoid full table scans on large tables.

=========================================================================
What are the DBQL tables.

Database Query Log tables are the tables present in DBC database which store the history of all the operations performed on the tables present in the databases.
The history could get very large so these tables should be purged when the data is no longer needed.

=========================================================================
What is the difference between sample and top?

The Sampling function (SAMPLE) permits a SELECT to randomly return rows from a Teradata database table.
It allows the request to specify either an absolute number of rows or a percentage of rows to return. Additionally, it provides an ability to return rows from multiple samples.

SELECT * FROM student_course_table SAMPLE 5;

The TOP clause is used to specify the number of records to return. The TOP clause can be very useful on large tables with thousands of records. Returning a large number of records can impact on performance.

SELECT TOP 2 * FROM EMP.

=========================================================================
What is difference between ZEROIFNULL and NULLIFZERO ?
The ZEROIFNULL function: will pass zero when data coming as null
The NULLIFZERO function: will pass null when data coming as zero.

=========================================================================
What is Range_N?
Range_N is defined on a partition primary index to specify the range of values of a column that should be assigned to a partition.
The number of partitions = the number of ranges specified + no case + unknown
no case ñ> if the value does not belong to any range
unknown -> for the values like nulls, spaces etc

=========================================================================
What is Casting in Teradata ?

It will convert the data type
The casting is similar to DDL:
CAST('02/03/2009-01:25:11' AS TIMESTAMP FORMAT 'MM/DD/YYYY-HH:MI:SS')
=========================================================================
Can we do collect stats on a table when the table is being updated?
No.We never collect stats on a table when the table is being updated. It will go for dead lock.

=========================================================================
When will you go for hash index?

A hash index organizes the search keys with their associated pointers into a hash file structure.
We apply a hash function on a search key to identify a bucket, and store the key and its associated pointers in the bucket (or in overflow buckets).
Strictly speaking, hash indices are only secondary index structures, since if a file itself is organized using hashing, there is no need for a separate hash index structure on it.

=========================================================================
Difference between Global and Volatile Temporary Table? Where they can be used and how they can be used?

Global Temporary Tables

Materialize in users temporary space
Base definitions are permanent in Data Dictionary tables
Definitions can be materialized by any user with the necessary DML privileges
Can be defined for COLLECT STATISTICS
Can survive a Teradata Database restart
Up to 2000 materialized tables per session
Materialized instance is local to a session
If not dropped manually, instance is dropped automatically at session end
An instance an be dropped manually any time during a session with DROP TEMPORARY TABLE
Materialized contents are not shareable with other sessions
A very large number of global tables can prolong logoff, because materialized global tables are dropped automatically at session end.

Volatile Tables

Occupy space in users SPOOL allocation
Definitions are kept in cache and go away at session end or during a Teradata Database restart
Private to the session in which they are created
Can be defined for COLLECT STATISTICS(TD14 feature)
Do not survive a Teradata Database reset
Up to 1000 tables per session
Local to a session
If not dropped manually, dropped automatically at session end
Can be dropped manually any time during a session with DROP TABLE
Contents are not sharable with other sessions
A very large number of volatile tables can prolong logoff, because volatile tables are dropped automatically at session end.

What are the advantages of partitioned tables:
1)They provide efficient searches by using partition elimination at the various levels or combination of levels.
2)They reduce the I/O for range constraint queries
3)They take advantage of dynamic partition elimination
4)They provide multiple access paths to the data, and an MLPPI provides even more partition elimination and more partitioning expression choices,
(i.e., you can use last name or some other value that is more readily available to query on.)
5)The Primary Index may be either a UPI or a NUPI; a NUPI allows local joins to other similar entities
6)Row hash locks are used for SELECT with equality conditions on the PI columns.
7)Partitioned tables allow for fast deletes of data in a partition.
8)They allow for range queries without having to use a secondary index.
9)Specific partitions maybe archived or deleted.
10)May be created on Volatile tables; global temp tables, base tables, and non-compressed join indexes.
11)May replace a Value Ordered NUSI for access.

=========================================================================
What are the Partitioning Rules:
1)A table can have up to 65,535 Partitions.
2)Partitioning never determines which AMP gets row.
3)Partitioning determines how an AMP will sort the row on its own.
4)Table can have up to 15 levels of partitions.
5)A table cannot have an UPI as the Primary Index if the Partition table does not include PI.
6)Total 3 forms of Partitioning Simple, RANGE and CASE.

=========================================================================
What are the Disadvantages of partitioned tables
1)Rows in a partitioned table are 2 bytes longer.
2)Access via the Primary Index may take longer.
3)Full table joins to a NPPI table with the same PI may take longer.
=========================================================================
Difference between Set and Multiset Table?
The SET table does not allow duplicate rows while the multiset table allow duplicate rows. For a SET table, to enhance the performance and to enforce uniqueness, a UPI or a USI can be created on a particular column. For a multiset table, a NUPI or NUSI can be created on a particular column to enhance performance.

Characteristics of SET tables:
Logically correct Implementation
Automatic duplicate row elimination

Characteristics of MULTISET tables:

No duplicate row check overhead
Data integrity becomes a user responsibility

=========================================================================What are the Advantages and Disadvantages of multi-column PI?
As many as 64 columns may be included in an index. Multi-column Primary Indexes may provide better distribution.

Advantage
*More columns = more uniqueness
*Distinct value increases.
*Rows per value decreases.
*Selectivity increases.

Disadvantage
*More columns = less usability
*PI can only be used when values for all PI columns are provided in SQL statement.
*Partial values do not generate a useful hash result.
=========================================================================
What is Value List Compression? how would you implement compression ?
The Value list compression cannot compress an existing column but it can be used to add a compressed column to a table.
Use the ALTER TABLE statement to compress columns and reduce the number of I/O operations. Consider the following:
*Set the column default value to most frequent value.
*Compress to the default value.
*This is especially useful for sparsely populated columns.
*Overhead is not high.
*The I/O savings correlates to the percentage of data compressed out of a row.

Value List Compression (VLC) provides the Teradata database with the capacity to support multiple value compression for fixed width columns. When you specify a values or values, the system suppresses any data matching the compress value from a row. This saves disk space. Smaller physical row size results in less data blocks and fewer I/Os and improved overall performance.

Because VLC allows you to specify a list of compress values for a column, the system suppresses data when one of the specified values exists in the column. Up to 255 distinct values (plus NULL) may be compressed per fixed-width column.

VLC improves performance as follows:
*Reduces the I/O required for scanning tables when the tables have compressible values in their columns.
*Reduces disk space because rows are smaller.
*Permits joins to look up the tables to be eliminated.
*Improves data loading because more rows may fit into one data block after compression is applied.

=========================================================================
What have you done to resolve the running out of spool problem?
To resolve running out of spool problem,
1) The column involved in the where clause/joining column is an index column or not, should be checked .
2) Also whether statistics have been collected on the index column. If no statistics have been collected then collect stats on it.
3) Check maxspool and peak spool from dbc.diskspace per amp basis and check for data skew.
4) Also check for improper join condition or join condition missing.
4) Despite of the steps taken, the running out of spool problem comes then explicitly increase the spool space.

=========================================================================
What are the limitations of Fast load?

1) No Secondary Indexes are allowed on the Target Table: Usually UPI and NUPI are used in Teradata to distribute the rows evenly across the AMPs.Secondary indexes are stored in a subtable block and many times on a different AMP from the data row.

2)No Referential Integrity is allowed: The Referential Integrity defined on a table would take more system checking to prevent referential constraints.

3)No Triggers are allowed at load time: Fast load focused on data load with high speed. So triggers not allowed.

4)Duplicate Rows (in Multi-Set Tables) are not supported: Multiset tables are allowed duplicate data. Fastload can load the data into multiset tables but duplicate rows are discarded.

5)No AMPs may go down (i.e., go offline) while FastLoad is processing: The down AMP must be repaired before the load process can be restarted

6)No more than one data type conversion is allowed per column: Data type conversion cause high resource utilization on the system

=========================================================================
What are the limitations of Multi load?
1)Unique Secondary Indexes are not supported on a Target Table:Like FastLoad, MultiLoad does not support Unique Secondary Indexes (USIs). But unlike FastLoad, it does support the use of Non-Unique Secondary Indexes (NUSIs) because the index subtable row is on the same AMP as the data row.

2)Referential Integrity is not supported: The Referential Integrity defined on a table would take more system checking to prevent referential constraints.

3)Triggers are not supported at load time: Disable all the Triggers prior to using it.

4)No concatenation of input files is allowed: It could impact are restart if the files were concatenated in a different sequence or data was deleted between runs.

5)No Join Indexes: All the join indexes must be dropped before running a MultiLoad and then recreate them after the load is completed

6)Will not process aggregates, arithmetic functions or exponentiation:If you need data conversions or math, you might be better off using an INMOD to prepare the data prior to loading it
=========================================================================
Explain the phases in Multiload?

MultiLoad Has Five IMPORT Phases:
Phase 1: Preliminary Phase : Ita Basic setup phase.Its used for several preliminary set-up activities for a successful data load.
Phase 2: DML Transaction Phase: All the SQL Data Manipulation Language (DML) statements are sent to Teradata database as Multilaod supports multiple DML functions.
Phase 3: Acquisition Phase: Once the setup completes the PE's plan stored on each AMP.Then Locks the table headers and the actual input data will also be stored in the worktable.
Phase 4: Application Phase: In this phase all DML opreations are applied on target tables.
Phase 5: Cleanup Phase: Table locks will be released and all the intermediate work tables will be dropped.
=========================================================================
What are the limitations of Tpump?

No concatenation of input data files is allowed.
TPump will not process aggregates, arithmetic functions or exponentiation.
The use of the SELECT function is not allowed.
No more than four IMPORT commands may be used in a single load task.
Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the date, not the default format of yy.
On some network attached systems, the maximum file size when using TPump is 2GB.
TPump performance will be diminished if Access Logging is used.

=========================================================================
How many types of error tables in Fast Load?

Each FastLoad requires two error tables. These are automatically created during run. These will populated only errors occurred during the load.

The first error table is for any translation errors or constraint violations
For example, if a column is defined as integer but the data from source the data is coming in CHAR format.i.e wrong data.

The second error table is for errors caused by duplicate values for Unique Primary Indexes.
=========================================================================
What is Quantiles?

Quantiles are used to divide a number of rows into a number of partitions of roughly equal size. The most common quantile is the percentile which is based on a value of 100. It is also not uncommon to have quartiles (based on 4), tertiles (based on 3) or deciles (based on 10). Notice that by default, both the quantile column and the quantile value itself will be output in ascending sequence.

Aggregations may not be combined with OLAP functions, thus a derived or temporary table is sometimes necessary to produce the aggregation before the function is applied. The standard form of the quantile function is:

QUANTILE (quantile_constant,sortlist)

SELECT employee_number, salary_amount,
QUANTILE (100, employee_number,salary_amount )
FROM employee
QUALIFY QUANTILE (100, employee_number) < 25;

=========================================================================
What is WIDTH_BUCKET ?

The WIDTH_BUCKET function assigns partition numbers to values associated with a column or expression, based on a specified range and based on the number of partitions desired.
Like many OLAP and aggregate functions, WIDTH_BUCKET is used to extract statistical information from sample populations.
The format of the function is as follows:
WIDTH_BUCKET (column_expr,lower_bound, upper_bound ,partition_count)

=========================================================================
What is Random function?

The RANDOM function may be used to generate a random number between a specified range.
RANDOM (Lower limit, Upper limit) returns a random number between the lower and upper limits inclusive. Both limits must be specified, otherwise a random number between 0 and approximately 4 billion is generated.

SEL DEPARTMENT_NUMBER,RANDOM(1,9) FROM EMPLOYEE;

limitations:
RANDOM is non-ANSI standard
RANDOM may be used in a SELECT list or a WHERE clause, but not both
RANDOM may be used in Updating, Inserting or Deleting rows
RANDOM may not be used with aggregate or OLAP functions
RANDOM cannot be referenced by numeric position in a GROUP BY or ORDER BY clause
=========================================================================

Teradata Notepad

Wednesday, August 19, 2015

Refresh Your knowledge

No comments:

Post a Comment