TD14.10 Statistics : “Threshold” option
When you submit a COLLECT STATISTICS statement in 14.10, it may or may not execute. A decision is made whether or not there is a value in recollecting these particular statistics at the time they are submitted. That decision is only considered if threshold options are being used.
There are three levels of Threshold options
System threshold: This is the default approach for applying thresholds for all 14.10 platforms. This is default approach determines the appropriate threshold for each statistic and considers how much the underlying table has changed since the last collection.
The degree of change to the table since the last collection is compared against the current state of the table, based on USECOUNT tracking of inserts, deletes and updates performed. If the change threshold has not been reached, and enough history has been collected for this statistics so that the optimizer can perceive a pattern in the data such that extrapolations can be confidently performed, then this statistics collection will be skipped.
There is a DBS Control record parameter called SysChangeThresholdOption
DBA-defined global thresholds: This thresholds override the system default, and rely on DBA-defined fixed percentages as thresholds. Once set, all statistics collection statements will use these global threshold values, unless overridden by the third level of threshold options at the statement level.
Two parameters :
DefaultUserChangeThreshold:
DefaultTimeThreshold:
Thresholds on individual statements: Optional USING clauses that are attached to COLLECT STATISTICS statements can override the system default or any global DBA-defined thresholds.
COLLECT STATISTICS USING THRESHOLD 8% AND THRESHOLD 10 DAYS
COLUMN column1 ON Table1;
Whichever threshold level is set, if the optimizer determines that the threshold has not been met, no statistics will be collected, even though they have been submitted. When a collection has been asked for but has not been executed, a StatsSkipCount column in the DBC.StatsV row that represents this statistics will be incremented.
DBQL Usecount and Threshold
The recommended percent of change thresholds rely on having DBQL USECOUNT logging turned on. USECOUNT tracks inserts, deletes and updates made to tables within a database, and as a result, can provide highly accurate information to the optimizer about how the table has changed since the last statistics collection.
The default system threshold functionality is able to be applied to a statistic collection only if USECOUNT logging has been enabled for the database that the statistics collection table belongs to. In the absence of USECOUNT data, the default threshold behaviour will be ignored. However, both DBA-defined global thresholds and statement-based thresholds are able to use percent of change thresholds even without USECOUNT logging, but with the risk of less accuracy.
In the cases where USECOUNT logging is not enabled, percent of change values are less reliable because the optimizer must rely on random AMP sample comparisons.
Percent of change is the recommended way to express thresholds.
Example :
Pick a small, non-critical database.
Enable DBQL USECOUNT logging on that database:
BEGIN QUERY LOGGING WITH USECOUNT ON TestDB;
Disable the system threshold parameter by setting DBS Control setting:
SysChangeThresholdOption = 3
Leave the global parameters disabled as they are by default:
DefaultUserChangeThreshold = 0
DefaultTimeThreshold = 0
Add USING THRESHOLD 10 PERCENT to the statistics collection statements just for the tables within your selected database:
COLLECT STATISTICS USING THRESHOLD 10% COLUMN TestID ON TestDB;
Insert a few rows into the table (less than 10% of the table size) and run an Explain of the statistics collection statement itself, and it will tell you whether or not skipping is taking place.
EXPLAIN COLLECT STATISTICS COLUMN TestID ON TestDB;
When you submit a COLLECT STATISTICS statement in 14.10, it may or may not execute. A decision is made whether or not there is a value in recollecting these particular statistics at the time they are submitted. That decision is only considered if threshold options are being used.
There are three levels of Threshold options
System threshold: This is the default approach for applying thresholds for all 14.10 platforms. This is default approach determines the appropriate threshold for each statistic and considers how much the underlying table has changed since the last collection.
The degree of change to the table since the last collection is compared against the current state of the table, based on USECOUNT tracking of inserts, deletes and updates performed. If the change threshold has not been reached, and enough history has been collected for this statistics so that the optimizer can perceive a pattern in the data such that extrapolations can be confidently performed, then this statistics collection will be skipped.
There is a DBS Control record parameter called SysChangeThresholdOption
DBA-defined global thresholds: This thresholds override the system default, and rely on DBA-defined fixed percentages as thresholds. Once set, all statistics collection statements will use these global threshold values, unless overridden by the third level of threshold options at the statement level.
Two parameters :
DefaultUserChangeThreshold:
DefaultTimeThreshold:
Thresholds on individual statements: Optional USING clauses that are attached to COLLECT STATISTICS statements can override the system default or any global DBA-defined thresholds.
COLLECT STATISTICS USING THRESHOLD 8% AND THRESHOLD 10 DAYS
COLUMN column1 ON Table1;
Whichever threshold level is set, if the optimizer determines that the threshold has not been met, no statistics will be collected, even though they have been submitted. When a collection has been asked for but has not been executed, a StatsSkipCount column in the DBC.StatsV row that represents this statistics will be incremented.
DBQL Usecount and Threshold
The recommended percent of change thresholds rely on having DBQL USECOUNT logging turned on. USECOUNT tracks inserts, deletes and updates made to tables within a database, and as a result, can provide highly accurate information to the optimizer about how the table has changed since the last statistics collection.
The default system threshold functionality is able to be applied to a statistic collection only if USECOUNT logging has been enabled for the database that the statistics collection table belongs to. In the absence of USECOUNT data, the default threshold behaviour will be ignored. However, both DBA-defined global thresholds and statement-based thresholds are able to use percent of change thresholds even without USECOUNT logging, but with the risk of less accuracy.
In the cases where USECOUNT logging is not enabled, percent of change values are less reliable because the optimizer must rely on random AMP sample comparisons.
Percent of change is the recommended way to express thresholds.
Example :
Pick a small, non-critical database.
Enable DBQL USECOUNT logging on that database:
BEGIN QUERY LOGGING WITH USECOUNT ON TestDB;
Disable the system threshold parameter by setting DBS Control setting:
SysChangeThresholdOption = 3
Leave the global parameters disabled as they are by default:
DefaultUserChangeThreshold = 0
DefaultTimeThreshold = 0
Add USING THRESHOLD 10 PERCENT to the statistics collection statements just for the tables within your selected database:
COLLECT STATISTICS USING THRESHOLD 10% COLUMN TestID ON TestDB;
Insert a few rows into the table (less than 10% of the table size) and run an Explain of the statistics collection statement itself, and it will tell you whether or not skipping is taking place.
EXPLAIN COLLECT STATISTICS COLUMN TestID ON TestDB;