How To List Table Row Counts And Index Statistics
On getting an ETL performance testing system of some sort up and running, one of the Socratic questions that arises at some point is:
“When we run a workload, how much data is there in the database and how does it affect the workload itself?”
OK, so not as Socratic as “What is justice?”, as we can actually measure this one.
The starting point being, of course, to identify that data in the first place. The snippets below provide a starting point for getting that data out of SQL Server.
Row Counts Only
The starting point can be to just list row counts of all tables in the target database. The short example below will do just that.
-- tables only, clustered or heap, nothing else
SELECT
S.name As SchemaName,
T.name As TableName,
SUM (P.rows) AS Rows
FROM
sys.tables AS T
INNER JOIN sys.schemas AS S
ON S.schema_id = T.schema_id
INNER JOIN sys.partitions AS P
ON P.object_id = T.object_id
WHERE
T.type = 'U' -- user tables only
AND P.index_id IN (0, 1) -- HEAP and CLUSTERED only
GROUP BY
S.name,
T.name
ORDER BY
SchemaName,
TableName
;
Rows Counts Per Partition
Next, it can be interesting to look at row counts per partition. This might be helpful if you’re running some crazy updates on specific partitions as part of your ETL process and you’d like to check whether it makes too big a difference to run those over low populated partitions vs highly populated ones.
-- tables only, clustered or heap, counts by partition
SELECT
S.name As SchemaName,
T.name As TableName,
P.partition_number AS PartitionNumber,
P.rows AS Rows
FROM
sys.tables AS T
INNER JOIN sys.schemas AS S
ON S.schema_id = T.schema_id
INNER JOIN sys.partitions AS P
ON P.object_id = T.object_id
WHERE
T.type = 'U' -- user tables only
AND P.index_id IN (0, 1)-- HEAP and CLUSTERED only
ORDER BY
SchemaName,
TableName,
PartitionNumber
;
Row Counts Per Partition & Index
It can also be interesting to see if ETL performance is affected by loads on top of tables with growing indexes, both in number and size. This can go both ways, of course, but it also prepares for the next step.
-- tables, clustered or heap, plus indexes, counts by partition
SELECT
S.name As SchemaName,
T.name As TableName,
P.partition_number AS PartitionNumber,
P.index_id AS IndexId,
I.name AS IndexName,
I.type_desc AS IndexType,
P.rows AS Rows
FROM
sys.tables AS T
INNER JOIN sys.schemas AS S
ON S.schema_id = T.schema_id
INNER JOIN sys.partitions AS P
ON P.object_id = T.object_id
INNER JOIN sys.indexes AS I
ON I.object_id = P.object_id
AND I.index_id = P.index_id
WHERE
T.type = 'U' -- user tables only
ORDER BY
SchemaName,
TableName,
PartitionNumber,
IndexId
;
Row Counts Per Partition & Index, Plus Index Statistics
If we’re not yet content with the previous data, we can roll up our sleeves by digging into the index stats themselves. Ask yourself, how many times did you see that T-SQL-done-ETL run incredibly slow and then magically fix itself when the underlying tables got rebuilt? As in REBUILD, really. Fragmentation can have a big impact on any query, and ETL-ities are no exception. The example below will grab basic fragmentation stats from all indexes so you can see if that’s a possible cause of red herrings in your workload. Be careful running this query on a production system during business hours, it can quite heavy, especially if you pump up the call to sys.dm_db_index_physical_stats to increase the scan accuracy.
-- tables, clustered or heap, plus indexes, and physical stats counts by partition
SELECT
S.name As SchemaName,
T.name As TableName,
P.partition_number AS PartitionNumber,
P.index_id AS IndexId,
I.name AS IndexName,
I.type_desc AS IndexType,
P.rows AS Rows,
PIS.alloc_unit_type_desc AS IndexAllocationUnitType,
PIS.index_depth AS IndexDepth,
PIS.index_level AS IndexLevel,
PIS.avg_fragmentation_in_percent AS IndexAverageFragmentationPercent,
PIS.fragment_count AS IndexFragmentCount,
PIS.avg_fragment_size_in_pages AS IndexAverageFragmentSizeInPages,
PIS.page_count AS IndexPageCount
FROM
sys.tables AS T
INNER JOIN sys.schemas AS S
ON S.schema_id = T.schema_id
INNER JOIN sys.partitions AS P
ON P.object_id = T.object_id
INNER JOIN sys.indexes AS I
ON I.object_id = P.object_id
AND I.index_id = P.index_id
CROSS APPLY sys.dm_db_index_physical_stats(
/* database_id */ DB_ID(),
/* object_id */ T.object_id,
/* index_id */ I.index_id,
/* partition_number */ P.partition_number,
/* mode */ NULL) AS PIS
WHERE
T.type = 'U' -- user tables only
ORDER BY
SchemaName,
TableName,
PartitionNumber,
IndexId
;
All For All
At some point you might also need to grab a snapshot of some or all the databases in your server for workload documentation purposes. Doing this on all databases one by one can be time-consuming. The example below lets you skip the RBAR pain and grab the same stats as the previous example, but for all the databases you need at once. Remember to add a WHERE+LIKE filter on the first block to select only the database name patterns you want. Again, be weary of running this on a production system during business hours.
-- list databases to loop
DECLARE @Databases TABLE
(
DatabaseKey INT IDENTITY (1, 1) NOT NULL PRIMARY KEY CLUSTERED,
DatabaseName SYSNAME NOT NULL
)
INSERT INTO @Databases (DatabaseName)
SELECT name FROM sys.databases;
-- this will hold the final result
IF OBJECT_ID('tempdb.dbo.#Data') IS NOT NULL
DROP TABLE #Data;
CREATE TABLE #Data
(
DatabaseName SYSNAME NOT NULL,
SchemaName SYSNAME NOT NULL,
TableName SYSNAME NOT NULL,
PartitionNumber INT NOT NULL,
IndexId INT NOT NULL,
IndexName SYSNAME NULL,
IndexType SYSNAME NOT NULL,
Rows BIGINT NOT NULL,
IndexAllocationUnitType SYSNAME NULL,
IndexDepth TINYINT NULL,
IndexLevel TINYINT NULL,
IndexAverageFragmentationPercent FLOAT NULL,
IndexFragmentCount BIGINT NULL,
IndexAverageFragmentSizeInPages FLOAT NULL,
IndexPageCount BIGINT NULL,
PRIMARY KEY CLUSTERED
(
DatabaseName,
SchemaName,
TableName,
PartitionNumber,
IndexId
)
);
DECLARE @DatabaseKey INT = 1;
DECLARE @MaxDatabaseKey INT = (SELECT MAX(DatabaseKey) FROM @Databases);
WHILE @DatabaseKey <= @MaxDatabaseKey
BEGIN
DECLARE @DatabaseName SYSNAME =
(
SELECT
DatabaseName
FROM
@Databases
WHERE
DatabaseKey = @DatabaseKey
);
DECLARE @SQL NVARCHAR(MAX) = '
INSERT INTO #Data
(
DatabaseName,
SchemaName,
TableName,
PartitionNumber,
IndexId,
IndexName,
IndexType,
Rows,
IndexAllocationUnitType,
IndexDepth,
IndexLevel,
IndexAverageFragmentationPercent,
IndexFragmentCount,
IndexAverageFragmentSizeInPages,
IndexPageCount
)
SELECT
''' + @DatabaseName + ''' AS DatabaseName,
S.name As SchemaName,
T.name As TableName,
P.partition_number AS PartitionNumber,
P.index_id AS IndexId,
I.name AS IndexName,
I.type_desc AS IndexType,
P.rows AS Rows,
PIS.alloc_unit_type_desc AS IndexAllocationUnitType,
PIS.index_depth AS IndexDepth,
PIS.index_level AS IndexLevel,
PIS.avg_fragmentation_in_percent AS IndexAverageFragmentationPercent,
PIS.fragment_count AS IndexFragmentCount,
PIS.avg_fragment_size_in_pages AS IndexAverageFragmentSizeInPages,
PIS.page_count AS IndexPageCount
FROM
' + QUOTENAME(@DatabaseName) + '.sys.tables AS T
INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.schemas AS S
ON S.schema_id = T.schema_id
INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.partitions AS P
ON P.object_id = T.object_id
INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.indexes AS I
ON I.object_id = P.object_id
AND I.index_id = P.index_id
CROSS APPLY sys.dm_db_index_physical_stats(
/* database_id */ DB_ID(''' + @DatabaseName + '''),
/* object_id */ T.object_id,
/* index_id */ I.index_id,
/* partition_number */ P.partition_number,
/* mode */ NULL) AS PIS
WHERE
T.type = ''U'' -- user tables only
ORDER BY
SchemaName,
TableName,
PartitionNumber,
IndexId
;
';
EXECUTE (@SQL);
SET @DatabaseKey += 1;
END
SELECT * FROM #Data;