How To List Table Row Counts And Index Statistics

On getting an ETL performance testing system of some sort up and running, one of the Socratic questions that arises at some point is:

“When we run a workload, how much data is there in the database and how does it affect the workload itself?”

OK, so not as Socratic as “What is justice?”, as we can actually measure this one.

The starting point being, of course, to identify that data in the first place. The snippets below provide a starting point for getting that data out of SQL Server.

Row Counts Only

The starting point can be to just list row counts of all tables in the target database. The short example below will do just that.

-- tables only, clustered or heap, nothing else
SELECT
	S.name As SchemaName,
	T.name As TableName,
	SUM (P.rows) AS Rows
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
WHERE
	T.type = 'U' -- user tables only
	AND P.index_id IN (0, 1) -- HEAP and CLUSTERED only
GROUP BY
	S.name,
	T.name
ORDER BY
	SchemaName,
	TableName
;

Rows Counts Per Partition

Next, it can be interesting to look at row counts per partition. This might be helpful if you’re running some crazy updates on specific partitions as part of your ETL process and you’d like to check whether it makes too big a difference to run those over low populated partitions vs highly populated ones.

-- tables only, clustered or heap, counts by partition
SELECT
	S.name As SchemaName,
	T.name As TableName,
	P.partition_number AS PartitionNumber,
	P.rows AS Rows
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
WHERE
	T.type = 'U' -- user tables only
	AND P.index_id IN (0, 1)-- HEAP and CLUSTERED only
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber
;

Row Counts Per Partition & Index

It can also be interesting to see if ETL performance is affected by loads on top of tables with growing indexes, both in number and size. This can go both ways, of course, but it also prepares for the next step.

-- tables, clustered or heap, plus indexes, counts by partition
SELECT
	S.name As SchemaName,
	T.name As TableName,
	P.partition_number AS PartitionNumber,
	P.index_id AS IndexId,
	I.name AS IndexName,
	I.type_desc AS IndexType,
	P.rows AS Rows
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
	INNER JOIN sys.indexes AS I
		ON I.object_id = P.object_id
		AND I.index_id = P.index_id
WHERE
	T.type = 'U' -- user tables only
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber,
	IndexId
;

Row Counts Per Partition & Index, Plus Index Statistics

If we’re not yet content with the previous data, we can roll up our sleeves by digging into the index stats themselves. Ask yourself, how many times did you see that T-SQL-done-ETL run incredibly slow and then magically fix itself when the underlying tables got rebuilt? As in REBUILD, really. Fragmentation can have a big impact on any query, and ETL-ities are no exception. The example below will grab basic fragmentation stats from all indexes so you can see if that’s a possible cause of red herrings in your workload. Be careful running this query on a production system during business hours, it can quite heavy, especially if you pump up the call to sys.dm_db_index_physical_stats to increase the scan accuracy.

-- tables, clustered or heap, plus indexes, and physical stats counts by partition
SELECT
	S.name As SchemaName,
	T.name As TableName,
	P.partition_number AS PartitionNumber,
	P.index_id AS IndexId,
	I.name AS IndexName,
	I.type_desc AS IndexType,
	P.rows AS Rows,
	PIS.alloc_unit_type_desc AS IndexAllocationUnitType,
	PIS.index_depth AS IndexDepth,
	PIS.index_level AS IndexLevel,
	PIS.avg_fragmentation_in_percent AS IndexAverageFragmentationPercent,
	PIS.fragment_count AS IndexFragmentCount,
	PIS.avg_fragment_size_in_pages AS IndexAverageFragmentSizeInPages,
	PIS.page_count AS IndexPageCount
FROM
	sys.tables AS T
	INNER JOIN sys.schemas AS S
		ON S.schema_id = T.schema_id
	INNER JOIN sys.partitions AS P
		ON P.object_id = T.object_id
	INNER JOIN sys.indexes AS I
		ON I.object_id = P.object_id
		AND I.index_id = P.index_id
	CROSS APPLY sys.dm_db_index_physical_stats(
		/* database_id */ DB_ID(),
		/* object_id */ T.object_id,
		/* index_id */ I.index_id,
		/* partition_number */ P.partition_number,
		/* mode */ NULL) AS PIS
WHERE
	T.type = 'U' -- user tables only
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber,
	IndexId
;

All For All

At some point you might also need to grab a snapshot of some or all the databases in your server for workload documentation purposes. Doing this on all databases one by one can be time-consuming. The example below lets you skip the RBAR pain and grab the same stats as the previous example, but for all the databases you need at once. Remember to add a WHERE+LIKE filter on the first block to select only the database name patterns you want. Again, be weary of running this on a production system during business hours.

-- list databases to loop
DECLARE @Databases TABLE
(
	DatabaseKey INT IDENTITY (1, 1) NOT NULL PRIMARY KEY CLUSTERED,
	DatabaseName SYSNAME NOT NULL
)
INSERT INTO @Databases (DatabaseName)
SELECT name FROM sys.databases;

-- this will hold the final result
IF OBJECT_ID('tempdb.dbo.#Data') IS NOT NULL
DROP TABLE #Data;

CREATE TABLE #Data
(
	DatabaseName SYSNAME NOT NULL,
	SchemaName SYSNAME NOT NULL,
	TableName SYSNAME NOT NULL,
	PartitionNumber INT NOT NULL,
	IndexId INT NOT NULL,
	IndexName SYSNAME NULL,
	IndexType SYSNAME NOT NULL,
	Rows BIGINT NOT NULL,
	IndexAllocationUnitType SYSNAME NULL,
	IndexDepth TINYINT NULL,
	IndexLevel TINYINT NULL,
	IndexAverageFragmentationPercent FLOAT NULL,
	IndexFragmentCount BIGINT NULL,
	IndexAverageFragmentSizeInPages FLOAT NULL,
	IndexPageCount BIGINT NULL,

	PRIMARY KEY CLUSTERED
	(
		DatabaseName,
		SchemaName,
		TableName,
		PartitionNumber,
		IndexId
	)
);

DECLARE @DatabaseKey INT = 1;
DECLARE @MaxDatabaseKey INT = (SELECT MAX(DatabaseKey) FROM @Databases);
WHILE @DatabaseKey <= @MaxDatabaseKey
BEGIN

	DECLARE @DatabaseName SYSNAME =
	(
		SELECT
			DatabaseName
		FROM
			@Databases
		WHERE
			DatabaseKey = @DatabaseKey
	);

	DECLARE @SQL NVARCHAR(MAX) = '
		INSERT INTO #Data
		(
			DatabaseName,
			SchemaName,
			TableName,
			PartitionNumber,
			IndexId,
			IndexName,
			IndexType,
			Rows,
			IndexAllocationUnitType,
			IndexDepth,
			IndexLevel,
			IndexAverageFragmentationPercent,
			IndexFragmentCount,
			IndexAverageFragmentSizeInPages,
			IndexPageCount
		)
		SELECT
			''' + @DatabaseName + ''' AS DatabaseName,
			S.name As SchemaName,
			T.name As TableName,
			P.partition_number AS PartitionNumber,
			P.index_id AS IndexId,
			I.name AS IndexName,
			I.type_desc AS IndexType,
			P.rows AS Rows,
			PIS.alloc_unit_type_desc AS IndexAllocationUnitType,
			PIS.index_depth AS IndexDepth,
			PIS.index_level AS IndexLevel,
			PIS.avg_fragmentation_in_percent AS IndexAverageFragmentationPercent,
			PIS.fragment_count AS IndexFragmentCount,
			PIS.avg_fragment_size_in_pages AS IndexAverageFragmentSizeInPages,
			PIS.page_count AS IndexPageCount
		FROM
			' + QUOTENAME(@DatabaseName) + '.sys.tables AS T
			INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.schemas AS S
				ON S.schema_id = T.schema_id
			INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.partitions AS P
				ON P.object_id = T.object_id
			INNER JOIN ' + QUOTENAME(@DatabaseName) + '.sys.indexes AS I
				ON I.object_id = P.object_id
				AND I.index_id = P.index_id
			CROSS APPLY sys.dm_db_index_physical_stats(
				/* database_id */ DB_ID(''' + @DatabaseName + '''),
				/* object_id */ T.object_id,
				/* index_id */ I.index_id,
				/* partition_number */ P.partition_number,
				/* mode */ NULL) AS PIS
		WHERE
			T.type = ''U'' -- user tables only
		ORDER BY
			SchemaName,
			TableName,
			PartitionNumber,
			IndexId
		;
		';

	EXECUTE (@SQL);

	SET @DatabaseKey += 1;

END

SELECT * FROM #Data;

On Distributed Computing