Cardinality in Oracle Statistics: A Practical DBA Perspective



In Oracle databases, cardinality is one of the most critical concepts that directly influences query performance. Simply put, cardinality refers to the number of rows returned by a particular operation in an execution plan. The Oracle optimizer relies heavily on cardinality estimates to decide how a query should be executed.


After years of working as a DBA, I can confidently say that most performance issues are not due to missing indexes or bad SQL alone—they often stem from incorrect cardinality estimates.





What Exactly is Cardinality?



Cardinality is the optimizer’s estimate of how many rows will be processed at each step of a query.


For example:

SELECT * FROM employees WHERE department_id = 10;

If the optimizer estimates:


  • 10 rows → low cardinality
  • 10,000 rows → high cardinality



This estimate determines whether Oracle will use:


  • Index scan (for a low-cardinality result)
  • Full table scan (for a high-cardinality result)






Types of Cardinality in Oracle



  • Table Cardinality
    Total number of rows in a table.


SELECT num_rows
FROM dba_tables
WHERE table_name = 'EMPLOYEES';


  • Column Cardinality (NDV – Number of Distinct Values)
    Number of unique values in a column.


SELECT column_name, num_distinct
FROM dba_tab_col_statistics
WHERE table_name = 'EMPLOYEES';


  • Result Set Cardinality
    Estimated number of rows returned after applying filters, joins, or aggregations.






Why Cardinality Matters



  • Execution Plan Selection
    The optimizer uses cardinality to choose join methods:
    • Nested Loop (small result sets)
    • Hash Join (large datasets)
    • Merge Join (sorted data)

  • Index Usage Decisions
    Incorrect cardinality can cause Oracle to ignore useful indexes or use inefficient ones.
  • Join Order Optimization
    The optimizer decides which table to access first based on estimated row counts.
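To see why the join-method and join-order choices hinge on these estimates, it helps to look at the textbook equi-join formula used by cost-based optimizers: join selectivity is roughly 1 / max(NDV of the two join columns). A minimal Python sketch of that arithmetic (the row counts and NDVs below are made-up illustrative numbers, not real HR-schema statistics):

```python
# Textbook equi-join cardinality estimate used by cost-based optimizers:
# card(A join B) ~= card(A) * card(B) / max(ndv_a, ndv_b)
def join_cardinality(rows_a, rows_b, ndv_a, ndv_b):
    selectivity = 1.0 / max(ndv_a, ndv_b)
    return round(rows_a * rows_b * selectivity)

# Hypothetical stats: EMPLOYEES has 107 rows, DEPARTMENTS has 27 rows;
# department_id has 11 distinct values in EMPLOYEES and 27 in DEPARTMENTS.
est = join_cardinality(107, 27, 11, 27)
print(est)  # 107 * 27 / 27 = 107
```

A small estimate here steers the optimizer toward a nested loop; a large one toward a hash join.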






How Oracle Calculates Cardinality



Oracle uses statistics collected via DBMS_STATS, including:


  • Number of rows
  • Number of distinct values (NDV)
  • Data distribution (histograms)
  • Null values



Example:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'HR',
    tabname => 'EMPLOYEES'
  );
END;
/
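From exactly these statistics the optimizer derives its base estimate for a simple equality predicate: selectivity is roughly 1/NDV over the non-null rows. A hedged Python sketch of that formula (the numbers are illustrative, not real HR-schema stats):

```python
# Base estimate for "WHERE col = :x" when no histogram exists:
# cardinality ~= (num_rows - num_nulls) / num_distinct
def equality_cardinality(num_rows, num_distinct, num_nulls=0):
    return round((num_rows - num_nulls) / num_distinct)

# Hypothetical EMPLOYEES stats: 107 rows, 11 distinct department_id
# values, 1 row with a NULL department_id.
print(equality_cardinality(107, 11, 1))  # (107 - 1) / 11 ~= 10
```

This is why a stale `num_rows` or `num_distinct` translates directly into a wrong row estimate.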





Checking Cardinality in Execution Plan



You can see estimated cardinality using:

EXPLAIN PLAN FOR
SELECT * FROM employees WHERE department_id = 10;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Look for the “Rows” column — this is the estimated cardinality.


For actual vs estimated:

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

This shows:


  • Estimated rows
  • Actual rows processed






Common Cardinality Issues (Real DBA Pain Points)



  • Outdated Statistics
    Old stats lead to wrong estimates.
  • Data Skew
    Uneven distribution (e.g., 90% of rows sharing one value) misleads the optimizer.
  • Missing Histograms
    Without histograms, Oracle assumes uniform distribution.
  • Complex Predicates
    Functions, expressions, or correlated columns reduce accuracy.
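The skew problem is worth making concrete: without a histogram the optimizer spreads rows evenly across distinct values, while a frequency histogram records the real per-value counts. A small Python sketch with a made-up skewed column:

```python
# Skewed column: 90% of the 1,000 rows share the value 'A'.
data = ['A'] * 900 + ['B'] * 50 + ['C'] * 50

num_rows = len(data)
ndv = len(set(data))  # 3 distinct values

# Uniform assumption (no histogram): every value gets num_rows / NDV.
uniform_estimate = num_rows / ndv  # ~333 for ANY value, including 'A'

# Frequency histogram: actual per-value counts, so the estimate is exact.
histogram = {v: data.count(v) for v in set(data)}
print(round(uniform_estimate))  # 333, badly wrong for 'A'
print(histogram['A'])           # 900, the histogram-aware estimate
```

An estimate of 333 instead of 900 is the kind of gap that flips a plan from a full scan to an index scan, or vice versa.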






Fixing Cardinality Problems



  • Refresh Statistics Regularly


EXEC DBMS_STATS.GATHER_SCHEMA_STATS('HR');


  • Use Histograms for Skewed Data


BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'HR',
    tabname    => 'EMPLOYEES',
    method_opt => 'FOR COLUMNS department_id SIZE 254'
  );
END;
/


  • Use Extended Statistics (for correlated columns)


SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
  ownname   => 'HR',
  tabname   => 'EMPLOYEES',
  extension => '(department_id, job_id)'
) FROM dual;

Re-gather the table's statistics afterwards so the new column group is actually populated.


  • Monitor Execution Plans Regularly






Cardinality vs Selectivity



  • Cardinality → Number of rows returned
  • Selectivity → Fraction of rows filtered



Example:


  • Table has 1,000 rows
  • Query returns 10 rows
    • Cardinality = 10
    • Selectivity = 10 / 1000 = 0.01
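The same relationship in two lines of Python (same numbers as above), showing that selectivity and cardinality are just two views of one estimate:

```python
num_rows, returned = 1000, 10

selectivity = returned / num_rows       # fraction of rows kept: 0.01
cardinality = selectivity * num_rows    # back to the row count: 10
print(selectivity, cardinality)
```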






Final Thoughts



Cardinality is the backbone of Oracle’s cost-based optimizer. If the estimates are correct, Oracle will almost always choose the best execution plan. If they are wrong, even the most well-written SQL can perform poorly.


As a DBA, focusing on accurate statistics, understanding data distribution, and validating execution plans is key to ensuring optimal performance. In modern Oracle versions, features like HyperLogLog and adaptive optimization help improve estimates, but the responsibility still lies with us to ensure the foundation—statistics—is solid.

