SQL: SAMPLE statement

I just read an article using the SAMPLE statement. I couldn’t really recall this SQL syntax, and immediately thought this had to me a new feature. After googling I realized that this was allready available in 8i. I don’t have a 8i version available, but did a small test in 9i.

drop table t;
create table t as select rownum id, rpad('x',100) data from dual connect by rownum < 10000; 
alter table t add constraint t_pk primary key (id); 
begin
     dbms_stats.gather_table_stats(USER, 't', estimate_percent=>100);
end;
/

select count(*) from t sample(10);

Result - first run:
----------------------
  COUNT(*)
----------
      1043
1 row selected.

Result - second run:
-----------------------
  COUNT(*)
----------
       974
1 row selected.

In the 9i documentation I found the following description of the statement:

The sample_clause lets you instruct Oracle to select from a random sample of rows from the table, rather than from the entire table.

Syntax 9i:

SAMPLE <BLOCK> (sample_percent)

Syntax from 10g:

SAMPLE [BLOCK] (sample_percent) [ SEED (seed_value)]

BLOCK instructs the database to attempt to perform random block sampling instead of random row sampling.

You can specify the SEED clause to instruct the database to attempt to return the same sample from one execution to the next. The seed_value must be an integer between 0 and 4294967295. If you omit this clause, then the resulting sample will change from one execution to the next. Using seed(1) will always return the same value (see below). Using sample without a seed generates a (possible) new sample every time.

Examples:

SQL> select count(*) from t sample(10) seed (1);

  COUNT(*)
----------
      1028

SQL> select count(*) from t sample(10) seed (2);

  COUNT(*)
----------
       981

SQL> select count(*) from t sample(10) seed (3);

  COUNT(*)
----------
       978

SQL> select count(*) from t sample(10) seed (4);

  COUNT(*)
----------
       980

SQL> select count(*) from t sample(10) seed (5);

  COUNT(*)
----------
      1024

SQL> select count(*) from t sample(10) seed (1);

  COUNT(*)
----------
      1028

SQL>
SQL> select count(*) from t sample(10) seed (1) where id <5000;

  COUNT(*)
----------
       496

As you see in the last example, Oracle will first apply the where filter then fetch a 10 percent sample.

Unfortunatly there are some limitations of the SAMPLE clause.

Post a Comment

Your email is never published nor shared. Required fields are marked *