Showing posts with label select avg. Show all posts
Showing posts with label select avg. Show all posts

Tuesday, April 10, 2012

Calculating Averages (Part 3)

I went to a presentation once which was attended by a few hundred people. The speaker suggested that, if you had two averages, you could calculate an overall average by taking the mid-point between the first two results. Unfortunately this almost always gives the wrong result. If you want to work out an overall average, you need to return to the source data and recalculate. You can see what I mean in the example below. A table is created for a small group of people containing the name, gender and height for each person. The average height of the men is calculated. The average height of the women is calculated. The average of the two averages is then calculated but this is a meaningless figure as it does not take into account that there are more men than women in the group. The correct overall average is then calculated by returning to the source data:

SQL> create table employee_heights
  2  (name varchar2(10),
  3  gender varchar2(1),
  4  height_in_cm number)
  5  /

Table created.

SQL> insert into employee_heights values ('John','M',170)
  2  /

1 row created.

SQL> insert into employee_heights values ('Susan','F',169)
  2  /

1 row created.

SQL> insert into employee_heights values ('Andrew','M',190)
  2  /

1 row created.

SQL> insert into employee_heights values ('Roger','M',180)
  2  /

1 row created.

SQL> insert into employee_heights values ('Peter','M',175)
  2  /

1 row created.

SQL> insert into employee_heights values ('Jane','F',160)
  2  /

1 row created.

SQL> insert into employee_heights values ('Richard','M',177)
  2  /

1 row created.

SQL> insert into employee_heights values ('Edward','M',185)
  2  /

1 row created.

SQL> insert into employee_heights values ('Neil','M',186)
  2  /

1 row created.

SQL> insert into employee_heights values ('Dawn','F',165)
  2  /

1 row created.

SQL> select avg(height_in_cm) Average_Male_Height
  2  from employee_heights
  3  where gender = 'M'
  4  /

AVERAGE_MALE_HEIGHT
-------------------
         180.428571

SQL> select avg(height_in_cm) Average_Female_Height
  2  from employee_heights
  3  where gender = 'F'
  4  /

AVERAGE_FEMALE_HEIGHT
---------------------
           164.666667

SQL> select
  2  ((select avg(height_in_cm)
  3  from employee_heights
  4  where gender = 'M')
  5  +
  6  (select avg(height_in_cm)
  7  from employee_heights
  8  where gender = 'F'))
  9  /2 Incorrect_Average_Height
10  from dual
11  /

INCORRECT_AVERAGE_HEIGHT
------------------------
              172.547619

SQL> select avg(height_in_cm) Correct_Average_Height
  2  from employee_heights
  3  /

CORRECT_AVERAGE_HEIGHT
----------------------
                 175.7

SQL>

Tuesday, April 05, 2011

Calculating Averages (Part 2)

Go to part 1

(Tested on Oracle 9 but probably not on the database used for part 1.)

In Calculating Averages (Part 1) I ran the following SQL:

select avg(num_rows) from dba_tables
where num_rows is not null;

Since then I have read that the where clause was unnecessary as Oracle does not include null values when it works out averages. I decided to see if this was true. First I checked that there were both null and not null values in the num_rows column in dba_tables:

SQL> select count(*) from dba_tables
  2  where num_rows is null;

  COUNT(*)
----------
       459

SQL> select count(*) from dba_tables
  2  where num_rows is not null;

  COUNT(*)
----------
       366

SQL>

Incidentally, you can combine the two statements above into one by using decode:

SQL> select
  2  decode(num_rows,null,'NULL','NOT NULL') row_count,
  3  count(*)
  4  from dba_tables
  5  group by decode(num_rows,null,'NULL','NOT NULL');

ROW_COUNT   COUNT(*)
--------- ----------
NOT NULL         366
NULL             459

SQL>

Then I calculated the average as before:

SQL> select avg(num_rows) from dba_tables
  2  where num_rows is not null;

AVG(NUM_ROWS)
-------------
   783867.724

SQL>

Finally, I omitted the where clause and the result was the same. This proves that avg ignores null values:

SQL> select avg(num_rows) from dba_tables;

AVG(NUM_ROWS)
-------------
   783867.724

SQL>

Wednesday, January 26, 2011

Calculating Averages (Part 1)

You can calculate average values with the AVG function as shown below. The second statement double checks the answer:

SQL> select avg(num_rows)
  2  from dba_tables
  3  where num_rows is not null
  4  /

AVG(NUM_ROWS)
-------------
   153051.912

SQL> select sum(num_rows) / count(*)
  2  from dba_tables
  3  where num_rows is not null
  4  /

SUM(NUM_ROWS)/COUNT(*)
----------------------
            153051.912

SQL>

Footnote: Since writing this post I have noticed that it is slightly misleading.
Go to part 2 for details.