Tuesday, April 10, 2012

Calculating Averages (Part 3)

I went to a presentation once which was attended by a few hundred people. The speaker suggested that, if you had two averages, you could calculate an overall average by taking the mid-point between the first two results. Unfortunately this almost always gives the wrong result. If you want to work out an overall average, you need to return to the source data and recalculate. You can see what I mean in the example below. A table is created for a small group of people containing the name, gender and height for each person. The average height of the men is calculated. The average height of the women is calculated. The average of the two averages is then calculated but this is a meaningless figure as it does not take into account that there are more men than women in the group. The correct overall average is then calculated by returning to the source data:

SQL> create table employee_heights
  2  (name varchar2(10),
  3  gender varchar2(1),
  4  height_in_cm number)
  5  /

Table created.

SQL> insert into employee_heights values ('John','M',170)
  2  /

1 row created.

SQL> insert into employee_heights values ('Susan','F',169)
  2  /

1 row created.

SQL> insert into employee_heights values ('Andrew','M',190)
  2  /

1 row created.

SQL> insert into employee_heights values ('Roger','M',180)
  2  /

1 row created.

SQL> insert into employee_heights values ('Peter','M',175)
  2  /

1 row created.

SQL> insert into employee_heights values ('Jane','F',160)
  2  /

1 row created.

SQL> insert into employee_heights values ('Richard','M',177)
  2  /

1 row created.

SQL> insert into employee_heights values ('Edward','M',185)
  2  /

1 row created.

SQL> insert into employee_heights values ('Neil','M',186)
  2  /

1 row created.

SQL> insert into employee_heights values ('Dawn','F',165)
  2  /

1 row created.

SQL> select avg(height_in_cm) Average_Male_Height
  2  from employee_heights
  3  where gender = 'M'
  4  /

AVERAGE_MALE_HEIGHT
-------------------
         180.428571

SQL> select avg(height_in_cm) Average_Female_Height
  2  from employee_heights
  3  where gender = 'F'
  4  /

AVERAGE_FEMALE_HEIGHT
---------------------
           164.666667

SQL> select
  2  ((select avg(height_in_cm)
  3  from employee_heights
  4  where gender = 'M')
  5  +
  6  (select avg(height_in_cm)
  7  from employee_heights
  8  where gender = 'F'))
  9  /2 Incorrect_Average_Height
10  from dual
11  /

INCORRECT_AVERAGE_HEIGHT
------------------------
              172.547619

SQL> select avg(height_in_cm) Correct_Average_Height
  2  from employee_heights
  3  /

CORRECT_AVERAGE_HEIGHT
----------------------
                 175.7

SQL>

No comments:

Post a Comment