Extra information about each student

If a principal is looking at sets of marks from different classes in the school, little additional information may be known about the students in those classes. However, the teacher of a class will know much more about each student than a single mark.

How can additional information about the students be linked to a dot plot or stem and leaf plot? There are more advanced statistical methods that link specific types of additional information to a set of class marks. (See the full version of CAST if you are interested.) On this page, we consider two simple methods.

Student names

The simplest type of extra information is a textual label — the name of the student.

The stacked dot plot below shows the marks of 25 students.

Drag the mouse over the crosses to see the names of the students.

Although this type of dynamic display cannot be easily duplicated in Excel, it is easy to annotate a display by writing (by hand if necessary) the names of the more extreme students on a dot plot or stem and leaf plot.
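As a rough sketch of this kind of annotation, the Python code below builds a text stem and leaf plot and writes the names of the most extreme students beside their rows. The student names, the marks and the choice of labelling the two lowest and two highest marks are all assumptions made for illustration; they are not the data shown in the display above.

```python
# Hypothetical marks; the names and values are invented for illustration.
marks = {"Ana": 42, "Ben": 67, "Cho": 71, "Dev": 58,
         "Eli": 93, "Fay": 64, "Gus": 69, "Hoa": 35}


def stem_and_leaf(marks, k=2):
    """Return the rows of a stem and leaf plot (stems are tens digits).

    The k lowest and k highest students are named beside their rows.
    """
    ranked = sorted(marks.items(), key=lambda item: item[1])
    extremes = dict(ranked[:k] + ranked[-k:])  # students to label
    first_stem = ranked[0][1] // 10
    last_stem = ranked[-1][1] // 10
    lines = []
    for stem in range(first_stem, last_stem + 1):
        row = [(name, mark) for name, mark in ranked if mark // 10 == stem]
        leaves = "".join(str(mark % 10) for _, mark in row)
        labels = ", ".join(name for name, _ in row if name in extremes)
        line = f"{stem} | {leaves}"
        if labels:
            line += f"   <- {labels}"
        lines.append(line)
    return lines


lines = stem_and_leaf(marks)
print("\n".join(lines))
```

Only the extreme rows carry names, which keeps the display readable when the class is large.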

Prior grouping of students

Occasionally the values in a dot plot or stem and leaf plot separate into clusters, but this is rare. However, we sometimes know beforehand that the individuals belong to two or more groups. For example, a principal may want to compare results from two or more different classes (or years), or a teacher may want to compare the performance of the boys and the girls in a class.

In a dot plot, different colours or symbols can be used to distinguish the groups.
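The idea of one symbol per group can be sketched with a text version of a stacked dot plot. In the Python code below, the marks (out of 10), the group labels A, B and C, and the symbols chosen for them are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (mark, group) pairs, invented for illustration only.
data = [(5, "A"), (6, "B"), (6, "A"), (7, "C"), (7, "B"), (7, "A"), (9, "C")]
symbols = {"A": "x", "B": "o", "C": "+"}  # one symbol per group (an assumption)


def stacked_dot_plot(data, symbols):
    """Return text rows of a stacked dot plot: top row first, axis last."""
    lo = min(mark for mark, _ in data)
    hi = max(mark for mark, _ in data)
    stacks = defaultdict(list)
    for mark, group in data:
        stacks[mark].append(symbols[group])  # stack symbols on each mark
    height = max(len(stack) for stack in stacks.values())
    rows = []
    for level in range(height - 1, -1, -1):
        row = "".join(
            stacks[m][level] if level < len(stacks[m]) else " "
            for m in range(lo, hi + 1)
        )
        rows.append(row.rstrip())
    # Axis: last digit of each mark (adequate for marks out of 10).
    rows.append("".join(str(m % 10) for m in range(lo, hi + 1)))
    return rows


rows = stacked_dot_plot(data, symbols)
print("\n".join(rows))
```

With these invented values the groups are mixed within each stack, so the symbols identify the group but the shapes of the individual distributions remain hard to see.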

Three classes

The diagram below shows marks for 90 students in three classes in a school.

Click the check box under the plot to distinguish the three classes with different colours.

The groups are better distinguished if the values are separated vertically. In effect, this shows a separate dot plot for each group above the same axis.
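A minimal sketch of this vertical separation is given below: each group gets its own stacked dot plot, and all of them are drawn against one shared axis. The class names and marks (out of 10) are hypothetical and are not the data in the displays on this page.

```python
# Hypothetical marks out of 10 for three classes, invented for illustration.
classes = {
    "Class A": [5, 6, 7, 7],
    "Class B": [6, 7, 8, 8, 9],
    "Class C": [3, 5, 7, 9],
}


def grouped_dot_plots(classes):
    """Return text rows: one stacked dot plot per group, then a shared axis."""
    lo = min(min(values) for values in classes.values())
    hi = max(max(values) for values in classes.values())
    axis = range(lo, hi + 1)  # the same axis is used for every group
    lines = []
    for name, values in classes.items():
        counts = [values.count(m) for m in axis]
        for level in range(max(counts), 0, -1):
            row = "".join("x" if c >= level else " " for c in counts)
            lines.append(row.rstrip())
        lines.append("-" * len(axis) + "  " + name)  # baseline with label
    lines.append("".join(str(m % 10) for m in axis))
    return lines


lines = grouped_dot_plots(classes)
print("\n".join(lines))
```

Because every group shares the same axis limits, the position and spread of each class can be compared directly by eye.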

The display below again shows the marks for students in three classes.

Click the button Animate Grouping to separate the classes.

There are clear differences between the distributions of results in the three classes, despite much overlap between these distributions. Mr Green's class tends to have higher marks than the other classes, and Mrs Paul's class is the most variable.
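Statements such as "tends to have higher marks" and "most variable" are read from the plot by eye, but they can be checked informally with simple summaries. The sketch below uses the median as a measure of centre and the range as a measure of spread; both choices, and the hypothetical class names and marks, are assumptions for illustration rather than part of this page's data.

```python
from statistics import median

# Hypothetical marks out of 10 for three classes, invented for illustration.
classes = {
    "Class A": [5, 6, 7, 7],
    "Class B": [6, 7, 8, 8, 9],
    "Class C": [3, 5, 7, 9],
}


def summarise(classes):
    """Return the median and range (max minus min) of each group's marks."""
    return {
        name: {"median": median(values), "range": max(values) - min(values)}
        for name, values in classes.items()
    }


summary = summarise(classes)
highest_centre = max(summary, key=lambda name: summary[name]["median"])
most_variable = max(summary, key=lambda name: summary[name]["range"])

for name, stats in summary.items():
    print(name, stats)
print("Highest median:", highest_centre)
print("Largest range:", most_variable)
```

These summaries only describe the marks; they say nothing about why the groups differ.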

Reasons for differences

When there is a difference between the distributions of marks in different groups, extreme care must be taken when interpreting why the differences exist. Simple statistical methods such as dot plots give no indication of which of the many possible reasons for the groups being different is the most important.

It would be wrong to immediately conclude that Mr Green is the best teacher of the three. The differences could arise because Mr Green has been allocated the best students. This might be because he is thought to be the best teacher. It might equally be because he is known to be the worst teacher, so the more 'difficult' students have been given to other teachers!

Significance of differences

Sometimes there is a clear difference between different groups, as in the three-class example above. However, there is often enough overlap between the distributions for us to be uncertain about whether the groups really differ in any meaningful way. On the next page, Dangers of overinterpretation, we briefly consider the issue of whether features in data are worthy of reporting.

However, a proper comparison of groups involves more advanced statistical concepts: confidence intervals and hypothesis tests. (They are beyond the scope of this module.)