Eight Features: Wuensch Response

Subject: Re: Eight Features of an Ideal Intro Stat Course
         (Response to comments by Karl L. Wuensch)

     To: EdStat-L and sci.stat.edu

   From: Donald B. Macnaughton <donmac@matstat.com>

   Date: Sunday May 9, 1999

     Cc: Karl L. Wuensch <PSWUENSC@ECUVM.CIS.ECU.EDU>

Referring to a November 25 post of mine, Karl Wuensch writes (on 
November 26):

> Donald Macnaughton ... suggests:
>
>> ... I recommend that statistics teachers omit discussing uni-
>> variate distributions near the beginning of the introductory
>> course.  I recommend that teachers instead concentrate on dis-
>> cussing relationships between variables.
>
>    Don goes on to explain why (univariate is boring, bivariate 
> not) and challenges us to describe univariate exercises that
> are not boring.  I do cover univariate distributions before I
> get on to bivariate and multivariate distributions, and do warn
> my students that these can be boring compared to what is to
> follow.  

To illustrate why univariate distributions are boring compared to 
relationships between variables, suppose that as statisticians we 
are interested in some real-world variable, which I shall call Y.  
We can empirically study Y in two different ways:

1. We can study the *relationship* between Y (as the response 
   variable) and other appropriate variables (as predictor vari-
   ables).

2. We can study the univariate distribution of Y in isolation.

The second way of studying Y is simply a degenerate case of the 
first.  That is, studying the univariate distribution of ANY 
variable Y is rigorously equivalent (mathematically and empiri-
cally) to studying the relationship between Y and a set of pre-
dictor variables in the limiting case when the number of predic-
tor variables is reduced to zero.
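
One way to make the degenerate-case idea concrete is with a
least-squares line.  The choice of least squares is an assumption
of this sketch, and it is only one of many ways of studying a
relationship.  The Python sketch below uses made-up numbers, and
the function name fit_least_squares is hypothetical.  It fits a
response variable y with one predictor variable and with zero
predictor variables.  With zero predictors the fitted intercept is
simply the mean of y, and the residual variance is simply the
sample variance of y, which are the usual univariate summaries.

```python
import statistics

# Made-up illustrative data (not from any study).
x = [1, 2, 3, 4, 5]   # a predictor variable
y = [2, 4, 5, 4, 5]   # the response variable Y


def fit_least_squares(y, x=None):
    """Least-squares fit of y on one predictor x, or on no predictors.

    With x=None the model is y = b0 + error (zero predictors).
    With x given the model is y = b0 + b1*x + error.
    """
    n = len(y)
    y_bar = sum(y) / n
    if x is None:
        # Zero predictors: the best constant prediction is the mean.
        b0, b1, n_params = y_bar, 0.0, 1
        fitted = [b0] * n
    else:
        x_bar = sum(x) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        b1 = sxy / sxx
        b0 = y_bar - b1 * x_bar
        n_params = 2
        fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return {
        "intercept": b0,
        "slope": b1,
        # Residual sum of squares divided by residual degrees of freedom.
        "residual_variance": sse / (n - n_params),
    }


zero_predictor_fit = fit_least_squares(y)
one_predictor_fit = fit_least_squares(y, x)
```

For these numbers the zero-predictor fit agrees with
statistics.mean(y) and statistics.variance(y), while the
one-predictor fit gives an intercept of about 2.2 and a slope of
about 0.6.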

Since the first way of studying Y subsumes the second, and since 
the first way generally gives us much better ability to predict 
and control the values of Y, the first (relationship-between-
variables) way of studying Y is more interesting than the second.  
That is, studying univariate distributions is boring compared to 
studying relationships between variables.

(I list the topics I include under the general topic of univari-
ate distributions in appendix A.  Note that my recommendation to 
omit discussing univariate distributions near the beginning of 
the introductory course applies only for students who are NOT ma-
joring in statistics -- students majoring in statistics do need 
to understand univariate distributions early in their careers.)


> I try to choose variables which my students will find interest-
> ing because they want to compare their own score on that vari-
> able with those of other persons.  For example, I collect from
> my students the first week of class a crude measure of how
> frightened of statistics they are.  They always seem to be in-
> terested in where their score falls in that distribution.

Karl gives an example of an empirical univariate distribution 
that is clearly interesting to students.  It is interesting be-
cause most students wish to know where they (as individuals) fall 
in various distributions.  That is, they wish to know how "nor-
mal" or how "average" they are.  

But although the study of the univariate distribution of "fear of 
statistics" is clearly interesting, there are two reasons why I 
believe this study fails to provide an effective example of the 
practical use of statistics:  

First, most statisticians view the field of statistics as a set 
of techniques to help us generalize from findings (of patterns) 
in a sample to correct statements about a population.  That is, 
in almost every use of statistics in empirical research the re-
searcher is not merely interested in the entities in the sample.  
Instead, he or she hopes to be able to make useful generaliza-
tions from the information gathered in the sample to the other 
entities in the population.

However, Karl is focusing on where each individual student falls 
in the univariate distribution of the fear scores.  Thus the ex-
ample is NOT generalizing from findings in a sample to statements 
about a population.  Instead, the example is "particularizing" 
from findings in the sample to statements about *individuals in 
the sample*, which is backwards.  Thus the example is not a typi-
cal use of statistics in empirical research.

I believe that Karl's students are interested in the univariate 
distribution in his example because it teaches them about *them-
selves*.  But (with apologies to Karl, who clearly has good in-
tentions) because the example is backwards, it does not introduce 
students to any important ideas about the (standard) practical 
use of statistics.

                            *   *   *

A second reason why study of the univariate distribution of "fear 
of statistics" is not an effective example of the practical use 
of statistics is that studying the univariate distribution of 
some variable in isolation usually has no obvious significant so-
cial payoff.  

By "social payoff", I mean that an effective example will provide 
some clear *basis for action* on the part of some person or group 
(Scheaffer 1992, p. 69).  I recommend that each example in an in-
troductory statistics course have an obvious social payoff be-
cause if we consistently demonstrate obvious payoffs in our exam-
ples, we are much more likely to impress students with the value 
of statistics.  

On the other hand, if our examples consistently LACK obvious pay-
offs, intelligent students will conclude that our field special-
izes in dealing with frivolous problems.

For the study of the univariate distribution of "fear of statis-
tics" we can ask:

    What is the payoff (basis for action) for some (any) per-
    son or group of the knowledge we obtain in this example?

I suggest that it is hard to see a clear payoff for any person or 
group (beyond the students in the sample) from studying the uni-
variate distribution of "fear of statistics".  Admittedly, study 
of the distribution does increase our general knowledge.  How-
ever, I prefer not to call this vague (although sometimes useful) 
benefit a direct "payoff" because it provides no obvious basis 
for action.  

My experience suggests that examples that focus on univariate 
distributions rarely demonstrate an obvious payoff.  Can you 
think of an example of a study of a univariate distribution that 
clearly shows an obvious direct payoff?  

(I discuss a situation in which univariate distributions do pro-
vide a payoff in appendix B.  I discuss elsewhere some putative 
examples of interesting univariate distributions [1998a, app. G; 
1998b].)

On the other hand, many examples of relationships between vari-
ables clearly demonstrate significant obvious direct payoffs.  
Such examples can be readily found in all fields of empirical re-
search across science, technology, business, industry, and gov-
ernment.  For example, all proper tests of new medical treatments 
can be easily viewed as studies of relationships between vari-
ables.  All such studies, when they are successful, have clear 
social payoffs in that they provide a basis for action to improve 
human health.


>      But I also introduce, at the time we are studying univari-
> ate distributions, the notion of looking at the association be-
> tween variables.  

By the "association" between variables Karl is referring to what 
I call a "relationship" between variables.  


> For example, we segregate the "fear of stats" scores of women
> from those of men and then compare those two univariate distri-
> butions.  Of course, we are really considering the relationship
> between a dichotomous variable (sex/gender) and a continuous
> one (admitted fear of stats), 

Karl turns his fear-of-statistics example into an example of a 
more typical empirical research project by introducing the notion 
of a relationship between variables.  We can show students how if 
we find a substantial relationship (in some meaningful popula-
tion) between "gender" and "fear of statistics", we have a clear 
payoff or basis for action.  The payoff occurs in the sense that 
we (as society) can take steps either to remove the cause of the 
relationship (the cause may be sexism) or to treat the two groups 
differently in order to reduce the fear in the more fearful 
group.


> but [we] have not formally talked about point biserial correla-
> tions or independent samples t-tests and the like yet.  

Karl highlights an important fact:  It is not necessary to bring 
formal statistical procedures into the discussion to discuss re-
lationships between variables.  I recommend that teachers capi-
talize on this fact and give students a strong sense of the con-
cept of a relationship between variables before introducing ANY 
formal statistical procedures.  If we show students a broad set 
of practical examples of relationships, and if these examples are 
not encumbered by the complicated procedures of statistics, the 
students come to recognize that empirical study of relationships 
between variables is the best objective method for accurate pre-
diction and control.  

AFTER students properly understand and appreciate the usefulness 
of relationships between variables as a means to prediction and 
control, we can bring the field of statistics out onto the stage.  
We can characterize the field as a set of optimal techniques for 
studying variables and relationships between variables as a means 
to accurate prediction and control.  When presented from this 
unifying point of view, the complicated procedures of statistics 
fall more easily into place.

I further describe the approach I discuss above in two papers 
(1996, 1999).


> When we go to the lab for our first computing (Minitab) exer-
> cise, the data they have (from the Minitab handbook) is on
> pulse rates before and after exercise.  They compute change
> scores, and do some univariate descriptive statistics on the
> distribution of change scores.  Again here, we are dealing with
> a univariate distribution, but in a way that does address the
> relationship between two variables (exercise and pulse rate).

We can validly study the relationship between the variables 
"amount of exercise" and "pulse rate" in the example in terms of 
the univariate distribution of the change scores.  However, I 
suggest that this approach has four disadvantages:

1. The change-score point of view cannot easily be extended to 
   more complicated situations.

2. The change-score point of view may lead us to lose sight of 
   the original variables in the research project. 

3. The change-score point of view may lead us to lose sight of 
   the fact that the example is the simplest case of the impor-
   tant statistical procedure of repeated measurements (repeated 
   measures).

4. The change-score point of view requires that students expend 
   an extra intellectual effort. 

Appendix C discusses the four disadvantages in more detail and 
notes that the relationship-between-variables point of view of 
the pulse-rate example does not have any of the above disadvan-
tages.  

In addition to having four disadvantages, the change-score point 
of view does not appear to have any significant advantages over 
the relationship-between-variables point of view.  
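
The two points of view can be compared in a minimal Python
sketch.  The pulse rates below are invented for illustration and
are NOT the Minitab handbook data, and all the names in the code
are hypothetical.  The sketch assumes the simplest case, in which
every person is measured once before and once after exercise.

```python
# Invented pulse rates (beats per minute) for five people.
pulse_before = [64, 70, 58, 76, 82]
pulse_after = [88, 94, 70, 98, 110]


def mean(values):
    return sum(values) / len(values)


# Change-score view: derive one new variable, then summarize it alone.
change_scores = [after - before
                 for before, after in zip(pulse_before, pulse_after)]
mean_change = mean(change_scores)

# Relationship view: keep "exercise" as the predictor variable and
# "pulse rate" as the response variable, one row per measurement.
# The person index is kept so the repeated-measurements pairing survives.
records = []
for person, (before, after) in enumerate(zip(pulse_before, pulse_after)):
    records.append((person, "before", before))
    records.append((person, "after", after))


def mean_response_by_level(rows):
    """Mean pulse rate at each level of the exercise variable."""
    groups = {}
    for _person, level, pulse in rows:
        groups.setdefault(level, []).append(pulse)
    return {level: mean(pulses) for level, pulses in groups.items()}


pulse_by_exercise = mean_response_by_level(records)
difference = pulse_by_exercise["after"] - pulse_by_exercise["before"]
```

For these numbers both views give the same average difference in
pulse rate.  In the second layout the original variables stay
visible, and adding a third exercise level only means adding rows
with a new level label, whereas a single change score has no
direct analogue once there are more than two levels.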


>      Might it be possible to have our cake and eat it too, that
> is, to follow the logical order of univariate - bivariate -
> multivariate, AND to start our focus on relationships between/
> among variables at the beginning of the course?

Karl proposes a compromise approach in which relationships are 
discussed together with univariate distributions near the begin-
ning of the introductory course.  This approach is clearly possi-
ble.  Furthermore, IF the compromise approach is the best way to 
help students understand statistics, we should certainly follow 
it.  

But is the compromise approach the best way?  We can address this 
question by considering another simpler question:  

    What advantages do we receive if we discuss univariate 
    distributions *at all* at the beginning of the introduc-
    tory course?

The discussion above of the fear-of-statistics and pulse-rate ex-
amples suggests that, for each example:

- viewing the example in terms of a univariate distribution ap-
  pears to have NO significant advantages over viewing the exam-
  ple in terms of a relationship between variables

- viewing the example in terms of a univariate distribution has 
  significant DISadvantages over viewing the example in terms of 
  a relationship between variables.

Unless someone can propose significant advantages of discussing 
univariate distributions near the beginning of the introductory 
course, I suggest that we should not discuss them there.  We 
should not discuss them because they are boring (because they 
have no obvious practical uses).

                            *   *   *

If (as I suggest) discussion of univariate distributions at the 
beginning of the introductory statistics course provides no sig-
nificant advantages, why then do some teachers discuss them?  I 
see two main reasons:

One reason for discussing univariate distributions at the begin-
ning is to support an extinct need that has become a tradition.  
I reason as follows:

In the past, before the arrival of good statistical computing 
packages, a person performing a statistical analysis had to un-
derstand the mathematics of statistics in order to carry out the 
(necessarily manual) computations.  (It is almost impossible to 
perform statistical computations manually if one does not prop-
erly understand them.)  The mathematics of statistics is largely 
based on the mathematics of univariate distributions.  Thus in 
the past careful study of univariate distributions was clearly 
necessary.

Nowadays, easy-to-use computer programs are available that can 
perform all the standard statistical analyses.  Thus users of 
these analyses need no longer manually perform statistical compu-
tations.  It follows that users (including students) need no 
longer understand the mathematics of univariate distributions to 
support their (now-computerized) computations.

It is hard to see important uses of univariate distributions be-
yond the background support they give in the statistical computa-
tions underlying the study of relationships between variables.  In 
particular, only rarely does an experienced empirical researcher 
study the univariate distribution of some interesting empirical 
variable in isolation.  Instead, researchers invariably study 
their variables of interest together with some predictor vari-
ables -- that is, they study relationships between variables.


A second reason why some teachers begin with univariate distribu-
tions is that univariate distributions are (because they are a 
degenerate case) clearly simpler than relationships between vari-
ables.  The greater simplicity leads some teachers to believe 
that they should "start simple" and first cover univariate dis-
tributions before they cover relationships.  

(Some teachers may further believe that students MUST first study 
univariate distributions because [they believe] students cannot 
understand the concept of a relationship between variables until 
they have mastered the concept of a univariate distribution.  
However, this belief is incorrect because students learn to un-
derstand various relationships between variables in high-school 
science and mathematics classes, with no appeal to the concept of 
a univariate distribution, as I discuss in an earlier post 
[1998c].)

Teachers who choose to start simple with univariate distributions 
are logically correct -- univariate distributions are simpler.  
However, this approach has a serious psychological problem -- 
study of univariate distributions is boring because such study 
usually has little or no obvious payoff.  Thus many beginning 
students are alienated by univariate distributions.  This leads 
me to ask: 

    If univariate distributions are boring, provide little 
    obvious payoff, and are not necessary, what is to stop us 
    from completely bypassing discussion of univariate dis-
    tributions at or near the beginning of the introductory 
    statistics course?

If we bypass univariate distributions, and if we focus on rela-
tionships between variables as a means to accurate prediction and 
control, and if we emphasize the significant social payoffs that 
come from knowledge of such relationships, we can (because the 
material is more interesting) expect substantially greater suc-
cess at giving students a lasting appreciation of the vital role 
of our field.

-------------------------------------------------------
Donald B. Macnaughton   MatStat Research Consulting Inc
donmac@matstat.com      Toronto, Canada
-------------------------------------------------------


APPENDIX A: SUBTOPICS OF UNIVARIATE DISTRIBUTIONS

I include the following topics under the general topic of uni-
variate distributions:

- measures of the central tendency of univariate distributions 
  (e.g., mean, median)

- measures of the spread or variability of univariate distribu-
  tions (e.g., standard deviation, mean absolute deviation)

- other measures used to characterize univariate distributions 
  (e.g., other moments)

- graphical representations of univariate distributions (e.g. 
  [when only reflecting a single variable], dot plots, box plots, 
  bar charts, histograms, stem and leaf plots, density plots)

- mathematical representations of univariate distributions (e.g., 
  density functions, moment generating functions).

I believe that each of the above topics is important and belongs 
at a certain point in all students' (extended) statistical ca-
reers.  However, I recommend against teaching ANY of these topics 
at or near the beginning of the introductory course (for students 
not majoring in statistics).  Instead, as I note above, I recom-
mend that teachers first carefully cover the more important con-
cept of a relationship between variables.  After students have a 
good sense of the concept of a relationship between variables, 
discussion of univariate distributions can be introduced at ap-
propriate points (mainly in support of the study of relationships 
between variables).

I discuss the placement of discussion of univariate distributions 
in statistics courses and I propose a syllabus for the introduc-
tory course in a paper (1999, sec. 6.4 and 6.9).

(Although I recommend against covering the graphical representa-
tion of univariate distributions, I strongly recommend that 
teachers cover the graphical representation of relationships be-
tween variables early in the introductory statistics course.)


APPENDIX B: A SITUATION IN WHICH UNIVARIATE DISTRIBUTIONS DO
            PROVIDE A PAYOFF

Earlier in this post I say that studying the univariate distribu-
tion of some variable in isolation *usually* has no obvious sig-
nificant social payoff.  However, univariate distributions obvi-
ously DO provide a social payoff (i.e., a basis for action) in 
one important situation -- the situation in which an action deci-
sion is made on the basis of the univariate distribution of some 
variable.

For example, suppose we carefully survey beginning students' fear 
of statistics across a reasonable sample of institutions and stu-
dents.  And suppose our study reveals that a large proportion of 
students have high fear.  This finding might motivate various in-
terested groups to increase resources directed at reducing stu-
dents' fear.  That is, the finding provides a clear basis for ac-
tion.

But I suggest that this use of a univariate distribution as a ba-
sis for action is much less important than the use of relation-
ships between variables as a basis for action.  This is because 
in almost every situation in which we MIGHT make an action deci-
sion on the basis of a univariate distribution of some variable 
(say the variable Y), we can make a BETTER action decision on the 
basis of a closely related relationship between variables.  The 
response variable in this relationship is Y, and the predictor 
variables can be ANY other variables that we have reason to be-
lieve are related to Y.  If we empirically discover that one or 
more of the predictor variables are related to Y, we will then be 
able to use our new knowledge of the relationship to predict the 
values of Y more accurately than we could possibly predict by an 
equivalent study of the univariate distribution of Y in isola-
tion.  Assuming we have chosen the predictor variable(s) wisely, 
this improved prediction ability will give us a better basis for 
action.
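
The gain in prediction accuracy can be illustrated with a small
Python sketch.  The fear scores and the "prior training" labels
below are invented, and were chosen so that the relationship is
strong.  The function names are hypothetical, and squared
prediction error is assumed as the measure of accuracy.

```python
# Invented data: (prior training, fear-of-statistics score).
fear_records = [
    ("none", 8), ("none", 7), ("none", 9),
    ("some", 4), ("some", 5), ("some", 3),
]


def mean(values):
    return sum(values) / len(values)


def sse_univariate(rows):
    """Squared error when every score is predicted by the overall mean."""
    scores = [score for _level, score in rows]
    overall_mean = mean(scores)
    return sum((score - overall_mean) ** 2 for score in scores)


def sse_with_predictor(rows):
    """Squared error when each score is predicted by its group mean."""
    groups = {}
    for level, score in rows:
        groups.setdefault(level, []).append(score)
    group_mean = {level: mean(scores) for level, scores in groups.items()}
    return sum((score - group_mean[level]) ** 2 for level, score in rows)


error_without_predictor = sse_univariate(fear_records)
error_with_predictor = sse_with_predictor(fear_records)
```

For these invented numbers the squared prediction error is much
smaller when the predictor variable is used.  Within a single
sample the error with group means can never exceed the error with
the overall mean, so a comparison like this one only suggests a
relationship.  Whether the improvement carries over to the
population still has to be judged with a proper sample and an
appropriate statistical test.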

For example, in studying students' fear of statistics we COULD 
study the univariate distribution of "fear of statistics" in stu-
dents as a possible basis for action to reduce students' fear.  
But we can obtain a better understanding of students' fear of 
statistics by studying the *relationship* between "fear of sta-
tistics" as the response variable and other variables such as 
"gender", "socioeconomic status", "prior training", and so on, as 
predictor variables.  This better understanding will enable us to 
make better action decisions for decreasing students' fear and 
for improving the teaching of statistics.

Thus knowledgeable empirical researchers generally concentrate 
NOT on univariate distributions, but on relationships between 
variables.  Thus the study of relationships between variables is 
much more important than the study of univariate distributions.


APPENDIX C: DISADVANTAGES OF THE CHANGE-SCORE POINT OF VIEW

In the earlier discussion of Karl's pulse-rate example I identify 
four disadvantages that the "change-score" point of view of the 
example has relative to the "relationship-between-variables" 
point of view.  Here are more detailed explanations of the four 
disadvantages:

1. If we adopt a change-score point of view, we run into diffi-
   culties if we try to extend the point of view to more compli-
   cated situations.  We can see this by noting that ALL situa-
   tions that can be viewed as studying the univariate distribu-
   tion of a set of change scores can also be easily viewed as 
   studying a relationship between two variables.  (Karl's pulse-
   rate example is, as Karl notes, an instance of this fact.)  On 
   the other hand, many situations that can easily be viewed as 
   studying a relationship between variables CANNOT easily be 
   viewed as studying change scores.  (For example, while it is 
   easy to view a multi-way analysis of variance or a multi-way 
   regression in terms of the study of a relationship between 
   variables, it is hard to view either of these statistical pro-
   cedures in terms of change scores.)  Thus although the use of 
   change scores in the pulse-rate example is *valid*, it seems 
   wise to avoid this use.  Instead, it seems more reasonable to 
   view the example in terms of the broader concept of the rela-
   tionship between "amount of exercise" (as the predictor vari-
   able) and "pulse rate" (as the response variable).  

2. A second disadvantage of the change-score approach is that if 
   we focus on the derived variable "change score", we may lose 
   sight of the original three (main) variables in the example, 
   which are "amount of exercise" (called "RAN" in the Minitab 
   data), "pulse rate at time 1", and "pulse rate at time 2".  In 
   particular, in REAL empirical research (as opposed to in dis-
   cussion at the beginning of the introductory statistics 
   course) it is useful to examine the univariate distributions 
   of the pulse rates at the two exercise levels because we may 
   find important features in these distributions.  But if we fo-
   cus on the derived variable "change score", these univariate 
   distributions are hidden.

3. A third disadvantage of the change-score approach is that if 
   we focus on the derived variable "change score", we may lose 
   sight of the fact that the example represents the simplest 
   case of a set of powerful techniques for studying relation-
   ships between variables called "repeated measurements" (or 
   "repeated measures").  Under the procedure of repeated meas-
   urements we measure the values of the response variable and 
   predictor variable(s) more than once in each of the entities 
   in the research project -- typically under different condi-
   tions each time we measure.  The procedure of repeated meas-
   urements is frequently used to increase statistical power in 
   experiments in psychology and medicine when the cost of ac-
   quiring experimental entities (usually people) is high and 
   when, in addition, it is possible for the entities to partici-
   pate in more than one treatment condition without compromising 
   the integrity of the experiment (Winer 1971, ch. 4 and 7; SAS 
   Institute 1990, pp. 951 - 958).  It makes sense to introduce 
   the pulse-rate example in terms that we can easily extend when 
   it comes time to discuss other examples of repeated measure-
   ments.  I suggest that the concepts of repeated measurements 
   are best explained in terms of relationships between vari-
   ables, and it is difficult to explain any but the simplest use 
   of repeated measurements in terms of change scores.

4. Finally (in a point that is related to the first point above), 
   if we teach the pulse-rate example in terms of the change-
   score point of view, we force students to learn a separate 
   point of view.  Learning the change-score point of view re-
   quires an extra intellectual effort from beginning students in 
   a situation that many teachers agree is already difficult for 
   them.  We can save students this extra effort by presenting 
   the example in terms of the more general relationship-between-
   variables point of view.  


REFERENCES

Macnaughton, D. B. 1996. "The entity-property-relationship ap-
   proach to statistics:  An introduction for students." Avail-
   able at http://www.matstat.com/teach/

Macnaughton, D. B. 1998a. "Eight features of an ideal introduc-
   tory statistics course."  Available at 
   http://www.matstat.com/teach/

Macnaughton, D. B. 1998b. "Re: Eight features of an ideal intro 
   stat course (response to comments by Gary Smith)."  Posted to 
   sci.stat.edu and EdStat-L on November 23, 1998.  Available at 
   http://www.matstat.com/teach/p0036.htm

Macnaughton, D. B. 1998c. "Re: Eight features of an ideal intro 
   stat course (response to comments by Dennis Roberts)."  Posted 
   to sci.stat.edu and EdStat-L on July 23, 1998.  Available at 
   http://www.matstat.com/teach/p0033.htm

Macnaughton, D. B. 1999. "The introductory statistics course:  
   The entity-property-relationship approach." Available at 
   http://www.matstat.com/teach/

SAS Institute Inc. 1990. _SAS/STAT user's guide, version 6, vol-
   ume 2._ 4th ed. Cary, NC: author.

Scheaffer, R. L. 1992. "Data, Discernment and Decisions: An Em-
   pirical Approach to Introductory Statistics," in _Statistics 
   for the Twenty-First Century, MAA Notes No. 26,_ ed. by F. 
   Gordon and S. Gordon, Washington, DC:  Mathematical Associa-
   tion of America, pp. 69-82.

Winer, B. J. 1971. _Statistical principles in experimental de-
   sign._ 2d ed. New York: McGraw-Hill.
