EPR Approach: Response to Scheiner

Subject: Re: How Should We *Motivate* Students in Intro Stat?

     To: EdStat-L Statistics Education Discussion List,
         sci.stat.edu Usenet Newsgroup

   From: Donald B. Macnaughton <donmac@matstat.com>
                      (formerly donmac@hookup.net)

   Date: Monday February 24, 1997

     cc: Herman Rubin <hrubin@stat.purdue.edu>, 
         Samuel M. Scheiner <sam.scheiner@asu.edu>, 
         John R. Vokey <vokey@hg.uleth.ca>

John Vokey (1996) recommends that

> the focus [of the introductory statistics course for students
> who are not statistics majors] be narrowed ... to a particular
> science or discipline

Herman Rubin (1996) replies that

> Statistics is not different in different fields.

Sam Scheiner (1996) responds

> While statistics does not differ, motivations differ, interests
> differ, and the primary types of methods differ. 

Sam implies that since motivations, interests, and methods differ 
in different fields, we should structure the introductory course 
differently for students in different fields--to suit the differ-
ences in the motivations, interests, and methods.  

Sam's argument hinges on the premise 

    Motivations, interests, and methods differ in different 
    fields of empirical research.  

Is this important premise true?  Let us consider each of Sam's 
three areas of apparent difference.


DOES MOTIVATION DIFFER IN DIFFERENT FIELDS?
The first area in which Sam believes fields differ is 
"motivations".  Three kinds of motivation are relevant here:
1. the motivation that stimulates researchers to *engage* in em-
   pirical research
2. the motivation that stimulates researchers to *use statistical 
   methods* in empirical research
3. the motivation we present to *students* in order to interest 
   them in the field of statistics (which was the original sub-
   ject of this thread).

I shall concentrate on the second kind of motivation--the motiva-
tion that stimulates researchers to use statistical methods.  The 
second kind of motivation is important because it seems most rel-
evant in determining whether we should structure the introductory 
course differently in different fields to take account of differ-
ent motivations.  That is, if the motivation for using statisti-
cal methods turns out to be the same in every field, we may not 
need to take account of *any* motivational differences in the 
different fields of empirical research when we are designing the 
introductory course.

Thus the question of interest is

    Does the motivation for using statistical methods differ 
    from one field of empirical research to the next?

I believe the answer to this question depends on one's point of 
view.  That is, some points of view of empirical research clearly 
imply that the motivation for using statistical methods differs 
in different fields.  However, I believe there is also a unifying 
point of view that implies the different fields of empirical re-
search all have the *same* motivation for using statistical meth-
ods.  This point of view rests on the following premises: 

1. Every field of empirical research has a fundamental interest 
   in predicting and controlling the values of the variables that 
   are studied in that field (Macnaughton 1996a).  

2. It is possible to fully characterize almost all the statisti-
   cal methods as methods for studying variables and relation-
   ships between variables as a means to accurately predicting 
   and controlling the values of variables (Macnaughton 1996a).

3. Therefore, it is reasonable to view the *motivation* for using 
   statistical methods in empirical research as being to study 
   variables and relationships between variables as a means to 
   predicting and controlling the values of variables.

From this unifying point of view, the motivation for using sta-
tistical methods in empirical research does not differ in differ-
ent fields of empirical research.  Therefore, if we take this 
point of view, we need not structure the introductory course dif-
ferently in different fields to take account of different motiva-
tions.


DOES INTEREST DIFFER IN DIFFERENT FIELDS?
The second area in which Sam believes fields differ is 
"interests".  Two types of interest are relevant here:
- the interest in *subject matter* in different fields
- the interest in *statistical questions* in different fields.

Certainly interest in subject matter differs in different fields.  
For example, the field of physics is interested in various prop-
erties of particles, waves, and energy, while the field of medi-
cine is interested in various properties of the human body.  I 
shall assume the differences in subject-matter interest in dif-
ferent fields are a given in the present discussion, and I shall 
discuss them no further.  

Now let us consider the important question whether interest in 
*statistical questions* differs in different fields of empirical 
research.  Here, if we view the use of statistical methods in em-
pirical research in terms of entities, properties, variables, and 
relationships, it appears that interest in statistical questions 
does *not* differ in different fields.  Instead, it appears that 
one can usefully interpret almost all empirical research projects 
that use statistical methods by filling in the blanks in the fol-
lowing schema:

population of entities: ________________________________________

     response variable: ________________________________________

 predictor variable(s): ________________________________________

 statistical questions: 1. Is there a relationship between the 
                           response variable and the predictor 
                           variable(s) in the entities in the 
                           population?
                        2. If there is a relationship, how can we 
                           best predict or control the values of 
                           the response variable in new entities 
                           from the population on the basis of 
                           the relationship?
                        3. How accurate will the prediction or 
                           control be?

For example, if we consider a particular empirical research pro-
ject (experiment) to study a new treatment for AIDS, the com-
pleted schema might appear as follows:

population of entities: patients with some specified set of AIDS 
                        symptoms 

     response variable: a measure of the "amount of AIDS" in the 
                        patients

    predictor variable: a measure of the amount of the new treat-
                        ment administered to the patients  
                        (Often, for maximum power, only two 
                        widely discrepant amounts of treatment 
                        are used in this type of experiment.)

 statistical questions: 1. Is there a relationship in the pa-
                           tients between "amount of treatment" 
                           and "amount of AIDS"?  That is, does 
                           AIDS vary at all in sync with the var-
                           iation in treatment?
                        2. If there is a relationship, how can we 
                           best control (i.e., minimize) the 
                           amount of AIDS in patients on the ba-
                           sis of the relationship?
                        3. How accurate will the control of AIDS 
                           be?
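
The schema can also be written down as a small data structure.  
The following Python sketch is only an illustration and is not 
part of the approach as published.  The class name 
ResearchSchema, its field names, and the method 
statistical_questions are hypothetical names chosen for this 
example.  The sketch records the three blanks of the schema and 
generates the three statistical questions from them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResearchSchema:
    """Hypothetical container for the three blanks of the schema."""
    population: str
    response: str
    predictors: List[str]

    def statistical_questions(self) -> List[str]:
        """Return the three general questions, filled in for this project."""
        preds = " and ".join(self.predictors)
        return [
            f"Is there a relationship between {self.response} and "
            f"{preds} in {self.population}?",
            "If there is a relationship, how can we best predict or "
            f"control {self.response} in new members of the population "
            "on the basis of the relationship?",
            "How accurate will the prediction or control of "
            f"{self.response} be?",
        ]


# The AIDS example from the text, entered into the schema.
aids_study = ResearchSchema(
    population="patients with some specified set of AIDS symptoms",
    response='"amount of AIDS"',
    predictors=['"amount of treatment"'],
)

for number, question in enumerate(aids_study.statistical_questions(), 1):
    print(f"{number}. {question}")
```

Filling in the same structure for a project from another field 
(say, physics or economics) changes only the three strings.  The 
three questions keep the same form.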

I have studied many hundreds of empirical research projects that 
use statistical methods in a wide range of fields.  I have found 
that almost all these research projects can be usefully inter-
preted in terms of the above schema.  This leads to the following 
conjecture:

    Almost all empirical research projects that use statisti-
    cal methods in all fields of empirical research can be 
    usefully interpreted in terms of the above general 
    schema.

If the conjecture is correct, and if we take the point of view of 
the schema, the same three statistical questions are broadly ad-
dressed across all fields of empirical research.  Thus, from the 
point of view of the schema, interest in statistical questions 
does not differ in different fields of empirical research.  
Therefore, if we take the point of view of the schema, we need 
not structure the introductory course differently in different 
fields to take account of different interests in statistical 
questions.

(Readers who are unfamiliar with the schema are encouraged to try 
to interpret familiar empirical research projects in terms of the 
schema.  Readers should find that most research projects that use 
statistical methods can be interpreted in terms of the schema.  
Readers who are unaware of the wide applicability of the approach 
should, after study, find that the approach substantially in-
creases their understanding of the use of statistical methods in 
empirical research.

(On the other hand, some readers may believe the conjecture is 
incorrect.  That is, some readers may believe there is a reason-
ably broad set of empirical research projects that use statisti-
cal methods and that *cannot* be interpreted in terms of the 
schema.  These readers are invited to describe these research 
projects in this newsgroup.  Also, I will discuss in the news-
group, without naming the authors, any interesting examples sent 
to me by e-mail.

(Note that the schema does not cover *all* empirical research 
projects--only research projects that are appropriate for analy-
sis with statistical methods.  However, it appears that most re-
search projects that are not appropriate for analysis with sta-
tistical methods can also be characterized using the same unify-
ing terminology.  That is, most [all?] empirical research pro-
jects that are not appropriate for analysis with statistical 
methods can be characterized as identifying and studying enti-
ties, relationships between entities, or properties of entities.  
What else does empirical [scientific] research do beyond what I 
have described?

(Further reading:  In the appendix I describe two types of 
[rarely occurring] research projects that use statistical methods 
and that cannot easily be interpreted in terms of the schema.  I 
rebut some seeming counterexamples in a paper [1996a Appendix 
B].)


DO PRIMARY TYPES OF METHODS DIFFER IN DIFFERENT FIELDS?
The third area in which Sam believes fields differ is the 
"primary types of methods".  Two types of methods are relevant 
here:
- the primary *non-statistical* methods used in different fields 
  of empirical research
- the primary *statistical* methods used in different fields of 
  empirical research.

Of course, the primary *non-statistical* methods usually differ 
from one field of empirical research to the next.  For example, 
the primary methods of chemistry involve (among other things) 
measuring properties of substances.  These methods are quite dif-
ferent from the primary methods of experimental psychology, which 
involve (among other things) measuring properties of the behavior 
of living organisms.

Similarly, the primary *statistical* methods often differ in dif-
ferent fields of empirical research, the main difference being 
that some fields (e.g., astronomy and economics) rely heavily on 
observational statistical methods, while other fields (e.g., 
physics and biology) rely heavily on experimental statistical 
methods.  

But apart from the high-level difference between observational 
and experimental statistical methods, there is not much differ-
ence in the use of statistical methods in different fields.  That 
is, if we look across the fields of research that use observa-
tional methods, we find that most research projects that use ob-
servational methods choose linear regression to study relation-
ships between variables if the underlying assumptions of linear 
regression are adequately satisfied, and choose from among vari-
ous other statistical methods if the assumptions are not ade-
quately satisfied.  (The other methods include non-linear regres-
sion, robust regression, frequency table analysis, logistic re-
gression, and so on.)
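
As a rough illustration of how linear regression addresses the 
three statistical questions of the schema, here is a minimal 
Python sketch.  The data are simulated, and the sample size, the 
true coefficients, and all the variable names are assumptions 
made only for this example.  The sketch fits a least-squares line 
by hand, using no statistical library.

```python
import math
import random

random.seed(1)  # fixed seed so the simulated data are reproducible

# Simulated observational data (an assumption for illustration only):
# the response y depends linearly on the predictor x, plus noise.
x = [random.uniform(0, 10) for _ in range(50)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares estimates of the line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual standard error: a rough measure of the typical size of
# a prediction error.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in residuals)
resid_se = math.sqrt(sse / (n - 2))

# Question 1: is there a relationship?  t statistic for the slope.
t_slope = slope / (resid_se / math.sqrt(sxx))


# Question 2: how do we predict the response for a new entity?
def predict(new_x):
    return intercept + slope * new_x


print(f"slope = {slope:.3f}, t = {t_slope:.2f}")
print(f"predicted y at x = 4: {predict(4):.3f}")
# Question 3: how accurate will the prediction be?
print(f"residual standard error = {resid_se:.3f}")
```

A t statistic far from zero suggests a relationship between the 
two variables, the fitted line gives the prediction, and the 
residual standard error indicates roughly how accurate the 
prediction will be.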

Similarly, if we look across the fields of research that use ex-
perimental methods, we find that most research projects that use 
experimental methods choose analysis of variance and response 
surface methods to study relationships between variables if the 
underlying assumptions of these procedures are adequately satis-
fied.  If the assumptions are not adequately satisfied, most re-
search projects that use experimental methods choose from among 
various procedures that can be viewed as weaker versions of anal-
ysis of variance.
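
To show what a one-way analysis of variance computes, here is a 
minimal Python sketch.  The three treatment groups and their 
response values are invented numbers, used only to show the 
arithmetic; they do not come from any real experiment.  The 
sketch computes the F ratio by hand and does not compute a 
p-value.

```python
# Hypothetical responses for three treatment groups (invented numbers,
# used only to show the arithmetic).
groups = {
    "low dose": [12.0, 15.0, 11.0, 14.0],
    "medium dose": [10.0, 9.0, 12.0, 11.0],
    "high dose": [7.0, 8.0, 6.0, 9.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Between-groups sum of squares: variation of the group means
# around the grand mean.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-groups sum of squares: variation of each response around
# its own group mean.
ss_within = sum(
    (v - sum(values) / len(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# A large F ratio suggests that the response varies with the treatment.
f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

In terms of the schema, the treatment group is the predictor 
variable, and the F ratio addresses the first statistical 
question, whether there is a relationship between the predictor 
and the response.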

(I discuss a few special cases of statistical methods in the ap-
pendix.)

The distinction between observational and experimental methods 
raises three questions about the design of the introductory 
statistics course:
1. In the general introductory statistics course how should we 
   apportion the allotted time between observational and experi-
   mental methods?
2. In a particular introductory statistics course should we ap-
   portion the allotted time differently depending on the field 
   of study in which the students are enrolled?
3. If we decide to discuss both observational and experimental 
   methods in an introductory course, which set of methods should 
   we discuss first?

I recommend that all introductory statistics courses give stu-
dents a careful introduction to both observational research meth-
ods and experimental research methods (including discussion of 
linear regression and analysis of variance).  Observational meth-
ods are important because they are frequently used and are easier 
to understand than experimental methods.  Experimental methods 
are important because they are the touchstone of empirical (i.e., 
scientific) research, enabling us to infer causation, and thereby 
enabling us to control the values of variables.

In the general course I recommend that teachers devote slightly 
more than half the course to experimental methods because it is 
important to give students a good sense of the formal scientific 
experiment--a pivotal element of the scientific method.

For students enrolled in a particular field of study, it is rea-
sonable to moderately adjust the relative attention given to ob-
servational and experimental methods to reflect the attention 
given to the two types of methods by empirical research in that 
field.

It is reasonable to introduce observational methods first because 
students find it easier to understand a relationship between two 
continuous variables (as in bivariate regression) than to under-
stand a relationship between a continuous variable and a "manipu-
lated" discontinuous variable (as in one-way analysis of vari-
ance).  

I introduce an approach that follows the above principles in a 
paper for students (1996b).


IDEAL EXAMPLES
Sam next discusses another important aspect of tailoring the in-
troductory statistics course--the choice of examples.

> In my biometry course I give examples that involve biology and
> the types of data the students are likely to encounter in other
> classes and later research. 

Sam's approach here is ideal because consistent use of good exam-
ples from the students' chosen field substantially heightens stu-
dent interest.  Thus (when possible) it is very helpful to struc-
ture the introductory statistics course differently in different 
fields in terms of the choice of examples.  (Almost every field of 
empirical research has excellent examples of both observational 
and experimental research projects.)


MATHEMATICAL THEORY

> It is possible to give the students an appreciation for the
> basic assumptions underlying statistical methods without load-
> ing them down with tons of mathematical theory. 

I fully agree.  It is unfortunate that the mathematical aspects 
of our field often overshadow the basic non-mathematical goals 
(prediction and control) of empirical research that our field so 
ably serves.


GOALS OF THE INTRODUCTORY STATISTICS COURSE

>    ( snip )
> My goal in one semester is to give [students] 
> (1) the ability to read a scientific paper and understand the
>     statistics used,
> (2) be able to perform simple statistical analyses and under-
>     stand when such analyses are valid and not valid, and
> (3) talk intelligently with a statistician concerning more
>     complicated analyses.

I recommend (1996c) that the goals of the introductory course 
should be
1. to give students a lasting appreciation of the vital role of
   the field of statistics in empirical research, and
2. to teach students to use some useful statistical methods in
   empirical research.

I believe Sam's goals and my goals are consistent.  

(Ward and Fountain [1996] describe an approach to the introduc-
tory statistics course that is philosophically similar to the ap-
proach I recommend, emphasizing prediction and using goals that 
are consistent with my goals.  Their approach appears to differ 
from my approach mainly in the emphasis given to two areas:
- Ward and Fountain put greater emphasis on the underlying mathe-
  matics
- Ward and Fountain put less emphasis on the concepts of enti-
  ties, properties, and variables.)


USING JOURNAL ARTICLES IN THE INTRODUCTORY COURSE

> A primary motivating factor for the students is goal (1).
> These students are already wrestling with a complex liter-
> ature.  They very quickly appreciate an increased ability to
> comprehend what they are reading.  Perhaps this sort of ap-
> proach will work in other disciplinary statistics courses.  
> Use journal articles to illustrate different techniques.

I believe journal articles can be an effective teaching tool in a 
statistics course if the following three conditions are all sat-
isfied:
1. the students in the course are all in the same discipline 
   (i.e., the course is what Sam calls a "disciplinary" statis-
   tics course)
2. the students are all familiar enough with the discipline to 
   understand some of its research literature and
3. appropriate journal articles are available.

If the three conditions are not all satisfied, I recommend against 
having students read scientific journal articles.  Instead, I 
recommend presenting students with simulated reports of real or 
realistic empirical research that have been tailored specifically 
for making points.  I shall discuss this approach further in a 
later post (1997).

In addition to *using* journal articles in courses in which the 
above three conditions are satisfied, I also believe it is useful 
to *motivate* students in these courses with the promise that 
they will learn to better understand journal articles, since im-
proved ability to understand journal articles is often a clear 
(and conscious) need of the students.  However, even in this case 
I believe it is also useful to motivate students with the promise 
that they will learn how to make accurate predictions, because 
being able to make accurate predictions is a more fundamental 
goal than being able to understand journal articles.  This is so 
because (if the conjecture above is correct) most journal arti-
cles that use statistical analysis in empirical research can be 
usefully viewed as having the goal of accurate prediction or con-
trol of the values of variables.


DOES EACH FIELD HAVE A UNIQUE STATISTICAL USAGE?
In the closing two sentences of his post, Sam returns to the 
topic of general tailoring of the introductory statistics course 
to suit a field or discipline.  He writes

> Other commentors in this thread have argued about "decisions"
> vs "predictions", etc.  Well, let the discipline dictate what
> common usage is for that field.

Here, I think Sam may have overestimated the differences in 
"usage" in different fields.  That is, as discussed above, we can 
view the statistical methods as being used in the same way in 
every field of empirical research--used to help us study vari-
ables and relationships between variables as a means to accurate 
prediction and control.  

If we teach the introductory statistics course from this unifying 
point of view, I believe we can make the field of statistics sub-
stantially easier for students to understand.


LINK
The ideas in this post are part of a broader discussion of an ap-
proach to the introductory statistics course available at

               http://www.matstat.com/teach/

--------------------------------------------------------
Donald B. Macnaughton   MatStat Research Consulting Inc.
donmac@matstat.com      Toronto, Canada
--------------------------------------------------------


APPENDIX:  INTERPRETING SOME STATISTICAL METHODS
I recommend above that we introduce students to statistical meth-
ods in terms of observational and experimental research using the 
methods of linear regression and analysis of variance.  Following 
are brief descriptions of a few specialized statistical methods 
suggesting how they fit into the rubric of relationships between 
variables:

Bayesian analysis:  methods for studying a variable or a rela-
   tionship between variables when a particular type of prior in-
   formation about the variable or relationship is available.

canonical correlation analysis:  methods for studying relation-
   ships between a set of two or more response variables (as op-
   posed to only a single response variable) and a set of predic-
   tor variables.

cluster analysis:  [this is an example of a set of statistical 
   methods that are not directly related to studying relation-
   ships between variables] methods for partitioning the entities 
   in a population into subpopulations of "similar" entities on 
   the basis of the values of variables that reflect properties 
   of the entities.

factor analysis and principal components analysis:  [this is an-
   other example of a set of statistical methods that are not di-
   rectly related to studying relationships between variables] 
   methods for studying a set of variables that reflect proper-
   ties of entities in a population and determining whether the 
   variables can be well-summarized in terms of some smaller set 
   of variables (that are generated as functions of the original 
   variables).

neural networks:  a group of methods for studying relationships 
   between variables--the methods work in ways that are analogous 
   to the way neurons work in the brain.

path analysis and linear structural relationship analysis:  meth-
   ods for studying networks of relationships between variables.

survival analysis:  methods for studying relationships between 
   variables when the response variable is the "survival" of  
   people, other organisms, or other types of entities such as 
   manufactured products (where survival is measured as the time 
   to death or time to failure); the predictor variables often 
   represent different treatments that it is hoped will increase 
   survival.

time series analysis:  methods for studying relationships between 
   variables when an important predictor variable is "time".


REFERENCES
Macnaughton, D. B. (1996a), "The Introductory Statistics Course:  
   A New Approach."  Available at http://www.matstat.com/teach/

Macnaughton, D. B. (1996b), "The Entity-Property-Relationship Ap-
   proach to Statistics:  An Introduction for Students."  Avail-
   able at http://www.matstat.com/teach/

Macnaughton, D. B. (1996c), "Goals of Your Introductory Statis-
   tics Course."  Available at http://www.matstat.com/teach/

Macnaughton, D. B. (1997), "The Choice of Examples in Intro 
   Stat."  Forthcoming; to be posted to the sci.stat.edu Usenet 
   newsgroup.

Rubin, H. (1996), "Re: How Should We *Motivate* Students in Intro 
   Stat?"  Posted to the sci.stat.edu Usenet newsgroup on Decem-
   ber 3, 1996.  Available at 
   gopher://jse.stat.ncsu.edu:70/7waissrc%3A/edstat/edstat 
   (search for "How Should We" without the quotes).

Scheiner, S. M. (1996), "Re: How Should We *Motivate* Students in 
   Intro Stat?"  Posted to the sci.stat.edu Usenet newsgroup on 
   December 5, 1996.  Available at 
   gopher://jse.stat.ncsu.edu:70/7waissrc%3A/edstat/edstat 
   (search for "How Should We" without the quotes).  

Vokey, J. R. (1996), "Re: How Should We *Motivate* Students in 
   Intro Stat?"  Posted to the sci.stat.edu Usenet newsgroup on 
   December 1, 1996.  Available at 
   gopher://jse.stat.ncsu.edu:70/7waissrc%3A/edstat/edstat 
   (search for "How Should We" without the quotes).

Ward, J. H. and Fountain, R. L. (1996), "More Problem Solving 
   Power:  Exploiting Prediction Models and Statistical Software 
   in a One-Semester Course."  _Journal of Statistics Education_ 
   4.  Available at 
   http://www2.ncsu.edu/ncsu/pams/stat/info/jse/v4n3/ward.html
