Subject: Re: Eight Features of an Ideal Intro Stat Course
(Second response to comments by Dennis Roberts)
To: EdStat-L and sci.stat.edu
From: Donald B. Macnaughton <donmac@matstat.com>
Date: Sunday May 2, 1999
Cc: Dennis Roberts <dmr@psu.edu>
Referring to a 98/8/3 post of mine, Dennis Roberts writes (on
98/8/4):
> ( snip )
> Donald Macnaughton wrote (in part):
>
>> - the usefulness of the field of statistics lies solely in its
>> applications in empirical research
>
> sorry ... have to disagree. while not exclusive to statistics
> ... there is a general logic in thinking that derives from such
> study ... that applies to empirical research and OTHER arenas
> of thought ...
To resolve this disagreement it is helpful to have a clear sense
of the concept of 'empirical research'. I propose the following
definition:
EMPIRICAL RESEARCH is any research in which data are
gathered from the external world and then conclusions
are drawn from the data about the external world.
Under this definition, I cannot see any (practical) uses of the
field of statistics beyond its uses in empirical research. If
Dennis still disagrees, I hope he will identify the "other arenas
of thought" (beyond empirical research) he refers to, and I hope
he will describe the usefulness of the field of statistics in
those arenas.
Two distinguished statisticians support my claim about the use-
fulness of statistics. Harry Roberts insightfully discusses us-
ing student projects to teach statistics (1992). He reinforces
his points by quoting George Box as saying:
In my view statistics has no reason for existence except
as the catalyst for investigation and discovery (p. 109).
Study of Box's writing suggests that he uses the phrase "investi-
gation and discovery" to denote what I refer to as "empirical re-
search". (That is, my experience suggests that ALL discussions
in Box's writing about the use of statistics are directly aimed
[either specifically or generally] at drawing conclusions about
the external world from data.) Thus Box's statement corroborates
my statement that Dennis quotes above.
>> - almost all empirical research projects can be usefully
>> characterized as studying relationships between variables
>
> if you consider "frequency" to be a variable ... in a simple
> frequency distribution which resides on the "y" axis ... then
> perhaps so. but ... this is kind of a stretch ... in that we
> normally don't consider 'frequency' to be a variable like most
> others that fall along a relationship graph (scatterdiagram)
> with X and y axes ...
Dennis is considering the situation in which an empirical re-
search project is studying the univariate distribution of the
variable X. He proposes (in graphical terms) a way we could view
the study of the univariate distribution of X as *actually* being
a study of a relationship between two variables. But he then re-
jects his proposal as a "stretch", and I fully agree.
In a more complete form, Dennis' argument above would seem to run
as follows:
1. It is not reasonable to view an empirical research project
that studies the univariate distribution of a variable X as
actually studying a *relationship* between X and another vari-
able Y, where Y is the variable "frequency of occurrence of
different values of X".
2. No other point of view exists under which we can reasonably
view an empirical research project that studies a univariate
distribution as studying a relationship between variables.
3. Therefore, Macnaughton's statement that almost all empirical
research projects can be usefully characterized as studying
relationships between variables is incorrect.
This argument is unsound because the second premise is incorrect
-- we can reasonably view each empirical research project that
studies a univariate distribution as studying a special type of
relationship between variables, as follows: As usual, one re-
sponse variable is present, namely the variable whose univariate
distribution is under study. But the number of predictor vari-
ables, instead of being one or more, is reduced to zero. This
view appears to be rigorously true as a limiting (degenerate)
case in two senses:
- in an empirical sense and
- in a strict mathematical sense.
That is, every empirical or mathematical procedure we use to
study univariate distributions can be easily viewed as the limit-
ing case (when the number of predictor variables is reduced to
zero) of a similar (but more complicated) procedure we use (or
could use) to study relationships between variables.
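As an illustration of the mathematical sense (a minimal sketch,
not part of the original argument): ordinary least squares with
an intercept and zero predictor variables returns the sample mean
of the response variable, which is the usual univariate summary.
The function name least_squares and the small data lists are hy-
pothetical, chosen only for illustration, and the sketch assumes
the predictor variables are not collinear.

```python
from statistics import mean


def least_squares(y, predictors=()):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    `predictors` is a sequence of equal-length lists, one per
    predictor variable; it may be empty. Returns [b0, b1, ...].
    Assumes the predictors are not collinear (no singular check).
    """
    n = len(y)
    # Design matrix columns: a column of ones (the intercept)
    # followed by one column per predictor variable.
    columns = [[1.0] * n] + [[float(v) for v in x] for x in predictors]
    k = len(columns)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [
        [sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(ci * yi for ci, yi in zip(columns[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical illustrative values, not real data.
response = [2.0, 4.0, 6.0, 8.0]
predictor = [1.0, 2.0, 3.0, 4.0]

# Zero predictors: the single fitted coefficient equals mean(response).
print(least_squares(response)[0], mean(response))
# One predictor: the same procedure returns an intercept and a slope.
print(least_squares(response, [predictor]))
```

The same function handles both cases; the univariate summary is
simply what remains when the list of predictor variables is
empty.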
* * *
However, one need not view univariate distributions as degenerate
cases of relationships between variables to see the correctness
of the point that almost all empirical research projects can be
usefully viewed as studying relationships. For even if we say
that studies of univariate distributions are NOT studies of rela-
tionships, it is still true that almost all empirical research
projects can be usefully viewed as studying relationships between
variables. This is because very few *real* empirical research
projects study univariate distributions. Instead, almost all
real empirical research projects (or logical components of re-
search projects) can be best viewed as studying the relationship
between a single response variable and one or more predictor
variables.
Let me make some predictions:
1. More than ninety-six percent of real empirical research proj-
ects are best viewed as studying relationships between vari-
ables.
2. Less than two percent of real empirical research projects are
best viewed as studying univariate distributions.
3. Less than two percent of real empirical research projects are
best viewed as not belonging to either of the above two
groups. For example, a research project might be best viewed
as studying entities or relationships between entities as op-
posed to studying properties of entities (variables) or rela-
tionships between properties.
(Some readers will suspect that my first [i.e., 96%] prediction
above is much too high. Such a suspicion is reasonable because
many modern empirical research projects are NOT generally viewed
as studying relationships between variables. I discuss how it is
possible to usefully view many such research projects as studying
relationships in a paper [1999, app. B].)
The main point I wish to make here is the following: If almost
all real empirical research projects are easily and usefully
viewed as studying relationships between variables, and given
that most modern statistical procedures can be easily character-
ized as studying relationships between variables (Macnaughton
1999, sec. 4.3) and that the concept of a relationship between
variables is not hard to understand, then it is reasonable to
emphasize the concept of a relationship between variables early
in the introductory statistics course.
I discuss some further issues about my three predictions above in
the appendix. I discuss some research projects that do NOT study
relationships between (or among) variables in an earlier post
(1997a, app. A). The "laws" of science are a small but important
group of statements derived from empirical *scientific* research.
In another post I report on a classification of 213 laws of sci-
ence into eight categories (with most of the laws being classi-
fied as statements of relationships [or non-relationships] be-
tween variables) (1997b, app. A).
>> - almost all the statistical methods can be usefully charac-
>> terized as methods for studying relationships between vari-
>> ables.
>
> are you admitting that some cannot be? what are these? are
> they not important too?
A few statistical methods cannot easily be characterized as
studying relationships between variables in the standard "re-
sponse variable - predictor variable(s)" sense. These methods
include
- cluster analysis
- factor analysis
- principal components analysis
- multidimensional scaling and
- a few other infrequently used statistical methods.
These methods still study variables (i.e., properties of enti-
ties), and in a loose sense they also study relationships *be-
tween* or *among* the variables. But since none of these methods
focuses on a specific response variable, I view them as a sepa-
rate group.
These methods are exceptions to the "response variable - predic-
tor variable(s)" rule and appear only rarely in real empirical
research. (I estimate that these methods appear in total in less
than one percent of reported empirical research projects that use
statistical methods.) Thus although these methods are important
in a small percentage of research projects, I believe they are
not important topics for discussion in an introductory statistics
course.
(I discuss these methods further in a paper [1999, app. C].)
>> ( snip )
>> If we spend initial time discussing univariate distributions
>> before we discuss relationships between variables, I believe
>> we *alienate* students because students find univariate dis-
>> tributions to be boring and of little obvious use.
>
> what evidence do you have to support this?
In what follows in this post, when I refer to the "introductory
(statistics) course" I mean the introductory course for students
who are NOT majoring in statistics, whom I call "non-statistics-
majors". I am not discussing the introductory course for statis-
tics majors.
My main evidence for my claim that univariate distributions are
boring and of little obvious use is that *I*, a practicing stat-
istician, find univariate distributions to be boring and of lit-
tle obvious use for beginning students, despite the fact that I
have looked carefully for practical uses.
(I am not saying that univariate distributions have NO uses --
they are indispensable tools in support of the mathematics in
most statistical analyses. I am only saying that univariate dis-
tributions have little obvious use that can be appreciated by
students at the beginning of an introductory statistics course.)
I cannot prove that univariate distributions are of little or no
use for beginning students because such a proof appears to be
logically impossible. On the other hand, if univariate distribu-
tions DO have important uses for beginning students, it should be
easy for the proponents of discussing univariate distributions to
prove THEIR case by merely describing these uses. I invite pro-
ponents of teaching univariate distributions at the beginning of
the introductory course to propose examples of univariate distri-
butions that both (a) have practical uses and (b) students find
of interest.
(I discuss some putative examples of interesting univariate dis-
tributions in a paper [1998a, app. G] and in a Usenet post
[1998b]. I shall discuss in a forthcoming post two examples of
interesting univariate distributions discussed on November 26 by
Karl Wuensch.)
* * *
My recommendation against discussing univariate distributions at
the *beginning* of the introductory statistics course raises an
important question:
If univariate distributions are not to be discussed at or
near the beginning of the introductory course, where
should they be discussed?
I recommend that univariate distributions be discussed near the
*end* of the introductory statistics course or at the beginning
of a second course. I explain this recommendation and propose a
syllabus for the introductory course in a paper (1999, sec. 6.4
and 6.9).
> personally ... i think most students find statistics boring ...
> whether it be studying relationships or not ...
I agree that many students find statistics boring in SOME intro-
ductory courses. However, the fact that statistics is boring in
SOME courses does NOT allow us to conclude that statistics will
be boring if we emphasize interesting relationships between vari-
ables right from the start. This is because almost no statistics
courses presently take a relationship-between-variables approach,
so we have almost no relevant data on which to base a conclusion.
(Some leaders in statistical education have already independently
adopted the approach of emphasizing relationships between vari-
ables, although perhaps not to the extent I recommend. For exam-
ple, using an idea developed by Gudmund Iversen, George Cobb
teaches two introductory courses, both of which start with rela-
tionships -- one devoted to experimental design and applied
analysis of variance and the other devoted to applied regression
[G. Cobb, personal communication, August 21, 1996]. Similarly,
Robin Lock teaches an introductory course devoted to time series
analysis -- i.e., methods for studying relationships between
variables when an important predictor variable is "time" [Cobb
1993, sec. 3.1].)
> and this [the fact that students find statistics boring] is
> primarily because it [statistics] is foisted on them and re-
> quired ... and not naturally selected.
Dennis makes a good point -- the fact that statistics courses are
often mandatory does not endear students to statistics. Stu-
dents' lack of respect is heightened when they (as I have sug-
gested earlier) have trouble seeing any practical value in sta-
tistics. Thus to avoid making things worse in an already bad
situation, it is helpful to quickly show students the *practical
value* of statistics.
I believe we can best show students practical value by showing
them how relationships between variables enable accurate predic-
tion and control (of the values of variables).
> what i think does make a difference is to have data of interest
> to them ...
I fully agree. If we use interesting data that students can see
practical value in studying, we are much more likely to give stu-
dents a lasting appreciation of statistics. I further discuss
the choice of data in two papers (1998a, sec. 6; 1999, sec. 6.5).
> whether this [data] be studied in the context of some relation-
> ship problem or not ...
This clause gives insight into Dennis' view of relationships be-
tween variables: He seems to be suggesting that relationships
are somewhat incidental in the field of statistics.
>> On the other hand, students find relationships between vari-
>> ables to be fascinating.
>
> donald ... i think you do stretch a bit ... and are assigning a
> characteristic to intro students that they just don't possess
> ...
Dennis speaks from his experience with introductory students.
However, although he may have taught many introductory statistics
courses, he has probably not seriously *manipulated* his approach
to teaching the fundamental statistical concepts. In particular,
since Dennis' statement in the earlier quote above suggests he
believes relationships between variables are somewhat incidental
in statistics, he has presumably not emphasized relationships be-
tween variables in any of his introductory courses. Thus he can-
not speak from experience about the effectiveness of carefully
discussing relationships between variables early in the introduc-
tory course.
My own experience with students is that they find relationships
between variables to be fascinating.
>> Relationships are fascinating
>
> maybe some are ... but NOT because of the following ..
>
>> because study of relationships is the only known objective
>> method for accurate prediction and control
>
> how do students know that?
Most entering students have no knowledge of the broad usefulness
of relationships between variables for accurate prediction and
control. We can, however, easily enrich our students with this
powerful knowledge.
-------------------------------------------------------
Donald B. Macnaughton MatStat Research Consulting Inc
donmac@matstat.com Toronto, Canada
-------------------------------------------------------
APPENDIX: A SURVEY OF EMPIRICAL RESEARCH PROJECTS
In the body of this post I predict that more than ninety-six per-
cent of real empirical research projects are best viewed as
studying relationships between variables. My prediction is not
based on a proper statistical survey of empirical research proj-
ects and is instead based simply on my experience as a statisti-
cian. I make the prediction as a straw man basis for discussion.
A proper statistical survey might draw a sample of reports of re-
search projects from a sample of journals reporting empirical re-
search results and then use experts in empirical research to (1)
identify the entities and variables of interest in each research
project and (2) classify the main focus (or foci) of each re-
search project into one of the three categories I describe in the
body of this post. I hope that an interested reader will perform
a proper empirical version of the survey.
Ironically, the survey is a survey (across empirical research
projects) of the univariate distribution of the nominal-level
variable "main focus of the research project". Thus here, de-
spite my discussion above and elsewhere of how univariate distri-
butions are generally uninteresting, we have an interesting uni-
variate distribution. However, I suggest that we can make this
distribution *more* interesting if we turn it into a study of a
relationship between two variables. That is, we could study
(across empirical research projects) the distribution of the
(nominal-level) response variable "main focus of the research
project" as a function of the (nominal-level) predictor variable
"branch of empirical research to which the research project be-
longs" (e.g., medicine, physics, sociology, etc.). Perhaps the
distribution is the same in each branch of empirical research or
perhaps it is different.
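The tabulation proposed above can be sketched as follows (a mini-
mal sketch under stated assumptions). The function name
focus_by_branch, the category labels in FOCI, and the records in
the usage lines are all hypothetical placeholders, not survey re-
sults; a real survey would supply one (branch, focus) pair per
classified research project.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the three categories described in the
# body of the post.
FOCI = ("relationship", "univariate", "other")


def focus_by_branch(records):
    """Tabulate "main focus" (response) within each "branch" (predictor).

    `records` is an iterable of (branch, focus) pairs, one pair per
    research project. Returns {branch: {focus: proportion}}.
    """
    counts = defaultdict(Counter)
    for branch, focus in records:
        if focus not in FOCI:
            raise ValueError(f"unknown focus category: {focus!r}")
        counts[branch][focus] += 1
    return {
        branch: {f: c[f] / sum(c.values()) for f in FOCI}
        for branch, c in counts.items()
    }


# PLACEHOLDER records, invented only to show the call; they are
# not data from any survey.
placeholder_records = [
    ("medicine", "relationship"),
    ("medicine", "relationship"),
    ("medicine", "univariate"),
    ("physics", "relationship"),
]
for branch, proportions in focus_by_branch(placeholder_records).items():
    print(branch, proportions)
```

Comparing the per-branch proportions is what turns the single
univariate distribution into a study of a relationship between
the two nominal-level variables.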
REFERENCES
Cobb, G. W. 1993. "Reconsidering statistics education: A Na-
tional Science Foundation conference." _Journal of Statistics
Education 1(1)._ Available at
http://www.amstat.org/publications/jse/v1n1/cobb.html
Macnaughton, D. B. 1997a. "Re: How should we *motivate* students
in intro stat? (response to comments by John R. Vokey)."
Posted to sci.stat.edu and EdStat-L on April 6, 1997 and re-
vised on June 1, 1997. Available at
http://www.matstat.com/teach/p0024.htm
Macnaughton, D. B. 1997b. "EPR approach and scientific explana-
tion (response to comments by Robert Frick)." Posted to
sci.stat.edu and EdStat-L on July 23, 1997. Available at
http://www.matstat.com/teach/p0026.htm
Macnaughton, D. B. 1998a. "Eight features of an ideal introduc-
tory statistics course." Available at
http://www.matstat.com/teach/
Macnaughton, D. B. 1998b. "Re: Eight features of an ideal intro-
ductory statistics course (response to comments by Gary
Smith)." Posted to sci.stat.edu and EdStat-L on November 23,
1998. Available at http://www.matstat.com/teach/p0036.htm
Macnaughton, D. B. 1999. "The introductory statistics course:
The entity-property-relationship approach." Available at
http://www.matstat.com/teach/
Roberts, H. V. 1992. "Student-Conducted Projects in Introductory
Statistics Courses." In Gordon, F. S. and Gordon, S. P. (eds.)
_Statistics for the Twenty-First Century, MAA Notes, No. 26,_
Washington, DC: Mathematical Association of America, pp.
109-121.