Subject: Re: Eight Features of an Ideal Intro Stat Course (Second response to comments by Dennis Roberts)
To: EdStat-L and sci.stat.edu
From: Donald B. Macnaughton <donmac@matstat.com>
Date: Sunday, May 2, 1999
Cc: Dennis Roberts <dmr@psu.edu>
Referring to a 98/8/3 post of mine, Dennis Roberts writes (on 98/8/4):

> ( snip )
> Donald Macnaughton wrote (in part):
>
>> - the usefulness of the field of statistics lies solely in its
>> applications in empirical research
>
> sorry ... have to disagree. while not exclusive to statistics
> ... there is a general logic in thinking that derives from such
> study ... that applies to empirical research and OTHER arenas
> of thought ...

To resolve this disagreement, it is helpful to have a clear sense of the concept of 'empirical research'. I propose the following definition:

EMPIRICAL RESEARCH is any research in which data are gathered from the external world and then conclusions are drawn from the data about the external world.

Under this definition, I cannot see any (practical) uses of the field of statistics beyond its uses in empirical research. If Dennis still disagrees, I hope he will identify the "other arenas of thought" (beyond empirical research) he refers to, and I hope he will describe the usefulness of the field of statistics in those arenas.

Two distinguished statisticians support my claim about the usefulness of statistics. Harry Roberts insightfully discusses using student projects to teach statistics (1992). He reinforces his points by quoting George Box as saying:

   In my view statistics has no reason for existence except as the catalyst for investigation and discovery (p. 109).

Study of Box's writing suggests that he uses the phrase "investigation and discovery" to denote what I refer to as "empirical research". (That is, my experience suggests that ALL discussions in Box's writing about the use of statistics are directly aimed [either specifically or generally] at drawing conclusions about the external world from data.) Thus Box's statement corroborates my statement that Dennis quotes above.
>> - almost all empirical research projects can be usefully
>> characterized as studying relationships between variables
>
> if you consider "frequency" to be a variable ... in a simple
> frequency distribution which resides on the "y" axis ... then
> perhaps so. but ... this is kind of a stretch ... in that we
> normally don't consider 'frequency' to be a variable like most
> others that fall along a relationship graph (scatterdiagram)
> with X and y axes ...

Dennis is considering the situation in which an empirical research project is studying the univariate distribution of the variable X. He proposes (in graphical terms) a way we could view the study of the univariate distribution of X as *actually* being a study of a relationship between two variables. But he then rejects his proposal as a "stretch", and I fully agree.

In a more complete form, Dennis' argument above would seem to run as follows:

1. It is not reasonable to view an empirical research project that studies the univariate distribution of a variable X as actually studying a *relationship* between X and another variable Y, where Y is the variable "frequency of occurrence of different values of X".

2. No other point of view exists under which we can reasonably view an empirical research project that studies a univariate distribution as studying a relationship between variables.

3. Therefore, Macnaughton's statement that almost all empirical research projects can be usefully characterized as studying relationships between variables is incorrect.

This argument is unsound because the second premise is incorrect -- we can reasonably view each empirical research project that studies a univariate distribution as studying a special type of relationship between variables as follows: As usual, one response variable is present, namely the variable whose univariate distribution is under study. But the number of predictor variables, instead of being one or more, is reduced to zero.
The preceding three sentences appear to be rigorously true as a limiting (degenerate) case in two senses:
- in an empirical sense and
- in a strict mathematical sense.

That is, every empirical or mathematical procedure we use to study univariate distributions can easily be viewed as the limiting case (when the number of predictor variables is reduced to zero) of a similar (but more complicated) procedure we use (or could use) to study relationships between variables.

* * *

However, one need not view univariate distributions as degenerate cases of relationships between variables to see that almost all empirical research projects can be usefully viewed as studying relationships. For even if we say that studies of univariate distributions are NOT studies of relationships, it is still true that almost all empirical research projects can be usefully viewed as studying relationships between variables. This is because very few *real* empirical research projects study univariate distributions. Instead, almost all real empirical research projects (or logical components of research projects) can be best viewed as studying the relationship between a single response variable and one or more predictor variables.

Let me make some predictions:

1. More than ninety-six percent of real empirical research projects are best viewed as studying relationships between variables.

2. Less than two percent of real empirical research projects are best viewed as studying univariate distributions.

3. Less than two percent of real empirical research projects are best viewed as not belonging to either of the above two groups. For example, a research project might be best viewed as studying entities or relationships between entities as opposed to studying properties of entities (variables) or relationships between properties.

(Some readers will suspect that my first [i.e., 96%] prediction above is much too high.
Such a suspicion is reasonable because many modern empirical research projects are NOT generally viewed as studying relationships between variables. I discuss how it is possible to usefully view many such research projects as studying relationships in a paper [1999, app. B].)

The main point I wish to make here is the following: If almost all real empirical research projects are easily and usefully viewed as studying relationships between variables, and if most modern statistical procedures can be easily characterized as studying relationships between variables (Macnaughton 1999, sec. 4.3), then, since the concept of a relationship between variables is not hard to understand, it is reasonable to emphasize the concept of a relationship between variables early in the introductory statistics course.

I discuss some further issues about my three predictions above in the appendix. I discuss some research projects that do NOT study relationships between (or among) variables in an earlier post (1997a, app. A).

The "laws" of science are a small but important group of statements derived from empirical *scientific* research. In another post I report on a classification of 213 laws of science into eight categories (with most of the laws being classified as statements of relationships [or non-relationships] between variables) (1997b, app. A).

>> - almost all the statistical methods can be usefully
>> characterized as methods for studying relationships between
>> variables.
>
> are you admitting that some cannot be? what are these? are
> they not important too?

A few statistical methods cannot easily be characterized as studying relationships between variables in the standard "response variable - predictor variable(s)" sense. These methods include:
- cluster analysis
- factor analysis
- principal components analysis
- multidimensional scaling and
- a few other infrequently used statistical methods.
These methods still study variables (i.e., properties of entities), and in a loose sense they also study relationships *between* or *among* the variables. But since none of these methods focuses on a specific response variable, I view them as a separate group.

These methods are exceptions to the "response variable - predictor variable(s)" rule and appear only rarely in real empirical research. (I estimate that, in total, these methods appear in less than one percent of reported empirical research projects that use statistical methods.) Thus although these methods are important in a small percentage of research projects, I believe they are not important topics for discussion in an introductory statistics course. (I discuss these methods further in a paper [1999, app. C].)

>> ( snip )
>> If we spend initial time discussing univariate distributions
>> before we discuss relationships between variables, I believe
>> we *alienate* students because students find univariate
>> distributions to be boring and of little obvious use.
>
> what evidence do you have to support this?

In what follows in this post, when I refer to the "introductory (statistics) course" I mean the introductory course for students who are NOT majoring in statistics, whom I call "non-statistics-majors". I am not discussing the introductory course for statistics majors.

My main evidence for my claim that univariate distributions are boring and of little obvious use is that *I*, a practicing statistician, find univariate distributions to be boring and of little obvious use for beginning students, despite the fact that I have looked carefully for practical uses.

(I am not saying that univariate distributions have NO uses -- they are indispensable tools in support of the mathematics in most statistical analyses. I am only saying that univariate distributions have little obvious use that can be appreciated by students at the beginning of an introductory statistics course.)
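The limiting-case argument earlier in this post (a univariate distribution viewed as a relationship with zero predictor variables) can be made concrete with a small computation. The Python sketch below is an illustration added for that purpose only, not a procedure taken from the papers cited here; the data and the function names are hypothetical. With zero predictors, the least-squares fit of the response is simply its sample mean (the value that minimizes the sum of squared deviations), i.e., a summary of the univariate distribution. Adding one predictor turns the same least-squares procedure into a study of a relationship between two variables.

```python
from statistics import fmean


def fit_zero_predictors(y):
    """Least-squares fit of y = b0 + error, with NO predictor variables.

    Minimizing sum((y_i - b0)**2) over b0 gives b0 = mean(y), so this
    'model' is just a summary of the univariate distribution of y.
    """
    return fmean(y)


def fit_one_predictor(x, y):
    """Least-squares fit of y = b0 + b1 * x + error (one predictor)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def sse(y, predictions):
    """Sum of squared prediction errors."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))


# Hypothetical data, invented for illustration:
# x = hours of study (predictor), y = exam score (response).
x = [1, 2, 3, 4, 5, 6]
y = [52, 55, 61, 64, 70, 72]

b0_only = fit_zero_predictors(y)
b0, b1 = fit_one_predictor(x, y)

sse_zero = sse(y, [b0_only] * len(y))
sse_one = sse(y, [b0 + b1 * xi for xi in x])

print(f"zero predictors: y_hat = {b0_only:.2f} (the sample mean)")
print(f"one predictor:   y_hat = {b0:.2f} + {b1:.2f} * x")
print(f"SSE with zero predictors = {sse_zero:.2f}")
print(f"SSE with one predictor   = {sse_one:.2f}")
```

In this made-up data set the one-predictor fit has a much smaller sum of squared prediction errors than the zero-predictor fit, which is one simple way of seeing how a relationship between variables can support more accurate prediction than a univariate summary alone.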
I cannot prove that univariate distributions are of little or no use for beginning students because such a proof appears to be logically impossible. On the other hand, if univariate distributions DO have important uses for beginning students, it should be easy for the proponents of discussing univariate distributions to prove THEIR case by merely describing these uses. I invite proponents of teaching univariate distributions at the beginning of the introductory course to propose examples of univariate distributions that both (a) have practical uses and (b) are of interest to students.

(I discuss some putative examples of interesting univariate distributions in a paper [1998a, app. G] and in a Usenet post [1998b]. I shall discuss in a forthcoming post two examples of interesting univariate distributions discussed on November 26 by Karl Wuensch.)

* * *

My recommendation against discussing univariate distributions at the *beginning* of the introductory statistics course raises an important question:

   If univariate distributions are not to be discussed at or near the beginning of the introductory course, where should they be discussed?

I recommend that univariate distributions be discussed near the *end* of the introductory statistics course or at the beginning of a second course. I explain this recommendation and propose a syllabus for the introductory course in a paper (1999, secs. 6.4 and 6.9).

> personally ... i think most students find statistics boring ...
> whether it be studying relationships or not ...

I agree that many students find statistics boring in SOME introductory courses. However, the fact that statistics is boring in SOME courses does NOT allow us to conclude that statistics will be boring if we emphasize interesting relationships between variables right from the start. This is because almost no statistics courses presently take a relationship-between-variables approach, so we have almost no relevant data on which to base a conclusion.
(Some leaders in statistical education have already independently adopted the approach of emphasizing relationships between variables, although perhaps not to the extent I recommend. For example, using an idea developed by Gudmund Iversen, George Cobb teaches two introductory courses, both of which start with relationships -- one devoted to experimental design and applied analysis of variance and the other devoted to applied regression [G. Cobb, personal communication, August 21, 1996]. Similarly, Robin Lock teaches an introductory course devoted to time series analysis -- i.e., methods for studying relationships between variables when an important predictor variable is "time" [Cobb 1993, sec. 3.1].)

> and this [the fact that students find statistics boring] is
> primarily because it [statistics] is foisted on them and
> required ... and not naturally selected.

Dennis makes a good point -- the fact that statistics courses are often mandatory does not endear students to statistics. Students' lack of respect is heightened when they (as I have suggested earlier) have trouble seeing any practical value in statistics. Thus, to avoid making things worse in an already bad situation, it is helpful to show students the *practical value* of statistics quickly.

I believe we can best show students practical value by showing them how relationships between variables enable accurate prediction and control (of the values of variables).

> what i think does make a difference is to have data of interest
> to them ...

I fully agree. If we use interesting data that students can see practical value in studying, we are much more likely to give students a lasting appreciation of statistics. I further discuss the choice of data in two papers (1998a, sec. 6; 1999, sec. 6.5).

> whether this [data] be studied in the context of some
> relationship problem or not ...
This clause gives insight into Dennis' view of relationships between variables: He seems to be suggesting that relationships are somewhat incidental in the field of statistics.

>> On the other hand, students find relationships between
>> variables to be fascinating.
>
> donald ... i think you do stretch a bit ... and are assigning a
> characteristic to intro students that they just don't possess
> ...

Dennis speaks from his experience with introductory students. However, although he may have taught many introductory statistics courses, he has probably not seriously *manipulated* his approach to teaching the fundamental statistical concepts. In particular, since Dennis' statement in the earlier quote above suggests he believes relationships between variables are somewhat incidental in statistics, he has presumably not emphasized relationships between variables in any of his introductory courses. Thus he cannot speak from experience about the effectiveness of carefully discussing relationships between variables early in the introductory course. My own experience with students is that they find relationships between variables to be fascinating.

>> Relationships are fascinating
>
> maybe some are ... but NOT because of the following ..
>
>> because study of relationships is the only known objective
>> method for accurate prediction and control
>
> how do students know that?

Most entering students have no knowledge of the broad usefulness of relationships between variables for accurate prediction and control. We can, however, easily enrich our students with this powerful knowledge.

-------------------------------------------------------
Donald B. Macnaughton
MatStat Research Consulting Inc
donmac@matstat.com
Toronto, Canada
-------------------------------------------------------

APPENDIX: A SURVEY OF EMPIRICAL RESEARCH PROJECTS

In the body of this post I predict that more than ninety-six percent of real empirical research projects are best viewed as studying relationships between variables. My prediction is not based on a proper statistical survey of empirical research projects and is instead based simply on my experience as a statistician. I make the prediction as a straw-man basis for discussion.

A proper statistical survey might draw a sample of reports of research projects from a sample of journals reporting empirical research results and then use experts in empirical research to (1) identify the entities and variables of interest in each research project and (2) classify the main focus (or foci) of each research project into one of the three categories I describe in the body of this post. I hope that an interested reader will perform a proper empirical version of the survey.

Ironically, the survey is a survey (across empirical research projects) of the univariate distribution of the nominal-level variable "main focus of the research project". Thus here, despite my discussion above and elsewhere of how univariate distributions are generally uninteresting, we have an interesting univariate distribution. However, I suggest that we can make this distribution *more* interesting if we turn it into a study of a relationship between two variables. That is, we could study (across empirical research projects) the distribution of the (nominal-level) response variable "main focus of the research project" as a function of the (nominal-level) predictor variable "branch of empirical research to which the research project belongs" (e.g., medicine, physics, sociology, etc.). Perhaps the distribution is the same in each branch of empirical research or perhaps it is different.

REFERENCES

Cobb, G. W. 1993. "Reconsidering statistics education: A National Science Foundation conference." _Journal of Statistics Education 1(1)._ Available at http://www.amstat.org/publications/jse/v1n1/cobb.html

Macnaughton, D. B. 1997a. "Re: How should we *motivate* students in intro stat? (response to comments by John R. Vokey)." Posted to sci.stat.edu and EdStat-L on April 6, 1997 and revised on June 1, 1997. Available at http://www.matstat.com/teach/p0024.htm

Macnaughton, D. B. 1997b. "EPR approach and scientific explanation (response to comments by Robert Frick)." Posted to sci.stat.edu and EdStat-L on July 23, 1997. Available at http://www.matstat.com/teach/p0026.htm

Macnaughton, D. B. 1998a. "Eight features of an ideal introductory statistics course." Available at http://www.matstat.com/teach/

Macnaughton, D. B. 1998b. "Re: Eight features of an ideal introductory statistics course (response to comments by Gary Smith)." Posted to sci.stat.edu and EdStat-L on November 23, 1998. Available at http://www.matstat.com/teach/p0036.htm

Macnaughton, D. B. 1999. "The introductory statistics course: The entity-property-relationship approach." Available at http://www.matstat.com/teach/

Roberts, H. V. 1992. "Student-Conducted Projects in Introductory Statistics Courses." In Gordon, F. S. and Gordon, S. P. (eds.) _Statistics for the Twenty-First Century, MAA Notes, No. 26,_ Washington, DC: Mathematical Association of America. pp. 109-121.