Subject: Re: How Should We *Motivate* Students in Intro Stat?
To: EdStat-L Statistics Education Discussion List, sci.stat.edu Usenet Newsgroup
From: Donald B. Macnaughton <donmac@matstat.com> (formerly donmac@hookup.net)
Date: Monday February 24, 1997
cc: Herman Rubin <hrubin@stat.purdue.edu>, Samuel M. Scheiner <sam.scheiner@asu.edu>, John R. Vokey <vokey@hg.uleth.ca>

John Vokey (1996) recommends that

> the focus [of the introductory statistics course for students
> who are not statistics majors] be narrowed ... to a particular
> science or discipline

Herman Rubin (1996) replies that

> Statistics is not different in different fields.

Sam Scheiner (1996) responds

> While statistics does not differ, motivations differ, interests
> differ, and the primary types of methods differ.

Sam implies that since motivations, interests, and methods differ in different fields, we should structure the introductory course differently for students in different fields--to suit the differences in the motivations, interests, and methods.

Sam's argument hinges on the premise

     Motivations, interests, and methods differ in different fields of empirical research.

Is this important premise true? Let us consider each of Sam's three areas of apparent difference.

DOES MOTIVATION DIFFER IN DIFFERENT FIELDS?

The first area in which Sam believes different fields differ is in the area of "motivations". Three kinds of motivation are relevant here:

1. the motivation that stimulates researchers to *engage* in empirical research

2. the motivation that stimulates researchers to *use statistical methods* in empirical research

3. the motivation we present to *students* in order to interest them in the field of statistics (which was the original subject of this thread).

I shall concentrate on the second kind of motivation--the motivation that stimulates researchers to use statistical methods. The second kind of motivation is important because it seems most relevant in determining whether we should structure the introductory course differently in different fields to take account of different motivations.
That is, if the motivation for using statistical methods turns out to be the same in every field, we may not need to take account of *any* motivational differences in the different fields of empirical research when we are designing the introductory course. Thus the question of interest is

     Does the motivation for using statistical methods differ from one field of empirical research to the next?

I believe the answer to this question depends on one's point of view. That is, some points of view of empirical research clearly imply that the motivation for using statistical methods differs in different fields. However, I believe there is also a unifying point of view that implies the different fields of empirical research all have the *same* motivation for using statistical methods. This point of view rests on the following premises:

1. Every field of empirical research has a fundamental interest in predicting and controlling the values of the variables that are studied in that field (Macnaughton 1996a).

2. It is possible to fully characterize almost all the statistical methods as methods for studying variables and relationships between variables as a means to accurately predicting and controlling the values of variables (Macnaughton 1996a).

3. Therefore, it is reasonable to view the *motivation* for using statistical methods in empirical research as being to study variables and relationships between variables as a means to predicting and controlling the values of variables.

From this unifying point of view, the motivation for using statistical methods in empirical research does not differ in different fields of empirical research. Therefore, if we take this point of view, we need not structure the introductory course differently in different fields to take account of different motivations.

DOES INTEREST DIFFER IN DIFFERENT FIELDS?

The second area in which Sam believes different fields differ is in the area of "interests".
Two types of interest are relevant here:

- the interest in *subject matter* in different fields

- the interest in *statistical questions* in different fields.

Certainly interest in subject matter differs in different fields. For example, the field of physics is interested in various properties of particles, waves, and energy, while the field of medicine is interested in various properties of the human body. I shall assume the differences in subject-matter interest in different fields are a given in the present discussion, and I shall discuss them no further.

Now let us consider the important question whether interest in *statistical questions* differs in different fields of empirical research. Here, if we view the use of statistical methods in empirical research in terms of entities, properties, variables, and relationships, it appears that interest in statistical questions does *not* differ in different fields. Instead, it appears that one can usefully interpret almost all empirical research projects that use statistical methods by filling in the blanks in the following schema:

     population of entities: ________________________________________

     response variable: ________________________________________

     predictor variable(s): ________________________________________

     statistical questions:
       1. Is there a relationship between the response variable and the predictor variable(s) in the entities in the population?
       2. If there is a relationship, how can we best predict or control the values of the response variable in new entities from the population on the basis of the relationship?
       3. How accurate will the prediction or control be?
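
As a rough illustration of how the three questions of the schema might be posed for one small data set, the following Python sketch fits a straight line by least squares. Everything in it is an assumption made for illustration: the data are invented, the names `Schema` and `fit_line` are hypothetical, and a straight-line model is only one of many ways a relationship could be studied. The correlation coefficient serves here as a descriptive indicator for question 1, not as a formal test.

```python
# Illustrative sketch only: the data are invented and the names Schema and
# fit_line are hypothetical; they do not come from the post or any library.
from dataclasses import dataclass
from math import sqrt


@dataclass
class Schema:
    population: str
    response: str
    predictor: str


def fit_line(x, y):
    """Least-squares line y = a + b*x.

    Returns (a, b, r, resid_sd): intercept, slope, correlation
    coefficient, and residual standard deviation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / sqrt(sxx * syy)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    resid_sd = sqrt(sse / (n - 2))  # two parameters were estimated
    return a, b, r, resid_sd


study = Schema(
    population="patients with a specified set of symptoms",
    response="symptom score",
    predictor="amount of treatment",
)
dose = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # invented predictor values
score = [9.8, 8.1, 6.2, 3.9, 2.1, 0.2]  # invented response values

a, b, r, resid_sd = fit_line(dose, score)

# Question 1: is there a relationship? (descriptive indicator only)
print(f"correlation between {study.predictor} and {study.response}: {r:.3f}")

# Question 2: how can we predict the response for a new entity?
new_dose = 2.5
predicted = a + b * new_dose
print(f"predicted {study.response} at dose {new_dose}: {predicted:.2f}")

# Question 3: how accurate will the prediction be? (rough indicator)
print(f"residual standard deviation: {resid_sd:.2f}")
```

With real data, question 1 would usually be addressed with a formal test, and question 3 with a prediction interval rather than the residual standard deviation alone.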

For example, if we consider a particular empirical research project (experiment) to study a new treatment for AIDS, the completed schema might appear as follows:

     population of entities: patients with some specified set of AIDS symptoms

     response variable: a measure of the "amount of AIDS" in the patients

     predictor variable: a measure of the amount of the new treatment administered to the patients (Often, for maximum power, only two widely discrepant amounts of treatment are used in this type of experiment.)

     statistical questions:
       1. Is there a relationship in the patients between "amount of treatment" and "amount of AIDS"? That is, does AIDS vary at all in sync with the variation in treatment?
       2. If there is a relationship, how can we best control (i.e., minimize) the amount of AIDS in patients on the basis of the relationship?
       3. How accurate will the control of AIDS be?

I have studied many hundreds of empirical research projects that use statistical methods in a wide range of fields. I have found that almost all these research projects can be usefully interpreted in terms of the above schema. This leads to the following conjecture:

     Almost all empirical research projects that use statistical methods in all fields of empirical research can be usefully interpreted in terms of the above general schema.

If the conjecture is correct, and if we take the point of view of the schema, the same three statistical questions are broadly addressed across all fields of empirical research. Thus, from the point of view of the schema, interest in statistical questions does not differ in different fields of empirical research. Therefore, if we take the point of view of the schema, we need not structure the introductory course differently in different fields to take account of different interests in statistical questions.

(Readers who are unfamiliar with the schema are encouraged to try to interpret familiar empirical research projects in terms of the schema. Readers should find that most research projects that use statistical methods can be interpreted in terms of the schema. Readers who are unaware of the wide applicability of the approach should, after study, find that the approach substantially increases their understanding of the use of statistical methods in empirical research.

(On the other hand, some readers may believe the conjecture is incorrect. That is, some readers may believe there is a reasonably broad set of empirical research projects that use statistical methods and that *cannot* be interpreted in terms of the schema. These readers are invited to describe these research projects in this newsgroup. Also, I will discuss in the newsgroup, without naming the authors, any interesting examples sent to me by e-mail.

(Note that the schema does not cover *all* empirical research projects--only research projects that are appropriate for analysis with statistical methods. However, it appears that most research projects that are not appropriate for analysis with statistical methods can also be characterized using the same unifying terminology. That is, most [all?] empirical research projects that are not appropriate for analysis with statistical methods can be characterized as identifying and studying entities, relationships between entities, or properties of entities. What else does empirical [scientific] research do beyond what I have described?

(Further reading: In the appendix I describe two types of [rarely occurring] research projects that use statistical methods and that cannot easily be interpreted in terms of the schema. I rebut some seeming counterexamples in a paper [1996a Appendix B].)

DO PRIMARY TYPES OF METHODS DIFFER IN DIFFERENT FIELDS?

The third area in which Sam believes different fields differ is in the area of the "primary types of methods".
Two types of methods are relevant here:

- the primary *non-statistical* methods used in different fields of empirical research

- the primary *statistical* methods used in different fields of empirical research.

Of course, the primary *non-statistical* methods usually differ from one field of empirical research to the next. For example, the primary methods of chemistry involve (among other things) measuring properties of substances. These methods are quite different from the primary methods of experimental psychology, which involve (among other things) measuring properties of the behavior of living organisms.

Similarly, the primary *statistical* methods often differ in different fields of empirical research, the main difference being that some fields (e.g., astronomy and economics) rely heavily on observational statistical methods, while other fields (e.g., physics and biology) rely heavily on experimental statistical methods.

But apart from the high-level difference between observational and experimental statistical methods, there is not much difference in the use of statistical methods in different fields. That is, if we look across the fields of research that use observational methods, we find that most research projects that use observational methods choose linear regression to study relationships between variables if the underlying assumptions of linear regression are adequately satisfied, and choose from among various other statistical methods if the assumptions are not adequately satisfied. (The other methods include non-linear regression, robust regression, frequency table analysis, logistic regression, and so on.)

Similarly, if we look across the fields of research that use experimental methods, we find that most research projects that use experimental methods choose analysis of variance and response surface methods to study relationships between variables if the underlying assumptions of these procedures are adequately satisfied. If the assumptions are not adequately satisfied, most research projects that use experimental methods choose from among various procedures that can be viewed as weaker versions of analysis of variance.

(I discuss a few special cases of statistical methods in the appendix.)

The distinction between observational and experimental methods raises three questions about the design of the introductory statistics course:

1. In the general introductory statistics course how should we apportion the allotted time between observational and experimental methods?

2. In a particular introductory statistics course should we apportion the allotted time differently depending on the field of study in which the students are enrolled?

3. If we decide to discuss both observational and experimental methods in an introductory course, which set of methods should we discuss first?

I recommend that all introductory statistics courses give students a careful introduction to both observational research methods and experimental research methods (including discussion of linear regression and analysis of variance). Observational methods are important because they are frequently used and are easier to understand than experimental methods. Experimental methods are important because they are the touchstone of empirical (i.e., scientific) research, enabling us to infer causation, and thereby enabling us to control the values of variables.

In the general course I recommend that teachers devote slightly more than half the course to experimental methods because it is important to give students a good sense of the formal scientific experiment--a pivotal element of the scientific method.

For students enrolled in a particular field of study, it is reasonable to moderately adjust the relative attention given to observational and experimental methods to reflect the attention given to the two types of methods by empirical research in that field.

It is reasonable to introduce observational methods first because students find it easier to understand a relationship between two continuous variables (as in bivariate regression) than to understand a relationship between a continuous variable and a "manipulated" discontinuous variable (as in one-way analysis of variance).

I introduce an approach that follows the above principles in a paper for students (1996b).

IDEAL EXAMPLES

Sam next discusses another important aspect of tailoring the introductory statistics course--the choice of examples.

> In my biometry course I give examples that involve biology and
> the types of data the students are likely to encounter in other
> classes and later research.

Sam's approach here is ideal because consistent use of good examples from the students' chosen field substantially heightens student interest. Thus (when possible) it is very helpful to structure the introductory statistics course differently in different fields in terms of the choice of examples. (Almost every field of empirical research has excellent examples of both observational and experimental research projects.)

MATHEMATICAL THEORY

> It is possible to give the students an appreciation for the
> basic assumptions underlying statistical methods without
> loading them down with tons of mathematical theory.

I fully agree. It is unfortunate that the mathematical aspects of our field often overshadow the basic non-mathematical goals (prediction and control) of empirical research that our field so ably serves.

GOALS OF THE INTRODUCTORY STATISTICS COURSE

> ( snip )
> My goal in one semester is to give [students]
> (1) the ability to read a scientific paper and understand the
>     statistics used,
> (2) be able to perform simple statistical analyses and
>     understand when such analyses are valid and not valid, and
> (3) talk intelligently with a statistician concerning more
>     complicated analyses.

I recommend (1996c) that the goals of the introductory course should be

1. to give students a lasting appreciation of the vital role of the field of statistics in empirical research, and

2. to teach students to use some useful statistical methods in empirical research.

I believe Sam's goals and my goals are consistent.

(Ward and Fountain [1996] describe an approach to the introductory statistics course that is philosophically similar to the approach I recommend, emphasizing prediction and using goals that are consistent with my goals. Their approach appears to differ from my approach mainly in terms of emphasis of topics in two areas:

- Ward and Fountain put greater emphasis on the underlying mathematics

- Ward and Fountain put less emphasis on the concepts of entities, properties, and variables.)

USING JOURNAL ARTICLES IN THE INTRODUCTORY COURSE

> A primary motivating factor for the students is goal (1).
> These students are already wrestling with a complex
> literature. They very quickly appreciate an increased ability
> to comprehend what they are reading. Perhaps this sort of
> approach will work in other disciplinary statistics courses.
> Use journal articles to illustrate different techniques.

I believe journal articles can be an effective teaching tool in a statistics course if the following three conditions are all satisfied:

1. the students in the course are all in the same discipline (i.e., the course is what Sam calls a "disciplinary" statistics course)

2. the students are all familiar enough with the discipline to understand some of its research literature and

3. appropriate journal articles are available.

If the three conditions are not satisfied, I recommend against having students read scientific journal articles. Instead, I recommend presenting students with simulated reports of real or realistic empirical research that have been tailored specifically for making points. I shall discuss this approach further in a later post (1997).

In addition to *using* journal articles in courses in which the above three conditions are satisfied, I also believe it is useful to *motivate* students in these courses with the promise that they will learn to better understand journal articles, since improved ability to understand journal articles is often a clear (and conscious) need of the students. However, even in this case I believe it is also useful to motivate students with the promise that they will learn how to make accurate predictions, because being able to make accurate predictions is a more fundamental goal than being able to understand journal articles. This is so because (if the conjecture above is correct) most journal articles that use statistical analysis in empirical research can be usefully viewed as having the goal of accurate prediction or control of the values of variables.

DOES EACH FIELD HAVE A UNIQUE STATISTICAL USAGE?

In the closing two sentences of his post, Sam returns to the topic of general tailoring of the introductory statistics course to suit a field or discipline. He writes

> Other commentors in this thread have argued about "decisions"
> vs "predictions", etc. Well, let the discipline dictate what
> common usage is for that field.

Here, I think Sam may have overestimated the differences in "usage" in different fields. That is, as discussed above, we can view the statistical methods as being used in the same way in every field of empirical research--used to help us study variables and relationships between variables as a means to accurate prediction and control.

If we teach the introductory statistics course from this unifying point of view, I believe we can make the field of statistics substantially easier for students to understand.

LINK

The ideas in this post are part of a broader discussion of an approach to the introductory statistics course available at http://www.matstat.com/teach/

--------------------------------------------------------
Donald B. Macnaughton   MatStat Research Consulting Inc.
donmac@matstat.com      Toronto, Canada
--------------------------------------------------------

APPENDIX: INTERPRETING SOME STATISTICAL METHODS

I recommend above that we introduce students to statistical methods in terms of observational and experimental research using the methods of linear regression and analysis of variance. Following are brief descriptions of a few specialized statistical methods suggesting how they fit into the rubric of relationships between variables:

Bayesian analysis: methods for studying a variable or a relationship between variables when a particular type of prior information about the variable or relationship is available.

canonical correlation analysis: methods for studying relationships between a set of two or more response variables (as opposed to only a single response variable) and a set of predictor variables.

cluster analysis: [this is an example of a set of statistical methods that are not directly related to studying relationships between variables] methods for partitioning the entities in a population into subpopulations of "similar" entities on the basis of the values of variables that reflect properties of the entities.

factor analysis and principal components analysis: [this is another example of a set of statistical methods that are not directly related to studying relationships between variables] methods for studying a set of variables that reflect properties of entities in a population and determining whether the variables can be well-summarized in terms of some smaller set of variables (that are generated as functions of the original variables).

neural networks: a group of methods for studying relationships between variables--the methods work in ways that are analogous to the way neurons work in the brain.

path analysis and linear structural relationship analysis: methods for studying networks of relationships between variables.

survival analysis: methods for studying relationships between variables when the response variable is the "survival" of people, other organisms, or other types of entities such as manufactured products (where survival is measured as the time to death or time to failure); the predictor variables often represent different treatments that it is hoped will increase survival.

time series analysis: methods for studying relationships between variables when an important predictor variable is "time".

REFERENCES

Macnaughton, D. B. (1996a), "The Introductory Statistics Course: A New Approach." Available at http://www.matstat.com/teach/

Macnaughton, D. B. (1996b), "The Entity-Property-Relationship Approach to Statistics: An Introduction for Students." Available at http://www.matstat.com/teach/

Macnaughton, D. B. (1996c), "Goals of Your Introductory Statistics Course." Available at http://www.matstat.com/teach/

Macnaughton, D. B. (1997), "The Choice of Examples in Intro Stat." Forthcoming; to be posted to the sci.stat.edu Usenet newsgroup.

Rubin, H. (1996), "Re: How Should We *Motivate* Students in Intro Stat?" Posted to the sci.stat.edu Usenet newsgroup on December 3, 1996. Available at gopher://jse.stat.ncsu.edu:70/7waissrc%3A/edstat/edstat (search for "How Should We" without the quotes).

Scheiner, S. M. (1996), "Re: How Should We *Motivate* Students in Intro Stat?" Posted to the sci.stat.edu Usenet newsgroup on December 5, 1996. Available at gopher://jse.stat.ncsu.edu:70/7waissrc%3A/edstat/edstat (search for "How Should We" without the quotes).

Vokey, J. R. (1996), "Re: How Should We *Motivate* Students in Intro Stat?" Posted to the sci.stat.edu Usenet newsgroup on December 1, 1996. Available at gopher://jse.stat.ncsu.edu:70/7waissrc%3A/edstat/edstat (search for "How Should We" without the quotes).

Ward, J. H. and Fountain, R. L. (1996), "More Problem Solving Power: Exploiting Prediction Models and Statistical Software in a One-Semester Course." _Journal of Statistics Education_ 4. Available at http://www2.ncsu.edu/ncsu/pams/stat/info/jse/v4n3/ward.html