Subject: The Most Exciting Talk at the 2003 Joint Statistical
         Meetings
To:      EdStat E-Mail List
         ApStat E-Mail List
         sci.stat.edu Usenet Newsgroup
From:    Donald B. Macnaughton <donmac@matstat.com>
Date:    Friday August 22, 2003
Cc:      Candace Schau
-----------------------------------------------------------------

THE JOINT STATISTICAL MEETINGS

The JSM (Joint Statistical Meetings) is a conference sponsored by
five large North American statistical organizations and is held
annually in August.  This year the JSM was in San Francisco and
was attended by almost 6000 statisticians from around the world.

Among the many sessions at the JSM were 22 sessions that
presented around 110 talks related to statistics education.  The
talks I attended always reflected teachers' dedication to helping
students understand statistics and almost always provided useful
new perspectives on statistics teaching.

THE SATS

The most exciting talk at the JSM for me was presented by Candace
Schau (pronounced "shaw"), who discussed her test for measuring
students' attitudes toward statistics (the Survey of Attitudes
Toward Statistics -- SATS).  The test is designed to be
administered to students twice -- at the beginning of a course
and at the end.  The test contains twenty-eight questions (items)
that reflect four subscales.  The subscales measure properties of
the students that Schau has labeled Affect, Cognitive Competence,
Value, and Difficulty.  Students can usually complete the test in
under ten minutes.

USEFULNESS OF THE SATS

As I discuss in a 2002 paper, I believe a reasonable first goal
of an introductory statistics course is

   To give students a lasting appreciation of the vital role
   of the field of statistics in empirical research.

Under this goal, the SATS is useful because it enables us to
accurately measure students' appreciation of statistics (using
the Value subscale of the SATS).  It is especially useful to
measure each student's appreciation immediately before and
immediately after a course because the difference between the two
scores for a student is a direct gauge of the effect of the
course on the student.  It seems reasonable to say

   The greater the average improvement in SATS Value
   subscale scores as a result of a course, the higher (in
   one reasonable sense) the quality of the course.

Thus the SATS is a useful test to help us improve statistics
education.

Most statisticians are familiar with the experience of revealing
their vocation to someone at a party and having that person say
that they once took a statistics course, and it was the worst
course they ever took, or some similar negative comment.  This
supports the view that a person's attitudes toward statistics
(which can often be negative) tend to persist throughout the
person's lifetime.  (This point was suggested at the JSM by
Sterling Hilton.)  In contrast, much of the knowledge students
learn in an introductory statistics course is forgotten, often
with a steep forgetting curve that begins when the student
finishes the final exam.

Since attitudes toward statistics generally persist more strongly
in a person's consciousness than knowledge about statistics, and
since students' attitudes toward statistics are often negative,
it is at least as important to study how to improve students'
attitudes toward statistics as it is to study how to impart
specific statistical knowledge.  This further suggests that the
SATS is a useful test to help us improve statistics education.
SATS AVAILABILITY

The SATS and supporting material are available without charge
from Dr. Schau's web site (Schau 2003).  In addition, Dr. Schau's
consulting firm is available to assist with projects related to
the SATS.

INTERPRETING SATS SCORES

A reasonable rudimentary approach to interpreting an
administration of the SATS is to subtract each student's pre-test
score for each subscale from his or her corresponding post-test
score and then to study the univariate distribution (across
students) of each of the four differences.  (The SATS is designed
so that such subtraction is reasonable.)  We would like the mean
of each of these distributions to be (in the appropriate
direction) as far away from zero as possible.

(If we wish, we can also perform a statistical test of whether
the mean change in attitude scores is different from zero, or
different from some other fixed value.  However, from the point
of view of improving statistics education such tests are less
important because it is relative differences in scores between
appropriately compared teaching methods that matter, not absolute
scores.  Relative differences in scores are best obtained in
experiments, as discussed below.)
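To make the arithmetic concrete, here is a minimal sketch of this
rudimentary approach in Python (using the pandas and scipy
libraries); the file name and column names are hypothetical, and
any statistical package could of course be used instead:

   import pandas as pd
   from scipy import stats

   # Hypothetical file: one row per student, with that student's
   # pre-test and post-test scores on one SATS subscale (e.g.,
   # Value) in separate columns named "pre" and "post".
   scores = pd.read_csv("sats_value.csv")

   # Subtract each student's pre-test score from his or her
   # post-test score.
   scores["change"] = scores["post"] - scores["pre"]

   # Study the univariate distribution (across students) of the
   # changes.
   print(scores["change"].describe())

   # Optional: test whether the mean change differs from zero.
   t, p = stats.ttest_1samp(scores["change"], popmean=0.0)
   print(f"t = {t:.2f}, p = {p:.4f}")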
WHAT IF ATTITUDES GET WORSE?

A teacher may administer the SATS and find that students'
attitudes are generally worse at the end of the course than at
the beginning.  If so, what can the teacher do?

First, it seems clear that something can be done.  This is
because (as I think most readers will agree) the field of
statistics is a vital cornerstone of science (and of all other
types of empirical research).  Thus the field clearly merits
serious respect and appreciation.  If students' attitudes toward
statistics become worse after taking a particular statistics
course, this merely reflects the fact that the teacher (like many
others) has not yet found a good approach to instilling a strong
sense of the value of our field in students.  When the teacher
finds this approach, the students' attitudes toward statistics
will improve.

A reasonable way to find the best approach to teaching an
introductory statistics course is to study the literature of
modern statistics education, which contains many proposals for
improving the introductory course.  Careful implementation of the
most promising of these proposals is almost certain to improve
students' attitudes toward statistics.  Appendix A gives some
entry points to the literature.

DISENTANGLING APPROACHES TO STATISTICS EDUCATION

I believe we will disentangle the many approaches to teaching the
introductory statistics course and discover how to optimize
students' attitudes toward statistics (and perhaps optimize other
important response variables) through the use of standard
statistical tools in empirical research, with particular emphasis
on designed experiments.  Carefully designed experiments that
compare approaches to teaching statistics appear to be the best
way to find the approaches that best help people to recognize the
usefulness of our field.  (Appendix B discusses some technical
aspects of the design and analysis of experiments in statistics
education.)

Designed experiments in statistics education will follow and
expand the leadership that is currently being provided by Hilton,
Christensen, Collings, Hadfield, Schaalje, and Tolley (1999).

Don Macnaughton

-------------------------------------------------------
Donald B. Macnaughton
MatStat Research Consulting Inc
donmac@matstat.com
Toronto, Canada
-------------------------------------------------------

APPENDIX A: REFERENCES TO MATERIAL IN STATISTICS EDUCATION

For introductory statistics teachers interested in improving
their courses, here are some links to material about statistics
education.  First is a list of some journals that specialize
wholly or partly in articles about statistics education:

- Journal of Statistics Education.  This online journal is aimed
  at statistics educators.
  http://www.amstat.org/publications/jse/

- Statistics Education Research Journal.  Aimed at improving
  statistics education.
  http://fehps.une.edu.au/F/s/curric/cReading/serj/index.html

- Chance.  This magazine-style journal is aimed at "everyone".
  http://www.amstat.org/publications/chance/

- Stats.  Aimed at students.
  http://www.amstat.org/publications/stats/

- Teaching Statistics.  Aimed at teachers of students aged up to
  19 who use statistics in their work.
  http://www.blackwellpublishing.com/journal.asp?ref=0141-982X

- "Teachers' Corner" of The American Statistician.  This section
  of the journal publishes general articles about teaching
  statistics.
  http://www.amstat.org/publications/tas/

- Statistics Teacher Network.  A newsletter aimed at statistics
  teachers.
  http://www.bio.ri.ccf.org/ASA/stn.html

The American Statistical Association has an active Section on
Statistical Education that regularly makes helpful contributions
to the advancement of statistical education.  Information about
this group is available at

   http://www.stat.ncsu.edu/stated/homepage.html

and information about joining is at

   http://www.amstat.org/membership/join.html

The International Association for Statistical Education (IASE) is
an affiliate of the International Statistical Institute.  The
IASE also regularly makes helpful contributions to the
advancement of statistical education, including sponsoring an
important conference on statistical education every four years.
Information about this group is available at

   http://www.cbs.nl/isi/iase.htm

The American Statistical Association has published a set of
carefully developed formal recommendations about teaching
statistics in undergraduate major and minor programs.  These are
available at

   http://www.amstat.org/education/Curriculum_Guidelines.html

Here are some general books about teaching statistics:

Gordon, F., and Gordon, S. (eds.) 1992. Statistics for the
   Twenty-First Century, MAA Notes No. 26. Washington, DC:
   Mathematical Association of America.

Hawkins, A., Jolliffe, F., and Glickman, L. 1992. Teaching
   Statistical Concepts. London: Longman.

Hoaglin, D. C., and Moore, D. S. (eds.) 1992. Perspectives on
   Contemporary Statistics, MAA Notes No. 21. Washington, DC:
   Mathematical Association of America.

Moore, T. L. (ed.) 2000. Teaching Statistics: Resources for
   Undergraduate Instructors, MAA Notes No. 52. Washington, DC:
   Mathematical Association of America.

Many good introductory statistics textbooks are available.  Here
are some introductory textbooks I have studied that I like:

De Veaux, R. D., Velleman, P. F., and Bock, D. E. Intro Stats.
   Boston, MA: Pearson.

Freedman, D., Pisani, R., and Purves, R. 1998. Statistics (3rd
   ed.). New York: W. W. Norton.

Moore, D. S. 2003. The Basic Practice of Statistics (3rd ed.).
   New York: W. H. Freeman.

Rossman, A. J., and Chance, B. L. 2001. Workshop Statistics:
   Discovery with Data (2nd ed.). Emeryville, CA: Key College
   Publishing.

Utts, J. M., and Heckard, R. F. 2004. Mind on Statistics (2nd
   ed.). Belmont, CA: Brooks/Cole/Thomson.
Watkins, A. E., Scheaffer, R. L., and Cobb, G. W. 2004.
   Statistics in Action: Understanding a World of Data.
   Emeryville, CA: Key Curriculum Press.

Finally, I have made some suggestions for improving the
introductory statistics course.  Discussion is available at

   http://www.matstat.com/teach/

APPENDIX B: TECHNICAL ISSUES IN THE ANALYSIS OF STATISTICS
EDUCATION EXPERIMENTS

This appendix discusses experiments that compare methods of
teaching an introductory statistics course.  In these experiments
the response variable is a SATS subscale score (or some other
pair of pre- and post-course measures of the students) and the
main predictor variable reflects the different teaching methods
that are under study.  The goal of these experiments is to
determine whether and how attitudes depend on teaching methods.
In other words, we would like to know which of the various
teaching approaches yields the best attitudes.  Knowing this
helps us to improve the design of introductory statistics
courses.

In these experiments the researcher can experimentally compare
any teaching methods of interest.  For example, activities can be
compared with lectures, group work can be compared with
individual work, or emphasis on one set of basic statistical
concepts can be compared with emphasis on another such set.

It is useful to note that a researcher will succeed in finding
relationships between attitudes and one or more predictor
variables only to the extent that the chosen predictor variables
have enough slope on the response surface to yield detectable
effects.  Thus the predictor variables and the experimental
design in research in statistics education should be chosen with
care.

It is possible to use multivariate methods to simultaneously
model the relationship between all four SATS subscale scores and
one or more predictor variables.  In this case the response
variable is viewed as a four-component vector instead of as a
single scalar value.  However, the multivariate approach
generally provides no significant advantages, is much harder to
understand, and seems to link the four subscales together too
tightly.  Thus researchers typically treat each subscale score
independently of the others in separate analyses.  A reasonable
approach to the main analysis for an experiment with SATS scores
is therefore to perform four separate (mixed-model) analyses of
variance, one for each subscale of the SATS.

A typical statistics education experiment will study several
classes or sections of students in which teachers use one method
of teaching the course and several other classes in which
teachers use a second, competing method.  Several classes with
each method are necessary because the teaching methods vary
between classes, but teachers (and other properties of the
classes) also generally vary between classes.  Thus if we use
only two classes, any differences we find between the methods may
actually be due to the difference between the teachers, and not
to the differences in the methods.  Multiple classes are needed
to even out teacher (and other) effects.

The "Interpreting SATS Scores" section above discusses how it is
reasonable to study in isolation the univariate distribution of
the differences between pre-test and post-test SATS scores.
However, it is not reasonable to give these differences to an
analysis of variance or linear models computer program for
analysis.  Instead, a better approach is to give the program the
original pre-test and post-test scores (as opposed to giving it
only the differences between the scores or only the post-test
scores).  This allows the program to take account of the
uncollapsed set of experimental data, which generally enables a
more comprehensive and more sensitive analysis.
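For illustration, here is a minimal sketch (again in Python, with
pandas) of reshaping a one-row-per-student table into the
one-row-per-administration ("long") layout that mixed-model
routines generally expect; the file name and column names are
hypothetical:

   import pandas as pd

   # Hypothetical wide layout: one row per student, with the
   # original pre-test and post-test scores in separate columns
   # (columns: student, class_id, method, pre, post).
   wide = pd.read_csv("sats_value.csv")

   # Long layout: one row per administration of the test.  Both
   # original scores are preserved (no differencing, and no
   # discarding of the pre-test scores).
   long = wide.melt(
       id_vars=["student", "class_id", "method"],
       value_vars=["pre", "post"],
       var_name="time",
       value_name="score",
   )
   print(long.head())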
(One could even give the program the individual SATS item scores
for each student, as opposed to giving it only the subscale
scores.  [Each subscale score is simply the sum of the item
scores for the items associated with the subscale.]  However, the
individual item scores of a test are rarely studied in the main
analysis of an education experiment, perhaps because the item
scores for a subscale are viewed as merely reflecting somewhat
independent measures of the same property.  Thus the
within-student variation in the item scores reflects nothing more
than the measurement error in the individual items, which is
generally not of interest under the goals of the experiment.)

In the type of experiment under discussion the response variable
(e.g., one of the four subscales of the SATS) is administered to
each student twice -- at the beginning of the course and at the
end.  This dual administration of the response variable implies
that the experiment is a "repeated-measurements" experiment.

The use of repeated measurements is often effective because (with
an appropriate design and analysis) the comparisons for the key
statistical tests of the experiment are made within the
experimental entities.  That is, in the key comparison(s) an
experimental entity is compared with itself.  These
within-entities comparisons generally exhibit substantially less
random variability than the corresponding between-entities
comparisons.  This implies (through a mathematical argument) that
statistical tests associated with the within-entities comparisons
are generally substantially more powerful than the tests we would
obtain if repeated measurements were not used.

Thus consider an experiment that studies the relationship between
a SATS subscale score (measured both before and after the course)
as the response variable and a predictor variable that reflects
two (or more) teaching methods.  An important statistical test
for a difference in the effectiveness of the methods is the test
of the two-way interaction between the factors (teaching) Method
and Time (of testing).

(This interaction test is relevant because if the methods being
compared differ in their effects on students' attitudes, the
pre-to-post change in attitudes of the group of students
receiving one method of teaching will be different from the
corresponding change in the other group(s).  The Method by Time
interaction test is explicitly designed to detect such a
difference.)

Examination of the layout of the type of experiment under study
implies that the Method by Time interaction test is a
within-students test.  Thus this test tends to be substantially
more powerful for detecting differences than the between-students
(and between-classes) "main-effect" Method test.  Thus although
the main-effect test is of clear interest, it is important to
perform the interaction test because the latter test may be
(correctly) statistically significant even though the former test
is not.

(A general rule is that if a main-effect component in an analysis
of variance table is a within-entities component [e.g., Time in
the present example], then all interactions between this
component and other components [e.g., the Method by Time
interaction in the present example] are also within-entities
components.)
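To see in miniature what the interaction measures, note that it
reflects a difference between differences.  Here is a small
numeric sketch in Python, with made-up group means purely for
illustration:

   # Hypothetical mean subscale scores (made-up numbers).
   method_a_pre, method_a_post = 4.1, 4.9  # classes taught with method A
   method_b_pre, method_b_post = 4.2, 4.3  # classes taught with method B

   change_a = method_a_post - method_a_pre  # about 0.8
   change_b = method_b_post - method_b_pre  # about 0.1

   # The Method by Time interaction reflects the difference
   # between the two mean pre-to-post changes -- here about 0.7,
   # suggesting (if statistically significant) that method A
   # improves attitudes more than method B does.
   print(round(change_a - change_b, 2))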
The experiment under discussion has the following attributes:

- The main predictor variable (sometimes called a "factor" in
  analysis of variance) is (teaching) Method, which is a "fixed"
  effect in the model (equation).

- The predictor variable Time (of testing) is a
  "repeated-measurements" effect in the model.  Time is also a
  fixed effect because the two times of measurement are fixed
  (relative to the course) at "immediately before the course" and
  "immediately after the course".

- If students are grouped in classes or sections, the predictor
  variable Class is generally viewed as a "random" effect in the
  model.

- If the experiment spans more than one academic institution, the
  predictor variable Institution is reasonably viewed as
  reflecting another random effect in the model.  (An experiment
  that spans more than one institution resembles a multicenter
  clinical trial, and thus multicenter principles may provide
  useful precedents.)

- We might also include predictor variables (effects in the
  model) for students' gender, age, and perhaps other variables
  thought likely to be useful predictors.  (However, these
  predictor variables are generally of less direct interest than
  the main predictor variable [Method] and interactions between
  it and other predictor variables.)

- The experiment contains many possible interactions between the
  various predictor variables.  Specifying how the interactions
  are to be treated in the analysis is somewhat complicated.

- The experiment will likely end up being unbalanced (e.g., with
  different numbers of classes receiving the different teaching
  methods), which is another minor complication.

The preceding points suggest that proper analysis of this type of
experiment is complicated.  Thus some statistical software
packages may be unable to perform the analysis.  (SPSS and SAS
have mixed-model routines that can handle the key aspects of this
type of analysis for many experimental designs.)
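As an illustration of such a mixed-model analysis, here is a
minimal sketch in Python using the statsmodels library (SPSS,
SAS, and other packages provide equivalent routines).  The file
name and column names are hypothetical, and a real analysis would
need careful attention to the random effects and the covariance
structure discussed below:

   import pandas as pd
   import statsmodels.formula.api as smf

   # Hypothetical long-format data: one row per administration of
   # the test, with columns student, class_id, method, time,
   # score.
   data = pd.read_csv("sats_value_long.csv")

   # Fixed effects: Method, Time, and the key Method by Time
   # interaction.  Random effects: a Class effect (the groups)
   # and a student effect nested within Class, which models the
   # repeated measurements on each student.
   model = smf.mixedlm(
       "score ~ method * time",
       data,
       groups=data["class_id"],
       vc_formula={"student": "0 + C(student)"},
   )
   result = model.fit()

   # The method-by-time coefficient estimates the Method by Time
   # interaction.
   print(result.summary())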
We include the Class factor in the analysis because it reflects a
significant aspect of the experimental situation and because it
enables us to drain away attributable variation in the values of
the response variable -- variation that can be associated with
variation in the Class predictor variable.  This generally yields
a smaller between-students residual (i.e., leftover) variation,
which yields more powerful between-students statistical tests.

Thus in the experiment under discussion some of the variation in
the values of the response variable may be associated with the
differences between the teachers of the different classes.
Draining away this variation will make the between-students
statistical test(s) more powerful.  (However, as suggested above,
the between-students tests are generally less important than the
within-students tests.)

If teachers teach more than one class, we could also include a
(random) Teacher factor in the analysis.  (If each teacher
teaches only one class, the Class factor takes account of teacher
[main] effects.)

As noted, a key statistical test in the experiment under
discussion is the within-students test of the two-way interaction
between the factors (teaching) Method and Time (of testing).
This suggests that we may be able to ignore the between-students
Class and (if applicable) Institution factors in the analysis,
which simplifies the analysis to a standard repeated-measurements
analysis.  I recommend that researchers perform the analysis both
ignoring and taking account of the Class factor and report both
results to help others understand and use the methods.  (In
certain [perhaps many or all] cases the p-value for the Method by
Time interaction is exactly the same whether the Class factor is
included in the model or not.)

The repeated-measurements aspect and the random Class factor
aspect of the analysis yield a somewhat complex covariance
structure for the values of the response variable.  This
structure must be adequately modeled in order to provide correct
and maximally powerful statistical tests for evidence of a
difference between the two or more teaching methods being
compared in the experiment.

(However, if we are unsure of the covariance structure, we can
tell the program to assume an "unstructured" covariance matrix,
which may somewhat lessen the power of the statistical tests but
will ensure that the covariance structure is adequately modeled.
Another commonly assumed covariance structure is called "compound
symmetry".)

Researchers interested in these analyses may find it useful to
study books about repeated-measurements, longitudinal, and
hierarchical linear models (or to consult with their authors) to
ensure that the optimum experimental design and analysis methods
are used.  For a relatively small cost, this approach can
substantially increase the chance of finding what one is looking
for, if it exists.  It also reduces the chance of drawing
incorrect conclusions, a pitfall of complex analyses.

A useful way to understand mixed-model analysis computer programs
is to generate realistic data under various models and to analyze
the generated data with a mixed-model program (or programs).  An
easy way to generate data is to use statistical software and a
model equation.  That is, one uses random number generators or
fixed values (as necessary) to generate appropriate values of the
predictor variables, and one uses a properly parameterized model
equation (with a random number generator for the error term) to
generate values of the response variable from the values of the
predictor variables.  Most general statistical software can be
programmed in minutes to generate realistic research data using
this method.  (Many software products give examples of such data
generation in their documentation.)  If we use an analysis
program to study relationships (or lack of relationships) between
variables that we ourselves have "installed" in the data, we gain
effective experience in how the program works, and we also gain
effective experience in how statistical models work.
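For instance, here is a minimal sketch of such data generation in
Python (with numpy and pandas); the model equation, parameter
values, and layout are arbitrary choices made purely for
illustration:

   import numpy as np
   import pandas as pd

   rng = np.random.default_rng(seed=1)
   rows = []
   for class_id in range(20):                     # 20 classes
       method = class_id % 2                      # two methods, 10 classes each
       class_effect = rng.normal(0.0, 0.3)        # random Class effect
       for student in range(30):                  # 30 students per class
           student_effect = rng.normal(0.0, 0.5)  # random student effect
           for time, is_post in [("pre", 0), ("post", 1)]:
               # Model equation: overall mean, plus a Method by
               # Time effect "installed" in the data (method 1
               # raises post-test scores by 0.4), plus the random
               # effects, plus a random error term.
               score = (4.0 + 0.4 * method * is_post
                        + class_effect + student_effect
                        + rng.normal(0.0, 0.4))
               rows.append((class_id, student, method, time, score))

   data = pd.DataFrame(
       rows, columns=["class_id", "student", "method", "time", "score"])

   # Giving 'data' to a mixed-model program (such as the sketch
   # above) shows whether the program recovers the installed
   # Method by Time interaction of 0.4.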
Some students' attitudes about the field of statistics are
disappointingly negative.  Thus a good feature of research in
improving students' attitudes is that we have substantial room
for improvement.  If our field is really as useful as many of us
like to think, proper experimentation (using a broad range of
promising teaching methods) will undoubtedly lead us to improve
students' attitudes.  This will help students to appreciate the
central role our field plays (or can play) throughout empirical
research, which will in turn substantially increase the use and
overall contribution of the field of statistics.

REFERENCES

Hilton, S. C., Christensen, H. B., Collings, B. J., Hadfield, K.,
   Schaalje, B., and Tolley, D. 1999. "A Randomized, Controlled
   Experiment to Assess Technological Innovations in the
   Classroom on Student Outcomes: An Overview of a Clinical Trial
   in Education," in American Statistical Association Proceedings
   of the Section on Statistical Education, pp. 209-212.  (See
   also the subsequent two articles in the same Proceedings
   volume.)

Macnaughton, D. B. 2002. "The Introductory Statistics Course: The
   Entity-Property-Relationship Approach."  Available at
   http://www.matstat.com/teach/

Schau, C. 2003. "Survey of Attitudes Toward Statistics."
   Available at http://www.unm.edu/~cschau/satshomepage.htm