Subject: EPR Approach and Scientific "Explanation"
To: EdStat-L Statistics Education Discussion List, sci.stat.edu Usenet Newsgroup
From: Donald B. Macnaughton <donmac@matstat.com> (formerly donmac@hookup.net)
Date: Wednesday July 23, 1997
cc: Robert Frick <rfrick@psych1.psy.sunysb.edu>

Bob Frick recently wrote to me about the link between the entity-property-relationship (EPR) approach to statistics and the idea of a scientific "explanation". I am very grateful to Bob for his insightful remarks. Since the issues Bob raises may interest readers of this group (and with Bob's permission), following are his remarks together with my reply.


1. IS A RELATIONSHIP BETWEEN PROPERTIES A LAW?

Bob starts with two minor issues that set the scene for more important issues that begin below in section 3. He writes

> ( snip )
> I think the goal of science is to collect and test laws and
> theories (with the exceptions that you mentioned). However,
> the "relationship among properties" sounds like a law.

I agree. In 1992 I did a survey of the named "laws" of science and found that most of these laws are statements of relationships between or among properties of entities. (I briefly describe the survey in appendix A.)

Although I agree that a relationship between properties sounds like a law, it seems counter to standard usage to view *all* accepted relationships between properties as being "laws" of science because (as suggested in appendix A) many scientists and statisticians restrict their use of the term "law" to only a small subset (probably less than 500) of all the hundreds of thousands of reliably known relationships between properties.


2. TWO SENSES OF THE WORD "THEORY"

I wrote in an earlier post (1997)

     ... in medical research a scientist might theorize (perhaps on the basis of laboratory research) that a particular drug has an effect on AIDS. The implication here of this theory is ...

In response to this passage, Bob writes

> (And, the claim that a drug has a particular effect on AIDS is
> a law, not a theory, to my way of defining things.)

Bob's puzzlement over my use of the word "theory" is due to an error on my part in overlooking the fact that the word "theory" has more than one meaning.
I wrote thinking of one meaning and Bob interpreted the passage thinking of another (more common) meaning. Specifically, if we use the word "theory" to mean a "body of interrelated scientific statements", then a statement about an untested drug effect is certainly *not* a theory. On the other hand, if we use the word "theory" to mean a hypothesis, then a statement about an untested drug effect clearly *is* a theory.

I have corrected the text of the essay to remove this source of confusion, which I thank Bob for identifying. In the following discussion I shall use the word "theory" only in the sense of a body of interrelated scientific statements.


3. WHAT IS A THEORY?

> So, it seems to me that you overemphasize laws and
> underemphasize theories. Of course, that fits with how
> statistics is used.

I view a full-blown scientific theory as being simply a body of interrelated statements about

1. the existence of entities
2. the existence of properties of entities (and how to measure them)
3. actual values of properties
4. distributions of values of properties
5. relationships between entities
6. relationships between properties (relationships between variables).

(From the point of view of benefit or payoff, the most important statements in a scientific theory are the statements about relationships between properties because those statements give us the ability to predict or control the values of properties -- an ability that often has substantial social or commercial value.)

Although I have spent considerable time trying, I have been unable to think of much that a scientific theory is beyond the above six types of statements. Can you think of any statements in any scientific theory that are not one of the above six types?

On the other hand (as Bob has suggested in a follow-up e-mail), not all sets of interrelated statements that are of the above six types are scientific theories.
For example, the statements of astrology are a set of interrelated statements of (some of) the above six types, but these statements do not qualify as a scientific theory. To qualify as a scientific theory, the statements must be derived or verified through *careful empirical research*.

Thus astrology contains a set of statements of relationships between properties, namely, relationships between (a) the positions of the heavenly bodies at the moment of a person's birth and (b) the personality traits of a person. (E.g., if the sun is in Leo when you are born, you will have high leadership ability.) Since there appears to be no concerted movement by astrologers to derive or verify these statements through careful empirical research, astrology does not qualify as a scientific theory. But if these putative relationships between properties ever *are* unequivocally verified through careful empirical research, astrology *will* then become a valid (though puzzling) scientific theory.

In summary, it is reasonable to view a scientific theory as a set of interrelated statements of some or all of the above six types that have been derived or verified (or are awaiting intended verification) through careful empirical research.


4. WHAT IS "EXPLANATION"?

"Explanation" (that is, the ability to "explain") is a highly regarded aspect of theories. Bob suggests a most interesting problem respecting explanation as follows:

> Consider this. We know that there is a sun, a moon, and water
> on the earth. We know that there is gravity, and we know that
> the tide is highest when the sun and the moon are together, ...
> A nice collection of entities, relationships between entities,
> and relationships between properties. The remaining problem is
> to explain *why* there are tides. But to me this is the heart
> of science.

Bob suggests that entities, properties, and relationships are nice, but the heart of science is *explanation*.
Implicit in his suggestion is the important conjecture that

     A scientific *explanation* or understanding of some phenomenon consists of something *beyond* mere identification and linking together of all the relevant entities, properties, and relationships associated with the phenomenon.

Is this interesting conjecture true? I devote sections 5 through 16 below to evaluating the conjecture. In section 5 I propose a proper scientific explanation of why there are tides. In sections 6 through 15 I disassemble the explanation into components, looking to see what the explanation is made of. In section 16 I summarize the findings.

(Some readers may wonder whether the following discussion has much to do with the field of statistics. The link is that the discussion centers around the concept of a relationship between variables, and the methods of statistics can all be characterized as methods for studying variables and [more importantly] relationships between variables [Macnaughton 1996].)


5. AN EXPLANATION OF TIDES

Isaac Newton and Pierre Laplace gave us the currently accepted explanation of why there are tides (Chandrasekhar 1995, ch. 21). To give us something to work with, let me give a simplified version of their explanation as follows:

We can "explain" tides in terms of the ethereal and ubiquitous type of entity we call a "force".

(Individual forces are entities in the sense that most people linguistically treat them so, by denoting them with a noun [i.e., the noun "force"]. Whenever people use a noun, it seems reasonable to assume that the noun is denoting some "thing", i.e., some entity. Like other entities, forces have properties. Four important properties of any force are magnitude, direction, locus of application, and type.)

An important type of force that affects tides is the force of gravity. This force simultaneously pulls the water in the oceans in three different directions -- toward the earth, toward the sun, and toward the moon.

Consider some particular water molecule in the ocean, which I shall call water molecule M. The magnitude of each of the three gravitational forces on this water molecule is given by Newton's law of gravitation (a statement of a relationship between properties of entities),

                 m1 m2
        f = G -----------
               r-squared

where

   f  = the magnitude of the force exerted on water molecule M by the other body (i.e., by the sun, the moon, or the earth)
   m1 = the mass ("weight") of water molecule M
   m2 = the mass of the other body
   r  = the distance between water molecule M and the center of the other body
   G  = the gravitational constant, the (empirically determined) parameter of the equation.

The direction of the force specified by this equation is away from water molecule M and toward the other body along the imaginary line joining the centers of mass of water molecule M and the other body.

A second important type of force that affects tides is the centrifugal force, which is a force felt by bodies that are rotating, and which is directed outward, perpendicular to the axis of rotation.

(Most physicists view centrifugal forces as "imaginary" forces in the sense that these forces are only present within a rotating frame of reference, and it is often more reasonable to explain phenomena in terms of a non-rotating [inertial] frame of reference. However, since the current explanation is in the frame of reference of an individual [rotating] water molecule, it is appropriate to use centrifugal forces in the explanation.)

The magnitude of the centrifugal force felt by a rotating body is given by the relationship between properties

        f = m omega-squared r

where

   f     = the magnitude of the centrifugal force
   m     = the mass of the rotating body
   omega = the rate of rotation of the rotating body
   r     = the perpendicular distance between the rotating body and the axis of rotation.
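
As a rough illustration of how the two magnitude formulas above can be exercised, here is a minimal Python sketch. The numerical values (masses, distances, rotation period) are approximate textbook figures that I have supplied as assumptions; they do not come from the discussion above. The function and variable names are likewise only illustrative. For simplicity the sketch treats the distance from water molecule M to the moon and to the sun as the center-to-center distance from the earth, and it computes only the centrifugal force due to the earth's own rotation, for a molecule at the equator.

```python
import math

# Gravitational constant in N m^2 / kg^2 (approximate value; an assumption).
G = 6.674e-11


def gravitational_force(m1, m2, r):
    """Magnitude of the gravitational force: f = G * m1 * m2 / r-squared."""
    return G * m1 * m2 / r ** 2


def centrifugal_force(m, omega, r):
    """Magnitude of the centrifugal force: f = m * omega-squared * r."""
    return m * omega ** 2 * r


# Approximate mass of one water molecule in kg (assumption).
M_WATER = 2.99e-26

# Approximate (mass in kg, distance from molecule M in m) for each body.
# The moon and sun distances are earth-center distances, a simplification.
BODIES = {
    "earth": (5.972e24, 6.371e6),
    "moon": (7.342e22, 3.844e8),
    "sun": (1.989e30, 1.496e11),
}

# The three gravitational force magnitudes acting on molecule M.
gravity = {
    name: gravitational_force(M_WATER, mass, distance)
    for name, (mass, distance) in BODIES.items()
}

# Rotation rate of the earth in radians per second (one sidereal day).
OMEGA_EARTH = 2 * math.pi / 86164.0

# Centrifugal force on a molecule at the equator due to the earth's spin.
spin = centrifugal_force(M_WATER, OMEGA_EARTH, 6.371e6)

for name, force in gravity.items():
    print(f"gravity toward {name}: {force:.3e} N")
print(f"centrifugal (earth spin): {spin:.3e} N")
```

Under these assumed values the pull of the earth is by far the largest of the forces computed, which is consistent with the remark below that the net force points almost directly toward the center of the earth.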

(The above relationship between properties is simply a reworking of Newton's second law of motion [f = m a] for the case when the acceleration is radial.)

Water molecule M is undergoing three types of rotation

- rotation about the axis of rotation of the earth
- rotation about the center of mass of the earth and moon
- rotation about the center of mass of the earth, moon, and sun.

For any instantaneous configuration of the earth, moon, and sun, the centrifugal force felt by water molecule M due to each of the above three types of rotation can be computed using the above relationship between properties.

It is useful to visualize the three gravitational forces and the three centrifugal forces acting on water molecule M as vectors, which are imaginary arrows whose tails are attached to the water molecule. The direction of each arrow indicates the direction of the associated force, and the length of each arrow is proportional to the strength of the associated force, as given by the above equations.

Using simple vector algebra, we can compute the *vector sum* of the six force vectors acting on water molecule M. The vector sum is a single vector pointing in the direction of the *net* force on the molecule. (This vector points almost directly [and occasionally directly] toward the center of the earth, because the force of the earth's gravity on water molecule M is several orders of magnitude greater than the next strongest of the five other forces.)

If we decompose the net force on water molecule M into two vector components, one directed radially toward the center of the earth, and the other directed tangentially to the surface of the earth, we find that the molecule usually has a small component of force on it tangential to the surface of the earth. This tangential force is due to the gravity of the sun and moon and due to the three types of rotation.
This force varies in strength and direction over the course of a day, as the positions of the sun and the moon change relative to water molecule M.

Every water molecule on the earth experiences the tangential forces I describe above, and at any point in time the strength and direction of these forces vary in a smooth fashion over the surface of the earth. These tangential forces cause movement of the molecules in the direction of the forces. A noticeable effect of the movement of the molecules in large bodies of water is a lowering of the water level at a shore the water moves away from, and a rising of the water level at a shore the water moves toward. The tides are these effects.

I propose that the above paragraphs are a reasonable *high-level* scientific "explanation" of why there are tides. A lower-level "complete" scientific explanation of why there are tides simply embellishes the above explanation with details, which may include

- values of the variables in the two equations to allow computation of the actual forces

- the use of trigonometry (a set of abstract relationships between variables that reliably map to the geometry of the real world) to help compute distances and to help with vector addition and vector decomposition

- a reworking of the equations in terms of force potential, thereby enabling derivation of a differential equation that can be solved to obtain the equation of the height of the equilibrium surface above or below the nominal height of the ocean under any configuration of the earth, sun, and moon, as given by Chandrasekhar (1995, sec. 115, eq. 15); this equation is another relationship between properties

- equations (stating relationships between properties) that take further account of the fact that the system is not static, and instead the earth, sun, and moon are all in relative motion (Laplace's dynamical theory)

- equations (more relationships between properties) showing the height of the tide at a location as a function of both the water flow rate and the local ocean-floor topography

- other relationships between properties that play (second-order) roles in determining the heights of tides.

Further discussion of tides is given by Pugh (1987) and Marchuk and Kagan (1989).


6. DISASSEMBLING THE EXPLANATION OF TIDES

Now that we have a scientific explanation of why there are tides, let us disassemble the explanation to see what it is made of. My goal here is to identify *all* the components of the explanation, so that we can see whether the explanation goes beyond stating and exercising the concepts of entities, properties, and relationships.

In the following sections I attempt to break out all the general ideas that are present in the above explanation of tides.


7. EXISTENCE OF ENTITIES

Clearly the explanation of tides relies on the existence of certain entities. In particular, the explanation relies on the implied existence of

- the earth
- the oceans on the earth
- the land masses on the earth
- the sun
- the moon
- the force of gravity
- the centrifugal force
- the tides.

Bob implies the existence of some of these entities quite clearly in the second and third sentences in his paragraph above: "We know that there is a sun, ...". That is, "We know that there exists a sun, ...". The exercise of first identifying the main entities in the system we wish to study is an effective way to set the scene for subsequent discussion.

(It is reasonable to view the tides as an entity because, as noted above, any noun denotes a "thing" that humans treat as an entity.
The tides are [or the tide is] an instance of an entity that manifests itself [mainly] in terms of changing heights of water over the course of a day on the shore of a large body of water. That is, the tides appear as changing values of the property "water height". Note how the human mind likes to make an entity out of any "thing" of interest. In the case of the tides, the thing is an instance of the more general type of entity we call a "process". To be meaningful, an entity must have properties. An important property of the entity [process] "tide" is "water height".)


8. EXISTENCE OF PROPERTIES

In addition to relying on implicit statements about the existence of entities, the explanation of tides also relies on implicit statements about the existence of various *properties* of the entities, for example

- the explanation relies on two particular properties of the gravitational force: "magnitude" and "direction"

- the explanation relies on a property of matter: "mass"

- the explanation relies on the property "distance between two physical objects", which is determined through a coordinate system, as discussed in the next paragraph.


9. COORDINATE SYSTEM

The explanation of tides makes implicit reference to an astronomical coordinate system, which is an imaginary entity that helps us to take account of the relative positions of the sun, the moon, and the earth. The coordinates of any body in the coordinate system can be viewed as properties of the body, specifying the body's location in the coordinate system. Any pair of objects in a coordinate system has the property "distance between", which we need to know for the denominator term in the law of gravitation.


10. VALUES OF PROPERTIES

Any detailed explanation of tides must refer to the *values* of the properties referenced in the explanation.
For example, if we wish to use the law of gravitation to compute the force that the sun exerts on water molecule M, we must refer to the actual mass of water molecule M and the actual mass of the sun, as well as the actual distance between water molecule M and the sun.


11. DISTRIBUTIONS OF VALUES

The explanation of tides refers to the concept of the *distribution* of the values of a property through the idea of the distribution of the values of the tangential component of the net force on molecule M over the course of a day.

(Statements of distributions are often accorded less importance than the other types of statements. This may be because the study of a distribution can be viewed as a degenerate case of the study of a relationship between variables. That is, the study of a distribution is the study of the response variable in a relationship between variables in which the set of predictor variables is empty. The use of a non-empty set of predictor variables is preferred because it gives us a more complete understanding of the response variable under study [generally even when no relationship is found].)


12. RELATIONSHIPS BETWEEN ENTITIES

The explanation of tides relies on the locational relationships between the earth, the sun, and the moon.

(Some examples of possible types of relationships between entities X and Y are

- X is a sub-unit or subsystem of Y
- X is a "child" [or other relative] of Y
- X is [in some sense] contiguous with or near to Y.)


13. RELATIONSHIPS BETWEEN PROPERTIES

The explanation of tides relies on many relationships between properties.

First, the explanation relies on Newton's law of gravitation, which is the relationship between properties that allows us to predict the magnitude of a gravitational force.

Second, the explanation relies on Newton's second law of motion, which is the relationship between properties that allows us to predict the magnitude of a centrifugal force.

Third, the explanation relies on a relationship between properties reflected in the concepts of vector addition and vector decomposition of the forces on a body. I discuss how these concepts reflect a relationship between properties in appendix B.

In addition, the detailed explanation of tides relies on many other relationships between properties, as suggested in the list at the end of section 5.


14. LINKING OF CONCEPTS

The explanation of tides relies on a linking together of all the concepts discussed in sections 7 through 13. The linking of concepts in an explanation may be carried out through

- simple juxtaposition of relevant statements
- the rules of deductive logic or
- (when mathematical concepts are involved) the rules of mathematics.

To facilitate studying the linking between the statements, consider the following condensed version of the explanation of tides:

(a) There are forces.

(b) There are various forces acting on a water molecule on the earth.

(c) The forces acting on a water molecule can be computed using certain known relationships between properties.

(d) The *net* force acting on a water molecule can be computed from the various individual forces using a certain known relationship between properties (= vector addition).

(e) The net force acting on a water molecule can be resolved into radial and tangential components using a certain known relationship between properties (= vector decomposition).

(f) If the computations in steps (c) through (e) are carried out, we find that the net tangential force on a water molecule on the earth is usually non-zero.

(g) If the net tangential force on a water molecule is non-zero, the molecule will move tangentially, according to Newton's second law of motion (which is a statement of a relationship between properties).

(h) Tides are simply the movement of the water molecules in the oceans caused by the above tangential force components.
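
Steps (d) through (f) above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are mine, not part of the explanation itself: the forces are two-dimensional, their values are hypothetical numbers in arbitrary units (one large force toward the center of the earth and two small perturbing forces), and the helper names `add_vectors` and `decompose` are invented for the example.

```python
def add_vectors(*vectors):
    """Vector addition: sum the vectors component by component."""
    return tuple(sum(components) for components in zip(*vectors))


def decompose(net, radial_unit):
    """Resolve `net` into a radial part and a tangential part.

    `radial_unit` is a unit vector pointing toward the center of the
    earth. The radial part is the projection of `net` onto that
    direction; the tangential part is whatever remains.
    """
    radial_size = sum(n * u for n, u in zip(net, radial_unit))
    radial = tuple(radial_size * u for u in radial_unit)
    tangential = tuple(n - r for n, r in zip(net, radial))
    return radial, tangential


# Hypothetical forces on one water molecule (arbitrary units).
# The "down" direction (0, -1) stands for "toward the earth's center".
forces = [
    (0.0, -1000.0),  # stands in for the earth's gravity (dominant)
    (3.0, 4.0),      # stands in for a small perturbing force
    (-1.0, 2.0),     # stands in for another small perturbing force
]

# Step (d): the net force is the vector sum of the individual forces.
net = add_vectors(*forces)

# Step (e): resolve the net force into radial and tangential components.
toward_center = (0.0, -1.0)
radial, tangential = decompose(net, toward_center)

# Step (f): check whether the tangential component is non-zero.
print("net force:           ", net)
print("radial component:    ", radial)
print("tangential component:", tangential)
print("tangential non-zero: ", any(c != 0 for c in tangential))
```

With these made-up numbers the net force points almost straight toward the center, yet a small tangential component remains, which is the situation that statement (f) describes.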

Note how statements (c) through (g) center around the concept of a relationship between properties of entities.

The actual linkages between statements (a) through (h) are

- statement (b) is a special set of cases of statement (a)

- statement (c) is an expansion of statement (b)

- statements (c) through (f) use relationships between properties to derive the conclusion that the net tangential force on a water molecule is usually non-zero

- statement (g) uses a relationship between properties to conclude that the non-zero tangential forces on the water molecules in the oceans imply that the water molecules will move

- statement (h) states that the tides are the movement described in statement (g).


15. PREDICTION: THE DEFINITIVE TEST OF AN EXPLANATION

In passing, it is important to note that the definitive test of whether the above explanation of tides (or any explanation) is valid is not performed on anything internal to the explanation. Instead, the test is whether the explanation can accurately *predict* the values of the relevant property or variable ("water height" in the case of tides) in the future. That is, scientists deem the scientifically "correct" explanation of tides (or any other phenomenon) to be the explanation that makes the best predictions.

(If I have interpreted the formal scientific work on tides correctly, most oceanographers and physicists will agree that the explanation of tides given above [when embellished with the details at the end of section 5] makes better predictions of tides than any other known explanation of tides.)

Thus note the crucial role that prediction plays in an explanation. In fact, for some people (including me), the possibility of future prediction (or future control) is the actual *goal* of any explanation. That is, people seem to accumulate explanations of phenomena in the hope that some of these explanations will later be useful in predicting or controlling the values of properties of entities.


16. SUMMING UP THE ANALYSIS OF THE EXPLANATION OF TIDES

In sections 6 through 14 I analyzed the explanation of tides I gave in section 5. The analysis suggests that a standard scientific explanation of tides contains (sometimes implicitly) the following types of statements:

- statements of the existence of entities
- statements of the existence of properties of entities
- statements of the values of properties
- statements of the distribution of the values of properties
- statements of relationships between entities
- statements of relationships between properties
- statements linking the other statements together.

Thus it appears that the scientific explanation of tides contains the same six types of linked-together statements that I list in section 3 as being the six types of statements that make up a scientific theory.

Furthermore, it appears that there is *nothing more* to the scientific explanation of tides beyond the relevant instances and linking together of the six types of statements.

More generally, I suspect that most (all?) scientific explanations can be reasonably viewed as consisting *solely* of relevant instances of the six types of statements, and statements that link the statements together. Can you think of any counterexamples?

(Some scientific explanations use another type of statement: statements that develop a metaphor. However, metaphors play only an indirect role in explanations, which I discuss in appendix C.)

In section 4 I stated the conjecture that

     A scientific *explanation* or understanding of some phenomenon consists of something *beyond* mere identification and linking together of all the relevant entities, properties, and relationships associated with the phenomenon.

The discussion in sections 5 through 16 suggests that the above conjecture is false and, instead, the following conjecture is true:

     A scientific explanation or understanding of some phenomenon is simply identification and linking of relevant empirically-studied entities, properties, and relationships.


17. LINK

I discuss issues about using the entity-property-relationship approach in an introductory statistics course in material at

     http://www.matstat.com/teach/

--------------------------------------------------------
Donald B. Macnaughton     MatStat Research Consulting Inc.
donmac@matstat.com        Toronto, Canada
--------------------------------------------------------


APPENDIX A: A CLASSIFICATION OF THE LAWS OF SCIENCE

To investigate how the "laws" of science relate to the concept of a relationship between properties, two assistants each carefully scanned each page of the 2088-page _McGraw-Hill Dictionary of Scientific and Technical Terms_ (Parker 1989) for entries that contain the word "law" in the definiendum. They found 213 entries that define different laws of science.

For each entry I then tried to express the definition for the entry in terms of the concepts of entities, properties, and relationships. I found that three quarters of the laws could be best interpreted as statements of relationships between properties of entities (relationships between variables). The remainder of the laws fell into seven other categories.
The eight categories (together with the number of laws in each category) are shown in the following table:

   Classification of 213 "Laws" of Science Defined in the 1989
   _McGraw-Hill Dictionary of Scientific and Technical Terms_
___________________________________________________________
                                      Number of
                                       Laws in     Percent
Law Category                          Category*    of Total
___________________________________________________________
relationship between properties          184          75
non-relationship between properties       27          11
  (including 10 "conservation" laws)
law of mathematics                        14           6
  (axiom or theorem)
relationship between entities              9           4
value of a property                        5           2
distribution of the values of
  a property                               4           2
existence of a property                    2           1
existence of an entity                     1            .4
other                                      0           0
                                        ____        ____
TOTALS                                   246         100
___________________________________________________________
* The column total is greater than 213 because some laws
  contained two or more independent statements, each of which
  was classified separately.

The first two rows in the body of the table suggest that the majority of the laws of science are statements about relationships between properties of entities (relationships between variables).

In addition, the classification suggests that many (if not all) of the laws of empirical science can be explained in terms of the concepts of entities, properties, and relationships.


APPENDIX B: VECTOR OPERATIONS AS RELATIONSHIPS BETWEEN PROPERTIES

In section 13 I noted that the addition and decomposition of vectors can be viewed as reflecting relationships between properties. This appendix expands on this point.

Suppose we apply two forces, f1 and f2, to a free body. Then the body will move in a certain direction with a certain amount of acceleration. It is an interesting question whether there is a single third force, fe, such that if we apply only fe to the body, it will have the same effect on the body as if we had applied f1 and f2.
This question was addressed empirically by Simon Stevin, Isaac Newton, and Pierre Varignon in the late sixteenth and seventeenth centuries (Mason 1962, 152). These researchers applied two forces in different directions to a body, and then observed the motion of the body. They then, through a process of trial and error, found a single force that would generate the same motion of the body. By examining a large number of cases (using different bodies, different sizes of forces, and different directions), they found that the specifications of the single force that duplicated the action of the two forces could always be obtained through a simple rule -- the well-known parallelogram rule of vector addition.

Vector addition of two forces can be symbolized as

        fe = f1 + f2                                    (1)

where fe, f1, and f2 are all force vectors (each encapsulating two properties of the associated force: magnitude and direction). The three vectors are defined as

   f1 = magnitude and direction of directed force 1 on the body
   f2 = magnitude and direction of directed force 2 on the body
   fe = magnitude and direction of the effective directed force on the body from f1 and f2.

The plus sign in (1) does not signify standard addition of numbers but instead signifies "vector addition" of vectors, as defined by the parallelogram rule.

The terms fe, f1, and f2 in (1) all represent properties or variables. (In this case they are a special type of variable, namely, vectors.) Thus note that (1) is a statement of a relationship between properties (relationship between variables). A natural point of view of (1) is that fe is the response variable and f1 and f2 are the predictor variables.

Along with the concept of vector addition, the explanation of tides relies on the concept of vector "decomposition" of the net force acting on water molecule M into the radial and tangential components. This decomposition is justified through a reversal of the operation of vector addition.
This reversal is simply another aspect of the relationship between properties shown in (1), and has been shown to be valid through the same empirical research by Stevin, Newton, and Varignon.


APPENDIX C: THE ROLE OF METAPHOR IN EXPLANATION

A metaphor can offer helpful support to an explanation if there is an isomorphism between

(a) elements (entities, properties, or relationships) in the new system being explained and

(b) elements (entities, properties, or relationships) in some other familiar system.

The linking of the two systems through the metaphor helps students to understand the new system, because some or all of the relationships between elements in the new system can be mentally tracked in terms of known relationships between elements in the familiar system.

However, although metaphors are useful memory aids, they generally do not offer direct logical support for an explanation, because there is usually no substantive reason (beyond the principle of parsimony) why the two systems should behave similarly.


REFERENCES

Chandrasekhar, S. (1995), _Newton's Principia for the Common Reader,_ Oxford: Clarendon Press.

Macnaughton, D. B. (1996), "The Introductory Statistics Course: A New Approach." Available at http://www.matstat.com/teach/

Macnaughton, D. B. (1997), "Response to Comments by John R. Vokey." Posted to EdStat-L and sci.stat.edu on April 6, 1997 under the title "Re: How Should We *Motivate* Students in Intro Stat?" Available at http://www.matstat.com/teach/p0024.htm

Marchuk, G. I. and Kagan, B. A. (1989), _Dynamics of Ocean Tides._ Translated by V. M. Divid, N. N. Protsenko, and Y. U. Rajabov. Dordrecht, Netherlands: Kluwer.

Mason, S. F. (1962), _A Brief History of the Sciences,_ New York: Macmillan.

Parker, S. P., ed. (1989), _McGraw-Hill Dictionary of Scientific and Technical Terms,_ New York: McGraw-Hill.

Pugh, D. T. (1987), _Tides, Surges and Mean Sea-Level,_ Chichester, England: John Wiley.