(3). Tools to evaluate causality from different scientific domains
***Some preliminary ground-setting first: causality rests on contrary-to-fact assumptions, and this “what if…?” permeates all statistical modeling. The hypothesis-testing procedure starts with a long list of ‘if’s, e.g. ‘if patients are independent cases, if the sample was selected randomly, if the distribution of the outcome resembles a normal distribution’, and so on (multiple regression alone implies several such assumptions).
(i). The path analytic ‘tracing rule’ and ‘causal calculus’.
*** The ‘tracing rule’ is a visual inspection rule that allows one to ‘turn correlation into causation’ by decomposing a correlation into its causal and non-causal components. It has recently been updated to handle, visually, Judea Pearl’s ‘causal calculus’[i]: deriving observational/associational consequences from hypothesized/known causal structures. I will show that this simply means turning expressions that contain ‘unobservables’ (counterfactuals) of the form A1ci.If.LowerBMI into their observable counterparts, A1ci.Lowered.BMI.
To take the simple example from part 1, one common assumption (or expectation, ‘hypothesis’) is that
A1ci.If.LowerBMI < A1ci.If.HigherBMI
which says that if a patient’s BMI were to drop (‘IF’), his/her A1c would drop too; ignoring for now how this A1c(BMI) functional relation might really look[ii] (some U shape, or something more complicated), we commonly just try to ‘confirm’ the presence of this effect, Effect.BMI -> A1c, by ‘fitting’ (forcing onto the data!) a linear relation, by means of a linear regression, written at the level of patient i as
A1ci = Effect.BMI -> A1c · BMIi + errori, where the little dot in Effect.BMI -> A1c reminds us that this effect is one and the same for all folks, i.e. there is no i subscript! If we could derive a universal value (like the gravitational constant G for the force between two bodies), treatments and interventions would be much simpler! Things get complicated when we add a third variable, say systolic blood pressure SysBP, because we can then assume this model behind the data:
A1ci = Effect*.BMI -> A1c · BMIi + Effect*.SysBP -> A1c · SysBPi + error*i (we ignore the ‘intercept’ term, the conditional mean), where we added * (per Reichenbach too [1], p. 137) to mark ‘a change’ in the initial quantity: the initial effect Effect.BMI -> A1c changes to Effect*.BMI -> A1c; it now becomes ‘the effect of BMI on A1c, controlled for the other effect, of SysBP on A1c’. Commonly, when SysBP and BMI correlate positively, Effect*.BMI -> A1c < Effect.BMI -> A1c.
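This attenuation can be checked with a minimal simulation sketch (a hypothetical data-generating process; all effect sizes and the sample size are made-up illustrative numbers, not estimates from real patient data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical model: SysBP correlates positively with BMI, and both
# causally affect A1c (coefficients 0.6, 0.5, 0.4 are illustrative assumptions).
bmi = rng.normal(0, 1, n)
sysbp = 0.6 * bmi + rng.normal(0, 1, n)
a1c = 0.5 * bmi + 0.4 * sysbp + rng.normal(0, 1, n)

def ols(X, y):
    """Least-squares slopes (no intercept; all variables have mean ~0)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

effect_bmi = ols(bmi[:, None], a1c)[0]                        # Effect.BMI -> A1c
effect_bmi_star = ols(np.column_stack([bmi, sysbp]), a1c)[0]  # Effect*.BMI -> A1c

# The adjusted ('controlled for SysBP') slope is smaller: ~0.50 vs ~0.74
print(effect_bmi, effect_bmi_star)
```

The marginal slope absorbs part of SysBP’s effect (0.5 + 0.4·0.6 ≈ 0.74), while the adjusted slope recovers the assumed 0.5, matching Effect*.BMI -> A1c < Effect.BMI -> A1c.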
*** Path analysis was the first formal method promising to separate the causal and non-causal components of the ‘surface’ association/correlation BMI↔A1c. The possible sources of that correlation are simply: (1) a direct causal effect BMI->A1c; (2) a direct causal effect A1c->BMI; (3) causal effects on both from a common cause, 3rd->A1c & 3rd->BMI; (4) combinations of these in which other variables are involved, like causes of these three variables. Spelling out graphically how we expect them to be related causally produces a path model[iii]; alternatively, some use equations.
If BMI -> SysBP -> A1c (with no direct BMI -> A1c causal effect), we would expect the correlation between BMI & A1c in observational data to be about the product of the correlation between BMI & SysBP and the correlation between SysBP & A1c; if, however, the causal real world looks instead like BMI -> A1c -> SysBP, different observational consequences ensue.[iv]
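This ‘product of correlations’ consequence can be checked numerically; the sketch below assumes a chain BMI -> SysBP -> A1c with made-up standardized coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical chain BMI -> SysBP -> A1c, with NO direct BMI -> A1c path;
# error variances chosen so each variable has variance ~1 (standardized).
bmi = rng.normal(0, 1, n)
sysbp = 0.7 * bmi + rng.normal(0, np.sqrt(1 - 0.7**2), n)
a1c = 0.5 * sysbp + rng.normal(0, np.sqrt(1 - 0.5**2), n)

def r(x, y):
    """Pearson correlation."""
    return np.corrcoef(x, y)[0, 1]

print(r(bmi, a1c))                    # close to 0.7 * 0.5 = 0.35
print(r(bmi, sysbp) * r(sysbp, a1c))  # product of the chain's two correlations
```

The two printed numbers agree (up to sampling noise), exactly as the tracing rule predicts for a pure chain.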
*** Let’s see how this would work, and why it does:
If we have some ‘insight’ that one predictor both predicts the final outcome and is itself an outcome of the other predictor, we have a second equation to add:
A1ci = Effect.BMI -> A1c · BMIi + Effect.SysBP -> A1c · SysBPi + errorA1c i
SysBPi = Effect.BMI -> SysBP · BMIi + errorSysBP i
This invites substituting the second expression into the first, to find the total effect on A1c:
A1ci = Effect.BMI -> A1c · BMIi + Effect.SysBP -> A1c · (Effect.BMI -> SysBP · BMIi + errorSysBP i) + errorA1c i
so we get
A1ci = BMIi · (Effect.BMI -> A1c + Effect.SysBP -> A1c · Effect.BMI -> SysBP) + (errorA1c i + Effect.SysBP -> A1c · errorSysBP i)
This directly ‘demonstrates’, for the three-variable indirect-effect model, the ‘tracing rule’[v]; in the famous Baron & Kenny [5] language: c = c’ + a·b, i.e. the total effect of BMI, c, is composed of a direct effect c’ = Effect.BMI -> A1c and an ‘indirect effect’ ind. through the mediator SysBP, ind. = a·b = Effect.BMI -> SysBP · Effect.SysBP -> A1c.
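The c = c’ + a·b decomposition can be verified numerically. The sketch below simulates a hypothetical mediation model (the three coefficients a, b, c’ are illustrative assumptions) and checks that the total effect from the simple regression equals direct plus indirect from the two-equation model:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical mediation model: a = Effect.BMI -> SysBP,
# b = Effect.SysBP -> A1c, c' = direct Effect.BMI -> A1c (made-up values).
a_true, b_true, c_prime_true = 0.6, 0.4, 0.3
bmi = rng.normal(0, 1, n)
sysbp = a_true * bmi + rng.normal(0, 1, n)
a1c = c_prime_true * bmi + b_true * sysbp + rng.normal(0, 1, n)

def ols(X, y):
    """Least-squares slopes (no intercept; mean-zero variables)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

c = ols(bmi[:, None], a1c)[0]                         # total effect of BMI
c_prime, b = ols(np.column_stack([bmi, sysbp]), a1c)  # direct effect c' and b
a = ols(bmi[:, None], sysbp)[0]                       # a path

print(c)                # total effect, ~0.3 + 0.6*0.4 = 0.54
print(c_prime + a * b)  # same number: c = c' + a*b holds exactly in-sample for OLS
```

Note that c = c’ + a·b is an algebraic identity of least squares (it follows from the normal equations), so the two printed values match to machine precision, not just approximately.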
*** Note that this is a rather big departure from the ‘2 merely related predictors’ model: the BMI -> A1c effect there was just the direct effect seen here in the mediation model. Mediation changes the very logic and interpretation of the ‘hypothesized effect’ we are searching for! Despite this obvious state of affairs, some still resist the ‘indirect effect’ approach, and even its very wording, preferring to talk about tenebrous constructs like ‘seemingly unrelated regressions’ [6].
*** A more important causation note: turning a co-predictor into a mediator, by allowing a direct effect between the primary and (now) secondary predictor, changes the causal setting and its implications. The best example is the ‘Race -> Health Outcome’ research question, which is regularly evaluated by including, e.g., income as a covariate, to ‘control for it’, i.e. to give us the ‘pure’ racial-difference effect; the problem, however, is that Income might causally follow Race, and hence should NOT be controlled for.
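The bias from ‘controlling for’ a mediator can be illustrated with a small simulation sketch (generic variable names and made-up effects, standing in for the Race/Income/Outcome setting; this illustrates the logic only, it is not an analysis of any real disparity data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical structure: exposure -> income -> outcome, plus a direct
# exposure -> outcome path; income is a MEDIATOR here, not a confounder.
exposure = rng.binomial(1, 0.5, n).astype(float)
income = 0.8 * exposure + rng.normal(0, 1, n)
outcome = 0.5 * exposure + 0.6 * income + rng.normal(0, 1, n)

def ols(X, y):
    """OLS with an intercept; returns the slopes only."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

total = ols(exposure[:, None], outcome)[0]                       # ~0.98 = 0.5 + 0.8*0.6
adjusted = ols(np.column_stack([exposure, income]), outcome)[0]  # ~0.50, direct part only

# 'Controlling for' the mediator silently discards the indirect effect.
print(total, adjusted)
```

The adjusted coefficient is a legitimate estimate of the direct effect, but reporting it as ‘the’ exposure effect understates the total causal effect, which is the usual quantity of interest in disparity research.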
+++ Moreover, the ‘gender hiring discrimination’ question, discussed by Judea Pearl in BoW (section ‘In search of a language (the Berkeley admissions paradox)’; see also MMM), may turn out to be answerable by assessing the direct effect, distinct from the indirect effect (through, say, Department type, or even Qualifications).
*** It can be shown that any statistical model can be ‘solved for’ using the tracing rule, without any software aid: instrumental variable model, latent (common factor) model, latent growth model, etc.
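For the instrumental-variable model, for instance, the tracing rule says the only path connecting an instrument Z to the outcome Y runs through Z -> X -> Y, so the ratio cov(Z,Y)/cov(Z,X) isolates the X -> Y effect even when an unobserved confounder biases the plain regression. A minimal numerical sketch (all coefficients are made-up assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

# Hypothetical IV setup: U confounds X and Y but is unobserved; the
# instrument Z affects Y only through X; the true X -> Y effect is set to 0.3.
u = rng.normal(0, 1, n)
z = rng.normal(0, 1, n)
x = 0.5 * z + 0.8 * u + rng.normal(0, 1, n)
y = 0.3 * x + 0.7 * u + rng.normal(0, 1, n)

naive = np.cov(x, y)[0, 1] / np.var(x)        # biased upward by U, ~0.60
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # tracing-rule ratio, recovers ~0.30

print(naive, iv)
```

The naive regression mixes the causal path X -> Y with the confounding path X <- U -> Y; the instrument ratio keeps only the chain through X, exactly the tracing-rule reasoning done by hand.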
(ii). Other fields: potential outcomes (POs) and economics/econometrics
Several ‘tools’[vi] (‘hammers’ for the same ‘nail’) exist; the graphical ones[vii] are more intuitive, so I focus on them, e.g.:
- Directed Acyclic Graphs (modern causality): Elias Bareinboim developed a revolutionary tool that coded the entire ‘causal calculus’ math and can derive step by step the implications of any causal model: CausalFusion.net (requires free registration).
- Propensity Score[viii] Matching (classical statistics; Matching and Subclassification both fall under the Potential Outcomes causal model)
- Regression Discontinuity (economics)
- Instrumental Variables (economics)
- Difference-in-Differences (economics)
- Synthetic Control (newer economics)
Other tools (still emerging, being explored) are mentioned in the ‘Opportunities’ part, # 4.
*** The last part, # 4, will briefly go over some remaining challenges and opportunities, both for advancing this field and for better explaining it, like the equivalence of the potential outcomes (‘Rubin’, more properly Cochran’s; see note viii below and the image insert) and causal calculus (Pearl) approaches to causality.
References mentioned
- Reichenbach, H., The philosophy of space and time. 1957: Courier Corporation.
- Kenny, D.A., Statistics for the social and behavioral sciences. Posted by author at https://davidakenny.net/doc/statbook/kenny87.pdf. 1987: Little, Brown Boston.
- Istre, G.R., et al., Increasing the Use of Child Restraints in Motor Vehicles in a Hispanic Neighborhood. American Journal of Public Health, 2002. 92(7): p. 1096-1099.
- Wright, S., The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. Proceedings of the National Academy of Sciences, 1920. 6(6): p. 320-332.
- Baron, R.M. and D.A. Kenny, The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 1986. 51(6): p. 1173-1182.
- Beasley, T.M., Seemingly unrelated regression (SUR) models as a solution to path analytic models with correlated errors. Multiple linear regression viewpoints, 2008. 34(1): p. 1-7.
- Commenges, D. and A. Gégout-Petit, A general dynamical statistical model with causal interpretation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2009. 71(3): p. 719-736.
- Wright, S., Correlation and causation. Part I Method of path coefficients. Journal of agricultural research, 1921. 20(7): p. 557-585.
- Wright, S., The Method of Path Coefficients https://drive.google.com/file/d/1Iq3vEFZna4XXzjs-JIW9mnBu7dT-muBC/view?usp=sharing. The Annals of Mathematical Statistics, 1934. 5(3): p. 161-215.
- Wold, H.O., Econometric model building: essays on the causal chain approach. 1964: North-Holland Publishing Company.
- Blalock, H.M., Causal models in the social sciences. 1985: Transaction Publishers.
- Duncan, O.D., Introduction to structural equation models. 1975, New York: Academic Press.
- Howard, R.A. and J.E. Matheson, Influence diagrams, in Readings on the Principles and Applications of Decision Analysis: General collection, R.A. Howard, Editor. 1981, Strategic Decisions Group.
- Shachter, R.D., Probabilistic inference and influence diagrams. Operations research, 1988. 36(4): p. 589-604.
- Smith, J.Q., Influence diagrams for Bayesian decision analysis. European journal of operational research, 1989. 40(3): p. 363-376.
- Lauritzen, S.L., Lectures on contingency tables. University of Aalborg Press, Aalborg, Denmark. 1979: Inst. of mathematical statistics, University of Copenhagen.
- Wermuth, N. and S.L. Lauritzen, Graphical and recursive models for contingency tables. Biometrika, 1983. 70(3): p. 537-552.
- Kiiveri, H., T.P. Speed, and J.B. Carlin, Recursive causal models https://drive.google.com/file/d/1EFYMJqWs2LrcAlcg4Z0hxIoyg7MQR1tw/view?usp=sharing. Journal of the Australian Mathematical Society, 1984. 36(1): p. 30-52.
- Pearl, J., Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 1986. 29(3): p. 241-288.
- Pearl, J., Probabilistic reasoning in intelligent systems: networks of plausible inference https://drive.google.com/file/d/1gYtsPNIoFolgrveDF7kjLawU11yISmdO/view?usp=sharing. 1988: Morgan Kaufmann.
- Geiger, D. and J. Pearl, On the logic of causal models arXiv preprint arXiv:1304.2355, in Machine Intelligence and Pattern Recognition 1990. p. 3-14.
- Cochran, W.G., The comparison of percentages in matched samples https://drive.google.com/file/d/1oqg2fU4RjcSq1LQULIzLtlUkiqBbhYe2/view?usp=sharing. Biometrika, 1950. 37(3/4): p. 256-266.
- Imbens, G.W. and D.B. Rubin, Causal inference in statistics, social, and biomedical sciences. 2015: Cambridge University Press.
- Coman, E., H. Wu, and S. Assari, Exploring Causes of Depression and Anxiety Health Disparities (HD) by Examining Differences between 1:1 Matched Individuals. Brain Sciences, 2018. 8(12). https://pubmed.ncbi.nlm.nih.gov/30487396/.
Footnotes:
[i] BoW, p. & “It happened not because I am smarter but because I took Sewall Wright’s idea seriously and milked it to its logical conclusions as much as I could.” (SEMNET)
[ii] We cannot hope to one day uncover the exact form of this relation, the way physics relates pressure, volume, and temperature through a clear law-like relation, or even the way some economists propose specific relations between their variables, like in the
[iii] “A statistical model is a formal representation of a set of relationships between variables. […] Sometimes the designation between independent and dependent variable depends on the variables under study and the researcher’s theoretical orientation. For instance, researchers study the relationship between self-esteem and academic performance. Some designate self-esteem as the independent variable and academic performance as the dependent variable. Others reverse the designations. […] A representation of a model that uses arrows is called a path diagram.” [2] p. 184-5
[iv] Note that the ‘no direct effect, the entire effect is indirect, through a mediator’ situation is not far-fetched: the famous one is the ‘blessing of the cars -> auto accidents’ intervention [3], which worked only through the ‘drivers using seat belts more often’ mediator: the direct effect is not logically possible.
Also of note: rarely do we have only three variables on hand to investigate; causal models are often larger than this, and the consequences more intricate, but an online app does this reasoning for us: see an example from Family Practice at http://dagitty.net/m4TETpl (a model derived from some data analyses, though).
[v] “The correlation between two variables can be shown to equal the sum of the products of the chains of path coefficients along all of the paths by which they are connected.” [4] p. 329 & Fig. 6
[vi] “From the beginning, graphs have played an important role in representing the set of causal influences. The pioneering work of Wright (1921, 1934) has inspired the more recent developments of structural equation models (Joreskog, 1978) and graphical models (Dawid, 1979; Lauritzen and Wermuth, 1989; Cox and Wermuth, 1996). An approach using the modelling of ‘potential outcome’, which is often called the counterfactual approach, has been proposed in the context of clinical trials by Rubin (1974) and further studied by Holland (1986) among others. The counterfactual approach has been extended to the study of longitudinal incomplete data in several papers, the results of which have been gathered together by van der Laan and Robins (2002). Spirtes et al. (2000) and Pearl (2000) have developed the issue of investigating causality with graphical models.” [7] p.719-20
[vii] One of the densest statistical quotes I know of is: “Networks employing Directed Acyclic Graphs (DAGs) have a long and rich tradition, starting with the geneticist Wright (1921[8]). He developed a method called path analysis [Wright, 1934[9]] which later on, became an established representation of causal models in economics [Wold, 1964[10]], sociology [Blalock, 1971[11]] and psychology [Duncan, 1975[12]]. Influence diagrams represent another application of DAG representation [Howard and Matheson, 1981[13]], [Shachter, 1988[14]] and [Smith, 1987[15]]. These were developed for decision analysis and contain both chance nodes and decision nodes (our definition of causal models excludes decision nodes). Recursive models is the name given to such networks by statisticians seeking meaningful and effective decompositions of contingency tables [Lauritzen, 1982[16]], [Wermuth & Lauritzen, 1983[17]], [Kiiveri et al, 1984[18]]. Bayesian Belief Networks (or Causal Networks) is the name adopted for describing networks that perform evidential reasoning [Pearl, 1986a[19], 1988[20]]. This paper establishes a clear semantics for these networks that might explain their wide usage as models for forecasting, decision analysis and evidential reasoning.” [21] p.136
[viii] Note that Donald Rubin, who is credited with ‘inventing propensity scores’, does not cite his doctoral advisor Cochran’s 1950 paper in Biometrika [22], where the matching tool was first proposed; there is no mention of it in the book with Imbens [23] either; I cite it in [24].