{"id":66,"date":"2025-04-22T21:52:04","date_gmt":"2025-04-23T01:52:04","guid":{"rendered":"https:\/\/health.uconn.edu\/causality\/?page_id=66"},"modified":"2026-03-22T13:22:46","modified_gmt":"2026-03-22T17:22:46","slug":"walkstats","status":"publish","type":"page","link":"https:\/\/health.uconn.edu\/causality\/walkstats\/","title":{"rendered":"Research and basic statistical analyses"},"content":{"rendered":"<p><span style=\"text-decoration: underline\"><strong>(2).<\/strong> Walk-through basic statistical analyses in Excel, with a causality aim<\/span><\/p>\n<p><strong>(0). Preliminaries<\/strong><\/p>\n<p><strong>0.i.<\/strong> What is \u2018scientific research\u2019?<\/p>\n<p>*** Simply: a systematic discovery process that follows agreed-upon methodology and verification rules. Practically: understanding the natural world (and beyond) and deriving the \u2018laws\u2019 that govern it (or law-like rules <a href=\"#_edn1\" name=\"_ednref1\"><span>[i]<\/span><\/a>).<\/p>\n<p>There are many \u2018sciences\u2019 out there <a href=\"#_edn2\" name=\"_ednref2\"><span>[ii]<\/span><\/a>, and the communities of researchers advancing each field sometimes change their approach radically (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Paradigm_shift\">paradigm shifts<\/a>, in Kuhn\u2019s sense).<\/p>\n<p>The \u2018scientific method\u2019 in general has been extensively investigated; three lesser-known Herculean efforts are worth noting: Bernard Bolzano [9] in 1837, Alfred Korzybski [10] in 1933, and James Fetzer [1] in 1981.<\/p>\n<p><strong>0.ii.<\/strong> What is research methodology?<\/p>\n<p>*** Each science has somewhat different methods or procedures for arriving at agreed-upon truths; philosophy, for instance, has its own research methodology too [11] <a href=\"#_edn3\" name=\"_ednref3\"><span>[iii]<\/span><\/a>.<\/p>\n<p>\u2018Research design\u2019 belongs to the \u2018research methodology\u2019 package: it means the specific procedure for measurement and collection 
of data, analysis, and interpretation (\u2018inferring truths\u2019). Health research, clinical research, and public health research have slightly different emphases, but largely share some elements, such as: measurement\/instrument development; validity and reliability; data analysis and causal inference.<\/p>\n<p><strong>0.iii.<\/strong> How do statistics\/data analytics fit into the research methodology equation?<\/p>\n<p>(An applied, causality-focused \u2018walk-through intro to statistics\u2019 is at <a href=\"https:\/\/tinyurl.com\/causalstats\">Tinyurl.com\/CAUSALSTATS<\/a>.)<\/p>\n<p><strong>(A).<\/strong> Variables are mainly of two kinds: categorical (\u2018this\u2019 vs. \u2018not this\u2019) or continuous (e.g., one\u2019s BMI, Body Mass Index, of 29.5) <a href=\"#_edn4\" name=\"_ednref4\"><span>[iv]<\/span><\/a>: the math differs broadly between the two <a href=\"#_edn5\" name=\"_ednref5\"><span>[v]<\/span><\/a>. There are other variants too (counts of \u2018this\u2019, e.g.), and modern statistics handles them easily, though not always intuitively (see \u2018logistic regression\u2019: entire textbooks are devoted to this analytic model, for good reasons).<\/p>\n<p>We skip for now the \u2018types of data\u2019 that health and clinical researchers extract and investigate, and just note that some come from within organs (investigating molecular-level dynamical processes); others are still within-person but come in large amounts (\u2018big data\u2019), like genetic data or brain imaging; some are riper for between-person research questions (RQs), like BMI; while other data come from higher levels, like regional epidemiologic data. Sometimes we are called to combine them, ask more complex RQs, and use more complex statistics (multi-level models, e.g.).<\/p>\n<p><strong>(i). 
<\/strong>Because people differ so much, and on so many levels, the \u2018typical\u2019 value of a variable is a useful practical \u2018tool\u2019 in modeling health-relevant processes: formally it is called the \u2018expected value\u2019, but \u2018mean\u2019 <a href=\"#_edn6\" name=\"_ednref6\"><span>[vi]<\/span><\/a> is the more common label. The intuition is quickest visually (while simplifying!): \u2018averaging\u2019 is a form of \u2018fitting\u2019 values into a simpler shape. If some people have low BMI values and some high ones, the average answers the question: \u2018if everyone had the same value, with the total left unchanged, what would that value be?\u2019 Van Inwagen\u2019s example shows just that <a href=\"#_edn7\" name=\"_ednref7\"><span>[vii]<\/span><\/a>.<\/p>\n<p><strong>(ii). <\/strong>At its core, statistics is merely a process of counting \u2018ducks\u2019 <a href=\"#_edn8\" name=\"_ednref8\"><span>[viii]<\/span><\/a> and comparing totals of, and between, the \u2018same\u2019 things <a href=\"#_edn9\" name=\"_ednref9\"><span>[ix]<\/span><\/a>. Practically, however, analyses handle two types of variables:<\/p>\n<p><strong>(1).<\/strong> Categorical, of which the yes\/no (1\/0) binary kind is the main one; all the others (like \u2018racial group\u2019) are reducible to as many binary ones as there are categories (e.g., White\/not, Black\/not, etc.) <a href=\"#_edn10\" name=\"_ednref10\"><span>[x]<\/span><\/a>;<\/p>\n<p><strong>(2). <\/strong>Continuous, like BMI, A1c, or systolic blood pressure (SysBP) <a href=\"#_edn11\" name=\"_ednref11\"><span>[xi]<\/span><\/a>.<\/p>\n<p><strong>(iii). <\/strong>Statistical tests answer Research Questions (RQs); absent an RQ, they are uninformative about the data: by themselves, they do not \u2018make sense of data\u2019. 
So let\u2019s take some of them one by one; all are implemented in Excel, posted at <a href=\"https:\/\/www.tinyurl.com\/101statsexcel\">Tinyurl.com\/101STATSEXCEL<\/a>.<\/p>\n<p>*** A quick statistics intro accompanying the YouTube walk-throughs.<\/p>\n<p><strong>(1).<\/strong> First, from part 1: <a href=\"https:\/\/tinyurl.com\/INTRSTATS1\">Tinyurl.com\/INTRSTATS1<\/a><\/p>\n<p><strong>RQ.1. <\/strong>Research Question 1: Are there more overweight males than females?<\/p>\n<p>*** The RQ invites relating 2 binary variables, and the chi-squared test is implemented by hand in the worksheet. It simply compares 4 observed values, obtained by crossing the 2 levels of each binary variable (how many cases fall in each of the cross-categories (0,0), (1,0), (0,1), and (1,1)), to another set of 4 numbers: the 4 expected n\u2019s under the null hypothesis (H0) of \u2018independence\u2019, formally P(Overweight | Gender) = P(Overweight). This means that gender adds no information in predicting whether one is overweight (in the context examined), which translates directly into the same % overweight in the males group as in the females group. Note that the cells (0,0) and (1,1) are \u2018concordant\u2019 (the same level on one variable goes hand in hand with the same level on the other), while (1,0) and (0,1) are \u2018discordant\u2019 (a 1 on one goes hand in hand with a 0 on the other): more concordant cases than independence predicts point to a positive relationship, more discordant ones to a negative (inverse) relationship, not to the absence of one. Note also that the chi-squared test is non-directional: it does not tell us which variable (if either!) causes the other\u2026<\/p>\n<p>*** This asks whether these conditional probabilities (conditional on gender: knowing gender, \u2018fixing\u2019 gender by \u2018seeing\u2019 it, not by intervention, not by \u2018setting\u2019 it) are equal:<\/p>\n<p>P(Overweight | Males) = P(Overweight | Females)?<\/p>\n<p><strong>RQ.1.b. 
<\/strong>Research Question 1.b: Do males and females differ in body mass?<\/p>\n<p>*** This asks us to compare the average BMI levels of males and females, i.e. (writing the expectation E(BMI) for the average BMI), to compare these conditional averages (conditional on gender: knowing it, \u2018fixing\u2019 it by \u2018seeing\u2019 it):<\/p>\n<p>E(BMI | Males) = E(BMI | Females)?<\/p>\n<p>*** A plain t-test for independent samples would answer this quite well, but many statistical tools can \u2018hammer this nail\u2019: a regression of BMI on \u2018female gender\u2019 (=1, or \u2018yes\u2019) would yield, as a conditional mean (somewhat confusingly called the \u2018intercept\u2019 in regression modeling), the value E(BMI | Males), and the regression coefficient would tell us how much higher (if &gt; 0) or lower (if &lt; 0) the average BMI of females is.<\/p>\n<p><strong>RQ.2.<\/strong> Research Question 2: Do patients change their BMI category (overweight vs. not) from pre to post?<\/p>\n<p>*** This illustrates how <strong>the same<\/strong> 2-by-2 cross-table can be read completely differently: the 4 inside cells (0,0), (1,0), (0,1), and (1,1) now mean something else than in the chi-squared test example of <strong>RQ.1.<\/strong>: (0,0) and (1,1) mean stability (no change), whereas (1,0) and (0,1) mean change (up and down). So the H0 changes its context and setup: it is now about stability. Strictly, the usual test for such paired binary data (McNemar\u2019s test) does not expect zero cases in the (1,0) and (0,1) cells, but equally many in each, i.e., no net change. The table also points to a possible alternative test, by creating a \u2018change score\u2019.<\/p>\n<p><strong>RQ.2.b. <\/strong>Research Question 2.b: Do patients change their BMI levels?<\/p>\n<p>*** This can be answered by a simple \u2018related samples\u2019 (paired) t-test, which is in fact a \u2018model of change\u2019; better models of change are available (see [21] or <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/24124419\/\">URL<\/a>).<\/p>\n<p><strong>(2). <\/strong>Then, from <a href=\"https:\/\/tinyurl.com\/INTRSTATS2\">Tinyurl.com\/INTRSTATS2<\/a><\/p>\n<p><strong>RQ.3. 
<\/strong>Is the level of HgA1c predicted by BMI?<\/p>\n<p>*** This implements by hand a simple regression analysis of one continuous variable (the \u2018dependent\u2019 one; we assign this status) onto another continuous variable (the \u2018independent\u2019 one, an improper label: it is <em>not<\/em> truly independent of everything). Excel can run such an analysis with its free Analysis ToolPak add-in (the \u2018Data Analysis\u2019 tool).<\/p>\n<p><strong>RQ.4.: <\/strong>What is the effect of BMI on Systolic Blood Pressure (BP)?<\/p>\n<p>*** This prepares us for the models with 3 continuous variables that follow; it\u2019s just like RQ.3.<\/p>\n<p><strong>(3). <\/strong>Then, from <a href=\"https:\/\/tinyurl.com\/INTRSTATS4\">Tinyurl.com\/INTRSTATS4<\/a><\/p>\n<p><strong>RQ.4.a.<\/strong>: What is the effect of BMI on Systolic Blood Pressure (BP), net of (distinct from) that of A1c?<\/p>\n<p>*** This is now a multiple regression analysis, also implemented by hand, to illustrate how the initial \u2018effect\u2019 seen in RQ.4. changes, and why: when the 2 predictors are themselves correlated, this change will happen, unless A1c has no association with BP net of BMI.<\/p>\n<p><strong>RQ.4.b.<\/strong>: Does the direct effect of BMI on Systolic Blood Pressure (BP) vary with the level of HgA1c?<\/p>\n<p>*** This is a natural extension of RQ.4.a. above, because if there is such \u2018effect modification\u2019 (an interaction), we cannot really speak of \u2018an\u2019 effect of BMI, only of a range of such effects, one for each level of HgA1c.<\/p>\n<p><strong>RQ.4.c.<\/strong>: What is the total effect of BMI on Systolic Blood Pressure (BP), including the indirect effect through HgA1c?<\/p>\n<p>*** This RQ steps us out of the regression modeling domain and allows one variable to be both a predictor\/cause and an outcome\/effect: the mediator is such a variable. 
This changes the logic drastically: we no longer speak of just one direct effect of BMI, but of a total one, made up of a direct and an indirect pathway into the final outcome; the RQ itself contains the word \u2018total\u2019 for this reason (some are interested only in the indirect effect: \u2018Does A1c mediate the effect of \u2026?\u2019).<\/p>\n<p><strong>RQ.4.d.<\/strong>: Does the indirect effect of BMI on Systolic Blood Pressure (BP), through HgA1c, vary with the very level of BMI?<\/p>\n<p>*** This is a simple extension, similar to the move from multiple regression to multiple regression with an interaction; now we step into what is, perhaps improperly, called \u2018causal\u2019 mediation. Such a model can be analyzed easily, and \u2018by hand\u2019 too (I recommend the intuitive graphical freeware <a href=\"https:\/\/onyx-sem.com\/\">Onyx<\/a>, e.g.). More details here [22] or at <a href=\"https:\/\/pmc.ncbi.nlm.nih.gov\/articles\/PMC5298993\/\">URL<\/a>.<\/p>\n<p><strong>RQ.5.<\/strong>: Is there a common cause (latent factor, e.g. &#8220;Bad Health&#8221;) behind all 3: BMI, HgA1c, and Systolic BP?<\/p>\n<p>*** This illustrates how simple it is to \u2018run a factor analysis by hand\u2019 in Excel: the factor loadings are calculated using the \u2018tracing rule\u2019 shown in part 3 here. It shows that \u2018latent variables\u2019 are just variables themselves; we just don\u2019t have data on them in our datasets!<\/p>\n<p><strong>RQ.6. <\/strong>Research Question: Do cases change similarly over 3 time points? (Do those starting off higher change more slowly?)<\/p>\n<p>*** This similarly shows how one can derive the variances of, and the covariance between, two latent variables, also by hand (and replicated in <a href=\"https:\/\/onyx-sem.com\/\">Onyx<\/a>).<\/p>\n<p><strong>(4). 
<\/strong>And then, in part 3, the spatial analytics example from Tinyurl.com\/INTRSTATS3, with <a href=\"https:\/\/tinyurl.com\/BLOGSTATS3\">Tinyurl.com\/BLOGSTATS3<\/a> and the Excel file at <a href=\"https:\/\/tinyurl.com\/SPATIALSSM\">Tinyurl.com\/SPATIALSSM<\/a>.<\/p>\n<p><strong>RQ.7.:<\/strong> Are residents of states with more poverty expected to live shorter lives? By how much, comparing classic vs. spatial regression?<\/p>\n<p>*** This implements, for the first time in Excel (I think), a spatial regression: it extends a regression, done by hand, to account for the interdependencies (\u2018nonindependence\u2019, \u2018autocorrelation\u2019) present in spatial (areal, regional) data. More details here [23] or at <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/34910138\/\">URL<\/a>.<\/p>\n<p>*** Other such walk-throughs are within reach, e.g., for survival analysis.<\/p>\n<p>*** I will show in part 3 several tools to derive causal effects from non-interventional (observational) data.<\/p>\n<p>References<\/p>\n<ol>\n<li>Fetzer, J.H., <em>Scientific knowledge: Causation, explanation, and corroboration<\/em>. Vol. 69. 1981: Springer Science &amp; Business Media.<\/li>\n<li>Munson, R., <em>Why medicine cannot be a science.<\/em> The Journal of Medicine and Philosophy, 1981. <strong>6<\/strong>(2): p. 183-208.<\/li>\n<li>Gabbay, D.M., et al., <em>Philosophy of medicine<\/em>. Vol. 16. 2011: Elsevier.<\/li>\n<li>Solomon, M., J.R. Simon, and H. Kincaid, <em>The Routledge companion to philosophy of medicine<\/em>. 2016: Taylor &amp; Francis.<\/li>\n<li>Nordenfelt, L., <em>Concepts and measurement of quality of life in health care<\/em>. 1994: Springer Science &amp; Business Media.<\/li>\n<li>Gifford, F., <em>Philosophy of medicine<\/em>. Vol. 16. 2011: Elsevier.<\/li>\n<li>Marcum, J.A., <em>An introductory philosophy of medicine: Humanizing modern medicine<\/em>. Vol. 99. 2008: Springer Science &amp; Business Media.<\/li>\n<li>Laake, P., H.B. 
Benestad, and B.R. Olsen, <em>Research methodology in the medical and biological sciences<\/em>. 2007: Academic Press.<\/li>\n<li>Bolzano, B., <em>Theory of science (Wissenschaftslehre)<\/em>. 1837 (2014).<\/li>\n<li>Korzybski, A., <em>Science and Sanity: An Introduction to Non-Aristotelian Systems<\/em>. 1933, New York: Science Press Printing Co.<\/li>\n<li>Williamson, T., <em>Philosophical Method: A Very Short Introduction <\/em><a href=\"https:\/\/drive.google.com\/file\/d\/1VtLAqu9Vp8A7b7UTBqfX1GU1mDEnETMP\/view?usp=sharing\"><em>https:\/\/drive.google.com\/file\/d\/1VtLAqu9Vp8A7b7UTBqfX1GU1mDEnETMP\/view?usp=sharing<\/em><\/a>. 2020: Oxford University Press.<\/li>\n<li>Bandyopadhyay, P.S. and M.R. Forster, <em>Philosophy of Statistics <\/em><a href=\"https:\/\/drive.google.com\/file\/d\/0B3MsYamUWdKwczhPWW9HeWx6UTg\/view?usp=sharing&amp;resourcekey=0-FYiCvcDt74KNWUGEm1vpYg\"><em>https:\/\/drive.google.com\/file\/d\/0B3MsYamUWdKwczhPWW9HeWx6UTg\/view?usp=sharing&amp;resourcekey=0-FYiCvcDt74KNWUGEm1vpYg<\/em><\/a>, ed. D.M. Gabbay, P. Thagard, and J. Woods. Vol. 7. 2011: Elsevier.<\/li>\n<li>Albert, J.B. and A.J. Rossman, <em>Workshop Statistics: Discovery with Data. A Bayesian approach <\/em><a href=\"https:\/\/drive.google.com\/file\/d\/1ok2n3ju23wOenxws-g7hx7HGlZe9-X6f\/view?usp=sharing\"><em>https:\/\/drive.google.com\/file\/d\/1ok2n3ju23wOenxws-g7hx7HGlZe9-X6f\/view?usp=sharing<\/em><\/a>. 2001: John Wiley &amp; Sons.<\/li>\n<li>Hoffman, L., <em>Longitudinal analysis: Modeling within-person fluctuation and change<\/em>. 
2015: Routledge.<\/li>\n<li>Devore, J.L., <em>Probability and statistics for Engineering and the Sciences.<\/em> Pacific Grove: Brooks\/Cole, 2016.<\/li>\n<li>Inwagen, P.v., <em>The Rev\u2019d Mr Bayes and the Life Everlasting <\/em><a href=\"https:\/\/drive.google.com\/file\/d\/1Ipx1DwJQnvXvyPgjr5EGF2OT8Al4N7gk\/view?usp=share_link\"><em>https:\/\/drive.google.com\/file\/d\/1Ipx1DwJQnvXvyPgjr5EGF2OT8Al4N7gk\/view?usp=share_link<\/em><\/a>, in <em>Reason and Faith: Themes from Richard Swinburne<\/em>, M. Bergmann and J.E. Brower, Editors. 2016, Oxford University Press. p. 196-219.<\/li>\n<li>Conover, W.J., <em>Practical nonparametric statistics<\/em>. 1999, New York: Wiley &amp; Sons.<\/li>\n<li>Roughgarden, J., <em>Theory of population genetics and evolutionary ecology: an introduction.<\/em> 1979.<\/li>\n<li>Fetzer, J.H., <em>Sociobiology and epistemology<\/em>. Vol. 180. 1985: Springer Science &amp; Business Media.<\/li>\n<li>Hagood, M.J., <em>Statistics for Sociologists<\/em>. 1941, Henry Holt &amp; Co.: New York.<\/li>\n<li>Coman, E.N., et al., <em>The paired t-test as a simple latent change score model.<\/em> Frontiers in Quantitative Psychology and Measurement, 2013. <strong>4, Article 738<\/strong>.<\/li>\n<li>Coman, E.N., F. Thoemmes, and J. Fifield, <em>Commentary: Causal Effects in Mediation Modeling: An Introduction with Applications to Latent Variables <\/em><a href=\"https:\/\/pmc.ncbi.nlm.nih.gov\/articles\/PMC5298993\/\"><em>https:\/\/pmc.ncbi.nlm.nih.gov\/articles\/PMC5298993\/<\/em><\/a><em>.<\/em> Frontiers in Psychology, 2017. <strong>8<\/strong>(151).<\/li>\n<li>Coman, E.N., S. Steinbach, and G. 
Cao, <em>Spatial Perspectives in Family Health Research <\/em><a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/34910138\/\"><em>https:\/\/pubmed.ncbi.nlm.nih.gov\/34910138\/<\/em><\/a> <a href=\"https:\/\/academic.oup.com\/fampra\/article\/39\/3\/556\/6463006\" class=\"broken_link\"><em>https:\/\/academic.oup.com\/fampra\/article\/39\/3\/556\/6463006<\/em><\/a><em>.<\/em> Family Practice, 2022. <strong>39<\/strong>(3): p. 556\u201356.<\/li>\n<\/ol>\n<p>Footnotes:<\/p>\n<p><a href=\"#_ednref1\" name=\"_edn1\"><span>[i]<\/span><\/a> An example of a formal treatment of such options: \u201cLet us differentiate terminologically between lawlike sentences (whether subjunctive or causal conditional in form) and instantiations of sentences of this kind by referring to the latter as &#8220;nomological conditionals&#8221; (of either subjunctive or causal conditional form). We may further distinguish the two kinds of nomological conditionals as &#8220;simple&#8221; and as &#8220;causal&#8221;, respectively.<\/p>\n<p>It obviously follows that lawlike sentences are completely general nomological conditionals, i.e., all nomological conditionals are instantiations of lawlike sentences. It also follows (less obviously, perhaps) that nomic conditionals of neither kind are logically true, since the lawlike sentences which they instantiate attribute permanent properties to the members of reference classes, where the possession of those properties is never implied by the descriptions of those classes.\u201d [1] p. 49.<\/p>\n<p><a href=\"#_ednref2\" name=\"_edn2\"><span>[ii]<\/span><\/a> How many \u2018sciences\u2019 are there? Is \u2018implementation science\u2019 a science? \u2018Communications science\u2019? \u2018Exercise Science\u2019? Ask Claude.ai, e.g. [see Appendix]. 
There are also \u2018memory science\u2019, \u2018learning science\u2019, and \u2026 the science of \u2018cause and effect\u2019 (or \u2018causal inference\u2019, see <a href=\"https:\/\/bayes.cs.ucla.edu\/WHY\/why-ch1.pdf\">Pearl\u2019s <em>Book of Why<\/em><\/a> p. 10: \u2018not a fancy name\u2019).<\/p>\n<p>&#8211; Sciences can be grouped as \u2018physical sciences, life sciences, and Earth sciences\u2019 or, along another dimension, as \u2018Social Sciences, Formal Sciences, and Applied Sciences\u2019. Medicine, e.g., belongs to the life sciences and is an applied science (some have questioned its membership among the sciences, see [2], published in <a href=\"https:\/\/academic.oup.com\/jmp\/issue\/49\/3\" class=\"broken_link\"><em>The Journal of Medicine and Philosophy<\/em><\/a>!). Philosophy of medicine is a field that looks at issues like: frequency and propensity, causality and causal inference in medicine, the interpretation of probability in causal models for medicine, discovery in medicine, realism and constructivism in medicine, race in medicine, phenomenology and hermeneutics in medicine, philosophy of epidemiology, and medicine as a commodity [3, 4]; see also [5-7].<\/p>\n<p>*** Another ordering dimension can be \u2018how causal their theories are\u2019: how strong the causal statements and predictions a science makes are or, in alternative language, how large an uncertainty (unexplained \u2018data\u2019, or \u2018model error\u2019) component each science carries.<\/p>\n<p>* \u201cSciences provide different approaches to the study of man: man can be scrutinized in terms of molecules, tissues and organs, as a living creature, and as a social and a spiritual person. Correspondingly, philosophy of science investigates the philosophical assumptions, foundations, and implications of the sciences. 
It is an enormous field, covering sciences such as mathematics, computer sciences and logic (the formal sciences), social sciences, the natural sciences, and also methodologies of some of the humanities, such as history.\u201d [8] p. 1, Chapter 1, Philosophy of Science, Bj\u00f8rn Hofmann, S\u00f8ren Holm and Jens-Gustav Iversen; \u201cThe glue of the world: causation: A pivotal task of the biomedical sciences is to find the causes of phenomena, such as disease.\u201d p. 2<\/p>\n<p>*** \u201cMedicine can, of course, be scientific in ways that are easily specified, and medicine can participate in scientific research and contribute to scientific understanding.\u201d &amp; \u201cscience and medicine are inherently different.\u201d &amp; \u201cmedicine and science differ both in their aims and their criteria for success: the aim of medicine is to promote health through the prevention and treatment of disease, while the aim of science is to acquire knowledge; medicine judges its cognitive formulations by their practical results in promoting health, while science evaluates its theories by the criterion of truth.\u201d &amp; \u201cmedicine (as medical practice) has a moral aspect that is not present in science\u201d.<\/p>\n<p><a href=\"#_ednref3\" name=\"_edn3\"><span>[iii]<\/span><\/a> Philosophers have long been involved in serious statistical matters too, especially in the causality domain; there is also a <em>Philosophy of Statistics<\/em> volume [12], with chapters on \u2018Various Issues about Causal Inference\u2019, for instance, but also on \u2018Approaches to Simplicity Related to Inference and Truth\u2019 and on \u2018Attempts to Understand Different Aspects of \u201cRandomness\u201d\u2019.<\/p>\n<p><a href=\"#_ednref4\" name=\"_edn4\"><span>[iv]<\/span><\/a><strong> i.<\/strong> \u201cIn this book, we will distinguish between two different types of variables. 
A categorical variable is a characteristic of an individual which can be broken down into different classes or categories.<\/p>\n<p>Simple examples of a categorical variable are the eye color of a student, the political affiliation of a voter, the manufacturer of your current car, and the letter grade in a particular class.<\/p>\n<p>Typically, a categorical variable is nonnumerical, although numbers are occasionally used in classification.<\/p>\n<p>The social security number of a person is an example of a categorical variable, since its main purpose is to identify or classify individuals. Binary variables are categorical variables for which only two possible categories exist.<\/p>\n<p>A measurement variable is a number associated with an individual that is obtained by means of some measurement. Examples of a measurement variable include your age, your height, the weight of your car, and the distance that you traveled during your Thanksgiving vacation. A measurement variable will have a range of possible numerical values. A person\u2019s age, for example, ranges from 0 to approximately 100.\u201d [13] p. 5<\/p>\n<p><strong>ii.<\/strong> \u201cThroughout the text, I will use the phrase continuous for quantitative variables (even if they are not truly continuous in the sense of having all possible intermediate values between integers), and the phrase categorical for discrete, grouping variables (i.e., in which differences between specific levels are of interest, although those levels may or may not be ordered).\u201d [14] p. 9<\/p>\n<p><a href=\"#_ednref5\" name=\"_edn5\"><span>[v]<\/span><\/a> The \u2018basic difference\u2019 can be shown in how one calculates typical or expected values: for binary data we look for the probability of a quality (a 1, or a \u2018yes\u2019, e.g. 
being diabetic), P(Diabetic), or, when considering another binary variable like gender, the conditional probability P(Diabetic | Male), e.g.; whereas for continuous ones we look for typical or average values, expressed with the \u2018expected value\u2019 symbol E, as in E(A1c) (the concept, not the symbol, goes back to Blaise Pascal in the 17th century). The \u2018expectation\u2019 is commonly called the mean; the conditional expectation E(A1c | Male), e.g., is what appears as the \u2018intercept\u2019 in a regression of A1c on female gender (coded 1 for females). For discrete variables we calculate \u03bc<sub>X<\/sub> = E(X) = \u03a3<sub>i<\/sub> x<sub>i<\/sub>\u00b7p(x<sub>i<\/sub>), whereas for continuous ones \u03bc<sub>X<\/sub> = E(X) = \u222b<sub>\u2212\u221e<\/sub><sup>\u221e<\/sup> x\u00b7f(x) dx, where f(x) is the probability density function [see [15], p. 152].<\/p>\n<p><a href=\"#_ednref6\" name=\"_edn6\"><span>[vi]<\/span><\/a> There are many formulas for means; the arithmetic mean is the best known, and it emerges from the general \u2018expectation\u2019 definition shown in the note above when all n values are equally probable, p(x<sub>i<\/sub>) = 1\/n, so that E(X) = \u03a3<sub>i<\/sub> x<sub>i<\/sub>\/n. In some cases these individual values are not equally probable, e.g. when they are not independent, as with regional or spatial data, where an individual region\u2019s value depends on its neighbors\u2019 values.<\/p>\n<p><a href=\"#_ednref7\" name=\"_edn7\"><span>[vii]<\/span><\/a> The intuition becomes evident from van Inwagen\u2019s <a href=\"https:\/\/drive.google.com\/file\/d\/1Ipx1DwJQnvXvyPgjr5EGF2OT8Al4N7gk\/view?usp=sharing\">chapter<\/a>: \u201cThe Mean Value Theorem tells us that the area of the rectangle with base a and height h is equal to the area of the shaded region\u2014the area of the region under the graph of g whose width is the same as the rectangle with base a. And that\u2019s so intuitive an idea that in some developments of real analysis it\u2019s the basis of the definition of the average value of a function on an interval.\u201d [16] p. 
205-6 (image below)<a href=\"https:\/\/health.uconn.edu\/causality\/wp-content\/uploads\/sites\/264\/2025\/04\/inwagen_mean.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/health.uconn.edu\/causality\/wp-content\/uploads\/sites\/264\/2025\/04\/inwagen_mean.png\" alt=\"\" width=\"271\" height=\"284\" class=\" wp-image-88 aligncenter\" \/><\/a><\/p>\n<p><a href=\"#_ednref8\" name=\"_edn8\"><span>[viii]<\/span><\/a> \u201cThe process of computing probabilities often depends on being able to count, in the usual sense of counting, &#8220;1, 2, 3,&#8221; and so on. The usual way of counting becomes quite tedious in some complicated situations\u201d [17] p. 5-6.<\/p>\n<p>The foundation of probability theory rests on counting unique, non-overlapping units\/entities, or events. The logical operations involved in the very definitions of joint and conditional probabilities require distinguishability, e.g. when one talks about P(AB) and P(A | B): events A and B both happening means A and, \u2018then\u2019, B happening, separately and distinctly. But \u2018then\u2019 here encodes simultaneity: if time passes, we have complications, one being that \u2018independence\u2019 is no longer a symmetric relation. See, however: \u201cSometimes the notions of independent events and mutually exclusive events are confused with each other, because both notions give the impression that &#8220;the two events do not have anything to do with each other.&#8221; The property of independence depends not only on the two events being considered but also on the particular probability function defined on the sample space. It is possible for P(AB) and P(A)P(B) to be equal to each other with one set of probabilities and to be unequal with another set of probabilities. But see: &#8220;mutually exclusive&#8221; simply means the two events have no points in common, and no matter what probability function is defined on the sample space, AB is empty, so P(AB) = 0. 
If A and B are mutually exclusive, they will be independent only if either P(A) or P(B) equals zero, since Equation 5 must be satisfied.\u201d [17] p. 19<\/p>\n<p><a href=\"#_ednref9\" name=\"_edn9\"><span>[ix]<\/span><\/a> \u2018Same\u2019 and \u2018different\u2019 can run into ontological troubles when \u2018small\u2019 and \u2018large\u2019 infinities enter the arguments: a tricycle is indistinguishable from the same object from which one removes an infinitesimally small \u2018chunk\u2019, say part of the \u2018third\u2019 wheel, and so on after removing a 2<sup>nd<\/sup> such small part. If we continue this process until the 3<sup>rd<\/sup> wheel is completely removed, we arrive at a bicycle, which has a different ontological status, even though each intermediate step was indistinguishable from the one before it. A good visual depiction of the challenge is the \u2018turning of a square into a circle\u2019. Another exercise involves removing neurons from a human brain one by one and replacing them with electronic counterparts: at what point does the entity become a \u2018machine\u2019?<\/p>\n<p>* Another complication is the statistical habit of \u2018decomposition\u2019: we divide the variance into 2 components (non-overlapping, i.e. uncorrelated, we assume, hence additive), like: \u201cThe total variance in a phenotypic trait in a population can be divided into two types, between-genotype-variance and within-genotype-variance. Some phenotypic differences in a population are due to genotypic differences; this variance, the between-genotype-variance, is represented by V<sub>G<\/sub>. The remaining variance, obviously, is not due to genotypic differences. The name given this type of variance is environmental variance, V<sub>e<\/sub>. Thus we have the following equation (where V<sub>T<\/sub> stands for the total variance): V<sub>T<\/sub> = V<sub>G<\/sub> + V<sub>e<\/sub>. [This equation and what follows in the genetic definition of heritability require some strong assumptions. See Roughgarden ([18] 1979, chapter 9) for details.]\u201d [19], p. 60<\/p>\n<p><a href=\"#_ednref10\" name=\"_edn10\"><span>[x]<\/span><\/a> An older description is more intuitive than the newer nominal, ordinal, interval, and ratio levels: Margaret Hagood [20] listed in 1941 three types of variables (or characteristics): \u201cI. Nonquantitative, A. <em>Dichotomous<\/em> (Example: Sex-Male or not); or B. <em>Manifold<\/em> classifications (Example: Regional location-Northeast, Southeast, Southwest, Middle States, Northwest, Far West); II. Quantitative characteristics for which precise measuring devices have not been developed (Example: Condition of housing-Good, fair, poor); and III. Quantitative characteristics for which measuring devices provide measures with units equal and additive: A. Those for which incidence is measured in integers (Example: Fertility (number of children ever borne)); B. Those for which finely graduated degrees of incidence can be measured (measures made off a theoretically continuous scale) (Example: Age)\u201d p. 106.<\/p>\n<p><a href=\"#_ednref11\" name=\"_edn11\"><span>[xi]<\/span><\/a> Note that analysts regularly turn continuous variables into categorical ones, and the other way around too: when A1c levels are split into normal, prediabetic, and diabetic, we lose precision by categorizing; when we analyze the binary \u2018diabetic vs. not\u2019 outcome in a logistic regression, we do the opposite: we \u2018stretch\u2019 a binary variable to be able to model a continuous \u2018probability of being diabetic\u2019, with values between 0 and 1. 
Latent variables follow the same logic: \u2018addiction\u2019 can be seen as a continuous latent variable emerging from 2 observed categories, \u2018not addicted\u2019 and \u2018addicted\u2019.<\/p>\n<p>*** Some categorical variables have a clear \u2018more\u2019 direction; they are called \u2018ordinal\u2019. A natural example is ticks graded as \u2018partially engorged\u2019, \u2018slightly engorged\u2019, and \u2018engorged\u2019: the <a href=\"https:\/\/portal.ct.gov\/caes\">CAES station<\/a> tests them for CT residents for free but, practically, cannot test the \u2018not engorged enough\u2019 ones.<\/p>\n<p>*** This kind of gradation appears when talking about concepts like \u2018causation\u2019 too, where we have partial (and graded) causes, not merely \u2018yes\u2019 or \u2018no\u2019 causation, and predominant causes (\u2018collusion\u2019 too), including a predominant direction in a feedback loop (like that between blood glucose level and blood pressure).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(2). Walk-through basic statistical analyses in Excel, with a causality aim (0). Preliminaries 0.i. What is \u2018scientific research\u2019? *** Simply: a systematic discovery process that follows agreed-upon methodology and verification rules. Practically: understanding the natural (and more) world, deriving the \u2018laws\u2019 that govern it (or law-like rules [i]). 
There are many \u2018sciences\u2019 out there [ii], [&hellip;]<\/p>\n","protected":false},"author":2514,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"acf":[],"publishpress_future_action":{"enabled":false,"date":"2026-07-21 09:45:02","action":"change-status","newStatus":"draft","terms":[],"taxonomy":""},"_links":{"self":[{"href":"https:\/\/health.uconn.edu\/causality\/wp-json\/wp\/v2\/pages\/66"}],"collection":[{"href":"https:\/\/health.uconn.edu\/causality\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/health.uconn.edu\/causality\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/health.uconn.edu\/causality\/wp-json\/wp\/v2\/users\/2514"}],"replies":[{"embeddable":true,"href":"https:\/\/health.uconn.edu\/causality\/wp-json\/wp\/v2\/comments?post=66"}],"version-history":[{"count":5,"href":"https:\/\/health.uconn.edu\/causality\/wp-json\/wp\/v2\/pages\/66\/revisions"}],"predecessor-version":[{"id":136,"href":"https:\/\/health.uconn.edu\/causality\/wp-json\/wp\/v2\/pages\/66\/revisions\/136"}],"wp:attachment":[{"href":"https:\/\/health.uconn.edu\/causality\/wp-json\/wp\/v2\/media?parent=66"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}