Types of errors and their effects on causality conclusions

*** There is fluid language around, and understanding of, errors and how they are handled across medicine, the social sciences, and more precise sciences like physics and engineering[i]. In the broadest sense, errors represent imprecision, uncertainty, and ambiguity in our knowledge of how the world works[ii].

1. Types of errors and their explications

*** Perhaps the first explanation needed, logically, is of the ‘standard error’ (SE): this is one of the most confusing terms in statistics, a ‘misnomer’ for several reasons:

(1) It is not at all ‘standard’, much as the ‘standard deviation’ is not either; both are used to standardize the values of the variable or parameter they refer to: e.g. the z-score is a variable scaled such that it represents how many standard deviations above/below the mean each value sits, while SEs are ‘measured’ in the units of the variable/parameter they apply to.

(2) It has no meaning in itself until we add the ‘target’ quantity: SE of what? It can be the ‘SE of the mean’, the ‘SE of a regression parameter’, etc.

(3) Its ‘error’ part does carry the meaning of uncertainty about a quantity, but it only makes sense alongside the quantity’s original value, so it is commonly written as X (+/- x).

(4) The ‘standardizing’ involved in creating z-scores is the same one involved in applying a z-test to a statistical parameter: dividing its value by its SE, and deciding this way how many SEs away from 0 (the common null hypothesis spot, though this can change) the parameter sits; e.g. see the ‘SE of the mean’ in the sketch right after this list (note that for small N’s the t-test is used instead, but the z and t(df) tests coincide for large samples).
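To make (4) concrete, here is a minimal sketch (Python/NumPy; the sample and the null value are invented for illustration) of the SE of the mean and the z-style standardization it enables:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=7.0, scale=1.2, size=200)  # e.g. 200 hypothetical A1c readings

mean = x.mean()
se_mean = x.std(ddof=1) / np.sqrt(len(x))     # SE of the mean, in A1c units

# The common 'X (+/- x)' reporting form:
print(f"mean = {mean:.3f} (+/- {se_mean:.3f})")

# z-style standardization: how many SEs does the mean sit from a null value?
null_value = 7.5
z = (mean - null_value) / se_mean
print(f"z = {z:.2f} SEs away from {null_value}")
```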

*** A first setting of the stage: statistics differs from ‘pure’ math/arithmetic/calculus in how strict/sharp vs. ‘loose’ the equality/equivalence relation is: we accept that 1 ≠ 2; statistics, on the other hand, uses modified relational rules, which allow one to sometimes declare

1 ≈ 2 (when 2 is an average[iii] falling less than two standard errors away from (the average) 1). This adds a new source of error/noise/uncertainty, which makes statistics more challenging[iv] (i.e. ‘at what juncture can we say 1 ≈ 2?’)[v].
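An illustration of that ‘loose equality’ (a simulation sketch; the two-SE cutoff mirrors the conventional 5% criterion, and the sample sizes and spreads are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(loc=1.0, scale=3.0, size=25)  # sample whose true mean is 1
b = rng.normal(loc=2.0, scale=3.0, size=25)  # sample whose true mean is 2

diff = b.mean() - a.mean()
# SE of the difference between two independent sample means
se_diff = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

if abs(diff) < 2 * se_diff:
    print(f"1 'equals' 2 here: diff = {diff:.2f} is within 2 SEs ({2 * se_diff:.2f})")
else:
    print(f"1 differs from 2 here: diff = {diff:.2f} exceeds 2 SEs ({2 * se_diff:.2f})")
```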

*** The labeling incites some strong ideological debates, e.g. reactions to calling the unexplained variability in an outcome ‘residual error’ instead of the ‘proper’ disturbance[vi]. As with other statistical terminology, e.g. direct and indirect effects, scientific domain loyalties engender strong disagreements (the ‘direct effect’ is in reality itself a ‘residual effect’; it has no ontological standing by itself, because every time a new mediator is ‘added’, this ‘direct’ effect changes: it is what is left over as the direct effect, ‘as of now’).

*** By longstanding tradition, errors are ‘latent variables’ or ‘unobserved quantities’, unobservable[vii] really (if you sense a tinge of ‘potential outcomes’ flavor here, yes: POs are handled as a form of latent/missing ‘thing’; in SWIGs (Single World Intervention Graphs), e.g., they really are modeled the way folks model the observed variables, with directional arrows between them and such).

*** Economists speak of ‘error-contaminated’ or mismeasured data [5], and rely on ‘observable imperfect proxies’ to get at the ‘quantities that are unobservable’ [6].

In computing work, errors are ‘noise’ that needs to be reduced to extract the ‘signal’[viii]; in the cipher and encryption worlds, the approach is from the opposite direction: how to scramble the signal so much that no algorithm can recover it from the ‘noise’.

2. Measurement imprecision and medical decisions

*** ‘Measurement’ in medicine appears to have distinct challenges, and its own preferred language and tools. In contrast to the measurement of psychological constructs (the domain of ‘psychometrics’[ix]), we can talk about ‘biometrics’.

What we try to gauge (and wipe away) is the ‘measurement error’, or, as it is sometimes called, ‘experimental error’ [7]; that is, we try to ‘separate out’ these 2 components:

A1c_i^Measured = A1c_i^True + Meas.Error_{A1c,i}^Unobservable

where we mark the ‘noise’ as not measurable; it is a ‘left over’ part after we ‘take out’ the true value: not measurable directly[x]!
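A minimal simulation sketch of this decomposition (all quantities are invented; in real data only the left-hand side, the measured column, would be available):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

true_a1c = rng.normal(loc=6.5, scale=0.8, size=n)    # unobservable 'true' values
meas_error = rng.normal(loc=0.0, scale=0.4, size=n)  # unobservable 'noise'
measured_a1c = true_a1c + meas_error                 # the only observable column

# With independent error, variances add: Var(Measured) ~ Var(True) + Var(Error)
print(measured_a1c.var(ddof=1), true_a1c.var(ddof=1) + meas_error.var(ddof=1))

# Reliability = 'true' share of observed variance: 0.8^2/(0.8^2 + 0.4^2) = 0.8
print(true_a1c.var(ddof=1) / measured_a1c.var(ddof=1))
```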

3. Modeling errors; how errors do and do not trickle down

*** The basic setup in statistics invites explaining variability among persons in some medical outcome, like A1c[xi], by a ‘predictor’, say BMI, i.e. modeling A1c = f(BMI). The functional form f is a fleeting quantity, and trying to pin down a value for it (the way the g value in the gravitational attraction formula can be pinned down) is not going to bear fruit, mostly because there are many more causes of one’s A1c value than BMI: these ‘many others’ are grouped into the ‘error term’ in a linear regression model.
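A sketch of that lumping, with simulated data (the intercept, coefficient, and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

bmi = rng.normal(loc=27.0, scale=4.0, size=n)
other_causes = rng.normal(loc=0.0, scale=0.9, size=n)  # diet, genetics, ... lumped
a1c = 4.0 + 0.08 * bmi + other_causes                  # the generating process

# Fit A1c = b0 + b1*BMI + error; the 'error term' absorbs all the other causes
b1, b0 = np.polyfit(bmi, a1c, deg=1)
residuals = a1c - (b0 + b1 * bmi)
print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, residual SD = {residuals.std(ddof=2):.2f}")
```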

– In physics, e.g., there are formal ways in which uncertainties ‘propagate’ (see [9]: “when two numbers x and y are measured and the results are used to calculate the difference q = x – y. We found that the uncertainty in q is just the sum δq = δx + δy of the uncertainties in x and y.” p. 45)[xii].
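A quick numerical check of that rule (a simulation sketch; the sum δq = δx + δy is the worst-case bound, while independent errors combine in quadrature, √(δx² + δy²)):

```python
import numpy as np

rng = np.random.default_rng(5)
dx, dy = 0.3, 0.4

x = 10.0 + rng.normal(0.0, dx, size=100_000)  # repeated measurements of x
y = 4.0 + rng.normal(0.0, dy, size=100_000)   # independent measurements of y
q = x - y

print(f"observed spread of q : {q.std(ddof=1):.3f}")
print(f"quadrature rule      : {np.hypot(dx, dy):.3f}")  # 0.5 for independent errors
print(f"worst-case sum bound : {dx + dy:.3f}")           # 0.7
```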

Topics for later expansions:

+1. ‘Latent’/true measures[xiii]

+2. Imprecision and health disparities conclusions

+3. Tracing rule for following implications

*** Bollen’s MIIVsem procedure: why it illuminates

*** What correlated errors imply

Footnotes

[i] One can rank-order the sciences by how strong the causal statements they can make about the studied phenomena are, which can alternatively be formulated as the ‘percent error’ left in their explicating theories; compare e.g. the A1c(BMI) functional relation in medicine, at one endpoint, to the Cobb-Douglas production function in economics (Y = A * K^α * L^β, where Y is output, K is capital, L is labor, A is a constant representing total factor productivity, and α and β are parameters representing the share of output attributable to each input), and, at the other endpoint, to the ‘pressure * volume ≈ temperature’ law of the ‘ideal gas’ in physics (or chemistry).

[ii] Uncertainty measures, from  https://bit.ly/measure_hd

1. Tolerance of Ambiguity scale (Budner 1962)  – 16 items, Psychology (personality)
2. AT-20 (MacDonald 1970) – 20 items, Psychology (decision making)
3. MAT50 (Norton 1975)  – 61 items, Psychology (personality)
4. Physicians’  Reactions to Uncertainty scale (PRU) (Gerrity, DeVellis et al. 1990) – 22 items, Health care
5. Multiple Stimulus Types Ambiguity Tolerance Scale-I  and –II (MSTAT-I & MSTAT-II) (McLain 1993, McLain 2009) – 22 items, Psychology (decision making)
6. Tolerance For Ambiguity (TFA) scale (Geller, Tambor et al. 1993) – 7 items, Health care
7. Intolerance for Uncertainty Scale (IUS) French (Freeston, Rhéaume et al. 1994) English (Buhr and Dugas 2002) – 27 items, Psychology (clinical)
8. Need For Closure Scale (NFCS) (Webster and Kruglanski 1994) – 47 items, Psychology (social)
9. Attitudinal Ambiguity Tolerance scale (AAT) (Durrheim and Foster 1997) – 45 items, Psychology (personality)
10. Uncertainty Response Scale (Greco and Roger 2001) – 48 items, Psychology (personality)
11. Intolerance of Uncertainty Scale short form (IUS-12) (Carleton, Norton et al. 2007) – 12 items, Psychology (clinical)
12. Intolerance of Uncertainty Index (Carleton, Gosselin et al. 2010) – 30 items, Psychology (clinical)
13. Intolerance of Uncertainty Scale for Children (IUSC) (Comer, Roy et al. 2009) – 27 items, Psychology (clinical)
14. Ambiguity Aversion in Medicine (AA-Med) scale (Han, Reeve et al. 2009) – 6 items, Health care
15. Tolerance of Ambiguity Scale (Herman, Stevens et al. 2010) – 12 items, Psychology (organizational)
16. Dealing with Uncertainty Questionnaire (DUQ) (Schneider, Lowe et al. 2010) – 10 items, Health care
17. Tolerance of Ambiguity in Medical Students and Doctors (TAMSAD) (Hancock, Roberts et al. 2015) – 29 items, Health care
18. Multidimensional Attitude toward Ambiguity Scale (MAAS) (Lauriola, Foschi et al. 2015) – 30 items, Psychology (decision making)

[iii] The ‘typical’ or ‘average American’ appears to be a fictional entity, statistically generated (the statement “The average American owns 1.02 cars” is “about the average American”, [1] p. 301), of which there are plenty, some of which exist, others of which don’t: “I intend to use the word exists so that it encompasses exactly those objects that orthodox philosophers hold to exist. In particular, it includes all the ordinary physical objects that we normally take to exist, and it does not include unicorns, gold mountains, winged horses, round squares (round square things), Pegasus, or Sherlock Holmes. The theory given below will say that there are unicorns, there is such a thing as Pegasus, etc., but that none of these exist.” [2] p. 11 & “If we forget or inhibit our philosophical training for the moment, we are all prepared to cite examples of nonexistent objects: Pegasus, Sherlock Holmes, unicorns, centaurs, . . . . Those are all possible objects, but we can find examples of impossible ones, too; Quine’s example of the round square cupola on Berkeley College will do. It is an impossible object, and it certainly doesn’t exist, so it seems to be an example of an impossible nonexistent object. With so many examples at hand, what is more natural than to conclude that there are nonexistent objects – lots of them?” [2], p. 2

[iv] The logic of hypothetical reasoning is an old topic in logic, [3] and is involved in both the classical ‘hypothesis testing’ scientific procedure, and the more modern causal inference advances, based on ‘what-if’ contrary-to-fact ‘potential outcomes’ reasoning.

[v] The very concept of ‘=’ is itself a debate topic, among those with philosophical inclinations.

[vi] Judea Pearl says “u’s stand for omitted factors” in [4].

[vii] “Higher-dimensional sphere packings are hard to visualize, but they are eminently practical objects” https://www.quantamagazine.org/sphere-packing-solved-in-higher-dimensions-20160330/ Some things can be ‘practical’, even if not accessible to us (the field of spirituality is filled with such entities!).

[viii] Unexpected insights into this come from niche mathematical problems, e.g. “Dense sphere packings are intimately related to the error-correcting codes used by cell phones, space probes and the Internet to send signals through noisy channels.” https://www.quantamagazine.org/sphere-packing-solved-in-higher-dimensions-20160330/ (seen via https://www.youtube.com/watch?v=dr2sIoD7eeU ‘The things you’ll find in higher dimensions’)

[ix] Note that some ‘yardsticks’ we devise are both bio- and psycho-metrics, like ‘self-rated health’, which is a common single-item/question health ‘outcome’ (per Google AI: “In general, would you say your health is excellent, very good, good, fair, or poor?”): it usually has 5 response options, but to appreciate the thickness of the brush used here, look at the 40 nuances of ‘bad’/‘good’ descriptors here: How good is “good”?

[x] If you sense here a similarity to the simple linear prediction model commonly used, the linear regression, you are not the first; here, however, we have a ‘flipped’ direction, and it becomes evident why this model is ‘not identified’ if we count the ‘Unobservable’ superscripts. In a regression, we would predict A1c from BMI, say, so we have

A1c_i^Measured = β_0 + β_{BMI->A1c} · BMI_i^Measured + Residual.Error_{A1c,i}^Unobservable

which allows us to estimate β, and then even ‘generate’ the residual errors as a by-product of the analysis: simply the difference between the predicted value and the observed one! In the ‘classical test theory’ view, A1c_i^Measured = A1c_i^True + Meas.Error_{A1c,i}^Unobservable, we don’t know where to ‘split’ the observed quantity, i.e. how much ‘noise’ to carve out so that its true value is left; what we need to estimate is this very ‘strength of relation’ coefficient, which in measurement (psychometric) parlance is the ‘loading’ λ:

A1c_i^Measured = λ_{True.A1c->A1c} · A1c_i^True + Meas.Error_{A1c,i}^Unobservable

*** One could ‘run a model’ with only 2 indicators, but in that case one needs some ‘identifying’ assumptions, i.e. to reduce the number of estimates, or increase the df = -1 to a df = 0; we can do that by forcing the 2 loadings to be equal: “additional restrictions are needed to accomplish model identification, such as indicator loading equality (true score–equivalent measures) and/or error variance equality (e.g., parallel measures; Lord & Novick, 1968).” [8] p. 231 (Bollen says, in the context of one same measure repeated: “X_t and X_{t+1} are parallel measures [λ_t = λ_{t+1} = 1 and VAR(e_t) = VAR(e_{t+1})]”, p. 201).

*** A simple ‘tracing’ exercise shows how and why in this case we need 3 such measured ‘instances’ of A1c per patient to be able to ‘get at’ the true value, say A1c measured with 3 types of devices (hence we have 3 such equations, indexed by j:

A1c_i^{Measured,j} = λ_{True.A1c->A1c,j} · A1c_i^True + Meas.Error_{A1c,i,j}^Unobservable,

each A1c_j with its own ‘imperfection’). I show in the Excel online how to calculate these loadings by hand, which can then be used to calculate the ‘composite reliability’ (per Raykov’s method: simply the percent of the total variability that is not ‘noise’, WWW e.g.). Note that we can then talk about the reliability of each measurement instrument/method, ρ_{A1c,j}, and of the ‘true’ measure, the ‘scale reliability’ ρ_{True.A1c}. A sketch of the hand calculation follows below.
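Here is that tracing/hand calculation as a simulation sketch (invented loadings and error SDs; with the factor variance fixed to 1, each pairwise covariance is a product of two loadings, so three indicators give exactly enough equations):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
true_a1c = rng.normal(0.0, 1.0, size=n)        # factor variance fixed to 1

lam = np.array([0.9, 0.7, 0.8])                # 'true' loadings, one per device
err_sd = np.array([0.4, 0.6, 0.5])             # each device's own 'imperfection'
X = lam * true_a1c[:, None] + rng.normal(0.0, err_sd, size=(n, 3))

c = np.cov(X, rowvar=False)                    # observed 3x3 covariance matrix
# Tracing: cov(X1,X2)=lam1*lam2, cov(X1,X3)=lam1*lam3, cov(X2,X3)=lam2*lam3
lam1 = np.sqrt(c[0, 1] * c[0, 2] / c[1, 2])
lam_hat = np.array([lam1, c[0, 1] / lam1, c[0, 2] / lam1])
print("estimated loadings:", lam_hat.round(3))  # ~ [0.9, 0.7, 0.8]

# Raykov-style composite reliability: true-score share of the sum-score variance
theta = c.diagonal() - lam_hat**2               # error variances
rho = lam_hat.sum()**2 / (lam_hat.sum()**2 + theta.sum())
print("composite reliability:", round(rho, 3))
```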

[xi] Note that a model with only A1c in it, no predictors, is better represented as A1c_i = Average(A1c) + 1 · u_i (or simply u -> A1c, which makes clearer that the entire variable is one ‘big fat error’): this is an unappreciated option available in several statistical packages: Stata, e.g., can ‘run’ such a regression without a predictor (the code is simply reg A1c!), which will display no regression coefficient, of course, only the ‘intercept’, i.e. the conditional mean; but, having nothing to condition on, this is the sample mean estimate, which will be accompanied by its ‘standard error’.
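The same intercept-only model, sketched in Python with statsmodels on invented data (the intercept and its SE reproduce the sample mean and the SE of the mean):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
a1c = rng.normal(loc=6.5, scale=0.9, size=100)  # hypothetical sample

# Intercept-only regression: the design matrix is just a column of ones
fit = sm.OLS(a1c, np.ones_like(a1c)).fit()
print(fit.params[0], fit.bse[0])                # the 'intercept' and its SE

# ... which are exactly the sample mean and the usual SE of the mean
print(a1c.mean(), a1c.std(ddof=1) / np.sqrt(len(a1c)))
```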

[xii] Note that, as with note i above, the way physicists ‘handle’ error propagation is formally based on differentiating an equation for the outcome as a function of its ‘predictors’: one needs to know this ‘law’ to be able to take the partial derivatives with respect to each predictor, one at a time. In most applications in medicine and the social sciences, we simply assume additive linear relations.
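A sketch of that differentiation step with sympy (the ‘law’ q = x·y is an invented example; independent uncertainties are combined in quadrature):

```python
import sympy as sp

x, y, dx, dy = sp.symbols('x y delta_x delta_y', positive=True)
q = x * y                                  # the 'law' one must know in advance

# Propagate: (dq)^2 = (dq/dx * dx)^2 + (dq/dy * dy)^2 for independent errors
dq = sp.sqrt((sp.diff(q, x) * dx)**2 + (sp.diff(q, y) * dy)**2)
print(dq)                                  # sqrt(delta_x**2*y**2 + delta_y**2*x**2)
print(dq.subs({x: 10, y: 5, dx: 0.1, dy: 0.2}).evalf())  # ~2.062
```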

[xiii] Econometricians use alternate language, not ‘latent variables’ but plain ‘unobservables’, and rarely make use of graphical depictions, instead using equation-based functional relations like: “two measurements, X and Y, are produced by mutually independent unobservables, U, V, and W, through the system, X = g(U,V) and Y = h(U,W)” [10] p. 1, i.e. the model {U, V} -> X and {U, W} -> Y; on this, Cam Mcintosh keeps us all on our tiptoes on SEMNET (some references he suggested: e.g. [5, 10-12])

References

  1. Van Inwagen, P., Creatures of fiction. American philosophical quarterly, 1977. 14(4): p. 299-308.
  2. McCann, H.J., Creation and the Sovereignty of God https://drive.google.com/file/d/1809fTVLp3sGooYLS7LfU-O2mWbimwXHP/view?usp=sharing. 2012: Indiana University Press.
  3. Rescher, N., Hypothetical reasoning. Studies in logic and the foundations of mathematics. 1964, Amsterdam: North-Holland Pub. Co.
  4. Pearl, J., Causal Diagrams – a threat to correctness. 01/12/2012.
  5. Schennach, S.M., Recent Advances in the Measurement Error Literature. Annual Review of Economics, 2016. 8: p. 341-377.
  6. Schennach, S., Measurement systems. Journal of Economic Literature, 2022. 60(4): p. 1223-1263.
  7. Altman, D.G. and J.M. Bland, Measurement in medicine: the analysis of method comparison studies. Journal of the Royal Statistical Society Series D: The Statistician, 1983. 32(3): p. 307-317.
  8. Raykov, T., Evaluation of Scale Reliability for Unidimensional Measures Using Latent Variable Modeling. Measurement and Evaluation in Counseling and Development, 2009. 42(3): p. 223.
  9. Taylor, J., Introduction to error analysis, the study of uncertainties in physical measurements. 1997.
  10. Hu, Y. and Y. Sasaki, Identification of paired nonseparable measurement error models. Econometric Theory, 2016. 33(4): p. 955-979.
  11. Cunha, F., J.J. Heckman, and S.M. Schennach, Estimating the Technology of Cognitive and Noncognitive Skill Formation. Econometrica, 2010. 78(3): p. 883-931.
  12. Zheng, Y., et al., Nonparametric Factor Analysis and Beyond. arXiv preprint arXiv:2503.16865, 2025.