1 February 2025
Judea Pearl, in Causality: Models, Reasoning, and Inference, claims his structural causal modeling methodology has "familiar properties of physical mechanisms." Does it really? There are several aspects of physical theories, and of scientific inference in general, that his account doesn't address.
Below, I illustrate some of these questions, provide some answers, and leave a few further, more specific, questions open.
The core of Newtonian mechanics is usually understood to be \(F = m\ddot{x}\), plus trivial calculus relations. This is a little simplistic, though; to illustrate feedback, we'll use the general case in which \(F = \dot{p}\) and \(p = m\dot{x}\), i.e. \(F = m\ddot{x} + \dot{m}\dot{x}\). Mechanics views trajectories as effects to be explained, and its other constructs as causes, so structural causal modeling tells us we should orient these equations such that the right-hand sides contain manipulable variables through which the single trajectory quantity on the left is controlled: \[x = x_0 + \int_0^t\dot{x}dt,\] \[\dot{x} = \dot{x}_0 + \int_0^t\ddot{x}dt,\] \[\ddot{x} =\frac{F - \dot{m}\dot{x}}{m}.\]
Figure 1: Causal network corresponding to the structural equations above.
This has been generated in line with the principles I glean from Pearl: a quantity, even if time-varying or spatio-temporal, is represented as if it were a single number, without indicating its dependence. In trying to use this representation in practice, I think there would be many problems.
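To make one such problem concrete, here is a small sketch (mine, in plain Python; the parent sets are just my reading of the three equations): the Figure 1 network cannot even be put into an evaluation order, because \(\dot{x}\) and \(\ddot{x}\) each appear among the other's causes.

```python
# Sketch: the Figure 1 equations as a dependency structure.  `parents` maps
# each variable to the variables appearing on the right-hand side of its
# structural equation.
parents = {
    "x":     ["xdot"],                     # x    = x0 + integral of xdot
    "xdot":  ["xddot"],                    # xdot = xdot0 + integral of xddot
    "xddot": ["F", "mdot", "xdot", "m"],   # xddot = (F - mdot*xdot) / m
    "F": [], "m": [], "mdot": [],
}

def evaluation_order(parents):
    """Try to order the variables so each is computed after its parents."""
    order, done = [], set()
    def visit(v, stack):
        if v in done:
            return
        if v in stack:
            raise ValueError(f"cycle through {v!r}: the assignments cannot be ordered")
        for p in parents.get(v, []):
            visit(p, stack | {v})
        done.add(v)
        order.append(v)
    for v in parents:
        visit(v, set())
    return order

try:
    print(evaluation_order(parents))
except ValueError as err:
    print(err)   # the xdot <-> xddot loop blocks any evaluation order
```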
The roles of time and of the integrals in the equations above are a little strange. The critical shift in perspective here is that instead of \(x\) we ought to consider \(x(t)\), i.e. a different variable at each value of time. The exposition is much simplified if we use language from synthetic differential geometry, which allows us to use infinitesimals rather freely for intermediate computations. On this view, we obtain \[x(t + dt) = x(t) + \dot{x}(t)dt,\] \[\dot{x}(t + dt) = \dot{x}(t) + \ddot{x}(t)dt,\] \[\ddot{x}(t + dt) = \frac{F(t) - \dot{m}(t)\dot{x}(t)}{m(t)}.\]
Figure 2: De-looped dynamic causal network for Newtonian mechanics. This is one step of a structure iterated forwards and backwards indefinitely.
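By contrast, the time-indexed equations can be executed directly. Here is a minimal forward-Euler sketch (the step size, the constant force, and the linearly decreasing mass are my own illustrative choices):

```python
# Sketch: executing the de-looped equations of Figure 2 by forward Euler.
# F(t), m(t), mdot(t) are arbitrary illustrative choices (rocket-like mass
# loss under constant thrust); dt stands in for the infinitesimal.
def F(t):    return 10.0                 # constant applied force
def m(t):    return 2.0 - 0.1 * t        # mass decreasing linearly
def mdot(t): return -0.1                 # its time derivative

dt = 1e-3
x, xdot = 0.0, 0.0                       # x(0), xdot(0)
xddot = (F(0.0) - mdot(0.0) * xdot) / m(0.0)

t = 0.0
while t < 5.0:
    # each slice depends only on the instantaneously prior slice
    x_next     = x + xdot * dt
    xdot_next  = xdot + xddot * dt
    xddot_next = (F(t) - mdot(t) * xdot) / m(t)
    x, xdot, xddot = x_next, xdot_next, xddot_next
    t += dt

print(f"x(5) ~ {x:.2f}, xdot(5) ~ {xdot:.2f}")
```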
A remarkable feature of the above transformation: it de-loops the network. I consider it reasonable to posit that all relationships Pearl would like to model as a cyclic causal graph (including endomorphic edges, which Pearl's definition of "graph" explicitly forbids) ought to be considered a dynamic (or, more generally, functional) causal graph of the form above, and that there exists a systematic way to go from one representation to the other. I would reify this into a principle of scientific inference: causal feedback loops imply you've improperly agglomerated some variables. Pearl's example of the "actual cause" of a house fire is the closest thing I can find to this, but no special attention is drawn to the spatial and temporal aspects there.
What is the proper de-looping algorithm? In the infinitesimal form, are there re-orderings of the edges that lead to equivalent outcomes? Can we de-loop with respect to quantities other than time, e.g. space? Should there be conventions for suppressing the especially "boring" kinds of causal relationships that follow for essentially mathematical reasons?
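I don't have answers, but as a starting point, here is one naive candidate de-looping, sketched in code: index every variable by timeslice and push every structural edge across the slice boundary, adding a persistence edge for each variable. Every edge then points forward in time, so the result is acyclic by construction; whether this is the proper algorithm (rather than, say, keeping some edges within a slice) is exactly what is open.

```python
# Naive de-looping sketch: turn a (possibly cyclic) structural graph into a
# two-slice dynamic graph.  Every edge u -> v becomes u(t) -> v(t+dt), and
# every variable gets a persistence edge v(t) -> v(t+dt).  One crude
# candidate, not a claim about the "proper" algorithm.
def deloop(parents):
    dynamic = []
    for v, ps in parents.items():
        dynamic.append(((v, "t"), (v, "t+dt")))          # persistence
        for u in ps:
            dynamic.append(((u, "t"), (v, "t+dt")))      # lagged influence
    return dynamic

parents = {
    "x":     ["xdot"],
    "xdot":  ["xddot"],
    "xddot": ["F", "mdot", "xdot", "m"],
    "F": [], "m": [], "mdot": [],
}

for (u, tu), (v, tv) in deloop(parents):
    print(f"{u}({tu}) -> {v}({tv})")
```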
Above, I have presented a model where the quantities at each timeslice depend only on instantaneously prior quantities. I strongly believe something like this holds in general, which I will again reify into a "dynamic Reichenbach's rule" (in analogy with the "no correlation without common causation" maxim): if your system exhibits non-Markov behavior, you have forgotten to pull a latent state variable into the analysis. Non-Markov systems are said to exhibit "memory," and this principle just says you can always make such memory an explicit variable. This helps explain why Markovian stochastic processes are the norm in practice, just as Reichenbach's rule helps explain why independent distributions are the norm in practice.
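A toy illustration of the rule (my own example, not Pearl's): an AR(2) process is non-Markov in \(y\) alone, since the next value depends on the value two steps back, but it becomes Markov once that lagged value is pulled in as an explicit state variable.

```python
import random

random.seed(0)

# y_{t+1} = a*y_t + b*y_{t-1} + noise  -- non-Markov in y alone ("memory"),
# but Markov in the augmented state s_t = (y_t, y_{t-1}).
a, b = 0.5, 0.3

def step(state):
    y_t, y_tm1 = state
    y_next = a * y_t + b * y_tm1 + random.gauss(0.0, 1.0)
    return (y_next, y_t)          # the new state carries the memory explicitly

state = (0.0, 0.0)
ys = []
for _ in range(10):
    state = step(state)
    ys.append(state[0])

print([round(y, 2) for y in ys])
```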
The manipulability account of causation is commonly charged with vicious circularity, on account of manipulation itself being a causal notion. This fails to recognize that the manipulation is a different causal connection than the one under investigation—the claim "x causes y" means "if I can cause x to change, y will change." But causing x to change is clearly a "smaller" or "more direct" step than manipulating y itself. An illustrative mental picture is of "breaking down" a causal arrow into successively finer sequences of arrows.
Figure 3: A direct causal chain (I gave up trying to do this in Graphviz…)
Ultimately, this clearly bottoms out in relationships that our physics cannot further decompose, for instance, "the electromagnetic field at position x and time t of value E causes the electromagnetic field at position x + dx and t + dt of value E', via equality." Physical principle suggests that all direct causes are infinitesimal[1]: every causal chain abstracts over "atoms" relating events infinitesimally adjacent in space and time. This provides a great explanation of why the equations of physics are differential equations. Additionally, it sheds some light on the nature of cause: there is usually no preferred directionality of the spatial infinitesimal physical relationships. Manipulating one value affects adjacent values, and vice versa.
Of course, not every causal analysis need drill down fully to this level, and when obvious larger-scale events are to hand it is instructive to use those instead. But it should be remembered these are abstractions over the details. How do these causal concepts relate to the mature language of mathematical physics, where one abstracts over things like units, gauge transformations, symmetries corresponding to conservation laws, and coordinate systems? Can causal networks corresponding to physical descriptions be realized as bundles over manifolds? How do you display this graphically? Can you read off and numerically solve PDEs from graphical models?
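I don't know the general answers, but in simple cases the reading-off is direct. As a sketch (the discretization choices are mine, nothing canonical): the 1-D heat equation \(\partial u/\partial t = \alpha\,\partial^2 u/\partial x^2\) becomes a causal grid whose node \(u(x, t+dt)\) has parents \(u(x-dx, t)\), \(u(x, t)\), and \(u(x+dx, t)\); numerically solving the PDE is just evaluating that grid forward in time.

```python
# Sketch: the 1-D heat equation as a spatio-temporal causal grid.
# u(x, t+dt) has parents u(x-dx, t), u(x, t), u(x+dx, t).
alpha, dx, dt = 0.1, 0.1, 0.01      # dt chosen so alpha*dt/dx**2 <= 1/2 (stable)
n = 21

# initial condition: a hot spot in the middle, fixed (zero) boundaries
u = [0.0] * n
u[n // 2] = 1.0

for _ in range(200):
    u_next = u[:]
    for i in range(1, n - 1):
        u_next[i] = u[i] + alpha * dt / dx**2 * (u[i - 1] - 2 * u[i] + u[i + 1])
    u = u_next

print(" ".join(f"{v:.2f}" for v in u))
```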
I've been exposed to some material on systems theory and systems engineering. Something that's always bothered me is the fairly unspecified use of "system," and the incomplete semantics of the schematic diagrams that practitioners tend to think in terms of. The causal inference perspective has suggested a characterization: the schematic diagrams' edges are ways cause can flow, and a "system" is just a collection of objects that interacts via a stable causal interface (set of ingoing and outgoing edges). What's the relationship between a system-type model, which one might call a behavior model or a schematic, and Pearl-type causal models? It certainly seems like the former contains multiple causal models: an electrical schematic lets us predict flows in either direction across a resistor, as the effect of voltage change events on either side, for instance. The nodes of a causal model are events (proxied through changes in measurable quantities), while the nodes of the schematic are objects. The edges of a causal model are directed and accord with time, while the edges of a schematic are undirected and usually reflect spatial decomposition. Yet they're clearly intimately related.
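To make the resistor point concrete (my own toy, not anything from Pearl or from systems engineering practice): the single schematic constraint \(V_1 - V_2 = IR\) supports several causal orientations, each a different structural equation depending on which quantities are taken to be manipulated.

```python
# One schematic constraint, several causal orientations.
# The resistor relation V1 - V2 = I * R can be "solved for" whichever
# quantity is treated as the effect of the others.

def current_from_voltages(v1, v2, r):
    """Orientation 1: manipulate the terminal voltages, read off the current."""
    return (v1 - v2) / r

def v2_from_current(v1, i, r):
    """Orientation 2: manipulate v1 and drive a current, read off v2."""
    return v1 - i * r

def r_from_measurements(v1, v2, i):
    """Orientation 3: the 'identification' reading -- infer the parameter."""
    return (v1 - v2) / i

print(current_from_voltages(5.0, 0.0, 100.0))   # 0.05 A
print(v2_from_current(5.0, 0.05, 100.0))        # 0.0 V
print(r_from_measurements(5.0, 0.0, 0.05))      # 100.0 ohm
```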
There are two levels of causation Pearl considers: token causation, and type causation. Cause, of course, deals with events, which are nothing more than changes in quantities in a suitably generalized sense[2]. Token causation deals with specific events: the quantities in the causal model are quantities of particular objects, e.g. "the total downward force on the left wing of the air vehicle at T+12:32." Type causation handles general events: the quantities are types of quantities, and the events types of events, divorced from the particular objects that might bear or express them: "force." I propose that the system model or schematic is a way to bridge token and type causation: it specifies objects, quantities/properties thereof ("state"), and parts of the object that causally interact with the environment ("ports"). Type causation laws then explicitly relate the state, port, and environmental quantities. Specific interventions, measured states of the environment, knowledge of the properties of the object, and so on then produce token-causation predictions, explanations, and counterfactuals for actual events. A "system" is nothing more than a state-and-ports description of the ways an object can possibly interact with its environment, its "causal interface" relative to the rest of the world.
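To make the proposal slightly more concrete, here is a sketch of the bridge as a data structure (the class and field names are mine and purely illustrative): the type level is a causal law over state and port quantities, a token binds that law to a particular object's state, and applying it to particular port values yields token-level predictions.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Sketch (illustrative names only): a "system" as a state-and-ports
# description, bridging type-level laws and token-level predictions.

@dataclass
class SystemType:
    state_vars: list              # names of state quantities ("m", ...)
    in_ports: list                # quantities the environment can set
    out_ports: list               # quantities the environment can read
    law: Callable[[Dict, Dict], Dict]   # type-level causal law

@dataclass
class SystemToken:
    kind: SystemType
    state: Dict                   # this particular object's state

    def respond(self, inputs: Dict) -> Dict:
        """Token-level prediction for an actual event at the ports."""
        return self.kind.law(self.state, inputs)

# Type level: a point mass, F -> a.
point_mass = SystemType(
    state_vars=["m"], in_ports=["F"], out_ports=["a"],
    law=lambda state, inp: {"a": inp["F"] / state["m"]},
)

# Token level: *this* 2 kg object, *this* 10 N push.
brick = SystemToken(point_mass, state={"m": 2.0})
print(brick.respond({"F": 10.0}))   # {'a': 5.0}
```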
Pearl attributes causal modeling's utility to "autonomy of mechanism." That is, one can make modifications to causal diagrams that are representative of modifications to the phenomena they represent, because each causal link is in some sense "autonomous." What precisely does this mean? Can it be refined into a condition that, say, indicates what variables are productive to choose for analysis at a particular level of detail? Similarly, why do certain groups of objects seem more "natural" to group together into systems than others? Given a physical asset as an integrated whole, say a desktop computer, why are there natural divisions of its matter into parts, and why are some of those parts grouped into subsystems? It's to some degree a question of design requirements and intent, but since reverse engineering and biology are possible, there should also be some systematic heuristics that allow us to handle this.
A causal model expresses belief about stable mechanisms in the world. However, in gathering and using causal knowledge, there's always another element: the causal means by which the mechanism in question is actually controlled, and the causal means by which the effects of the control propagate back to the experimenting agent. For example, consider testing Ohm's law with a classic cat's-whisker meter. If you observe a violation of Ohm's law, should you conclude the law is invalid? That you've discovered the next Hall effect? Did you burn out the meter last week, or take readings from a 135° angle to its face? Is your voltage source staying stable? The mechanisms at play in the effector and sensor chains are just as relevant to observing, using, and validating causal relationships as the causal mechanism under test.
Figure 4: Causal test harness, in dotted circles. I imagine a force transducer (that measures some trajectory internally) and a beam balance, leading to mass and force estimates, which combine into an acceleration estimate according to the proposed underlying causal mechanism.
I believe the right way to conceive of this is as placing a "test harness" augmentation on the causal graph, encoding beliefs about how you effect your do(x) and how you observe your observation nodes. This reveals a fundamental point about "objectively testing" a scientific theory: one must always assume some facts about the causal test harness, about the way data are created, observed, and reported, in order to evaluate any scientific theory, on pain of a typical kind of infinite regress. This propagates all the way down to deciding what the quantities placed on a causal network even mean: for our example above, how do you actually measure force and mass without making any assumptions about the way force and mass actually work? The only way I see to actually produce a trustworthy force transducer or mass scale is first to assume that Newton's law holds, then to assume some extra facts about the masses or forces of some objects, and finally to define some indication of the apparatus as a base unit. For instance, with a beam balance one must assume that gravity is a force, and that objects fall with uniform acceleration because they have constant force and constant mass (ruling out there being any height-dependent object mass variation that exactly cancels out height-dependent gravitational force variation, or there being time variations in force). Only then can you define a certain object as a mass unit, and start making comparisons. And once you start making comparisons, you must assume that reference objects which balance once stay approximately the same in mass (if you had compared them when you made measurement X, they would have balanced again). This is essentially Quine's point about confirmation holism: there are always side-conditions to tests of laws using scientific apparatus, side-conditions that often assume the very law under study holds of the apparatus! You can't evaluate scientific explanations from any neutral "place to stand"; the right way to proceed is to accept the theory on its own terms and see whether, as a whole, it provides a more or less satisfactory account of experience than an alternative.
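A small simulation of the point (entirely my own construction): the mechanism under test is \(F = ma\), but the force transducer in the harness is miscalibrated; a naive estimate of the mass from the sensor readings comes out wrong, and only an analysis that models the harness recovers the true value.

```python
import random

random.seed(1)

# Sketch: the mechanism under test vs. the causal test harness around it.
m_true = 2.0
sensor_gain, sensor_bias = 1.1, 0.5     # an imperfect force transducer

trials = []
for _ in range(1000):
    a = random.uniform(0.5, 5.0)            # do(a): the intervention we effect
    F_true = m_true * a                      # the mechanism under test
    F_read = sensor_gain * F_true + sensor_bias + random.gauss(0.0, 0.05)
    trials.append((a, F_read))

# Naive analysis: trust the sensor, estimate m as the mean of F_read / a.
m_naive = sum(F / a for a, F in trials) / len(trials)

# Harness-aware analysis: undo the assumed sensor model first.
m_aware = sum((F - sensor_bias) / sensor_gain / a for a, F in trials) / len(trials)

print(f"naive estimate of m:         {m_naive:.2f}")
print(f"harness-aware estimate of m: {m_aware:.2f}   (true value {m_true})")
```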
Pearl's IC and IC* algorithms for direct and latent causal inference reduce to a bunch of tests for conditional independence. There are two proposed scientific induction norms that make the output unique: minimality, that if one causal structure can mimic another then the (non-strictly) smaller one is preferred, and stability, that if multiple causal structures of the same size can mimic each other, the best one is the most stable with respect to small changes in its error distribution parameters.
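For reference, the primitive these algorithms repeatedly invoke can be sketched very simply in the linear-Gaussian case (my own toy, with partial correlation standing in for a general conditional independence test): in a chain \(X \to Z \to Y\), \(X\) and \(Y\) are correlated, but the residuals left after regressing each on \(Z\) are not.

```python
import random

random.seed(2)

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va ** 0.5 * vb ** 0.5)

def residuals(target, given):
    """Residuals of a simple linear regression of target on given."""
    n = len(target)
    mt, mg = sum(target) / n, sum(given) / n
    slope = sum((g - mg) * (t - mt) for g, t in zip(given, target)) / \
            sum((g - mg) ** 2 for g in given)
    return [t - (mt + slope * (g - mg)) for g, t in zip(given, target)]

# Chain X -> Z -> Y (linear-Gaussian toy data).
X = [random.gauss(0, 1) for _ in range(5000)]
Z = [x + random.gauss(0, 0.5) for x in X]
Y = [z + random.gauss(0, 0.5) for z in Z]

print(f"corr(X, Y)             = {corr(X, Y):.2f}")   # clearly nonzero
print(f"partial corr(X, Y | Z) = {corr(residuals(X, Z), residuals(Y, Z)):.2f}")  # ~ 0
```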
E.T. Jaynes has soundly convinced me of logical Bayesianism with his Probability Theory: The Logic of Science. In it, his maximum entropy principle is justified by reference to concerns quite similar to stability: there are overwhelmingly more situations in which a maximum entropy distribution realizes the constraints imposed by the data than for more committal choices. Are there connections here? Might all causal models with stable distributions realize maximum-entropy priors for the error parameters?
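I don't know, but as a pointer to what such a connection might look like, here is Jaynes' dice example in code (the implementation is mine): given only a mean constraint, the maximum-entropy distribution over \(\{1,\dots,6\}\) is the exponential-family one, found by bisecting on the Lagrange multiplier.

```python
import math

# Jaynes' dice problem: maximum-entropy distribution on {1..6} subject to a
# prescribed mean.  p_k is proportional to exp(-lam * k); find lam by bisection.
def maxent_die(target_mean):
    def mean_for(lam):
        w = [math.exp(-lam * k) for k in range(1, 7)]
        z = sum(w)
        return sum(k * wk for k, wk in zip(range(1, 7), w)) / z
    lo, hi = -10.0, 10.0          # mean_for is decreasing in lam over this range
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean_for(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(-lam * k) for k in range(1, 7)]
    z = sum(w)
    return [wk / z for wk in w]

print([round(p, 3) for p in maxent_die(4.5)])   # skewed toward high faces
print([round(p, 3) for p in maxent_die(3.5)])   # uniform: 1/6 each
```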
Pearl's account of observational causal inference assumes that the data is fixed, and one is interested in gathering causal relationships from that fixed data. Fine for statisticians, but not for scientists or human beings, who must economize their attention. What makes a variable a legitimate variable? Given my current causal structure and my economic values, what should I measure next, and for how long, to most quickly remove the doubts I care about?
[1] I use this term making no commitment on whether the fine structure of the universe is discrete or continuous. If the former, "dq" means the smallest possible step in quantity "q."
[2] A list of "related" changes in quantities is something like a "process." Can we further specify "related" in terms of cause?