(sem_bayes_workflow)=
:::{post} September, 2025
:tags: confirmatory factor analysis, structural equation models
:category: advanced, reference
:author: Nathaniel Forde
:::
This case study extends the themes of {ref}`contemporary Bayesian workflow <bayesian_workflow>` and {ref}`Structural Equation Modelling <cfa_sem_notebook>`.
While both topics are well represented in the PyMC examples library, our goal here is to show how the principles of the Bayesian workflow can be applied concretely to structural equation models (SEMs). The iterative and expansionary strategies of model development found in the SEM literature provide an independent motivation for the recommendations of {cite:p}`gelman2020bayesian`. {cite:p}`kline2023principles` in particular highlights the utility of a staggered approach to model development as a diagnostic for model mis-specification.
A further goal is to strengthen the foundation for SEM modeling in PyMC. We demonstrate how to use different sampling strategies, both conditional and marginal formulations, to accommodate mean structures and hierarchical effects. These extensions showcase the flexibility and expressive power of Bayesian SEMs.
Recall the stages of the Bayesian workflow.
:::{admonition} The Bayesian Workflow Stages
:class: tip

The structure of the SEM workflow mirrors the Bayesian workflow closely. Each step, from specifying measurement models to refining residual covariances, reflects the same cycle of hypothesis formulation, model testing, and iterative improvement.

1. Confirm the factor structure (CFA).
2. Layer in the structural paths.
3. Refine with residual covariances.
4. Iteratively validate.
:::
These approaches complement one another. We'll see how the iterative and expansionary approach to model development is crucial for understanding the subtleties of SEM models, and how our understanding grows as we track their implications across increasingly expressive candidate model structures.
In many settings, statistical modelling has become a game: a contest of metrics, accuracy scores, and leaderboard performances. Each modelling project promises improvement, but only within the narrow boundaries the metric defines. Our rich, complex aims of scientific discovery, understanding, and generalisable insight are replaced by simplified numbers that are easy to compare and optimize. The result is a kind of methodological gamification: we chase scores instead of comprehension. This is the contrast case against which we set the Bayesian workflow. We'll make the case below for treating statistical modelling not as a simplistic competition, but as a rewarding craft: a craft that underpins the entire scientific enterprise, our most successful epistemological project.
The data we will examine for this case study is drawn from an example discussed by {cite:p}`vehkalahti2019multivariate` on the drivers of job satisfaction. In particular, the focus is on how constructive thought strategies can impact job satisfaction. We have 12 related measures.
The 12 measures group into four sets of three indicators:

- STM, IEB, …
- DA1–DA3
- UF1, UF2, FOR
- JW1–JW3

The idea is that the covariance structure of these variables can be properly discerned with a structural equation modelling approach. In their book, Vehkalahti and Everitt report the derived correlation and covariance structure. Here we will sample data from the multivariate normal distribution they describe and proceed in steps to model each component of their SEM model. This will allow us to additively layer in and motivate each part of the SEM model.
Interestingly, the Bayesian workflow embodies the same constructive strategies it studies. At each stage, we ask: Do I believe this model? What assumptions am I making? Is the evidence consistent with my expectations? This reflective process, mirroring the “evaluating beliefs and assumptions” mindset, builds models that are not only statistically robust but conceptually grounded. By the end, we arrive at a set of defensible findings which, fittingly, support a genuine sense of satisfaction with a job well done.
Before we turn to implementation, let’s formalize the model mathematically.
In our set-up of a structural equation model we have observed variables $y \in \mathbb{R}^{p}$ (here $p = 12$) and latent factors $\eta \in \mathbb{R}^{m}$ ($m = 4$). The basic structural equation model (SEM) consists of two parts: the measurement model and the structural regressions.
The measurement model is the factor structure we seek to confirm in our analysis. It is called a measurement model because we view the observable metrics as indicators of the thing we actually want to measure. The observable metrics are grouped under a unifying "factor" or construct. The idea is that each of the indicators is an imprecise gauge of the latent factor; the hope is that collectively they provide a better gauge of this hard-to-measure quantity, e.g. satisfaction and well-being. This can be thought of as a data-reduction technique, where we reduce the complex multivariate data set to a smaller collection of inferred features. However, in most SEM applications the factors themselves are of independent interest, not merely a modelling convenience.
In factor analysis we posit a factor structure and estimate how each latent factor determines the observed metrics. The assumed data-generating structure says that the factors cause the observed metrics. The inferential task works backwards: we want to infer the shape of the latent factors conditional on the observed metrics. For individual $i$ the measurement model is

$$
y_i = \Lambda \eta_i + \epsilon_i
$$
where $\Lambda$ is a $12 \times 4$ matrix and $\eta$ is an $N \times 4$ matrix for $N$ observations, i.e. the matrix of latent scores on each of the four factors for all individual responses. In the measurement model we're aiming to ensure that the observed metrics are well grouped under a single factor: that they "move" well together and respond to changes in the factor.
The structural model encodes the regression paths between the latent constructs. Mathematically this is achieved within a $4 \times 4$ matrix $B$, where the latent factors are specified as predictors of other latent factors as theory dictates, i.e. no latent factor predicts itself, but some may bear on others. In our case we're aiming to see how constructive thought strategies predict job satisfaction. The influence paths of one factor on another can be direct or mediated through the other factors.
In the structural model we specify how we believe the latent constructs relate to one another:

$$
\eta_i = B \eta_i + \zeta_i \quad \Longrightarrow \quad \eta_i = (I - B)^{-1} \zeta_i
$$

The term $(I - B)^{-1}$ is sometimes called the total effects matrix because it can be expanded as $I + B + B^{2} + B^{3} + \dots$, summing all possible chains of paths in the system. As we'll see, the structural and mediating paths between these latent constructs are additive in this way. This observation allows us to very elegantly derive total, indirect and direct effects within the system.
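As a quick numerical illustration (with made-up path values, not estimates from this data), we can check the series expansion directly. For an acyclic path structure $B$ is nilpotent, so the series terminates:

```python
import numpy as np

# Hypothetical 3-factor system, rows = predictors, columns = targets:
# factor 0 -> 1 (0.5), factor 0 -> 2 (0.2), factor 1 -> 2 (0.3)
B = np.array([
    [0.0, 0.5, 0.2],
    [0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0],
])
total = np.linalg.inv(np.eye(3) - B)
series = np.eye(3) + B + B @ B  # B^3 = 0, so the expansion stops here
assert np.allclose(total, series)

# Total effect of factor 0 on factor 2:
# 0.2 (direct) + 0.5 * 0.3 (indirect via factor 1) = 0.35
print(total[0, 2])
```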
We can express the SEM in either a conditional or marginal formulation. The conditional form explicitly samples the latent variables, while the marginal form integrates them out of the likelihood.
This formulation treats the latent variables as parameters to be sampled directly. This is conceptually straightforward but often computationally demanding for Bayesian samplers.
$$
\begin{aligned}
\eta_i &= (I - B)^{-1} \zeta_i \\
\mu_i &= \Lambda \eta_i \\
y_i \mid \eta_i &\sim \text{MvNormal}(\mu_i, \Psi)
\end{aligned}
$$
which highlights that the conditional formulation samples the latent variables explicitly.
Here the focus is on deriving the model-implied covariance matrix. This approach marginalises the latent draws out of the likelihood and avoids sampling the latent terms directly. Instead, we can calculate a point estimate for each of the latent scores after the fact using the regression method.
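For concreteness, writing $\Psi_{\zeta}$ for the covariance of the structural disturbances (our notation here; the code may parameterise this differently), the standard model-implied covariance of the indicators and the regression estimator of the factor scores take the forms

$$
\Sigma_{yy} = \Lambda (I - B)^{-1} \Psi_{\zeta} (I - B)^{-\top} \Lambda^{\top} + \Psi,
\qquad
\hat{\eta}_i = \Sigma_{\eta y}\, \Sigma_{yy}^{-1}\, (y_i - \mu),
$$

where $\Sigma_{\eta y} = (I - B)^{-1} \Psi_{\zeta} (I - B)^{-\top} \Lambda^{\top}$ and $\mu$ is the model-implied mean of the indicators (0 for standardised data).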
We'll introduce each of these components as additional steps as we layer structure over the basic factor model.
For this exercise we will lean on a range of utility functions to build and compare the expansionary sequence. These functions encapsulate steps that recur in any SEM model; they modularize the model-building process and make it easier to compare successive model expansions.
The most important are functions like `make_lambda`, which samples the factor loadings of the indicators that contribute to each latent factor and fixes the scale of that factor. Similarly, `make_B` samples the parameter values of the path coefficients between the latent constructs, arranging them in a matrix that can be passed through matrix multiplication routines. Additionally, we have a `make_Psi` function which samples parameter values for particular residual covariances, encoding aspects of the variance in our system not captured by the covariances among the latent factors. These three helper functions determine the structure of the SEM model, and variants of each can be used to construct any SEM structure.
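To give a flavour of these helpers, here is a minimal sketch of what `make_lambda` and `make_B` might look like. The shapes, priors and names are illustrative assumptions, and both are meant to be called inside a `pm.Model` context; the actual utilities used below may differ in detail.

```python
import pymc as pm
import pytensor.tensor as pt


def make_lambda(n_factors=4, per_factor=3, name="lambdas"):
    # Sample the free loadings; the first loading of each factor is fixed
    # to 1 to set the factor's scale and identify the remaining loadings.
    free = pm.Normal(name, 1.0, 0.5, shape=(n_factors, per_factor - 1))
    Lambda = pt.zeros((n_factors * per_factor, n_factors))
    for f in range(n_factors):
        base = f * per_factor
        Lambda = pt.set_subtensor(Lambda[base, f], 1.0)  # fixed for scale
        Lambda = pt.set_subtensor(Lambda[base + 1 : base + per_factor, f], free[f])
    return Lambda


def make_B(paths, n_factors=4, name="mu_betas"):
    # `paths` lists (predictor, target) index pairs; each free path
    # coefficient is sampled and slotted into an otherwise-zero matrix.
    coefs = pm.Normal(name, 0.0, 0.5, shape=len(paths))
    B = pt.zeros((n_factors, n_factors))
    for k, (i, j) in enumerate(paths):
        B = pt.set_subtensor(B[i, j], coefs[k])
    return B
```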
We also save some plotting functions which will be used to compare models.
In this section, we translate the theoretical structure of a confirmatory factor model (CFA) into a concrete implementation. The diagram below distinguishes between covariance relationships among latent factors (red dotted arrows) and factor loadings linking latent constructs to observed indicators (black arrows). We've highlighted with a red [1] that the first "factor loading" is always fixed to 1, so as to (a) define the scale of the factor and (b) allow identification of the other factor loadings within that factor.
*(Figure: CFA path diagram. Red dotted arrows mark covariances among the latent factors; black arrows mark factor loadings, with the first loading of each factor fixed to 1.)*
In the model below we sample draws from the latent factors `eta` and relate them to the observables by the matrix computation `pt.dot(eta, Lambda.T)`. This computation results in a "pseudo-observation" matrix, the conditional mean of each observation, which we then feed through our likelihood to calibrate the latent structures against the observed data. The covariances (i.e. the red arrows) among the latent factors are determined by the Cholesky factor `chol`. These are the general patterns we'll see in all the models below, but we add complexity as we go.
The model diagram should emphasise how the sampling of the latent structure is fed forward into the ultimate likelihood term. Note here how our likelihood term is specified as independent Normals. This is a substantive assumption which is later revised: in a full SEM specification we will change the likelihood to a multivariate normal distribution with a specific covariance structure.
In Lavaan notation this is the model we are aiming at:
where the `=~` symbol denotes the "is measured by" relation. Now we fit the model: we sample from the prior, condition on the likelihood and derive the posterior estimates for the model parameters.
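A minimal sketch of such a conditional CFA model follows. The data here is a simulated stand-in, and the priors and names are illustrative assumptions rather than the notebook's exact code:

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(0)
N, p, m = 100, 12, 4
data = rng.normal(size=(N, p))  # stand-in for the standardized survey data

with pm.Model() as cfa:
    # 12 x 4 loading matrix; first loading per factor fixed to 1
    # (reusing the make_lambda sketch from above)
    Lambda = make_lambda(n_factors=m, per_factor=3)

    # Covariances among the latent factors (the red dotted arrows)
    chol, _, _ = pm.LKJCholeskyCov(
        "chol", n=m, eta=2.0, sd_dist=pm.Exponential.dist(1.0)
    )
    eta = pm.MvNormal("eta", mu=pt.zeros(m), chol=chol, shape=(N, m))

    # Conditional mean ("pseudo-observations") fed through an
    # independent-Normal likelihood, the assumption we relax later
    mu = pt.dot(eta, Lambda.T)
    sigma = pm.Exponential("sigma", 1.0, shape=p)
    pm.Normal("likelihood", mu=mu, sigma=sigma, observed=data)

    idata = pm.sample()
```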
Let's inspect a sampled $\Lambda$ parameter. Note how each factor (column index) records three positive parameters, with the first in each group fixed to 1. This is to ensure that the scale of the latent factor is well defined, indexed in magnitude to one of the observed metrics.
The next series of plots is aimed at checking whether the sampler has sufficiently explored the parameter space. The difference between the spread of the prior and the tightness of the posterior says something about what the model has learned through the process of Bayesian updating. Here both the prior and posterior are centred on 0, so the learning can appear "undramatic", but it is often sensible to standardise and scale the variables before fitting the model. This makes it easier to learn the factor structure without having to worry about the mean structure, since all variables are transformed to centre on 0.
Now for each latent variable (satisfaction, well-being, constructive, dysfunctional), we plot a forest/ridge plot of the posterior distributions of their factor scores `eta`. Each panel has a vertical reference line at 0 (since latent scores are typically centered and scaled). These panels visualize the distribution of estimated latent scores across individuals.
Then we summarize the posterior estimates of the model parameters (e.g. factor loadings, regression coefficients, variances), providing a quick check against identification constraints (like fixed loadings) and effect directions.
Finally we plot the lower triangle of the residual correlation matrix with a blue–white–red colormap (−1 to +1). This visualizes the residuals between the model-implied and observed correlations among the indicators after the SEM structure is accounted for, helping detect model misfit or unexplained associations.
These plots indicate a fairly promising modelling strategy. The estimated factor loadings are all close to 1, which implies a conformity in the magnitude and scale of the indicator metrics within each of the four factors. The indicators are strongly reflective of the latent factors, although UF1 and FOR seem to be moving in opposite directions. We will want to address this later when we specify covariance structures for the residuals.
The posterior predictive residuals are close to 0, which suggests that the model is well able to capture the latent covariance structure of the observed data. The latent factors move together in intuitive ways, with high satisfaction accompanying high well-being.
Below these model checks we now plot some diagnostics for the sampler. The energy plots should show overlapping distributions, and the effective sample size should hopefully be high across the slew of focal parameters.
The sampler diagnostics give no indication of trouble. This is a promising start. We now shift the SEM setting to layer in structural regressions. These relations are usually the focus of an analysis.
The next expansionary move in SEM modelling is to consider the relations between the latent constructs. These are generally intended to have a causal interpretation. Although each construct is imprecisely measured, together its indicators are assumed to capture the construct's essential variation. This assumption underpins the causal interpretation of the structural paths.
Bollen argues as follows:
> [...] we cannot isolate a dependent variable from all influences but a single explanatory variable, so it is impossible to make definitive statements about causes. We replace perfect isolation with pseudo-isolation by assuming that the disturbance (i.e., the composite of all omitted determinants) is uncorrelated with the exogenous variables of an equation. (Bollen, *Structural Equations with Latent Variables*, p. 45)
This is a claim of conditional independence which licenses the causal interpretation of the arrows in the plot below. The fact that the latent relations operate at a higher level of abstraction makes it easier to postulate these "clean" direct paths between constructs. The model makes an argument, that the latent constructs have been properly measured and their variation isolated, to support the causal claim. Criticisms of the model proceed by assessing how compelling these postulates are in the context of the fitted model.
*(Figure: SEM path diagram showing directed structural paths between the latent constructs.)*
The isolation or conditional independence of interest is encoded in the model with the sampling of the disturbance variable `gamma`. These draws come from a process that is structurally divorced from the influence of the exogenous variables. For instance, if we have $\gamma_{dtp} \perp\!\!\!\perp \eta_{cts}$ then the $\beta_{cts \to dtp}$ coefficient is an unbiased estimate of the direct effect of CTS on DTP, because the remaining variation in $\eta_{dtp}$ is noise by construction.
Each additional arrow in the structural model thus encodes a substantive theoretical claim about causal influence. You are making claims of causal influence. Arrows should be added in line with plausible theory, and only where parameter identification is well supported. In our case we have structured the DAG following the discussion in {cite:p}`vehkalahti2019multivariate`, which will allow us to unpick the direct and indirect effects below. In Lavaan syntax the model we want to specify is:
where the `~` symbol denotes a regression relationship.
We have also added the covariance structure on the residuals by supplying a multivariate normal likelihood with a diagonal covariance structure. This is akin to the independent normals we saw in the CFA model, but hints at the extra structure we can (and will) impose on the residuals.
Now we can sample the B matrix and observe its structure. The sampled $B$ matrix compactly represents the system of regression relationships among the latent constructs.
It's best to see the matrix as encoding target variables in the columns and predictor variables in the rows. Here we have set up our matrix so that satisfaction is predicted by three variables, but not by itself. The values in the first column are the sampled coefficient values in the regression

$$
\eta_{[:,\,0]} = B_{[1,\,0]}\, \eta_{[:,\,1]} + B_{[2,\,0]}\, \eta_{[:,\,2]} + B_{[3,\,0]}\, \eta_{[:,\,3]}
$$

where we've used pythonic indexing of the matrix. Note how values of $\eta$ appear on both sides of the equation: we are solving a system of equations simultaneously, i.e. we need to solve $\eta = B\eta + \zeta$ where values of $\eta$ appear on both sides. Solving the system untangles the mess of simultaneous equations, and also allows us to derive the total, direct and indirect effects of each latent factor across all the pathways of the system.
This compact representation of three separate regression equations is used to sort out the mutual influences, interactions and distortions our habits of thought have on our job satisfaction. It's true that the relationships between these latent constructs need not be linear and simplistic. There may be messy non-linear relationships between satisfaction and well-being, but we are deliberately abstracting some of that away in favour of a tractable, quantifiable measure of linear effect. This is often a simplification, but parsimony is also a goal of the modelling process.
Let's examine our model sense-checks. The posterior predictive distributions retain a healthy appearance. Our prior allowed for a wide realisation of values, and yet the posterior has shrunk the range considerably closer to the standardised 0-mean data.
The derived covariance structures show improvement in the model-implied residuals. Additionally, we now get insight into the implied paths and relationships between the latent constructs. These move in compelling ways. Dysfunctional thought processes have a probable negative impact on well-being, and similarly for job satisfaction. Conversely, constructive thought processes have a probable positive direct effect on well-being and satisfaction, although the latter effect appears slight.
However, the model diagnostics appear less robust. The sampler seems to have had difficulty sampling the parameters for the path coefficients `mu_betas`.
The sampler diagnostics suggest that the model is having trouble sampling from the B matrix. This is concerning because the structural relations are the primary parameters of interest in the SEM setting; anything which undercuts our confidence in their estimation undermines the whole modelling exercise. Because the conditional SEM showed sampler challenges on `mu_betas`, we now try a marginal formulation.
In the next model we focus on the covariance matrix implied by the model rather than the conditional regressions. We still use the B matrix, but the role it plays differs slightly. This approach cuts down on the burden for the sampler by removing the need to parameterise $\eta$ directly. Instead, B is used to generate the system's covariance structure, and regression-like effects are derived as a result. Additionally, we add a specific covariance structure to the residuals. In Lavaan syntax we have:
where we have finally added structure to the residuals of the multivariate normal outcome by imposing a non-zero covariance relation between UF1 and FOR.
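A hedged sketch of what such a `make_Psi` helper could look like. The indicator indices for the UF1/FOR pair are hypothetical, and the prior on the off-diagonal term is kept tight because a free covariance entry must remain small enough to keep the matrix positive definite:

```python
import pymc as pm
import pytensor.tensor as pt


def make_psi(n_indicators=12, extra_pairs=((6, 8),), name="resid"):
    # Diagonal residual variances, plus a sampled covariance for chosen
    # indicator pairs, e.g. UF1 ~~ FOR (indices here are hypothetical).
    sds = pm.Exponential(f"{name}_sd", 1.0, shape=n_indicators)
    Psi = pt.diag(sds**2)
    covs = pm.Normal(f"{name}_cov", 0.0, 0.1, shape=len(extra_pairs))
    for k, (i, j) in enumerate(extra_pairs):
        Psi = pt.set_subtensor(Psi[i, j], covs[k])
        Psi = pt.set_subtensor(Psi[j, i], covs[k])
    return Psi
```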
Note here how the prior range surpasses the previous posterior predictive checks. The model is exploring a much wider potential outcome space, but still neatly resolves to a 0-mean system after updating the posterior.
The parameter estimates and other model health metrics seem in line with the covariance structure observed in the confirmatory factor model, which suggests that even with our extra parameterisation we retain fidelity to the data-generating process.
However, the sampler health now looks much, much better. Marginalising the likelihood seems to have considerably improved the sampling diagnostics.
Let's add a bit more structure and push this sampling strategy. Here we've allowed the mean realisation of the indicator variables to vary. The $\mu$ parameter of our likelihood needs to reflect this assumption to better capture the structure of the data. This is useful when we believe there are baseline differences across groups in the data, or when we want to compare the mean differences across two populations.
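Concretely, writing $\tau \in \mathbb{R}^{12}$ for the vector of indicator intercepts (the `tau` parameters inspected below), the conditional mean becomes

$$
\mu_i = \tau + \Lambda \eta_i,
$$

so that in the marginal formulation the implied mean of the indicators is simply $\tau$, since the latent factors are centred at 0.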
We can now see how the prior draws vary more explicitly around 0 seeking the mean realisation of each outcome variable. We've asked the model to learn "more" structure and it naturally explores more of the parameter space to evaluate comparable likelihoods.
We can now assess the tau parameters to see where the model locates the mean realisation of each indicator variable.
The posterior predictive distribution of the model implied residuals seems comparable to the cases above, but this time we've also isolated the mean structure.
The sampler diagnostics also seem healthy.
We can also pull out the indirect and direct effects. This is one of the biggest pay-offs for SEM modelling. We've done the work of assessing measurement error and building an abstraction layer of what-we-care-about over the observed indicators. We've considered various structures of the inferential relationships and isolated those direct effects from undue confounding influences. Now we can pull out the impact of mediation and moderation.
Let's first compare the model implied total effects, and the degrees of moderation between constructive and dysfunctional habits of thought on the satisfaction outcome.
Even though constructive thought processes have a directly positive effect on job satisfaction, we can see that when they are paired with dysfunctional thought processes the positive effects are moderated. Note to all employers: well-being and general welfare are seen as the most substantively positive predictors of job satisfaction. Fix your processes, certainly; improve your workflows and encourage constructive practices and dialogue, but it's hard to beat secure welfare and fair compensation when aiming to improve work satisfaction.
We'll also pull together the model-implied residuals on the correlation coefficients. This will help us see whether our different model structures are consistently deriving similar stories about the data-generating relationships.
We see similar residuals for all models, except for a degradation as the model tries to account for the novel mean-structured data. This could be attributed to the greater complexity of the parameter space with the same amount of data, or to an idiosyncrasy of the sampling process.
Below we can see that the models are substantially similar in key dimensions; the magnitude and direction of the implied effects are similar across each model. This is an important observation. It is this kind of robustness to model specification that we want to see in this kind of iterative modelling. This should give us confidence that the model is well specified and picking up on the actual relationships between these latent constructs.
We can also derive global estimates of model fit, such as leave-one-out cross-validation (LOO) scores. These should be seen as additional data points alongside the more local measures of model fit we can see in the residual plots. These scores highlight an interesting distinction between the marginal and conditional formulations of a SEM model.
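Such a comparison might be run with ArviZ as follows; the `idata_*` names are hypothetical handles for the fitted models above:

```python
import arviz as az

# LOO comparison on the deviance scale (lower is better). Conditional and
# marginal formulations use different likelihoods, so treat cross-family
# comparisons with care, as discussed below.
comparison = az.compare(
    {
        "cfa": idata_cfa,
        "sem_conditional": idata_conditional,
        "sem_marginal": idata_marginal,
        "sem_mean_structure": idata_mean,
    },
    ic="loo",
    scale="deviance",
)
print(comparison)
```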
The conditional model formulations show lower, and therefore apparently better, scores. This is typical because the conditional formulation is forced to use a conditional calculation making use of the sampled `eta`. In short, the conditional model scores each observation with the probability

$$
p(y_i \mid \eta_i, \theta)
$$

whereas the marginal formulation does not make use of `eta` in the likelihood calculation, instead integrating it out:

$$
p(y_i \mid \theta) = \int p(y_i \mid \eta_i, \theta)\, p(\eta_i \mid \theta)\, d\eta_i
$$
As such, we're comparing apples and oranges when we try to compare marginal and conditional SEMs on their LOO scores. The interpretive gloss here is that it's better to see the conditional performance metric as a score for reconstruction error, while the marginal scores can be seen as proper proxies for out-of-sample prediction error. Both are useful, but they represent slightly different notions of performance. This is an easy detail to miss when you're naively chasing performance benchmarks. When selecting models with metric optimisation in mind, we need to understand what we're actually optimising.
In this case we can see that the SEM structure does seem to improve on the CFA model, but that the mean-structure model doesn't improve on the baseline marginal structure. Coupled with the comparability of the residual errors, we can compellingly claim that the marginal SEM model is the preferred model. It recovers plausible posterior predictive statistics and does so with reasonable parsimony. It is coherent with insights from the other models but improves on them in a range of ways, from computational efficiency to preferable performance measures.
These kinds of sensitivity analyses help with model validation, but we can be even more clinical in our assessment of these models. We can perform parameter recovery exercises to properly validate that our model specification can identify the structural parameters of our complex SEM architecture.
The mean-structure SEM provides a baseline description of how latent factors generate observed indicators. In real applications, however, we often expect these relationships to differ across groups or conditions. A hierarchical extension allows us to model those differences directly—testing whether the structural paths encoded in B are invariant across groups. For example, employees differ by team, firm, or treatment condition. The next natural question in a Bayesian workflow is therefore not just “What are the structural relations?” but “How stable are they across contexts?” This represents a key step in the Bayesian workflow: expanding the model’s expressive power while checking how robust its assumptions remain.
*(Figure: hierarchical SEM diagram with group-varying structural paths.)*
Let's assume we are aiming to assess a causal difference between treatment and control groups.
We're now going to add hierarchical structure to the B coefficients and feed these through the likelihood term to infer how the structural coefficients respond to group differences.
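A minimal sketch of that hierarchical structure (the path labels, priors and shapes are illustrative assumptions):

```python
import pymc as pm

coords = {
    "group": ["control", "treatment"],
    "path": ["cts->wb", "dtp->wb", "wb->sat"],  # hypothetical path labels
}
with pm.Model(coords=coords) as hierarchical_sem:
    # Population-level path coefficients...
    mu_betas = pm.Normal("mu_betas", 0.0, 0.5, dims="path")
    sigma_betas = pm.Exponential("sigma_betas", 2.0)
    # ...with partially pooled, group-specific values
    betas = pm.Normal("betas", mu_betas, sigma_betas, dims=("group", "path"))
    # Each group's coefficients are then arranged into a group-specific B
    # matrix (e.g. via a make_B-style helper) and fed through the likelihood.
```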
Notice that we haven’t yet provided any observed data to the model — this is intentional. Our next step is to simulate data by performing a forward pass through the system. We initialize two versions of the model: (1) with tight priors and (2) with wide priors on the data-generating parameters. We’ll sample from the tightly parameterized model to generate indicator data consistent with our chosen parameter settings.
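The forward pass can be sketched like this, assuming `tight_model` and `wide_model` are the two parameterisations just described and that the indicator variable is named `likelihood` (`pm.observe` requires a recent PyMC release):

```python
import pymc as pm

# Draw one forward simulation from the tightly parameterised model. With no
# observed data attached, the indicator draws land in the `prior` group.
with tight_model:
    prior = pm.sample_prior_predictive(draws=1, random_seed=1)
synthetic_y = prior.prior["likelihood"].sel(chain=0, draw=0).values

# Condition the wide model on the synthetic data and attempt recovery
wide_conditioned = pm.observe(wide_model, {"likelihood": synthetic_y})
with wide_conditioned:
    idata = pm.sample()
```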
This has generated synthetic observations spawned from the SEM system we have just set up.
We now want to see if we can infer the parameters used to generate the data from `synthetic_y`. To do so, we use the wide parameterisation of the model; this is more realistic in that we likely won't know the true range of the priors when we come to real-world data.
The model samples well and gives good evidence of distinct posterior distributions across the parameters suggesting the model is capable of identification.
With this set-up we can actually assess the degree of parameter recovery, because we know the true values. This is a pivotal part of the model-building process, akin to how writers read their own work aloud to test it for assonance, cogency and flow. In simulation-based probabilistic modelling we are able to generate data from models with known parameters, and recover the latent parameter values through the inferential workflow. We see how our model performs, much like how a writer must test whether their audience truly grasps their intended message.
The posterior estimates should "recover" the true values within uncertainty, ensuring the model is faithful to the data-generating process. Were the effort at parameter recovery to fail, we would equally have learned something about our model. Parameter recovery exercises help discover issues of mis-specification or unidentified parameters. They tell us how informative our data is with respect to our data-generating process; they clarify the degree to which the data constrains (or fails to constrain) the model's parameters.
Verlyn Klinkenborg begins his justly famous book *Several Short Sentences About Writing* with a reminder that applies equally to modelling:
"Here, in short, is what i want to tell you. Know what each sentence says, What it doesn't say, And what it implies. Of these, the hardest is know what each sentence actually says" - V. Klinkenborg
This advice transfers exactly to the art of statistical modelling. To know what our model says, we need to say it aloud. We need to feel how it lands with an audience. We need to understand its implications and limitations. Simulation studies and parameter recovery exercises speak our models aloud; their failures, like their successes, are transparent, and each iteration strengthens the quality of the work.
The Bayesian workflow explores the depths of meaning achieved by our statistical approximations. It traces out the effects of interlocking components and the layered interactions of structural regressions. In each articulation we're testing which aspects resonate in the telling. What shape has the posterior? How plausible is the range of values? How faithful are our predictions to reality? On these questions we weigh each model, just as the writer weighs each sentence for its effect.
In this case we've encoded a difference in the regression effects for each of the two groups. These effects are easily recovered with quantifiable measures of uncertainty around the treatment effects.
In applied work, these are precisely the implications we want to surface and understand. From a workflow perspective, our models should clarify these relationships rather than add noise. If we're assessing a particular hypothesis or aiming to estimate a concrete quantity, the model specification should be robust enough to support those inferences. This is where parameter recovery exercises can lend assurance and bolster confidence in the findings of empirical work. Here we've shown that our model specification will support inferences about a class of particular causal contrasts, i.e. how treatment changes the direct effects of one latent construct on another.
Another way we might interrogate the implications of a model is to see how well it can predict "downstream" outcomes of the implied model. How does job-satisfaction relate to attrition risk and approaches to work?
Combining SEM structures with Discrete choice models involves adding an extra likelihood term dependent on the latent factors. HR managers, for instance, often monitor attrition decisions and conceptualize them as driven by abstract notions such as job satisfaction. We now have tools to measure these latent constructs — but can we predict attrition outcomes from them?
To explore this, we’ll embed a discrete choice scenario into the SEM framework. The goal is to predict a categorical outcome — whether an employee stays, quits, or quiet-quits — as a function of their job satisfaction and their perceived utility of work. Once again, we’ll frame this as a parameter recovery exercise.
*(Figure: SEM with a discrete-choice component predicting the stay / quit / quiet-quit outcome.)*
The discrete choice setting is intuitive in this context because we can model the individual's subjective utility of work as a function of their job satisfaction. Within rational-choice theory, this utility determines the decision outcome.
The modelling is similar to the basic SEM set-up, but we've additionally included a multinomial outcome over the available alternatives. Note, however, that we have no alternative-specific covariates (e.g. the price of an alternative), since the draws of the latent constructs are fixed predictors for each of the three outcomes. As such, we need to constrain one of the alternatives to 0 so that it acts as the reference class and allows identification of the coefficient weights for the other alternatives. This is a basic implementation of a discrete choice model where we allow alternative-specific intercept terms to interact with the beta coefficients for each latent construct. Other variants are possible, but this example will allow us to infer how job satisfaction drives choices about work.
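A standalone sketch of the choice component follows. The latent scores are fixed inputs here (in the full model they are the sampled `eta` from the SEM block), and the priors, names and alternative ordering are illustrative assumptions:

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor.tensor.special import softmax

rng = np.random.default_rng(0)
N, m, n_alts = 200, 4, 3                    # alternatives: stay, quit, quiet-quit
eta_scores = rng.normal(size=(N, m))        # stand-in latent factor scores
choices = rng.integers(0, n_alts, size=N)   # stand-in observed decisions

with pm.Model() as choice_model:
    # Alternative-specific intercepts and slopes for the two non-reference
    # alternatives; "stay" (index 0) has utility fixed at 0 for identification.
    alphas = pm.Normal("alphas", 0.0, 1.0, shape=n_alts - 1)
    betas = pm.Normal("betas", 0.0, 0.5, shape=(m, n_alts - 1))
    utility = alphas + pt.dot(pt.as_tensor(eta_scores), betas)      # (N, 2)
    utility = pt.concatenate([pt.zeros((N, 1)), utility], axis=1)   # (N, 3)
    p = pm.Deterministic("p", softmax(utility, axis=1))
    pm.Categorical("choice", p=p, observed=choices)
```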
We can see here how our model structure now has two likelihood terms, both based on the latent constructs `eta`.
To demonstrate parameter recovery we need to sample from both outcomes simultaneously.
Now we see how much we can infer from these two sampled outcomes.
The factor structures are nicely recovered.
The B parameters seem to be approximately well identified.
Now we look at the beta coefficients from the discrete choice setting, noting that the model can recover sensible values.
Similarly, the intercept terms are properly recovered.
But what does the model say about the relationships between our latent constructs?
We can recover the inverse relationship we encoded in the outcomes between job satisfaction and the choice to stay. Similarly, these posterior predictive checks look sound. This is encouraging.
The "action" in human decision making is often understood to be driven by these hard-to-quantify constructs that determine motivation. SEM with a discrete choice component offers us a way to model these processes, while allowing for measurement error between the observables and the latent drivers of choice. Secondly, we are triangulating the values of the system between two sources of observable data. On the one hand, we measure latent constructs in the SEM with a range of survey measures (JW1, JW2, ... ) but then calibrate the consequences of that measurement against revealed choice data. This dual calibration offers a powerful approach to understanding how latent dispositions translate into observable actions.
We abstract over the expressed attitudes of rational agents, deriving interpretable representations of latent attitudes from expressed opinions. These representations are then further calibrated against the observed choices made by the agents. This two-step process of information compression and prediction serves to concisely quantify and evaluate the idiosyncratic attitudes of a population of complex agents. As we iteratively layer-in these constructs in our model development, we come to understand their baseline and interactive effects. This perspective helps us gauge the coherence between attitudes and actions of the agents under study.
The same workflow extends seamlessly to computational agents, where latent variables represent more opaque internal states or reward expectations. In both human and artificial systems, discrete choice modelling provides a common language for interpreting how such latent structure generates behavioral choices.
We have now seen how to articulate structural equation models and their variants in PyMC. The SEM workflow is, at heart, Bayesian in temperament. Hypothesise and construct. Construct, then estimate. Estimate and check. Check, then refine. Refine, then expand... Both disciplines reject the checklist mentality of "fit once, report, move on." Instead, they cultivate a focused, deliberate practice. Each demands an apprenticeship in which skill is honed: skill to see how assumptions shape understanding, and how the world resists the imposition of false structures; skill to find the right structures. Each iteration is a dialogue between theory and evidence. At each juncture we ask: does this model speak true? Does this structure reflect the facts to hand?
Of course, the craft ideal has its limits. The Bayesian workflow, for all its discipline and transparency, cannot by itself guarantee truth. Every model remains a conditional story, framed by the priors we choose and the theories we inherit. Iteration can refine these frames, but it can also entrench them. The aspiration to know what our models say must also include the humility to accept what they cannot reveal. Theory driven models, developed through iterative checking and clear communication, are more robust to misspecification, more transparent to collaborators, and more resistant to the subtle distortions of overfitting and overconfidence.
In the end, the value of craft in statistical modeling lies not in merely improving benchmark metrics, but in the depth of understanding we cultivate through careful communication and justification. The Bayesian workflow reminds us that modeling is not the automation of insight, but its deliberate construction. Our workflow is a process of listening, revising, and re-articulating until the model speaks clearly. Like any craft, its worth is measured not by throughput but by fidelity: how honestly our structure reflects the world it seeks to describe. Each diagnostic, each posterior check, each refinement of a latent path is a form of attention — a small act of resistance against the flattening logic of target metrics and checklists. These constructive habits and reflective practices are the source of fulfillment in the work. To practice modeling as craft is to reclaim pride in knowing what our models say, what they do not say, and what they imply - and to find, in that discipline and skilled attention, the satisfaction of meaningful work and useful science.
:::{bibliography}
:filter: docname in docnames
:::
:::{include} ../page_footer.md
:::