The dataset used in this project describes airplane crashes from 1908 to 2024. In our project, we focus only on crashes that occurred after 1970; we make this restriction to get a better picture of the current state of the industry. The dataset can be found and downloaded from Kaggle [1].
Motivation
Aviation has undergone massive developments since the first commercial flight in 1914. Airplane safety has always been a top priority for airlines; a single crash will drive customers away to airlines perceived as safer and, of course, cost lives. Although this drive to make airplanes as safe as possible has significantly reduced the number of crashes over the years, accidents still occur. However, not all accidents are the same: survivability varies significantly between crashes. Understanding which factors influence survivability remains crucial for manufacturers and airlines seeking to minimize the loss of life. While accident prevention remains the most important goal, understanding the fatality patterns when accidents do occur is equally important in discussions of airplane safety.
Problem
When aviation accidents occur, fatality rates vary significantly, from incidents in which all passengers survive to catastrophic losses of everyone on board. This variation raises critical questions: Has the likelihood of surviving an airplane crash improved over time? Do certain airplane manufacturers demonstrate consistently higher crash survival rates than others? How much of the variation in crash fatality rates can be attributed to the specific details of individual accidents versus inherent airplane characteristics? These are the questions we hope to answer in this report.
Similar Projects
Kaggle hosts three notebooks linked to the plane crash dataset [1]:

- all three are written in Python, whereas our project uses R
- none of them use the Bayesian data analysis tools we apply in our project (hierarchical and non-hierarchical models, priors and posteriors)
- two of the three include exploratory data analysis similar to the beginning of our report

Other than these three Kaggle notebooks, we are not aware of any other similar projects using this dataset.
Data Description
Crashes Over Time
```r
# Figure 1: Number of crashes - data
fig1 <- crashes %>%
  group_by(year) %>%
  summarise(n_crashes = n()) %>%
  ggplot(aes(x = year, y = n_crashes)) +
  geom_line() +
  labs(title = "Number of Crashes (Data)", x = "Year", y = "Number of Crashes")

# Figure 2: Number of crashes - trend
fig2 <- crashes %>%
  group_by(year) %>%
  summarise(n_crashes = n()) %>%
  ggplot(aes(x = year, y = n_crashes)) +
  geom_smooth(method = "loess", color = "red", se = TRUE) +
  labs(title = "Number of Crashes (Trend)", x = "Year", y = "Number of Crashes")

fig1 + fig2
```
Temporal trends in aviation accidents
Fatalities Over Time
```r
# Figure 3: Fatalities - data points
fig3 <- crashes %>%
  ggplot(aes(x = year, y = fatalities)) +
  geom_point(alpha = 0.3) +
  labs(title = "Fatalities (Data Points)", x = "Year", y = "Fatalities")

# Figure 4: Fatalities - trend
fig4 <- crashes %>%
  ggplot(aes(x = year, y = fatalities)) +
  geom_smooth(method = "loess", color = "red", se = TRUE) +
  labs(title = "Fatalities (Trend)", x = "Year", y = "Fatalities")

fig3 + fig4
```
Trends in aviation accident fatalities
Fatality Rate Over Time
```r
# Compute yearly mean fatality rate
rate_ts <- crashes %>%
  group_by(year) %>%
  summarise(
    mean_rate = mean(fatality_rate, na.rm = TRUE),
    sd_rate   = sd(fatality_rate, na.rm = TRUE),
    n         = n(),
    se        = ifelse(n > 1, sd_rate / sqrt(n), NA_real_)
  )

# Figure 5: Fatality rate - data points
fig5 <- crashes %>%
  ggplot(aes(x = year, y = fatality_rate)) +
  geom_point(alpha = 0.3) +
  labs(title = "Fatality Rate (Data Points)", x = "Year", y = "Fatality Rate")

# Figure 6: Fatality rate - trend
fig6 <- ggplot(rate_ts, aes(x = year, y = mean_rate)) +
  geom_smooth(method = "loess", color = "red", se = TRUE) +
  labs(title = "Fatality Rate (Trend)", x = "Year", y = "Mean Fatality Rate")

fig5 + fig6
```
Mean fatality rate by year
To conduct a modern analysis of plane crash fatalities, we do not want to include observations from the distant past. We need a reasonable cutoff point that keeps a sufficient amount of data while restricting the analysis to modern aircraft. The fatalities trend shows a turning point in the early-to-mid 1970s, so 1970 is a reasonable choice for our observation cutoff year.
```r
# Compute empirical fatality rates by manufacturer (>= 10 crashes)
empirical_rates <- crashes_filtered %>%
  group_by(aircraft_manufacturer) %>%
  summarise(
    mean_rate = mean(fatality_rate, na.rm = TRUE),
    sd_rate   = sd(fatality_rate, na.rm = TRUE),
    n         = n(),
    se        = ifelse(n > 1, sd_rate / sqrt(n), NA_real_),
    .groups   = "drop"
  ) %>%
  mutate(
    lower_ci = mean_rate - 1.96 * se,
    upper_ci = mean_rate + 1.96 * se
  ) %>%
  arrange(mean_rate)

# Plot empirical rates
empirical_rates %>%
  mutate(aircraft_manufacturer = fct_reorder(aircraft_manufacturer, mean_rate)) %>%
  ggplot(aes(x = mean_rate, y = aircraft_manufacturer)) +
  geom_point(size = 3) +
  geom_errorbarh(aes(xmin = lower_ci, xmax = upper_ci), height = 0.2) +
  geom_vline(
    xintercept = mean(crashes_filtered$fatality_rate),
    linetype = "dashed", color = "red"
  ) +
  labs(
    title = "Empirical Fatality Rates by Manufacturer",
    subtitle = "Data only (no pooling); dashed line = overall mean",
    x = "Fatality Rate", y = "Manufacturer"
  ) +
  theme_minimal()
```
Data Modeling
We address these questions using hierarchical Bayesian modeling of fatality rates across aviation accidents from 1970 to 2024. In this approach we assume that aircraft from the same manufacturer share some characteristics, such as similar design choices, engineering approaches, and safety systems. At the same time, a top-level pooling distribution preserves the information that all aircraft are similar to one another. We compare this partially pooled model with two alternatives: a fully pooled model and a fully separate (no-pooling) model.
Hierarchical Models
Figure 1: Hierarchical Model (partial pooling)
Hierarchies are levels of abstraction that group data points into groups. Examples of hierarchies in the plane crash dataset:

- manufacturer
- plane model
- time of year the plane crashed (month)

A benefit of hierarchical models is their ability to estimate results for lower levels of the hierarchy even when limited data is available, because group-level estimates are partially pooled toward the population distribution. [2]
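In equation form, a partially pooled (varying-intercept) model over manufacturers can be sketched as follows, with \(i\) indexing crashes and \(j\) manufacturers; the model we actually fit later also lets the year slope vary by manufacturer:

\[
\begin{aligned}
\text{fatalities}_{ij} \mid \text{aboard}_{ij} &\sim \mathrm{Binomial}\!\left(\text{aboard}_{ij},\, p_{ij}\right) \\
\operatorname{logit}(p_{ij}) &= \alpha + \alpha_j + \beta \cdot \text{year}_{ij} \\
\alpha_j &\sim \mathcal{N}(0, \sigma)
\end{aligned}
\]

The manufacturer offsets \(\alpha_j\) share the common scale \(\sigma\), which is what pulls sparse manufacturers toward the overall mean.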
Non-Hierarchical Models
Figure 2: Non-Hierarchical Model (no pooling)
Figure 3: Non-Hierarchical Model (complete pooling)
Non-hierarchical models sit at the two extremes of pooling: a complete-pooling model fits one set of parameters for all observations, while a no-pooling model fits parameters for each group separately. Both ignore partial dependence in the data:

- manufacturers are not independent
- decades are not independent
- countries/regions are not independent
Prior selection
Global intercept
For the intercept term \(\alpha\), we can use an informative prior based on the observed overall fatality rate of \(\approx 0.8\) in the data. We use a normal distribution whose mean is the observed fatality rate transformed onto the logit scale by \(\operatorname{logit}(p) = \log\frac{p}{1-p}\).
For example, \(\operatorname{logit}(0.8) = \log(4) \approx 1.386\), which we round to \(1.4\). For a weakly informative prior like ours, a standard deviation of \(1\) keeps the coefficient in a reasonable ballpark without adding too much constraint.
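As a quick sanity check, base R's `qlogis()`/`plogis()` show what this prior implies back on the probability scale (a sketch; values rounded):

```r
# Observed fatality rate on the logit scale
qlogis(0.8)          # about 1.386; rounded to 1.4 for the prior mean

# What does alpha ~ Normal(1.4, 1) imply for the fatality rate?
# 95% of the prior mass lies within mean +/- 1.96 sd on the logit scale.
plogis(1.4 - 1.96)   # lower end, roughly 0.36
plogis(1.4 + 1.96)   # upper end, roughly 0.97
```

So the prior concentrates on fatality rates between roughly 0.36 and 0.97, covering the observed value without forcing it.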
Year coefficient
For the coefficient \(\beta\) on the normalized year we can be more conservative. The fatality rate plot shows only a small yearly change, so it is reasonable to constrain the \(\beta\) prior to smaller values. A normal distribution with mean \(0\) and standard deviation \(0.5\) seems reasonable; the central 95% prior interval for \(\beta\) is then approximately \([-1.0, 1.0]\).
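The interval above can be verified, and given a rough interpretation, with base R (a sketch; values rounded):

```r
# Central 95% prior interval for beta ~ Normal(0, 0.5)
qnorm(c(0.025, 0.975), mean = 0, sd = 0.5)       # roughly -0.98 and 0.98

# The same interval as multiplicative changes in the fatality odds
# per one unit of the scaled year covariate
exp(qnorm(c(0.025, 0.975), mean = 0, sd = 0.5))  # roughly 0.38 to 2.7
```

In other words, a priori we allow the odds of fatality to change by at most a factor of about 2.7 per scaled-year unit, which is conservative but not dogmatic.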
Group level standard deviation
In our hierarchical model we have an extra intercept term per aircraft manufacturer, drawn from a normal distribution with mean \(0\) and standard deviation \(\sigma\). We need an appropriate prior for this group-level standard deviation \(\sigma\). Observing the manufacturer-level data, the fatality rates fall roughly between 0.6 and 0.95 across all manufacturers. We can assume that this interval corresponds to \(\pm 2\sigma\), i.e. about 95% of the groups. Converting its endpoints to the logit scale, we can calculate
\[
\sigma \approx \frac{\operatorname{logit}(0.95) - \operatorname{logit}(0.6)}{4} \approx \frac{2.944 - 0.405}{4} \approx 0.63.
\]
A half-normal distribution with standard deviation \(0.6\) should therefore be a good prior choice. The mean of this distribution is \(E[X] = \sigma\sqrt{\frac{2}{\pi}} \approx 0.479\), which is a plausible scale of between-manufacturer variation for our model.
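The arithmetic behind this prior choice can be reproduced in a few lines of base R (a sketch; values rounded):

```r
# Endpoints of the observed manufacturer fatality-rate range, on the logit scale
lo <- qlogis(0.60)   # about 0.405
hi <- qlogis(0.95)   # about 2.944

# If this range spans +/- 2 sigma, it is 4 sigma wide in total
(hi - lo) / 4        # about 0.63, which we round down to 0.6

# Mean of a half-normal distribution with sd 0.6
0.6 * sqrt(2 / pi)   # about 0.479
```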
Complete pooling (non-hierarchical model)
Complete Pooling Model
This model treats all plane crashes as coming from a single population, ignoring differences between countries, manufacturers, or time periods.
Fit Complete Pooling Model
We’ll model the fatality rate using a binomial model with a logit link. This approach:

- naturally handles proportions, including 0s and 1s
- models fatalities out of the total aboard (respects the actual counts)
- uses the logit transformation internally
- is more appropriate for count data than beta regression
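In equation form, the complete pooling model described above is:

\[
\begin{aligned}
\text{fatalities}_i \mid \text{aboard}_i &\sim \mathrm{Binomial}\!\left(\text{aboard}_i,\, p_i\right) \\
\operatorname{logit}(p_i) &= \alpha + \beta \cdot \text{year\_scaled}_i
\end{aligned}
\]

A single \((\alpha, \beta)\) pair governs every crash, regardless of manufacturer.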
```r
# Check the data structure
crashes_filtered %>%
  summarise(
    n_zeros    = sum(fatalities == 0),
    n_all_died = sum(fatalities == aboard),
    min_rate   = min(fatality_rate),
    max_rate   = max(fatality_rate)
  )
```
```r
complete_pool_formula <- bf(
  fatalities | trials(aboard) ~ 1 + year_scaled,
  family = binomial(link = "logit")
)

priors <- c(
  prior(normal(1.4, 1), class = Intercept),
  prior(normal(0, 0.5), class = b)
)

# Fit complete pooling model with binomial family
complete_pool_model <- brm(
  formula = complete_pool_formula,
  prior = priors,
  data = crashes_filtered,
  seed = 42,
  refresh = 0,
  cores = 4
)
```
Model Summary
```r
summary(complete_pool_model)
```

```
 Family: binomial
  Links: mu = logit
Formula: fatalities | trials(aboard) ~ 1 + year_scaled
   Data: crashes_filtered (Number of observations: 1766)
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept       0.81      0.02     0.78     0.84 1.00     3641     2792
year_scaled    -0.18      0.02    -0.21    -0.15 1.00     3901     2809

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```
Posterior Predictive Checks
```r
# Check model fit
pp_check(complete_pool_model, ndraws = 100)
```
Visualize Predictions
```r
# Create prediction data.
# For binomial models, we need to specify the number of trials;
# we use the mean number aboard as a reference point.
pred_data <- tibble(
  year_scaled = seq(
    min(crashes_filtered$year_scaled),
    max(crashes_filtered$year_scaled),
    length.out = 100
  ),
  aboard = round(mean(crashes_filtered$aboard))  # use mean as reference
)

# Generate predictions (expected counts of fatalities)
predictions <- pred_data %>%
  add_epred_draws(complete_pool_model, ndraws = 1000) %>%
  mutate(.epred_rate = .epred / aboard)  # convert to rate for plotting

# Plot predictions with uncertainty
ggplot() +
  stat_lineribbon(
    data = predictions,
    aes(x = year_scaled, y = .epred_rate),
    .width = c(0.50, 0.80, 0.95),
    alpha = 0.5
  ) +
  geom_point(
    data = crashes_filtered,
    aes(x = year_scaled, y = fatality_rate),
    alpha = 0.2,
    size = 1
  ) +
  scale_fill_brewer(palette = "Blues") +
  labs(
    title = "Complete Pooling Model: Fatality Rate Over Time",
    subtitle = paste0(
      "Predictions shown for flights with ",
      round(mean(crashes_filtered$aboard)),
      " people aboard (mean)"
    ),
    x = "Year (Standardized)",
    y = "Fatality Rate",
    fill = "Credible Interval"
  ) +
  theme_minimal()
```
The complete pooling model assumes:

- all crashes share the same underlying fatality rate pattern
- the temporal trend is captured by the year_scaled coefficient
- there is no variation between countries, manufacturers, or other grouping factors

This serves as a baseline for comparison with the partial pooling (hierarchical) model.
No pooling (non-hierarchical model)
No Pooling Model
This model treats each aircraft manufacturer completely separately, fitting independent fatality rate patterns for each manufacturer without borrowing any information between groups.
Fit No Pooling Model
We’ll fit separate binomial models per manufacturer using fixed effects. This approach:

- gives each manufacturer its own intercept and temporal trend
- shares no information between manufacturers (no pooling)
- requires sufficient data per manufacturer (minimum 10 crashes)
- uses the same binomial family and logit link as the other models
```r
# Check the data structure
crashes_filtered %>%
  summarise(
    n_manufacturers = n_distinct(aircraft_manufacturer),
    n_crashes       = n(),
    n_zeros         = sum(fatalities == 0),
    n_all_died      = sum(fatalities == aboard),
    min_rate        = min(fatality_rate),
    max_rate        = max(fatality_rate)
  )
```
```r
# Fit no-pooling model with a separate intercept and slope per manufacturer.
# The 0 removes the global intercept; manufacturer-specific intercepts replace it.
no_pool_formula <- bf(
  fatalities | trials(aboard) ~ 0 + aircraft_manufacturer +
    aircraft_manufacturer:year_scaled,
  family = binomial(link = "logit")
)

priors <- c(
  prior(normal(0, 1), class = b)  # weakly informative prior for all coefficients
)

no_pool_model <- brm(
  formula = no_pool_formula,
  prior = priors,
  data = crashes_filtered,
  seed = 42,
  refresh = 0,
  cores = 4
)
```
The no-pooling model assumes:

- each manufacturer has its own independent fatality rate pattern
- no information is shared between manufacturers (no pooling)
- temporal trends vary by manufacturer

This provides maximum flexibility but requires sufficient data per manufacturer, and it shows what the data suggest when we allow complete independence.
Partial pooling (hierarchical model)
```r
hierarchical_formula <- bf(
  fatalities | trials(aboard) ~ 1 + year_scaled +
    (1 + year_scaled | aircraft_manufacturer),
  family = binomial(link = "logit")
)

priors <- c(
  prior(normal(1.4, 1), class = Intercept),
  prior(normal(0, 0.5), class = b),
  prior(normal(0, 0.6), class = sd),  # brms truncates sd priors to half-normal
  prior(lkj(2), class = cor)
)

# Fit the hierarchical model on the filtered manufacturers
partial_pool_model <- brm(
  formula = hierarchical_formula,
  prior = priors,
  data = crashes_filtered,
  seed = 42,
  control = list(adapt_delta = 0.95),
  refresh = 0,
  cores = 4
)
```
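With all three models fitted, the comparison promised in the Data Modeling section can be carried out with brms's built-in PSIS-LOO cross-validation. A sketch, assuming the three fitted model objects from above:

```r
# Approximate leave-one-out cross-validation for each model
loo_complete <- loo(complete_pool_model)
loo_no_pool  <- loo(no_pool_model)
loo_partial  <- loo(partial_pool_model)

# Rank the models by expected log predictive density;
# elpd_diff and se_diff quantify how reliable the ranking is
loo_compare(loo_complete, loo_no_pool, loo_partial)
```

If the Pareto-k diagnostics flag influential observations, refitting with `moment_match = TRUE` or `reloo = TRUE` would make the estimates more trustworthy.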