Distributions Guide
PAL provides a comprehensive set of statistical distributions for actuarial modelling. This tutorial covers how to choose, parameterise and use them.
Setup
```python
import numpy as np
from pal import config, distributions, set_random_seed

config.n_sims = 10_000  # number of simulations per generated variable
set_random_seed(42)     # make the random draws reproducible
```
Generating Samples
Every distribution has a `generate()` method that returns a `StochasticScalar`, a vector of simulated values:
```python
loss = distributions.LogNormal(mu=10, sigma=1.5).generate()
loss.mean() # => 68,673
loss.std() # => 205,459
np.percentile(loss.values, 99.5) # => 1,111,353
```
The number of samples is controlled by `config.n_sims` (default 100,000); the Setup above sets it to 10,000.
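To make concrete what `generate()` returns, the sketch below draws the same LogNormal directly with NumPy's `Generator.lognormal`, whose `mean` and `sigma` arguments are the mean and standard deviation of log(X), the same roles as `mu` and `sigma` here. This is an illustration outside PAL, not PAL's internal sampler: NumPy's random stream differs from PAL's, so the figures will differ from the outputs above by sampling noise.

```python
import numpy as np

n_sims = 10_000                   # mirrors config.n_sims in Setup
rng = np.random.default_rng(42)   # NumPy's generator, not PAL's set_random_seed

# mean/sigma parameterise log(X), i.e. mu=10 and sigma=1.5 as in the PAL call
samples = rng.lognormal(mean=10, sigma=1.5, size=n_sims)

print(f"mean   {samples.mean():>12,.0f}")
print(f"std    {samples.std():>12,.0f}")
print(f"median {np.median(samples):>12,.0f}")            # analytic median is exp(10), about 22,026
print(f"99.5th {np.percentile(samples, 99.5):>12,.0f}")  # analytic value is about 1,049,416
```

With a heavy-tailed distribution the mean, standard deviation and 99.5th percentile move noticeably between seeds at this sample size; the median is far more stable.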
Analytical Functions
Distributions also provide `cdf()` and `invcdf()` without needing to generate samples:
```python
ln = distributions.LogNormal(mu=10, sigma=1.5)
ln.cdf(50_000) # => 0.7076 (P(X ≤ 50,000))
ln.invcdf(0.5) # => 22,026 (median)
ln.invcdf(0.995) # => 1,049,416 (99.5th percentile)
```
These are useful for quick calculations, curve-fitting checks and validating simulation results.
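As a check on the values above, a LogNormal has closed-form CDF and inverse CDF in terms of the standard normal. The sketch below uses Python's `statistics.NormalDist` and the standard LogNormal formulas; it does not claim to be how PAL computes `cdf()` or `invcdf()`, only that the numbers agree.

```python
import math
from statistics import NormalDist

# LogNormal(mu, sigma): log(X) ~ Normal(mu, sigma)
mu, sigma = 10.0, 1.5
z = NormalDist()  # standard normal

def lognormal_cdf(x):
    # P(X <= x) = Phi((ln x - mu) / sigma)
    return z.cdf((math.log(x) - mu) / sigma)

def lognormal_invcdf(p):
    # x_p = exp(mu + sigma * Phi^{-1}(p))
    return math.exp(mu + sigma * z.inv_cdf(p))

print(f"{lognormal_cdf(50_000):.4f}")     # about 0.7076
print(f"{lognormal_invcdf(0.5):,.0f}")    # 22,026, i.e. exp(10)
print(f"{lognormal_invcdf(0.995):,.0f}")  # about 1,049,416
```

The same pattern (closed form versus simulated percentile) is a quick way to validate a simulation: the 99.5th percentile of 1,111,353 from 10,000 samples sits within sampling noise of the analytic 1,049,416.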
Available Distributions
Severity (Continuous) Distributions
| Distribution | Parameters | Typical Use |
|---|---|---|
| | | Attritional losses, claim sizes |
| | | Aggregate losses, waiting times |
| | | Large/catastrophe losses |
| | | Excess losses above a threshold |
| | | Heavy-tailed loss distributions |
| | | Time-to-failure, survival analysis |
| | | Symmetric risks, economic variables |
| | | Loss ratios, probabilities |
| | | Inter-arrival times, simple decay |
| | | Income distributions, survival |
| | | Growth models |
| | | Equal-likelihood scenarios |
| | | Bayesian priors |
| | | Heavy-tailed alternatives |
| | | Flexible heavy tails |
| | | Heavy-tailed alternatives |
| | | Extreme value modelling |
| | | Extreme value modelling |
Frequency (Discrete) Distributions
| Distribution | Parameters | Typical Use |
|---|---|---|
| Poisson | mean | Claim counts (fixed exposure) |
| NegBinomial | n, p | Over-dispersed claim counts |
| Binomial | n, p | Events out of fixed trials |
| | | Sampling without replacement |
Comparing Severity Distributions
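The choice of severity distribution significantly affects tail behaviour. The distributions below are simulated with different parameterisations, so their means differ by up to two orders of magnitude; compare tails through the ratio of the 99.5th percentile to the mean rather than through the raw figures: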
| Distribution | Mean | Std | 99.5th percentile |
|---|---|---|---|
| LogNormal(mu=10, sigma=1.5) | 68,673 | 205,459 | 1,111,353 |
| Gamma(alpha=5, theta=1000) | 5,020 | 2,247 | 12,551 |
| Pareto(shape=2, scale=10000) | 20,107 | 32,032 | 149,509 |
| GPD(shape=0.5, scale=1000, loc=0) | 2,021 | 6,406 | 27,902 |
| Weibull(shape=1.5, scale=1000) | 898 | 615 | 3,082 |
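Key observations:

- LogNormal has the largest ratio of 99.5th percentile to mean in this comparison (about 16×), which makes it suitable for large-loss classes where extreme events dominate. In the far tail it still decays faster than any power law.
- Pareto has a power-law tail (99.5th is about 7.4× the mean) and its values are bounded below by the scale parameter (here 10,000). With shape=2 its variance is infinite, so the Std figure is a sample artefact that will not settle as the number of simulations grows.
- GPD is the natural choice for modelling excesses above a threshold (the peaks-over-threshold approach). With shape=0.5 its variance is also infinite, so read its Std figure with the same caution.
- Gamma is lighter-tailed (99.5th is only 2.5× the mean) and suits aggregate losses or attritional classes.
- Weibull with shape > 1 has a tail that decays faster than exponential, so it is lighter than the Gamma in the far tail, although with these parameters its 99.5th percentile is about 3.4× the mean because its coefficient of variation is higher. It is useful for modelling time-to-failure or operational risks.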
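The ratios in these observations can be reproduced in closed form. The sketch below computes each mean and 99.5th percentile with the standard library only. It assumes the standard parameterisations (Pareto Type I with minimum `scale`, GPD with shape ξ > 0, Gamma and Weibull with a scale parameter), which is consistent with the simulated means in the table but not confirmed from PAL's source. The Gamma quantile is found by bisection on the Erlang CDF, which only applies to an integer `alpha`; the helper functions are illustrative, not PAL API.

```python
import math
from statistics import NormalDist

P = 0.995  # percentile used throughout the comparison

def lognormal_mean_q(mu, sigma, p=P):
    mean = math.exp(mu + sigma**2 / 2)
    q = math.exp(mu + sigma * NormalDist().inv_cdf(p))
    return mean, q

def pareto_mean_q(shape, scale, p=P):
    # Pareto Type I: support x >= scale, survival (scale / x) ** shape
    mean = shape * scale / (shape - 1)  # finite only for shape > 1
    q = scale * (1 - p) ** (-1 / shape)
    return mean, q

def gpd_mean_q(shape, scale, loc=0.0, p=P):
    # GPD, shape > 0: survival (1 + shape * (x - loc) / scale) ** (-1 / shape)
    mean = loc + scale / (1 - shape)  # finite only for shape < 1
    q = loc + scale / shape * ((1 - p) ** (-shape) - 1)
    return mean, q

def weibull_mean_q(shape, scale, p=P):
    mean = scale * math.gamma(1 + 1 / shape)
    q = scale * (-math.log(1 - p)) ** (1 / shape)
    return mean, q

def gamma_int_mean_q(alpha, theta, p=P):
    # For integer alpha the Gamma CDF is the Erlang CDF
    def cdf(x):
        y = x / theta
        return 1 - math.exp(-y) * sum(y**k / math.factorial(k) for k in range(alpha))
    lo, hi = 0.0, alpha * theta
    while cdf(hi) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(100):  # bisection on cdf(x) = p
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return alpha * theta, (lo + hi) / 2

results = {
    "LogNormal(mu=10, sigma=1.5)": lognormal_mean_q(10, 1.5),
    "Gamma(alpha=5, theta=1000)": gamma_int_mean_q(5, 1000),
    "Pareto(shape=2, scale=10000)": pareto_mean_q(2, 10_000),
    "GPD(shape=0.5, scale=1000, loc=0)": gpd_mean_q(0.5, 1000),
    "Weibull(shape=1.5, scale=1000)": weibull_mean_q(1.5, 1000),
}

for name, (mean, q) in results.items():
    print(f"{name:36s} mean={mean:>10,.0f}  q99.5={q:>12,.0f}  ratio={q / mean:5.1f}")
```

The closed-form ratios come out at roughly 15.5 (LogNormal), 13.1 (GPD), 7.1 (Pareto), 3.4 (Weibull) and 2.5 (Gamma): the same ordering as the simulated table, with small differences due to sampling noise in the extreme tail.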
Comparing Frequency Distributions
| Distribution | Mean | Std | Max (simulated) |
|---|---|---|---|
| Poisson(mean=5) | 5.0 | 2.3 | 15 |
| Poisson(mean=50) | 50.0 | 7.0 | 75 |
| NegBinomial(n=5, p=0.5) | 5.0 | 3.2 | 22 |
| Binomial(n=100, p=0.1) | 10.0 | 3.0 | 24 |
- Poisson: variance equals the mean. The standard choice when claims arrive independently at a constant rate.
- Negative Binomial: variance exceeds the mean (over-dispersed). Use when there is parameter uncertainty or heterogeneity in the claim arrival rate.
- Binomial: bounded count (0 to n), with variance below the mean. Use when there is a fixed number of exposures and each can generate at most one claim.
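The dispersion pattern in the frequency table follows from the standard moment formulas. This sketch assumes PAL's `NegBinomial(n, p)` counts failures before the n-th success, the convention under which NegBinomial(n=5, p=0.5) has mean 5 as in the table; the helper functions are illustrative, not part of PAL.

```python
import math

def poisson_moments(mean):
    # Poisson: variance equals the mean
    return mean, mean

def negbinomial_moments(n, p):
    # Assumed convention: number of failures before the n-th success
    return n * (1 - p) / p, n * (1 - p) / p**2

def binomial_moments(n, p):
    # n independent exposures, each producing a claim with probability p
    return n * p, n * p * (1 - p)

moments = {
    "Poisson(mean=5)": poisson_moments(5),
    "Poisson(mean=50)": poisson_moments(50),
    "NegBinomial(n=5, p=0.5)": negbinomial_moments(5, 0.5),
    "Binomial(n=100, p=0.1)": binomial_moments(100, 0.1),
}

for name, (mean, var) in moments.items():
    # var/mean > 1 is over-dispersed, = 1 is Poisson, < 1 is under-dispersed
    print(f"{name:24s} mean={mean:5.1f}  std={math.sqrt(var):4.2f}  var/mean={var / mean:4.2f}")
```

The variance-to-mean ratio is 1 for both Poisson rows, 2 for the Negative Binomial and 0.9 for the Binomial, and the theoretical standard deviations (2.24, 7.07, 3.16, 3.00) agree with the simulated ones.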
Choosing a Severity Distribution
A practical decision tree:
- Do you have data above a threshold? → GPD (peaks over threshold)
- Is the tail very heavy (power-law)? → Pareto or Burr
- Is the distribution right-skewed with a moderate tail? → LogNormal or Gamma
- Is it symmetric? → Normal or Logistic
- Is it bounded between 0 and 1? → Beta
- Modelling time or duration? → Weibull or Exponential
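The decision tree can be encoded as an ordered checklist. The sketch assumes the questions are asked in the order listed and that the first "yes" wins; the keyword flags are hypothetical yes/no answers, not PAL arguments, and the return values are just distribution names.

```python
def suggest_severity(*, threshold_data=False, power_law_tail=False,
                     right_skewed=False, symmetric=False,
                     unit_interval=False, duration=False):
    """Return candidate severity distributions for the first question answered 'yes'."""
    rules = [
        (threshold_data, ["GPD"]),                 # peaks over threshold
        (power_law_tail, ["Pareto", "Burr"]),      # very heavy, power-law tail
        (right_skewed, ["LogNormal", "Gamma"]),    # right-skewed, moderate tail
        (symmetric, ["Normal", "Logistic"]),
        (unit_interval, ["Beta"]),                 # bounded between 0 and 1
        (duration, ["Weibull", "Exponential"]),    # time or duration
    ]
    for answer, candidates in rules:
        if answer:
            return candidates
    return []  # no branch of the tree matched

print(suggest_severity(power_law_tail=True))                 # ['Pareto', 'Burr']
print(suggest_severity(threshold_data=True, symmetric=True))  # ['GPD']
```

Ordering matters: data above a threshold points to GPD even if other answers are also "yes".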
Stochastic Parameters
Distribution parameters can themselves be stochastic. Pass a `StochasticScalar` as a parameter to create a mixed distribution:
```python
set_random_seed(42)
# Uncertain claim rate: mean is itself random
uncertain_rate = distributions.Gamma(alpha=25, theta=2).generate()
claims = distributions.Poisson(mean=uncertain_rate).generate()
```
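This produces over-dispersed counts because the Poisson mean varies across simulations, adding an extra layer of variability. In the Poisson-Gamma case the mixture is exactly a Negative Binomial: with alpha=25 and theta=2 it has mean 50 and variance 150 (NegBinomial(n=25, p=1/3)), compared with a variance of 50 for a plain Poisson with the same mean.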
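The Poisson-Gamma equivalence can be checked by simulation. The sketch uses plain NumPy rather than PAL, so it only mirrors the example above: a Gamma rate is drawn per simulation, then a Poisson count given that rate, and the result is compared with a direct Negative Binomial draw with n = alpha and p = 1/(1 + theta). NumPy's `negative_binomial` counts failures before the n-th success. The sample size and seed are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000  # more draws than config.n_sims so the moments are stable

# Poisson-Gamma mixture: a random rate per simulation, then a count given that rate
rates = rng.gamma(shape=25, scale=2, size=n)  # alpha=25, theta=2, mean rate 50
mixed = rng.poisson(rates)

# Negative Binomial with n=alpha and p=1/(1+theta) has the same distribution
direct = rng.negative_binomial(25, 1 / 3, size=n)

for label, x in [("Poisson-Gamma", mixed), ("NegBinomial", direct)]:
    print(f"{label:14s} mean={x.mean():6.2f}  var={x.var():7.2f}")
# Theory for both: mean = alpha*theta = 50, var = alpha*theta*(1 + theta) = 150
```

Both rows should show a mean close to 50 and a variance close to 150, three times what a Poisson with the same mean would give.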
Working with Generated Variables
`StochasticScalar` objects support standard arithmetic and NumPy operations:
```python
loss = distributions.LogNormal(mu=14, sigma=0.5).generate()
# Arithmetic
with_expenses = loss * 1.10
capped = np.minimum(loss, 5_000_000)
# Statistics
loss.mean()
loss.std()
np.percentile(loss.values, [25, 50, 75, 95, 99, 99.5])
# Visualisation
loss.show_cdf("Loss Distribution")
```
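Because `np.minimum(loss, 5_000_000)` produces a limited (capped) loss, its mean can be validated against the closed-form limited expected value of a LogNormal, E[min(X, u)] = exp(mu + sigma²/2)·Φ((ln u − mu − sigma²)/sigma) + u·(1 − Φ((ln u − mu)/sigma)). The sketch uses plain NumPy samples in place of a PAL `StochasticScalar`; the seed and sample size are arbitrary.

```python
import math
import numpy as np
from statistics import NormalDist

mu, sigma, cap = 14.0, 0.5, 5_000_000
Phi = NormalDist().cdf

# Closed-form limited expected value E[min(X, cap)] for LogNormal(mu, sigma)
lev = (math.exp(mu + sigma**2 / 2) * Phi((math.log(cap) - mu - sigma**2) / sigma)
       + cap * (1 - Phi((math.log(cap) - mu) / sigma)))

# Simulated counterpart of np.minimum(loss, cap).mean()
rng = np.random.default_rng(42)
samples = rng.lognormal(mean=mu, sigma=sigma, size=200_000)
capped_mean = np.minimum(samples, cap).mean()

print(f"closed form {lev:,.0f}")
print(f"simulated   {capped_mean:,.0f}")
```

The cap sits far in the tail here (only about 0.2% of losses exceed it), so the limited mean is only slightly below the uncapped mean of exp(14.125).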
See Also
- Getting Started: first steps with PAL
- Frequency-Severity Modelling: combining frequency and severity distributions
- Coupling Groups, Copulas and Variable Reordering: adding dependencies between variables