Distributions Guide

PAL provides a comprehensive set of statistical distributions for actuarial modelling. This tutorial covers how to choose, parameterise and use them.

Setup

import numpy as np

from pal import config, distributions, set_random_seed

config.n_sims = 10_000
set_random_seed(42)

Generating Samples

Every distribution has a generate() method that returns a StochasticScalar — a vector of simulated values:

loss = distributions.LogNormal(mu=10, sigma=1.5).generate()

loss.mean()                           # => 68,673
loss.std()                            # => 205,459
np.percentile(loss.values, 99.5)      # => 1,111,353

The number of samples is controlled by config.n_sims (default 100,000). The setup above lowers it to 10,000, so each generate() call in this guide draws 10,000 values.
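To get a feel for how the number of simulations affects tail estimates, here is a minimal sketch that uses only Python's standard library (random.lognormvariate) rather than PAL. It estimates the 99.5th percentile of the same LogNormal at several sample sizes and compares each estimate with the closed-form value. The helper name empirical_quantile and the seed are illustrative assumptions; PAL's own random stream will produce different numbers.

```python
import math
import random
from statistics import NormalDist

MU, SIGMA, P = 10.0, 1.5, 0.995

# Closed-form lognormal quantile: exp(mu + sigma * z_p).
exact = math.exp(MU + SIGMA * NormalDist().inv_cdf(P))


def empirical_quantile(n_sims: int, rng: random.Random) -> float:
    """Estimate the P-quantile from n_sims lognormal draws (simple order statistic)."""
    draws = sorted(rng.lognormvariate(MU, SIGMA) for _ in range(n_sims))
    return draws[min(int(P * n_sims), n_sims - 1)]


rng = random.Random(42)  # illustrative seed, unrelated to PAL's set_random_seed
estimates = {n: empirical_quantile(n, rng) for n in (1_000, 10_000, 100_000)}

for n, q in estimates.items():
    print(f"n_sims={n:>7,}  99.5th ~ {q:>12,.0f}  error vs exact: {q / exact - 1:+.1%}")
```

Extreme percentiles rest on only a handful of observations (50 values lie above the 99.5th percentile at 10,000 simulations), so the estimate typically tightens as the sample size grows.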

Analytical Functions

Distributions also provide cdf() and invcdf() methods, which are evaluated analytically and do not require generating samples:

ln = distributions.LogNormal(mu=10, sigma=1.5)

ln.cdf(50_000)       # => 0.7076  (P(X ≤ 50,000))
ln.invcdf(0.5)       # => 22,026  (median)
ln.invcdf(0.995)     # => 1,049,416  (99.5th percentile)

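These are useful for quick calculations, curve-fitting checks and validating simulation results. For example, the simulated 99.5th percentile in the previous section (1,111,353) is about 6% above the analytical value (1,049,416), a gap of the size expected from sampling error in the tail at 10,000 simulations.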
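The analytical values above can be cross-checked by hand from the standard lognormal formulas. The sketch below does not call PAL; it uses statistics.NormalDist from the standard library, and the function names lognormal_cdf and lognormal_invcdf are illustrative, not part of the PAL API.

```python
import math
from statistics import NormalDist

MU, SIGMA = 10.0, 1.5
STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def lognormal_cdf(x: float) -> float:
    """P(X <= x) for X ~ LogNormal(MU, SIGMA): Phi((ln x - mu) / sigma)."""
    return STD_NORMAL.cdf((math.log(x) - MU) / SIGMA)


def lognormal_invcdf(p: float) -> float:
    """Quantile function: exp(mu + sigma * Phi^-1(p))."""
    return math.exp(MU + SIGMA * STD_NORMAL.inv_cdf(p))


print(f"{lognormal_cdf(50_000):.4f}")     # approximately 0.7076
print(f"{lognormal_invcdf(0.5):,.0f}")    # the median is exp(10), about 22,026
print(f"{lognormal_invcdf(0.995):,.0f}")  # close to 1,049,416
```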

Available Distributions

Severity (Continuous) Distributions

Distribution	Parameters	Typical Use
`LogNormal`	`mu`, `sigma`	Attritional losses, claim sizes
`Gamma`	`alpha`, `theta`, `loc=0`	Aggregate losses, waiting times
`Pareto`	`shape`, `scale`	Large/catastrophe losses
`GPD`	`shape`, `scale`, `loc`	Excess losses above a threshold
`Burr`	`power`, `shape`, `scale`, `loc`	Heavy-tailed loss distributions
`Weibull`	`shape`, `scale`, `loc=0`	Time-to-failure, survival analysis
`Normal`	`mu`, `sigma`	Symmetric risks, economic variables
`Beta`	`alpha`, `beta`, `scale=1`, `loc=0`	Loss ratios, probabilities
`Exponential`	`scale`, `loc=0`	Inter-arrival times, simple decay
`LogLogistic`	`shape`, `scale`, `loc=0`	Income distributions, survival
`Logistic`	`mu`, `sigma`	Growth models
`Uniform`	`a`, `b`	Equal-likelihood scenarios
`InverseGamma`	`alpha`, `theta`, `loc=0`	Bayesian priors
`Paralogistic`	`shape`, `scale`, `loc=0`	Heavy-tailed alternatives
`InverseBurr`	`power`, `shape`, `scale`, `loc`	Flexible heavy tails
`InverseParalogistic`	`shape`, `scale`, `loc=0`	Heavy-tailed alternatives
`InverseWeibull`	`shape`, `scale`, `loc=0`	Extreme value modelling
`InverseExponential`	`scale`, `loc=0`	Extreme value modelling

Frequency (Discrete) Distributions

Distribution	Parameters	Typical Use
`Poisson`	`mean`	Claim counts (fixed exposure)
`NegBinomial`	`n`, `p`	Over-dispersed claim counts
`Binomial`	`n`, `p`	Events out of fixed trials
`HyperGeometric`	`ngood`, `nbad`, `population_size`	Sampling without replacement

Comparing Severity Distributions

The choice of severity distribution significantly affects tail behaviour. Here are several distributions simulated with illustrative parameters. Their scales differ widely, so compare the ratio of the 99.5th percentile to the mean rather than the absolute values:

Distribution                               Mean          Std        99.5th
------------------------------------------------------------------------
LogNormal(mu=10, sigma=1.5)              68,673      205,459     1,111,353
Gamma(alpha=5, theta=1000)                5,020        2,247        12,551
Pareto(shape=2, scale=10000)             20,107       32,032       149,509
GPD(shape=0.5, scale=1000, loc=0)         2,021        6,406        27,902
Weibull(shape=1.5, scale=1000)              898          615         3,082

Key observations:

LogNormal shows the largest tail ratio with these parameters: the 99.5th percentile is 16× the mean. Suitable for large-loss classes where extreme events dominate.
Pareto also has a heavy tail (99.5th is 7.4× the mean), and its minimum value is bounded below by the scale parameter. With shape=2 the theoretical variance is infinite, so the sample standard deviation shown is not a stable estimate.
GPD is the natural choice for modelling excesses above a threshold (peaks-over-threshold approach). With shape=0.5 it is also heavy-tailed (99.5th is 13.8× the mean).
Gamma is lighter-tailed (99.5th is only 2.5× the mean) and suited to aggregate losses or attritional classes.
Weibull with shape > 1 is also light-tailed (99.5th is 3.4× the mean here). Useful for modelling time-to-failure or operational risks.
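The tail ratios quoted in these observations can be reproduced directly from the comparison table. The sketch below simply divides the 99.5th percentile by the mean for each row; the figures are copied from the table, and the variable names are illustrative.

```python
# (mean, 99.5th percentile) copied from the severity comparison table.
table = {
    "LogNormal(mu=10, sigma=1.5)": (68_673, 1_111_353),
    "Gamma(alpha=5, theta=1000)": (5_020, 12_551),
    "Pareto(shape=2, scale=10000)": (20_107, 149_509),
    "GPD(shape=0.5, scale=1000, loc=0)": (2_021, 27_902),
    "Weibull(shape=1.5, scale=1000)": (898, 3_082),
}

# Ratio of the 99.5th percentile to the mean: a scale-free measure of tail weight.
tail_ratio = {name: p995 / mean for name, (mean, p995) in table.items()}

# Print from heaviest to lightest tail on this measure.
for name, ratio in sorted(tail_ratio.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<36} {ratio:5.1f}x")
```

Because the ratio is scale-free, it allows a like-for-like comparison even though the means in the table span two orders of magnitude.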

Comparing Frequency Distributions

Distribution                             Mean     Std      Max
--------------------------------------------------------------
Poisson(mean=5)                           5.0     2.3       15
Poisson(mean=50)                         50.0     7.0       75
NegBinomial(n=5, p=0.5)                   5.0     3.2       22
Binomial(n=100, p=0.1)                   10.0     3.0       24

Poisson — variance equals the mean. Standard choice when claims arrive independently at a constant rate.
Negative Binomial — variance exceeds the mean (over-dispersed). Use when there is parameter uncertainty or heterogeneity in the claim arrival rate.
Binomial — bounded count (0 to n). Use when there is a fixed number of exposures and each can generate at most one claim.
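The dispersion pattern described above follows from the textbook moment formulas. The sketch below computes the theoretical mean, standard deviation and variance-to-mean ratio for the parameter sets in the table. It assumes NegBinomial(n, p) counts failures before the n-th success, which is consistent with the table's mean of 5.0 and standard deviation of 3.2 but is not confirmed against PAL's documentation. The function names are illustrative.

```python
import math


def poisson_moments(mean: float) -> tuple[float, float]:
    """Return (mean, variance); for a Poisson they are equal."""
    return mean, mean


def neg_binomial_moments(n: float, p: float) -> tuple[float, float]:
    """Return (mean, variance), assuming the count is failures before the n-th success."""
    return n * (1 - p) / p, n * (1 - p) / p**2


def binomial_moments(n: int, p: float) -> tuple[float, float]:
    """Return (mean, variance) for n independent trials with success probability p."""
    return n * p, n * p * (1 - p)


cases = {
    "Poisson(mean=5)": poisson_moments(5),
    "NegBinomial(n=5, p=0.5)": neg_binomial_moments(5, 0.5),
    "Binomial(n=100, p=0.1)": binomial_moments(100, 0.1),
}

for name, (mean, var) in cases.items():
    # A variance/mean ratio above 1 is over-dispersed; below 1 is under-dispersed.
    print(f"{name:<26} mean={mean:5.1f}  std={math.sqrt(var):4.2f}  var/mean={var / mean:4.2f}")
```

A variance-to-mean ratio of 1 marks the Poisson. The Negative Binomial sits above it and the Binomial below it.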

Choosing a Severity Distribution

A practical decision tree:

Do you have data above a threshold? → GPD (peaks over threshold)
Is the tail very heavy (power-law)? → Pareto or Burr
Is the distribution right-skewed with moderate tail? → LogNormal or Gamma
Is it symmetric? → Normal or Logistic
Is it bounded between 0 and 1? → Beta
Modelling time or duration? → Weibull or Exponential
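The decision tree above could be encoded as a small helper that checks the questions in the order listed and returns the first matching set of candidates. This is only a sketch: suggest_severity and its keyword names are hypothetical and not part of PAL, and real model selection should also consider goodness of fit.

```python
def suggest_severity(
    *,
    excess_over_threshold: bool = False,
    power_law_tail: bool = False,
    right_skewed: bool = False,
    symmetric: bool = False,
    bounded_unit_interval: bool = False,
    duration: bool = False,
) -> list[str]:
    """Return candidate distribution names for the first question answered 'yes'."""
    rules = [
        (excess_over_threshold, ["GPD"]),
        (power_law_tail, ["Pareto", "Burr"]),
        (right_skewed, ["LogNormal", "Gamma"]),
        (symmetric, ["Normal", "Logistic"]),
        (bounded_unit_interval, ["Beta"]),
        (duration, ["Weibull", "Exponential"]),
    ]
    for answer, candidates in rules:
        if answer:
            return candidates
    return []  # no question matched; the tree gives no recommendation


print(suggest_severity(right_skewed=True))  # ['LogNormal', 'Gamma']
```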

Stochastic Parameters

Distribution parameters can themselves be stochastic. Pass a StochasticScalar as a parameter to create a mixture distribution, in which each simulation draws from the distribution using its own parameter value:

set_random_seed(42)

# Uncertain claim rate: mean is itself random
uncertain_rate = distributions.Gamma(alpha=25, theta=2).generate()
claims = distributions.Poisson(mean=uncertain_rate).generate()

This produces over-dispersed counts because the Poisson mean varies across simulations, adding an extra layer of variability. In this Poisson-Gamma case the resulting counts follow a Negative Binomial distribution.
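The Poisson-Gamma equivalence can be checked numerically without PAL. The sketch below uses the law of total variance to get the mixture's mean and variance, matches them to a Negative Binomial with n = alpha and p = 1 / (1 + theta), and confirms both with a small standard-library simulation. It assumes the Negative Binomial counts failures before the n-th success. The poisson_draw helper and the seed are illustrative.

```python
import math
import random
from statistics import fmean, pvariance

ALPHA, THETA = 25.0, 2.0  # Gamma shape and scale from the example above

# Law of total variance for a mixed Poisson: Var[N] = E[rate] + Var[rate].
mixture_mean = ALPHA * THETA
mixture_var = ALPHA * THETA + ALPHA * THETA**2

# Matching Negative Binomial (assumed: failures before the n-th success).
nb_n, nb_p = ALPHA, 1.0 / (1.0 + THETA)
nb_mean = nb_n * (1 - nb_p) / nb_p
nb_var = nb_n * (1 - nb_p) / nb_p**2


def poisson_draw(rate: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold, count, product = math.exp(-rate), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(42)  # illustrative seed, unrelated to PAL's set_random_seed
# random.gammavariate takes (shape, scale), so the rate has mean ALPHA * THETA.
counts = [poisson_draw(rng.gammavariate(ALPHA, THETA), rng) for _ in range(20_000)]

print(f"analytic   mean={mixture_mean:.0f}  var={mixture_var:.0f}")
print(f"neg. bin.  mean={nb_mean:.0f}  var={nb_var:.0f}")
print(f"simulated  mean={fmean(counts):.1f}  var={pvariance(counts):.1f}")
```

The variance of 150 is three times the mean of 50, whereas a plain Poisson with mean 50 would have a variance of 50.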

Working with Generated Variables

StochasticScalar objects support standard arithmetic and numpy operations:

loss = distributions.LogNormal(mu=14, sigma=0.5).generate()

# Arithmetic
with_expenses = loss * 1.10
capped = np.minimum(loss, 5_000_000)

# Statistics
loss.mean()
loss.std()
np.percentile(loss.values, [25, 50, 75, 95, 99, 99.5])

# Visualisation
loss.show_cdf("Loss Distribution")