Variables

Multi-dimensional stochastic variables for actuarial modeling.

This module provides the ProteusVariable class, which represents multi-dimensional stochastic variables commonly used in actuarial and risk modeling. A ProteusVariable can contain different types of stochastic objects across multiple dimensions, enabling complex risk factor modeling.

Key features:
  • Multi-dimensional stochastic variables with named dimensions

  • Support for various stochastic types (StochasticScalar, FreqSevSims, etc.)

  • Mathematical operations across dimensions and simulations

  • Correlation analysis and upsampling capabilities

  • Export functionality for analysis and reporting

NOTE: The serialization/deserialization methods (from_csv, from_dict, from_series) are currently incomplete and have significant limitations. A comprehensive codec system is planned to address these issues. See: https://github.com/ProteusLLP/proteusllp-actuarial-library/issues/22

The ProteusVariable is designed for actuarial applications such as:
  • Multi-factor risk modeling (e.g., frequency, severity, inflation)

  • Portfolio-level aggregation across risk dimensions

  • Scenario analysis with correlated risk factors

  • Capital modeling with interdependent variables

Example

>>> from pal.stochastic_scalar import StochasticScalar
>>> from pal.frequency_severity import FreqSevSims
>>> from pal.variables import ProteusVariable
>>>
>>> # Create a multi-dimensional risk variable
>>> risk_var = ProteusVariable(
...     dim_name="insurance_risk",
...     values={
...         "frequency": StochasticScalar([10, 12, 8, 15]),
...         "severity": StochasticScalar([5000, 6000, 4500, 7000]),
...         "expense_ratio": StochasticScalar([0.3, 0.32, 0.28, 0.35])
...     }
... )
>>> total_cost = (
...     risk_var["frequency"]
...     * risk_var["severity"]
...     * (1 + risk_var["expense_ratio"])
... )

class pal.variables.ProteusVariable(dim_name, values)[source]

Bases: Generic[T]

A generic, homogeneous container for multivariate variables in simulations.

ProteusVariable is a hierarchical structure that holds multiple variables of the SAME type (homogeneous container). Each instance must contain either all scalars, all vectors (like StochasticScalar), or all nested ProteusVariables - but never a mix of different types.

Type Parameter:

T: The type of values stored. By convention, T should be a ScalarOrVector type (NumericLike | VectorLike), though the parameter is unconstrained to allow flexible type inference. Usage with non-ScalarOrVector types may not be fully supported by all operations.

Key Features:
  • Homogeneous: All values in a single instance must be the same type. Like List[T], you cannot mix types within one container.

  • Type Safety: Operations like mean() return type T, preserving type information through the computation.

  • Nesting: ProteusVariable containing ProteusVariable enables hierarchical data structures (e.g., risks by region by peril)

  • Dictionary Access: Sub-elements accessed via [] notation with string keys or integer indices

Examples

>>> # Homogeneous scalar container
>>> scalar_risks = ProteusVariable(
...     dim_name="risk_amounts",
...     values={"fire": 100000, "flood": 200000}  # All int
... )
>>> # Homogeneous vector container
>>> vector_risks = ProteusVariable(
...     dim_name="stochastic_losses",
...     values={
...         "fire": StochasticScalar([100, 200, 300]),
...         "flood": StochasticScalar([150, 250, 350])
...     }  # All StochasticScalar
... )
>>> # Homogeneous nested container
>>> nested_risks = ProteusVariable(
...     dim_name="regions",
...     values={
...         "north": scalar_risks,
...         "south": scalar_risks
...     }  # All ProteusVariable instances
... )
>>> # INVALID - mixing types not allowed
>>> # mixed = ProteusVariable(values={"a": 100, "b": StochasticScalar([1])})
>>> # This would violate homogeneity and cause type errors

Note: Statistical operations should be performed using numpy and scipy functions directly on ProteusVariable instances. For example:
  • Use np.percentile(variable, p)

  • Use np.mean(variable)

  • Use pal.stats.tvar(variable, p)

__init__(dim_name, values)[source]

Initialize a ProteusVariable.

Parameters:
  • dim_name (str) – Name of the dimension.

  • values (dict[str, TypeVar(T)]) – A dict containing variables that must support PAL variable operations.

Raises:

TypeError – If values is not a mapping type.
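The two constraints described for this class (values must be a mapping, and all entries must share one type) can be sketched in plain Python. The helper name check_homogeneous is hypothetical and is not part of pal, and the strict type-equality rule is an assumption; the library's own validation may be more lenient or may not check homogeneity at construction time.

```python
from collections.abc import Mapping


def check_homogeneous(values):
    """Hypothetical helper (not part of pal): validate a values mapping."""
    if not isinstance(values, Mapping):
        # Mirrors the documented TypeError for non-mapping input.
        raise TypeError(f"values must be a mapping, got {type(values).__name__}")
    # Collect the distinct concrete types of the entries.
    kinds = {type(v) for v in values.values()}
    if len(kinds) > 1:
        names = sorted(k.__name__ for k in kinds)
        raise TypeError(f"mixed value types are not allowed: {names}")
    return True


ok = check_homogeneous({"fire": 100000, "flood": 200000})
```

Passing a list, or a dict that mixes an int with a list, raises TypeError in this sketch.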

dim_name: str

values: dict[str, T]

dimensions: list[str]

count(value)[source]

Count occurrences of value in the container.

Required for Sequence protocol compatibility.

Return type:

int

index(value, start=0, stop=None)[source]

Return index of first occurrence of value.

Required for Sequence protocol compatibility.

Raises:

ValueError – If value is not found.

Return type:

int

get_value_at_sim(sim_no)[source]

Get values at specific simulation number(s).

Parameters:

sim_no (int | StochasticScalar) – Simulation index(es) to extract. Can be a single numeric value, a list of integers, or a VectorLike object such as StochasticScalar.

Return type:

ProteusVariable[Union[TypeVar(T), StochasticScalar]]

Returns:

A new ProteusVariable with values at the specified simulation indices.
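The idea of extracting one or several simulations from every entry can be illustrated with plain lists standing in for StochasticScalar values. The function values_at_sim is a hypothetical stand-in, not the pal implementation, and zero-based simulation numbering is an assumption.

```python
def values_at_sim(values, sim_no):
    """Hypothetical stand-in: pick simulation(s) from each entry of a dict of lists."""
    if isinstance(sim_no, int):
        # A single index yields one number per entry.
        return {key: sims[sim_no] for key, sims in values.items()}
    # A sequence of indices yields a shorter list per entry.
    return {key: [sims[i] for i in sim_no] for key, sims in values.items()}


losses = {"fire": [100, 200, 300], "flood": [150, 250, 350]}
single = values_at_sim(losses, 1)        # one simulation
several = values_at_sim(losses, [0, 2])  # a subset of simulations
```

The same index is applied to every entry, so the relationship between entries within a simulation is preserved.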

upsample(n_sims)[source]

Upsample the variable to the specified number of simulations.

Return type:

ProteusVariable[TypeVar(T)]

sum()[source]

Return the sum across the outer dimension.

Return type:

TypeVar(T)
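Summing across the outer dimension means adding the entries together while keeping the simulation dimension, so the result has the same type as a single entry. A minimal sketch with plain lists in place of stochastic values follows; sum_outer is a hypothetical name, not a pal function.

```python
def sum_outer(values):
    """Hypothetical stand-in: add the entries of a dict of equal-length lists."""
    columns = list(values.values())
    n_sims = len(columns[0])
    if any(len(col) != n_sims for col in columns):
        raise ValueError("all entries need the same number of simulations")
    # zip(*columns) walks the entries simulation by simulation.
    return [sum(per_sim) for per_sim in zip(*columns)]


total = sum_outer({"fire": [100, 200, 300], "flood": [150, 250, 350]})
```

Here total holds one combined value per simulation rather than a single grand total.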

validate_freqsev_consistency(_is_nested=False)[source]

Validate that all FreqSevSims have consistent sim_index.

When a ProteusVariable contains multiple FreqSevSims objects, operations like sum() or aggregation require that all FreqSevSims have identical simulation indices for meaningful results. This method recursively checks for that consistency across nested ProteusVariable structures.

All leaf values in the ProteusVariable tree must be FreqSevSims with matching simulation indices. Nested ProteusVariable structures are supported and will be recursively validated.

Use this validation before performing aggregation operations on ProteusVariable instances containing FreqSevSims to ensure the results will be valid.

Parameters:

_is_nested (bool) – Internal parameter for tracking recursion depth. Do not set manually.

Returns:

  • is_valid: True if all leaf values are FreqSevSims with matching sim_index, or if there are no FreqSevSims (trivially consistent)

  • error_message: Empty string if valid, descriptive error message otherwise

  • sim_index: Representative sim_index array if valid and FreqSevSims found, None if no FreqSevSims or invalid

Return type:

tuple[bool, str, ndarray[tuple[Any, ...], dtype[Any]] | None]
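The shape of the returned tuple can be illustrated with a simplified, non-recursive check over plain index lists. The function check_sim_index_consistency and its error message are hypothetical; the real method operates on FreqSevSims objects and nested ProteusVariable structures.

```python
def check_sim_index_consistency(sim_indices):
    """Hypothetical stand-in: compare sim_index sequences keyed by label."""
    reference = None
    for label, sim_index in sim_indices.items():
        if reference is None:
            reference = list(sim_index)  # first entry becomes the reference
        elif list(sim_index) != reference:
            return False, f"sim_index of '{label}' differs from the first entry", None
    # An empty mapping is trivially consistent and has no representative index.
    return True, "", reference


result_ok = check_sim_index_consistency({"fire": [0, 1, 2], "flood": [0, 1, 2]})
result_bad = check_sim_index_consistency({"fire": [0, 1, 2], "flood": [0, 2, 2]})
```

Unpacking the result as is_valid, message, sim_index matches the three documented return values.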

Example

>>> freq_sev_1 = FreqSevSims([0, 1, 2], [10, 20, 30], 3)
>>> freq_sev_2 = FreqSevSims([0, 1, 2], [15, 25, 35], 3)
>>> var = ProteusVariable(
...     "losses", {"fire": freq_sev_1, "flood": freq_sev_2}
... )
>>> is_valid, msg, sim_idx = var.validate_freqsev_consistency()
>>> if is_valid:
...     total = var.sum()  # Safe to sum

classmethod from_csv(file_name, dim_name, values_column, simulation_column='Simulation')[source]

Import a ProteusVariable from a CSV file.

This method currently has significant limitations and will be replaced with a more comprehensive serialization system.

Current Limitations:
  • Only supports one-dimensional variables

  • Always creates StochasticScalar values regardless of intended type

  • Cannot preserve generic type information through deserialization

  • No support for nested ProteusVariable structures

Parameters:
  • file_name (str) – Path to the CSV file to read

  • dim_name (str) – Name of the dimension column in the CSV

  • values_column (str) – Name of the column containing the values

  • simulation_column (str) – Name of the column containing simulation indices

Return type:

ProteusVariable[StochasticScalar]

Returns:

ProteusVariable with StochasticScalar values loaded from the CSV
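The parameters suggest a long-format file with one row per simulation and dimension label. That layout is an assumption, as are the column names peril and loss used below. The sketch groups values by label with the standard csv module; read_long_csv is a hypothetical reader, not the pal implementation.

```python
import csv
import io


def read_long_csv(handle, dim_name, values_column, simulation_column="Simulation"):
    """Hypothetical reader (not pal's implementation): group values by dimension label."""
    grouped = {}
    for row in csv.DictReader(handle):
        label = row[dim_name]
        sim = int(row[simulation_column])
        grouped.setdefault(label, []).append((sim, float(row[values_column])))
    # Order each label's values by simulation index and drop the index.
    return {label: [v for _, v in sorted(pairs)] for label, pairs in grouped.items()}


text = "Simulation,peril,loss\n1,fire,200\n0,fire,100\n0,flood,150\n1,flood,250\n"
data = read_long_csv(io.StringIO(text), dim_name="peril", values_column="loss")
```

Each list in data could then be wrapped in a StochasticScalar, which is consistent with the documented return type.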

TODO: Implement comprehensive codec system for proper serialization

See: https://github.com/ProteusLLP/proteusllp-actuarial-library/issues/22

classmethod from_dict(data)[source]

Create a ProteusVariable from a dictionary.

This method currently has significant limitations and will be replaced with a more comprehensive serialization system.

Current Limitations:
  • Only supports one-dimensional variables

  • Always creates StochasticScalar values from float lists

  • Cannot preserve generic type information

  • No support for nested structures or other value types

Parameters:

data (dict[str, list[float]]) – Dictionary mapping dimension labels to lists of float values

Return type:

ProteusVariable[StochasticScalar]

Returns:

ProteusVariable with StochasticScalar values created from the data

TODO: Implement comprehensive codec system for proper serialization

See: https://github.com/ProteusLLP/proteusllp-actuarial-library/issues/22

classmethod from_series(data)[source]

Create a ProteusVariable from a pandas Series.

This method currently has significant limitations and will be replaced with a more comprehensive serialization system.

Current Limitations:
  • Only supports one-dimensional variables

  • Creates scalar values, not StochasticScalar

  • Cannot preserve generic type information

  • Limited to single simulation (n_sims=1)

Parameters:

data (Series) – Pandas Series with values to load

Return type:

ProteusVariable[float]

Returns:

ProteusVariable with scalar values from the Series

TODO: Implement comprehensive codec system for proper serialization

See: https://github.com/ProteusLLP/proteusllp-actuarial-library/issues/22

correlation_matrix(correlation_type='spearman')[source]

Compute correlation matrix between variables.

Return type:

list[list[float]]
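For the default correlation_type of 'spearman', the matrix holds rank correlations between every pair of entries. A pure-Python sketch of that calculation is given below: Spearman correlation is the Pearson correlation of the ranks, with tied values sharing their average rank. All function names are hypothetical and the library's own implementation may differ, for example in how it treats ties.

```python
def average_ranks(xs):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_matrix(values):
    """Hypothetical stand-in: Spearman correlation between every pair of entries."""
    ranked = [average_ranks(sims) for sims in values.values()]
    return [[pearson(a, b) for b in ranked] for a in ranked]


matrix = spearman_matrix({"fire": [100, 200, 300, 400], "flood": [40, 30, 20, 10]})
```

The result is a nested list, one row per entry, matching the documented list[list[float]] return type.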

show_histogram(title=None)[source]

Show a histogram of the variable values.

Parameters:

title (str | None) – The title of the histogram. If None, no title is set.

Return type:

None

show_cdf(title=None)[source]

Plot the cumulative distribution function (cdf) of the variable values.

Parameters:

title (str | None) – Optional title for the cdf. If None, no title is set.

Return type:

None

StochasticScalar

Stochastic scalar variables for Monte Carlo simulation.

Provides the StochasticScalar class for representing and manipulating scalar-valued stochastic variables in actuarial and risk modeling applications. Supports arithmetic operations, statistical functions, and numpy integration.

class pal.stochastic_scalar.StochasticScalar(values)[source]

Bases: ProteusStochasticVariable

A class to represent a single scalar variable in a simulation.

coupled_variable_group: CouplingGroup

__init__(values)[source]

Initialize a stochastic scalar.

Parameters:

values (TypeAliasType) – An array of values that describe the distribution for the scalar variable.

values: ndarray[tuple[Any, ...], dtype[number[Any]]]

n_sims: int = None

The number of simulations in the variable.

property ranks: StochasticScalar

Return the ranks of the variable.

tolist()[source]

Convert the values to a Python list.

Return type:

list[Union[float, int, number]]

mean()[source]

Return the mean of the variable across the simulation dimension.

Return type:

float

sum()[source]

Return the sum of the variable across the simulation dimension.

Return type:

float

all()

Check if all values in the variable are True (non-zero).

Return type:

bool

Returns:

True if all values are non-zero, False otherwise.

any()

Check if any value in the variable is True (non-zero).

Return type:

bool

Returns:

True if any value is non-zero, False otherwise.

astype(dtype)

Convert the underlying values to a specified dtype.

Parameters:

dtype (dtype[Any] | type[Any]) – The data type to convert to.

Return type:

ndarray[tuple[Any, ...], dtype[Any]]

Returns:

A new numpy array with the specified dtype.

std()[source]

Return the standard deviation across the simulation dimension.

Return type:

float

percentile(p)[source]

Return the percentile of the variable across the simulation dimension.

Parameters:

p (Union[float, int, number, list[Union[float, int, number]]]) – The percentile level (between 0 and 100).

Return type:

Union[float, int, number, list[Union[float, int, number]]]

Returns:

The percentile value.

tvar(p)[source]

Calculate the Tail Value at Risk (TVaR) at a given percentile.

Parameters:

p (Union[float, int, number, list[Union[float, int, number]]]) – The percentile level (between 0 and 100) to calculate TVaR.

Return type:

Union[float, int, number, list[Union[float, int, number]]]

Returns:

The TVaR value, or a list of values when p is a list.
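As an illustration of the concept, TVaR at level p can be read as the average of the simulated values in the upper tail beyond the p-th percentile. The sketch below uses one simple discrete convention (the tail starts at sorted index int(n * p / 100)). That convention and the function itself are assumptions, not the pal implementation, which may interpolate or define the tail differently.

```python
def tvar(values, p):
    """Hypothetical stand-in: mean of the worst (100 - p)% of simulated values."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # Index of the first simulation in the tail; keep at least one value.
    start = min(int(n * p / 100), n - 1)
    tail = ordered[start:]
    return sum(tail) / len(tail)


losses = [5, 1, 9, 3, 7, 2, 10, 4, 8, 6]
tvar_80 = tvar(losses, 80)  # mean of the two largest losses
tvar_0 = tvar(losses, 0)    # whole distribution, so equal to the mean
```

At p = 0 the tail is the whole sample, which gives a quick sanity check against the plain mean.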

upsample(n_sims)[source]

Increase the number of simulations in the variable.

Return type:

Self

show_histogram(title=None)[source]

Show a histogram of the variable.

Parameters:

title (str | None) – Title of the histogram plot. Defaults to None.

Return type:

None

show_cdf(title=None)[source]

Show a plot of the cumulative distribution function (cdf) of the variable.

Parameters:

title (str | None) – Title of the cdf plot. Defaults to None.

Return type:

None
