pal.variables module

Multi-dimensional stochastic variables for actuarial modeling.

This module provides the ProteusVariable class, which represents multi-dimensional stochastic variables commonly used in actuarial and risk modeling. A ProteusVariable can contain different types of stochastic objects across multiple dimensions, enabling complex risk factor modeling.

Key features:
  • Multi-dimensional stochastic variables with named dimensions

  • Support for various stochastic types (StochasticScalar, FreqSevSims, etc.)

  • Mathematical operations across dimensions and simulations

  • Correlation analysis and upsampling capabilities

  • Export functionality for analysis and reporting

NOTE: The serialization/deserialization methods (from_csv, from_dict, from_series) are currently incomplete and have significant limitations. A comprehensive codec system is planned to address these issues. See: https://github.com/ProteusLLP/proteusllp-actuarial-library/issues/22

The ProteusVariable is designed for actuarial applications such as:
  • Multi-factor risk modeling (e.g., frequency, severity, inflation)

  • Portfolio-level aggregation across risk dimensions

  • Scenario analysis with correlated risk factors

  • Capital modeling with interdependent variables

Example

>>> from pal.variables import ProteusVariable
>>> from pal.stochastic_scalar import StochasticScalar
>>> from pal.frequency_severity import FreqSevSims
>>>
>>> # Create a multi-dimensional risk variable
>>> risk_var = ProteusVariable(
...     dim_name="insurance_risk",
...     values={
...         "frequency": StochasticScalar([10, 12, 8, 15]),
...         "severity": StochasticScalar([5000, 6000, 4500, 7000]),
...         "expense_ratio": StochasticScalar([0.3, 0.32, 0.28, 0.35])
...     }
... )
>>> total_cost = (
...     risk_var["frequency"]
...     * risk_var["severity"]
...     * (1 + risk_var["expense_ratio"])
... )
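
To make the arithmetic in the example above concrete, the following plain-Python sketch repeats the calculation simulation by simulation on ordinary lists. It does not use the library. It assumes, as the example suggests but this page does not state, that StochasticScalar arithmetic is applied element-wise across simulations.

```python
# Plain-list stand-ins for the three StochasticScalar inputs
# (one entry per simulation).
frequency = [10, 12, 8, 15]
severity = [5000, 6000, 4500, 7000]
expense_ratio = [0.3, 0.32, 0.28, 0.35]

# Assumed element-wise semantics: simulation i combines the i-th
# entry of each input.
total_cost = [
    f * s * (1 + e)
    for f, s, e in zip(frequency, severity, expense_ratio)
]

for sim, cost in enumerate(total_cost):
    print(f"simulation {sim}: total cost = {cost:,.2f}")
```

Each simulation keeps its own total, so the result has the same length as the inputs.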

class pal.variables.ProteusVariable(dim_name, values)

Bases: Generic[T]

A generic, homogeneous container for multivariate variables in simulations.

ProteusVariable is a hierarchical structure that holds multiple variables of the SAME type (homogeneous container). Each instance must contain either all scalars, all vectors (like StochasticScalar), or all nested ProteusVariables - but never a mix of different types.

Type Parameter:
T: The type of values stored. By convention, T should be a ScalarOrVector type (NumericLike | VectorLike), though the parameter is unconstrained to allow flexible type inference. Usage with non-ScalarOrVector types may not be fully supported by all operations.

Key Features:
  • Homogeneous: All values in a single instance must be the same type. Like List[T], you cannot mix types within one container.

  • Type Safety: Operations like mean() return type T, preserving type information through the computation.

  • Nesting: ProteusVariable containing ProteusVariable enables hierarchical data structures (e.g., risks by region by peril).

  • Dictionary Access: Sub-elements are accessed via [] notation with string keys or integer indices.
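
The homogeneity rule can be illustrated with a small stand-alone check on plain dictionaries. This is only a sketch: is_homogeneous is a hypothetical helper, not part of the library, and it compares exact Python types, which may be stricter or looser than whatever ProteusVariable enforces.

```python
from typing import Any, Mapping


def is_homogeneous(values: Mapping[str, Any]) -> bool:
    """Return True if every value has exactly the same type.

    Hypothetical helper for illustration; an empty mapping counts
    as homogeneous.
    """
    types = {type(v) for v in values.values()}
    return len(types) <= 1


scalars = {"fire": 100000, "flood": 200000}                   # all int
vectors = {"fire": [100, 200, 300], "flood": [150, 250, 350]}  # all list
mixed = {"a": 100, "b": [1]}                                   # int and list

print(is_homogeneous(scalars))
print(is_homogeneous(vectors))
print(is_homogeneous(mixed))
```

Note that under this exact-type comparison a mapping holding both int and float values would also be rejected.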

Examples

>>> # Homogeneous scalar container
>>> scalar_risks = ProteusVariable(
...     dim_name="risk_amounts",
...     values={"fire": 100000, "flood": 200000}  # All int
... )
>>> # Homogeneous vector container
>>> vector_risks = ProteusVariable(
...     dim_name="stochastic_losses",
...     values={
...         "fire": StochasticScalar([100, 200, 300]),
...         "flood": StochasticScalar([150, 250, 350])
...     }  # All StochasticScalar
... )
>>> # Homogeneous nested container
>>> nested_risks = ProteusVariable(
...     dim_name="regions",
...     values={
...         "north": scalar_risks,
...         "south": scalar_risks
...     }  # All ProteusVariable instances
... )
>>> # INVALID - mixing types not allowed
>>> # mixed = ProteusVariable(
>>> #     dim_name="mixed", values={"a": 100, "b": StochasticScalar([1])}
>>> # )
>>> # This would violate homogeneity and cause type errors

Note: Statistical operations should be performed using numpy and scipy functions directly on ProteusVariable instances. For example:
  • Use np.percentile(variable, p)

  • Use np.mean(variable)

  • Use pal.stats.tvar(variable, p)
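
For readers unfamiliar with the statistics named in the note, the sketch below computes a mean and a tail average on a plain list using only the standard library. tail_mean is a hypothetical helper implementing one common textbook definition of TVaR (the average of the outcomes in the upper tail). The exact definition and the convention for p used by pal.stats.tvar are not documented on this page and may differ.

```python
import math
from statistics import mean


def tail_mean(losses: list[float], p: float) -> float:
    """Average of the worst (100 - p)% of outcomes.

    Hypothetical helper: p is a percentile in [0, 100), mirroring
    the convention of np.percentile.
    """
    if not 0 <= p < 100:
        raise ValueError("p must be in [0, 100)")
    ordered = sorted(losses)
    # Index of the first outcome that belongs to the tail; the min()
    # guards against floating-point rounding pushing it past the end.
    start = min(math.floor(len(ordered) * p / 100), len(ordered) - 1)
    return mean(ordered[start:])


losses = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(mean(losses))           # average over all simulations
print(tail_mean(losses, 80))  # average of the two largest losses
```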

__init__(dim_name, values)

Initialize a ProteusVariable.

Parameters:
  • dim_name (str) – Name of the dimension.

  • values (dict[str, TypeVar(T)]) – A dict containing variables that must support PAL variable operations.

Raises:

TypeError – If values is not a mapping type.

dim_name: str
values: dict[str, T]
dimensions: list[str]

count(value)

Count occurrences of value in the container.

Required for Sequence protocol compatibility.

Return type:

int

index(value, start=0, stop=None)

Return index of first occurrence of value.

Required for Sequence protocol compatibility.

Raises:

ValueError – If value is not found.

Return type:

int

get_value_at_sim(sim_no)

Get values at specific simulation number(s).

Parameters:

sim_no (int | StochasticScalar) – Simulation index or indices to extract. Can be a single numeric value, a list of integers, or a VectorLike object such as StochasticScalar.

Return type:

ProteusVariable[Union[TypeVar(T), StochasticScalar]]

Returns:

A new ProteusVariable with values at the specified simulation indices.
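
Conceptually, extracting a simulation selects the same position from every sub-variable. The sketch below shows this on a plain dictionary of lists. value_at_sim is a hypothetical helper, and the zero-based indexing is an assumption, since this page does not say how simulations are numbered.

```python
# Plain-dict stand-in: one list of simulated values per dimension label.
values = {
    "fire": [100, 200, 300],
    "flood": [150, 250, 350],
}


def value_at_sim(values: dict[str, list[float]], sim_no: int) -> dict[str, float]:
    """Pick the sim_no-th simulated value for every label.

    Hypothetical helper; assumes zero-based simulation indices.
    """
    return {label: sims[sim_no] for label, sims in values.items()}


print(value_at_sim(values, 1))  # {'fire': 200, 'flood': 250}
```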

upsample(n_sims)

Upsample the variable to the specified number of simulations.

Return type:

ProteusVariable[TypeVar(T)]

sum()

Return the sum across the outer dimension.

Return type:

TypeVar(T)
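
As a rough illustration of summing across the outer dimension, the sketch below adds the labelled entries of a plain dictionary while keeping one total per simulation. It assumes that sum() collapses the labels and leaves the simulation axis intact, which is inferred from the return type T rather than stated on this page.

```python
# Plain-dict stand-in: one list of simulated values per dimension label.
values = {
    "fire": [100, 200, 300],
    "flood": [150, 250, 350],
}

# zip(*...) walks the lists in parallel, so each tuple holds the
# values of every label for one simulation.
total = [sum(per_sim) for per_sim in zip(*values.values())]

print(total)  # [250, 450, 650]
```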

validate_freqsev_consistency(_is_nested=False)

Validate that all FreqSevSims have consistent sim_index.

When a ProteusVariable contains multiple FreqSevSims objects, operations like sum() or aggregation require that all FreqSevSims have identical simulation indices for meaningful results. This method recursively checks for that consistency across nested ProteusVariable structures.

All leaf values in the ProteusVariable tree must be FreqSevSims with matching simulation indices. Nested ProteusVariable structures are supported and will be recursively validated.

Use this validation before performing aggregation operations on ProteusVariable instances containing FreqSevSims to ensure the results will be valid.

Parameters:

_is_nested (bool) – Internal parameter for tracking recursion depth. Do not set manually.

Returns:

  • is_valid: True if all leaf values are FreqSevSims with matching sim_index, or if there are 0 FreqSevSims (trivially consistent)

  • error_message: Empty string if valid, descriptive error message otherwise

  • sim_index: Representative sim_index array if valid and FreqSevSims found, None if no FreqSevSims or invalid

Return type:

tuple[bool, str, ndarray[tuple[Any, ...], dtype[Any]] | None]

Example

>>> freq_sev_1 = FreqSevSims([0, 1, 2], [10, 20, 30], 3)
>>> freq_sev_2 = FreqSevSims([0, 1, 2], [15, 25, 35], 3)
>>> var = ProteusVariable(
...     "losses", {"fire": freq_sev_1, "flood": freq_sev_2}
... )
>>> is_valid, msg, sim_idx = var.validate_freqsev_consistency()
>>> if is_valid:
...     total = var.sum()  # Safe to sum
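
The kind of check described for this method can be sketched on plain nested dictionaries. This is not the library's implementation: Leaf is a hypothetical stand-in for FreqSevSims that carries only a sim_index, and check_consistency is a hypothetical helper that returns the same three-part result shape described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    """Hypothetical stand-in for FreqSevSims; only sim_index matters here."""
    sim_index: list[int]


def check_consistency(tree: dict) -> tuple[bool, str, Optional[list[int]]]:
    """Compare every leaf's sim_index with the first one found."""
    reference: Optional[list[int]] = None
    stack = [("", tree)]
    while stack:
        path, node = stack.pop()
        for label, child in node.items():
            child_path = f"{path}/{label}" if path else label
            if isinstance(child, dict):
                # Nested level: visit it later.
                stack.append((child_path, child))
            elif reference is None:
                reference = child.sim_index
            elif child.sim_index != reference:
                return False, f"sim_index mismatch at '{child_path}'", None
    # No leaves at all is treated as trivially consistent.
    return True, "", reference


ok_tree = {"north": {"fire": Leaf([0, 1, 2]), "flood": Leaf([0, 1, 2])}}
bad_tree = {"fire": Leaf([0, 1, 2]), "flood": Leaf([0, 2, 2])}

print(check_consistency(ok_tree))
print(check_consistency(bad_tree))
```

Returning a status tuple instead of raising lets the caller decide whether an inconsistency is fatal before aggregating.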

classmethod from_csv(file_name, dim_name, values_column, simulation_column='Simulation')

Import a ProteusVariable from a CSV file.

This method currently has significant limitations and will be replaced with a more comprehensive serialization system.

Current Limitations:
  • Only supports one-dimensional variables

  • Always creates StochasticScalar values regardless of intended type

  • Cannot preserve generic type information through deserialization

  • No support for nested ProteusVariable structures

Parameters:
  • file_name (str) – Path to the CSV file to read

  • dim_name (str) – Name of the dimension column in the CSV

  • values_column (str) – Name of the column containing the values

  • simulation_column (str) – Name of the column containing simulation indices

Return type:

ProteusVariable[StochasticScalar]

Returns:

ProteusVariable with StochasticScalar values loaded from the CSV

TODO: Implement comprehensive codec system for proper serialization. See: https://github.com/ProteusLLP/proteusllp-actuarial-library/issues/22
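
The parameter names suggest a long-format file with one row per label and simulation. The sketch below reads such a layout with the standard csv module. The file contents, the column names peril and loss, and the helper read_long_csv are all hypothetical, and the layout itself is an assumption inferred from the parameter descriptions rather than a documented format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format file: one row per (label, simulation) pair.
csv_text = """peril,Simulation,loss
fire,0,100
fire,1,200
flood,0,150
flood,1,250
"""


def read_long_csv(handle, dim_name, values_column, simulation_column="Simulation"):
    """Group values by label and order them by simulation index."""
    rows = defaultdict(list)
    for row in csv.DictReader(handle):
        sim = int(row[simulation_column])
        value = float(row[values_column])
        rows[row[dim_name]].append((sim, value))
    # Sorting the (sim, value) pairs puts each label's values in
    # simulation order.
    return {
        label: [value for _, value in sorted(pairs)]
        for label, pairs in rows.items()
    }


data = read_long_csv(io.StringIO(csv_text), dim_name="peril", values_column="loss")
print(data)  # {'fire': [100.0, 200.0], 'flood': [150.0, 250.0]}
```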

classmethod from_dict(data)

Create a ProteusVariable from a dictionary.

This method currently has significant limitations and will be replaced with a more comprehensive serialization system.

Current Limitations:
  • Only supports one-dimensional variables

  • Always creates StochasticScalar values from float lists

  • Cannot preserve generic type information

  • No support for nested structures or other value types

Parameters:

data (dict[str, list[float]]) – Dictionary mapping dimension labels to lists of float values

Return type:

ProteusVariable[StochasticScalar]

Returns:

ProteusVariable with StochasticScalar values created from the data

TODO: Implement comprehensive codec system for proper serialization. See: https://github.com/ProteusLLP/proteusllp-actuarial-library/issues/22

classmethod from_series(data)

Create a ProteusVariable from a pandas Series.

This method currently has significant limitations and will be replaced with a more comprehensive serialization system.

Current Limitations:
  • Only supports one-dimensional variables

  • Creates scalar values, not StochasticScalar

  • Cannot preserve generic type information

  • Limited to single simulation (n_sims=1)

Parameters:

data (Series) – Pandas Series with values to load

Return type:

ProteusVariable[float]

Returns:

ProteusVariable with scalar values from the Series

TODO: Implement comprehensive codec system for proper serialization. See: https://github.com/ProteusLLP/proteusllp-actuarial-library/issues/22

correlation_matrix(correlation_type='spearman')

Compute correlation matrix between variables.

Return type:

list[list[float]]
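
For reference, Spearman correlation is the Pearson correlation of the ranks of the data. The sketch below computes a rank-correlation matrix for plain lists using only the standard library and returns it in the same list-of-lists shape. The helpers average_ranks, pearson and spearman_matrix are hypothetical, and how the library handles ties or constant columns is not documented here.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving ties the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_matrix(columns):
    """Pairwise Spearman correlations, in the order of the dict's keys."""
    ranked = [average_ranks(values) for values in columns.values()]
    return [[pearson(a, b) for b in ranked] for a in ranked]


sims = {
    "fire": [100, 200, 300, 400],
    "flood": [10, 40, 90, 160],  # increases whenever fire increases
    "theft": [9, 7, 5, 3],       # decreases whenever fire increases
}
matrix = spearman_matrix(sims)
for row in matrix:
    print(row)
```

Because only ranks matter, the non-linear but monotone relationship between fire and flood still yields a correlation of 1.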

show_histogram(title=None)

Show a histogram of the variable values.

Parameters:

title (str | None) – The title of the histogram. If None, no title is set.

Return type:

None

show_cdf(title=None)

Plot the cumulative distribution function (CDF) of the variable values.

Parameters:

title (str | None) – Optional title for the CDF plot. If None, no title is set.

Return type:

None
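
For readers who want the numbers behind such a plot, the sketch below builds an empirical CDF from a plain list of simulated values. empirical_cdf is a hypothetical helper using the common i/n step-function definition. How show_cdf constructs or styles its plot is not documented on this page.

```python
def empirical_cdf(values):
    """Return sorted values and their cumulative probabilities.

    Hypothetical helper: the i-th smallest of n values is assigned
    probability i / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    probs = [(i + 1) / n for i in range(n)]
    return ordered, probs


xs, ps = empirical_cdf([300, 100, 200, 400])
print(list(zip(xs, ps)))  # [(100, 0.25), (200, 0.5), (300, 0.75), (400, 1.0)]
```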