
Decision Rules

decision_rules

Functions:

softmax

softmax(value: ArrayLike, temperature: float = 1) -> ArrayLike

Softmax function, with optional temperature parameter.

In equation form, this is:

\[ P(a) = \frac{e^{Q(a) / \tau}}{\sum_{b} e^{Q(b) / \tau}} \]

Where P(a) is the probability of choosing action a, Q(a) is the value of action a, and \(\tau\) is the temperature parameter.

Note that the value of the temperature parameter will depend on the range of the values of the Q function.

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, of shape (n_trials, n_bandits)

  • temperature

    (float, default: 1 ) –

    Softmax temperature, in range [0, inf]. Note that this is temperature rather than inverse temperature; values are divided by this value. Higher values make choices less deterministic. Defaults to 1.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, of shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax(value: ArrayLike, temperature: float = 1) -> ArrayLike:
    """
    Softmax function, with optional temperature parameter.

    In equation form, this is:

    $$
    P(a) = \\frac{e^{Q(a) / \\tau}}{\\sum_{b} e^{Q(b) / \\tau}}
    $$

    Where `P(a)` is the probability of choosing action `a`,
    `Q(a)` is the value of action `a`, and $\\tau$ is the
    temperature parameter.

    Note that the value of the temperature parameter will
    depend on the range of the values of the Q function.

    Args:
        value (ArrayLike): Array of values to apply softmax to, of shape
            (n_trials, n_bandits)
        temperature (float, optional): Softmax temperature, in range [0, inf].
            Note that this is temperature rather than inverse temperature;
            values are divided by this value. Higher values make choices less
            deterministic. Defaults to 1.

    Returns:
        ArrayLike: Choice probabilities, of shape (n_trials, n_bandits)
    """

    return (jnp.exp(value / temperature)) / (
        jnp.sum(jnp.exp(value / temperature), axis=1)[:, None]
    )
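
For illustration, a minimal usage sketch (the values below are made up, and the package is assumed to be importable as behavioural_modelling): lower temperatures concentrate probability on the highest-valued bandit, while higher temperatures flatten the distribution towards uniform.

import jax.numpy as jnp
from behavioural_modelling.decision_rules import softmax

# Q-values for 2 trials x 3 bandits (illustrative values)
q_values = jnp.array([
    [1.0, 0.5, 0.0],
    [0.2, 0.2, 0.6],
])

p_sharp = softmax(q_values, temperature=0.1)  # close to deterministic
p_flat = softmax(q_values, temperature=5.0)   # close to uniform

# Each row of probabilities sums to 1
assert jnp.allclose(p_sharp.sum(axis=1), 1.0)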

softmax_inverse_temperature

softmax_inverse_temperature(value: ArrayLike, inverse_temperature: float = 1) -> ArrayLike

Softmax function, with optional inverse temperature parameter.

In equation form, this is:

\[ P(a) = \frac{e^{\beta \cdot Q(a)}}{\sum_{b} e^{\beta \cdot Q(b)}} \]

Where P(a) is the probability of choosing action a, Q(a) is the value of action a, and \(\beta\) is the inverse temperature parameter.

Note that the value of the inverse temperature parameter will depend on the range of the values of the Q function.

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, of shape (n_trials, n_bandits)

  • inverse_temperature

    (float, default: 1 ) –

    Softmax inverse temperature, in range [0, inf]. Note that this is inverse temperature rather than temperature; values are multiplied by this value. Higher values make choices more deterministic. Defaults to 1.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, of shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax_inverse_temperature(
    value: ArrayLike, inverse_temperature: float = 1
) -> ArrayLike:
    """
    Softmax function, with optional inverse temperature parameter.

    In equation form, this is:

    $$
    P(a) = \\frac{e^{\\beta \\cdot Q(a)}}{\\sum_{b} e^{\\beta \\cdot Q(b)}}
    $$

    Where `P(a)` is the probability of choosing action `a`,
    `Q(a)` is the value of action `a`, and $\\beta$ is the
    inverse temperature parameter.

    Note that the value of the inverse temperature parameter will
    depend on the range of the values of the Q function.

    Args:
        value (ArrayLike): Array of values to apply softmax to, of shape
            (n_trials, n_bandits)
        inverse_temperature (float, optional): Softmax inverse temperature, in
            range [0, inf]. Note that this is inverse temperature rather than
            temperature; values are multiplied by this value. Higher values
            make choices more deterministic. Defaults to 1.
    """
    return (jnp.exp(inverse_temperature * value)) / (
        jnp.sum(jnp.exp(inverse_temperature * value), axis=1)[:, None]
    )
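
As a short sketch (illustrative values; same assumed import path as above), note that for beta > 0 this is equivalent to softmax with temperature 1/beta:

import jax.numpy as jnp
from behavioural_modelling.decision_rules import softmax, softmax_inverse_temperature

q_values = jnp.array([[1.0, 0.5, 0.0]])
beta = 2.0

p_beta = softmax_inverse_temperature(q_values, inverse_temperature=beta)
p_tau = softmax(q_values, temperature=1.0 / beta)

# Multiplying by beta is the same as dividing by 1/beta
assert jnp.allclose(p_beta, p_tau)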

softmax_stickiness

softmax_stickiness(value: ArrayLike, temperature: float = 1.0, stickiness: float = 0.0, prev_choice: Optional[ArrayLike] = None) -> ArrayLike

Softmax function with choice stickiness, and optional temperature parameter.

The standard softmax function is:

\[ P(a) = \frac{e^{Q(a) / \tau}}{\sum_{b} e^{Q(b) / \tau}} \]

With stickiness added:

\[ P(a) = \frac{e^{(Q(a) + \kappa \cdot same(a, a_{t-1}))/\tau}} {\sum_{b} e^{(Q(b) + \kappa \cdot same(b, a_{t-1}))/\tau}} \]
  • \(P(a)\) is the probability of choosing action \(a\)
  • \(Q(a)\) is the value of action \(a\)
  • \(\tau\) is the temperature parameter
  • \(\kappa\) is the stickiness parameter
  • \(same(a, a_{t-1})\) is 1 if \(a\) matches the previous choice, 0 otherwise

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, shape (n_trials, n_bandits). Note that this function does not account for trial-wise dependencies: each trial is treated independently. Because previous choices are supplied precomputed, the choice on trial t-1 can influence the choice on trial t, but the resulting change in choice likelihood on trial t does not carry forward to later trials. This is useful for applying the same stickiness to all trials; additional code is needed if the choice on trial t-1 should influence trial t, which in turn influences trial t+1, and so on.

  • temperature

    (float, default: 1.0 ) –

    Softmax temperature, in range [0, inf]. Note that this is temperature rather than inverse temperature; values are divided by this value. Higher values make choices less deterministic. Defaults to 1.0.

  • stickiness

    (float, default: 0.0 ) –

    Weight given to previous choices, range (-inf, inf). Positive values increase probability of repeating choices. Defaults to 0.0

  • prev_choice

    (ArrayLike, default: None ) –

    One-hot encoded previous choices, shape (n_trials, n_bandits). Defaults to None.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax_stickiness(
    value: ArrayLike,
    temperature: float = 1.0,
    stickiness: float = 0.0,
    prev_choice: Optional[ArrayLike] = None,
) -> ArrayLike:
    """
    Softmax function with choice stickiness, and optional temperature
    parameter.

    The standard softmax function is:

    $$
    P(a) = \\frac{e^{Q(a) / \\tau}}{\\sum_{b} e^{Q(b) / \\tau}}
    $$

    With stickiness added:

    $$
    P(a) = \\frac{e^{(Q(a) + \\kappa \\cdot same(a, a_{t-1}))/\\tau}}
    {\\sum_{b} e^{(Q(b) + \\kappa \\cdot same(b, a_{t-1}))/\\tau}}
    $$

    - $P(a)$ is the probability of choosing action $a$
    - $Q(a)$ is the value of action $a$
    - $\\tau$ is the temperature parameter
    - $\\kappa$ is the stickiness parameter
    - $same(a, a_{t-1})$ is 1 if $a$ matches the previous choice, 0 otherwise

    Args:
        value (ArrayLike): Array of values to apply softmax to, shape
            `(n_trials, n_bandits)`. Note that this **does not** account for
            trial-wise dependencies: each trial is treated independently.
            Because previous choices are supplied precomputed, the choice on
            trial `t-1` can influence the choice on trial `t`, but the
            resulting change in choice likelihood on trial `t` does not carry
            forward to later trials. This is useful for applying the same
            stickiness to all trials; additional code is needed if the choice
            on trial `t-1` should influence trial `t`, which in turn
            influences trial `t+1`, and so on.
        temperature (float, optional): Softmax temperature, in range [0, inf].
            Note that this is temperature rather than inverse temperature;
            values are divided by this value. Higher values
            make choices less deterministic. Defaults to 1.0.
        stickiness (float, optional): Weight given to previous choices, range
            (-inf, inf). Positive values increase probability of repeating
            choices. Defaults to 0.0
        prev_choice (ArrayLike, optional): One-hot encoded previous choices,
            shape (n_trials, n_bandits). Defaults to None.

    Returns:
        ArrayLike: Choice probabilities, shape (n_trials, n_bandits)
    """

    sticky_value = value + stickiness * prev_choice

    return (jnp.exp(sticky_value / temperature)) / (
        jnp.sum(jnp.exp(sticky_value / temperature), axis=1)[:, None]
    )
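
A brief sketch with made-up values: when the Q-values are equal, a positive stickiness shifts probability towards whichever bandit is marked as the previous choice. Note that prev_choice is supplied explicitly as a one-hot array here, since it enters the computation directly.

import jax.numpy as jnp
from behavioural_modelling.decision_rules import softmax_stickiness

# Equal Q-values on both trials (illustrative values)
q_values = jnp.array([
    [0.5, 0.5],
    [0.5, 0.5],
])

# One-hot previous choices: bandit 0 before trial 1, bandit 1 before trial 2
prev_choice = jnp.array([
    [1.0, 0.0],
    [0.0, 1.0],
])

p = softmax_stickiness(
    q_values, temperature=1.0, stickiness=0.5, prev_choice=prev_choice
)
# With positive stickiness, p[0, 0] and p[1, 1] are both above 0.5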

softmax_stickiness_inverse_temperature

softmax_stickiness_inverse_temperature(value: ArrayLike, inverse_temperature: float = 1.0, stickiness: float = 0.0, prev_choice: Optional[ArrayLike] = None) -> ArrayLike

Softmax function with choice stickiness, and optional inverse temperature parameter.

The standard softmax function is:

\[ P(a) = \frac{e^{\beta \cdot Q(a)}}{\sum_{b} e^{\beta \cdot Q(b)}} \]

With stickiness added:

\[ P(a) = \frac{e^{\beta \cdot (Q(a) + \kappa \cdot same(a, a_{t-1}))}} {\sum_{b} e^{\beta \cdot (Q(b) + \kappa \cdot same(b, a_{t-1}))}} \]
  • \(P(a)\) is the probability of choosing action \(a\)
  • \(Q(a)\) is the value of action \(a\)
  • \(\beta\) is the inverse temperature parameter
  • \(\kappa\) is the stickiness parameter
  • \(same(a, a_{t-1})\) is 1 if \(a\) matches the previous choice, 0 otherwise

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, shape (n_trials, n_bandits). Note that this function does not account for trial-wise dependencies: each trial is treated independently. Because previous choices are supplied precomputed, the choice on trial t-1 can influence the choice on trial t, but the resulting change in choice likelihood on trial t does not carry forward to later trials. This is useful for applying the same stickiness to all trials; additional code is needed if the choice on trial t-1 should influence trial t, which in turn influences trial t+1, and so on.

  • inverse_temperature

    (float, default: 1.0 ) –

    Softmax inverse temperature, range [0, inf]. Higher values make choices more deterministic. Defaults to 1.0

  • stickiness

    (float, default: 0.0 ) –

    Weight given to previous choices, range (-inf, inf). Positive values increase probability of repeating choices. Defaults to 0.0

  • prev_choice

    (ArrayLike, default: None ) –

    One-hot encoded previous choices, shape (n_trials, n_bandits). Defaults to None.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax_stickiness_inverse_temperature(
    value: ArrayLike,
    inverse_temperature: float = 1.0,
    stickiness: float = 0.0,
    prev_choice: Optional[ArrayLike] = None,
) -> ArrayLike:
    """
    Softmax function with choice stickiness, and optional inverse temperature
    parameter.

    The standard softmax function is:

    $$
    P(a) = \\frac{e^{\\beta \\cdot Q(a)}}{\\sum_{b} e^{\\beta \\cdot Q(b)}}
    $$

    With stickiness added:

    $$
    P(a) = \\frac{e^{\\beta \\cdot (Q(a) + \\kappa \\cdot same(a, a_{t-1}))}}
    {\\sum_{b} e^{\\beta \\cdot (Q(b) + \\kappa \\cdot same(b, a_{t-1}))}}
    $$

    - $P(a)$ is the probability of choosing action $a$
    - $Q(a)$ is the value of action $a$
    - $\\beta$ is the inverse temperature parameter
    - $\\kappa$ is the stickiness parameter
    - $same(a, a_{t-1})$ is 1 if $a$ matches the previous choice, 0 otherwise

    Args:
        value (ArrayLike): Array of values to apply softmax to, shape
            `(n_trials, n_bandits)`. Note that this **does not** account for
            trial-wise dependencies: each trial is treated independently.
            Because previous choices are supplied precomputed, the choice on
            trial `t-1` can influence the choice on trial `t`, but the
            resulting change in choice likelihood on trial `t` does not carry
            forward to later trials. This is useful for applying the same
            stickiness to all trials; additional code is needed if the choice
            on trial `t-1` should influence trial `t`, which in turn
            influences trial `t+1`, and so on.
        inverse_temperature (float, optional): Softmax inverse temperature,
            range [0, inf]. Higher values make choices more deterministic.
            Defaults to 1.0
        stickiness (float, optional): Weight given to previous choices, range
            (-inf, inf). Positive values increase probability of repeating
            choices. Defaults to 0.0
        prev_choice (ArrayLike, optional): One-hot encoded previous choices,
            shape (n_trials, n_bandits). Defaults to None.

    Returns:
        ArrayLike: Choice probabilities, shape (n_trials, n_bandits)
    """

    sticky_value = value + stickiness * prev_choice

    return (jnp.exp(inverse_temperature * sticky_value)) / (
        jnp.sum(jnp.exp(inverse_temperature * sticky_value), axis=1)[:, None]
    )
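
A brief sketch (illustrative values only): with stickiness set to 0 the previous choice has no effect, and the result matches softmax_inverse_temperature.

import jax.numpy as jnp
from behavioural_modelling.decision_rules import (
    softmax_inverse_temperature,
    softmax_stickiness_inverse_temperature,
)

q_values = jnp.array([[0.2, 0.8, 0.4]])
prev_choice = jnp.array([[0.0, 0.0, 1.0]])  # bandit 2 chosen on the previous trial

p_sticky = softmax_stickiness_inverse_temperature(
    q_values, inverse_temperature=3.0, stickiness=0.0, prev_choice=prev_choice
)
p_plain = softmax_inverse_temperature(q_values, inverse_temperature=3.0)

# Zero stickiness leaves the values unchanged
assert jnp.allclose(p_sticky, p_plain)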

softmax_subtract_max

softmax_subtract_max(value: ArrayLike, temperature: float = 1) -> ArrayLike

Softmax function, with optional temperature parameter.

Subtracts the maximum value before applying softmax to avoid overflow.

In equation form, this is:

\[ P(a) = \frac{e^{(Q(a) - \max_{b} Q(b)) / \tau}} {\sum_{b} e^{(Q(b) - \max_{c} Q(c)) / \tau}} \]

Where P(a) is the probability of choosing action a, Q(a) is the value of action a, and \(\tau\) is the temperature parameter.

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, of shape (n_trials, n_bandits)

  • temperature

    (float, default: 1 ) –

    Softmax temperature, in range [0, inf]. Note that this is temperature rather than inverse temperature; values are divided by this value. Defaults to 1.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, of shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax_subtract_max(
    value: ArrayLike, temperature: float = 1
) -> ArrayLike:
    """
    Softmax function, with optional temperature parameter.

    Subtracts the maximum value before applying softmax to avoid overflow.

    In equation form, this is:

    $$
    P(a) = \\frac{e^{(Q(a) - \max_{b} Q(b)) / \\tau}}
    {\\sum_{b} e^{(Q(b) - \max_{c} Q(c)) / \\tau}}
    $$

    Where `P(a)` is the probability of choosing action `a`,
    `Q(a)` is the value of action `a`, and $\\tau$ is the
    temperature parameter.

    Args:
        value (ArrayLike): Array of values to apply softmax to, of shape
            (n_trials, n_bandits)
        temperature (float, optional): Softmax temperature, in range [0, inf].
            Note that this is temperature rather than inverse temperature;
            values are divided by this value. Defaults to 1.

    Returns:
        ArrayLike: Choice probabilities, of shape (n_trials, n_bandits)
    """
    # Subtract max value to avoid overflow
    return (jnp.exp((value - value.max(axis=1)[:, None]) / temperature)) / (
        jnp.sum(
            jnp.exp((value - value.max(axis=1)[:, None]) / temperature), axis=1
        )[:, None]
    )
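
A short sketch (illustrative values) of why the maximum is subtracted: with large Q-values the plain softmax overflows exp() and returns NaN, while the max-subtracted version stays finite.

import jax.numpy as jnp
from behavioural_modelling.decision_rules import softmax, softmax_subtract_max

# Large values overflow exp() in float32
q_values = jnp.array([[1000.0, 999.0, 998.0]])

p_naive = softmax(q_values, temperature=1.0)               # all NaN (inf / inf)
p_stable = softmax_subtract_max(q_values, temperature=1.0)

print(p_naive)   # [[nan nan nan]]
print(p_stable)  # approximately [[0.665 0.245 0.090]]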