
Decision Rules

decision_rules

Functions:

softmax

softmax(value: ArrayLike, temperature: float = 1) -> ArrayLike

Softmax function, with optional temperature parameter.

In equation form, this is:

\[ P(a) = \frac{e^{Q(a) / \tau}}{\sum_{b} e^{Q(b) / \tau}} \]

Where P(a) is the probability of choosing action a, Q(a) is the value of action a, and \(\tau\) is the temperature parameter.

Note that the value of the temperature parameter will depend on the range of the values of the Q function.

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, of shape (n_trials, n_bandits)

  • temperature

    (float, default: 1 ) –

    Softmax temperature, in range [0, inf]. Note that this is temperature rather than inverse temperature; values are divided by this value. Higher values make choices less deterministic. Defaults to 1.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, of shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax(value: ArrayLike, temperature: float = 1) -> ArrayLike:
    """
    Softmax function, with optional temperature parameter.

    In equation form, this is:

    $$
    P(a) = \\frac{e^{Q(a) / \\tau}}{\\sum_{b} e^{Q(b) / \\tau}}
    $$

    Where `P(a)` is the probability of choosing action `a`,
    `Q(a)` is the value of action `a`, and $\\tau$ is the
    temperature parameter.

    Note that the value of the temperature parameter will
    depend on the range of the values of the Q function.

    Args:
        value (ArrayLike): Array of values to apply softmax to, of shape
            (n_trials, n_bandits)
        temperature (float, optional): Softmax temperature, in range [0, inf].
            Note that this is temperature rather than inverse temperature;
            values are divided by this value. Higher values make choices less
            deterministic. Defaults to 1.

    Returns:
        ArrayLike: Choice probabilities, of shape (n_trials, n_bandits)
    """

    return (jnp.exp(value / temperature)) / (
        jnp.sum(jnp.exp(value / temperature), axis=1)[:, None]
    )
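
For illustration, a minimal usage sketch (the values below are made up, and the package is assumed to be importable as behavioural_modelling): lower temperatures concentrate probability on the highest-valued bandit, while higher temperatures flatten the distribution towards uniform.

import jax.numpy as jnp
from behavioural_modelling.decision_rules import softmax

# Q-values for 2 trials x 3 bandits (illustrative values)
q_values = jnp.array([
    [1.0, 0.5, 0.0],
    [0.2, 0.2, 0.6],
])

p_sharp = softmax(q_values, temperature=0.1)  # close to deterministic
p_flat = softmax(q_values, temperature=5.0)   # close to uniform

# Each row of probabilities sums to 1
assert jnp.allclose(p_sharp.sum(axis=1), 1.0)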

softmax_inverse_temperature

softmax_inverse_temperature(value: ArrayLike, inverse_temperature: float = 1) -> ArrayLike

Softmax function, with optional inverse temperature parameter.

In equation form, this is:

\[ P(a) = \frac{e^{\beta \cdot Q(a)}}{\sum_{b} e^{\beta \cdot Q(b)}} \]

Where P(a) is the probability of choosing action a, Q(a) is the value of action a, and \(\beta\) is the inverse temperature parameter.

Note that the value of the inverse temperature parameter will depend on the range of the values of the Q function.

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, of shape (n_trials, n_bandits)

  • inverse_temperature

    (float, default: 1 ) –

    Softmax inverse temperature, in range [0, inf]. Note that this is inverse temperature rather than temperature; values are multiplied by this value. Higher values make choices more deterministic. Defaults to 1.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, of shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax_inverse_temperature(
    value: ArrayLike, inverse_temperature: float = 1
) -> ArrayLike:
    """
    Softmax function, with optional inverse temperature parameter.

    In equation form, this is:

    $$
    P(a) = \\frac{e^{\\beta \\cdot Q(a)}}{\\sum_{b} e^{\\beta \\cdot Q(b)}}
    $$

    Where `P(a)` is the probability of choosing action `a`,
    `Q(a)` is the value of action `a`, and $\\beta$ is the
    inverse temperature parameter.

    Note that the value of the inverse temperature parameter will
    depend on the range of the values of the Q function.

    Args:
        value (ArrayLike): Array of values to apply softmax to, of shape
            (n_trials, n_bandits)
        inverse_temperature (float, optional): Softmax inverse temperature, in
            range [0, inf]. Note that this is inverse temperature rather than
            temperature; values are multiplied by this value. Higher values
            make choices more deterministic. Defaults to 1.
    """
    return (jnp.exp(inverse_temperature * value)) / (
        jnp.sum(jnp.exp(inverse_temperature * value), axis=1)[:, None]
    )
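
As a short sketch (illustrative values; same assumed import path as above), note that for beta > 0 this is equivalent to softmax with temperature 1/beta:

import jax.numpy as jnp
from behavioural_modelling.decision_rules import softmax, softmax_inverse_temperature

q_values = jnp.array([[1.0, 0.5, 0.0]])
beta = 2.0

p_beta = softmax_inverse_temperature(q_values, inverse_temperature=beta)
p_tau = softmax(q_values, temperature=1.0 / beta)

# Multiplying by beta is the same as dividing by 1/beta
assert jnp.allclose(p_beta, p_tau)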

softmax_stickiness

softmax_stickiness(value: ArrayLike, temperature: float = 1.0, stickiness: float = 0.0, prev_choice: Optional[ArrayLike] = None) -> ArrayLike

Softmax function with choice stickiness, and optional temperature parameter.

The standard softmax function is:

\[ P(a) = \frac{e^{Q(a) / \tau}}{\sum_{b} e^{Q(b) / \tau}} \]

With stickiness added:

\[ P(a) = \frac{e^{(Q(a) + \kappa \cdot same(a, a_{t-1}))/\tau}} {\sum_{b} e^{(Q(b) + \kappa \cdot same(b, a_{t-1}))/\tau}} \]
  • \(P(a)\) is the probability of choosing action \(a\)
  • \(Q(a)\) is the value of action \(a\)
  • \(\tau\) is the temperature parameter
  • \(\kappa\) is the stickiness parameter
  • \(same(a, a_{t-1})\) is 1 if \(a\) matches the previous choice, 0 otherwise

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, shape (n_trials, n_bandits). Note that this function does not account for trial-wise dependencies: each trial is treated independently. Because previous choices are supplied precomputed, the choice on trial t-1 can influence the choice on trial t, but the resulting change in choice likelihood on trial t does not carry forward to later trials. This is useful for applying the same stickiness to all trials; additional code is needed if the choice on trial t-1 should influence trial t, which in turn influences trial t+1, and so on.

  • temperature

    (float, default: 1.0 ) –

    Softmax temperature, in range [0, inf]. Note that this is temperature rather than inverse temperature; values are divided by this value. Higher values make choices less deterministic. Defaults to 1.0.

  • stickiness

    (float, default: 0.0 ) –

    Weight given to previous choices, range (-inf, inf). Positive values increase probability of repeating choices. Defaults to 0.0

  • prev_choice

    (ArrayLike, default: None ) –

    One-hot encoded previous choices, shape (n_trials, n_bandits). Defaults to None.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax_stickiness(
    value: ArrayLike,
    temperature: float = 1.0,
    stickiness: float = 0.0,
    prev_choice: Optional[ArrayLike] = None,
) -> ArrayLike:
    """
    Softmax function with choice stickiness, and optional temperature
    parameter.

    The standard softmax function is:

    $$
    P(a) = \\frac{e^{Q(a) / \\tau}}{\\sum_{b} e^{Q(b) / \\tau}}
    $$

    With stickiness added:

    $$
    P(a) = \\frac{e^{(Q(a) + \\kappa \\cdot same(a, a_{t-1}))/\\tau}}
    {\\sum_{b} e^{(Q(b) + \\kappa \\cdot same(b, a_{t-1}))/\\tau}}
    $$

    - $P(a)$ is the probability of choosing action $a$
    - $Q(a)$ is the value of action $a$
    - $\\tau$ is the temperature parameter
    - $\\kappa$ is the stickiness parameter
    - $same(a, a_{t-1})$ is 1 if $a$ matches the previous choice, 0 otherwise

    Args:
        value (ArrayLike): Array of values to apply softmax to, shape
            `(n_trials, n_bandits)`. Note that this **does not** account for
            trial-wise dependencies: each trial is treated independently.
            Because previous choices are supplied precomputed, the choice on
            trial `t-1` can influence the choice on trial `t`, but the
            resulting change in choice likelihood on trial `t` does not carry
            forward to later trials. This is useful for applying the same
            stickiness to all trials; additional code is needed if the choice
            on trial `t-1` should influence trial `t`, which in turn
            influences trial `t+1`, and so on.
        temperature (float, optional): Softmax temperature, in range [0, inf].
            Note that this is temperature rather than inverse temperature;
            values are divided by this value. Higher values
            make choices less deterministic. Defaults to 1.0.
        stickiness (float, optional): Weight given to previous choices, range
            (-inf, inf). Positive values increase probability of repeating
            choices. Defaults to 0.0
        prev_choice (ArrayLike, optional): One-hot encoded previous choices,
            shape (n_trials, n_bandits). Defaults to None.

    Returns:
        ArrayLike: Choice probabilities, shape (n_trials, n_bandits)
    """

    sticky_value = value + stickiness * prev_choice

    return (jnp.exp(sticky_value / temperature)) / (
        jnp.sum(jnp.exp(sticky_value / temperature), axis=1)[:, None]
    )
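
A brief sketch with made-up values: when the Q-values are equal, a positive stickiness shifts probability towards whichever bandit is marked as the previous choice. Note that prev_choice is supplied explicitly as a one-hot array here, since it enters the computation directly.

import jax.numpy as jnp
from behavioural_modelling.decision_rules import softmax_stickiness

# Equal Q-values on both trials (illustrative values)
q_values = jnp.array([
    [0.5, 0.5],
    [0.5, 0.5],
])

# One-hot previous choices: bandit 0 before trial 1, bandit 1 before trial 2
prev_choice = jnp.array([
    [1.0, 0.0],
    [0.0, 1.0],
])

p = softmax_stickiness(
    q_values, temperature=1.0, stickiness=0.5, prev_choice=prev_choice
)
# With positive stickiness, p[0, 0] and p[1, 1] are both above 0.5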

softmax_stickiness_inverse_temperature

softmax_stickiness_inverse_temperature(value: ArrayLike, inverse_temperature: float = 1.0, stickiness: float = 0.0, prev_choice: Optional[ArrayLike] = None) -> ArrayLike

Softmax function with choice stickiness, and optional inverse temperature parameter.

The standard softmax function is:

\[ P(a) = \frac{e^{\beta \cdot Q(a)}}{\sum_{b} e^{\beta \cdot Q(b)}} \]

With stickiness added:

\[ P(a) = \frac{e^{\beta \cdot (Q(a) + \kappa \cdot same(a, a_{t-1}))}} {\sum_{b} e^{\beta \cdot (Q(b) + \kappa \cdot same(b, a_{t-1}))}} \]
  • \(P(a)\) is the probability of choosing action \(a\)
  • \(Q(a)\) is the value of action \(a\)
  • \(\beta\) is the inverse temperature parameter
  • \(\kappa\) is the stickiness parameter
  • \(same(a, a_{t-1})\) is 1 if \(a\) matches the previous choice, 0 otherwise

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, shape (n_trials, n_bandits). Note that this function does not account for trial-wise dependencies: each trial is treated independently. Because previous choices are supplied precomputed, the choice on trial t-1 can influence the choice on trial t, but the resulting change in choice likelihood on trial t does not carry forward to later trials. This is useful for applying the same stickiness to all trials; additional code is needed if the choice on trial t-1 should influence trial t, which in turn influences trial t+1, and so on.

  • inverse_temperature

    (float, default: 1.0 ) –

    Softmax inverse temperature, range [0, inf]. Higher values make choices more deterministic. Defaults to 1.0

  • stickiness

    (float, default: 0.0 ) –

    Weight given to previous choices, range (-inf, inf). Positive values increase probability of repeating choices. Defaults to 0.0

  • prev_choice

    (ArrayLike, default: None ) –

    One-hot encoded previous choices, shape (n_trials, n_bandits). Defaults to None.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax_stickiness_inverse_temperature(
    value: ArrayLike,
    inverse_temperature: float = 1.0,
    stickiness: float = 0.0,
    prev_choice: Optional[ArrayLike] = None,
) -> ArrayLike:
    """
    Softmax function with choice stickiness, and optional inverse temperature
    parameter.

    The standard softmax function is:

    $$
    P(a) = \\frac{e^{\\beta \\cdot Q(a)}}{\\sum_{b} e^{\\beta \\cdot Q(b)}}
    $$

    With stickiness added:

    $$
    P(a) = \\frac{e^{\\beta \\cdot (Q(a) + \\kappa \\cdot same(a, a_{t-1}))}}
    {\\sum_{b} e^{\\beta \\cdot (Q(b) + \\kappa \\cdot same(b, a_{t-1}))}}
    $$

    - $P(a)$ is the probability of choosing action $a$
    - $Q(a)$ is the value of action $a$
    - $\\beta$ is the inverse temperature parameter
    - $\\kappa$ is the stickiness parameter
    - $same(a, a_{t-1})$ is 1 if $a$ matches the previous choice, 0 otherwise

    Args:
        value (ArrayLike): Array of values to apply softmax to, shape
            `(n_trials, n_bandits)`. Note that this **does not** account for
            trial-wise dependencies: each trial is treated independently.
            Because previous choices are supplied precomputed, the choice on
            trial `t-1` can influence the choice on trial `t`, but the
            resulting change in choice likelihood on trial `t` does not carry
            forward to later trials. This is useful for applying the same
            stickiness to all trials; additional code is needed if the choice
            on trial `t-1` should influence trial `t`, which in turn
            influences trial `t+1`, and so on.
        inverse_temperature (float, optional): Softmax inverse temperature,
            range [0, inf]. Higher values make choices more deterministic.
            Defaults to 1.0
        stickiness (float, optional): Weight given to previous choices, range
            (-inf, inf). Positive values increase probability of repeating
            choices. Defaults to 0.0
        prev_choice (ArrayLike, optional): One-hot encoded previous choices,
            shape (n_trials, n_bandits). Defaults to None.

    Returns:
        ArrayLike: Choice probabilities, shape (n_trials, n_bandits)
    """

    sticky_value = value + stickiness * prev_choice

    return (jnp.exp(inverse_temperature * sticky_value)) / (
        jnp.sum(jnp.exp(inverse_temperature * sticky_value), axis=1)[:, None]
    )
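
A brief sketch (illustrative values only): with stickiness set to 0 the previous choice has no effect, and the result matches softmax_inverse_temperature.

import jax.numpy as jnp
from behavioural_modelling.decision_rules import (
    softmax_inverse_temperature,
    softmax_stickiness_inverse_temperature,
)

q_values = jnp.array([[0.2, 0.8, 0.4]])
prev_choice = jnp.array([[0.0, 0.0, 1.0]])  # bandit 2 chosen on the previous trial

p_sticky = softmax_stickiness_inverse_temperature(
    q_values, inverse_temperature=3.0, stickiness=0.0, prev_choice=prev_choice
)
p_plain = softmax_inverse_temperature(q_values, inverse_temperature=3.0)

# Zero stickiness leaves the values unchanged
assert jnp.allclose(p_sticky, p_plain)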

softmax_subtract_max

softmax_subtract_max(value: ArrayLike, temperature: float = 1) -> ArrayLike

Softmax function, with optional temperature parameter.

Subtracts the maximum value before applying softmax to avoid overflow.

In equation form, this is:

\[ P(a) = \frac{e^{(Q(a) - \max_{b} Q(b)) / \tau}} {\sum_{b} e^{(Q(b) - \max_{c} Q(c)) / \tau}} \]

Where P(a) is the probability of choosing action a, Q(a) is the value of action a, and \(\tau\) is the temperature parameter.

Parameters:

  • value

    (ArrayLike) –

    Array of values to apply softmax to, of shape (n_trials, n_bandits)

  • temperature

    (float, default: 1 ) –

    Softmax temperature, in range [0, inf]. Note that this is temperature rather than inverse temperature; values are divided by this value. Defaults to 1.

Returns:

  • ArrayLike ( ArrayLike ) –

    Choice probabilities, of shape (n_trials, n_bandits)

Source code in behavioural_modelling/decision_rules.py
@jax.jit
def softmax_subtract_max(
    value: ArrayLike, temperature: float = 1
) -> ArrayLike:
    """
    Softmax function, with optional temperature parameter.

    Subtracts the maximum value before applying softmax to avoid overflow.

    In equation form, this is:

    $$
    P(a) = \\frac{e^{(Q(a) - \max_{b} Q(b)) / \\tau}}
    {\\sum_{b} e^{(Q(b) - \max_{c} Q(c)) / \\tau}}
    $$

    Where `P(a)` is the probability of choosing action `a`,
    `Q(a)` is the value of action `a`, and $\\tau$ is the
    temperature parameter.

    Args:
        value (ArrayLike): Array of values to apply softmax to, of shape
            (n_trials, n_bandits)
        temperature (float, optional): Softmax temperature, in range [0, inf].
            Note that this is temperature rather than inverse temperature;
            values are divided by this value. Defaults to 1.

    Returns:
        ArrayLike: Choice probabilities, of shape (n_trials, n_bandits)
    """
    # Subtract max value to avoid overflow
    return (jnp.exp((value - value.max(axis=1)[:, None]) / temperature)) / (
        jnp.sum(
            jnp.exp((value - value.max(axis=1)[:, None]) / temperature), axis=1
        )[:, None]
    )
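
A short sketch (illustrative values) of why the maximum is subtracted: with large Q-values the plain softmax overflows exp() and returns NaN, while the max-subtracted version stays finite.

import jax.numpy as jnp
from behavioural_modelling.decision_rules import softmax, softmax_subtract_max

# Large values overflow exp() in float32
q_values = jnp.array([[1000.0, 999.0, 998.0]])

p_naive = softmax(q_values, temperature=1.0)               # all NaN (inf / inf)
p_stable = softmax_subtract_max(q_values, temperature=1.0)

print(p_naive)   # [[nan nan nan]]
print(p_stable)  # approximately [[0.665 0.245 0.090]]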