The mathematics behind ANCCR
The ANCCR (Jeong et al., 2022) model,
which stands for adjusted net contingency for causal relations, proposes
that mesolimbic dopaminergic conveys an adjusted net contingency for
causal relationships (to biologically meaningful targets). The
mathematics (and logic) behind the model go well beyond what I can cover
here, but for now, it will suffice to say that the model:
- Uses a “Hebbian” mechanism to learn retrospective associations after
experiencing a meaningful causal target.
- Derives prospective associations using Bayes’s rule.
- Combines those associations into contingency terms that represent
dopaminergic activity.
- Uses the sign of dopaminergic activity to strengthen or weaken
causal weights.
- Responds as a function of prospective associations and causal
links.
1 - Maintaining stimulus representations
The degree to which a stimulus \(i\)
at time \(t\) is “active” in memory is
denoted by:
\[
\tag{Eq.1}
E_i(t) = \Sigma_{t_i \leq t}e^{-(\frac{t-t_i}{t\_constant})}
\]
where \(t_i\) are all the time steps
up to time \(t\), and \(t\_constant\) is a time constant (usually
meant to be the inter-reward rate)
2 - Learning stimulus associations:
The model learns retrospective associations after meaningful causal
targets occur. Whether event \(j\) is a
meaningful causal target is given by:
\[
\tag{Eq.2}
\Phi_j = \begin{cases}
1,& \text{if } \Phi_j(t) = 1\\
1,& \text{if }DA_j + \beta_j > \theta\\
0,& \text{otherwise}
\end{cases}
\]
where \(\Phi\) plays the role of an
indicator function, \(DA_j\) is the
total dopamine activity at the time of event \(j\), \(\beta_j\) is the unconditioned value of
event \(j\) and \(\theta\) is a global threshold parameter. Note that
the indicator function is self-preserving: once a stimulus becomes a
meaningful causal target, it does not stop being so.
After stimulus \(j\) is observed,
the predecessor representation contingency, or PRC, for
each stimulus \(i\) is updated via:
\[
\tag{Eq.3}
PRC_{i \leftarrow j} = M_{i \leftarrow j} - M_{i}
\]
where \(M_{i \leftarrow j}\) is the
predecessor representation of \(i\)
given \(j\) has occurred, and \(M_{i}\) is the base rate with which \(i\) occurs. Both of these quantities are
given by:
\[
\tag{Eq.4a}
M_{i \leftarrow j} = M_{i \leftarrow j}' + \Phi_j\alpha(E_{i
\leftarrow j} - M_{i \leftarrow j}')
\]
and
\[
\tag{Eq.4b}
M_{i} = M_{i}' + k\alpha(E_{i} - M_{i}')
\]
where \(M_{i \leftarrow j}'\)
and \(M_{i}'\) are the quantities
before \(j\) was observed, \(k\) and \(\alpha\) are learning rate parameters, and
\(E_{i \leftarrow j}\) is the
eligibility trace of stimulus \(i\) at
the time \(j\) occurs (see Eq. 1).
Then, the PRC can be used to derive the prospective association,
aptly named the successor representation contingency, or
SRC via Bayes rule:
\[
\tag{Eq.5}
SRC_{i \rightarrow j} = PRC_{i \leftarrow j} \frac{M_j}{M_i}
\]
The base rate for \(j\), \(M_j\) is calculated via Eq.4b.
3 - Releasing Dopamine
The model postulates that dopaminergic signaling encodes the adjusted
net contingencies for causal relations between stimuli, or ANCCRs. The
total dopaminergic activity at the time of event \(i\) is equal to:
\[
\tag{Eq.6}
DA_i = \Sigma_j (ANCCR_{i \rightarrow j}\Phi_j)
\]
And the ANCCR from stimulus \(i\) to
stimulus \(j\) is given by:
\[
\tag{Eq.7}
ANCCR_{i \rightarrow j} = NC_{i \leftrightarrow j} CW_{i \rightarrow j}
- \sum_{k \neq i}(ANCCR_{k \leftrightarrow j}\Delta_{k \leftarrow
i}\Phi_{k \leftrightarrow i})
\]
where \(NC_{i \leftrightarrow j}\)
is the net contingency between stimuli \(i\) and \(j\), \(CW_{i
\rightarrow j}\) is the causal weight that \(i\) has with \(j\), \(\Delta_{k
\leftarrow i}\) is the recency of stimulus \(k\) with respect to stimulus \(i\), and \(\Phi_{k \leftrightarrow i}\) is an
indicator function denoting whether \(k\) and \(i\) have a putative causal relationship
with each other.
The net contingency between stimuli \(i\) and \(j\), \(NC_{i
\leftrightarrow j}\), is given by:
\[
\tag{Eq.8}
NC_{i \leftrightarrow j} = wSRC_{i \rightarrow j} + (1-w)PRC_{i
\leftarrow j}
\]
or a weighted sum of successor and predecessor representation
contingencies.
The net contingency is used to calculate the indicator function
above, as:
\[
\tag{Eq.9}
\Phi_{k \leftrightarrow i} = \begin{cases}
1,& \text{if } NC_{i \leftrightarrow j} > \theta\\
0,& \text{otherwise}
\end{cases}
\]
where \(\theta\) is the same
threshold parameter used in Eq.2, and the indicator function for a stimulus
and itself, \(\Phi_{i \leftrightarrow
i}\), is 0.
The recency term, \(\Delta_{k \leftarrow
i}\), is given by:
\[
\tag{Eq.10}
\Delta_{k \leftarrow i} = e^{-(\frac{t_j-t_i}{t\_constant})}
\]
where \(t\_constant\) is the same
parameter used in Eq.1. Note however that Eq.9 does not include the sum
term in Eq. 1. Finally, the causal weight from stimulus \(i\) to stimulus \(j\) is given by:
\[
\tag{Eq.11}
CW_{i \rightarrow j} = CW_{i \rightarrow j}' +
\alpha_{reward}\delta_{i \rightarrow j}
\]
where \(CW_{i \rightarrow j}'\)
is the previous causal weight, \(\alpha_{reward}\) is a learning rate
parameter exclusive for causal weights, and \(\delta_{i \rightarrow j}\) is a delta term
depending on the sign of the total dopaminergic activity, given by:
\[
\tag{Eq.12}
\delta_{i \rightarrow j} = \begin{cases}
CW_{j \rightarrow j} - CW_{i \rightarrow j}, & \text{if } DA_j
\ge 0\\
(0-CW_{i \rightarrow j})\frac{n_i^{-1}\Delta{i \leftarrow j} \Phi_{i
\leftrightarrow j}}{\Sigma_{k \neq j}(n_k^{-1}\Delta_{k \leftarrow j}
\Phi_{k \leftrightarrow j})},& \text{otherwise}
\end{cases}
\]
where \(CW_{j \rightarrow j}\) above
is the reward magnitude of stimulus \(j\). In plain words, when dopaminergic
activity is positive, causal weights (from all present and absent
stimuli) strengthen. Conversely, when dopaminergic activity is negative,
causal weights (from all present and absent stimuli) weaken,
proportional to their normalized frequency and recency (as long as they
have putative causal relations with \(j\)).
4 - Generating responses
Responding in ANCCR is lightly specified. The value of responding
upon presentation of stimulus \(i\) is
given by:
\[
\tag{Eq.13}
Q_i = \Sigma_k(SRC_{i \rightarrow k} CW_{i \rightarrow k})
\]
which can then be mapped onto probabilities via a softmax function.
A diagram
The diagram below shows the dependencies in the model. I am excluding
the indicator functions and parameters for simplicity.
Note
The implementation of this model is a port from the MATLAB code that
Jeong et al. shared in the GitHub repository
associated with their paper. The output of the R model was checked
against the outputs of the MATLAB model, using training routines
(“eventlogs” in their parlance) generated using their MATLAB code. The
training routines generated in calmr
differ somewhat, to
accommodate generality. For example, as of version 0.6.1
,
it is not possible to specify probabilistic relations between cues and
rewards. Instead, it is left to the user to specify an exact probability
via trial numbers (e.g., an 80% reward probability can be specified as
“80A>(US)/20A”). The naming of parameters also differs between
codebases.
References
Jeong, H., Taylor, A., Floeder, J. R., Lohmann, M., Mihalas, S., Wu, B.,
Zhou, M., Burke, D. A., & Namboodiri, V. M. K. (2022). Mesolimbic
dopamine release conveys causal associations.
Science (New York,
N.Y.),
378, eabq6740.
https://doi.org/10.1126/science.abq6740