Splitting the Difference: How Does the Brain Code Reward Episodes?

Animal research and human brain imaging findings suggest that reward processing involves distinct anticipation and outcome phases. Error terms in popular models of reward learning (such as the temporal difference [TD] model) do not distinguish between the updating of expectations in response to reward cues and the updating of expectations in response to outcomes. Thus, correlating a single error term with neural activation assumes recruitment of similar neural substrates at each update. Here, we split the error term to separately model reward prediction and prediction errors, and compare the fit of single versus split error terms to functional magnetic resonance imaging (FMRI) data acquired during a monetary incentive delay task. We hypothesized and found that while the nucleus accumbens computes gain prediction in response to cues, the mesial prefrontal cortex (MPFC) computes gain prediction errors in response to outcomes. In addition to offering a more comprehensive and anatomically situated view of reward processing, split error terms generate novel predictions about psychiatric symptoms and lesion-induced deficits.
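As a rough illustration of what splitting the error term means, the sketch below computes the standard TD error at the two update points of a simplified one-cue, one-outcome trial, then arranges the values either as one combined regressor or as two separate ones. The function names, the zero pre-cue baseline, the discount factor of 1, and the example trial values are all assumptions made for illustration; this is not the model specification or the task parameters used in the study.

```python
def td_errors(cue_value, reward, baseline=0.0, gamma=1.0):
    """TD errors for a simplified trial: baseline -> cue -> outcome.

    Assumptions (illustrative only): no reward is delivered at the cue,
    and the outcome is terminal, so no future value follows it.
    """
    # Update at cue onset: delta = r + gamma * V(cue) - V(baseline), with r = 0.
    cue_error = gamma * cue_value - baseline
    # Update at outcome: delta = r + gamma * V(terminal) - V(cue), with V(terminal) = 0.
    outcome_error = reward - cue_value
    return cue_error, outcome_error


def single_regressor(trials):
    """One combined error term: cue and outcome updates share a regressor."""
    return [err for cue_value, reward in trials
            for err in td_errors(cue_value, reward)]


def split_regressors(trials):
    """Two terms: cue-phase updates (prediction) and outcome-phase
    updates (prediction error) are modeled separately."""
    pairs = [td_errors(cue_value, reward) for cue_value, reward in trials]
    prediction = [cue_err for cue_err, _ in pairs]
    prediction_error = [out_err for _, out_err in pairs]
    return prediction, prediction_error


# Hypothetical trials as (expected value signaled by the cue, reward received).
trials = [(2.5, 5.0), (2.5, 0.0), (0.0, 0.0)]
combined = single_regressor(trials)
prediction, prediction_error = split_regressors(trials)
```

With a single regressor, one fitted weight must account for activation at both cue and outcome; with split regressors, each phase can be related to a different region.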