Temporal-Difference learning underpins the acquisition of intertemporal preferences for high-value food rewards in humans


  • Timothy Davies

    Research areas

  • PhD, School of Psychology


Evolutionary perspectives posit that weight gain, obesity, and associated health complications arise from the application of inherited foraging strategies in environments where highly palatable, energy-dense food is easily obtainable (Lieberman, 2006). Human tolerance of risk is an obvious target for testing this perspective experimentally. My thesis operationalised risk as delay variability: young, healthy participants chose between two schedules that delivered high-value food rewards after either variable or fixed delays. I also applied a suite of computational models to specify the mechanisms underlying preferences for variable or fixed delay schedules. Overall, preferences for variable delay schedules were enhanced when the last food reward received was delivered immediately. Experiment 1 found that this effect was not moderated by an operationalised environment of mild food scarcity. Experiment 2 demonstrated that individuals in states of heightened hunger were more likely to select the variable delay schedule following immediate food delivery. Experiment 3 revealed that individuals who attended to visual cues signalling the duration of delays before the delivery of food rewards were more likely to select the variable delay schedule following short and fixed delays, but less likely following long delays, suggesting a form of delay aversion. I also found some evidence to suggest that variable delay schedule preferences were sensitive to BMI and temporal discounting, highlighting the potential relevance of this research for understanding food-seeking strategies in populations vulnerable to weight gain. A simple n-step temporal-difference (TD) learning model captured the acquisition of preferences when food rewards were delivered after every selection and motivation to consume the rewards on offer was high.
These data suggest that humans value food that is delivered and consumed quickly more highly than food that is delayed, and will tolerate the risk of longer delays for the possibility of receiving food rewards at the earliest opportunity. The acquisition of variable delay schedule preferences is likely underpinned by temporal discounting and learning.
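
As a rough illustration of how an n-step TD learner combined with temporal discounting could come to favour a variable delay schedule, the sketch below learns the value of waiting under two hypothetical schedules. It is not the model fitted in the thesis: exponential discounting, the delays (a fixed 5 steps versus an equal chance of 0 or 10 steps), the parameters `n`, `gamma` and `alpha`, and all function names are assumptions chosen for illustration only.

```python
import random


def n_step_td(sample_delay, n=3, gamma=0.8, alpha=0.01,
              episodes=5000, max_delay=10, seed=0):
    """Tabular n-step TD prediction over 'time waited so far' states.

    V[t] estimates the discounted value of having waited t steps.
    One unit of food reward arrives on leaving state `delay`.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    V = [0.0] * (max_delay + 1)
    for _ in range(episodes):
        delay = sample_delay(rng)
        T = delay + 1                      # number of transitions this episode
        rewards = [0.0] * delay + [1.0]    # rewards[t]: reward on leaving state t
        for t in range(T):
            end = min(t + n, T)
            # Discounted sum of up to n observed rewards ...
            G = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
            # ... plus a bootstrapped estimate if the episode has not ended.
            if end < T:
                G += gamma ** n * V[end]
            V[t] += alpha * (G - V[t])
    return V


def fixed_delay(rng):
    return 5                               # hypothetical fixed schedule


def variable_delay(rng):
    return rng.choice([0, 10])             # hypothetical variable schedule, same mean


v_fixed = n_step_td(fixed_delay)
v_variable = n_step_td(variable_delay)
print("fixed schedule, value at choice:   ", round(v_fixed[0], 3))
print("variable schedule, value at choice:", round(v_variable[0], 3))
```

Under these assumptions both schedules have the same mean delay, but because discounting is convex in delay, the occasional immediate reward raises the learned value of the variable schedule at the point of choice above that of the fixed schedule.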


Original language: English
Awarding Institution
Award date: 28 Nov 2018