Standardized automated training of rhesus monkeys for neuroscience research in their housing environment.

Teaching nonhuman primates the complex cognitive behavioral tasks that are central to cognitive neuroscience research is an essential and challenging endeavor. It is crucial for scientific success that the animals learn to interpret the often complex task rules and reliably and enduringly act accordingly. To achieve consistent behavior and comparable learning histories across animals, it is desirable to standardize training protocols. Automating the training can significantly reduce the time invested by the person training the animal. In addition, self-paced training schedules with individualized learning speeds based on automatic updating of task conditions could enhance the animals' motivation and welfare. We developed a training paradigm for across-task unsupervised training (AUT) of successively more complex cognitive tasks to be administered through a stand-alone housing-based system optimized for rhesus monkeys in neuroscience research settings (Calapai A, Berger M, Niessing M, Heisig K, Brockhausen R, Treue S, Gail A. Behav Res Methods 5: 1-11, 2016). The AUT revealed interindividual differences in long-term learning progress between animals, helping to characterize learning personalities, as well as commonalities, helping to identify easier and more difficult learning steps in the training protocol. Our results demonstrate that 1) rhesus monkeys stay engaged with the AUT over months despite access to water and food outside the experimental sessions, albeit with lower numbers of interactions compared with conventional fluid-controlled training; 2) with unsupervised training across sessions and task levels, rhesus monkeys can learn tasks of sufficient complexity for state-of-the-art cognitive neuroscience in their housing environment; and 3) AUT learning progress is primarily determined by the number of interactions with the system rather than the mere exposure time.
NEW & NOTEWORTHY We demonstrate that highly structured training of behavioral tasks, as used in neuroscience research, can be achieved in an unsupervised fashion over many sessions and task difficulties in a monkey housing environment. Employing a predefined training strategy allows for an observer-independent comparison of learning between animals and of training approaches. We believe that self-paced standardized training can be utilized for pretraining and animal selection and can contribute to animal welfare in a neuroscience research environment.


INTRODUCTION
Cognitive neuroscience research with nonhuman primates (NHPs) often requires extensive animal teaching using positive reinforcement training. Animals have to learn to accurately operate devices such as a touchscreen, a joystick, a lever, or a button, interpret sensory cues, and react according to the behavioral paradigm. Training an animal from naive to expert in a cognitive task can last many months, with its success depending on the animal's motivation and cognitive abilities but also on the training strategy, chosen based on the trainer's experience and intuition.
Standardizing training protocols for cognitive tasks should help in improving the quality of experimental data. It avoids variability in training history that could otherwise lead to variability in cognitive strategy. The more precisely an animal's cognitive behavior is shaped by the design of the cognitive task and its training, the better it can be understood by the experimenter, and the lower is the risk of confounding interpretations of the behavioral and neurophysiological data collected for understanding the neural basis of cognitive behavior. Especially with multiple animals having to be trained on the same task, it is crucial that the same cognitive strategies are instructed to achieve comparability of behavioral and neural results between animals. However, the trainer's choice of training strategy or even the trainers themselves might differ between animals, potentially leading to mismatching task-solving behavior of the animals and making their comparison difficult.
Standardizing training of cognitive tasks does not imply that each animal after a certain training time is confronted with exactly the same task demands according to a fixed protocol. Instead, task demands and their progression should depend on the individual performance. We propose to standardize only the rules according to which animals progress through the predefined learning steps of a new task. This approach aims at ensuring an optimal learning rate for the individual animal by maintaining an intermediate performance level, avoiding both the frustration of too many errors and the decline of the learning rate when the task becomes too easy. Within a specific project, the predefined learning steps should be the same for all participating animals, but they will obviously have to differ between projects. The standardized rules of progression through successive training steps can nevertheless be applied to a variety of projects.
Standardizing the training of cognitive tasks is particularly promising in combination with an automated unsupervised approach, since this minimizes the "human factor." Automated and unsupervised training also substantially reduces the trainer's workload while allowing the animal a self-paced training schedule (Miller et al. 2015; Tulip et al. 2017). The training period for complex cognitive neuroscience projects can often last several months, such that automatization creates a major potential for time savings. Also, with the need for human interaction, the training schedule is typically determined by the experimenter and not by the animal (Prescott et al. 2010). Automated unsupervised training in the home enclosure allows animals to choose the time and duration of their training. Such choice provides the animal with more control over its environment, potentially enhancing animal welfare (Westlund 2014).
Automated and unsupervised training can be used in the animals' home enclosures and serve as environmental enrichment (Clark 2017). Environmental enrichment is an important factor in maintaining the welfare of NHPs, but monkeys can quickly lose interest in unchanging enrichment toys by becoming habituated (Murphy et al. 2003). Maintaining an animal's interest requires variation and novelty in the environment and the involvement of primary reinforcers, such as food or fluid rewards (Tarou and Bashaw 2007). Cognitive training by an automated protocol, which dynamically adjusts the difficulty of a rewarded task to the animal's current skill level as suggested here, might increase the animal's motivation to continuously interact with the device. Thus such an automated training device marks a cognitively challenging interaction tool that could serve as cognitive enrichment to positively impact animals' welfare (Bennett et al. 2016; Clark 2017; Newberry 1995).
In our study, we aimed at introducing standardization and automation of training to standard tasks used in sensorimotor neuroscience in NHPs, moving toward fully unsupervised training with self-adaptive selection of the task and of task difficulty. The idea was, as proof of concept, to instruct naive animals toward proficient performance in a memory-guided center-out reach task in an unsupervised fashion across all gradual training steps within and across sessions. With the standardization and automation and the use of a larger number of animals, we pursued two goals. First, we wanted to characterize interindividual differences in learning behavior based on performance differences between animals, e.g., to identify fast and slow learners. Second, we wanted to characterize task demand over the course of training based on commonalities in performance among animals, e.g., to identify challenging training steps for later training optimization.
For this, we developed and implemented an across-task unsupervised training protocol (AUT), particularly suited to run cognitive task training on touchscreen-based kiosk systems in the animals' home enclosure, like our previously designed XBI (Calapai et al. 2016). Inspired by other successful housing-based testing systems (Andrews and Rosenblum 1994; Bennett et al. 2016; Fagot and Bonté 2010; Gazes et al. 2013; Kangas and Bergman 2012; Richardson et al. 1990; Washburn et al. 1989; Washburn and Rumbaugh 1992), standardized training procedures developed for studying learning of certain cognitive skills in isolation (Baxter and Gaffan 2007; Crofts et al. 1999; Fagot and Paleressompoulle 2009; Fagot and Parron 2010; Hutsell and Banks 2015; Kangas et al. 2016; Mandell and Sackett 2008; Nagahara et al. 2010; Shnitko et al. 2017; Truppa et al. 2010; Washburn and Rumbaugh 1991; Weed et al. 1999), and successful approaches to automated training for rodents (Duan et al. 2015), we developed a computerized training algorithm, with which we trained eight rhesus monkeys from very basic touchscreen interactions to the memory-guided reach task. We show that the AUT 1) standardizes training of animals in tasks typical for NHP cognitive neuroscience research, 2) keeps animals engaged over several months of training in their home enclosure without fluid restriction, and 3) allows for animal characterization and training optimization based on learning performance. As an example, we show that the animals' numbers of interactions with the training device better explain the variability of training progress across monkeys than does their time spent with the training device.

MATERIALS AND METHODS
All animal procedures of this study were approved by the responsible regional government office [Niedersaechsisches Landesamt fuer Verbraucherschutz und Lebensmittelsicherheit (LAVES), Permit No. 33.9-42502-04-13/1100]. The animals were group housed with other macaque monkeys in facilities of the German Primate Center in Goettingen, Germany, in accordance with all applicable German and European regulations. The facilities provide the animals with an enriched environment, including a multitude of toys and wooden structures, natural as well as artificial light, and access to outdoor space, and exceed the size requirements of the European regulations.
The German Primate Center has several staff veterinarians that regularly monitor and examine the animals and consult on procedures. Throughout the study the animals were monitored by the veterinarians, the animal facility staff, and the laboratory's scientists, all highly experienced and knowledgeable in working with NHPs. This study did not involve any invasive procedures, and the animals were subsequently used in other studies.
Animals. A total of eight male rhesus monkeys (Macaca mulatta, age: 4-7 yr) had 90 min of daily individual access (hereafter referred to as "session") to the XBI, a housing-based and computerized interactive system (Calapai et al. 2016), from Monday to Friday, with free access to water for at least 2 h before and at least 2 h after every session [with variable delays (up to a maximum of 4 h) until water accessibility] and for 24 h on both days of the weekend (with one exception: on training days animal Toa did not receive fluid before the session but only after it). During sessions, the participating animal was kept in a smaller (~1 or 1.8 m³) housing compartment, in auditory and visual contact with the members of its social group and of other groups in the same animal facility. All eight animals were accustomed to the XBI with at least 8 days of prior access and showed interest in repeatedly interacting with it, as described in a previous study (Calapai et al. 2016). We excluded a ninth animal, which had been part of the previous study, since it had not interacted with the XBI in that study. None of the animals received specific prior training toward the behavioral tasks introduced in the current study. All animals received fruit-flavored sweetened water (Active O2 Orange; Adelholzer Alpenquellen) diluted with plain water as reward for correct performance on the XBI.
AUT protocol. All training procedures were performed on the XBI, a touchscreen-based training and testing device for rhesus monkeys, optimized for use in an animal facility (Fig. 1A) and for cognitive behavioral experiments in neuroscience (Calapai et al. 2016). Animals have access to a 15-in. touchscreen (ELO 1537L; 1,024 × 768 resolution, 75-Hz refresh, 2.5-mm touch accuracy) mounted in an aluminum frame, which replaces one wire-mesh wall panel of the housing compartment. We used three devices to simultaneously test animals belonging to three groups and housed in two different facilities.
To automate training, the AUT adjusts the complexity of the task gradually. Animals start with a very easy task and then are introduced to more and more challenging task levels and new tasks at a speed that is determined by the individual animal's performance. Within each training level, individual stimulus parameters might vary randomly but only if these changes do not affect the practical and conceptual difficulty of the task. For example, within a training level the position of a reach target on the screen might vary randomly (if the animal has learned or is supposed to learn to generalize the target's position), but the spatial and temporal precision of the requested behavioral response (reach accuracy) does not vary. Moving to the next level will increase the task difficulty. For example, the reach target might decrease in size, thereby requiring higher reach accuracy, without changing other parameters of the task.
In the AUT, a simple staircase algorithm uses the animal's performance to determine when the current training level is incremented or decremented (Fig. 1B). If during a given experimental session the proportion of correctly executed trials over the previous 50 trials on the current level exceeds 80%, the training level is incremented (increasing the task difficulty). If performance is less than 20%, the training level is decremented (decreasing the difficulty). If performance is between 20 and 80%, the algorithm keeps drawing the trials from the current level (the difficulty stays the same while individual performance-irrelevant task parameters might vary). After every level change, the performance counter is reset and the staircase level remains unchanged for the next 50 trials. The performance is recalculated after each trial.

Fig. 1. Across-task unsupervised training (AUT) protocol. A: photo of a monkey working on the housing-based touchscreen device (XBI). The device shown here is an updated version of the XBI used in this study. An image of the XBI placed inside a housing facility is shown in Fig. A1. B: staircase algorithm to determine the trial-by-trial training level based on the performance in the preceding 50 trials. C: automated touch-hold-release (THR) training protocol. Over a total of 36 different task levels the animals learn to touch a small blue square on the screen (fixation point), keep their hand on the square as long as it is visible, and release the screen within a certain response time window once the square disappears. D: automated memory-guided center-out reach (COR) training protocol, following the THR training. Over a total of 30 task levels, the animals learn to touch and hold a small blue square in the middle of the screen (fixation point), remember the location of a flashing white square (target) in one out of 8 possible peripheral locations, wait for a certain instructed-delay period, release the fixation point within a certain period of time (response window) after the fixation stimulus disappears, and reach to the remembered (now invisible) target location.
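As a minimal illustration, the staircase rule can be sketched in a few lines of Python. This is our own reimplementation, not the authors' code (the original software ran on the XBI); the class name, the boundary checks, and the decision to clear the trial window after a level change are our reading of the text.

```python
# Illustrative sketch of the AUT staircase rule; window size and thresholds
# follow the text, everything else is an assumption.
from collections import deque

class Staircase:
    def __init__(self, n_levels, window=50, up_threshold=0.8, down_threshold=0.2):
        self.level = 1
        self.n_levels = n_levels
        self.window = window
        self.up_threshold = up_threshold      # promote when performance > 80%
        self.down_threshold = down_threshold  # demote when performance < 20%
        self.outcomes = deque(maxlen=window)  # last 50 outcomes at this level

    def record_trial(self, correct):
        """Log one trial outcome and return the level for the next trial."""
        self.outcomes.append(bool(correct))
        # Performance is recalculated after every trial, but a level change
        # requires a full window of 50 trials at the current level.
        if len(self.outcomes) == self.window:
            performance = sum(self.outcomes) / self.window
            if performance > self.up_threshold and self.level < self.n_levels:
                self.level += 1
                self.outcomes.clear()         # reset counter after level change
            elif performance < self.down_threshold and self.level > 1:
                self.level -= 1
                self.outcomes.clear()
        return self.level
```

In this sketch, a level change can occur at the earliest after 50 trials at the current level, matching the constraint that the animal must perform at least 50 successive trials before the next level can be reached.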
Note that, because the animal's performance is computed based on the last 50 trials at the current level, the animal needs to perform at least 50 successive trials at a level before the training can reach the next level. This means that the AUT in this configuration is not optimized for the fastest possible learning progression. Note also that the initial design of the series of training levels is based on a priori assumptions about the difficulty of each level and of the transitions to the next level, based on our experience with conventional training of rhesus monkeys on these and comparable tasks (Gail and Andersen 2006; Klaes et al. 2011; Niebergall et al. 2011; Patzwahl and Treue 2009; Westendorff et al. 2010). We purposefully aimed for a mixture of easy and more challenging steps to reveal expected performance differences between animals. Importantly, we maintained the same initial parameters of all training levels and transitions across our animals in the current study to ensure comparability. When using the device outside the study for everyday training, testing, or enrichment, we routinely adapt parameters to optimize training progress or usability.
Since we did not adapt the AUT parameters, a training step in our predefined protocol might turn out to be insurmountable for an animal. Therefore, we defined two criteria for ending an animal's training if no training progress was observed for a prolonged amount of time (stagnation): 1) after reaching level n, the animal did not reach the next level (n + 1) within 25 sessions (days of training) and performed fewer than 1,250 trials across those 25 sessions (i.e., less than an average of 50 trials per session); 2) after reaching level n, the animal did not reach the next level (n + 1) within 35 sessions, independent of the daily number of trials. If one of the two stagnation criteria was met, the training was ended for the animal.
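A literal encoding of these two criteria might look like the following hypothetical helper (the function name and signature are ours; the thresholds are the ones stated above):

```python
# Hypothetical helper encoding the two stagnation criteria.
def stagnated(sessions_since_level_up, trials_since_level_up):
    """Return True if training should end for lack of progress."""
    # Criterion 1: 25 sessions without advancing a level AND fewer than
    # 1,250 trials over those sessions (under 50 trials/session on average).
    if sessions_since_level_up >= 25 and trials_since_level_up < 1250:
        return True
    # Criterion 2: 35 sessions without advancing, regardless of trial count.
    if sessions_since_level_up >= 35:
        return True
    return False
```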
We used MATLAB (MathWorks) and the graphics toolbox gramm for data analysis and visualization.
Touch, hold, and release task. The touch, hold, and release (THR) task is a basic task for goal-directed reaching toward visual targets on a touchscreen. To complete the 36 levels of the THR training staircase, the animal needs to learn to reach for a blue square on the screen, maintain the touch until the square dims, and release the square within a reaction time window to receive the reward (Fig. 1C). This is achieved by 1) progressively reducing the stimulus size, and hence increasing the required reach accuracy, from a width of 13 cm to 3 cm in levels 1-16; 2) randomizing the target position on the screen, first only along the horizontal axis, then only along the vertical axis, and finally along both axes within a square of 12-cm side length in levels 17-19; 3) increasing the required hold time from 150 ms to random times between 700 and 1,500 ms in levels 20-29; and 4) rewarding the hold and timely release rather than just the long-enough hold (level 30), and finally gradually decreasing the reaction time window for releasing the stimulus from 1,000 to 500 ms in levels 31-36.
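One hedged way to tabulate these 36 levels as machine-readable parameters is sketched below. The phase boundaries and endpoint values come from the description above; the linear interpolation within phases and the fixed (rather than randomized) hold times are simplifying assumptions of ours.

```python
# Tabulating the 36 THR levels as parameter dictionaries (our sketch).
def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (pure Python)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def make_level(width_cm, rand_axis, hold_ms, release_ms):
    return {"width_cm": float(width_cm), "rand_axis": rand_axis,
            "hold_ms": float(hold_ms), "release_ms": release_ms}

thr_levels = []
for w in linspace(13, 3, 16):             # levels 1-16: shrink target
    thr_levels.append(make_level(w, None, 150, None))
for axis in ("x", "y", "xy"):             # levels 17-19: randomize position
    thr_levels.append(make_level(3, axis, 150, None))
for h in linspace(150, 1500, 10):         # levels 20-29: lengthen hold
    thr_levels.append(make_level(3, "xy", h, None))
thr_levels.append(make_level(3, "xy", 1500, 1000))  # level 30: timely release
for r in linspace(1000, 500, 6):          # levels 31-36: shrink release window
    thr_levels.append(make_level(3, "xy", 1500, r))
```

Keeping the staircase as an explicit table like this makes it straightforward to hold the schedule constant across animals, which is the comparability requirement stated above.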
All eight animals participated in the training of the THR task. One of the eight animals (Fla) was removed from the study during this first phase of the experiment since the animal was needed for a different project. We still included this animal's data in the analysis, since our quantification of the results does not depend on reaching the final level.
We started the analysis of the THR task for each monkey with the session where the animal reached level 2 for the first time and ended with the session where it reached level 36 for the first time. This is because level 1 was used to habituate the animals to the device (Calapai et al. 2016), and the step from levels 36 to 37 was not automated but instead initiated by the experimenter, since it marked the transition between two training modules (see below).
Memory-guided center-out reach task. The memory-guided center-out reach (COR) task (Fig. 1D) is widely used in sensorimotor neuroscience for goal-directed motor planning based on spatial working memory content (Kuang et al. 2016; Snyder et al. 1997; Wise and Mauritz 1985). The 31 levels (levels 37-67) of the COR training staircase are designed for animals that have learned the THR task. In the COR training, the animal has to learn to reach for the same blue screen-centered square as in the THR task, additionally remember the position of another stimulus (cue) briefly flashed at one of eight discrete peripheral locations uniformly distributed along an invisible circle surrounding the central square, and finally reach for the memorized cue location as soon as the central stimulus disappears.
For the first COR training level (step 37), no working memory is required. The monkey has to hold the central stimulus for 500 ms and then touch the cue within a 5,000-ms reaction time window. The cue appears either left or right of the central fixation stimulus at the same time as the hand-fixation stimulus disappears ("go" cue). The AUT protocol then guides the monkeys toward the final COR task design by 1) reducing the reaction time window from 5,000 to 3,000 ms in steps 37-40; 2) randomizing the position of the cue (up/down, 4 cardinal directions relative to fixation, all 8 directions) in steps 41-43; 3) shortening the reaction time window further from 2,500 to 800 ms in steps 44-47; 4) delaying the go cue from 100 to 1,300 ms after appearance of the peripheral cue in steps 48-57 (instructed-delay reach); and finally 5) reducing the cue luminance from 50% to invisibility during the instructed delay and movement time in steps 58-67. Once the cue becomes invisible after initial presentation, the instructed reach direction has to be memorized for proper reach goal selection (memory-guided reach).
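For illustration, the epoch structure of a single memory-guided COR trial at the final level can be written out as follows. This is a descriptive sketch; the function and epoch names are ours, while the trial structure follows the description above.

```python
# Descriptive sketch of one memory-guided COR trial at the final level.
def cor_trial_epochs(delay_ms, response_window_ms):
    """Epoch sequence for a memory-guided center-out reach trial."""
    return [
        ("acquire_fixation", "touch and hold the central blue square"),
        ("cue_flash", "white cue flashed at 1 of 8 peripheral locations"),
        ("memory_delay", f"keep holding for {delay_ms} ms; cue is invisible"),
        ("go_signal", "central square disappears"),
        ("reach", f"release and touch the remembered location "
                  f"within {response_window_ms} ms"),
    ]
```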
Five animals that had completed the THR training (staying on level 36 for at least 2 wk and reaching a within-level performance of 80% at least once) were available and participated in the AUT of the COR task.
Memory-guided center-out pro-anti-reach task. In addition to the main experimental design, in which we used standardized training for a larger group of animals, we also wanted to explore the power of the AUT for training a more challenging task. The pro-anti-reach (PAR) task is an extension of the COR task in which proper selection of the reach goal is contingent on choosing the correct visual-to-motor transformation rule instructed by a colored context cue (Crammond and Kalaska 1994; Gail and Andersen 2006). The color of the peripheral cue instructs the animal either to perform a direct (pro) reach (magenta cue) or to reach to the location opposite the cue, i.e., to perform an anti-reach (cyan cue). For training of the PAR task, we adapted the staircase such that not all animals in this third training phase experienced the exact same protocol. Therefore, only anecdotal results will be reported. We consider them noteworthy, since the PAR task represents an advanced level of task difficulty relevant for cognitive neuroscience, particularly the analysis of context-dependent goal-directed behavior (Gail and Andersen 2006; Klaes et al. 2011; Westendorff et al. 2010). Three of the four animals that had completed the final level of the COR task (Chi, Gro, and Zep) participated in the PAR task. We used a small subset of animals only, since for the fourth animal this advanced task was not relevant for its later use in neuroscience projects.

RESULTS
The aim of this study was to test the suitability of standardized and automated protocols for training rhesus monkeys. Table 1 shows an overview of the overall performance of all monkeys that took part in the THR and COR task training. Five out of seven animals completed the full THR training staircase, requiring between 13 and 120 sessions and between 4,680 and 11,778 trials (correct and error trials) to progress through the 36 THR task training levels. While the number of trials needed partially scaled with the number of sessions needed, the two measures were not directly related. Animals Odo and Toa stagnated at levels 26 and 30, respectively, in the THR task. Animal Odo successfully learned to touch the target stimulus but not to hold it for a prolonged period of time. The training was ended when animal Odo had performed 25 sessions with <50 trials on average after reaching level 26 (criterion 1). Animal Toa accomplished holding the stimulus but did not learn to release it in response to its dimming. The training was stopped when animal Toa did not reach level 31 within 35 sessions after reaching level 30 (criterion 2).
Four out of five animals mastered the final level of the COR task. Again, the numbers of sessions and trials needed varied substantially (57-126 sessions, 14,184-24,511 trials), even when considering only the successful animals. Given that the training was standardized across animals, this large variability in the number of sessions and trials needed to learn the task must reflect an interindividual variability of the learning progress, which we analyze below. Animal Nor, stagnating at level 63 in the COR task, learned to wait for the reach instruction before reaching to the target (instructed-delay task) but did not learn to memorize the target position: it did not reach level 64 within 35 sessions after reaching level 63 (criterion 2).
Three out of the four animals that had been successful in the COR task were included in the PAR task staircase training. For two of the animals, we modified the staircase in response to performance difficulties that both animals encountered at the same level of the PAR staircase (see below for a discussion of this deviation from our otherwise strict adherence to the initial staircase parameters). Since the stagnation added extra sessions to the training, the PAR learning is not fully comparable across animals anymore and thus not included in Table 1 and the corresponding analysis.
Note that the first and last steps of the THR task (levels 1 and 36) are excluded from all analyses, because level 1 was identical to the task used to initially accustom the animals to the XBI (Calapai et al. 2016) and the transition to COR (levels 36 to 37) was not automated. Excluding these two levels ensures that our analysis only includes automated level transitions.
Motivated by the observed variability in training progress, we analyzed the learning progress across and within animals for the THR and COR training protocols for two different purposes. First, we used the performance data from the AUT to quantify interindividual differences between animals and to test whether time spent in training or experience with the task better explains the average training progress of animals. Second, we used the performance data to characterize different phases of the training protocols in terms of their difficulty for the animals.
Performance in THR and COR task. Over the course of 2 yr, we collected data from 874 training sessions (13 sessions excluded due to technical malfunctions). The daily number of interactions with the XBI differed substantially between animals, as did the within-animal spread. Figure 2 plots the number of interactions per session (1 session per working day) for each animal. The median number of interactions varied from 43 trials (Odo) to 380 trials (Chi). The difference between the 25th and 75th within-animal percentiles varied from 78 trials (Odo) to 259 trials (Zep). While an animal's number of interactions per session partly varied over the course of the study, none of the animals stopped interacting with the device completely. A more detailed illustration of the average number of interactions as a function of session number can be found in the APPENDIX (see Fig. A2).
All animals had been habituated to the XBI before the study began (Calapai et al. 2016), so that they knew that a successful interaction with the touchscreen would cause flavored water to be dispensed. The progress for stepwise learning of the two new tasks (THR and COR) is shown in Fig. 3 for each animal. In general, the achieved level of difficulty increased monotonically for all animals, with slower speed of progression at higher training levels in both tasks. When plotted as a function of session number (a proxy for exposure time; Fig. 3, right), the achieved level of difficulty after a certain time differed between animals by up to a factor of 2-3. When the same performance data were analyzed as a function of the number of trials performed in each training protocol (a proxy for task experience; Fig. 3, left), the spread between animals was reduced. This suggests that learning progress does not depend on the time of exposure to the task but rather on the experience gained through individual interactions with the task.
To test for the effect of exposure time vs. task experience on the learning progress, we determined the level demand, i.e., how long it takes an animal to accomplish each training level. We computed the level demand both as the time (in minutes, including time within and between trials) and as the number of trials needed by each animal to reach a certain level for the first time after reaching the preceding level for the first time (Fig. 4, inset). By comparing the average level demands across levels (Fig. 4), we can identify individual levels or phases of the training for which the animals needed more time or attempts. As an example, between levels 58 and 67, the luminance of the touch target decreased stepwise until it reached threshold visibility. Around level 62, the touch target was not visible anymore for the animals, so that they needed to memorize the visual cue shown at the beginning of the trial to know the correct touch position (memory-guided reach). Since most of the animals spent more trials on this level compared with the average of the other levels, we can infer an elevated difficulty for this level. In this way, the AUT approach can be used to evaluate a given training strategy and identify the difficulty of each training step within this strategy.

Table 1. The table shows the number of trials/session the animals performed in each task. "Final Level" denotes the maximally reached level, where touch-hold-release (THR) covers levels 1-36 and center-out reach (COR) levels 37-66. "Trials" denotes the total number of trials (successful or not) needed to reach the highest achieved level within the training for this task, and "Sessions" denotes the corresponding number of training sessions. Animals Alw, Chi, Gro, and Zep finished both tasks; Nor finished THR but not COR; and Odo, Toa, and Fla did not finish THR and thus did not participate in COR. *Animal Fla was taken out of the experiment for reasons unrelated to the current study.
Some animals needed longer than others to complete a staircase or certain levels of it. To quantify the interindividual variability, we compared this variability when demand is quantified as time exposed to the XBI vs. as number of trials. For each level, we thus computed the coefficient of variation across animals of the time demand in minutes (CV_time) and of the trial demand (CV_trial): CV = σ/μ, where σ is the standard deviation and μ is the mean. Figure 5 shows the distributions of CV_time and CV_trial. On average, CV_time (1.15) was higher than CV_trial (0.84; Wilcoxon signed-rank test, P < 0.001). This indicates that experience with the task, rather than time spent on the task, is the better predictor of learning progress.
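This comparison can be reproduced in miniature as follows. The numbers are illustrative; the original analysis was done in MATLAB, and whether the sample or population standard deviation was used is not specified, so the population form here is an assumption.

```python
# Miniature sketch of the per-level variability comparison.
import statistics

def coefficient_of_variation(values):
    """CV = sigma / mu (population standard deviation over the mean)."""
    return statistics.pstdev(values) / statistics.fmean(values)

# Hypothetical level demands of 5 animals on one training level:
time_demand_min = [12.0, 30.0, 55.0, 18.0, 90.0]  # minutes to pass the level
trial_demand = [110, 160, 240, 130, 300]          # trials to pass the level

cv_time = coefficient_of_variation(time_demand_min)
cv_trial = coefficient_of_variation(trial_demand)
# A lower CV for trials than for time, aggregated over many levels, would
# indicate that trial count (task experience) predicts learning progress
# more consistently across animals than exposure time does.
```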
PAR task. The idea behind the PAR task staircase, which was run with only a few animals and was not as strictly predefined as the other training protocols, was to provide a proof of concept that the animals could also be trained on more advanced rule-based cognitive tasks with our standardized algorithm-based training protocol. In contrast to COR, the visual cue in PAR is presented in either of two colors, instructing the animal to touch either the location of the cue (as in COR) or the location opposite to it, starting from the position in the middle of the screen (see MATERIALS AND METHODS). In our experience, such rule-based tasks can pose some challenges even when taught to rhesus monkeys by experienced trainers.
After two animals stagnated at the same training level (dimming of an auxiliary target stimulus at the anti-position to render it invisible), we modified this training level (by delaying the disappearance of the salient auxiliary stimulus until after reach onset but before reach termination). Using this modified approach, both animals succeeded in learning the memory-guided anti-reaches, although one of the two animals did not generalize the anti-rule to all reach directions. The third animal, which arrived at this level later, did not manage to pass the level despite the modified strategy. One of the first animals, monkey Chi, learned the final level of the PAR task and performed it with a success rate of 71%.

DISCUSSION
Eight rhesus monkeys were trained on a visually instructed reach task with increasing complexity on a touchscreen device within their housing environment using an across-task unsupervised training (AUT) protocol (Fig. A1). Given our rigid training schedule, fixed stagnation criteria, and free access to fluid, five of the eight animals succeeded in learning a simple touchscreen interaction task [touch-hold-release (THR)] and continued training in a standard task for sensorimotor research, the memory-guided center-out reach (COR). Four of these five animals completed this training staircase, and three of them continued to an extension of the COR, the pro-anti-reach task (PAR), the last level of which was reached and completed by one animal only. By comparing the learning behavior between animals, we found that learning progress was better predicted by the number of trials performed than by the time spent training. Additionally, the unsupervised nature of the training allowed us to identify both easy and difficult steps of the tasks, which in turn helped in evaluating the effectiveness of our training approach. Finally, all animals continued to use the device over several months despite fluid and food intake not being restricted outside the training sessions, suggesting that the AUT of cognitive tasks is a valuable tool for environmental enrichment (Clark 2017).
Unbiased behavioral training and assessment of learning performance. In cognitive neuroscience research with NHPs, monkeys are often required to solve complex cognitive tasks, for which the learning process requires extensive training. Factors that influence training duration include task difficulty, physical and cognitive effort, the animal's motivation level, reward attractiveness, group rank, and training strategy. The latter is set by the trainer and hence is influenced by the trainer's subjective decisions on which task level to offer to the animal on a daily basis. Also, despite mostly automated experimental control software being in place, well-intended direct interventions, such as seemingly minor adjustments of task parameters based on a subjective estimate of the animal's performance, will interfere with the learning process. In the worst case, this could lead to undesired behaviors. At the least, it leads to idiosyncratic training protocols for individual animals, which makes performance comparisons between animals difficult or impossible. In the AUT, the training strategy is still set by the trainer but is predefined and can be applied to each animal in the same manner. Compared with more traditional types of training in NHP neuroscience, the AUT procedure ensures that differences observed when comparing learning progress do not reflect a trainer or experimenter bias, since all the animals underwent exactly the same routine.
Moreover, a direct and unbiased comparison of the learning behavior shown by different animals in the same learning protocol is essential for quantifying the spectrum of cognitive skills within a group of animals (Andrews and Rosenblum 1994; Evans et al. 2008; Fagot and Paleressompoulle 2009; Harlow 1949; Hutsell and Banks 2015; Kangas et al. 2016; Truppa et al. 2010; Washburn and Rumbaugh 1992; Weed et al. 1999) or between species (Amici et al. 2010; Crofts et al. 1999; Herrmann et al. 2007; Rogge et al. 2013; Schmitt et al. 2012) or identifying animals particularly suited for specific research projects (Capitanio et al. 2006).
Given that our approach removes trainer effects from the list of factors potentially influencing learning progress, this should enable a closer look at other factors. For example, we used the same reward for all animals, which is likely more attractive for some animals than for others, and we did not systematically investigate the influence of the social rank of our animals.
Interindividual variability in learning. When designing AUT staircases, we aimed for a steady increase in difficulty at a moderate pace to minimize the risk of insurmountable conceptual changes of task rules. To ensure identical conditions across animals, we maintained a rigid training schedule and unchanged stagnation criteria. Under these conditions, one animal (Nor) did not complete the COR task before the stagnation criteria were reached, and two (Odo and Toa) did not complete the preceding THR task. Interestingly, these three animals performed, on average, the fewest interactions per day on the device (Fig. 1), suggesting that interaction behavior with the device could be a quantitative predictor of long-term performance. To look at this more closely, we investigated which interaction dimension, trials performed or time spent practicing, best predicts the animals' learning, by looking at the variability across the learning curves. We found the variability to be lower when progress was measured across the number of interactions (or trials) rather than the absolute time spent on the device. This result suggests that the number of trials performed is a better predictor of the training progress of a given animal than the amount of time the animal spends practicing a given task. Note that we do not think the low number of interactions in the poorly performing animals was due to excessive task demand, since the below-average interaction was noticeable from early in the training, when the task was still rather trivial (Fig. A2).
The observed correlation between the number of interactions with the XBI and learning progress has two implications. First, maximizing the number of interactions per session by creating additional incentives for conducting the task should lead to a gain in learning progress. This will be discussed further when comparing conventional training approaches below. Second, the variability in housing-based training performance might be used to preselect animals for research projects requiring complex and demanding tasks (either cognitive and/or physical). While any form of performance-based animal preselection obviously prohibits scientific conclusions about the general cognitive capacities of the species as such in comparative studies, in other fields such as cognitive neuroscience it might still be justified to select those individuals that reach a certain experimental level faster than others, for reasons of practicality and animal welfare.
Optimizing training protocols. By measuring the number of trials different animals needed on average to master a certain level, we learned about the inherent difficulty of that level. This measure can be used to evaluate the training approach implemented by the predefined set of levels. For instance, the first 20 steps of the THR task seem to be very easy for all the animals, since most animals progressed at or close to the maximum possible speed (dashed line in Fig. 3). Thus, by skipping several of those early levels, it might be possible to speed up the training. On the other hand, level 30, requiring the highest number of trials across all animals in the THR task, seems to be the most difficult. In fact, it is the level at which two animals dropped out due to lack of learning progress.
Beyond the study presented here, it would be useful to expand our automated approach to reduce the risk of animals stagnating. To optimize the training strategy toward a constant, moderate task difficulty over the whole training, easy levels could be omitted and difficult levels could be broken down or adjusted. This avoids unnecessary idling at trivial levels while at the same time keeping the risk of stagnation low. Expanding the adaptivity of our staircase by also iteratively changing the parameters of the staircase steps based on the recent history of the individual animal's progress would thus boost the effectiveness of the unsupervised approach.
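As a hypothetical sketch of such an adaptive rule (the function name, thresholds, and window sizes below are illustrative assumptions, not the study's actual staircase algorithm), a level update could combine a promotion criterion with a stagnation flag that marks a level for subdivision or adjustment:

```python
def update_level(level, recent_outcomes, promote_after=20, min_success=0.8,
                 stagnation_trials=500):
    """Hypothetical staircase update rule.

    `recent_outcomes` is the list of trial outcomes (1 = success, 0 = error)
    accumulated at the current level. Promote when the success rate over the
    last `promote_after` trials reaches `min_success`; flag stagnation when
    `stagnation_trials` trials pass without promotion, so the level can be
    broken down into smaller steps or otherwise adjusted.
    """
    window = recent_outcomes[-promote_after:]
    if len(window) == promote_after and sum(window) / promote_after >= min_success:
        return level + 1, "promote"
    if len(recent_outcomes) >= stagnation_trials:
        return level, "stagnating"  # candidate for subdivision/adjustment
    return level, "stay"
```

An individualized variant could additionally shrink or grow the difficulty increment between levels based on how quickly the animal passed its recent levels, which is the kind of history-dependent adaptivity discussed above.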
The above approach is advisable if the goal is to guide as many animals as possible toward successful completion of the final task level without investing unnecessary training time and without frustrating the animal with excessive cognitive demands. If, on the other hand, the goal is to emphasize interindividual performance differences and titrate the lower and upper end of the performance spectrum, one would purposefully use a broad spectrum of level demands, including moderate and more advanced training levels, to reveal the highest variability across animals. The task levels then should be easy enough for most animals to succeed but not so easy that all animals master them trivially. By fanning out the performance across animals, interindividual differences become particularly apparent and one can identify the best performers.
Using such a training approach with a large spectrum of task difficulties would be useful in cases of larger variability of cognitive abilities, such as in interspecies comparisons (Amici et al. 2010; Herrmann et al. 2007; Rogge et al. 2013; Schmitt et al. 2012).
By employing such a multifaceted training procedure, it will be possible to identify the relative difficulty of certain aspects of the task. The pattern in Fig. 4 could mark a species- or animal-specific learning profile useful for characterization of cognitive skills and thereby serve as a cognitive fingerprint. For example, as seen in Fig. 4, the animals needed more trials to accomplish levels 58-67 (memorizing the target location) than levels 48-57 (withholding the reach until the go cue). This could indicate that rhesus monkeys find it easier to learn to withhold an action for a few hundred milliseconds than to learn to memorize a certain spatial position for the same time. Quantifying the learning progress, especially by means of AUT, could in turn help mitigate confounds related to the human administration of the tasks, typical for studies employing test batteries (Herrmann et al. 2007). Although beyond the scope of the present study, we believe that the approach discussed so far might help dissociate the social components of primate cognition, an important challenge in the field of primate cognition (Schmitt et al. 2012; Seed and Tomasello 2010).
Environmental enrichment. Recently, a study proposed housing-based training as a valuable tool for environmental enrichment of captive NHPs (Bennett et al. 2016). Our automated and standardized approach to cognitive training exhibits some of the key features that make a good environmental enrichment tool (Clark 2017; Murphy et al. 2003).
Environmental enrichment ideally expands the possibilities for species-specific behavior (Newberry 1995). A useful enrichment tool thus triggers the interest of animals and keeps them engaged for an extended period of time. While monkeys explore new devices for a short period out of curiosity, primary reinforcers, such as food, seem to prolong an animal's interest in a certain activity. However, even with primary reinforcers, a within-session reduction in the number of interactions has to be expected due to habituation (McSweeney et al. 1991). We observed that, across sessions, only one of the animals stopped working on the task (Fig. A2), even though they were not subject to fluid or caloric control schedules. Only animal Odo stagnated in training due to a low interaction rate (criterion 1), but note that this animal performed dozens of previous sessions with substantially higher interaction rates with the device. Our experiment was not built to test the habituation hypothesis. Yet, our results suggest that a dynamic device that changes gradually but constantly is less likely to lead to habituation and hence might be particularly suited as an enrichment tool, as it keeps the animal engaged for an extended period of time (Tarou and Bashaw 2007). It should be noted, though, that in the current phase of the project the animals were trained in a compartment connected to, but separated from, their group-housing compartment. There, the environment was less varied than in their housing compartment during the rest of the day. The lack of other opportunities might have triggered some of the interactions with the device. On the other hand, occasional access to other objects or peers in the adjacent compartment did not seem to have a negative effect on the motivation to interact with the device.
Comparison to standardized learning tasks used in behavioral and cognitive studies. Standardized cage-based cognitive testing is well established in behavioral and cognitive research (Washburn et al. 1989; Richardson et al. 1990; Washburn and Rumbaugh 1992; Andrews and Rosenblum 1994; Fagot and Bonté 2010; Kangas and Bergman 2012; Gazes et al. 2013; Bennett et al. 2016). In this research, learning progress has been systematically quantified with experimental paradigms measuring the success rate or the number of trials needed to reach a criterion (Washburn and Rumbaugh 1991; Crofts et al. 1999; Weed et al. 1999; Baxter and Gaffan 2007; Mandell and Sackett 2008; Fagot and Paleressompoulle 2009; Fagot and Parron 2010; Nagahara et al. 2010; Truppa et al. 2010; Hutsell and Banks 2015; Kangas et al. 2016; Shnitko et al. 2017). Studies of complex behavior additionally require subjects to learn a combination of cognitive and motor skills. For example, in the PAR task subjects have to 1) learn how to precisely handle a touchscreen, 2) react only upon cue appearance, 3) memorize sets of spatial locations, and 4) understand and integrate contextual information from multiple cues. It is an important challenge to quantify the step-by-step learning in such paradigms. The AUT employs a series of small increments in task difficulty matched to the animal's own learning pace in an unsupervised, across-session manner, allowing quantification of the learning performance at each level. This strategy might be especially useful when training animals on a new type of task that is not yet well characterized.
Across NHP tasks with various movement requirements, such as saccades (Yao et al. 2016), button presses (Niebergall et al. 2011), touchscreen interactions (Klaes et al. 2011), three-dimensional joystick movements (Morel et al. 2015), or large hand/arm movements (personal observation), we observe a decline in the number of interactions as a function of the physical effort involved. Both tasks in the present study involved touching a stimulus on the screen, for up to a second or more, without an arm rest, a considerable effort when repeated many hundred times within one session. Nonetheless, 7 out of 8 animals performed on average more than 100 trials per session, over many months and despite no food or fluid restriction.
Automated housing-based training vs. conventional laboratory-based training. Our approach allowed us to train animals without fluid or caloric control schedules and without time-consuming supervision by an experimenter. Four out of seven animals learned a full memory-guided COR task, a standard task in cognitive neuroscience. However, compared with standard training approaches, this is a low success rate given the type of task. Also, there are several additional disadvantages in comparison to conventional neuroscience training in which the animal sits in a primate chair. First, even the four best animals, which finished the COR task, still needed on average 77.3 sessions and 19,285 trials to learn the COR task, not counting the preceding THR training. Five animals that we trained conventionally, i.e., with fluid control, learned the almost identical task on average in 16.8 sessions and 8,488 trials (Gail and Andersen 2006; Klaes et al. 2011) (see APPENDIX). This means that fluid control schedules, which increase the perceived value of fluid rewards, decreased the total training period on average by a factor of 4.6, a reduction in overall training duration of 2-3 mo. This suggests that the stagnation in training due to the low interaction rate of one of our animals (Odo) with the device might have been prevented by using fluid control or a different reward regime. Second, most cognitive neuroscience tasks require devices other than or in addition to a touchscreen, such as eye tracking, joysticks, or three-dimensional vision. Especially scientific or technical constraints that mandate a steady head position or body posture are much harder, if not impossible, to implement in a housing-based training device. Third, training within the housing environment introduces additional distracting stimuli that cannot be controlled for, such as various noise sources, personnel entering the room, and other monkeys in view.
Fourth, the conventional training is already performed inside the experimental setup, which the monkey needs to be accustomed to before invasive experimental procedures start. It is not clear yet how well monkeys will generalize a complex task from the housing-based to a laboratory-based setting, but the recent study of Tulip et al. (2017) suggests that a knowledge transfer is likely at least for simple button-press tasks.
Finally, a well-experienced trainer should be able to adapt a training protocol to an individual animal in a way that is beneficial for a fast training progress. Part of the reported difference in the speed of learning between our AUT approach and the conventional training could be explained by the fact that our automated algorithm was not optimized for speed and animals spent an unnecessarily long time on easy task levels. It was not the primary aim of our study to develop the fastest and most efficient training strategy. We designed the AUT to serve as a new approach to train animals to various, more or less complex tasks in their own housing environment. Therefore, various features could be easily implemented in the AUT if learning needs to be accelerated. For example, 1) in cage-based training within the housing environment, the exposure time to the training device could be considerably prolonged (Fagot and Paleressompoulle 2009); 2) to increase the animal's motivation to perform more trials per session, food or fluid intake could be controlled (Evans et al. 2008); and 3) the training algorithm could be improved by taking into account the individual animal's recent learning history. On the other hand, deviating from a predefined training protocol bears the risk of introducing variable learning histories, potentially confounding later results of cognitive testing and neurophysiological recordings. An attractive "best-of-both-worlds" approach could be, for example, to combine the automated approach with fluid control schedules and to optimize the algorithm for learning speed, while not giving up on the standardization of the training across animals. Similarly, cage-based training using AUT, employed for pretraining to the cognitive task, could be combined with the laboratory-based setting to accustom the animal to the experimental environment and for the final training.

Conclusion
Our study shows that housing-based unsupervised training is suitable to aid animal training for cognitive neuroscience research, despite slower training progress compared with traditional setup-based approaches. Using our XBI device (Calapai et al. 2016), we demonstrate that it is possible to teach rhesus monkeys demanding behavioral paradigms used in cognitive neuroscience research by employing an across-task unsupervised training protocol. Such an approach can be used even in housing settings without setups for neuroscience training and research. Providing an animal more choice in when and how much it engages in the training gives it an increased level of control over its environment, which benefits welfare. Providing training opportunities in the familiar housing environment might also be beneficial for practicing difficult training steps, thereby accelerating the setup-based training. Furthermore, our AUT, which increases in difficulty according to the animal's abilities, keeps the animal engaged with the device over extended periods. This supports the usability of the XBI as an enrichment tool for animals in their home cage.

APPENDIX
Comparison with conventionally trained animals. To get an intuition of how automated training compares with conventional training in a neurophysiology setup, we compared the COR training progress of the XBI animals with that of five animals trained with the conventional approach (Gail and Andersen 2006; Klaes et al. 2011). The latter animals were seated in a primate chair in front of a touchscreen in the experimental setup, separated from their home environment. All of them received the majority of their fluids during working days through training contingent upon performance. The duration of a given session was determined by the experimenter's assessment of the animal's motivational state, i.e., sessions were ended when the animal indicated no further interest in continuing the training for that day. To reduce variability introduced by different trainers, we included only animals trained under the guidance of the same experienced trainer (author A. Gail). Note that the conventionally trained animals, different from the XBI animals, were not trained on the THR task before, but all animals were familiar with the general setting and the fact that touching the touchscreen can trigger a reward. Although all XBI animals learned COR after THR, the COR task was designed in a way that it could also be trained to naive animals. Furthermore, we designed the automated COR training so that it resembles the conventional training strategies.

Fig. A1. XBI inside a housing environment. The XBI replaces a wall of a single cage compartment (back; displays a task not used in this study). An opening connects those compartments to the housing cage of the social group (right). During a training session, the animal was isolated in the compartment with the XBI attached.
The conventional training strategies, although slightly varied across animals, always followed three main training steps: 1) direct reach, accurately touching a visual stimulus in the center of the screen (fixation point) and then touching a second visual stimulus in the periphery (target) (corresponding to levels 37-47 of COR); 2) delayed reach, holding the fixation point until its disappearance before touching the target, i.e., fixation and target stimulus overlap in time (corresponding to levels 48-57 of COR); and 3) memory reach, memorizing the target's location and reaching for it after it had disappeared (corresponding to levels 58-67).
Here, we report the total number of sessions (Fig. A3, left) and the number of trials (Fig. A3, right) each animal needed to succeed in a given phase. With the automated XBI training reported in the main text, animals learned COR in 77.5 sessions and 19,348 trials on average. In conventional training including fluid control, animals learned COR on average in 16.8 sessions and 8,488 trials. This means that animals trained on the XBI needed 4.6 times more training sessions and 2.3 times more trials to learn COR than animals in the conventional setup.
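The reported ratios follow directly from the averages given above, as the following short check illustrates:

```python
# Averages reported in the text for learning COR.
xbi_sessions, xbi_trials = 77.5, 19_348     # automated XBI training
conv_sessions, conv_trials = 16.8, 8_488    # conventional training with fluid control

session_ratio = xbi_sessions / conv_sessions  # ~4.6x more sessions on the XBI
trial_ratio = xbi_trials / conv_trials        # ~2.3x more trials on the XBI
print(round(session_ratio, 1), round(trial_ratio, 1))
```

The larger session ratio compared with the trial ratio reflects the lower per-session trial counts in the automated training (250 vs. 505 trials per session, as discussed below).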
Certainly, one reason for the slower progression in the XBI training was the fixed training strategy predefined in the automated training, which was not designed to optimize training speed but to be easy enough to lead as many animals as possible through the training. In conventional training, trainers had the opportunity to adapt to the individual animal's performance more flexibly than the staircase algorithm does. This could explain why XBI animals needed more trials than conventionally trained animals to achieve the same task level. It also suggests that the difference could likely be reduced by more advanced staircase algorithms optimized for fast adaptation.
Another reason certainly was that animals in conventional training, even though in a very early phase of training and not yet used to longer training sessions, performed on average 505 trials per session, whereas animals in automated training performed 250 trials per session. This difference cannot be attributed to a suboptimal staircase algorithm. Factors likely contributing to this difference are 1) the increased incentive of the reward due to water control; 2) more focused animals in the conventional setup due to a less distracting environment; and 3) the fact that XBI sessions always lasted 90 min, whereas in conventional training the session duration was determined by the experimenter judging whether the animal was motivated to continue the session.
Given that training strategies and conditions varied substantially, the results of a comparison between these approaches deserve some clarifications. First, in the conventional approach the animals did not all experience exactly the same training. Even though at least the supervising trainer (not necessarily the executing trainer) was the same person, the training approaches likely varied due to skill improvements or different personal bonds of the trainers with the animals. Second, animals in the automated training were already trained on THR, and only those animals that succeeded in THR and COR were added to the comparison, whereas no criterion was applied to preselect conventionally trained animals. In this sense, the observed difference in learning speed is a conservative estimate, and the true differences could be even larger.
In conclusion, our comparison of the two approaches has to be taken with some care, since not all relevant parameters could be matched retrospectively. It is only meant to give an intuition of how the same training level can be reached either with the aid of an automated algorithm in an unsupervised fashion or with conventional training, each having its own benefits and disadvantages.