Inexpensive, scalable camera system for tracking rats in large spaces

Most studies of the neural correlates of spatial navigation are restricted to small arenas (≤ 1 m²) because of the limits imposed by recording cables. Newer wireless recording systems have a larger recording range. However, these recording systems lack the ability to track animals over large areas, constraining the size of the arena. We developed and benchmarked an open-source, scalable multi-camera tracking system based on low-cost hardware. This camera system was used in combination with a wireless recording system to characterize neural correlates of space in environments up to 16.5 m² in size. Owing to its better temporal accuracy, the system improved estimates of spatial firing characteristics, theta phase precession, and head direction tuning of neurons compared to a popular commercial system. This temporal accuracy is crucial for accurately aligning videos from multiple cameras and for characterizing spatially modulated cells in large environments.


Introduction
Spatial navigation is a widely employed behavior for studying the neuronal circuits underlying cognition, learning, and memory. Since the discovery of place cells in the hippocampus four decades ago, …

Timestamp Synchronization

The difference between the timestamps recorded on each Raspberry Pi computer and on the data acquisition system for the first TTL on transition gives the instantaneous temporal offset between these devices. This offset was then subtracted from all subsequent frame timestamps of each video to convert them to the data acquisition system's temporal reference frame. Subsequent TTL on/off transitions showed that there was virtually no temporal drift between the Raspberry Pi computers and the data acquisition system for up to 4 hrs of recording (in case of temporal drift or jumps between clocks on different systems, every TTL on/off transition can be used to correct the errors).

Three files were generated on each Picamera sub-unit: a video file in .h264 format and two csv files, one containing the timestamps for all the frames and the other holding the timestamps corresponding to each TTL input on/off transition.
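As an illustration, this offset correction reduces to a few lines of Python. The sketch below assumes hypothetical file names and one timestamp (in seconds) per CSV line; it is not the exact code used in the paper:

    import numpy as np

    # Hypothetical files: frame and TTL timestamps saved on a Picamera
    # sub-unit, and TTL timestamps from the data acquisition system (DAQ).
    pi_frames = np.loadtxt("pi_frame_timestamps.csv", delimiter=",")
    pi_ttl = np.loadtxt("pi_ttl_timestamps.csv", delimiter=",")
    daq_ttl = np.loadtxt("daq_ttl_timestamps.csv", delimiter=",")

    # Instantaneous offset between the two clocks at the first TTL on transition.
    offset = pi_ttl[0] - daq_ttl[0]

    # Express all frame timestamps in the DAQ's temporal reference frame.
    pi_frames_daq = pi_frames - offset

    # Subsequent TTL transitions can be used to verify (or correct) clock drift.
    residual = (pi_ttl - offset) - daq_ttl
    print("maximum residual drift (s):", np.abs(residual).max())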

Video Stitching
Videos from each camera were processed offline. A representation of the 5.5 m x 3 m room was created by aligning simultaneously captured video frames from all 8 Picamera sub-units, each of which covered only a part of the large room and had substantial overlap with at least two other cameras. Since we had the exact timestamp for each frame, we could accurately align frames across cameras in time. Video stitching involved two steps: 1. calculation of registration data for each camera, and 2. generation of aligned images for each camera (Figure 1b).

Generating the aligned image for each camera: Individual video frames from all the cameras were transformed using the registration data (calculated once at the beginning of the experiment using one calibration frame from each camera) to bring them all into the same coordinate system. We modified the OpenCV library Stitcher class (https://opencv.org/) to …
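While the exact modifications to the Stitcher class are beyond this excerpt, the underlying register-once, warp-every-frame step can be sketched with standard OpenCV calls. The snippet below is a simplified, hypothetical illustration (ORB features and a RANSAC homography on one pair of calibration frames, with made-up file names), not the modified Stitcher pipeline itself:

    import cv2
    import numpy as np

    # Hypothetical calibration frames from two cameras with overlapping views.
    ref = cv2.imread("camera1_calibration.png", cv2.IMREAD_GRAYSCALE)
    new = cv2.imread("camera2_calibration.png", cv2.IMREAD_GRAYSCALE)

    # Detect and match features in the overlapping region.
    orb = cv2.ORB_create(2000)
    kp_ref, des_ref = orb.detectAndCompute(ref, None)
    kp_new, des_new = orb.detectAndCompute(new, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_new, des_ref), key=lambda m: m.distance)[:200]

    # Registration data: a homography mapping camera 2 into camera 1's
    # coordinate system, computed once from the calibration frames.
    src = np.float32([kp_new[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Every subsequent video frame is transformed with the same registration data.
    h, w = ref.shape
    aligned = cv2.warpPerspective(new, H, (w, h))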
Firing Rate Maps: Position data were segmented into 4 cm x 4 cm spatial bins for the large room and 2 cm x 2 cm spatial bins for other setups. Times during which the rat moved < 2 cm/s and spatial bins where the rat spent < 0.4 s were excluded from the analysis. The firing rate map of each cell was calculated by dividing the number of spikes fired in each bin by the time spent there. Rate maps smoothed using the adaptive binning algorithm (Skaggs et al., 1996) were used to calculate the spatial information score (see below). Gaussian (sigma = 1.25 bins) smoothed rate maps were used to calculate peak firing rate and place field size, and for illustrations. Only place fields with a peak firing rate greater than 25% of the peak firing rate of that cell's rate map were included in the place field analysis. The size of an individual place field was determined as the number of contiguous pixels (minimum 7) with a firing rate greater than 15% of the peak firing rate of that field.

Head Direction: Head direction tuning curves were calculated by dividing the total number of spikes fired in each head direction bin (5° bin width) by the amount of time the rat spent facing in that angular bin (Taube et al., 1990), and smoothed with a Gaussian with sigma = 1.25 bins.

Spatial Information: A spatial information score (Skaggs et al., 1996) was used to quantify the spatial tuning of single units. The score measures the information (in bits) about the rat's location conveyed by a single spike. We employed a shuffling procedure to estimate the probability of obtaining the observed spatial information by chance. The spike train was shifted cyclically with respect to the position data one thousand times by adding a uniformly generated random number lying between 30 seconds and the duration of the recording session minus 30 seconds. The fraction of time-shifted information scores greater than or equal to the observed information score was used to calculate the probability of obtaining the observed information score by chance. A significance threshold of p < 0.01 was used to identify neurons with statistically significant spatial information (Deshmukh & Knierim, 2011).

Theta Phase Precession Analysis: Theta peaks in the local field potentials were detected as described by Deshmukh et al. (2010). Each spike was then assigned a phase (between 0° and 360°) using linear interpolation between consecutive peaks (Skaggs et al., 1996). For the circular track, 2D position data were transformed into units of degrees on the track for linearized position estimates. The theta phase at which a place cell fired was plotted as a function of the linearized position at which it fired, to visualize theta phase precession as the animal passed through the place field.

Statistical Analysis: Two-tailed tests were used for all quantitative statistical comparisons. Inter-frame intervals for both camera systems were normally distributed; a two-sample F-test for equal variances was used to compare the two. The Wilcoxon signed rank test was used for all paired comparisons.
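The rate map, spatial information, and shuffling computations described above can be summarized in a short Python sketch. It is simplified relative to the paper's analysis (hypothetical variable names, a fixed Gaussian smoothing in place of adaptive binning for the information score, no speed filter, and an assumed frame rate), but captures the core logic:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def firing_rate_map(spike_times, frame_times, x, y, bin_cm=4.0,
                        frame_dt=1 / 30.0, min_occupancy_s=0.4):
        """Spikes per spatial bin divided by time spent in that bin.
        frame_dt is an assumed (hypothetical) frame interval in seconds."""
        xe = np.arange(x.min(), x.max() + bin_cm, bin_cm)
        ye = np.arange(y.min(), y.max() + bin_cm, bin_cm)
        occupancy = np.histogram2d(x, y, bins=[xe, ye])[0] * frame_dt
        # Position at each spike: the nearest preceding video frame.
        idx = np.clip(np.searchsorted(frame_times, spike_times) - 1, 0, len(x) - 1)
        spikes = np.histogram2d(x[idx], y[idx], bins=[xe, ye])[0]
        rmap = np.where(occupancy >= min_occupancy_s, spikes / occupancy, 0.0)
        return gaussian_filter(rmap, sigma=1.25), occupancy

    def spatial_information(rmap, occupancy):
        """Skaggs et al. (1996) information score, in bits per spike."""
        p = occupancy / occupancy.sum()          # occupancy probability per bin
        valid = (occupancy > 0) & (rmap > 0)
        mean_rate = np.sum(p[valid] * rmap[valid])
        ratio = rmap[valid] / mean_rate
        return np.sum(p[valid] * ratio * np.log2(ratio))

    def shuffle_p_value(spike_times, frame_times, x, y, session_s, n=1000,
                        rng=np.random.default_rng()):
        """Cyclic spike-train shifts (>= 30 s) estimate the chance level.
        Spike times are assumed to be relative to the session start."""
        observed = spatial_information(*firing_rate_map(spike_times, frame_times, x, y))
        shifts = rng.uniform(30.0, session_s - 30.0, size=n)
        scores = [spatial_information(*firing_rate_map((spike_times + s) % session_s,
                                                       frame_times, x, y))
                  for s in shifts]
        return np.mean(np.array(scores) >= observed)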

Data Availability
The datasets generated during the current study are available from the corresponding author on reasonable request.

Results
Frames acquired using the Picamera system are temporally more stable than those from a commercial camera

Performance of the Picamera system was benchmarked against a commercial camera obtained as part of the Neuralynx Cube-64 wireless recording system. Figure 2a shows the fraction of frames deviating from the expected inter-frame interval (IFI) for a video recorded over 4 hrs by the commercial camera and the Picamera in the same session. We defined jitter as the range of deviation from the expected IFI. The Picamera jitter of ± 0.025 ms was lower than the ± 7 ms jitter of the commercial camera. Thus, the Picamera system shows higher temporal accuracy over a long recording session than the commercial camera, a two orders of magnitude improvement in jitter.

Similar IFI stability, with ± 0.025 ms jitter, was measured for videos recorded simultaneously on eight Picamera sub-units. The cumulative distribution of deviations from the expected IFI shows a steep increase at 0 ms, as expected (Figure 2b). In all our recordings, all frames across cameras were within ± 0.025 ms, with > 98% of the frames lying within ± 0.002 ms. The variance of the Picamera IFI distribution is significantly smaller than that of the commercial camera (two-sample F-test for equal variances, σ²(commercial camera) = 3.1 ms², σ²(Picamera) = 7.6 × 10⁻⁷ ms²; F = 2.73 × 10⁶, df(commercial camera) = 79909, df(Picamera) = 95891, p < 0.0001). This temporal stability of the Picamera system helps align frames accurately across cameras, as well as with the neural data, facilitating analysis of neural and behavioral data at a higher temporal precision.

Since the data acquisition system uses a frame grabber to record the video stream from a camera, and some behavior monitoring systems use webcams, we tested whether webcams show better temporal accuracy. We recorded videos using a Logitech C170 USB webcam (Logitech, Lausanne, Switzerland) at 25 Hz. The webcam dropped an average of 8.26% of its frames, giving an extremely variable IFI (jitter = -28 ms to +88 ms). Thus, webcams may not be ideal for use with neurophysiology systems.
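The jitter statistics reported here can be recomputed directly from the saved frame-timestamp files. A minimal sketch follows, assuming a hypothetical file name, one timestamp in milliseconds per line, and the expected IFI estimated as the median interval:

    import numpy as np

    ts = np.loadtxt("frame_timestamps.csv", delimiter=",")  # frame times in ms

    ifi = np.diff(ts)                  # inter-frame intervals
    expected_ifi = np.median(ifi)      # nominal interval set by the frame rate
    deviation = ifi - expected_ifi

    # Jitter, as defined above, is the range of deviation from the expected IFI.
    print("jitter: %+.4f to %+.4f ms" % (deviation.min(), deviation.max()))
    print("fraction within +/- 0.002 ms:", np.mean(np.abs(deviation) <= 0.002))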

Reduced jitter improves estimates of neural correlates of behavior
We tested whether camera jitter affects the assessment of neural correlates of behavior by comparing spatial firing characteristics estimated using position data from each of the two cameras recorded in the same sessions. Spatial firing rate maps for units with a significant spatial information score (spatial information > 0.25 bits/spike, p < 0.01, using rat position estimates from at least one of the two cameras; n = 42) were generated using rat positions estimated from each camera. Figure 3a shows firing rate maps for different cell types (two place cells and one putative grid cell) recorded using the commercial camera and the Picamera. The place fields of spatially responsive neurons showed better tuning in the Picamera data than in the commercial camera data. At the population level, place field size was significantly smaller for the Picamera data than for the commercial camera data (Wilcoxon signed rank test, z = -3.88, p = 0.0001), presumably due to the greater temporal accuracy of the Picamera in positioning the animal (Figure 3b). These results motivated us to look for differences in spatial information and peak firing rate. As expected, peak firing rate (Wilcoxon signed rank test, z = 3.14, p = 0.0017) and spatial information (Wilcoxon signed rank test, z = 3.86, p = 0.00011) for the Picamera were significantly greater than for the commercial camera, after Holm-Bonferroni correction for multiple comparisons (Holm, 1979).

We asked whether higher jitter in frame timestamps can lead to errors in the assignment of positions to individual spike timestamps large enough to degrade the measures of spatial selectivity compared above. We added jitter to the Picamera IFIs by sampling (with replacement) from the commercial camera IFI distribution, and generated rate maps using the jittered frame time estimates. There were significant reductions in spatial information (Wilcoxon signed rank test, z = 3.29, p = 0.001) and peak firing rate (Wilcoxon signed rank test, z = 2.99, p = 0.0028), and an increase in place field size (Wilcoxon signed rank test, z = -3.13, p = 0.0018), from the Picamera to the jittered Picamera (Figure 3c). These results are consistent with the suggestion that the higher temporal accuracy of the Picamera frame timestamps led to more accurate instantaneous position assignment and therefore improved measures of spatial selectivity. We also tested whether the marginally higher frame rate of the Picamera, rather than its lower jitter, could account for these differences. …
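One plausible implementation of this jitter-injection step (the exact procedure is not fully specified here, and variable names are hypothetical) is to perturb the Picamera IFIs with deviations resampled from the commercial camera's IFI distribution and rebuild surrogate frame times from the perturbed intervals:

    import numpy as np

    def jitter_timestamps(pi_ts, commercial_ts, rng=np.random.default_rng()):
        """Add resampled commercial-camera IFI deviations to the Picamera IFIs."""
        pi_ifi = np.diff(pi_ts)
        comm_ifi = np.diff(commercial_ts)
        deviations = comm_ifi - np.median(comm_ifi)
        jittered_ifi = pi_ifi + rng.choice(deviations, size=pi_ifi.size, replace=True)
        # Surrogate frame times, used to reassign a position to each spike.
        return pi_ts[0] + np.concatenate(([0.0], np.cumsum(jittered_ifi)))

Rate maps are then regenerated after assigning each spike a position using these surrogate frame times.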
The commercial camera has ~40% of its frames outside a 1 ms deviation from the expected IFI. Using multiple such cameras, with their uncorrelated noise in frame timestamps, can lead to a higher chance of temporal misalignment of frames across cameras. This misalignment could worsen the estimates of neural correlates of behavior even more than in the single-camera case. The low jitter in the IFIs of the Picamera sub-units discussed in the previous sections predicted that their consecutive frames would be temporally closely aligned with frames from other Picamera sub-units, provided all the Picamera sub-units started recording videos nearly simultaneously. To test this prediction, we examined how stable the inter-camera frame interval stayed over the duration of the 4 hr recording session across 8 Picamera sub-units (Figure 5a). In multiple recording sessions, the starting times of the 8 cameras were within 0.5 ms of the first camera. Given the low IFI jitter across cameras, in multiple sessions the consecutive frames from the 8 Picamera sub-units did not differ from those of the camera with the shortest starting lag by more than 0.5 ms. The example in Figure 5a shows an across-camera frame time difference of less than 0.05 ms for all frames recorded over a 4 hr session. This extremely low inter-camera frame timing difference shows that the entire system remained in sync during the recording session, facilitating temporal alignment of video frames across cameras at sub-millisecond accuracy.

Using the video stitching algorithm, frames from individual Picamera sub-units were aligned in a single coordinate system representing the entire maze. Figure 5b shows the overlapping regions across cameras after alignment. The aligned frames were then checked for any distortions that could have been introduced by the stitching algorithm, using an estimate independent of the calibration frames used for stitching. A grid was formed by stretching multiple strings across the length and breadth of the behavior arena, giving multiple intersection points between orthogonally running strings for each camera. When an intersection point is visible on multiple cameras, perfect alignment should place it at exactly the same x and y coordinates in the aligned frames of all the cameras. Thus, the differences between different cameras' estimates of the positions of shared intersection points provide a measure of the accuracy of spatial alignment by our stitching algorithm. We calculated the projection error for each camera, defined as the distance between an intersection point for that camera and its corresponding position across all cameras with overlapping fields of view. The maximum projection error for a single camera with respect to the others was 1.54 cm, and the median error across cameras was 0.63 cm (Figure 5c). Thus, the stitched image has a spatial jitter smaller than the spatial bin size (4 cm) used for creating firing rate maps of neurons when rats foraged in the large room.
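The projection error computation amounts to pairwise distances between each camera's aligned estimate of the same intersection point. A minimal sketch, using a hypothetical data structure with placeholder coordinates:

    import numpy as np
    from itertools import combinations

    # Hypothetical: aligned (x, y) positions, in cm, of shared string
    # intersections, indexed as points[point_id][camera_id] = (x, y).
    points = {
        "p01": {"cam1": (120.4, 80.1), "cam2": (121.1, 80.6)},
        "p02": {"cam2": (260.0, 95.2), "cam3": (259.4, 94.9)},
    }

    errors = []
    for cameras in points.values():
        for (_, a), (_, b) in combinations(cameras.items(), 2):
            errors.append(np.hypot(a[0] - b[0], a[1] - b[1]))

    print("maximum projection error (cm):", max(errors))
    print("median projection error (cm):", np.median(errors))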
Tracking spatial selectivity of neurons from the hippocampal formation in a large room

Multiple Picamera sub-units were used to track the position of a rat foraging in the large room. Videos recorded from each camera were aligned in a single coordinate system, and the rat's position was calculated by averaging positions from the transformed frames across cameras. Spatial firing rate maps were generated for neurons active during the behavior. …

Discussion

… effect of spatial scale. The second hypothesis predicts that as the spatial scale increases beyond the scales used in the 2D large-space studies, the distribution of the number of place fields per cell will be skewed further, rather than shifted rightwards. Distinguishing between these two possibilities requires recording hippocampal activity from rats foraging in substantially larger spaces, like the 16.5 m² space we recorded from in this paper. Recordings from larger spaces will also enable the experimenter to address a number of other questions relevant to our understanding of spatial representation at biologically realistic scales. For example, is the largest grid spacing limited to 1…

The availability of wireless recording systems now facilitates recording of neural activity in large spaces. However, the commercially available extracellular electrophysiology systems still face limitations in the number of cameras (usually one or two) used for tracking rats, which constrains our ability to accurately track them in large environments. In this paper, we described a system for tracking rat behavior in large spaces. This Picamera system is temporally more accurate than the commercially available system used for benchmarking in this paper. It is easily scalable (due to its parallel architecture, adding more Raspberry Pi cameras to cover arbitrarily large areas is trivial) and can be adapted for use in complex environments with multiple occlusions. The Picamera system is cost effective because of its low-cost, off-the-shelf, easily available components (prices from www.element14.com: Raspberry Pi 2 model B + Raspberry Pi camera module v1: $35 + $27 per unit; Arduino: $10), and it makes the collection of high-quality, temporally accurate behavior datasets in large spaces feasible. The open-source libraries used in developing the code, OpenCV (used in stitching and position tracking) and the Picamera Python library (used in acquiring videos and timestamp data), made it possible to customize the code and, in turn, allowed us to record videos at sub-millisecond temporal accuracy.

Improvement in the spatial and temporal accuracy of a tracking system is expected to reduce noise in the estimation of behavioral variables like instantaneous position and head direction. This improved accuracy in tracking behavioral variables should, in turn, reduce the noise introduced by the tracking system into estimates of the spatial selectivity and head direction tuning of neurons. Predictably, the rate maps generated using the Picamera had significantly smaller place field sizes, and higher peak firing rates and spatial information content, than those from a commercial system with higher temporal jitter. Similarly, the Picamera showed sharper head direction tuning as well as tighter theta phase precession.