Blog: Quality Assessment of EYE-SYNC’s Eye Movement Data

(Valid) Data. My precious…

Several years ago, I was fortunate to be involved in an exciting large-scale eye tracking study. The team had spent countless hours recruiting subjects, conducting surveys, and gathering data. We had waited eagerly for several months to take a peek at the data, but the data czars had wisely cautioned us against it until the trials were complete. The study was now finally coming to an end and the data was made available for analysis. And just when you thought that nothing could make a dreary New York winter worse, someone exclaimed “A bunch of this eye tracking data is unusable!”.

How could something like this happen? Casting aside modesty, we knew that we were experts at collecting eye movement data and had done so on many other projects. We had the best equipment; our administrators were outstanding and had been trained ad nauseam; the stack of papers lying in the corner showed that we had spared no effort in writing (and printing) instruction manuals; patches of coffee stains on the manuals suggested that someone had even attempted to read them. So, we asked again: “How could something like this happen?”

The team painstakingly analyzed all the eye tracking records to get to the bottom of this mystery. The diagnosis was grim; around twenty percent of the files had less-than-ideal data quality and were unusable: some subjects had not been administered a critical ‘calibration’ step while others had very poor calibration data; despite being instructed to keep still, some subjects had unwittingly moved their head during the eye tracking test, resulting in data drifts; other files had severe data loss. Sure, an on-site eye tracking expert could potentially have identified these errors immediately during data collection and recommended additional testing. But even in hindsight this was not a practical solution, and neither was our expectation that the test administrators would somehow be aware of every single source of error – especially since they had several other responsibilities.

C’est la vie. There was not much we could do except to look to the heavens and yell – as a wise man once did – “Khaaaan”.

Anything that can go wrong will go wrong

Clearly, collecting reliable, high quality eye tracking data is not a trivial task. At SyncThink, I am involved in the development of objective quality assessment algorithms that continuously monitor the quality of the eye movement data on EYE-SYNC’s platform. EYE-SYNC’s visual tracking metrics rely on precise measurements of the position of a target displayed on the screen and of the subject’s corresponding gaze position. To get a better idea of the sources of error that impact data quality, it might be useful to have a general understanding of how eye trackers work. Broadly speaking, our eye trackers use an infrared camera to capture images of the subject’s eyes. Computer vision algorithms then detect specific features in these images, and changes in these features as the subject moves their eyes are used to infer the subject’s gaze position (a toy sketch of this feature-extraction step follows the list below). As one might suspect, there are multiple points of failure that can result in erroneous measurements of the subject’s gaze, which in turn can impact EYE-SYNC’s metrics:

  1. Poor device alignment: The eye tracker’s cameras need an unoccluded view of the subject’s eyes in order to extract the relevant features. EYE-SYNC provides a live video stream from the eye tracker’s cameras, which the administrator can use to instruct the subject to lower or raise the headset until the alignment is ideal. We intend to automate this step in the future by providing audio-visual cues to the subject when the device alignment is sub-optimal.
  2. Field-of-view errors: Another alignment-related error occurs in subjects with larger-than-average interpupillary distances. In these cases, the subject’s eyes fall outside the field-of-view of the eye tracker’s cameras along the horizontal axis. The resulting errors in the eye movement data manifest as high-frequency oscillations, which can be detected and filtered using signal processing techniques (a simple filtering sketch follows this list).
  3. Poor calibration: Eye tracking typically begins with a ‘calibration’ step during which features extracted from the eye images are mapped to screen coordinates. Subjects are required to maintain a stable gaze during this stage; failure to do so results in inaccurate measurements of the subject’s gaze. EYE-SYNC augments the eye tracker’s built-in calibration routine with a novel, robust re-calibration algorithm that produces reliable data for our application (a toy calibration fit is sketched after this list).
  4. Data loss: As one might expect, there is no relevant gaze data to capture during a blink. While the data lost to normal blinking (around 10-15 blinks per minute) is acceptable, some subjects tend to blink more frequently or for longer periods in the virtual reality environment. EYE-SYNC automatically detects records with high data loss and warns the administrator when there is an insufficient amount of data (see the record-level checks sketched after this list).
  5. Other structural distortions: Despite careful device alignment and calibration, partial occlusions of the pupil sometimes result in specific, localized distortions in the eye movement data. EYE-SYNC’s quality assessment algorithms automatically detect these distortions and flag the data as potentially invalid.
  6. Inconsistent stimulus presentation: The errors listed above relate to the gaze data obtained from the eye tracker. Since EYE-SYNC’s metrics are based on the relative error between the gaze and the target, it is equally important to ensure that the display of the target is free of errors. Overheating of the display device can result in jittery stimulus presentation. EYE-SYNC uses the device’s temperature sensors to warn the administrator of overheating, and significant deviations from the ideal target path are detected algorithmically as well (also covered by the record-level checks after this list).
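
To make the feature-extraction step described above a little more concrete, here is a toy sketch in Python that estimates a pupil center in an infrared eye image by thresholding dark pixels and taking their centroid. The threshold, the synthetic frame, and the function itself are illustrative assumptions, not our production computer vision code.

```python
import numpy as np

def pupil_center(ir_image, dark_threshold=40):
    """Estimate the pupil center in a grayscale infrared eye image.

    Toy sketch only: under IR illumination the pupil shows up as a dark
    blob, so we threshold dark pixels and return their centroid. Real
    trackers use far more robust computer vision than this.
    """
    mask = np.asarray(ir_image) < dark_threshold   # candidate pupil pixels
    ys, xs = np.nonzero(mask)
    if xs.size == 0:                               # nothing dark enough, e.g. a blink
        return None
    return float(xs.mean()), float(ys.mean())      # (x, y) in image coordinates

# Synthetic example: a bright 120x160 frame with a dark 'pupil' patch.
frame = np.full((120, 160), 200, dtype=np.uint8)
frame[50:70, 80:100] = 10
print(pupil_center(frame))   # approximately (89.5, 59.5)
```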
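
For item 2: high-frequency oscillations superimposed on a slow pursuit movement are exactly what a low-pass filter is good at removing. The sketch below shows one plausible way to do it with an off-the-shelf zero-phase Butterworth filter; the sampling rate, cutoff frequency, and function name are assumed values, not the filter EYE-SYNC actually ships.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_gaze(gaze, fs_hz=90.0, cutoff_hz=20.0, order=4):
    """Zero-phase low-pass filter for a one-dimensional gaze trace.

    Sketch only: attenuates high-frequency oscillations (such as the
    field-of-view artifact described in item 2) while preserving the
    slower pursuit movement. fs_hz and cutoff_hz are assumed values.
    """
    b, a = butter(order, cutoff_hz / (fs_hz / 2.0), btype="low")
    return filtfilt(b, a, gaze)

# Synthetic example: a slow pursuit trace corrupted by 35 Hz jitter.
t = np.arange(0, 2, 1 / 90.0)
clean = np.sin(2 * np.pi * 0.5 * t)
noisy = clean + 0.2 * np.sin(2 * np.pi * 35 * t)
print(np.abs(smooth_gaze(noisy) - clean).max())   # small residual error
```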
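
Item 3, calibration, is at heart a mapping problem: given pupil-feature positions recorded while the subject fixated known targets, find a function from feature space to screen coordinates. Our re-calibration algorithm is considerably more involved than this, but a least-squares affine fit conveys the basic idea; the five-point grid and all names here are made up for illustration.

```python
import numpy as np

def fit_affine_calibration(pupil_xy, screen_xy):
    """Fit an affine map screen = [pupil_x, pupil_y, 1] @ A by least squares.

    pupil_xy:  (n, 2) pupil-center features recorded while the subject
               fixated n calibration targets.
    screen_xy: (n, 2) known on-screen target positions.
    Sketch only; production calibrations use richer models and reject
    samples where the subject's gaze was not actually stable.
    """
    X = np.hstack([pupil_xy, np.ones((len(pupil_xy), 1))])   # add a bias column
    A, *_ = np.linalg.lstsq(X, screen_xy, rcond=None)        # (3, 2) mapping

    def to_screen(p):
        p = np.atleast_2d(p).astype(float)
        return np.hstack([p, np.ones((p.shape[0], 1))]) @ A
    return to_screen

# Made-up 5-point calibration grid in normalized screen coordinates.
pupil = np.array([[10, 8], [30, 8], [20, 15], [10, 22], [30, 22]], dtype=float)
targets = np.array([[0, 0], [1, 0], [0.5, 0.5], [0, 1], [1, 1]], dtype=float)
gaze_to_screen = fit_affine_calibration(pupil, targets)
print(gaze_to_screen([20, 15]))   # approximately [[0.5, 0.5]]
```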
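
Items 4 and 6 ultimately reduce to simple record-level measurements: how much of the gaze signal is missing, and how far the rendered target strayed from its intended path. The sketch below computes both numbers under the assumption that missing samples are stored as NaN; the thresholds and names are placeholders rather than EYE-SYNC’s actual rules.

```python
import numpy as np

def data_loss_fraction(gaze):
    """Fraction of gaze samples lost to blinks or tracking dropout.

    Assumes missing samples are encoded as NaN; sketch only.
    """
    return float(np.isnan(np.asarray(gaze, dtype=float)).mean())

def target_path_rmse(rendered_xy, ideal_xy):
    """Root-mean-square deviation of the rendered target from its ideal path.

    A large value would point to jittery or delayed stimulus presentation.
    """
    err = np.asarray(rendered_xy, dtype=float) - np.asarray(ideal_xy, dtype=float)
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))

# Placeholder limits (not EYE-SYNC's values) an administrator warning could use.
MAX_LOSS_FRACTION = 0.25
MAX_TARGET_RMSE = 0.02
```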

The Good, the Fair, and the Poor

The technical details of EYE-SYNC’s quality assessment algorithm are beyond the scope of this blog post.

But it is relevant to note that, since not all of the errors listed above have a similar impact on EYE-SYNC’s metrics, a domain-specific decision tree is used to summarize the final quality with one of the following color-coded quality labels (a hypothetical sketch of such a tree follows the list):

  • Good (Green): The eye movement data is free of artifacts. The reported score is valid.
  • Fair (Orange): The eye movement data has some artifacts. A score was computed, but the administrator is strongly advised to repeat the test.
  • Poor (Red): The eye movement data has fatal flaws. The score is invalid and a retest is needed.
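
The real decision tree is domain specific and, again, beyond the scope of this post, but its shape can be conveyed in a few lines. In the hypothetical sketch below, fatal problems such as a failed calibration or heavy data loss force a Poor label, milder artifacts downgrade the record to Fair, and everything else is Good; every rule and threshold is an illustrative assumption, not our actual tree.

```python
def quality_label(checks):
    """Collapse per-record quality checks into a Good/Fair/Poor label.

    `checks` is a dict of measurements for one record; every rule and
    threshold below is hypothetical, not EYE-SYNC's actual decision tree.
    """
    if not checks["calibration_ok"] or checks["data_loss"] > 0.40:
        return "Poor"   # fatal flaws: score invalid, retest needed
    if checks["data_loss"] > 0.15 or checks["distortion_flagged"]:
        return "Fair"   # score computed, but a repeat test is advised
    return "Good"       # clean record: the reported score is valid

print(quality_label({"calibration_ok": True, "data_loss": 0.05,
                     "distortion_flagged": False}))   # -> Good
```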

Our quality assessment algorithms are improving rapidly as we collect more data. While the current quality labels are helpful, future releases of EYE-SYNC will provide administrators with additional information on the types of errors that were detected, along with suggestions for improving the quality of data capture. There is no doubt that a well-trained, on-site administrator is invaluable in collecting reliable data. It is our hope that administrators find the automatic quality assessment to be a useful tool in their data collection pipeline.