Understanding Reader Variability: A 25-Radiologist Study on Liver Metastasis Detection at CT
Hsieh SS, Cook DA, Inoue A, Gong H, Sudhir Pillai P, Johnson MP, Leng S, Yu L, Fidler JL, Holmes DR 3rd, Carter RE, McCollough CH, Fletcher JG
Radiology 2023 Feb;306(2):e220266. doi: 10.1148/radiol.220266.
Interreader variability in radiology exists even for routine tasks, such as detection of hepatic metastases at CT. Various factors may contribute to this variability, including reader experience, subspecialization, and patterns of image navigation (1, 2). Lesion characteristics might also affect interreader variability, as lesions are heterogeneous in nature (3). Sources of error in lesion detection can generally be categorized as either visual search errors or cognitive classification errors: the lesion is missed either because the eye never fixates on it or because it is not reported despite fixation (4). These two error types can be differentiated by measuring gaze time with eye tracking. Although interreader variability can undermine patient care through misdiagnosis, only limited efforts have been made to address these performance differences among radiologists.
In this prospective observational study, the authors aimed to determine the impact of (a) reader experience, (b) image navigation patterns, and (c) eye gaze time on interreader variability of missed liver metastases in contrast-enhanced abdominal CT.
Twenty-five radiologists from a single academic center were recruited to screen 40 portal venous phase abdominal CT studies for hepatic metastases. Readers differed in experience and training level: 9 abdominal subspecialists, 5 non-abdominal subspecialists, and 11 trainees. They reviewed each study on a custom viewing workstation with an integrated eye tracker, while data on image navigation were collected (interpretation time, time in liver window and on zoomed images, number of scrolls in the axial and coronal stacks). Gaze time near metastases was calculated, as well as the number of times a reader gazed at the same metastasis in both the axial and coronal planes (correlated view). Blinded readers circumscribed suspected hepatic metastases and scored their confidence from 0 to 100. A confidence score of 0 indicated a benign lesion and was disregarded in later analysis. Thirty-two studies contained 91 hepatic metastases proven by histopathology or progression; the other eight had no metastases.
Diagnostic performance was measured by area under the jackknife alternative free-response receiver operating characteristic (JAFROC) curve and per-metastasis sensitivity.
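To make the two metrics concrete, the following is a minimal Python sketch for a single reader. It is not the authors' analysis code. The function names and example ratings are hypothetical, and the figure of merit follows the commonly described JAFROC definition (the probability that a lesion rating exceeds the highest false-positive rating on a case without metastases). Treating unmarked lesions and unmarked normal cases as rating 0, and counting ties as one half, are assumptions of this sketch; conventions may differ between implementations.

```python
from typing import Sequence


def per_metastasis_sensitivity(detected: Sequence[bool]) -> float:
    """Fraction of true metastases the reader marked with confidence > 0."""
    if not detected:
        raise ValueError("need at least one metastasis")
    return sum(detected) / len(detected)


def jafroc_fom(lesion_ratings: Sequence[float],
               max_fp_per_normal_case: Sequence[float]) -> float:
    """Wilcoxon-style JAFROC figure of merit for one reader.

    lesion_ratings: confidence given to each true metastasis
        (assumption: 0 if the lesion was not marked).
    max_fp_per_normal_case: highest false-positive confidence on each
        study without metastases (assumption: 0 if nothing was marked).
    """
    if not lesion_ratings or not max_fp_per_normal_case:
        raise ValueError("need at least one lesion and one normal case")
    total = 0.0
    for fp in max_fp_per_normal_case:
        for lesion in lesion_ratings:
            if lesion > fp:
                total += 1.0   # lesion rated above the strongest false positive
            elif lesion == fp:
                total += 0.5   # tie counts as half (assumed convention)
    return total / (len(lesion_ratings) * len(max_fp_per_normal_case))


# Hypothetical reader: four metastases (one missed) and two normal studies.
example_lesions = [90, 60, 0, 30]
example_normals = [0, 40]
example_sensitivity = per_metastasis_sensitivity([r > 0 for r in example_lesions])
example_fom = jafroc_fom(example_lesions, example_normals)
print(example_sensitivity, example_fom)
```

The sketch illustrates why the two metrics can diverge: sensitivity only counts whether a metastasis was marked, whereas the figure of merit also rewards higher confidence in true-positive marks relative to false positives.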
When comparing reader groups, abdominal subspecialists showed significantly higher JAFROC performance because they had greater mean confidence in true-positive circumscriptions. However, there was no significant difference in sensitivity or false-positive findings among groups.
With regard to image navigation patterns, there were significant associations between sensitivity and the following six navigation variables: interpretation time, time in liver windows, time spent gazing at coronal images, coronal scrolls, number of circumscriptions, and correlated views between axial and coronal images. Interestingly, however, the number of false-positive findings also increased with these navigation variables, although not significantly. As a result, the navigation variables were not associated with improved JAFROC performance.
Eye-tracking data showed that of 377 missed metastases, 71%, 40%, and 7% had received a gaze time longer than 0.5, 2.0, and 10.0 seconds, respectively. Prior studies suggest that gaze times of less than 0.5 seconds represent search errors, and gaze times longer than 2 seconds represent classification errors.
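The gaze-time thresholds cited above can be expressed as a simple rule, sketched here in Python. The function and constant names are hypothetical, and the label "indeterminate" for gaze times between 0.5 and 2 seconds is an assumption of this sketch, since the cited thresholds do not assign that range to either error type. The counts are only approximate because they are derived from the rounded percentages reported above.

```python
SEARCH_MAX_S = 0.5          # below this, a miss is treated as a search error
CLASSIFICATION_MIN_S = 2.0  # above this, a miss is treated as a classification error


def classify_miss(gaze_seconds: float) -> str:
    """Label a missed metastasis by the gaze time it received."""
    if gaze_seconds < 0:
        raise ValueError("gaze time cannot be negative")
    if gaze_seconds < SEARCH_MAX_S:
        return "search"
    if gaze_seconds > CLASSIFICATION_MIN_S:
        return "classification"
    return "indeterminate"  # assumption: range not covered by the cited thresholds


def approximate_counts(total_missed: int, fraction_above: dict) -> dict:
    """Convert reported fractions into approximate numbers of misses."""
    return {threshold: round(total_missed * fraction)
            for threshold, fraction in fraction_above.items()}


# Figures reported in the summary: 377 misses; share with gaze time above each threshold.
reported = {0.5: 0.71, 2.0: 0.40, 10.0: 0.07}
counts = approximate_counts(377, reported)
brief_gaze_misses = 377 - counts[0.5]  # misses with gaze time of at most 0.5 s
print(counts, brief_gaze_misses)
```

Under these assumptions, roughly 109 of the 377 misses received at most 0.5 seconds of gaze, while about 151 received more than 2 seconds.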
The present study is of interest as it demonstrates the impact of various factors on interreader variability, which might affect our everyday performance. Certain image navigation patterns are associated with higher sensitivity. The majority of missed metastases received at least a brief eye gaze, indicating that visual search was not the dominant source of error.
Limitations of this study include the limited accuracy of the eye tracker, the selective investigation of only difficult-to-detect hepatic metastases, and the observational nature of the study. Therefore, the present findings may not be transferable to other tasks in routine clinical practice and may not fully address the problem of interreader variability as a whole.
Nevertheless, the authors offer novel and comprehensive insights into the challenging field of interreader variability and raise awareness of it in abdominal imaging. Future work can integrate these findings into a more effective training program for radiologists. Furthermore, AI algorithms could be developed to reduce search errors for subtle lesions and classification errors for unclear liver lesions.
References:
1. Tsurusaki M, Numoto I, Oda T, et al. Assessment of liver metastases using CT and MRI scans in patients with pancreatic ductal adenocarcinoma: Effects of observer experience on diagnostic accuracy. Cancers. 2020;12(6):1455.
2. Marin D, Catalano C, De Filippis G, et al. Detection of hepatocellular carcinoma in patients with cirrhosis: added value of coronal reformations from isotropic voxels with 64-MDCT. American Journal of Roentgenology. 2009;192(1):180-7.
3. Pillai PS, Hsieh S, Holmes III D, et al. Individualized and generalized learner models for predicting missed hepatic metastases. Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment, 2022. SPIE.
4. Brunyé TT, Drew T, Weaver DL, Elmore JG. A review of eye tracking for understanding and improving diagnostic interpretation. Cognitive Research: Principles and Implications. 2019;4:1-16.
Falko Ensle is a fifth-year radiology resident at the University Hospital Zurich. His interests in imaging cover a wide spectrum with a current focus on the musculoskeletal system, where he is also engaged in research projects.
Comments may be sent to: falko.ensle(at)usz.ch