Value of BI-RADS 3 Audits

Introduction

Aug 28, 2024

Screening mammography is a vital element of breast cancer detection that has helped to reduce disease mortality [1–4]. With the current screening strategy, the yearly cancer detection rate in the US is approximately five per 1000 screens, and fewer than 2% of screens prove suspicious and require biopsy [5–7]. In an effort to improve specificity, decrease cost, and reduce harm, the American College of Radiology (ACR) established the Breast Imaging Reporting and Data System (BI-RADS) category 3 (probably benign) designation, to be used for short-term surveillance instead of immediate biopsy [8–10]. The morphological criteria for BI-RADS 3 include a solitary circumscribed mass with a solid ultrasound (US) correlate, focal asymmetry without a US correlate, and grouped, round calcifications [8,9,11]. Typically, the designation of BI-RADS 3 is made after an initial diagnostic work-up and should not be assigned on a screening mammogram. The assignment of BI-RADS 3 activates a short-term (6-, 12-, and 24-month) follow-up protocol, which has been demonstrated to reduce false-positive findings at biopsy while retaining a high sensitivity for early-stage breast cancer [9].

The designation of BI-RADS 3 is meant to indicate that a finding has a 2% or less risk of malignancy [8], and a recent retrospective report of 45,202 BI-RADS 3 cases from the National Mammography Database suggests that this expectation is concordant with reality [12]. However, institution-level evidence still suggests that in practice 0.9–7.9% of BI-RADS 3 lesions are upgraded to BI-RADS 4 and sent for biopsy [9,13–15]. Additionally, because the BI-RADS 3 designation is afforded some flexibility, there is an appreciable amount of interobserver variability within each modality [16–18]. As a result, monitoring adherence to imaging criteria can be challenging, and there are relatively few established benchmarks for auditing BI-RADS 3 assignment. Herein, we share BI-RADS 3 audit results from our own institution over a four-year period and propose discrete auditing criteria that may help to establish performance benchmarks. We propose the following metrics, measured at assignment and during surveillance, which may serve as useful benchmarks:

(i) Percentage of initial BI-RADS 3 to total screens
(ii) Percentage of initial BI-RADS 3 to screen-recalled cases (BI-RADS 0)
(iii) BI-RADS 3 upgrade rates within 24 months
(iv) Positive predictive value (PPV3) of lesions biopsied within 24 months
(v) Distribution of imaging morphology assigned a BI-RADS 3 category
(vi) Cancer yield.

Materials and Methods

Our institute is a large tertiary academic medical center (NAPBC-accredited and designated a Breast Imaging Center of Excellence by the ACR) in the northeast United States with an effective catchment area of nearly 1 million individuals. This retrospective study was approved by the Institutional Review Board (IRB) and is compliant with the Health Insurance Portability and Accountability Act. The annual number of screening mammograms and the specific numbers of BI-RADS 0 and BI-RADS 3 cases were obtained from the Radiology Information System (RIS). All relevant BI-RADS 3 Medical Record Numbers (MRNs) were identified with the assistance of the institute’s translational science core. All cases were reviewed in the electronic medical records at our institution. All data were extracted and compiled in REDCap [19] by study personnel. Efforts were taken to standardize the data extraction process and to minimize inter-observer variability. A sample of ten records was collaboratively reviewed by all study personnel to standardize the extraction and compilation of data from the radiologists’ interpretations. Subsequently, the data were extracted from the remaining charts independently by four study personnel.

Subjects

The study included all women over 40 years of age recalled (BI-RADS 0) from screening and assigned BI-RADS 3 at a follow-up diagnostic evaluation from January 2014 through December 2017 at our institution. Women were included if they were assigned BI-RADS 0 on the initial screening exam, were assigned BI-RADS 3 at a diagnostic follow-up exam performed within 90 days of the screening exam, and had at least one follow-up visit in the subsequent 24-month period. Women were excluded if they were under 40 years of age at the date of their initial screening exam, if the BI-RADS 3 assessment followed a diagnostic assessment in a symptomatic patient, if the follow-up diagnostic evaluation took place more than 90 days after the screening mammogram, or if they had no evaluation in the 2-year follow-up period. The study was limited to mammographic and ultrasound evaluations only. All of the digital mammograms were performed at our multiple clinical sites on Hologic (Bedford, MA) Selenia® or Selenia® Dimensions™ units. Both full-field digital mammography (2D) and Digital Breast Tomosynthesis (DBT) techniques [20] were employed at the time of the screening examinations. There were no clearly defined criteria with regard to who was offered a 2D mammogram and who was offered a DBT study.
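The inclusion and exclusion criteria above can be expressed as a single eligibility check. The following is a minimal sketch, not the procedure used to build the study cohort: the record layout and every field name are hypothetical, and treating age 40 as eligible is an assumption based on the exclusion of women under 40.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCase:
    """One screen-recalled case; all field names are hypothetical."""
    age_at_screening: int            # years, at the initial screening exam
    screening_birads: int            # assessment on the screening exam
    diagnostic_birads: int           # assessment at the diagnostic follow-up
    days_to_diagnostic: int          # days from screening to diagnostic exam
    symptomatic: bool                # diagnostic work-up prompted by symptoms
    followups_within_24_months: int  # surveillance visits after BI-RADS 3


def is_eligible(case: ScreeningCase) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    return (
        case.age_at_screening >= 40          # assumption: age 40 is included
        and case.screening_birads == 0       # recalled from screening
        and case.diagnostic_birads == 3      # probably benign at work-up
        and case.days_to_diagnostic <= 90    # work-up within 90 days
        and not case.symptomatic             # screening-detected only
        and case.followups_within_24_months >= 1
    )


example = ScreeningCase(56, 0, 3, 30, False, 2)
print(is_eligible(example))
```

Writing the criteria as one conjunction makes it easy to see that failing any single condition excludes a case, which mirrors the "or" structure of the exclusion list.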

All breast ultrasounds were performed on a Philips (Bothell, WA) iU-22 unit by a dedicated breast sonographer, and when necessary, the radiologist also personally scanned the patient. At our institute, BI-RADS 3 cases are evaluated at 6 months (ipsilateral breast), 12 months (bilateral) and 24 months (bilateral). At each time point, supplemental ultrasound was also performed as indicated. The data abstracted from the chart included the patient age at the time of BI-RADS 3 designation and whether the preceding BI-RADS 0 mammogram was their baseline. We also recorded whether the BI-RADS 3 designation was made via diagnostic mammogram, ultrasound, or both. The radiologist who assigned the BI-RADS 3 designation, the breast density category (A-D), the quadrant-based location, and the morphology of the BI-RADS 3 finding from mammography and ultrasound were recorded. The presence of follow-up imaging at 6, 12, and 24 months was recorded and was used to calculate the follow-up rate. If a patient was deemed lost to follow-up at 24 months, the last known finding was recorded. If a biopsy was completed, the time (in months) after BI-RADS 3 assignment, the modality used for image guidance, and the histopathologic findings from the biopsied specimen were all captured.

Statistical Methods

The quantitative measures in this study are all reported as proportions/percentages. The Clopper-Pearson exact 95% confidence interval was computed for each proportion. One-sample tests of proportions were used to determine whether the quantitative metrics differed from values reported in the literature. All tests were two-tailed. Effects associated with p<0.05 were considered statistically significant. All analyses were conducted using statistical software (SAS version 9.4, SAS Institute, Inc., Cary, NC).
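For readers without access to SAS, the Clopper-Pearson exact interval can be sketched in plain Python. This is an illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, and the bounds are found by bisection on the binomial tail probabilities rather than through the beta-distribution quantiles that statistical packages typically use.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided (1 - alpha) confidence interval for k successes in n."""

    def solve(func) -> float:
        # func is increasing in p on (0, 1); bisect for func(p) = 0.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if func(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


# Example: 26 cancers among 1,037 BI-RADS 3 cases.
low, high = clopper_pearson(26, 1037)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

Bisection is slower than a closed-form quantile call but needs only the standard library, which keeps the sketch self-contained.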

Results

Demographics

A total of 135,765 screening exams were performed during the four-year period, of which 13,453 were recalled (Figure 1). A total of 1,360 women were assigned BI-RADS 3, of whom 1,037 met the study eligibility criteria. There were 24 unique radiologists who assigned the BI-RADS 3 category during the study period. Eight of the 24 radiologists were fellowship-trained in breast imaging; each of these eight assigned 50 or more BI-RADS 3 studies, and together they accounted for 93% (n=969) of all included BI-RADS 3 cases. The mean age at time of initial BI-RADS 3 assignment was 56.6 ± 11.1 years, with a range of 40–94 years (Table 1). For 165 (15.9%) women, the BI-RADS 0 mammogram that preceded their BI-RADS 3 assignment was the patient’s first mammogram. In terms of breast density, nearly half (49.6%, n=514) of all of the breasts studied were category B, followed by 37.1% (n=385) in category C, 8.29% (n=86) in category A, and 4.82% (n=50) in category D.

Figure 1: Flowchart describing the assignment and follow-up of probably benign findings and the associated quantitative metrics for clinical practice management.

Table 1: Patient demographics, prior mammograms, and breast density of BI-RADS 3 patients (n=1,037).

BI-RADS 3 Features: Morphology, Laterality and Location

Nearly all (95.9%, n=994) of the BI-RADS 3 cases were assigned BI-RADS 3 on either mammogram/DBT alone or mammogram/DBT with ultrasound. The remainder (3.95%, n=41) of cases were assigned BI-RADS 3 on ultrasound (Table 2). The imaging morphology breakdown of the 1,037 cases was asymmetry/architectural distortion (n=512, 49%), grouped calcifications (n=398, 38%), and non-calcified circumscribed mass (n=90, 9%). The remaining 37 BI-RADS 3 cases (4%) were called at the discretion of the radiologist, and the electronic records did not document the classic descriptors for a BI-RADS 3 assessment. The assignment of BI-RADS 3 lesions was relatively even between sides, with 49.8% (n=516) in the left breast, 44.6% (n=462) in the right breast, and 5.70% (n=59) of cases bilateral. The upper outer quadrant had the greatest number of lesions in both the right (n=232, 38.0%) and the left (n=195, 35.3%) breasts, followed by the subareolar/central region in the right (n=140, 22.9%) and left (n=115, 20.8%) breasts.

Table 2: Imaging characteristics including modality that resulted in BI-RADS 3, lesion location and lesion morphology, and follow-up.

Follow-up of BI-RADS 3 Lesions

The follow-up rate at 6 months was 97.1% (1,007/1,037) and decreased progressively to 95.8% (979/1,022) at 12 months and 86.6% (876/1,011) at 24 months (Table 2). The denominator is adjusted for lesion downgrade due to benign pathology from biopsy at a prior follow-up. Among the 1,037 BI-RADS 3 patients, 7.4% (n=77) underwent biopsy: 23/77 (30%) at 6 months, 40/77 (52%) at 12 months, and 14/77 (18%) between 18 and 24 months. A majority of the biopsies (n=47, 61%) were performed under ultrasound guidance and the remainder (n=30, 39%) using stereotactic mammography.
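The effect of the adjusted denominators on the follow-up rates can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the "unadjusted" rates, which divide by the full cohort of 1,037 at every time point, are computed here purely for comparison, and the variable names are illustrative.

```python
# (patients with follow-up imaging, adjusted denominator) at each time point,
# copied from the follow-up counts reported in the text.
followup = {6: (1007, 1037), 12: (979, 1022), 24: (876, 1011)}
cohort = 1037  # all eligible BI-RADS 3 cases


def pct(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


rates = {
    months: {
        "adjusted": round(pct(seen, denom), 1),
        "unadjusted": round(pct(seen, cohort), 1),
    }
    for months, (seen, denom) in followup.items()
}

for months, r in rates.items():
    print(f"{months:>2} months: adjusted {r['adjusted']}%, "
          f"unadjusted {r['unadjusted']}%")
```

The two versions coincide at 6 months and diverge by roughly one to two percentage points at 12 and 24 months, so an audit should state which denominator it uses.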

Table 3: Quantitative benchmarks for clinical practice management.

Quantitative Benchmarks

The quantitative benchmarks suggested for routine clinical practice management are summarized in Table 3. The percentage of initial BI-RADS 3 to total screens was 0.76% (1,037/135,765) and the percentage of initial BI-RADS 3 to screen-recalled cases (BI-RADS 0) was 7.7% (1,037/13,453). Within the 24-month follow-up period, the BI-RADS 3 upgrade rate was 7.4% (77/1,037). Among the 77 lesions biopsied within 24 months following BI-RADS 3 assignment, there were 26 malignancies, resulting in a positive predictive value (PPV3) of 33.8% (26/77). Among the 26 cancers, 62% (n=16) were biopsied under ultrasound guidance, while 38% (n=10) were biopsied under stereotactic mammography. The cancer yield within the 24-month follow-up period was 2.51% (26/1,037). Among these 26 cancers, 30.8% (8/26) were detected at 6 months, 57.7% (15/26) at 12 months and 11.5% (3/26) at 18–24 months. The most frequently identified cancer type was ductal carcinoma in situ (DCIS), with 46% (12/26) of the cases. This was followed by invasive ductal carcinoma (IDC) at 42% (n=11) and invasive lobular carcinoma (ILC) at 12% (n=3).
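The benchmark ratios above follow directly from five counts, so a practice could compute them from its own audit data with a small helper. The sketch below is one possible layout; the class and function names are hypothetical, and the example values are the counts reported in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditCounts:
    """Raw counts for a BI-RADS 3 audit period (names are illustrative)."""
    screens: int    # total screening exams
    recalls: int    # screen-recalled cases (BI-RADS 0)
    birads3: int    # eligible initial BI-RADS 3 assignments
    biopsied: int   # BI-RADS 3 lesions upgraded and biopsied within 24 months
    cancers: int    # malignancies among the biopsied lesions


def audit_metrics(c: AuditCounts) -> dict:
    """Return the proposed benchmark metrics as percentages."""
    return {
        "birads3_per_screens": 100.0 * c.birads3 / c.screens,
        "birads3_per_recalls": 100.0 * c.birads3 / c.recalls,
        "upgrade_rate": 100.0 * c.biopsied / c.birads3,
        "ppv3": 100.0 * c.cancers / c.biopsied,
        "cancer_yield": 100.0 * c.cancers / c.birads3,
    }


study = AuditCounts(screens=135765, recalls=13453, birads3=1037,
                    biopsied=77, cancers=26)
for name, value in audit_metrics(study).items():
    print(f"{name}: {value:.2f}%")
```

Keeping the raw counts separate from the derived percentages lets the same helper be reused for each audit period or for each interpreting radiologist.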

Discussion

The purpose of introducing the BI-RADS 3 categorization in the BI-RADS atlas [8] was to reduce the harms of screening by decreasing the number of false-positive biopsies and reducing the cost of health care while maintaining sensitivity for early detection of breast cancers. Although the BI-RADS atlas specifies the probability of cancer in this subset as 2% or less, there has been no established routine audit in recent times for various clinical practice settings [17,21]. We therefore conducted a retrospective review of our own data as a quality assurance project to better guide clinical practice management. In our study over a 4-year period of 1,037 BI-RADS 3 cases following an inconclusive (BI-RADS 0) screening mammogram, the cancer yield was 2.5% (n=26) during the 2-year surveillance period. The observed cancer yield was not statistically different (p=0.243) from the 2% probability of malignancy described in the BI-RADS atlas. Our cancer yield did not significantly differ from the 1.86% cancer yield reported by Berg, et al. [12] (p=0.123) but was significantly higher than the 1.47% reported by Michaels, et al. [21] (p=0.006), the 1.02% reported by Lehman, et al. [22] (p<0.001), and the 0.8% reported by Baum, et al. [23] (p<0.001).
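The comparisons with published rates can be illustrated with a two-sided one-sample z-test of a proportion. The sketch below is a sketch under assumptions: it assumes the normal approximation with the standard error evaluated at the reference proportion, which is one common form of this test and may not be the exact variant run in SAS, and the function name is hypothetical.

```python
import math


def one_sample_proportion_test(successes: int, n: int, p0: float) -> tuple:
    """Two-sided z-test of an observed proportion against a reference p0.

    Assumes the normal approximation with the standard error computed
    under the null hypothesis (that the true proportion equals p0).
    """
    p_hat = successes / n
    se = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Observed cancer yield (26/1,037) against the 2% BI-RADS expectation.
z, p = one_sample_proportion_test(26, 1037, 0.02)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Under these assumptions the test against 2% gives a p-value close to the reported 0.243, and the test against 1.86% gives one close to the reported 0.123.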

Among the 26 cancers detected within the 2-year follow-up period, 8/26 (30.8%) were detected within the first 6 months, which supports the value of the short-term (6-month) follow-up. The ratio in our series was different from that of Berg, et al. [12], where 58% of cancers were identified at 6 months (p=0.005). During the first 12 months of follow-up, 23/26 (89%) cancers were detected, which is comparable to the 73% reported by Chung, et al. [24] (p=0.076). In keeping with multiple prior studies [11,12,21], DCIS was the most common cancer type in our series at 12/26 (46%). There were 11/26 (42%) invasive ductal carcinomas and 3/26 (12%) invasive lobular carcinomas in our series. The invasive cancers were early-stage cancers. In our study, during the 2-year surveillance, 77/1,037 (7.4%) cases were upgraded to BI-RADS 4/5 and were biopsied. This rate was higher than the 5.9% reported by Michaels, et al. [21] (p=0.037) and the 0.88% reported by Vizcaino, et al. [15] (p<0.001). The positive predictive value (PPV3) in our series was 26/77 (34%), which is larger than the 16.6% in Berg, et al. [12] (p<0.001) and comparable to the 25% in Michaels, et al. [21] (p=0.076). In our study, the proportion of BI-RADS 3 to the number of recalls (BI-RADS 0) was 10.1% (1,360/13,453) among all women and 7.7% (1,037/13,453) among study-eligible women. In our literature search on PubMed, we could not identify any publication that reported on the use of this metric. We suggest including this metric as part of routine audits for clinical practice management.

To establish a benchmark across different practice settings, there is a need for sharing recent data from varied clinical settings (academic and private, dedicated and non-dedicated breast imaging practices). The indices referred to above could serve as useful benchmarks for a practice’s quality assurance. Age, ethnicity, lack of transport, education, and cost of care all result in disparities and barriers that contribute to poor follow-up. Poor compliance with follow-up would directly impact the cancer yield in BI-RADS 3 cases. While the literature [12,21,23,24] describes loss to follow-up as a major concern, in our series the follow-up rates were good, with 97% at 6 months, 94% at 12 months and 84% at 24 months (computed against all 1,037 cases; the Results report 95.8% and 86.6% at 12 and 24 months using adjusted denominators). In Michaels, et al. [21], the compliance with follow-up progressively declined from 83% at 6 months to 54% at 24 months. In Baum, et al. [23], the studied cohort had only 71% compliance with follow-up. The current edition of the BI-RADS atlas clearly discourages assignment of BI-RADS 3 from a screening examination without a complete diagnostic work-up. However, prior literature did not make that clear distinction [12]. The BI-RADS atlas clearly outlines the morphology criteria for assignment of BI-RADS 3 on mammogram, ultrasound and MRI; however, it also mentions that the radiologist’s experience and discretion could determine the assignment.

The distribution of the different morphologies contributing to a BI-RADS 3 assignment in our study was as follows: asymmetry/focal asymmetry/architectural distortion, 49% (512/1,037); microcalcifications, 38% (398/1,037); non-calcified circumscribed mass on mammogram, ultrasound, or both, 9% (90/1,037); and assignments at the discretion of the interpreting radiologist without one of the above descriptors in the report, 4% (37/1,037). In most studies [13,14,15,21], calcifications accounted for greater than 50% of the BI-RADS 3 assignments, except in Varas, et al. [14], where calcifications accounted for only 19% of the BI-RADS 3 assignments. Institutional policies, reader variability and access to care may be contributing to these differences. Also, a radiologist’s experience and fellowship training may influence interpretation [18]. Dedicated fellowship-trained breast imagers and general radiologists performing breast imaging are known to differ in their evaluation and assessment of breast lesions [17,18]. The literature also mentions varying cancer yields depending on whether dedicated breast imagers or general radiologists interpret breast exams [2,18,21]. The majority of the BI-RADS 3 cases at our facility were reviewed by dedicated fellowship-trained breast imagers. Another factor contributing to variability that has been recently reported is the patient’s age, with cancer yield exceeding 2% for women older than 60 years of age [25].

Also, after the introduction of DBT, there is literature indicating better visualization of architectural distortions, some of which lack an ultrasound correlate [26]. During the early stages of DBT adoption in clinical practice, there was a lack of a DBT-guided biopsy device and hence a lack of consensus among radiologists on the management of these lesions. Further, there is also variability among radiologists [16] in terms of lesion descriptors that could contribute to variability in assigning the BI-RADS 3 category. Ambinder, et al. [18] refer to the decreasing incidence of BI-RADS 3 after DBT implementation. All of these factors contribute to inter-reader and inter-facility variability and have resulted in wide variability across practices in the assignment of BI-RADS 3 as a percentage of the total screens. We feel that larger data sets from across the country may help define benchmarks, so that practices can review their policies when their own figures deviate substantially from those benchmarks.

Limitations

Our study had limitations. The study was retrospective in nature. Only mammographic and ultrasound features were considered. Prior to mid-2016, when we acquired the capability to perform tomosynthesis-guided biopsies, architectural distortions without an ultrasound correlate were assigned BI-RADS 3 at our institute. On review of our records, architectural distortion and asymmetry, though distinct morphologies, were sometimes used interchangeably in the report. Hence, we merged the two categories for analysis rather than attempt to distinguish them. We did not specifically account for downgrades to BI-RADS 1 and 2 during follow-up, which likely represent a very small proportion, since a majority of our breast imagers continue to follow up cases assigned BI-RADS 3 for the entire 24-month surveillance period.

Conclusion

Audit of BI-RADS 3 metrics has the potential to provide additional insights for clinical practice management. Many of the criteria referred to in this paper (cancer yield, BI-RADS 3 as a percentage of screens and as a percentage of BI-RADS 0, distribution of the morphology of BI-RADS 3 assignments, upgrade rates, positive biopsy rates) may serve a useful role in monitoring clinical practice and in establishing the optimal range for the appropriate use of the BI-RADS 3 category. Larger data sets from varied clinical settings, with input from an expert committee, could help establish benchmarks for these metrics.