World Journal of Endocrine Surgery
Volume 13 | Issue 3 | Year 2021

A 3-year Comparative Audit of Thyroid Nodule Ultrasound and Cytology Using TIRADS and Bethesda Scoring Systems

Reyaz M Singaporewalla1 (https://orcid.org/0000-0003-4145-6488), Bryan Wei S Seet2, Anil D Rao3, Venkateswaran Kotamma4

1,3Endocrine Surgical Service, Department of Surgery, Khoo Teck Puat Hospital, Singapore

2Department of Medicine, Lee Kong Chian School of Medicine, Singapore

4Department of Pathology, Khoo Teck Puat Hospital, Singapore

Corresponding Author: Reyaz M Singaporewalla, Endocrine Surgical Service, Department of Surgery, Khoo Teck Puat Hospital, Singapore, e-mail: reyazm@yahoo.com


Introduction: Thyroid imaging reporting and data system (TIRADS) scoring is gaining popularity around the world among clinicians and radiologists. With any new practice, it is important to audit the results among different specialists and check concordance with the gold standard. We compared TIRADS scoring and thyroid nodule cytology data of two different surgeons to determine concordance and accuracy.

Materials and methods: A retrospective analysis of records of patients with thyroid nodules managed under two specialist surgeons from 2016 to 2018 was performed, comparing surgeon-performed ultrasound (US) TIRADS grading with the Bethesda cytology classification. TIRADS 2 and 3 lesions were correlated to Bethesda II (benign) results, and TIRADS 5 lesions were correlated to the Bethesda V and VI (malignant) categories. Data were also compared with our previously published audit in 2015.

Results: Of a total of 254 thyroid nodules assessed over a 3-year period, 208 cases (82%) were reported as benign disease and correlated to the TIRADS 2 and 3 grade given by the surgeon. Five were reported as malignant, which matched the TIRADS 5 score. The overall concordance rates for surgeons 1 and 2 were 79.7% and 87.2%, respectively. For each separate category, TIRADS 5 had the highest concordance at 100%, followed by TIRADS 3 at 91.9% and TIRADS 2 at 91.7%. None of the TIRADS 2 lesions were malignant. Sensitivity was similar at 71.1% compared to 70.6% in the previous audit.

Conclusion: With proper training in TIRADS, fairly good correlation with cytology can be achieved for benign and malignant thyroid nodules among different surgeons, and the need for FNAC in TIRADS 2 lesions can be avoided.

Clinical significance: This study shows the importance of surgical audit. Good concordance rates can be achieved among different specialists with proper training.

How to cite this article: Singaporewalla RM, Seet BWS, Rao AD, et al. A 3-year Comparative Audit of Thyroid Nodule Ultrasound and Cytology Using TIRADS and Bethesda Scoring Systems. World J Endoc Surg 2021;13(3):75-80.

Source of support: Nil

Conflict of interest: None

Keywords: Audit, Bethesda classification, Concordance, FNAC, Thyroid nodule, Ultrasound TIRADS


The thyroid imaging reporting and data system (TIRADS) classification1 and the Bethesda system2 for reporting thyroid nodule cytology are risk stratification tools used by clinicians and pathologists, respectively, in the assessment and characterization of thyroid nodules (Table 1). With the endorsement of the American College of Radiology,3,4 TIRADS is now being used more frequently in radiological reporting. The easy availability of ultrasound (US) machines has allowed more thyroid specialists to become trained and adept in using clinic-based thyroid US for diagnosis and guided needle biopsies. Thyroid US has now become an integral part of the evaluation of any patient presenting with a thyroid swelling, along with history and physical examination. It is therefore imperative to be well versed and trained in interpreting TIRADS categories and deciding the need for needle biopsies appropriately. The practice of performing blind needle biopsies of palpable thyroid lumps is no longer recommended in most institutions. With any new practice, it is important to audit the quality of US TIRADS results and compare them with the gold standard, US-guided fine needle aspiration cytology (FNAC).5 The purpose of this audit was to perform a 3-year clinicopathological correlation of data from two different thyroid surgeons and to compare the results with a previously published single-surgeon audit performed in the same department in 2015.6

Table 1: TIRADS and Bethesda System for Reporting Thyroid Cytology

Table 1a: TIRADS classification4

| Category | Description | Risk of malignancy (%) |
| --- | --- | --- |
| 1 | Normal thyroid gland | 0.0 |
| 2 | Benign | 0.0 |
| 3 | Probably benign | 1.7 |
| 4 | Can be subclassified into 4a, 4b, 4c based on the number of suspicious features | 3.3–72.4 |
| 4a | Low suspicion for malignancy; one suspicious feature | 3.3 |
| 4b | Intermediate suspicion for malignancy; two suspicious features | 9.2 |
| 4c | Moderate concern but not classic for malignancy; three or four suspicious features | 44.4–72.4 |
| 5 | Highly suggestive of malignancy; five suspicious features | 87.5 |

Suspicious features are: solid component, hypoechogenicity (especially marked hypoechogenicity), microlobulated or irregular margins, microcalcifications, and taller-than-wide shape.

Table 1b: The Bethesda System for Reporting Thyroid Cytology2

| Bethesda category | Diagnostic category | Risk of malignancy (%) |
| --- | --- | --- |
| I | Nondiagnostic or unsatisfactory | 5–10 |
| II | Benign | 0–3 |
| III | Atypia of undetermined significance / Follicular lesion of undetermined significance | 10–30 |
| IV | Follicular neoplasm / Suspicious for follicular neoplasm | 25–40 |
| V | Suspicious for malignancy | 50–75 |
| VI | Malignant | 97–99 |


A 3-year retrospective audit from January 2016 to December 2018 was conducted for all first-visit cases of nodular goiters seen in the endocrine surgical outpatient clinic. Data were gathered from patient records under two surgical consultants from an existing database after Institutional Review Board (IRB) permission for the audit. All solid and complex thyroid nodules not previously evaluated and subjected to US-guided FNAC in our service with an accompanying Bethesda score were included in the study. All patients underwent surgeon-performed bedside thyroid US, and nodules were classified using TIRADS criteria. Both surgeons used the same ultrasound machine model (General Electric GE Logiq version P5) in the outpatient clinic to record their TIRADS score. Cases suitable for biopsy underwent US-guided FNAC. All cytology request forms had the ultrasound findings and the surgeon's TIRADS score recorded so that an objective comparison of the pre-FNAC US TIRADS score and the FNAC Bethesda classification result could be performed. Patients with symptomatic cystic nodules (TIRADS 2) in whom only US-guided needle decompression was performed, with no cytology sent, were excluded from the study. Table 2 shows the correlation of TIRADS score to Bethesda categories used for the analysis of results. TIRADS 1 cases (normal thyroid gland), previously operated thyroid patients, and patients referred to us with biopsy-proven thyroid cancer or benign pathology were excluded from this audit.

Table 2: Correlation of TIRADS and Bethesda categories

| TIRADS | Correlated to Bethesda |
| --- | --- |
| 2 (Benign); 3 (Probably benign) | Category 2 (Benign thyroid nodule) |
| 4 (Indeterminate or suspicious lesions) | Category 3 (follicular lesion of indeterminate significance); Category 4 (follicular neoplasm) |
| 5 (Highly suspicious of malignancy) | Category 5 (suspicious for malignancy); Category 6 (malignancy) |

The FNAC samples were prepared with ThinPrep Cytolyt solution and transported immediately to the laboratory for assessment. No on-site cytotechnician support was used in this audit. All FNAC results were reviewed and confirmed by two experienced pathologists trained in thyroid cytology who strictly followed the Bethesda classification system in their reports.


In total, 254 thyroid nodules from 239 patients were assessed. Of these, 208 cases were reported as benign disease (Bethesda Class II). These comprised 36 cases with a TIRADS 2 score and 172 cases with a TIRADS 3 score. There were 41 cases reported as TIRADS 4. All five cases classified as TIRADS 5 (highly suspicious of malignancy) were reported as cancers (Bethesda VI). Figure 1 shows the detailed breakdown of cases and concordance rates excluding final histology.

Figs 1A to C: (A) Overall data for surgeon 1 and 2 excluding final histology. (B) Overall data for surgeon 1 excluding final histology. (C) Overall data for Surgeon 2 excluding final histology

The overall concordance rate was 81.1%, with sensitivity and specificity for predicting malignancy of 71.1% and 100.0%, respectively. Positive predictive value and negative predictive value were 100% and 99.0%, respectively. The overall concordance rates for surgeons 1 and 2 were 79.7% and 87.2%, respectively. For each separate category, TIRADS 5 had the highest concordance at 100%, followed by TIRADS 3 at 91.9% and TIRADS 2 at 91.7%. TIRADS 4 had the lowest concordance at 24.4%.
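For reference, the four accuracy measures quoted above follow from a standard 2 × 2 table of the US prediction against the reference result. The sketch below is illustrative only: the names `DiagnosticMetrics` and `diagnostic_metrics` are hypothetical, and the counts in the usage lines are made-up values, not the audit's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Compute accuracy measures from the four cells of a 2 x 2 table.

    tp: test positive, reference malignant
    fp: test positive, reference benign
    fn: test negative, reference malignant
    tn: test negative, reference benign
    """

    def ratio(numerator: int, denominator: int) -> float:
        # An empty denominator leaves the measure undefined.
        return numerator / denominator if denominator else float("nan")

    return DiagnosticMetrics(
        sensitivity=ratio(tp, tp + fn),  # malignant nodules correctly flagged
        specificity=ratio(tn, tn + fp),  # benign nodules correctly cleared
        ppv=ratio(tp, tp + fp),          # flagged nodules that are malignant
        npv=ratio(tn, tn + fn),          # cleared nodules that are benign
    )


# Hypothetical counts for illustration only (NOT taken from the audit).
example = diagnostic_metrics(tp=8, fp=2, fn=2, tn=88)
print(f"Sensitivity: {example.sensitivity:.1%}")
print(f"Specificity: {example.specificity:.1%}")
print(f"PPV: {example.ppv:.1%}")
print(f"NPV: {example.npv:.1%}")
```

A specificity and positive predictive value of 100%, as reported here, correspond to a table with no false positives (`fp = 0`).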

The probabilities of a malignant FNAC (Bethesda category V or VI) in TIRADS categories 2, 3, 4, and 5 were 0, 1.2, 7.3, and 40%, respectively, while the probabilities of a benign FNAC (Bethesda category II) in TIRADS 2, 3, 4, and 5 were 91.7%, 91.9%, 61.0%, and 0%, respectively.

A large percentage of the thyroid nodules assessed (208 nodules; 81.9%) were reported as TIRADS 2 and 3, of which 191 cases were confirmed by FNAC to be benign, showing 91.8% correlation. Of the 36 cases classified as TIRADS 2, three were reported as Bethesda category III/IV on cytology. Two of these patients (Bethesda Class IV) underwent surgery for confirmation, and the final histology in both cases turned out to be benign nodular goiter with hyperplastic change. This was consistent with the pre-FNAC TIRADS 2 score given by the surgeon. The remaining case (Bethesda Class III) underwent repeat US-guided FNAC at 3 months, which was again reported as Bethesda Category III. Subsequent diagnostic lobectomy confirmed the lesion to be benign in this case too. Hence, if histology is taken into consideration, all our TIRADS 2 reported cases were benign.

For TIRADS 3 lesions, there were a total of 172 cases, of which 158 were Bethesda category II (benign), 12 were Bethesda category III or IV, and two were reported as Bethesda category V. Within the Bethesda II category, 35 cases underwent surgery (due to a nodule diameter of >4 cm), of which 33 had a benign histology; in the other two cases, final histology turned out to be papillary thyroid carcinoma and follicular variant of papillary thyroid carcinoma, respectively. Within the Bethesda III category, four of the six cases underwent repeat FNAC at 3 months, which was still reported as Bethesda class II, and subsequent surgery confirmed benign thyroid nodules in all of them. The remaining two patients declined repeat FNAC. The six cases with Bethesda IV classification all underwent diagnostic hemithyroidectomy, of which five had a benign final pathology and the last case was found to be papillary thyroid carcinoma. Within the Bethesda V category, one of the two patients decided to undergo surgery, which confirmed papillary thyroid carcinoma, while the other patient sought a second opinion elsewhere.

Under the TIRADS 4 category, there were a total of 41 cases, of which 25 patients classified as TIRADS 4A were Bethesda category II. Of the 10 patients classified as TIRADS 4B, six were classified as Bethesda category III, of whom four were lost to follow-up and the remaining two declined repeat FNAC and chose observation. The other four patients, classified as Bethesda IV, underwent surgery, and the final histology comprised two follicular carcinomas and two thyroid lymphomas. The final six patients, classified as TIRADS 4C, were all Bethesda category V and were confirmed to have papillary thyroid cancer at surgery.

In the TIRADS 5 category, only five of the 254 nodules were reported as such. All five cases had a malignant FNAC (Bethesda category VI), showing 100% correlation. Of the five cases, four had papillary thyroid carcinoma and one had medullary thyroid carcinoma on final histology.
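The category-level and overall concordance rates can be re-derived from the counts given in the preceding paragraphs. The following is a minimal sketch, assuming the TIRADS-to-Bethesda correlation of Table 2; the names `COUNTS`, `EXPECTED`, `concordance`, and `overall_concordance` are hypothetical, and the 33 benign TIRADS 2 results are derived as 36 minus the three Bethesda III/IV cases.

```python
# Cytology counts per TIRADS category, taken from the Results text.
# Bethesda III and IV are pooled, as are V and VI, because Table 2
# correlates each pair to the same TIRADS category.
COUNTS = {
    "2": {"II": 33, "III/IV": 3, "V/VI": 0},
    "3": {"II": 158, "III/IV": 12, "V/VI": 2},
    "4": {"II": 25, "III/IV": 10, "V/VI": 6},
    "5": {"II": 0, "III/IV": 0, "V/VI": 5},
}

# Table 2: the Bethesda group each TIRADS category is correlated to.
EXPECTED = {"2": "II", "3": "II", "4": "III/IV", "5": "V/VI"}


def concordance(tirads: str) -> float:
    """Fraction of nodules in one TIRADS category with the expected cytology."""
    row = COUNTS[tirads]
    return row[EXPECTED[tirads]] / sum(row.values())


def overall_concordance() -> float:
    """Fraction of all nodules whose cytology matched the Table 2 mapping."""
    concordant = sum(COUNTS[t][EXPECTED[t]] for t in COUNTS)
    total = sum(sum(row.values()) for row in COUNTS.values())
    return concordant / total


for category in COUNTS:
    print(f"TIRADS {category}: {concordance(category):.1%}")
print(f"Overall: {overall_concordance():.1%}")
```

Under these assumptions the sketch gives 91.7%, 91.9%, 24.4%, and 100.0% for TIRADS 2 to 5, and 81.1% overall (206 of 254 nodules).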

Table 3 shows overall concordance rates among the two surgeons and the yearly trend. Of note, surgeon 1 stopped performing FNAC for TIRADS 2 lesions from 2017 onwards and hence there is no concordance with cytology in 2017 and 2018. The current results were compared to the previous audit in 2015 by Singaporewalla et al.6 and achieved higher specificity (100% compared to 90.4%), positive predictive value (100% compared to 60%), and negative predictive value (99% compared to 93.8%) for predicting malignancy. Our sensitivity rates were almost identical at 71.1% compared to 70.6% after including the data of surgeon 2 who was newly trained.

Table 3: Yearly concordance rates

Table 3a: Concordance for surgeons 1 and 2 excluding final histology

| Year | Overall concordance (%) | TIRADS 2 concordance (%) | TIRADS 3 concordance (%) | TIRADS 4 concordance (%) | TIRADS 5 concordance (%) |
| --- | --- | --- | --- | --- | --- |
| 2016 | 86.1 | 89.7 | 88.1 | 40.0 | 100 |
| 2017 | 81.3 | 100 | 92.6 | 33.3 | 100 |
| 2018 | 76.2 | 100 | 93.5 | 11.1 | 100 |
| Overall | 81.8 | 91.7 | 91.9 | 24.4 | 100 |

Table 3b: Concordance for surgeon 1 excluding final histology

| Year | Overall concordance (%) | TIRADS 2 concordance (%) | TIRADS 3 concordance (%) | TIRADS 4 concordance (%) | TIRADS 5 concordance (%) |
| --- | --- | --- | --- | --- | --- |
| 2016 | 86.2 | 90.0 | 89.5 | 25.0 | 100 |
| 2017 | 80.6 | - | 96.2 | 33.3 | 100 |
| 2018 | 72.9 | - | 94.1 | 11.1 | 100 |
| Overall | 79.7 | 90.0 | 93.7 | 22.5 | 100 |

Table 3c: Concordance for surgeon 2 excluding final histology

| Year | Overall concordance (%) | TIRADS 2 concordance (%) | TIRADS 3 concordance (%) | TIRADS 4 concordance (%) | TIRADS 5 concordance (%) |
| --- | --- | --- | --- | --- | --- |
| 2016 | 85.7 | 88.9 | 75.0 | 100 | - |
| 2017 | 84.2 | 100 | 80.0 | - | - |
| 2018 | 92.9 | 100 | 90.9 | - | - |
| Overall | 87.2 | 93.75 | 83.3 | 100 | - |

A dash (-) indicates that the surgeon did not classify any nodules as that TIRADS category for that year


As thyroid ultrasound has become an established, routine part of the clinical evaluation of nodular goitres among all specialists, having an objective, clinically relevant sonography report becomes more important. Ambiguous radiological reporting that requests clinical correlation of the US findings is slowly being replaced by more objective reporting using the TIRADS classification, which helps clinicians decide which thyroid nodule justifies a biopsy. The TIRADS system is in this sense somewhat similar to the BIRADS reporting system that is well established in evaluating breast lumps, although the latter is based on mammograms. As more specialists are now beginning to use bedside US machines in their clinics and to perform their own guided needle biopsies, it is essential to have a proper audit of the clinical quality of US TIRADS reporting and to benchmark it against the gold standard (FNAC or histology). This becomes a very useful learning and training tool and helps maintain clinical quality standards, especially across different disciplines and within a service. A uniformly accepted or standardized radiological reporting format based on TIRADS has yet to be implemented in many countries. Even among different radiologists in our institution, there is significant variation in the way thyroid ultrasounds are reported. Although the Bethesda Classification system2 revolutionized the way thyroid FNAC reporting was done worldwide, we have yet to see radiologists implementing TIRADS universally in their ultrasound reporting. A learning curve and retraining of radiologists in the TIRADS system are possible hurdles to wider implementation. In this 3-year audit, there was also a yearly breakdown of the concordance rates of both surgeons (Table 3) to look for trends showing any significant deviation. The TIRADS 4 category had a poor concordance rate when we attempted to correlate it to Bethesda class III and IV.
Subanalysis of the different TIRADS 4 classifications for surgeon 1 clarified the picture: 25 of the 41 cases, all reported as Bethesda Class II, were classified as TIRADS 4A, which carries only a 3.3% malignant risk. Another 10 of the 41 TIRADS 4 patients were classified as TIRADS 4B (9.2% malignant risk), and among these six were reported as Bethesda III and four as Bethesda IV. These numbers were used to calculate the concordance rate. The remaining six of the 41 cases were classified as TIRADS 4C (44–72% malignant risk), indicating a higher suspicion of malignancy on ultrasound; all were reported as Bethesda V and VI and turned out to be cancers in the final histology. One limitation of our study was that we calculated a single overall concordance rate for all TIRADS 4 subcategories. The low number of thyroid cancers in our study group was also a limitation but probably reflects the overall prevalence in our population, where benign nodular goitres are far more common. Additionally, the numbers of cases were disproportionate between the two surgeons, as surgeon 2 had just started his practice. Overall, however, there was good correlation of TIRADS with the Bethesda system in our audit for the benign categories, and this has also been seen in other studies that have attempted to correlate TIRADS with cytology findings.7,8 Recently, Mendes et al.9 successfully applied the TIRADS criteria to more than 1000 nodules smaller than 1 cm and compared the results with the FNAC reports. In their study, the risk of malignancy was 0.91% for TIRADS 2, 2.87% for TIRADS 3, and 85.7% for TIRADS 5.
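The effect of pooling the TIRADS 4 subcategories can be illustrated with the counts from the subanalysis above. This is a minimal sketch, assuming that only Bethesda III and IV count as concordant for any TIRADS 4 nodule (Table 2); the names `TIRADS4`, `CONCORDANT`, `subcategory_concordance`, and `pooled_concordance` are hypothetical.

```python
# Cytology results for the 41 TIRADS 4 nodules by subcategory, as
# described in the text: 4A were Bethesda II, 4B were Bethesda III or IV,
# and 4C were Bethesda V/VI.
TIRADS4 = {
    "4A": {"II": 25},
    "4B": {"III": 6, "IV": 4},
    "4C": {"V/VI": 6},
}

# Table 2 correlates every TIRADS 4 nodule to Bethesda III or IV.
CONCORDANT = {"III", "IV"}


def concordant_count(results: dict) -> int:
    """Number of nodules whose Bethesda class is III or IV."""
    return sum(n for bethesda, n in results.items() if bethesda in CONCORDANT)


def subcategory_concordance(subcategory: str) -> float:
    results = TIRADS4[subcategory]
    return concordant_count(results) / sum(results.values())


def pooled_concordance() -> float:
    matched = sum(concordant_count(r) for r in TIRADS4.values())
    total = sum(sum(r.values()) for r in TIRADS4.values())
    return matched / total


for sub in TIRADS4:
    print(f"TIRADS {sub}: {subcategory_concordance(sub):.1%}")
print(f"TIRADS 4 pooled: {pooled_concordance():.1%}")
```

Under this mapping only the 4B nodules are concordant (10 of 41, or 24.4% pooled), which shows how the single pooled figure hides the different behaviour of the 4A and 4C groups.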

TIRADS and Bethesda had the highest concordance when both scoring systems agreed on a malignant pathology (100%) (although the numbers were small) and also showed very high concordance when the scores agreed on a benign pathology (91.8%).
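The caveat about small numbers can be made concrete with a confidence interval for a proportion. The sketch below is illustrative only and was not part of the audit: the choice of the Wilson score interval and the name `wilson_interval` are assumptions made for this example, applied to the 5 of 5 and 191 of 208 proportions quoted in the text.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Malignant agreement: 5 concordant of 5 TIRADS 5 nodules.
malignant_low, malignant_high = wilson_interval(5, 5)
# Benign agreement: 191 concordant of 208 TIRADS 2 and 3 nodules.
benign_low, benign_high = wilson_interval(191, 208)

print(f"5/5:     {malignant_low:.1%} to {malignant_high:.1%}")
print(f"191/208: {benign_low:.1%} to {benign_high:.1%}")
```

With only five nodules, the lower bound for the 100% figure is roughly 57%, far wider than the interval around the benign agreement, which rests on 208 nodules.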

Intersurgeon concordance rates were more similar for TIRADS 2 cases (3.8% difference) than for TIRADS 3 cases (10.4% difference). As surgeon 2 had only one TIRADS 4 case and no TIRADS 5 cases, we were unable to compare the intersurgeon concordance rates for these categories. Sahli et al.10 compared the TIRADS variability of three blinded radiologists using intraclass correlation coefficients and found that shape and margin criteria were the biggest sources of disagreement.

When comparing year-to-year concordance for the respective surgeons, there is consistency: even though surgeon 2 had considerably less experience than surgeon 1, he was able to achieve comparable rates of concordance. This shows that with appropriate training and understanding of TIRADS, it is possible for specialists to achieve good accuracy in their US scans.

Among TIRADS 2 cases, 100% were shown to have benign pathology, based either on cytology or on histology after surgical resection. Hence, going forward, we may avoid unnecessary FNAC for TIRADS 2 cases, with the only indications for surgery being size criteria and symptoms. This view is supported by Periakaruppan et al.,8 who reported risks of malignancy for TIRADS 2, TIRADS 3, TIRADS 4, and TIRADS 5 of 0, 2.2, 38.5, and 77.8%, respectively, and concluded that FNAC can at least be safely deferred in TIRADS 2 lesions.

Interestingly, there were two cases classified under TIRADS 3 with a benign FNAC (Bethesda category II) report that underwent surgery due to large nodule size (>4 cm) and were found to be malignant on final histology. Hence, we continue to advise patients with cytologically benign nodules of more than 4 cm to consider surgical removal.

Lastly, in our audit 100% of patients had an adequate FNAC report under surgeon-performed US guidance, with zero inadequate or insufficient (Bethesda category I) results. This has remained consistent when compared to our earlier audit6 in 2015. Witt et al.11 also reported 100% accuracy and adequacy with surgeon-performed ultrasound-guided FNAC with on-site cytopathology. Initially in our learning curve we had arranged for on-site cytotechnician support but have since abandoned this additional logistic requirement. It is our belief that the majority of inadequate or insufficient Bethesda I samples are due to the absence of US guidance during biopsy, poor technique, and unnecessary needling of cystic or complex TIRADS 2 lesions to send fluid for cytology. In our unit, we have a standard policy to decompress complex cystic thyroid nodules whenever possible and reassess them in 4 weeks to determine the suitability of and need for guided FNAC from the solid component only.


Incorporation of TIRADS grading in surgeon-performed thyroid ultrasound reporting gives good clinicopathological concordance with fine needle cytology based on the Bethesda Classification system. A consistently high accuracy rate can be maintained even among different specialists after adequate structured training, and the need for FNAC in TIRADS 2 lesions can be avoided.


This study was an attempt to emphasize the importance of clinical audit among surgeons to maintain high quality standards and shows that with proper training in thyroid ultrasound and FNAC these outcomes can be maintained among different specialists.


Reyaz Moiz Singaporewalla https://orcid.org/0000-0003-4145-6488


1. Horvath E, Majlis S, Rossi R, et al. An ultrasonogram reporting system for thyroid nodules stratifying cancer risk for clinical management. J Clin Endocrinol Metab 2009;94(5):1748–1751. DOI: 10.1210/jc.2008-1724

2. Cibas E, Ali S. The Bethesda system for reporting thyroid cytopathology. Am J Clin Pathol 2009;132(5):658–665. DOI: 10.1309/AJCPPHLWMI3JV4LA

3. Grant EG, Tessler FN, Hoang JK, et al. Thyroid ultrasound reporting lexicon: White Paper of the ACR thyroid imaging, reporting and data system (TIRADS) Committee. J Am Coll Radiol 2015;12(12 Pt A):1272–1279. DOI: 10.1016/j.jacr.2015.07.011

4. Kwak JY, Han KH, Yoon JH, et al. Thyroid imaging reporting and data system for US features of nodules: a step in establishing better stratification of cancer risk. Radiology 2011;260(3):892–899. DOI: 10.1148/radiol.11110206

5. Al-azawi D, Mann GB, Judson RT, et al. Endocrine surgeon-performed US guided thyroid FNAC is accurate and efficient. World J Surg 2012;36(8):1947–1952. DOI: 10.1007/s00268-012-1592-2

6. Singaporewalla R, Hwee J, Lang T, et al. Clinico-pathological correlation of thyroid nodule ultrasound and cytology using the TIRADS and Bethesda classifications. World J Surg 2017;41(7):1807–1811. DOI: 10.1007/s00268-017-3919-5

7. Vargas-Uricoechea H, Meza-Cabrera I, Herrera-Chaparro J. Concordance between the TIRADS ultrasound criteria and the BETHESDA cytology criteria on the nontoxic thyroid nodule. Thyroid Res 2017;10:1. DOI: 10.1186/s13044-017-0037-2

8. Periakaruppan G, Seshadri K, Vignesh Krishna G, et al. Correlation between ultrasound-based TIRADS and Bethesda system for reporting thyroid-cytopathology: 2-year experience at a tertiary care center in India. Indian J Endocrinol Metab 2018;22(5):651–655. DOI: 10.4103/ijem.IJEM_27_18

9. Mendes GF, Garcia MR, Falsarella PM, et al. Fine needle aspiration biopsy of thyroid nodule smaller than 1.0 cm: accuracy of TIRADS classification system in more than 1000 nodules. Br J Radiol 2018;91(1083):20170642. DOI: 10.1259/bjr.20170642

10. Sahli ZT, Sharma AK, Canner JK, et al. TIRADS interobserver variability among indeterminate thyroid nodules: a single-institution study. J Ultrasound Med 2019;38(7):1807–1813. DOI: 10.1002/jum.14870

11. Witt RL, Sukumar VR, Gerges F. Surgeon-performed ultrasound-guided FNAC with on-site cytopathology improves adequacy and accuracy. Laryngoscope 2015;125(7):1633–1636. DOI: 10.1002/lary.25214

© The Author(s). 2021 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted use, distribution, and non-commercial reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.