Currently, there is no robust indicator within the Cone-Beam Computed Tomography (CBCT) DICOM headers as to which anatomical region is present on the scan. This poses a problem for CBCT-based algorithms trained on specific body regions, such as auto-segmentation and radiomics tools used in the radiotherapy workflow. We propose an anatomical region labeling (ARL) algorithm to classify CBCT scans into four distinct regions: head & neck, thoracic-abdominal, pelvis, and extremity.
Materials and methods
Algorithm training and testing were performed on 3,802 CBCT scans from 596 patients treated at our radiotherapy center. The ARL model, which consists of a convolutional neural network, makes use of a single CBCT coronal slice to output a probability of occurrence for each of the four classes. ARL was evaluated on the test dataset composed of 1,090 scans and compared to a support vector machine (SVM) model. ARL was also used to label CBCT treatment scans for 22 consecutive days as part of a proof-of-concept implementation. A validation study was performed on the first 100 unique patient scans to evaluate the functionality of the tool in the clinical setting.
ARL achieved an overall accuracy of 99.2% on the test dataset, outperforming the SVM (91.5% accuracy). Our validation study has shown strong agreement between the human annotations and ARL predictions, with accuracies of 99.0% for all four regions.
The high classification accuracy demonstrated by ARL suggests that it may be employed as a pre-processing step for site-specific, CBCT-based radiotherapy tools.
Cone-beam computed tomography (CBCT) is commonly used for radiotherapy image guidance because it facilitates accurate and precise positioning and alignment of the patient. In real-time adaptive radiotherapy, the CBCT may also be used to adapt the treatment plan based on the new target location and size, and the position of organs at risk (OARs). In this case, the delineation of the target(s) and OARs may be required on the CBCT scan prior to the plan adjustment [
]. However, these algorithms are typically anatomical region-specific and assume the presence of the organs-of-interest irrespective of the body region inputted to the algorithm.
The recognition of the global body region may be useful as a pre-processing step for these tools, such that they are applied only to body regions within their domain. However, this step is often neglected due to the assumption that the anatomy information is present in the Digital Imaging and Communications in Medicine (DICOM) headers. While a ‘Body Part Examined’ tag is indeed present in the DICOM headers of the planning Computed Tomography (CT), it has been shown that this information is not very reliable, with a mis-labeling rate of 15.3% [
]. Furthermore, these pre-defined labels are driven by the acquisition protocol. Due to the variability and differences among the patients' anatomies, an imaging protocol for a different body region may be used by the clinical personnel in order to obtain better image quality. While the header can be adjusted following the CT acquisition, this is not commonly done in the clinic, which may lead to a wrong body region label [
]. Additionally, this ‘Body Part Examined’ tag may be completely absent in the CBCT DICOM headers, as is the case at our institution, highlighting the need for an automatic region-labeling algorithm to recognize the global patient anatomy and treatment region.
Several algorithms have recently been proposed for the classification of anatomical regions in CT and MRI scans [
] achieved the highest classification accuracy of 97.3% on their test dataset composed of 663 CT scans. These previous studies showcase the potential of deep learning techniques for such region labeling problems. Nevertheless, if these techniques are used as a pre-processing step for other clinical tools, which have their own intrinsic error rates, it is imperative to minimize the pre-processing error rate as much as possible to improve the reliability of the labeling tool and reduce the failure rate of the overall algorithm. Hence, it is vital to continuously identify and address the limitations of such region labeling tools.
One common characteristic and limitation of the previous studies is that they have all been developed and tested on CT and MR images, which typically have improved image quality as compared to pre-treatment CBCT images [
]. Hence, classifying CBCT images may become a challenge as fewer useful features and more artifacts may be present on the CBCT scan for accurate region labeling. CBCT scans also have a small field-of-view (FOV), which is usually restricted to the treatment region only, making a consecutive body part recognition algorithm as in [
To address this current limitation, we propose a CNN-based anatomical region labeling (ARL) tool which can classify a CBCT scan into four global regions, namely head & neck (HN), thoracic-abdominal (TA), pelvis (PL) and extremity (EX) using a single coronal slice from the CBCT volume. To the best of our knowledge, this will be the first region labeling algorithm built specifically for pre-treatment CBCT scans.
2. Materials and methods
2.1 Dataset for model training and testing
Under an IRB-approved protocol (UID 18–001430), CBCTs were collected from 631 patients undergoing radiotherapy treatment at the University of California, Los Angeles Medical Center (UCLA) between January 2017 and April 2022. The dataset collection was performed using an in-house DICOM query and retrieval (DQR) application programming interface built with the pynetdicom Python package. The treatments at UCLA had been performed on three TrueBeam and one NovalisTx linear accelerator treatment machines (Varian Medical Systems, California, United States). CBCT scans were acquired using the on-board imager of each machine. For each CBCT, the corresponding planning CT, REG file and RTStruct file were also collected and used during the image pre-processing step in our implementation.
A visual inspection of the treatment isocenter was performed to sort the CBCT scans into four different global regions: head & neck (HN), thoracic-abdominal (TA), pelvis (PL), and extremity (EX). The C7 vertebral body was used as the limit of the HN region such that the CBCT scan contained only the head and neck, as shown in Supplementary Figure S1. However, in the clinical setting, it is possible to have neck scans containing part of the thorax. For the first part of our experiment, which included model training and testing, scans with substantial overlapping regions were withdrawn from our dataset to maintain the distinction between each category. Following the triage, 3,802 CBCT scans from 596 patients remained, as described in Supplementary Table S1. The limits of the TA region were the T1 vertebra and the L2 vertebra, avoiding the neck and pelvis regions. For the PL scans, the L3 and S2 vertebrae were used as markers, avoiding the abdominal region and the area below the pubic symphysis. Scans of the arms, legs and extremity of the shoulder were placed in the EX dataset.
Each of the four datasets was then separately and randomly split into a training, validation and test set using a 60:10:30 ratio. As scans from multiple treatment fractions were used in our study, the dataset split was performed based on the patients’ unique anonymized identifiers to avoid having scans from the same patient overlapping across the training, validation, and test sets.
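The patient-level split described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name, the seed, and the choice to apply the 60:10:30 ratio to patients rather than to scans are assumptions. In the study's setting it would be called once per anatomical region.

```python
import random


def split_by_patient(scan_patient_ids, ratios=(0.6, 0.1, 0.3), seed=0):
    """Assign scans to train/val/test so that no patient spans two sets.

    scan_patient_ids: dict mapping a scan id to an anonymized patient id.
    The ratio is applied to the list of unique patients (an assumption).
    """
    patients = sorted(set(scan_patient_ids.values()))
    random.Random(seed).shuffle(patients)

    n_train = round(ratios[0] * len(patients))
    n_val = round(ratios[1] * len(patients))

    # Map every patient to exactly one subset; the remainder goes to test.
    subset_of = {}
    for i, pid in enumerate(patients):
        if i < n_train:
            subset_of[pid] = "train"
        elif i < n_train + n_val:
            subset_of[pid] = "val"
        else:
            subset_of[pid] = "test"

    # All scans (treatment fractions) follow their patient.
    splits = {"train": [], "val": [], "test": []}
    for scan_id, pid in scan_patient_ids.items():
        splits[subset_of[pid]].append(scan_id)
    return splits
```

Because every fraction follows its patient, the scan-level proportions will generally deviate somewhat from 60:10:30 when patients have different numbers of scans.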
2.2 Image Pre-processing
The pixel spacing of the CBCT scans ranged from 0.51 to 1.17 mm and the slice thickness from 1 to 2.5 mm. The CBCT scans were resampled based on their corresponding planning CT to produce uniform images with a voxel spacing of 1 × 1 × 1.5 mm³. In our pipeline, this resampling and volume matching were performed using the REG file present with the CT-CBCT pair. However, the CBCT resampling can be performed independently of the planning CT and REG file for other applications of the ARL. Furthermore, the treatment couch and immobilization devices were removed from the CBCT image using the body contour present in the RTStruct file. In the event that a body contour was not present in the RTStruct file, a thresholding method, including a morphological dilation followed by erosion, was used to extract the body contour from the CBCT. The dilation and erosion operations used 20 × 20 and 5 × 5 pixel rectangular structuring elements, respectively.
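The fallback body-contour extraction can be sketched with `scipy.ndimage`. The structuring-element sizes follow the text. The threshold value, the air fill value, the function names, and the choice to apply the operation to one 2D slice at a time are assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage

AIR_HU = -1000.0  # assumed fill value for pixels outside the body


def body_mask_from_threshold(slice_hu, threshold_hu=-300.0):
    """Binary body mask for one 2D slice: threshold, dilate, then erode.

    threshold_hu is a hypothetical value; the text does not state the one used.
    """
    mask = slice_hu > threshold_hu
    # 20 x 20 rectangular dilation bridges gaps and fills small cavities.
    mask = ndimage.binary_dilation(mask, structure=np.ones((20, 20), dtype=bool))
    # 5 x 5 rectangular erosion trims part of the dilated margin.
    mask = ndimage.binary_erosion(mask, structure=np.ones((5, 5), dtype=bool))
    return mask


def remove_outside_body(slice_hu, mask):
    """Replace everything outside the mask with air."""
    return np.where(mask, slice_hu, AIR_HU)
```

Since the dilation element is larger than the erosion element, the resulting mask extends slightly beyond the thresholded body, which keeps the skin surface inside the contour.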
A coronal slice was then extracted from each CBCT present in our dataset and each image was labeled using their corresponding global region. The primary coronal slice was extracted by locating the CBCT slice with the highest mean Hounsfield Unit (HU). This slice-selection method was chosen such that the coronal slice would cover the whole extent of the patient scan while containing considerable bony structures (higher HU) which can be useful features in the recognition of the anatomical region.
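The slice-selection rule amounts to an argmax over per-slice mean intensity. The sketch below assumes the volume is a NumPy array in Hounsfield Units and that axis 1 is the anterior-posterior axis, so that a coronal slice is `volume[:, y, :]`. Both the axis convention and the function name are illustrative assumptions.

```python
import numpy as np


def primary_coronal_index(volume_hu, coronal_axis=1):
    """Index of the coronal slice with the highest mean HU."""
    other_axes = tuple(a for a in range(volume_hu.ndim) if a != coronal_axis)
    mean_hu_per_slice = volume_hu.mean(axis=other_axes)
    return int(np.argmax(mean_hu_per_slice))
```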
For training purposes, two additional slices were extracted from the CBCT scans in the training and validation datasets, each located 10 pixels away from the primary coronal slice: one in the anterior direction and the other in the posterior direction. The extraction of these two extra slices was performed as an augmentation method during model training to account for the inter-patient variability in anatomy which can be present on the primary coronal slice. The slices were then cropped about the center of the patient body to reduce empty spaces around the body and obtain 150 × 400 pixel images, as shown in Fig. 1, and used as input to our ARL model.
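A possible implementation of the neighboring-slice augmentation and the body-centered crop is sketched below. Several details are assumptions not stated in the text: the axis convention (axis 1 is anterior-posterior), clipping of out-of-range indices, the body center taken as the centroid of pixels above a hypothetical threshold, padding with air, and reading 150 × 400 as (rows, columns).

```python
import numpy as np

AIR_HU = -1000.0  # assumed padding value


def training_slices(volume_hu, primary_idx, offset=10, coronal_axis=1):
    """Primary coronal slice plus one slice `offset` pixels to either side."""
    n = volume_hu.shape[coronal_axis]
    indices = (primary_idx - offset, primary_idx, primary_idx + offset)
    # Out-of-range indices are clipped to the volume (an assumption).
    return [np.take(volume_hu, int(np.clip(i, 0, n - 1)), axis=coronal_axis)
            for i in indices]


def crop_about_body_center(slice_hu, out_shape=(150, 400), body_threshold=-300.0):
    """Crop (and pad with air if needed) around the centroid of the body."""
    rows, cols = np.nonzero(slice_hu > body_threshold)
    if rows.size == 0:  # empty slice: fall back to the image center
        center = (slice_hu.shape[0] // 2, slice_hu.shape[1] // 2)
    else:
        center = (int(rows.mean()), int(cols.mean()))

    out = np.full(out_shape, AIR_HU, dtype=slice_hu.dtype)
    src, dst = [], []
    for c, size, full in zip(center, out_shape, slice_hu.shape):
        start = c - size // 2
        lo, hi = max(start, 0), min(start + size, full)
        src.append(slice(lo, hi))                   # region read from the input
        dst.append(slice(lo - start, hi - start))   # where it lands in the output
    out[tuple(dst)] = slice_hu[tuple(src)]
    return out
```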
2.3 Anatomical region labeling (ARL) model
The ARL model used the Dense-Net architecture as shown in Supplementary Figure S2. The ARL model makes use of densely contracting paths to capture contextual information from the CBCT coronal image before outputting a probability of occurrence for each of the four classes. The Dense Block in our architecture consists of two densely connected layers, each comprising seven layers. The two densely connected layers in the Dense Block were connected to each other in a feed-forward mode to maximize feature reuse, which has been shown to be computationally efficient, hence allowing a deeper network [
], with a starting learning rate of 2 × 10⁻⁵. During training, the model was evaluated on the validation dataset after each epoch, and a learning rate reducer (0.75) was applied if the validation loss did not decrease for 15 consecutive epochs. To avoid overfitting the model on the training dataset, an early stopping method was applied such that training would stop if the validation accuracy did not improve for 50 consecutive epochs, or after a maximum of 400 training epochs. For comparison purposes, we also trained a Support Vector Machine (SVM) [
], and saved the parameters which produced the highest accuracy on the validation set.
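The learning-rate reduction and early-stopping rules described for the ARL training can be expressed as a small framework-independent routine. This is a sketch under assumptions: the value 0.75 is treated as a multiplicative factor, "no decrease" and "no improvement" are measured against the best value seen so far, and the function name is hypothetical. The text does not say which training framework or callbacks were used.

```python
import math
from itertools import islice


def run_schedule(val_losses, val_accs, lr=2e-5, factor=0.75,
                 lr_patience=15, stop_patience=50, max_epochs=400):
    """Replay per-epoch validation results; return (epochs_run, final_lr)."""
    best_loss, best_acc = math.inf, -math.inf
    loss_wait = acc_wait = 0
    epochs_run = 0

    for loss, acc in islice(zip(val_losses, val_accs), max_epochs):
        epochs_run += 1

        # Learning-rate reduction on a validation-loss plateau.
        if loss < best_loss:
            best_loss, loss_wait = loss, 0
        else:
            loss_wait += 1
            if loss_wait >= lr_patience:
                lr *= factor
                loss_wait = 0

        # Early stopping on a validation-accuracy plateau.
        if acc > best_acc:
            best_acc, acc_wait = acc, 0
        else:
            acc_wait += 1
            if acc_wait >= stop_patience:
                break

    return epochs_run, lr
```

With a completely flat validation curve, this routine reduces the learning rate three times (at epochs 16, 31 and 46) before stopping at epoch 51.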
2.5 Evaluation metrics and quality control
After the trained ARL model and SVM were applied to the test dataset, the true positive (tp), false positive (fp), false negative (fn), and true negative (tn) counts were obtained for each anatomical region. Subsequently, the four metrics shown in Equations 1–4 were used to evaluate and compare the performance of our models.
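Per-region tp, fp, fn and tn counts for a four-class problem are usually obtained in a one-vs-rest fashion from the confusion matrix, as sketched below. Equations 1–4 are not reproduced in this text, so the four metrics computed here (accuracy, precision, recall and F1 score) are an assumption about a typical choice, not a statement of the metrics the study used. The class ordering and function names are also illustrative.

```python
import numpy as np

CLASSES = ("HN", "TA", "PL", "EX")  # assumed integer encoding 0..3


def one_vs_rest_counts(y_true, y_pred, n_classes=len(CLASSES)):
    """Return per-class (tp, fp, fn, tn) arrays from integer labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # rows: truth
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn
    return tp, fp, fn, tn


def per_class_metrics(tp, fp, fn, tn):
    """Assumed metric set; classes with no samples would divide by zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```

Under this one-vs-rest convention each misclassified scan counts as an error for two regions (a false negative for the true region and a false positive for the predicted one), which is why per-region accuracies can all exceed the overall accuracy.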
To obtain visual explanations of the model’s prediction, the Gradient-weighted Class Activation Mapping (Grad-CAM) [
] was implemented. This Grad-CAM method uses the gradients of the predicted class score with respect to the final convolutional layer of the ARL model to produce a heat map describing the regions which contributed the most to the activation of the predicted anatomical region.
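The core Grad-CAM computation is compact once the feature maps of the final convolutional layer and the gradients of the predicted class score with respect to them are available. The NumPy sketch below assumes both are given as arrays of shape (K, H, W). Obtaining them depends on the deep learning framework, which the text does not specify, and the final normalization to [0, 1] is a common display convention added here as an assumption.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from (K, H, W) feature maps and matching gradients."""
    # Channel weights: spatial average of the gradients for each feature map.
    weights = gradients.mean(axis=(1, 2))                # shape (K,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = np.tensordot(weights, feature_maps, axes=1)    # shape (H, W)
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map has the spatial resolution of the last convolutional layer and would be upsampled to the 150 × 400 input size before being overlaid on the coronal slice.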
2.6 Clinical implementation and validation
Using our in-house DQR system to interface with the ARIA system (Varian Medical Systems, Palo Alto, CA), the ARL was implemented at our clinic to automatically classify incoming CBCT data on a daily basis for 22 consecutive days between August and September 2022 as part of a pilot process for an automated weekly chart check image analysis [
]. For validation purposes, the predictions for the first 100 unique patients were compared with human annotations. Without any information about the predictions of the ARL, each of the 100 unique scans was visually analyzed and labeled by a human observer to obtain the ground truth label.
However, in contrast to the dataset used during model training and testing, this validation dataset did not exclude scans containing overlapping regions, such as neck and thorax, or abdomen and pelvis. Hence, the ground truth labels were obtained by identifying the dominant region (i.e. the region encompassing the majority of the CBCT scan). Furthermore, any other less pronounced region(s), if present, were noted as ‘less-pronounced region(s)’. For example, for a neck treatment scan containing mostly the neck and part of the thorax, the region would be labeled as HN, with the ‘less-pronounced region(s)’ being TA. Following the human annotations, the predictions from the ARL were compared with their respective ground truth labels and the model performance was evaluated.
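The comparison against dominant and less-pronounced labels can be illustrated with a short scoring routine. The data layout (one tuple per scan) and the function name are hypothetical. The sketch only shows how a prediction that misses the dominant region can still be checked against the annotated overlapping regions.

```python
def score_against_annotations(cases):
    """cases: iterable of (prediction, dominant_region, less_pronounced_regions)."""
    cases = list(cases)
    dominant_hits = sum(pred == dom for pred, dom, _ in cases)
    # Mismatches where the prediction equals an annotated overlapping region.
    overlap_hits = sum(pred != dom and pred in minor for pred, dom, minor in cases)
    return {
        "n": len(cases),
        "dominant_accuracy": dominant_hits / len(cases),
        "mismatch_explained_by_overlap": overlap_hits,
    }
```

For the neck example in the text, a scan annotated as `("HN", {"TA"})` and predicted as `"TA"` would count as a dominant-region mismatch that is explained by overlap.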
3. Results
3.1 Model training and evaluation
During the algorithm training, the ARL model achieved convergence after 49 epochs with training and validation accuracies of 99.8% and 99.3%, respectively. Following the testing phase, the ARL model resulted in 9 misclassifications out of the 1,090 test cases, for an overall accuracy of 99.2%. Selected true-positives and misclassifications are shown in Fig. 2 and Fig. 3, respectively.
For the SVM, a polynomial kernel was found to produce the best fit, with training and validation accuracies of 96.0%. Following testing, the SVM obtained an overall accuracy of 91.5%. Using Student's paired t-tests, the difference between the results of the ARL model and the SVM was found to be statistically significant (p-value < 0.0001). The detailed results obtained from the ARL model and SVM are reported in Table 1.
Table 1. Performance of the Anatomical Region Labeling (ARL) model and the Support Vector Machine (SVM) on the 1,090 test cases. The results are shown for each of the four global regions separately. Bold text represents the better result between the two models.
HN: Head & Neck, TA: Thoracic-abdominal, PL: Pelvis, EX: Extremity, ARL: anatomical region labeling model, SVM: support vector machine.
3.2 Validation of the proof-of-concept implementation
During 22 consecutive treatment days between August and September 2022, 798 patient scans were processed and classified by the ARL algorithm. The validation dataset was composed of the first 100 unique patient scans, which were labeled by a human observer and are described in Supplementary Table S2.
The ARL prediction for each of the 100 cases was compared to its respective ground truth label (dominant region), and the results of this validation study are reported in Table 2. Out of the 100 individual cases, two cases had an ARL prediction-ground truth mismatch. However, it was found that each of these two cases had overlapping regions present on the CBCT scan, and the ARL prediction matched with the referenced less-pronounced regions, as shown in Fig. 4.
Table 2. Performance of the Anatomical Region Labeling (ARL) model on the 100 cases used for clinical validation. The results are shown for each of the four global regions separately.
HN: Head & Neck, TA: Thoracic-abdominal, PL: Pelvis, EX: Extremity.
The ARL model presented in this study has shown high classification ability for each of the four global regions (HN, TA, PL and EX), with accuracies of 99.9%, 99.4%, 99.6%, and 99.4% respectively, outperforming the SVM model in all four regions. As compared to other CNN-based anatomy recognition algorithms developed by Roth et al. and Ouyang et al., which achieved the highest reported accuracies in the literature (94.1% and 97.3%, respectively) [
], our ARL resulted in a better performance with an overall accuracy of 99.2%. However, a direct comparison between those methods is not the primary aim of this study, as different imaging modalities, numbers of classes, and imaging planes have been used in each method. Nevertheless, the high classification accuracy produced by the ARL model demonstrates the feasibility of applying such a deep learning tool to pre-treatment CBCT scans to identify the global anatomical region.
Fig. 2 shows the input coronal slices of 12 true-positive cases with the Grad-CAM activation heat map of the ARL model overlaid on the CBCT slice. It can be observed that the regions which activated the model are in the vicinity of the craniovertebral junction for HN cases, the spine, abdominal organs and the ribs for the TA cases, and the pelvic bones for PL cases. As for the extremity cases, the model was activated by the empty regions around the patient anatomy. While this may not be the most logical and robust way of identifying extremity cases to a human observer, this feature is a characteristic of most extremity scans. However, it must be noted that the number of extremity cases in the training dataset was limited, which may be the source of the decrease in performance for EX classification.
Out of the 1,090 scans, 9 scans were wrongly classified by the ARL model. Fig. 3 illustrates some misclassified cases, with their corresponding activation maps overlaid on top. It can be observed from Fig. 3(a) and 3(b) that the limited FOV resulted in a wrong classification of the thorax as an extremity due to the empty spaces around the patient. In Fig. 3(c), the presence of metal artifacts may have been the cause of the misclassification, as shown by the heatmap. A potential solution would be to use activation gates [
] within the ARL model such that it focuses on targeted regions instead of irrelevant regions on the image.
Nevertheless, our proof-of-concept implementation and validation study have shown that the ARL predictions correlate with the human observer annotations, with accuracies of 99.0% for all four global regions. Out of the 100 cases, two cases had an ARL prediction-dominant region mismatch, as shown in Fig. 4. However, it can be observed that the ARL prediction was still consistent with the overlapping region present on the scan in both cases. The results of this validation study hence reinforce the relevance and ability of the ARL tool to label CBCT images from daily treatments.
To be more robust across the entire patient population, the current algorithm could be further refined to accommodate outlier cases, such as extremity treatment scans. However, these types of CBCT scans are seen more sporadically in the clinical setting due to the rare occurrence of soft tissue sarcomas [
], leading to too few cases for optimal model training or refinement. Furthermore, the ARL was trained and tested on only a single institution’s data. To validate and improve the generalizability of the ARL on other facilities’ datasets, a multi-institutional study needs to be performed, which will be part of our future studies.
Another limitation of the current ARL is that it uses a single 2D coronal slice which contains limited anatomical information as compared to the whole 3D image. A 3D Dense-Net [
]. However, training a 3D CNN is computationally expensive and the inference time of the tool would be higher with our current system. With the increased availability of high-performance Graphics Processing Units, this 3D method may be feasible in the future.
In this work, a CNN-based Anatomical Region Labeling (ARL) tool was developed to classify pre-treatment CBCT scans into four regions, namely head & neck, thoracic-abdominal, pelvis, and extremity. Our results have shown strong agreement between the model predictions and human annotations for all four regions, confirming the strong performance of the model. The ARL algorithm may be employed in the clinical setting as a pre-processing step for radiotherapy tools which have been developed for pre-treatment CBCTs containing specific anatomical regions, such as auto-segmentation algorithms, patient setup error detection algorithms, and radiomics tools for early treatment response assessment. Furthermore, the tool may be used as a quality assurance check by comparing the model’s prediction to the treatment site to avoid wrong-site radiotherapy treatment.