Cone beam computed tomography based image guidance and quality assessment of prostate cancer for magnetic resonance imaging-only radiotherapy in the pelvis

Highlights
• MRI-only IGRT accuracy is ≤2 mm as compared to CT, but significant differences were observed.
• MRI-only CBCT-based IGRT seems feasible, but caution is advised.
• The median absolute error (MeAE) is proposed for independent verification of the sCT quality.
• A MeAE around 0.1 in mass density could call for sCT quality inspection.


Introduction
Radiotherapy (RT) planned on magnetic resonance imaging (MRI) only, commonly referred to as MRI-only RT, is currently implemented clinically for the pelvic region, mainly focusing on prostate cancer [1][2][3][4]. A major focus and active area of research within MRI-only RT is the development of methods that convert the MRI into a synthetic computed tomography (sCT) image needed for dose planning and possibly for image guidance (IGRT) purposes [5,6]. Following the trends in related areas such as computer vision and medical imaging, much attention has recently been given to deep learning convolutional neural network techniques [7][8][9][10], and commercial solutions are currently available for clinical use [11,12].
The vast majority of the MRI-only RT literature has focused on methods for generating the sCT and the corresponding dose planning performance as compared to the CT-based clinical standard. Literature on MRI-only IGRT and independent verification on the sCT quality in the absence of the CT, however, is much more sparse. As a consequence, no clinical guidelines on markerless cone beam CT (CBCT) IGRT and routine sCT quality checks exist for MRI-only RT.
In MRI-only RT, the CBCT attracts attention for sCT assessment since it is the only independent measurement of a CT-like image available. For sCT quality verification, Palmer et al. used the CBCT of the first fraction to assess the dosimetric accuracy of the sCT comparing identical dose calculations based on sCT, CT and CBCT images [13]. sCT-CT and sCT-CBCT dose differences were found to be ≤ 1%.
For MRI-only IGRT, Kemppainen et al. compared sCT-CBCT and CT-CBCT based registrations for different pelvic cancers and found the difference to be ≤ 0.5 mm [14]. One CBCT from a randomly selected fraction was used for 10 patients and both bone and soft-tissue based registrations were included in the study. A similar agreement of ≤ 1 mm and 1° was obtained for 5 prostate patients based on 10 CBCTs of each patient, registering the sCT to the CBCT on the volume around the prostate only [15].
In a previous study for the brain, we investigated whether the CBCT could reliably be used to assess the sCT quality and IGRT accuracy in MRI-only RT [16]. The study investigated MRI-CBCT, sCT-CBCT and CT-CBCT differences and demonstrated a promising potential to assess the agreement with a corresponding CT-based RT. Given the current clinical implementation of MRI-only RT in the pelvis, one goal of this study was to examine the general applicability of the previous CBCT-based method for this more challenging anatomical region. In line with [15], we further included more CBCTs of some patients for the IGRT investigation to address the accuracy of CBCT-based MRI-only IGRT. Given that MRI-only RT has been adopted into clinical practice, our aims were 1) to provide data for establishing the overall agreement of markerless IGRT in the pelvis for MRI-only RT as compared to a standard CT-based workflow, and 2) to provide novel suggestions for a feasible, implementable quantitative assessment of sCT image quality.

Imaging and pre-processing
The CT and MRI scans of ten prostate cancer RT patients were retrospectively included in the study. Informed consent for the use of their data was obtained from the patients. The CT scans were acquired using a standard protocol for pelvic scans (Brilliance Big Bore, Philips Medical Systems, Cleveland, OH, 120 kVp, 232-503 mAs). The voxel resolution was between 0.78 × 0.78 × 2.00 mm and 1.4 × 1.4 × 2.0 mm for an in-plane matrix of 512 × 512 voxels and 129-199 slices. The MRI scans were obtained with a T1-weighted sequence, TE/TR = 10/623 ms, on a 1 T open scanner (Panorama HFO, Philips Medical Systems, Cleveland, OH) using a bridge body coil. The voxel resolution was 0.8 × 0.8 × 4.0 mm, for an in-plane matrix between 528 × 528 and 640 × 640 voxels and 16-24 slices. The MRI had a large field of view (FOV) to include the outer body contour needed for sCT generation. The patients were positioned in treatment position using the same fixation devices during both the MRI and CT scanning. In addition, a CBCT scan was obtained for each patient during the course of RT. The CBCTs were acquired with the On-Board Imager (OBI) system mounted on the linac (models iX and TrueBeam, Varian Medical Systems Inc., Palo Alto, CA, USA) using an abdominal scanning protocol of 125 kV and 659-1049 mAs with a resolution of 0.9 × 0.9 × 2.0 mm or 1.2 × 1.2 × 2.0-2.5 mm for an in-plane matrix of 512 × 512 or 384 × 384 voxels, respectively. Eight and nine weekly CBCTs were further included for patients 7 and 9, respectively.
The MRI was deformably (non-linearly) aligned with the corresponding CT using the elastix software and checked by visual inspection [17]. The MRI was then re-sampled to match the CT resolution. An sCT was generated from each patient's MRI using a patch-based approach trained on the non-linearly co-registered MRI and CT multi-atlas of the other nine patients [18][19][20]. For each MRI voxel, a 3D subvolume of voxels (a patch) was extracted and the most similar patches in the multi-atlas were found using the L2-normalized intensity distance. A weighted average of the corresponding CT values in the multi-atlas was then assigned to the MRI voxel. The sCT resolution was identical to that of the re-sampled MRI, i.e. the resolution of the CT. Full details of the sCT method can be found in [21]. For the sCT quality assessment, the CBCT was rigidly aligned with and re-sampled to the resolution of the CT. For the IGRT study, the CBCT maintained its original resolution.
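The patch-based idea can be illustrated with a deliberately simplified sketch. It is not the method of [21]: it works on 1D signals instead of 3D volumes, uses a plain Euclidean distance between patches, keeps the k nearest patches and weights them with a Gaussian kernel. The function names, the parameters `radius`, `k` and `h`, and the weighting scheme are all assumptions made for illustration.

```python
import math


def extract_patch(signal, center, radius):
    """Return the patch around `center`, clamping indices at the borders."""
    last = len(signal) - 1
    return [signal[min(max(center + off, 0), last)]
            for off in range(-radius, radius + 1)]


def l2_distance(a, b):
    """Plain Euclidean distance between two equally sized patches (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_sct(mri, atlas, radius=1, k=3, h=1.0):
    """Predict a CT-like value per MRI sample from co-registered atlas pairs.

    atlas: list of (atlas_mri, atlas_ct) pairs, each a pair of equal-length lists.
    """
    # Library of every atlas MRI patch with the CT value at its centre.
    library = []
    for atlas_mri, atlas_ct in atlas:
        for j in range(len(atlas_mri)):
            library.append((extract_patch(atlas_mri, j, radius), atlas_ct[j]))

    sct = []
    for i in range(len(mri)):
        patch = extract_patch(mri, i, radius)
        # Keep the k most similar atlas patches.
        nearest = sorted((l2_distance(patch, p), ct) for p, ct in library)[:k]
        # Gaussian weights on the patch distance (hypothetical choice).
        weights = [math.exp(-(d / h) ** 2) for d, _ in nearest]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: fall back to a uniform average.
            weights = [1.0] * len(nearest)
            total = float(len(nearest))
        sct.append(sum(w * ct for w, (_, ct) in zip(weights, nearest)) / total)
    return sct
```

In a leave-one-out setting such as the one described above, `atlas` would hold the pairs of the other nine patients and `mri` the scan of the patient being converted.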

IGRT study
The CT, MRI and sCT scans were aligned with each other given the pre-processing procedure. The scans were imported into the registration workspace in Eclipse v.15.6 (Varian Medical Systems, Helsinki, Finland). Here, the scans were roughly aligned manually with the CBCT, followed by an auto-match on the bony anatomy in line with the clinical matching procedure for elective lymph node irradiation. As the sCT did not contain prostate markers, this markerless match strategy was chosen. Both translational (AP: anterior-posterior, LR: left-right, CC: cranio-caudal) and rotational (pitch, roll, yaw) displacements, i.e. 6 degrees of freedom (6 DOF), were included in the matches. For the CT and sCT reference, the bone anatomy included voxels between 100 and 4000 Hounsfield units (HU), while no intensity constraints were applied on the MRI-CBCT matches. All matches were visually inspected for acceptable agreement. The MRI-CBCT (ΔMRI) and sCT-CBCT (ΔsCT) difference relative to the CT-CBCT registration was calculated for each DOF and pooled for one CBCT from all 10 patients. The same procedure was applied to the weekly CBCTs of patients 7 and 9, respectively. A Shapiro-Wilk test showed that ΔMRI and ΔsCT were not normally distributed (p < 0.02 for all tests). A paired Wilcoxon rank-sum test was consequently performed to determine significant difference from 0 (the CT-CBCT reference) [22].
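The bookkeeping behind ΔMRI and ΔsCT can be sketched as follows. This is a minimal illustration, not the Eclipse workflow: each match is assumed to be available as a dictionary of the six DOF values, the sign convention (test match minus CT-CBCT match) is an assumption, and the function names are hypothetical. The statistical tests are left out.

```python
from statistics import mean, stdev

# Translations in mm (AP, LR, CC) and rotations in degrees (pitch, roll, yaw).
DOF = ("AP", "LR", "CC", "pitch", "roll", "yaw")


def match_difference(ct_cbct_match, test_cbct_match):
    """Difference of an MRI-CBCT or sCT-CBCT match relative to the CT-CBCT match.

    Sign convention (assumed): test match minus CT-CBCT reference match.
    """
    return {dof: test_cbct_match[dof] - ct_cbct_match[dof] for dof in DOF}


def pool_differences(differences):
    """Pool per-CBCT differences into (mean, sample std) for each DOF.

    Requires at least two entries, since the sample std is undefined otherwise.
    """
    pooled = {}
    for dof in DOF:
        values = [d[dof] for d in differences]
        pooled[dof] = (mean(values), stdev(values))
    return pooled
```

With this layout, one list of differences per pooling (all patients, or all weekly CBCTs of a single patient) yields the mean ± std values per DOF.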

sCT quality assessment
Paired sCT-CT and sCT-CBCT data were created and cropped to the (smallest) body outline of the sCT. To compensate for temporary deformations in air pockets and body outline between the sCT, CT and CBCT, which are not relevant for sCT quality assessment and hence should be eliminated or reduced to a minimum, water was assigned to the CBCT and CT air voxels (< -500 HU) if the corresponding sCT voxels were soft tissue (> -200 HU); see Fig. 1.
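The air-pocket correction amounts to a voxel-wise rule, sketched below on flattened voxel lists. The two thresholds are taken from the text; using 0 HU as the water value and the function name are assumptions of this sketch.

```python
AIR_MAX_HU = -500   # reference voxels below this value are treated as air
SOFT_MIN_HU = -200  # sCT voxels above this value are treated as soft tissue
WATER_HU = 0        # value assigned to corrected voxels (assumed: water = 0 HU)


def fill_air_with_water(reference, sct):
    """Assign water to CT/CBCT air voxels where the paired sCT voxel is soft tissue.

    reference, sct: equal-length sequences of HU values for paired voxels.
    """
    return [
        WATER_HU if ref < AIR_MAX_HU and s > SOFT_MIN_HU else ref
        for ref, s in zip(reference, sct)
    ]
```

Voxels where both images show air, or both show tissue, are left unchanged, so only the inconsistent air cavities are altered.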
CT numbers were converted to relative electron densities (RED) and mass densities (DES) using the treatment planning system (TPS) calibration curve for the CT and sCT. For the CBCT RED and DES conversion, a calibration curve based on a CBCT phantom (phan) and a paired CT-CBCT population (pop) of the 9 other patients was applied, as presented in Fig. 2. For the latter, CBCT numbers were averaged in 100 HU bins. The known RED/DES of the corresponding CT bins were then averaged and assigned to build the calibration curve with the paired CBCT bins. Bins with fewer than 100 points were disregarded. CT-CBCT pairs were aligned and corrected similarly to the sCT-CBCT pairs prior to building the CBCT pop curves.
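A population-based curve of this kind could be built roughly as sketched below. The sketch assumes paired voxel lists in which the CT value has already been converted to RED or DES with the TPS curve. The function names and the linear interpolation (with clamping outside the covered range) used to apply the curve are assumptions, not details given in the text.

```python
import math
from collections import defaultdict


def build_population_curve(cbct_hu, ct_density, bin_width=100, min_points=100):
    """Build (mean CBCT HU, mean CT density) points from paired voxels.

    cbct_hu: CBCT numbers; ct_density: RED or DES of the paired CT voxels.
    Bins with fewer than `min_points` voxels are disregarded.
    """
    bins = defaultdict(list)
    for hu, density in zip(cbct_hu, ct_density):
        bins[math.floor(hu / bin_width)].append((hu, density))

    curve = []
    for key in sorted(bins):
        points = bins[key]
        if len(points) < min_points:
            continue
        mean_hu = sum(hu for hu, _ in points) / len(points)
        mean_density = sum(d for _, d in points) / len(points)
        curve.append((mean_hu, mean_density))
    return curve


def apply_curve(curve, hu):
    """Convert a CBCT number to density by linear interpolation (assumed scheme)."""
    if not curve:
        raise ValueError("calibration curve is empty")
    if hu <= curve[0][0]:
        return curve[0][1]
    if hu >= curve[-1][0]:
        return curve[-1][1]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= hu <= x1:
            return y0 + (hu - x0) / (x1 - x0) * (y1 - y0)
```

For the patient under evaluation, the curve would be built from the voxels of the other nine patients only, mirroring the leave-one-out design described above.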
The CT numbers, RED and DES of the paired sCT-CT and sCT-CBCT data were averaged in bins of 10 HU or 0.01 RED/DES over all tissues. Bins with fewer than 100 points were again disregarded. The absolute error between the sCT-CT and sCT-CBCT data was calculated for each bin as

$$\mathrm{AE}_i = \left| \bar{x}_i^{\mathrm{sCT}} - \bar{x}_i^{\mathrm{ref}} \right|,$$

where AE_i is the absolute error between the mean values of the sCT and the reference (CT or CBCT) of the i-th bin in HU, RED or DES. The median of the absolute binned errors (MeAE) was then found as

$$\mathrm{MeAE} = \frac{\mathrm{AE}_{\lfloor (n+1)/2 \rfloor} + \mathrm{AE}_{\lceil (n+1)/2 \rceil}}{2},$$

where AE_i is an ordered list of n bins, and ⌊·⌋ and ⌈·⌉ are the floor and ceiling functions, respectively. Unlike the more commonly reported mean absolute error (MAE) metric, which averages the absolute error of all voxels and is thus biased towards the large number of water-equivalent voxels [5,23], each bin across the whole CT range contributes equally to the MeAE. The AE_i distributions were subject to a Shapiro-Wilk test and found not to be normally distributed (p < 10⁻⁸ for all patients). An unpaired Wilcoxon rank-sum test was performed to determine significant difference between the sCT-CT and sCT-CBCT AE_i distributions in HU, RED and DES space. A p-value < 0.05 was considered significant in both the IGRT and the sCT quality study.
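The binning and the median can be sketched in a few lines. The text does not state which image defines the bins, so the sketch assumes that voxels are binned on the reference (CT or CBCT) value. The function names are hypothetical, and a voxel-wise MAE is included only for contrast.

```python
import math
from collections import defaultdict


def binned_absolute_errors(sct, ref, bin_width=10, min_points=100):
    """Absolute error between the mean sCT and mean reference value per bin.

    Voxels are binned on the reference value (assumption); use bin_width=10
    for HU and bin_width=0.01 for RED/DES.
    """
    bins = defaultdict(list)
    for s, r in zip(sct, ref):
        bins[math.floor(r / bin_width)].append((s, r))

    errors = []
    for key in sorted(bins):
        points = bins[key]
        if len(points) < min_points:
            continue  # sparsely populated bins are disregarded
        mean_sct = sum(s for s, _ in points) / len(points)
        mean_ref = sum(r for _, r in points) / len(points)
        errors.append(abs(mean_sct - mean_ref))
    return errors


def meae(errors):
    """Median of the binned absolute errors via the floor/ceiling index form."""
    if not errors:
        raise ValueError("no bins left after applying the minimum point count")
    ordered = sorted(errors)
    n = len(ordered)
    lower = math.floor((n + 1) / 2)  # 1-based index
    upper = math.ceil((n + 1) / 2)   # 1-based index
    return (ordered[lower - 1] + ordered[upper - 1]) / 2


def voxel_mae(sct, ref):
    """Voxel-wise mean absolute error, shown only for comparison."""
    return sum(abs(s - r) for s, r in zip(sct, ref)) / len(sct)
```

Because every retained bin counts once, a large error confined to the comparatively few bone voxels moves the MeAE far more than it moves a voxel-wise MAE dominated by water-equivalent voxels.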

Results
The ΔsCT and ΔMRI match differences relative to the CT-CBCT match (set to zero) can be seen in Table 1. The mean difference was ≤2 mm with a maximum standard deviation (std) of 2-4 mm. This was especially pronounced for the CC direction across all patients and CBCTs for individual patients. A similar observation was seen for the pitch rotation, which had a mean around 1° and a std of 1-2°. A few outliers of 6-8 mm and 4-5° were observed for these directions. Significant difference was observed only for the AP direction of all patients. However, this pattern could not be reproduced for the CBCTs of the individual patients. Significance was similar for both ΔMRI and ΔsCT except for the LR direction of patient 9.
The MeAE of all patients are shown in Table 2. Overall, CBCT HU or phan-based RED/DES difference did not provide a similar estimate for the sCT quality as compared to the true CT-sCT difference. A CBCT pop-based RED/DES difference, however, provided this estimate in most cases, i.e. non-significance from the CT-sCT difference.

Discussion
sCT generation methods have demonstrated <0.5-1% agreement with CT-based dose calculations and it is thus questionable if further advancements in sCT generation are clinically meaningful. Hence more attention should be given to other steps in the RT chain and here, we assessed the agreement in MRI-only IGRT and sCT quality verification for the pelvis.
Overall, the average deviations between CT and MRI-only based IGRT seem acceptable whether the MRI or sCT is used as reference. However, significant differences were observed which depended on the patient cohort (Table 1, top) or the CBCT course of individual patients (Table 1, middle and bottom), and therefore no unambiguous conclusions can be drawn. Caution is therefore advised in making general IGRT statements based on a single CBCT per patient in a cohort. The magnitude of the differences and the outliers were especially pronounced for the CC and pitch directions, whereas deviations in the other directions were around 1 mm and 1° or less, in line with previous studies [14,15]. It is likely that this is caused by the relatively short longitudinal (long) MRI FOV of 64-96 mm. This could result in incorrect combinations of CC and pitch that lead to a (favored) reduction in the registration cost function that is similar to a correct one. An MRI long FOV > 100 mm is therefore suggested at the expense of increased MRI scanning time.
To reduce the influence of differences between the sCT, CT and CBCT scans not caused by the sCT generation method, we 1) aligned the anatomy through rigid registration, 2) filled inconsistent air cavities with water and 3) adjusted for HU intensity by transforming tissue voxels into (electron) densities. This seems like a clinically feasible approach given the data available, although not ideal. Deformable registration is another approach to minimize these differences but introduces additional challenges for verifying the correctness of the deformation field [24]. The MeAE metric suggests an error estimate of the sCT quality similar to a CT reference if the CBCT voxel values are transferred to RED or DES space using a population (pop) based calibration curve. This curve behaves quite differently as compared to the TPS and traditional phantom-based calibration curves (Fig. 2). A major contribution to the CBCT HU numbers is scattered radiation [25][26][27]. In the pop curves, the true patient scattering geometries result in more photons being scattered away from the detectors when crossing low-density regions and into the detectors in high-density regions, resulting in a more even curve over the CT range. Given the ever-developing reconstruction algorithms and equipment, the pop curves are likely to be dependent on vendor, model version and anatomical site, see e.g. our previous pop curve for the brain [16].
The MeAE shows a low and a high DES value of 0.04 and 0.17 for patients 1 and 8, respectively. By inspection of Fig. 3, it is clear that the bony anatomy of the sCT is much better predicted in patient 1 than in patient 8. This suggests that the MeAE metric could help flag a sCT of unacceptable quality for clinical use. A sCT-CBCT MeAE value above 0.1 DES or RED could act as an initial action level for required inspection. The corresponding MAE DES values were 0.042 and 0.048 for patients 1 and 8, respectively, leaving little room for discrimination in image quality using this metric.
In conclusion, both the MRI and sCT can be used for MRI-only CBCT-based IGRT in the pelvis, but caution is advised for longitudinal FOVs < 100 mm. The CBCT seems adequate to assess pelvic sCT quality if converted to RED or DES using a population-based calibration. A MeAE of 0.1 DES is suggested as a potential action level for inspection of sCT quality.

Table 1 Pooled ΔsCT and ΔMRI match differences for one CBCT of all patients (top), all 8 CBCTs of patient 7 (middle) and all 9 CBCTs of patient 9 (bottom). Numbers indicate mean ± 1 standard deviation in mm (translations) and degrees (rotations). Significant p-values are in italic font. MRI-CBCT (ΔMRI) and sCT-CBCT (ΔsCT) difference relative to the CT-CBCT registration. AP = anterior-posterior, LR = left-right and CC = cranio-caudal.

Table 2 The median absolute error for CT numbers in Hounsfield units (MeAE HU) of sCT-CT (CT) and sCT-CBCT (CBCT, left), relative electron densities (MeAE RED, middle) and relative mass densities (MeAE DES, right). The CBCT RED/DES conversion is either made with a phantom (CBCT phan) or population (CBCT pop) based calibration curve (see Fig. 2).