Predictors for late genitourinary toxicity in men receiving radiotherapy for high-risk prostate cancer using planned and accumulated dose

Highlights • Accumulated Bladder dose was significantly higher at the intermediate-high dose region (V30-65 Gy) compared to bladder planned dose.• Accumulated bladder dose was significantly lower at the very high dose region (V70-75 Gy) compared to the planned bladder dose.• Dose-based region of interest structures were better predictors than dose-volume metrics for the bladder.• Higher accumulated bladder dose was not translated to an increased in GU toxicity compared to planned dose.• Smaller prostate volumes have a minor protective effect on Grade 2 late GU toxicity.


Introduction
High-risk prostate cancer (HR-PCa) accounts for about 15 % of all prostate cancer diagnosis with a higher likelihood of metastatic relapse after definitive treatment [1,2]. The application of modern radiotherapy (RT) technologies indicates that dose escalation and hypofractionated regimens have the potential to improve biochemical disease-free survival in HR-PCa [3,4]. Often, the bladder constraints can be achieved on the dose-volume histogram (DVH) metrics using the planning CT (pCT) acquired when the patient follows a bladder filling protocol to displace the bowels from the high dose region [5]. However, these treatment plans generated on static pCT, do not account for volumetric variations of the bladder during treatment [6]. Reported genitourinary (GU) toxicity remains high, especially in patients with HR-PCa which requires prophylactic pelvic lymph nodes (PLNs) irradiation as the bladder is positioned between the PLNs [7].
Inter-fractional volumetric changes during the course of RT could affect the actual dose received by the target volumes and surrounding organs at risk (OARs) [8][9][10]. To date, most of the reported studies involve the prostate only as the target volume and use the dose planned (D P ) as dose-volume (DV) variables for associations with GU toxicity [11,12]. There is limited work being performed using dose accumulated (D A ) as DV variables in the setting of HR-PCa with PLNs irradiation [12,13]. Additionally, the cohort size in these published works were small. The lack of robust data in this aspect could be due to the timeconsuming and resource-intensive nature of constructing and streamlining a dose accumulation workflow [14,15]. The developed and validated dose accumulation workflow from our previous work was used to generate the accumulated dose (D A ) for the bladder, employing the patient's daily CBCT images [16]. Apart from the DV component, GU toxicity has been reported to be influenced by clinical variables such as patient-related factors, medications, and the occurrence of acute GU toxicity within three months from RT [17][18][19][20].
In this study, we hypothesise that the multivariate analysis (MVA) models generated using D A for the DV component are more predictive than D P in determining the occurrence of late Grade ≥ 1 and Grade 2 GU toxicity in HR-PCa. None of the patients in this study experienced Grade 3 and 4 GU toxicity. The goals of this study were firstly to evaluate the DV differences between D A and D P for the bladder. Secondly, MVA models with the toxicity endpoints of developing late Grade ≥ 1 and 2 GU toxicity were independently assessed using either D A or D P and the clinical variables.

Materials and methods
In this study, a total of 150 HR-PCa patients with prophylactic PLNs irradiation treated in our institution from January 2016 to December 2019 were retrospectively recruited. The median follow-up (FU) for the entire cohort was 57 months, ranging from 31.8 to 77.0 months. Ethics approval was obtained from the centralized institutional review board (CIRB ref: 2019/2018). Table 1 presented patients' clinical variables, acute and late toxicity profiles. Similar methodology has been adopted from previous publication based on clinical and DV associations with late gastrointestinal (GI) toxicity [21].

CT-Simulation and treatment planning
Patients were simulated in a supine position with arms on their chest using a leg immobilizer. Before CT-simulation and each RT session, patients were advised to adhere to the bladder filling protocol (2-3 cups; 400-600 ml of water, 30-60mins) and were encouraged to empty their bowels. CT-simulation was undertaken with 2.5 mm slice thickness (120kVp, GE LightSpeed RT 16). Clinical target volumes (CTVs) were defined as the prostate, seminal vesicles (SVs), PLNs with superior extend at L5/S1 interspace for phase 1 (Ph1), and a coned down CTV for phase 2 (Ph2) defined as the prostate and proximal 1 cm of the SVs. Planning target volumes (PTVs) comprised of an anisotropic expansion margins of 5 mm posteriorly and 5-8 mm to all other directions from the CTVs. Dose prescriptions comprising of 46-54 Gy (23-27 fractions) and an additional 24-28 Gy (12-14 fractions) were prescribed to Ph1 and Ph2 respectively. Both phases were planned using 10 MV energy, dual arc Volumetric Modulated Arc Therapy (VMAT) technique.

Dose based-region of interest (DB-ROI)
DB-ROI structures were created using an automated process utilizing a customized workflow in MIM (MIMVista® v6.9, MIM Software Inc., Cleveland OH USA) [22] (refer to supplemental Fig. S1). The mean dose derived from the novel DB-ROI method was used as DV variables together with the standard DV values in MVA. This method accounted for the volumetric changes of the bladder at a fixed distance from the prostate surface, thereby minimizing the uncertainties in defining the bladder trigone where correlation with GU toxicity has been reported [23].

Dose accumulation workflow
A customized dose accumulation workflow that was able to accommodate two sequential treatment phases using MIM was developed. Details of the workflow building and validations of the deformable image registration (DIR) algorithm have been previously described [22]. The generation of D A was based on patients' daily CBCT scans acquired as part of their target localization procedure before treatment delivery. The scans were acquired in a half-fan mode (45 cm field-of-view, 120 kVp) scan, and reconstructed to 2.5 mm slice thickness (Varian on-board imaging v2.1, Varian Medical Systems, Palo Alto, CA).

GU toxicity assessment and documentation
Late GU toxicity was recorded after three months post-RT, sixmonthly for five years followed by yearly thereafter. In this study, the incidence of maximum toxicity grading for Grade ≥ 1 and 2 GU toxicity defined at two years post-RT FU were used as the examined clinical toxicity endpoints. Grade ≥ 1 and Grade 2 toxicity are defined as patients having mild-severe (Grade 1-2) and severe (Grade 2 only) late GU toxicity respectively. Previously toxicity records graded using the Radiation Therapy Oncology Group (RTOG) criteria were reviewed and regraded by the National Cancer Institute Common Terminology Criteria for Adverse Events (version 4.03; CTCAE) by the radiation oncologist from the study team. The GU toxicities were defined as urinary frequency, urinary urgency, urinary incontinence, and cystitis.

Statistical analysis and modelling
The primary clinical outcome of this study was the occurrence of Grade ≥ 1 and 2 GU toxicity measured at two years post-RT FU. Descriptive statistics (e.g., means ± standard deviation, medians with interquartile ranges) were calculated. For the DV analysis comparing D P and D A values, a parametric two-sided t-test was used after performing a normality test using the Shapiro-Wilk test and evaluated visually with QQ-plots and histograms. A p-value of < 0.05 was deemed significant. Highly correlated variables tested using Pearson correlation test (r ≥ 0.8) were removed. Univariate logistic analysis (UVA) was performed on individual clinical and DV variables to define associations with late  clinical endpoints. For the DV variables, D A and D P were being analysed separately with the defined GU toxicity. Variables with p-values of < 0.05 were statistically significant [24,25]. Significant variables at the UVA level were used for the subsequent MVA using an enter/remove method to identify the independent predictors for the final MVA model, whereby p < 0.05 was considered statistically significant [26]. Results were reported as odds ratios (OR), 95 % confidence intervals (CIs), and p-values. Model performance was measured for its calibration results and discriminative ability. For model calibration, the Hosmer-Lemeshow pvalue (p-HL) goodness of fit test was used to generate the calibration plot. The observed outcomes were divided into quartiles to obtain the observed probabilities and were plotted against the predicted probabilities for binary dependent variables [20]. The ability of the models to distinguish patients with the defined clinical outcomes was evaluated using the area under the receiver operating characteristic curve (AUC). An ideal correlation corresponds to an AUC of 1. An AUC of ≥ 0.6 and minimum 95 % CI ≥ 0.5 was considered statistically significant [27]. Internal validation was accomplished using bootstrapping, in which resampling with replacement techniques was performed 1000 times on the original dataset and recalculated during the variable selection process adjusting for model optimism [19,28]. Best fit predictors with 95 % CI were obtained. All analyses were performed using SPSS statistics (IBM Corp. v27.0. Armonk, NY) and R software (https://www.r-project.org/, version 4.0, Vienna, Austria).

Dose-based ROI analysis between D A and D P for the bladder
The absolute mean DV values for bladder D A , D P and the difference in dose (D A -D P ), Gy were calculated for all patients per ROI. D blad ROI 5− 50 mm were shown in Table 3. For D blad A ROI 20− 50 mm , the obtained dose difference was significantly higher as compared to D P (D A -D P , p < 0.001). The greatest dose difference of 2.9 ± 3.4 Gy was observed at D blad A ROI 25 mm . At D blad ROI 5− 10 mm region, D A on average was significantly lower compared to D P (p < 0.001).

MVA modelling and model performance evaluation
For MVA modelling, clinical variables were evaluated separately with the DV variables D A and D P respectively based on the significant predictors found in UVA (refer to supplemental Table S1). Three statistically significant single variable models (p < 0.05) were achieved, correlating to the development of late Grade ≥ 1 and 2 GU toxicity (see Table 4). All the obtained models have an OR < 1 for the defined clinical endpoints. The models demonstrated good performance by attaining an AUC of ≥ 0.6, with Model 2 having the highest AUC of 0.81 (see Table 5). Similarly, the models were also well-calibrated whereby the obtained p-HL values were ≥ 0.05, indicating that the predicted probability is comparable to the actual observed events as demonstrated in Fig. 1.

MVA regression analysis for Grade ≥ 1 and 2 GU toxicity at 2 years post-RT FU
For Grade ≥ 1 GU toxicity, D blad ROI 50 mm was the only significant DV predictor for Model 1 and 1a with D A and D P respectively (p < 0.05), although D A achieved a slightly higher mean dose of 0.9 Gy compared to D P (refer to Table 4). An OR of < 1 was obtained for Model 1 and 1a, indicating that there was a marginal prophylactic effect for low dose on the risk of Grade ≥ 1 GU toxicity. For Grade 2 GU toxicity in Model 2, every 1 cm 3 increment in prostate volume has a corresponding 13 % reduction in toxicity event. None of the DV predictors were significant in defining this clinical outcome.  Abbreviations: ROI = region of interest; D A = mean dose accumulated, D P = mean dose planned; SD = standard deviation.

Discussions
This study hypothesised that MVA models obtained using D A as the DV component are more predictive than D P in associating with late GU toxicity. This is one of the largest studies to date using VMAT technique and with D A obtained from a customized automated workflow to account for patient's inter-fractional organ motion, in addition to patient's clinical variables for model construction.
For Models 1 and 1a, the risk of developing Grade ≥ 1 late GU toxicity was reduced moderately with the corresponding increase in dose received by the D blad ROI 50 mm (low-intermediate range) for both D A and D P [29]. This could be due to a prophylactic effect of low dose on GU toxicity. However, Marcello et al. [30] conducted a study on 1071 men treated using 3D RT found that low-intermediate doses to the extraprostatic urethra were associated with the risk of developing late GU toxicity. The findings contrasted with our findings as urethra was not contoured and assessed in this study and moreover, the occurrence of late GU toxicity has been reported to be beyond 2 years [31]. In the DV analysis phase, D A for the bladder was significantly higher for most of the dose range for D blad V30− 65 Gy and D blad ROI 20− 50 mm . This could be due to an overall reduction in bladder volume throughout RT, thus resulting in a larger volume of the bladder being bathed in the PLNs dose range [13]. This observation validated the results reported in our recently published work demonstrating the correlations between dose received by OARs and the volumetric changes using patients' CBCTs on 20 HR-PCa patients [16,22]. Despite having a bladder filling protocol in place, maintaining bladder consistency during RT is challenging due to factors such as patient hydration status, co-morbidities (e.g., diabetes), and the intake of diuretics [5]. As compared to the bladder D P , higher values for bladder D A obtained for BD-ROI metrics do not contribute to an enhanced risk of late Grade ≥ 1 GU toxicity in this study. These findings are corroborated by studies involving full bladder protocols in conventionally fractionated RT for PCa [6,32].
For Model 2, none of the DV variables were significant in predicting late Grade 2 GU toxicity. This finding might be due to the low toxicity rates (n = 11) in this study contributed by the use of inverse planning technique with optimal OARs sparing [33] and the high tolerability of the bladder tissues to radiation [34]. Moreover, other co-factors such as patient's genomic and proteomic features might play an important role in the risk of developing GU toxicity [35]. Prostate volume was the only significant clinical predictor that has shown to have a minor protective effect on the risk of developing late Grade 2 GU toxicity. Studies investigating the impact of pre-treatment prostate volumes on GU toxicity found that larger prostate volumes (median > 50 cm 3 ) correlate to higher rates of acute GU toxicity, but symptoms were resolved within a year [36,37]. In this study, the smaller prostate volumes observed (mean: 36.8 cm 3 , SD: ± 19 cm 3 ) could be due to the routine use of neoadjuvant ADT as the standard of care for patients with HR-PCa Abbreviations: D A = dose accumulated; DP = dose planned; ROI = region of interest; OR = odds ratio;  [ 38,39]. This is in parallel to studies reporting that a corresponding reduction in Grade ≥ 2 GU toxicity has been observed in patients with neoadjuvant RT as compared to patients treated with RT alone [40,41]. There were several limitations to this study. Firstly, the toxicity outcomes were physician-reported rather than patient-reported (PROs). The rates of underestimation of the actual Grade ≥ 1 and Grade 2 GU toxicity may be present as there is a low agreement between physician and patient-reported symptoms [42]. Although a combination of PROs and physician-reported outcomes is the ideal standard of care when reporting RT-induced toxicity, the actual implementation remains a challenge as it is resource-intensive [43]. As the majority of the toxicity reporting is currently based on a standardized comprehensive system for reporting adverse events (e.g., CTCAE, RTOG, etc.) ( [44], results obtained from our study are applicable across similar HR-PCa cases using inverse planning techniques. Secondly, DV metrics of the urethra were not available for analysis as this structure was not routinely contoured in prostate cases with conventional fractionation in our institution. While an increased in dose to the urethra has a corresponding effect on GU toxicity [45], dose prescription in the study cohort do not exceed 80 Gy and the dose distribution is homogeneous, with very small volumes of higher doses/ "hot spots" within the prostate gland. Therefore, the association of the urethra DV variables on GU toxicity in MVA should be low. Lastly, the use of DV-based metrics in MVA do not provide any geometrical information as every region of the OARs are considered equally critical, unlike the use of voxel-based metrics [46,47]. However, the utilization of DV-based metrics as the DV variable in correlating with toxicity outcomes were commonly used, thereby enabling a robust comparison across various institutions [17,21]. One of the main strengths of this study includes the use patient's daily CBCTs images (5761 images) to generate D A for the prostate and bladder, thus accounting for the inter-fractional organ motion that might affect the actual dose received for MVA. Although the results are not significant, the obtained single variable models can draw several important decisions to guide future dose escalation and hypofractionation strategies. For instance, like the rectum, the bladder behaves prevalently as a serial organ, thus is more sensitive to small volumes receiving high doses [48]. The dose range received by bladder V 75 Gy for D A and D P is well below our departmental planning dose constraint of ≤ 25 % [49]. Therefore, keeping within this dose limit, in addition to an acceptable bladder filling protocol and a robust image-guided RT workflow while performing dose escalation is highly recommended [11,50]. Another point worth mentioning will be the higher D A at the intermediate to high dose region (D blad V30− 65 Gy and D blad ROI 20− 50 mm ). Despite having a statistically significant dose differences, this result was not translated to late GU toxicity. Therefore, institutions could consider an acceptable bladder filling protocol, incorporating the use of IGRT for patients with difficulty in achieving a desired bladder filling volume. Lastly, the alternative DB-ROI method is a stronger predictor compared to the standard DV metrics in correlating with GU toxicity in the final MVA although the results were not significant. None of the DV metrics were selected during MVA. Moving forward, this study could be expanded to incorporate the use of dose surface maps (DSM) analysis with spatial information for the model-building to improve spatial dose-response correlations [51]. In addition, work is in progress to apply this model to determine the feasibility of performing dose escalation or hypofractionated regimens as well as incorporating advanced modalities, such as proton therapy to enhance the patient's therapeutic ratio.
In conclusion, we have demonstrated that firstly, the use of DB-ROIs as surrogates for DV metrics is more predictive in MVA. Secondly, significant inter-fractional variations of the bladder occur during RT delivery as demonstrated by the higher dose received by the bladder in DV and DB-ROIs in D A . However, the higher bladder D A observed as compared to D P does not correlate to the increased risk of late GU toxicity in patients with HR-PCa. Lastly, smaller prostate volumes have a minor protective effect for Grade 2 GU toxicity. As patients were treated using inverse planned modulated techniques, the reported results serve as an excellent yardstick for toxicity predictions as compared to previously reported results using three-dimensional conformal RT. Moving forward, more research is needed in this area to enhance our knowledge pertaining to the DV-effects on late GU toxicity.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.