Bayesian network structure for predicting local tumor recurrence in rectal cancer patients treated with neoadjuvant chemoradiation followed by surgery

Highlights • Algorithmic-based BN structures are more performant than expert structure.• Algorithmically derived BN structures are comparable to a black-box model.• The alignment of BN structures with clinical processes increases interpretability.


Introduction
The introduction of total mesorectal excision (TME) surgery and the use of neoadjuvant chemoradiation (nCRT) have reduced mortality and recurrence rate for rectal cancer patients, with an incidence of locoregional relapses after treatments of 4-8% [1][2][3]. Despite the low incidence, tumor recurrence remains the predominant concern for most rectal cancer survivors considering the relatively poor quality of life involved [4,5]. Besides the treatment procedure, several other factors such as tumor site, size, ethnicity, genetics, etc. could influence the chance of tumor recurrence [5][6][7] , and processing all these pieces of information to estimate the likelihood of a patient developing a tumor recurrence after treatment can be overwhelming, even for experts [8,9]. Predictive models such as Bayesian networks, which consider causal relationships between features, can, on the other hand, learn efficiently from large and heterogeneous volumes of available information and make inferences about future patients.
Bayesian networks are suitable for clinical applications because they can probabilistically reason under uncertainty with an intuitive clinical interpretation of the results [10][11][12]. Generally, they can be specified by an expert in the domain of interest or inferred from available data via a learning algorithm [11,13]. However, these methods may be challenging in healthcare. An algorithm-based structure can include spurious relationships that are not plausible or have no clinical meaning (e.g., causally linking gender to age) due to correlations in the data and the impossibility to determine the direction of causality from data [14]. On the other hand, a structure specified by an expert might be biased by the expert's prior knowledge and subjective domain experience.
One possible solution to this problem is to survey multiple experts' opinions. This study hypothesizes that eliciting multiple experts' opinions will give a reliable Bayesian network structure to predict local tumor recurrences at several time-points (2, 3, and 5 years) in rectal cancer patients whose predictions closely approximate the ground truth. To test this hypothesis, we implemented a solution to examine experts' opinions on the causal relationships to predict tumor recurrences in locally advanced rectal cancer (LARC) patients.

Materials and methods
A retrospective cohort of 6,754 diagnosed LARC patients treated with neoadjuvant chemoradiation followed by surgery from 1993 to 2014 from 14 international trial cohorts was analyzed for this study (Table S1 supplemental material). All the trials have different treatment protocols, patient characteristics, and accrual start dates. Only nonmetastatic rectal cancer patients treated with conventional preoperative radiotherapy were considered for this study. Patients with a surgical procedure different from anterior-resection or abdominoperineal resection, treated with adjuvant radiotherapy or incomplete radiotherapy treatment, were excluded due to their relatively low representation. Fig. 1 shows the variables under investigation in this study based on a timeline (T) of clinical practice availability. Local tumor recurrence at 2, 3, and 5 years was considered the endpoints of interest defined as detecting a tumor on the same sites it previously started after therapy.

Statistics
Data from the 14 trial cohorts were merged and split into training and validation sets by performing a random 80-20% split (stratified per cohort). The SMOTE algorithm [15] was used to address the class imbalance per response time point, and continuous variables were categorized based on literature and experts' suggestions. The circumferential resection margin (CRM) was dichotomized into positive if the tumor is ≤1 mm from the circumferential margin and negative if >1 mm (Table S2 supplemental material). Missing values were considered as a category (Unknown) for all variables. However, patients with missing information on their local recurrence status were excluded from further analyses. All analyses were conducted in R version 3.6.1 [16] using the bnlearn package [17], and GeNIe (Graphical Network Interface) application [18] was used for structural visualization. Model performance was assessed by generating calibration plots, model accuracy, and calculating the area under the curve (AUC) on training and validation sets for all time points of interest.
The median age of the 5404 patients in the training and 1350 in the validation cohorts was 61  and 61 (25-84) years. Patients' characteristics and treatment modalities are shown in Table 1.

Structure learning
The domain knowledge from multiple experts' in three international radiotherapy institutions (Gemelli, Maastro, and Gil Medical Center) was employed to develop and validate the Bayesian network structure. Two experts from Gemelli independently defined the causal relationship between the variables. These experts were requested to draw arrows between variables to indicate causal relationships without setting the relationships' importance. Relationships were restricted to only variables in the same time-point t or the next t + n. Arrows drawn from a given variable at time-point t to another variable in a preceding time- point t -n were considered invalid. For example, arrows from Clinical T to Clinical N stage or from Age to Clinical T stage are acceptable. However, arrows from Clinical T stage to Age are rejected. Two other experts from Maastro separately reviewed the subset of connections common to both experts from Gemelli. The Dutch experts were tasked to validate the relationship by agreeing or disagreeing with each of the connections between the variables made by Gemelli experts. Only connections where both experts agreed were considered for further evaluation. As a final validation, an expert from Gil Medical Center reviewed the subset of connections common to both experts from Maastro. Only the connections validated by the expert were used to construct the final structure. The expert-developed structure was checked for cycles, which are not allowed in Bayesian network structures.

Structure comparison
In order to compare the performance of the developed Bayesian network structure in predicting local recurrence in rectal cancer patients, a structure was also inferred solely from the data with the hill climbing (HC) algorithm [19] for each time point of interest using the same training and validation data as the expert structure. The structures were first compared structurally and then numerically using the AUC, sensitivity, and specificity values. Calibration plots that measure how similar the distribution and behavior of the predicted probabilities are to that observed in data were produced to further evaluate the performance of the structure. The HC algorithm, which looks for the best structure over the search space by adding, removing, and reversing arcs (arrows) in the DAG one at a time, was preferred because it is computationally efficient, and a random restart search was implemented to prevent the structure from getting stuck on a bad local optimum [20]. The Bayesian Information Criterion (BIC), a statistical goodness-of-fit measure that penalizes structural complexity, was used for the structure-learning process [19]. . 2 shows the resulting Bayesian network structure based on expert knowledge. This network achieved AUCs above 0.9 and 0.8 for predicting the risk of local recurrence on the training and validation data, respectively, for all time points of interest. Table 2 shows the accuracies, AUCs, and confidence intervals of the structure's performance on the training and validation data for the three-time points of interest.

Structure comparison
The Bayesian network structures resulting from the HC algorithm mentioned above used all 14 available variables (Fig. S1 supplemental material) with 32, 33, and 30 arcs for 2, 3, and 5 years tumor recurrence, respectively. On the other hand, the experts' structure had 19 arcs and 10 variables, excluding age, gender, adjuvant chemotherapy, and overall treatment time. The outcome had a direct patent-to-child connection with all 13 nodes for the structure at 5-years, and the outcome for the 2-year structure had 11 children, excluding the arc with adjuvant chemotherapy and CRM, while that of the 3-year structure had 10, excluding adjuvant chemotherapy, pathological T, and N which was quite the opposite for the expert structure with just six parents. The only similarity between the algorithmic and expert structures was the arc CRM to the outcome for the 2-years structure and the arcs pathological T and N to the outcome for the 3-years structure.
Based on the relationship of the variables with the outcome among the algorithmic structures, the arcs pathological T and N to the outcome for the 3-years structure were reversed in the 2, and 5-year structures, and the arc CRM to the outcome in the 2-year structure was reversed in the 3, and 5-year structures while the arc adjuvant chemotherapy to the outcome in the 2-year structure was reversed in the 5-year structure and absent in the 3-year structure. Age was only connected to the outcome  alone for 3 and 5 years structures. The numerical comparison of the expert and algorithmic structures based on their performance in terms of the AUC ( Fig. S2 supplemental  material), sensitivity, and specificity values for each time points in the training and validation data is shown in Table 3. The algorithmic structures had a slightly better performance than the expert structure for all three matrices of interest, especially in the validation data. However, the difference in the AUC values between the structures was not statistically significant (p-value >0.05) for all time points.
The calibration plots in Fig. 3 show a good match between the predicted probabilities and the observed frequencies in the training and validation cohort for both expert and algorithmic structures. Generally, the expert structure seems to be better calibrated than the algorithmic structures since most points are closer to the dotted diagonal gray line, representing an ideal or perfect structure, especially recurrence at 5years.

Discussion
In the current study, we have developed and internally validated a Bayesian network structure based on five experts' opinions from three different radiation therapy treatment institutions to predict tumor recurrence for locally advanced rectal cancer patients. The structure was developed to capture the biological process that leads to a tumor recurrence by connecting the variables in the structure based on a timeline of their clinical availability. The developed structure used nine clinical features well-known to clinicians as pivotal factors for predicting local tumor recurrence. The structure was well-calibrated with predictive performance based on the AUC values above 0.9 and 0.8 in training and validation data, respectively, for all time points of interest. Structures inferred from a learning algorithm used four variables more than the expert structure, including age, gender, adjuvant chemotherapy, and overall treatment time, with slightly better performance in terms of AUC values. However, the expert structure was clinically more plausible than the algorithmic structures and aligned with the clinical process. The choice of model to predict local tumor recurrence in rectal cancer patients in this study was influenced by two main reasons. Firstly, Bayesian networks better represent complex systems such as the clinical processes leading to tumor recurrence since they have more liberty to define interactions between variables [10,21], unlike the generally used regression method [22][23][24]. Secondly, their ability to make inferences on any variable(s) in the network makes them very valuable for decision support as they can serve as a diagnostic and prognostic tool. Thus far, this study is the first to assess the predictive value of Bayesian networks for tumor recurrence in rectal cancer patients which makes a direct comparison and interpretation with other studies arduous because of the difference in analytical approach and study design.
Nonetheless, Valentini et al. [22] previously developed a model to predict tumor recurrence in rectal cancer patients as a decision support tool. The model included age as a predictive factor for local recurrence, a variable missing in our expert structure. One apparent reason for this difference is the variable selection procedure, given that age is also present in the algorithmic structures. Algorithm-based variable selection methods exploit spurious correlations within the data, which might unnecessarily increase the complexity of the model and cause overfitting. Also, Farhat et al. [25] did not find age or gender a predictive factor for tumor recurrence in their 16-year respective study, which is in support of the expert structure. Although the controversy of included variables depends largely on patient heterogeneity between the studies, variable inclusion in a prediction model must not always be strictly algorithm-based, even if it improves its performance. Instead, it should contain some level of clinical understanding, context, or rationale, which domain experts are more suited to provide because a correlation between variables does not necessarily imply causality.
The algorithmic structures use four variables more than the expert structure, which explains its slightly better predictive performance. However, the additional variables increase the complexity of the structure since more connections between the variables are formed but with minimal predictive benefits (Table 3). Although the expert structure did not perform better than the algorithmic structures, we believe the expert structure might be more suitable for clinical use over the algorithmic structure. Firstly, the algorithmic structure uses parent-tochild connections not aligned with the clinical process (e.g., the pathological nodal stage has a causal link to age at start radiotherapy) to make decisions. These unconformable arcs imply that algorithmically generated structures are comparable to black-box models even with better performance on both data sets since the decision process lacks clinical explanation. Regarding model calibration, the expert structure seems to be better calibrated for all time points than the algorithmic structure with higher AUCs.
Tumor recurrence is a very challenging endpoint not only in terms of quality of life for cancer survivors [4,5] but also the difficulty in accurately predicting the endpoint [26]. Patient variability can explain this difficulty since a treatment regime that leads to recurrence-free for one patient might not give another patient the same outcome. Therefore, collaboration with domain experts is pivotal to have a more personalized prediction of tumor recurrence since they better understand tumor biology. The performance of the proposed expert structure is well above the chance level with clinically valid relationships. Therefore, it might be valuable in routine clinical settings as a decision aid to support personalized treatment decision-making. Also, it could guide clinicians to opt for a more aggressive adjuvant therapy to prevent the chance of a tumor recurrence for patients who have undergone surgery but with a high predicted probability of a tumor recurrence in the structure. However, the structure is trained on retrospective clinical trial data and warrants an external validation on routine clinical data to ascertain its clinical usefulness. In addition, the circumferential resection margin, a variable proven to influence local tumor recurrence [27,28], had a large proportion of missing information and will be worthwhile to retrain its conditional probabilities on a more complete dataset.
Despite the predictive performance of the Bayesian network structures in this study, there is still room for improvement. The international and multi-trial nature of our study may be seen as a limitation, given that it combines the contribution of experts from three cancer institutions and data from multiple clinical trials with different treatment protocols. However, this limitation could also be considered a strength, as it may make our findings more robust and generalizable. The multitrial combination is particularly relevant for this study since it enables the structure to be trained on a large sample size, which reflects the models' superior performance over other studies with relatively smaller sample sizes [22][23][24] given that model performance is proportional to training sample size. Also, this large sample size helped improve the Table 3 The AUC, sensitivity, and specificity values of the expert and algorithmic structures on the training and validation data at different time points. number of events given the disease's low event rate. Despite combining data from 14 different European trials, the number of local recurrence events was relatively low. Our study design also prevents updating the number of connections between variables since each expert is contacted only once for input. Also, some of the variables were categorized, leading to a loss of information. Lastly, blood tumor markers such as carcinoembryonic antigen (CEA), a proven predictive factor for local tumor recurrence in rectal cancer [25], were not included in the structure because of the study's retrospective nature.
In summary, we have developed and validated a Bayesian network structure from 14 trial cohorts' data by analyzing a total number of 6754 rectal cancer patients for predicting the risk of local recurrence in locally advanced rectal cancer patients at 2, 3, and 5 years. The causal relationships between the variables in the developed Bayesian network structure were proposed and validated by domain experts with years of experience from different international radiotherapy centers, where treatment protocols may differ. Our result showed that although structures from both methods performed above chance level, the algorithmicbased structures had higher discriminating power than the expert structure. However, they contained clinically incomprehensible arcs, making the expert structure more credible even with relatively lower predictive performance. Future research will combine these two Bayesian network structure learning approaches to produce clinically plausible structures with optimal predictive performance.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.