Background: Somatic EGFR mutations define a subset of non-small cell lung cancers (NSCLC) that have clinical impact on NSCLC risk and outcome. However, EGFR-mutation-status is often missing in epidemiologic datasets.
We developed and tested pragmatic approaches to account for EGFR-mutation-status based on variables commonly included in epidemiologic datasets and evaluated the clinical utility of these approaches. Methods: Through analysis of the International Lung Cancer Consortium (ILCCO) epidemiologic datasets, we developed a regression model for EGFR-status; we then applied a clinical-restriction approach using the optimal cut-point, and a second epidemiologic, multiple imputation approach to ILCCO survival analyses that did and did not account for EGFR-status.
Results: Of 35,356 ILCCO patients with NSCLC, EGFR-mutation-status was available in 4,231 patients. A model regressing known EGFR-mutation-status on clinical and demographic variables achieved a concordance index of 0.75 (95% CI, 0.74-0.77) in the training and 0.77 (95% CI, 0.74-0.79) in the testing dataset. At an optimal cut-point of probability-score = 0.335, sensitivity = 69% and specificity = 72.5% for determining EGFR-wildtype status.
In both restriction-based and imputation-based regression analyses of the role of BMI in overall survival of patients with NSCLC, similar results were observed between overall and EGFR-mutation-negative cohort analyses of patients of all ancestries. However, our approach identified some differences: EGFR-mutated Asian patients did not incur a survival benefit from being obese, in contrast to the benefit observed in EGFR-wildtype Asian patients.
Conclusions: We introduce a pragmatic method to evaluate the potential impact of EGFR-status on epidemiological analyses of NSCLC. Impact: The proposed method is generalizable to the common situation in which EGFR-status data are missing.
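
Illustrative sketch: The cut-point step reported in the Results can be illustrated with a short example. The abstract does not state which criterion defined the "optimal" cut-point, so this sketch assumes Youden's J (sensitivity + specificity - 1), one common choice. The function names, the toy scores, and the label coding (1 = class of interest) are hypothetical and are not taken from the ILCCO analysis.

```python
from typing import List, Optional, Tuple


def sens_spec(scores: List[float], labels: List[int], cut: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cut are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cut)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cut_point(scores: List[float], labels: List[int]) -> Tuple[Optional[float], float]:
    """Scan every observed score as a candidate cut-point and keep the one
    with the largest Youden's J (assumed criterion; ties keep the lowest cut)."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, cut)
        j = sens + spec - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy data only: model probability scores and known status for eight patients.
toy_scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]

cut, j = youden_cut_point(toy_scores, toy_labels)
sens, spec = sens_spec(toy_scores, toy_labels, cut)
```

In a restriction-style analysis, patients with unknown status would then be classified by comparing their model score with the chosen cut-point. A multiple imputation approach would instead draw the missing status repeatedly from the predicted probabilities.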