Background The European Society for Medical Oncology-Magnitude of Clinical Benefit Scale (ESMO-MCBS) is a validated value scale for solid tumour anticancer treatments. Form 1 of the ESMO-MCBS, used to grade therapies with curative intent including adjuvant therapies, has only been evaluated for a limited number of studies. This is the first large-scale field testing in early breast cancer to assess the applicability of the scale to this data set and the reasonableness of derived scores and to identify any shortcomings to be addressed in future modifications of the scale.
Method Representative key studies and meta-analyses of the major modalities of adjuvant systemic therapy of breast cancer were identified for each of the major clinical scenarios (HER2-positive, HER2-negative, endocrine-responsive) and were graded with form 1 of the ESMO-MCBS. These generated scores were reviewed by a panel of experts for reasonableness. Shortcomings and issues related to the application of the scale and interpretation of results were identified and critically evaluated.
Results Sixty-five studies were eligible for evaluation: 59 individual studies and 6 meta-analyses. These studies incorporated 101 therapeutic comparisons, 61 of which were scorable. Review of the generated scores indicated that, with few exceptions, they generally reflected contemporary standards of practice. Six shortcomings were identified related to grading based on disease-free survival (DFS), lack of information regarding acute and long-term toxicity and an inability to grade single-arm de-escalation studies.
Conclusions Form 1 of the ESMO-MCBS is a robust tool for the evaluation of the magnitude of benefit studies in early breast cancer. The scale can be further improved by addressing issues related to grading based on DFS, annotating grades with information regarding acute and long-term toxicity and developing an approach to grade single-arm de-escalation studies.
- early breast cancer
- magnitude of clinical benefit scale
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, any changes made are indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is already known about this subject?
Form 1 of the European Society for Medical Oncology-Magnitude of Clinical Benefit Scale (ESMO-MCBS) serves to score therapies with curative intent. To date, very limited field testing has been performed to assess the scale in the curative setting.
What does this study add?
We evaluated the applicability of the scale and assessed the reasonableness of the generated scores in early breast cancer. Form 1 of the ESMO-MCBS V.1.1 provided a generally robust tool for scoring of adjuvant breast cancer studies. Six shortcomings were identified including lack of information regarding acute and long-term toxicity, an inability to grade single-arm de-escalation studies and limitations related to grading based on disease-free survival.
How might this impact on clinical practice?
The identified shortcomings in form 1 of the ESMO-MCBS V.1.1 will be rectified in the upcoming version 2.0 of the scale to strengthen the validity of that scale and its generated results. These developments have important implications for data interpretation, public health and clinical decision-making.
As the population ages, the incidence and prevalence of cancer are expected to continue to rise both in developed1 and developing countries.2 The estimated total annual economic cost of cancer was US$1.16 trillion in 2010, about 2% of global gross domestic product3 and is continuing to rise exponentially. Breast cancer remains the leading cause of cancer among women2 and the ongoing care of breast cancer patients is estimated to be one of the most significant contributors to growing cancer care expenditure.4
These considerations underscore the need for validated tools to evaluate value of care, where value is recognised as a balance between clinical benefit and cost. With this in mind, both the European Society for Medical Oncology (ESMO) and the American Society of Clinical Oncology (ASCO) established Working Groups to address these issues and they have developed and published a platform for evaluating new anticancer therapeutics—the ESMO-Magnitude of Clinical Benefit Scale (ESMO-MCBS)5 and the ASCO Framework for assessing value of cancer care.6
The ESMO-MCBS was initially launched and published in 20155 and revised in 2017 with version 1.1.7 The scale aims to provide a validated and rational stratification process for oncology therapies, and its development process has been predicated on ‘accountability for reasonableness’ which incorporated extensive field testing and the peer review of results for ‘reasonableness’.7 Form 1 of the ESMO-MCBS, which is used to grade therapies with curative intent including adjuvant therapies, hitherto, has only been applied in a limited number of studies. Form 1 of the ESMO-MCBS grades therapies with curative intent on a three-point scale A, B and C where scores of A and B represent substantial improvement.
This is the first large-scale field testing of form 1 in early breast cancer to assess the applicability of the ESMO-MCBS in this setting, to determine whether the scoring reflected clinical practice (reasonableness) and to identify shortcomings to be addressed in future versions of the scale. It also provides an overview of the magnitude of benefit for the most common therapies/therapeutic strategies in the field of breast cancer, allowing for a critical reassessment of available options.
ESMO-MCBS V.1.1 form 1, designed to evaluate adjuvant and neoadjuvant studies, was applied to all the selected studies (online supplementary data).
Representative key studies and meta-analyses of the major modalities of adjuvant systemic therapy of breast cancer (chemotherapy or endocrine therapy or anti-HER2 therapy) were identified for each of the major clinical scenarios (HER2-positive, HER2-negative, endocrine-responsive). Studies were identified through PubMed, Food and Drug Administration (FDA) and European Medicines Agency (EMA) registration sites. Pivotal phase 3 studies that have formed the basis for contemporary treatment practice and a randomised phase 2 study that resulted in preliminary drug registration8 were scored.
To identify the pivotal phase 3 studies, a PubMed search was performed with the following search criteria: “breast cancer”[Title] AND breast[Title] AND cancer[Title] AND adjuvant[Title] OR neo-adjuvant AND “2002”[Date—Publication] : “2019”[Date—Publication] AND English[Language] AND “randomized controlled trial” OR “phase 3” OR “randomized phase 2” NOT retrospective[Title/Abstract] NOT historical[Title/Abstract] NOT “systematic review”[Title] NOT advanced[Title] NOT metastatic[Title] NOT irradiation[Title] NOT safety[Title] NOT insights[Title] NOT observations[Title] NOT “quality of life”[Title] NOT biosimilar[Title] NOT analysis[Title] NOT analyses[Title] NOT radiation[Title]. There were 597 studies identified from the search. Relevant studies that were comparative phase 3 randomised controlled studies were identified and subsequently cross-referenced with the FDA and EMA registration sites and ESMO9 and National Comprehensive Cancer Network (NCCN)10 guidelines to identify pivotal and practice changing studies. Key meta-analyses referenced by ESMO9 and NCCN10 guidelines were identified.
Studies were eligible for scoring if they were randomised comparative studies comparing new therapies to standard of care or meta-analyses of those studies. Studies were scored if they met the scoring criteria defined in form 1 of the ESMO-MCBS. Where missing data impeded scoring, the corresponding author was contacted with a request for data or clarification. If no response was received, the study was either marked as not scorable (this occurred for only one study11 and one meta-analysis12) or excluded (if inadequate data were reported). All scoring was reviewed for accuracy by members of the Magnitude of Clinical Benefit Working Group and the generated scores were reviewed by the ESMO Breast Cancer Faculty for reasonableness.
Scoring was performed in accordance with the rules for application of the ESMO-MCBS.5 7 Studies initially evaluated based on disease-free survival (DFS) criteria alone or pathological complete remission (pCR) rate were re-evaluated when mature overall survival (OS) data were available, and a final score was determined based on these OS results. The only exception was for studies that were unblinded after compelling early DFS results with subsequent access to the superior arm, whereby OS results were contaminated by the crossover and therefore were not evaluable.
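The precedence rule described above can be sketched in Python. This is an illustrative sketch only, not part of any ESMO-MCBS tooling: the function and argument names are hypothetical, and the absolute OS cut-offs (more than 5% for A, 3% to 5% for B, less than 3% for C) are an assumption taken from the published form 1 rather than from this article, although they agree with the grades reported in the results.

```python
from typing import Optional


def grade_from_os_gain(abs_os_gain_pct: float) -> str:
    """Map an absolute OS gain (percentage points) to a form 1 grade.

    The cut-offs are assumptions (see the lead-in), not quoted from this article.
    """
    if abs_os_gain_pct > 5.0:
        return "A"
    if abs_os_gain_pct >= 3.0:
        return "B"
    return "C"


def final_grade(surrogate_grade: Optional[str],
                mature_os_available: bool,
                os_significant: bool = False,
                abs_os_gain_pct: Optional[float] = None,
                os_contaminated_by_crossover: bool = False) -> Optional[str]:
    """Pick the final grade for one therapeutic comparison."""
    # Without mature OS, or when crossover after unblinding has contaminated
    # the OS results, the DFS- or pCR-based (surrogate) grade stands.
    if not mature_os_available or os_contaminated_by_crossover:
        return surrogate_grade
    # Mature OS without a significant gain: no evaluable benefit.
    if not os_significant:
        return "NEB"
    # Significant OS gain but no published absolute gain: cannot be scored.
    if abs_os_gain_pct is None:
        return "SNA"
    return grade_from_os_gain(abs_os_gain_pct)
```

For example, a comparison first graded A on DFS that later reports mature OS with no significant gain would return "NEB", which is the behaviour discussed later as a shortcoming of the current form.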
Studies that could not be scored were classified into one of four groups: (1) studies that did not achieve statistical significance, designated ‘no evaluable benefit’ (NEB); (2) non-inferiority studies in which non-inferiority was not verified, designated ‘negative non-inferiority’ (NNI); (3) studies that could not be scored because required data were not included in the publication, designated ‘scoring not applicable’ (SNA); and (4) not-scorable subgroup data.
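As a rough illustration of how these categories partition the unscored comparisons, the following Python sketch assigns a category from four yes/no facts about a comparison. The names and the order in which the checks are applied are assumptions made for this example; the article does not specify a decision order.

```python
from enum import Enum
from typing import Optional


class Unscored(Enum):
    NEB = "no evaluable benefit"
    NNI = "negative non-inferiority"
    SNA = "scoring not applicable"
    SUBGROUP = "not-scorable subgroup data"


def classify_unscored(ineligible_subgroup: bool,
                      non_inferiority_design: bool,
                      result_positive: bool,
                      required_data_reported: bool) -> Optional[Unscored]:
    """Return the reason a comparison cannot be scored, or None if it can.

    result_positive means statistical significance was reached (superiority
    design) or non-inferiority was verified (non-inferiority design).
    """
    if ineligible_subgroup:
        return Unscored.SUBGROUP
    if not result_positive:
        return Unscored.NNI if non_inferiority_design else Unscored.NEB
    if not required_data_reported:
        return Unscored.SNA
    return None  # scorable with form 1
```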
Sixty-five studies were eligible for evaluation: 59 individual studies and 6 meta-analyses (5 of which were individual patient-level data meta-analyses), which yielded data relevant to 101 therapeutic comparisons, 61 of which demonstrated significant benefit or non-inferiority and could be scored.
Polychemotherapy versus no chemotherapy
Both cyclophosphamide, methotrexate and 5-fluorouracil (CMF) and anthracycline-based therapy were found to be superior to no chemotherapy (in a predominantly node-positive population), both scoring an A compared with no treatment in the meta-analysis, with a 15-year gain in breast cancer mortality of 6.2% and 6.5%, respectively (table 1).13
CMF versus anthracyclines
Four cycles of doxorubicin and cyclophosphamide (AC×4) were not found to be superior to CMF×6 in the meta-analysis.13 Benefit of CAF (cyclophosphamide/doxorubicin/fluorouracil)/FEC×6 (fluorouracil/epirubicin/cyclophosphamide) over CMF×6 was not reported in individual studies,14 15 but was demonstrated in a meta-analysis, with a 10-year OS gain of approximately 4% (grade B) (table 1).13
The three studies that evaluated the addition of a taxane to an anthracycline-based regimen all demonstrated gains in DFS, but mature survival data were available for only one of these studies, which showed no significant survival advantage and was therefore classified as NEB.16–18 The MA-21 study compared AC×4 followed by paclitaxel to both cyclophosphamide/epirubicin/fluorouracil (CEF) and dose-dense (dd) epirubicin/cyclophosphamide followed by paclitaxel in patients with node-positive and high-risk node-negative disease.19 Both study regimens demonstrated superiority to AC×4 followed by paclitaxel based on 30-month DFS gain with no OS data available (grade A) (table 2).
In a meta-analysis, the addition of a taxane to an anthracycline demonstrated a small survival advantage at 8 years follow-up (grade C).13 In this meta-analysis, the assessed cohorts consisted predominantly of patients with node-positive disease.
Docetaxel and cyclophosphamide (TC) ×4 was superior to AC×4, demonstrating a 6% gain in OS at 7-year median follow-up (grade A).20 21 However, a joint analysis of three trials comparing TC×6 to combinations including AC and a taxane did not establish non-inferiority of TC×6 when compared with combined taxane–anthracycline regimens.22
Other chemotherapy regimens
In all the dd regimen trials, the high-risk, node-positive population demonstrated an OS advantage (two studies scored grade B and one study grade C).23–25 The two studies with the longest median follow-up achieved the highest grades.24 25 Two meta-analyses confirmed the superiority of dd regimens over standard scheduling (table 3).26 27
Post-neoadjuvant capecitabine for patients with incomplete pathological response after neoadjuvant therapy demonstrated survival benefit of more than 5%, at a median of 3.6-year follow-up for the intention-to-treat (ITT) population and for the triple negative subgroup (grade A).28
The addition of neoadjuvant carboplatin for patients with triple negative breast cancer demonstrated a benefit in the GeparSixto study for both pCR and DFS with an absolute DFS gain of 9.6%29 and a benefit in pCR in the BRIGHTNESS study of 15.8% compared with the non-carboplatin arm.30 The CALGB 40603 did not demonstrate an outcome benefit from the addition of neoadjuvant carboplatin or bevacizumab despite improvements in pCR and was categorised as NEB.31
In the NSABP B40 study, there was no benefit of the addition of gemcitabine or capecitabine to standard neoadjuvant chemotherapy regimens.11 32 This study reported an OS benefit from the addition of neoadjuvant bevacizumab with a HR of 0.65 (95% CI 0.49–0.88); however, since the absolute survival benefit was not published, this was not evaluable (SNA).11
In the GeparSepto study, neoadjuvant nab-paclitaxel demonstrated a limited improvement in pCR rate compared with paclitaxel; however, the gain was below the ESMO-MCBS threshold for scoring (≥30% relative and >15% absolute pCR gain).33
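The pCR threshold quoted above can be expressed as a small check. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes that the relative gain is measured against the control-arm pCR rate, which this article does not state explicitly.

```python
def pcr_gain_scorable(pcr_control_pct: float, pcr_experimental_pct: float) -> bool:
    """Check a pCR gain against the form 1 scoring threshold.

    Both the relative gain (>= 30%) and the absolute gain (> 15 percentage
    points) must be met. Relative gain is computed against the control arm,
    which is an assumption of this sketch.
    """
    if pcr_control_pct <= 0:
        raise ValueError("control-arm pCR rate must be positive")
    absolute_gain = pcr_experimental_pct - pcr_control_pct
    relative_gain = absolute_gain / pcr_control_pct * 100.0
    return relative_gain >= 30.0 and absolute_gain > 15.0
```

With hypothetical rates, an improvement from 30% to 50% passes both conditions, whereas an improvement from 60% to 76% passes the absolute condition but fails the relative one and would not be scorable.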
All the 12-month adjuvant trastuzumab studies demonstrated substantial benefit (grade A or B).34–36 Two years of trastuzumab was not superior to 12 months.34 While several studies failed to demonstrate non-inferiority of shorter duration of trastuzumab therapy,37–39 the PERSEPHONE study demonstrated non-inferiority for 6 months versus 12 months of trastuzumab and scored a B based on non-inferiority and reduced cost (table 4).40
Four of the five studies testing double blockade with trastuzumab plus a second anti-HER2 agent derived scores based on surrogate outcomes of pCR for neoadjuvant studies or DFS (table 5).
In the APHINITY study, evaluating the addition of pertuzumab to trastuzumab, the ITT population scored grade B.41 The node-positive subgroup was not scorable since this was 1 of 12 evaluated subgroups in an exploratory analysis and was, therefore, not eligible for grading (of note, the ESMO-MCBS allows scoring of subgroups only if there were up to three planned subgroups in the study design).41
Second-generation anti-HER2 therapies
In patients with residual disease after neoadjuvant anti-HER2-based therapy, completing 1 year of trastuzumab emtansine (T-DM1) demonstrated large improvement in DFS compared with trastuzumab (grade A).45
Adjuvant endocrine therapy
The addition of 5 years of tamoxifen compared with placebo was graded an A based on increased long-term OS by 6% and 9% at the individual trial level and in the meta-analysis level, respectively (table 6).46 47
The aromatase inhibitor studies to score an A were the Intergroup Exemestane (IES) study and the Italian Tamoxifen Anastrozole (ITA) study. The ITA study score was credited based on DFS results alone in the absence of mature OS data.48 Among the five studies with mature OS data, the data in two did not meet significance thresholds49–52 and the OS gain merited scores of B53–55 or C in the other three.56–58 Comparisons of an aromatase inhibitor alone for 5 years with a switch regimen including tamoxifen and an aromatase inhibitor (2.5 years each) were credited on the basis of non-inferiority in OS and reduced toxicity compared with aromatase inhibitor alone (table 7).52 55 59 60
Meta-analysis data resulted in a C score for the use of an aromatase inhibitor alone in the adjuvant setting, and a C when used as part of a switch after tamoxifen.60
In the premenopausal population, the addition of an aromatase inhibitor (with ovarian function suppression) scored a C when compared with tamoxifen with ovarian function suppression, in the combined SOFT-TEXT study,61–63 but it did not score in the ABCSG-12 study.64
Extended endocrine therapy
In the MA-17 study of 5 years letrozole or placebo after 5 years tamoxifen, the node-positive subgroup scored A based on DFS criteria.65 66 Other studies of extended aromatase inhibitor failed to demonstrate improvement in OS.67–69 The ATLAS (Adjuvant Tamoxifen: Longer Against Shorter) study of 5 years versus 10 years of adjuvant tamoxifen demonstrated a 2.8% reduction in breast cancer mortality (grade C) (table 8).70
Ovarian function suppression in premenopausal women
Three studies were evaluated. Two mature studies did not demonstrate significant OS gain.61 64 71 72 In the SOFT study, a 1.8% OS advantage was observed in the tamoxifen with ovarian function suppression (OFS) arm, scoring a C, and in the subgroup of patients who had received prior chemotherapy the observed gain in OS was 4.3% (grade B) (table 9).63
Adjuvant bone-modifying agents
None of the six individual studies demonstrated a survival advantage. A meta-analysis identified a reduction in breast cancer mortality of 1.8% (grade C), largely derived from the benefit observed in postmenopausal subgroup where the benefit was 3.3% (grade B) (table 10).73
Expert peer review of the generated results
The scores generated in this field testing were reviewed by the ESMO Breast Cancer Faculty for reasonableness. Apart from the scores for double HER2 blockade, the derived scores were more commonly endorsed as reasonable than unreasonable. There was no consensus about the grading for double HER2 blockade (unreasonable 32%; reasonable 29%): many respondents commented that the scores for the APHINITY and ExteNET studies, derived from the relative benefit gain in DFS but with very small absolute benefit, were excessively high. In situations where the primary outcome of the study was DFS, and a robust DFS benefit was observed (in terms of both relative and absolute benefits) but without significant OS benefit, a proportion of reviewers considered that a grade of NEB under-represented the clinical value of a prolonged interim period without disease, treatment and toxicity.
The validity of the ESMO-MCBS is predicated on adherence to the public policy ethical standard of ‘accountability for reasonableness’ and the field testing of the scale over a large range of clinical trials is an important part of the development process. This study, applying the ESMO-MCBS V.1.1 to 59 individual trials and 6 meta-analyses, has demonstrated that form 1 of the ESMO-MCBS can be applied to systemic adjuvant therapy trials. Moreover, apart from a few specific exceptions, the generated grades were considered reasonable by experts in the ESMO Breast Cancer Faculty, largely reflecting standard clinical practice.
Applying the scale and interpreting the results was, in most instances, straightforward. A small number of studies did not incorporate all critical data in accordance with CONSORT standards. In some instances HRs were published without CIs, some meta-analyses did not include absolute gain data for OS12 and some studies reported the HR to reflect increased recurrence risk (eg, MA-21).19 Furthermore, even with long-term follow-up, some studies never published their mature survival data. Since magnitude of benefit grades derived from OS gain at maturity are often lower than those derived from DFS, the non-publication of mature OS results occasionally resulted in disproportionately high scores. This is well illustrated by two examples: no mature survival data were ever published for the ITA study by Boccardo et al, which evaluated switching from tamoxifen to an aromatase inhibitor,48 or for the MA-21 study, which evaluated the addition of paclitaxel to an anthracycline.19 Consequently, these were among the few studies in their respective classes to score an A, while all others for which mature survival data were available scored C or NEB. We note that this anomaly could be misinterpreted to suggest superiority, or even exploited by delaying or not reporting mature OS data to avoid downgrading.
We note that the ESMO-MCBS is agnostic to DFS type and does not distinguish between DFS, invasive DFS (iDFS) and distant DFS (DDFS), which is also called ‘distant metastasis-free survival’. In recent years, there has been a shift to more accurate end points such as iDFS or DDFS, which are better surrogates for OS benefit,74 since they emphasise events that are more closely related to cancer mortality (ie, invasive relapse or distant metastases). This underscores the importance of new initiatives to introduce standardisation in the definitions and application of these end points.74 75
A key aim of this study was to identify shortcomings in the current version of form 1 which will be addressed in future versions of the scale. This field testing and peer-review process identified six shortcomings in form 1. All of these shortcomings have been reviewed by the ESMO-MCBS Working Group and initiatives are underway to address each of them as part of the forthcoming revisions to be incorporated in the next version of the scale (V.2.0).
1. HR thresholds for DFS are excessively lenient: The experience of this field testing indicates that trials graded on the basis of DFS in initial publications commonly attained lower scores when mature OS data became available and that in many cases the OS gains were not significant. This indicates that the relative benefit thresholds for grade B and C (lower limit of the 95% CI of the HR 0.65–0.85 and >0.85, respectively) are excessively lenient. Consequently, we recommend lowering the HR thresholds for grades B and C.
2. Lack of absolute gain constraint on DFS scoring can generate inappropriately high scores when absolute gain is very small: Expert peer reviewers were concerned that grades accrued on the basis of relative benefit were unreasonably high when the observed absolute benefit was very small. This was highlighted in their critique of scores generated in the APHINITY41 and ExteNET44 trials. This could be corrected by applying the ‘dual rule’ whereby grade criteria include both relative and absolute benefit thresholds in a manner that is consistent with all other forms of the ESMO-MCBS V.1.1.
3. The clinical benefit derived from DFS gain is not credited when OS gain is not verified: In many instances, gains derived from DFS were not credited when there was no significant gain in mature OS. When substantially improved DFS does not result in improved OS, the grading of NEB undervalued the time gained without need for medical treatment, which may itself be a valued outcome independent of OS.76
4. Need to define OS maturity in adjuvant studies: According to the ESMO-MCBS V.1.1, surrogate scores prevail if mature OS data are not yet available. Maturity is generally defined as the time point where most of the anticipated events will have occurred. In a non-curative setting, when all patients are expected to die, conventionally it is when the median survival of both arms is reached. However, in the adjuvant setting, when the number of anticipated events may vary according to the tumour type and stage, this convention does not apply.
Consequently, evaluating maturity of survival data in this setting requires familiarity with the specific clinical scenario and it is conceivable that in some instances this may be a source of reasonable disagreement even between experts. ESMO-MCBS instructions for use should include guidelines for OS maturity, for example, 5 years for subtypes at high risk for earlier recurrence (such as triple negative and HER2-positive/endocrine unresponsive subtypes) and at least 8 years for endocrine responsive tumours (including HER2-positive/endocrine-responsive).77
5. Lack of capacity to grade single-arm de-escalation studies in the curative setting: A recent single-arm phase 2 study reported excellent outcomes for node-negative HER2-positive breast cancers smaller than 2 cm treated with the combination of paclitaxel and trastuzumab (without an anthracycline).78 These types of studies are often used to evaluate de-escalation strategies. Form 1 is unable to grade these studies.
6. Lack of consideration of toxicity in the curative setting: The current version of form 1 does not consider toxicity. The shortcoming of this approach is illustrated by the ExteNET study that scores an ‘A’ for the hormone-positive subgroup despite very substantial toxicity secondary to the neratinib, which resulted in a 27.6% discontinuation rate.44 While we appreciate that patients may be willing to make short-term toxicity trade-off to improve cure rate, it is not clear that this approach applies also for long-term toxicity such as peripheral neuropathy or secondary cancers (especially when improvement in cure rate may be small). We support the proposition, initially made by patient advocacy groups, that ESMO-MCBS scores in form 1 should be annotated to indicate acute and/or long-term toxicities.
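Shortcomings 1, 2 and 4 above concern rules that are simple enough to sketch in code. The Python sketch below is illustrative only and does not represent the forthcoming version of the scale. The DFS thresholds are those quoted in shortcoming 1, with grade A (lower CI limit below 0.65) inferred from them. The dual-rule function takes caller-supplied absolute-gain minimums because this article does not propose specific values. The maturity check uses the example follow-up periods suggested in shortcoming 4. All function and parameter names are hypothetical.

```python
from typing import Mapping

GRADES = ("A", "B", "C")


def dfs_grade_v1_1(hr_ci_lower: float) -> str:
    """Grade a significant DFS gain from the lower limit of the 95% CI of the HR."""
    if hr_ci_lower < 0.65:
        return "A"  # inferred from the B and C thresholds quoted in the text
    if hr_ci_lower <= 0.85:
        return "B"
    return "C"


def dfs_grade_dual_rule(hr_ci_lower: float,
                        abs_dfs_gain_pct: float,
                        min_abs_gain_pct: Mapping[str, float]) -> str:
    """Credit a grade only if the absolute gain also reaches that grade's minimum.

    min_abs_gain_pct holds hypothetical, caller-supplied minimums per grade;
    grades without an entry carry no absolute requirement.
    """
    relative_grade = dfs_grade_v1_1(hr_ci_lower)
    # Step down from the relative grade until the absolute criterion is met.
    for grade in GRADES[GRADES.index(relative_grade):]:
        if abs_dfs_gain_pct >= min_abs_gain_pct.get(grade, 0.0):
            return grade
    return "C"


def os_is_mature(median_followup_years: float, endocrine_responsive: bool) -> bool:
    """Apply the example maturity guideline: 8 years if endocrine responsive, else 5."""
    required_years = 8.0 if endocrine_responsive else 5.0
    return median_followup_years >= required_years
```

With hypothetical minimums of 5 and 3 percentage points for grades A and B, a trial with a lower CI limit of 0.60 but an absolute DFS gain of only 1 percentage point would be graded C rather than A, which is the kind of correction the reviewers' critique of small absolute gains points towards.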
In a time of exponential growth in the costs of cancer care, tools to assist physicians and regulatory bodies in evaluating new therapeutic options are critical. This study reinforces the validity of the ESMO-MCBS approach to adjuvant therapies insofar as the scoring of adjuvant approaches in early breast cancer largely reflects standard clinical practice. This field testing has identified six shortcomings that have been reviewed by the ESMO-MCBS Working Group and that form the foundation for amendments to be incorporated into future iterations of the ESMO-MCBS.
The authors wish to thank and acknowledge our oncology colleagues listed who participated in the field testing for ‘reasonableness’ of scorings using form 1 of the ESMO-MCBS and who have agreed to place their names in this publication (online supplementary appendix 1). We also thank those who wished to remain anonymous.
Contributors Conception of the work: all authors. Funding acquisition: not applicable. Data collection and data analysis: all authors. Manuscript writing/editing: all authors. Final approval: all authors.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests SP-S reports institutional financial support for her advisory role from Astra Zeneca, Pfizer, Novartis, Roche, Teva, NanoString; EGEdV reports institutional financial support for her advisory role from Daiichi Sankyo, Merck, NSABP, Pfizer, Sanofi, Synthon and institutional financial support for clinical trials or contracted research from Amgen, AstraZeneca, Bayer, Chugai Pharma, CytomX Therapeutics, G1 Therapeutics, Genentech, Nordic Nanovector, Radius Health, Regeneron, Roche, Synthon; MJP reports scientific board member for Oncolytics, consultant honoraria from AstraZeneca, Camel-IDS, Crescendo Biologics, Debiopharm, G1 Therapeutics, Genentech, Huya, Immunomedics, Lilly, Menarini, MSD, Novartis, Odonate, Periphagen, Pfizer, Roche, Seattle Genetics, research grants to institute AstraZeneca, Lilly, MSD, Novartis, Pfizer, Radius, Roche-Genentech, Servier, Synthon; FC reports institutional financial support for her advisory role from Astellas/Medivation, AstraZeneca, Celgene, Daiichi-Sankyo, Eisai, GE Oncology, Genentech, GlaxoSmithKline (GSK), Merck-Sharp, Merus BV, Novartis, Pfizer, Pierre-Fabre, Roche, Sanofi, Teva.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information. All data are freely available.