Competing interests None declared.
Provenance and peer review Commissioned; internally peer reviewed.
The ESMO Magnitude of Clinical Benefit Scale (ESMO-MCBS) is a standardised, generic, validated tool to stratify the magnitude of clinical benefit that can be anticipated from anticancer therapies. The ESMO-MCBS is intended to both assist oncologists in explaining the likely benefits of a particular treatment to their patients as well as to help decision makers prioritise outstanding new drugs for reimbursement.
Which payers/reimbursement agencies were consulted on the use of this tool? How does ESMO envisage payers using the ESMO-MCBS?
The scale has not been discussed with any payer organisations.
The scale was developed as a tool to derive clear and unbiased evaluation of the magnitude of clinical benefit based on published peer-reviewed data.
When governments are the payer, as is common in many European countries, we envisage that the ESMO-MCBS will assist in the Health Technology Assessment process. This is described in our paper1 where we write: “Grading derived from the ESMO-MCBS provides a backbone for value evaluations for cancer medicines. Medicines and therapies that fall into the ESMO-MCBS A+B for curative therapies and 4+5 for non-curative therapies should be highlighted for accelerated assessment of value and cost-effectiveness. While a high ESMO-MCBS score does not automatically imply high value (that depends on the price), the scale can be used to frame such considerations and can help public policymakers advance ‘accountability for reasonableness’ in resource allocation deliberations.”
Scale structure and criteria
Will ESMO consider taking proportional benefit (ie, a percentage gain relative to current survival in orphan or difficult-to-treat cancers) into account beyond focusing on absolute overall survival (OS)/progression-free survival (PFS)? If proportionality were to be taken into account, then the percentage gain of OS for a drug in a rare or difficult-to-treat cancer would have quite a different benefit rating. Equally some of the cancers for which survival is already very good might have their rankings changed.
All of the quantitative elements of the scale, both for PFS and OS, take into account both absolute gain and hazard ratio (HR). HR data takes proportionality into account.
The scale is stratified for diseases with better and worse prognosis, that is, there is a different scoring of PFS for disease with PFS in the control arm >6 months or <6 months, and for median OS >12 months or <12 months.
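The stratification described above can be sketched as a simple lookup. This is an illustration only, not the published form: the function name and stratum labels are hypothetical, and the handling of a control-arm median exactly at the cut-off (placed here in the poorer-prognosis stratum) is an assumption, since the text above only distinguishes values below and above 6 months for PFS and 12 months for OS.

```python
def prognostic_stratum(endpoint: str, control_median_months: float) -> str:
    """Return the prognostic stratum used to select scoring thresholds (illustrative)."""
    # Cut-offs named in the text: 6 months for PFS, 12 months for OS.
    cutoffs = {"PFS": 6.0, "OS": 12.0}
    if endpoint not in cutoffs:
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    # Assumption: a median exactly at the cut-off counts as poorer prognosis.
    if control_median_months <= cutoffs[endpoint]:
        return "poorer prognosis"
    return "better prognosis"


print(prognostic_stratum("PFS", 4.0))
print(prognostic_stratum("OS", 24.3))
```

The point of the sketch is that the stratum depends only on the control arm, so the same absolute gain can be read against different thresholds in different diseases.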
If the ESMO-MCBS is only to be applied to comparative studies, what will that mean for breakthrough therapies licensed on single-arm data?
The grading of single arm data will be incorporated into version 1.1 of the ESMO-MCBS.
Evaluation form 1—for new approaches to adjuvant therapy or new potentially curative therapies grade A (for grade B and C analogously): What is the specific meaning of ‘in studies without mature survival data’? Assuming we have survival data at 3 years but the improvement is 4%. At the same time, we have an improvement in disease-free survival (DFS) of HR=0.60. What would then be the correct grading? Grade A (both conditions connected by OR) or grade B (condition on DFS only matters if 3 years survival data are missing)?
In general, once mature survival data becomes available, the DFS, which is a surrogate indicator of survival, becomes redundant and in the example given here the ESMO-MCBS score would be B.
However, in rare instances in which DFS was the primary outcome and where there was major crossover because of early analysis leading to an unblinding of the randomisation (as happened in some of the trastuzumab trials) the DFS scoring would prevail.
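The precedence rule in this answer can be written as a short decision function. It is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and it only selects which end point drives the form 1 grade; it does not reproduce the grade thresholds themselves, which are defined in the published form.

```python
def form1_scoring_endpoint(mature_survival_available: bool,
                           dfs_was_primary_outcome: bool = False,
                           major_crossover_after_early_unblinding: bool = False) -> str:
    """Pick the end point that drives the form 1 grade (illustrative)."""
    # Rare exception described in the text: DFS was the primary outcome and
    # early unblinding led to major crossover, so the DFS scoring prevails.
    if dfs_was_primary_outcome and major_crossover_after_early_unblinding:
        return "DFS"
    # Otherwise mature survival data, once available, makes DFS redundant.
    if mature_survival_available:
        return "survival"
    # Without mature survival data, the surrogate (DFS) is what can be scored.
    return "DFS"


# The example in the question: survival data exist at 3 years, no crossover issue.
print(form1_scoring_endpoint(mature_survival_available=True))
```

For the example raised in the question, the function selects survival, which is why the 4% survival improvement rather than the DFS hazard ratio determines the grade.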
How is the statistical significance of effects incorporated in your method? Since it is not explicitly mentioned in the text of the article, can we assume that a statistically significant result (ie, p value <0.05) for a primary end point is a general requirement for a positive grading?
YES. In the section ‘eligibility for application of the ESMO-MCBS’ the article1 is very specific: ‘The ESMO-MCBS can be applied to comparative outcome studies evaluating the relative benefit of treatments using outcomes of survival, quality of life (QoL), surrogate outcomes for survival or QoL (disease-free interval, event-free survival (EFS), time to recurrence, PFS and time to progression) or treatment toxicity in solid cancers. Eligible studies can have either a randomised or comparative cohort design or a meta-analysis which report statistically significant benefit from any one or more of the evaluated outcomes’.
Can you advise if there is any upgrading of the ESMO-MCBS if QoL is evaluated as a secondary outcome (eg, first question listed below is marked)? Or is there only an upgrading of 1 point if QoL is improved or there is less toxicity (eg, only if second or third question listed below is marked)?
YES. Upgrading for improved QoL is incorporated into the evaluation of both OS studies (form 2a) and PFS studies (form 2b, illustrated below).
QoL/grade 3–4 toxicities assessment
*This does not include alopecia and myelosuppression, but rather chronic nausea, diarrhoea, fatigue, etc.
Downgrade 1 level if there is one or more of the above incremental toxicities associated with the new drug
Upgrade 1 level if improved QoL or if less grade 3–4 toxicities that bother patients are demonstrated
When OS as secondary end point shows improvement, it will prevail and the new scoring will be carried out according to form 2a
Downgrade 1 level if the drug ONLY leads to improved PFS and QOL assessment does not demonstrate improved QoL
Final, toxicity and QoL adjusted, magnitude of clinical benefit grade
Highest magnitude of clinical benefit grade that can be achieved: grade 4.
In PFS studies in which QoL is evaluated, a positive outcome demonstrating either improved QoL or delayed deterioration in QoL provides further supportive evidence regarding the significance of the PFS advantage reported.
If improved QoL OR delayed deterioration in QoL is observed in QoL evaluation (statistically significant), then the score can be upgraded by 1 point.
If however, PFS improvement is not accompanied by OS advantage and evaluation of QoL does not confirm that QoL was either improved or deterioration is delayed, this essentially devalues the PFS and scores are reduced by 1 point.
In our field testing there were many instances of scores being upgraded to 4 based on both high PFS score and improved QoL. There were three instances where PFS scores were downgraded because PFS was not associated with either statistically significant improvement in OS or a significant positive effect on QoL.
The rationale for this approach is that PFS is, in general, an unreliable surrogate for both OS and QoL improvement, and secondary outcomes can either lend veracity to the findings or indicate that they are of lesser clinical significance to patient outcomes.
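The form 2b adjustments described above can be sketched as follows. This is an illustration under stated assumptions, not the official form: the function and parameter names are hypothetical, the floor of grade 1 is assumed, and the way the adjustments combine when more than one applies is an interpretation of the form text. QoL is modelled as three states because a study that did not assess QoL is neither upgraded nor penalised for it.

```python
from typing import Optional

MAX_GRADE = 4  # highest grade achievable on form 2b, per the form text
MIN_GRADE = 1  # assumption: lowest grade on the scale used as a floor


def adjust_pfs_grade(preliminary_grade: int,
                     incremental_toxicity: bool,
                     less_grade_3_4_toxicity: bool,
                     qol_improved: Optional[bool],
                     os_improved: bool) -> Optional[int]:
    """Apply the toxicity and QoL adjustments to a preliminary PFS grade.

    qol_improved is True or False when QoL was assessed, None when it was not.
    Returns None when a secondary OS improvement means the study should be
    rescored on form 2a instead.
    """
    if os_improved:
        return None  # OS prevails; scoring moves to form 2a
    grade = preliminary_grade
    if incremental_toxicity:
        grade -= 1  # downgrade for incremental toxicities of the new drug
    if qol_improved is True or less_grade_3_4_toxicity:
        grade += 1  # upgrade for improved QoL or fewer bothersome toxicities
    if qol_improved is False:
        grade -= 1  # PFS gain only, and QoL assessment showed no improvement
    return max(MIN_GRADE, min(MAX_GRADE, grade))


print(adjust_pfs_grade(3, False, False, True, False))   # QoL improved
print(adjust_pfs_grade(3, False, False, False, False))  # QoL assessed, not improved
print(adjust_pfs_grade(3, False, False, None, False))   # QoL not assessed
```

Returning None for the OS case keeps the two forms separate rather than guessing a form 2a grade from PFS inputs.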
You are describing for forms 2a and 2b that OS is graded based on the lower limit of the 95% CI of the effect. There is no specification how exactly the gain (in median survival) or the increase in survivors is graded. Is it correct to assume that you suggest using the point estimate of the effect (rather than any limit of a CI)? Which measures should be used in form 1 limits of CIs or point estimates?
In forms 2a and 2b, HR is evaluated based on the lower limit of the 95% CI. The median survival of the control arm and the gain in median survival (for both PFS and OS) are calculated based on point estimates.
In form 1, the HR for DFS is evaluated based on the lower limit of the 95% CI. The difference in 3 (or more) year survival, the δ between the curves, is based on point estimates and must be statistically significant.
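To make the distinction between the two kinds of statistic concrete, the sketch below collects the inputs used for forms 2a and 2b from a trial result. The class, field and function names are hypothetical, and the numbers in the usage line are invented for illustration; they do not come from any trial.

```python
from dataclasses import dataclass


@dataclass
class ArmComparison:
    hr_ci95_lower: float            # lower limit of the 95% CI of the HR
    hr_ci95_upper: float            # upper limit of the 95% CI of the HR
    control_median_months: float    # point estimate, control arm
    treatment_median_months: float  # point estimate, experimental arm


def scoring_inputs(result: ArmComparison) -> dict:
    """Select the statistics used for grading on forms 2a and 2b (illustrative)."""
    return {
        # HR is judged on the lower limit of its 95% CI, not the point estimate.
        "hr_for_scoring": result.hr_ci95_lower,
        # Control median and gain in median are taken from point estimates.
        "control_median_months": result.control_median_months,
        "gain_months": result.treatment_median_months - result.control_median_months,
    }


# Invented example values, for illustration only.
print(scoring_inputs(ArmComparison(0.55, 0.90, 10.0, 13.0)))
```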
There is also currently an anomaly that if no health-related quality of life (HRQoL) data is collected this element cannot be scored, but this alone could drive differentiation between therapies. This is the situation for studies of bevacizumab in platinum-sensitive relapsed ovarian cancer which did not collect HRQoL data. I would suggest that the scale should deduct if there is a HRQoL detriment, add to the score if there is HRQoL improvement and make no change if there is no difference.
PFS studies are in some ways the low hanging fruit of registration outcomes and since PFS is such a weak surrogate for survival or QoL it can be upgraded if benefit is supported by secondary outcomes such as OS or QoL advantage. Many studies in other diseases (often without OS advantage because of crossover at progression) demonstrated improved QoL or delayed deterioration in QoL, thus qualifying them for a high score of 4. Studies are only penalised if secondary data failed to show OS advantage AND confirmed that there was no QoL advantage.
Studies that do not evaluate QoL, particularly those with crossover, cannot be upgraded to a high ESMO-MCBS score; this is why it is in the interest of industry sponsors and researchers to evaluate QoL. The risk in this, however, is that sometimes a lack of QoL advantage is confirmed and this actually lowers the value of the PFS score. Be aware that this occurred with olaparib and with bevacizumab and everolimus in breast cancer which similarly have strong PFS data but no OS or QoL advantage.
The ESMO-MCBS currently cannot contextualise other aspects potentially relevant to healthcare decision-making, such as the level of unmet clinical need, the maturity of data and the level of confounding of overall survival which are increasingly relevant in oncology with accelerated approvals.
ESMO-MCBS data is unrelated to licensing approvals; as described in the article,1 it is a tool to assist in the processes of Health Technology Assessment (HTA) (which takes into consideration unmet needs etc). The tool is stratified for prognosis and this is one of its strengths. Finally, when mature data evolves and is published in peer-reviewed papers it can generate an upgrading. This was also commonly observed, particularly in melanoma and colorectal cancer studies (also described in the article).
The scale does not consider the heterogeneity of patient populations, but instead assumes all patients would have similar benefits from the treatments.
For all treatments there will be some patients who will be outstandingly unresponsive and, at the other extreme, there will be exceptional responders. To some degree, the ESMO-MCBS addresses this by looking at not only median survival data but also late survival data (in situations when mature data is available). Thus, for diseases with a control median survival of less than 12 months, the scale evaluates and credits survival advantage at 2 years, and for diseases with control median survival of more than 12 months 3-year survival advantage is credited.
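The choice of late survival time point can be expressed in one line. This is a sketch only: the function name is hypothetical, and treating a control median of exactly 12 months as the shorter-survival case is an assumption not stated in the text.

```python
def late_survival_landmark_years(control_median_os_months: float) -> int:
    """Return the landmark (in years) at which survival advantage is credited."""
    # Assumption: a control median of exactly 12 months uses the 2-year landmark.
    return 2 if control_median_os_months <= 12.0 else 3


print(late_survival_landmark_years(9.0))
print(late_survival_landmark_years(24.3))
```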
With this grading, treatments that are meant for patients with refractory disease or those with a poor prognosis are inherently disadvantaged?
One of the strengths of the ESMO-MCBS is its prognostic stratification. In the design of this tool we have gone to great lengths to give recognition to improvements in outcomes for poor prognostic diseases, that is, diseases that have a median OS of <12 months in the control arm or a median PFS of <6 months in the control arm. Thus, smaller absolute gains in either PFS or OS can achieve the same scale of scoring in poor prognostic diseases as a greater gain seen in better prognostic situations.
Using an active versus placebo comparator was not accounted for, especially within the prostate cancer scores.
The Helsinki requirements of the World Medical Association for Ethical Human Research require that the control arm be ‘best evidence supported practice’.2 Use of a placebo-only control is therefore essentially limited to the situation in which there is no evidence-based standard for ongoing disease modifying care. Thus, in all situations where there is an evidence-based standard for ongoing care this must be incorporated as a control arm.
With regard to castration-resistant prostate cancer, particularly after the failure of docetaxel chemotherapy, both options of either prednisone alone or prednisone in combination with mitoxantrone have been incorporated as control arms in different studies.3 Given that the studies comparing mitoxantrone/prednisolone versus prednisolone alone showed no survival advantage both have been accepted as reasonable control arms. It is, however, arguable that mitoxantrone/prednisone should be the appropriate control arm since there is some, albeit relatively soft, evidence that this combination improves patient-reported subjective outcomes, although this view is not universally held.
The stratification based on the control arm outcomes is binary and somewhat arbitrary. This stratification of the OS and PFS could result in different grades for therapies and can be misleading about the outcomes.
Part of the challenge in developing the scale has been to introduce a clear and simple approach to grading, but which incorporates enough nuance to be able to fairly represent studies in different prognostic settings and using different outcome measures.
The prognostic cut-off points were derived based on the input of multiple expert clinicians as well as detailed statistical review. Based on our experience in field testing that was published1 and reviewed by experts in each of the different disciplines (including prostate cancer) the feedback was that these cut-off points were fair and reasonable. However, this will be one of the subjects which will be reviewed in our ongoing deliberations regarding the development of the ESMO-MCBS.
Adding a sentence indicating that a small magnitude of clinical benefit may be of great importance in some types of tumours with fewer therapeutic options could help to highlight that there are differences across tumour types when interpreting the results of this ESMO-MCBS.
The Annals of Oncology article published in 20151 is predicated on the commitment to promote professional integrity in discussing anticipated outcomes and treatment options. This commitment would suggest that when the likelihood of benefit or the quantity of expected benefit is small this ought to be explained to patients (with care and sensitivity) as part of the process of informed decision-making.
In situations where there is no treatment with a high ESMO-MCBS we would advise clinicians to counsel patients explaining that there are some treatments that may provide benefit to them, but that on average the likelihood or the amount of benefit may be limited. This discussion should also address best and worst case outcomes and the options to either consider participating in research (if a relevant trial is available) or to receive supportive and palliative care without anticancer treatment which may be quite reasonable in this setting.
When reviewing the colorectal and ovarian tables, I wonder if phase II trials, which are essentially exploratory, without a prespecified statistical hypothesis should be removed completely from these tables.
As discussed in the Annals of Oncology article,1 we carefully researched the issue of randomised phase II papers. Our conclusion to include them is supported and referenced by a major review of the validity of this genre of study by a cohort of leading cancer biostatisticians which concluded “Although each side of this debate has forcefully presented the favourable attributes of their nominated trial design, all the authors acknowledge that efficient drug development will require the appropriate use both SA-II and RP-II trials. Table 4 in the Annals of Oncology article1 provides some guidelines regarding scenarios that the authors unanimously agree would favour the use of one particular trial design. SA-II trials may be preferred for single agents with tumour response end points—especially in rare tumours whereas RP-II may be preferred for trials of combination therapy and/or with time to event end points. The key requirement before considering an RP-II trial is a determination that completion of such a trial is feasible. Clearly, more research is required to compare efficacy of SA-II and RP-II trials, as well as to develop more adaptive/effective phase II designs.”4
It is too strong to say that panitumumab in the PEAK trial (phase II random) has an ESMO-MCBS scoring of 4.
We share concerns about the findings of the PEAK study5 and these concerns are addressed in the article1 where we emphasise that the interpretation of the ESMO-MCBS must take into account factors that may have artificially inflated the score, such as unbalanced crossover. For this reason, the score is asterisked in the table and there is a detailed discussion in the text: Unbalanced crossover: “In other instances, unbalanced crossover may exaggerate differences in survival. For instance, in the PEAK study comparing FOLFOX6 with either bevacizumab or panitumumab among the patients with KRAS wild-type tumours, only 38% of those in the bevacizumab arm received any epidermal growth factor receptor (EGFR) antibody in subsequent therapy. Although this study showed a survival advantage of 9.9 months over a baseline of 24.3 months for patients initiated on treatment with panitumumab, it remains unclear as to whether this was affected by the sequence of treatments or if it resulted from the fact that more than half of the patients in the bevacizumab arm were never exposed to an EGFR antibody.”
The magnitude of benefit is considered the same regardless of the type of cancer. However, we all know that the available therapies that exist for a given malignancy also influence the magnitude of benefit we are able to accept as high enough.
In developing the ESMO-MCBS V.1.0 we recognised that different conditions have different prognoses and that this is influenced by the nature of the disease and stage and also by best available therapies.
We believe that the concern raised here is addressed by the stratification of the benefit scales for non-curable disease by prognosis. The scoring thresholds are different with median survival in controls of less than or more than a year and the thresholds for PFS scoring similarly differ for PFS in controls <6 months and >6 months. We believe that the robustness of this approach across a wide range of conditions is well demonstrated in the field testing results.
There is no discussion about pathologic complete response (pCR). I understand this is only relevant for some types of cancer but it would be important to state if the scale considers pCR as a response-rate-like end point and if it does or does not assess phase 3 trials that have used this end point (and there are many). For example, no drug is evaluated based on pCR in the breast cancer field and we are now facing an era where some drugs are being approved based on pCR data only (eg, pertuzumab by the Food and Drug Administration (FDA) and currently undergoing evaluation by the European Medicines Agency (EMA)).
We had substantial internal discussions within the group regarding pCR as a surrogate outcome for curative therapies.
This is a point of contention and we appreciate that the level of evidence may change.
At this time the membership of the ESMO-MCBS working group does not feel that there is adequate evidence of the robustness of pCR as a reliable surrogate for inclusion in the scale. While recognising that this decision will be debated, and may be subject to future amendments, we felt that it was the best decision based on consideration of the currently available best evidence.
This was the conclusion of two recent meta-analyses:
The conclusion from the study by Berruti et al6 was “This meta-regression analysis of 29 heterogeneous neoadjuvant trials does not support the use of pCR as a surrogate end point for DFS and OS in patients with breast cancer. However, pCR may potentially meet the criteria of surrogacy with specific systemic therapies.”
Cortazar et al7 concluded: “Our pooled analysis could not validate pathological complete response as a surrogate end point for improved EFS and OS.”
This remains an active issue of research and the state of the science will be reviewed when considering future revisions.
It is not clear when we read the article1 why certain types of studies were grouped under the form 2c. Are these considered of less quality? Why are non-inferiority trials (which are good if well designed) and trials with response rate (RR) as an end point grouped together? Why is QoL also under this group? I am not saying it is not possible but it needs further explanation.
Evidence of clinical benefit is mainly derived from comparative studies in which the primary outcome is either improved survival (or its surrogate, DFS) for curative therapies, or improved OS (or its surrogate, PFS) for non-curative therapies.
Three groups of studies which are outliers include:
Non-inferiority studies: those which aim to demonstrate non-inferior primary outcomes, with important secondary outcomes of improved toxicity, QoL or cost. While this tool does not address cost, non-inferiority with improved toxicity or QoL is scored very highly.
QoL studies: there was only a single study (of early palliative care in metastatic lung cancer) in which this was the primary outcome.
RR: there were very few contemporary studies in which this was the primary outcome. RR is a weak surrogate for survival and even for QoL and consequently this is a low level of evidence for benefit.
Application, policy implications and feedback
Who will do the ESMO-MCBS assessment and how does validation take place? How will industry, patient advocates and others be able to provide feedback on subsequent ratings given to new treatments?
ESMO has established a portal for feedback on the ESMO-MCBS from the oncology community.
Feedback can be submitted to email@example.com.
The evaluation of newly approved EMA anti-cancer medicines will be completed by the ESMO Guidelines Committee.
All feedback will be taken into consideration by the ESMO-MCBS Working Group and ESMO Executive Board.
Extrapolation from trial data to the general population is an issue not for the ESMO-MCBS but when recommending a specific drug. The target population of the trial is the one in which a particular intervention is to be applied.
We share this concern and have addressed it in the editorial by Tabernero, “Proven efficacy, equitable access and adjusted pricing of anticancer therapies: no ‘sweetheart’ solution”.8
Furthermore, in the section on validity of the scale1 we write: “ESMO-MCBS scores for a specific therapy are not generalisable to indications outside the confines of the context in which they have been evaluated. Consequently, the ESMO-MCBS score for a particular medication or therapeutic approach may vary depending on the specifics of the indication and may vary between studies”.
How will ESMO ensure their message around the ESMO-MCBS scale is understood properly and communicated accurately?
The central aim of ESMO in developing the ESMO-MCBS is to highlight those treatments which provide major clinical benefits in the hope that they will be made available to the public as fast as possible (pending HTA and value assessments).
ESMO also has a public commitment to present ‘clear and unbiased evaluations of the magnitude of clinical benefit’ as a matter of important public interest and professional integrity.
When the clinical benefits derived from new treatments are relatively limited, this is a matter of legitimate public interest.
Affordability of anticancer medication and value discrepancy is of growing concern to ESMO (see the editorial8), payers and the public and will inevitably attract critical scrutiny.
Given that advances in the non-curative setting have often been achieved with multiple incremental improvements, how can we collaborate to ensure that innovators are able to shoulder the inherent risk and not be de-incentivised to tackle those more difficult-to-treat cancers?
Incremental benefits are, nonetheless, benefits. ESMO recognises that all EMA approved drugs have demonstrated benefits for patients, but it is understanding the actual magnitude of the benefit that is often challenging and this is why the development of a standardised grading scale is useful.
We hope that the emphasis on the importance of innovations showing high levels of benefit will enhance the incentive to develop agents that substantially improve health outcomes.
The ESMO-MCBS assessment will be independently made by ESMO experts.
Feedback from industry or other parties can be made through the official communication channel firstname.lastname@example.org.
As base assessments have been chosen to take place at the pre-approval stage, how will ESMO manage the risk that medicines will be evaluated prematurely before their full value potential is studied and realised? Will scoring fluctuate over time (eg, will high-scoring products receive a lower score as more valuable innovations come to market or will low-scoring products receive a higher score as new information becomes available?) If so, how and how often will these adjustments be made over time?
ESMO will apply the scale to all drugs that have been newly approved by the EMA. The pivotal published clinical trials will be used for the grading assessment.
Since some studies publish early data based on PFS alone, we have seen and described in the article examples where scores have been modified upward as mature survival data emerged.1
The ESMO-MCBS Working Group will be responsible for an ongoing review of new product related data after approval by the EMA. In the event that new data in the same indication is published and the results differ from the original grading, the grade allocated will be modified accordingly, if appropriate.
In order to ensure that the ESMO-MCBS contributes to the guiding principles which are solid ground for policymaking, what are ESMO's plans to make this scientific assessment part of a larger discussion with all relevant stakeholders to look into the overall value of what an intervention, be it medicinal or other, can represent and how it should then be placed in the system?
ESMO would be very supportive of models of pharmaceutical pricing that engaged professional bodies and stakeholders based on the concept of a negotiated ‘just price’.
Such a process may incorporate models of either ‘value-based pricing’ or risk-sharing arrangements.
ESMO supports pricing policies for anticancer medications that reflect the value of medications in terms of healthcare benefits, that are affordable and sustainable for healthcare systems, that are sensitive to the needs and demands of developed economies, and to those of emerging economies (possibly with differential pricing) and that deliver an adequate profit to maintain the incentive for research and development.
The methodology scores any one individual clinical trial rather than the product overall. More recent evidence, including real-world data is excluded; therefore, it is not an accurate representation of the treatment experience.
The ESMO-MCBS is a tool to grade the magnitude of clinical benefit observed from comparative clinical studies. To date, randomised clinical studies have proven to be the most reliable approach to evaluate the relative merits of new agents as compared to the control arm.
As these scores reflect only a cross-section of the available clinical data at the time of analysis, we would like to ensure that the limitations are clearly conveyed and the emerging evidence is appropriately incorporated.
It is clearly indicated in the text that there is the ongoing need to review subsequent publications and update data where appropriate. Indeed, again as described in the article,1 this did lead to upgrading of scores for several therapeutic agents. In some cases, scores were upgraded based on late published survival data, in other cases scores were upgraded when subgroups of patients (preplanned) were identified to have more substantial levels of benefit. This latter phenomenon was particularly prevalent for the anti-EGFR agents in metastatic colorectal cancer.
Furthermore, ESMO has appointed a Working Group to re-evaluate agents and studies as further data comes to hand and this will be made available to the public. The format for this publication is currently being developed.
The ESMO-MCBS will be a dynamic tool and its criteria will be revised on a regular basis by a dedicated ESMO-MCBS Working Group, taking into consideration peer-reviewed feedback from other stakeholders and developments in cancer research and therapies.