Whenever we approach the problem of classifying and ranking the clinical benefit of any therapy in oncology, we immediately face a number of problems. Some of these can be overcome, but some others do not appear to have a solution, given the complexity of the matter. In our recent papers on how to raise the bar for anticancer drug approval,1 2 we called for at least four clear distinctions in approaching this task.
The need to distinguish between the palliative situation and the adjuvant setting. In fact, the benefits of extending life or improving its quality, palliating tumour-related symptoms or extending progression-free survival (PFS) are all on a qualitatively different plane compared with the benefit of improving the cure rate of a resected neoplasm or even extending the disease-free survival (DFS); DFS, in fact, represents ‘life, thinking of being cured.’ No quantitative correlation can be established between the two ‘worlds’ of oncology (curative and palliative) and their specific treatment end points. Clearly, different scales must be used to measure the benefit of treatment in these two different conditions.
If we limit the effort of ranking the clinical benefit to the advanced, non-curative setting, another relevant distinction is the relative weight of the different end points. Clearly, extending life (overall survival, OS) and improving quality of life (QoL) rank first. Then come PFS and response rate (RR). The other patient-reported outcomes in general rank in between, depending on the setting. But how much OS a patient may be ready to trade for just extending PFS or improving the other end points mentioned remains highly subjective. Hence, any equivalence level between these end points is highly variable and arbitrary.
Because of the impossibility of establishing an objective correlation between the benefit in OS and that afforded to patients by PFS, we concentrated our efforts to rank the clinical benefit of antineoplastic agents on OS. However, even after applying these two restrictions (palliative setting only and OS only), we ran into the need for another crucial distinction. We observed that conditions where the median OS is very short, say in the range of 6–12 months, cannot be compared with conditions where the median OS is much longer, say 2–3 years. The reason is simple: an improvement of 3 months on a median OS of 6 months is 50%, a huge step ahead, whereas the same 3-month improvement is less than 10% of the OS if the prognosis is 3 years on average, a marginal improvement. Hence, the need to take the prognosis of the disease into consideration when assessing the extent of benefit is another sensible distinction to be made. We therefore proposed a cut-off of 1 year within the palliative setting, recognising that this is a limitation of our model because prognosis is a continuum across the different diseases.
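The arithmetic behind this distinction can be sketched in a few lines of Python. This is only an illustration: the function name `relative_gain` is an assumption introduced here and is not part of the published model; the 3-month gain and the 6-month and 3-year baselines are the values quoted in the text.

```python
def relative_gain(gain_months: float, baseline_median_months: float) -> float:
    """Return the OS gain as a fraction of the baseline median OS."""
    if baseline_median_months <= 0:
        raise ValueError("baseline median OS must be positive")
    return gain_months / baseline_median_months


# The same absolute gain (3 months) set against two very different prognoses.
short_prognosis = relative_gain(3, 6)   # baseline median OS of 6 months
long_prognosis = relative_gain(3, 36)   # baseline median OS of 3 years

print(f"{short_prognosis:.0%}")  # 50%
print(f"{long_prognosis:.1%}")   # 8.3%
```

The identical 3-month gain thus looks either substantial or marginal depending solely on the baseline prognosis, which is why a single threshold across all prognoses would be misleading.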
Finally, even after recognising all these limitations and after deciding to focus on OS because it lacks an objective correlation with the other end points of treatment in advanced stages, we faced the problem of the different ways to summarise the OS results of a randomised clinical trial. Essentially, we described three ways to summarise the OS benefit in these conditions, and these three measures may be interpreted very differently by patients, doctors and regulatory authorities.2
The hazard ratio (HR). In general, this is very relevant to clinical researchers because the trial design is usually based on a prespecified HR value that must be achieved to call the study positive; the HR therefore drives the size of the study. In trial interpretation, any oncologist may also regard the HR as a comprehensive measure of the difference between the two treatments under comparison. However, because of the complexity of its calculation, very few stakeholders fully understand its applicability, relevance and meaning and, above all, the average patient usually cannot understand it.
The gain in median OS. Again, this is a fairly good measure of the overall benefit of a certain treatment over the standard of care in randomised comparisons. But it may be completely misleading because it refers to one point only in the survival curves. In addition, oncologists are very often reluctant to use this figure when explaining to patients the potential benefit of the new treatment compared with the old one, because the patients' initial expectations are too often much higher than the few months of gain in median OS that the trials report.
The third way to express the benefit actually goes towards what patients are most interested in: they most often do not ask for the average benefit, but want to know ‘what are my chances of being alive in the long term.’ Thus, being able to say that the new treatment doubles the patient's chances of being alive at 3 or 5 years is of much more practical use than referring to median gains in OS or to the HR. The long-term OS rate, though, has several problems. First, most of the time these long-term results are not available, because the prognosis is seldom benign enough to allow any patient to survive to 3 or 5 years. In addition, long-term data are often not available because of the pressure to publish as soon as the primary end point of the study is achieved. Second, almost invariably, too few patients at risk are available at those long-term points to make the estimate of benefit reliable. Therefore, what is a highly desirable figure (the long-term effect) is most often either not available or not reliable enough. Whenever it is available with sufficient observations, it must be taken into great account. In these instances, the benefit can be expressed either as an absolute increase in the chances of being alive or as a proportional increase over the control arm. The latter expression usually amplifies the emotional impact of the long-term benefit.
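The difference between the absolute and the proportional expression of a long-term benefit can be made concrete with a short Python sketch. The function names and the 3-year OS rates used below (10% in the control arm, 20% in the experimental arm) are hypothetical values chosen for illustration; they are not taken from any trial.

```python
def absolute_increase(control_rate: float, experimental_rate: float) -> float:
    """Absolute increase in the chance of being alive (difference in rates)."""
    return experimental_rate - control_rate


def proportional_increase(control_rate: float, experimental_rate: float) -> float:
    """Increase expressed as a fraction of the control-arm rate."""
    if control_rate <= 0:
        raise ValueError("control-arm rate must be positive")
    return (experimental_rate - control_rate) / control_rate


# Hypothetical 3-year OS rates: 10% (control) versus 20% (experimental).
control, experimental = 0.10, 0.20

print(f"{absolute_increase(control, experimental):.0%}")      # 10%
print(f"{proportional_increase(control, experimental):.0%}")  # 100%
```

Under these assumed rates, the same result can be reported as ‘10 percentage points more patients alive’ or as ‘a doubling of the chances of being alive’, which illustrates why the proportional expression tends to carry more emotional weight.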
Clinical benefit is defined as the integration of efficacy, toxicity and convenience (meaning the logistical problems connected with getting treated: needing to go to the hospital, hospital admissions, losing days of active life and work). Therefore, it goes without saying that any tool to evaluate the benefit of a new treatment must take into consideration these other two dimensions of the clinical benefit, which counterbalance the increased efficacy. Furthermore, assessing the impact of toxicity and convenience on the overall clinical benefit is even harder than assessing the relative impact of the different end points on the overall efficacy.
All these considerations were given attention by the task force on clinical benefit that was established 3 years ago by the European Society for Medical Oncology (ESMO). The purpose of this task force was to elaborate a scale of the magnitude of benefit of new agents against cancer so that the highest ranking compounds could be made available to all European countries, overcoming some of the existing disparities among countries, at least for those antineoplastic drugs considered ‘essential’ because of their very high score in clinical benefit.3
The task force adopted our above-mentioned principles. In fact, it produced a scale that differentiated between the curative setting and the palliative setting; took into consideration the prognosis of the condition; gave priority to OS in efficacy evaluation; used the three ways to summarise the difference in OS described above (HR, gains in median OS and long-term OS rates); and implemented corrections for toxicity and inconvenience. In addition, the task force extended these principles: PFS, RR and QoL were incorporated in the efficacy evaluation. Extensive field testing was done, resulting in a scale that ranks drugs as a function of the clinical benefit afforded.3
At the same time, the American Society of Clinical Oncology (ASCO) made a similar effort to rank the benefit of antineoplastic agents.4 The American society tried to take the additional step of attributing a value to the net health benefit produced, as a function of its cost. Recently, ASCO updated the original scale,5 clarifying that the primary aim of their effort is the development of an app for patients, so that they may be in a better position to choose among the available treatments according to their clinical as well as social and financial conditions.
The ESMO and the ASCO scales generated and published by the task forces represent excellent starting points for ranking the clinical benefit of oncology drugs. The scales need some key methodological corrections6: for example, the ESMO scale uses the lower limit of the 95% CI in assessing HR thresholds; this is a clear mistake that affects the reliability of the entire scale and could easily be corrected. In addition, in the light of the recent advancements in immunology and precision oncology, the threshold values for each efficacy level of the ESMO scale should be corrected: they have been rather benevolent and should be stricter. But aside from these easily correctable methodological problems, a number of strategic issues also need to be addressed before considering using these tools for ‘social’ purposes beyond the field of science and clinical research. This aspect is marginal for the ASCO scale because its use is mainly within the patient–doctor relationship. But it is particularly relevant for the ESMO scale, considering that its primary goal has a potential impact at the social level. As a matter of fact, at the time of this writing, certain regions in the European Union have adopted the scale for reimbursement purposes despite its mistakes and limitations. The following strategic issues should at least be addressed before such a leap is even considered.
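The practical effect of testing the lower limit of the CI, rather than the point estimate, against an HR threshold can be illustrated with a minimal Python sketch. Everything numeric here is hypothetical: the trial result (HR 0.80, 95% CI 0.64 to 0.99) and the threshold of 0.65 are assumptions chosen for illustration, and the helper `meets_threshold` does not reproduce the actual grading rules of either scale.

```python
def meets_threshold(hr_value: float, threshold: float) -> bool:
    """Return True when the HR value is at or below the efficacy threshold."""
    return hr_value <= threshold


# Hypothetical trial result: HR 0.80 with a 95% CI of 0.64 to 0.99.
hr_point, ci_lower, ci_upper = 0.80, 0.64, 0.99
threshold = 0.65  # hypothetical HR threshold for a given efficacy level

by_lower_limit = meets_threshold(ci_lower, threshold)     # most favourable value
by_point_estimate = meets_threshold(hr_point, threshold)  # best single estimate

print(by_lower_limit)     # True
print(by_point_estimate)  # False
```

In this assumed case the same trial passes or fails the threshold depending on which figure is tested: the lower limit is the most optimistic value still compatible with the data, so using it makes the threshold easier to meet than the point estimate would.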
Drug evaluation versus trial evaluation. If the overall purpose of the scale is to identify the most efficacious drugs in specific conditions, it makes little sense to evaluate a single trial when the available evidence is based on multiple trials in the same setting. To aggravate this inconsistency, the ESMO scale, which evaluates trial results, considers only positive trials. By definition of the task force, negative trials are not evaluable by the scale.3 This introduces a tremendous bias into the classification of the benefit of each agent.
The expanded knowledge of the molecular mechanisms of uncontrolled cell proliferation, coupled with the new avenues opened by the checkpoint inhibitors, has substantially improved the outcome of the latest trials compared with the situation a few years ago, when the target HR of most randomised phase III trials was in the range of 0.8. This improvement may call for an upward adjustment of what is defined as a relevant benefit compared with what was considered relevant until a few years ago. Because both the ESMO and the ASCO scales were elaborated in the preimmunology era, a substantial revision of the threshold values should be extensively discussed.
As mentioned above, a prohibitive task in oncology is finding equivalences between extent of benefit in terms of OS and extent of benefit in terms of the other end points such as PFS or RR. Sometimes, but not always, PFS is considered a surrogate for OS. But this measure of efficacy may also have an intrinsic value aside from being a surrogate for OS. Equally prohibitive seems the effort to establish equivalences between extending life and improving patient-reported outcomes. The complexity of the matter is such that no easy way out exists.
The definition of clinical benefit according to the ESMO task force and that of the ASCO task force are not identical. It is understandable that the definition of clinical value (ie, clinical benefit/cost) may differ according to the different stakeholders (patients, doctors, payers, regulatory bodies), but the definition of clinical benefit has to be universal, independent of the society where it is used and implemented, because it refers to purely medical and human values.
The perception of the benefit is unexpectedly different in different diseases. For example, it is not uncommon to prescribe adjuvant therapy for breast cancer for an absolute gain in OS or long-term DFS of 2%–3%, whereas in the same setting but in a different disease such as non-small cell lung cancer, where the benefit of adjuvant therapy is in the range of 4%–5%, adjuvant chemotherapy is not prescribed so often. This raises the question of whether the scale of benefit should be disease specific or not.
Finally, there is a need for simplification. Complexity is acceptable if the effort to grade the clinical benefit of our research remains confined to the scientific world. But when this effort has social implications, such as an impact on reimbursement policies, complexity should give way to simplification, even at the cost of trivialising some crucial aspects of our world of oncology specialists. In this connection, the ASCO approach of converting the already complex classical measures of efficacy (HR, median) into a net health benefit score goes in the opposite direction. Similarly, the ESMO approach, with its equally complex threshold levels of efficacy, would benefit from simplification.
Sharing these issues with all stakeholders is the right way to generate a tool that is a good compromise among the needs for (1) a sound scientific basis, (2) something as close as possible to what patients value most and (3) something easily understandable by all other stakeholders. The scales are a very good start, but the road to ranking the benefit of anticancer drugs remains very hard.
Competing interests AS has been a member of the ESMO Magnitude of Clinical Benefit Task Force from 2014 to 2016.
Provenance and peer review Commissioned; internally peer reviewed.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/