Spin to Win: Results From Late-Breaking Meeting Trials Often Overstated
A new review picks apart the positive spin that characterized several top trial presentations at the recent American College of Cardiology meeting.
Late-breaking clinical trial (LBCT) sessions are the highlight of many annual scientific meetings, attracting the attention of clinicians and media alike. But a new analysis shows how presenters of negative studies often “spin” their results, emphasizing lesser findings or subgroup analyses, to make them appear more favorable and newsworthy.
“On the whole, presenters of major studies are top-level, highly-respected scientists for whom the quest for truth is paramount. Nevertheless, at the key moment of first presentation of pivotal findings, it is only human nature to allow a degree of ‘positive spin’ to creep in,” write Stuart J. Pocock, PhD, and Tim J. Collier, MSc (London School of Hygiene & Tropical Medicine, London, England).
Even more critical for trial presenters is that editors don’t exactly jump for joy over negative results, Anthony N. DeMaria, MD (Sulpizio Cardiovascular Center at the University of California, San Diego), said in an interview with TCTMD.
“Everyone tries to get the most positive conclusions out of their data, but it’s also important because over 80% of papers published in the literature deal with studies with positive results. In the real world, about 50% of the studies we do lead to negative results,” he said.
DeMaria, who served as editor-in-chief of the Journal of the American College of Cardiology for 12 years and now serves in the same role for Structural Heart, said it also has become a “badge of honor” for LBCTs to achieve simultaneous publication. “Years ago, one or two papers [per meeting] achieved that,” he noted. In the rush to get their barely completed work into manuscript form, “oftentimes there is not adequate time to reflect on all the implications of the data and all the limitations. So, the manuscript that is ultimately published may, in fact, not be the best representation of the study that was performed and the data that were acquired.”
Flexing the Statistics
In a state-of-the-art review published online earlier this month in JACC, Pocock and Collier critiqued seven LBCTs from the American College of Cardiology (ACC) 2018 Scientific Session, looking at what the findings were versus how they were presented to attendees. The trials were: ODYSSEY-Outcomes, VEST, SECURE-PCI, TREAT, POISE, SMART-DATE, and CVD-REAL 2.
In the VEST trial, for example, despite no significant reduction in the primary endpoint of sudden death between post-MI patients with reduced ejection fraction who did or did not wear a defibrillator (P = 0.18), the authors chose to emphasize a 35.5% risk reduction at 90 days in all-cause death that favored the wearable defibrillator, Pocock and Collier note. That is a problem, they say, because not only is the result “statistically fragile,” but also all-cause death, which was not the primary endpoint, would logically have to include sudden death, which was not prevented.
In SECURE-PCI, which looked at an atorvastatin loading dose versus placebo in ACS patients with planned PCI, there was no significant reduction in MACE at 30 days in the atorvastatin group. However, the presenters emphasized a prespecified analysis restricted only to those patients who underwent revascularization and reported a 38% reduction in MACE. But as Pocock and Collier point out, the finding has little practical value, “because for future patients, the decision whether to give loading doses of atorvastatin needs to be taken before one knows whether the patient will actually undergo PCI.” In SECURE-PCI, 35% of patients ended up not having PCI.
Perhaps the most talked-about trial of the meeting was ODYSSEY-Outcomes, which randomized patients with a recent ACS and inadequate control of lipid levels despite high-intensity statin therapy to the PCSK9 inhibitor alirocumab (Praluent; Regeneron/Sanofi) or placebo. At 2.8 years, there was a 15% reduction in MACE with alirocumab compared with placebo (HR 0.85), with a 95% confidence interval for the reduction of 7% to 22%. “However, because this [the all-cause mortality finding] sits lower in the hierarchy of statistical testing, it does not fit in the formal list of claims for treatment efficacy within the bounds of strict type 1 error control,” Pocock and Collier write. Then again, they note that a counterargument would be that overall survival is the most important aspect for patients and “merits special attention beyond statistical formalities.”
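For readers who want to check the arithmetic, the percentages quoted for ODYSSEY-Outcomes follow directly from the hazard ratio. The short Python sketch below is illustrative only: the function names are made up for this example, and the hazard ratio bounds of 0.78 and 0.93 are an assumption back-calculated from the 7% to 22% range reported above, not figures taken from the presentation.

```python
def hr_to_percent_reduction(hr: float) -> float:
    """Convert a hazard ratio into a relative risk reduction, in percent."""
    return (1.0 - hr) * 100.0


def reduction_interval(hr_lower: float, hr_upper: float) -> tuple[float, float]:
    """Map a hazard ratio interval onto a percent-reduction interval.

    The bounds swap: the upper hazard ratio bound gives the smallest
    reduction, and the lower bound gives the largest.
    """
    return hr_to_percent_reduction(hr_upper), hr_to_percent_reduction(hr_lower)


# Point estimate quoted in the article: HR 0.85.
point = hr_to_percent_reduction(0.85)

# Assumed HR bounds, back-calculated from the 7% to 22% range in the article.
smallest, largest = reduction_interval(0.78, 0.93)

print(f"Reduction: {point:.0f}% (interval {smallest:.0f}% to {largest:.0f}%)")
```

The conversion says nothing about where an endpoint sits in the testing hierarchy; it only restates the same estimate on a percentage scale.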
But the ODYSSEY researchers also touted a 29% reduction in mortality in the subgroup of patients with LDL ≥ 100 mg/dL. According to Pocock and Collier, it’s questionable whether this claim is justified, suggesting that it amounts to “data dredging.”
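The worry behind the “data dredging” label can be made concrete with a rough calculation. The Python sketch below assumes a treatment with no true effect in any subgroup and subgroup tests that are independent of one another, each run at a significance level of 0.05. Both are simplifying assumptions, and the subgroup counts are hypothetical rather than drawn from ODYSSEY-Outcomes.

```python
def chance_of_false_positive(num_subgroups: int, alpha: float = 0.05) -> float:
    """Probability that at least one of several independent subgroup tests
    comes up 'significant' by chance alone when there is no true effect."""
    return 1.0 - (1.0 - alpha) ** num_subgroups


# Hypothetical numbers of subgroups examined in a trial report.
for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroups: {chance_of_false_positive(k):.0%} chance of a spurious finding")
```

Under these assumptions, the more subgroups that are examined, the more likely it is that one of them will look impressive purely by chance, which is why an isolated subgroup result is usually treated as exploratory.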
DeMaria agreed that statistically, the claim does not hold up, and said in some cases such spin might even be harmful.
“If someone looked at those results and said, ‘My patient’s LDL is 80 and so they are unlikely to get benefit,’ that would be a harm that might accrue. Similarly, if you take a secondary endpoint like in the VEST trial . . . people might be prescribed the vest when it wouldn’t do them any good,” he said. “Inappropriate emphasis of a secondary endpoint, which really should only be exploratory, and often is not significantly powered, can sometimes lead to people being given a therapy that is not of established value.”
A Question of Peer Review?
Contacted by TCTMD, Sanjay Kaul, MD (Cedars-Sinai Medical Center, Los Angeles, CA), said he was in complete agreement with many of the issues highlighted in the paper.
“The focus on secondary endpoints that are deemed exploratory (all-cause mortality in ODYSSEY and VEST); cherry-picking improper (post-randomization) positive subgroups in a null trial (SECURE-PCI); declaring noninferiority based on one-sided 95% CI rather than the appropriate two-sided 95% CI . . . (SMART-DATE); fixing noninferiority margin in absolute risk difference rather than risk ratio (SMART-DATE); and making claims based on registry trials despite discordant results from RCTs . . . are not uncommon sources of misinterpretation and misinformation enabled by the high visibility environment of the LBCTs,” he told TCTMD in an email.
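Kaul’s point about one-sided versus two-sided intervals can be illustrated with a small Python sketch. All of the numbers below (risk difference, standard error, and noninferiority margin) are hypothetical and chosen only to show the mechanism; they are not SMART-DATE data. The sketch uses a normal approximation and Python’s standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def upper_confidence_bound(diff: float, se: float, level: float, two_sided: bool) -> float:
    """Upper confidence bound for a risk difference (normal approximation)."""
    alpha = 1.0 - level
    # A two-sided interval splits alpha across both tails; a one-sided
    # bound puts all of it in the upper tail, so its z-value is smaller.
    tail = alpha / 2.0 if two_sided else alpha
    z = NormalDist().inv_cdf(1.0 - tail)
    return diff + z * se


# Hypothetical values for illustration only.
diff = 0.005     # observed absolute risk difference (new minus standard)
se = 0.0075      # standard error of that difference
margin = 0.018   # prespecified noninferiority margin

one_sided = upper_confidence_bound(diff, se, 0.95, two_sided=False)
two_sided = upper_confidence_bound(diff, se, 0.95, two_sided=True)

print(f"One-sided 95% upper bound: {one_sided:.4f}, noninferior: {one_sided < margin}")
print(f"Two-sided 95% upper bound: {two_sided:.4f}, noninferior: {two_sided < margin}")
```

With these made-up inputs the same data meet the margin under the one-sided 95% bound but miss it under the two-sided 95% interval, which is the kind of discrepancy Kaul describes.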
Another important issue is that three of the seven LBCTs were not peer-reviewed. According to Pocock, this means “presenters and their collaborators have essentially a free rein to present their study findings as they see fit.”
Kaul questioned why presenters would want to rush to present LBCTs that have not been thoroughly peer-reviewed. “Have LBCTs become more theater than substance where reality cannot be separated from rhetoric, and where promotional agenda and commercial interests trump the rigor of the scientific process? Given that the investigators are expected to put their best foot forward, how do we avoid such scenarios?”
He suggested that perhaps only research that withstands rigorous peer review, ideally accompanied by a simultaneous journal publication, should qualify for LBCT status.
“This is not to say that the peer-review process will necessarily ensure the information presented or published is always correct or reliable,” Kaul said. “The traditional peer-review system, flawed as it is, is still the best thing we have in terms of filtering and evaluating scientific research. Their value is in enabling LBCT sessions to highlight the most meritorious, not the most high-profile or newsworthy, research to the public.”
Hype, Meet Hope
But it may not be that simple. To TCTMD, DeMaria recalled that when he was editor-in-chief of JACC, the journal would send letters of invitation for rapid review to the authors of the top 5% of abstracts accepted for ACC.
“You would think that when you got down to the top 5%, that they would be really, really good,” he said. “Of the papers that were submitted to us, the acceptance rate was only about 30%. So, in fact, these highly graded abstracts, more often than not, when turned into a paper were judged to have significant limitations such that priority was not adequate to be published.”
Kaul said another issue is the quality of the expedited peer-review process for LBCTs. He suggested that journal editors should keep a “scorecard” of how often published LBCTs fail to garner regulatory approval, citing the ATLAS trial of rivaroxaban in STEMI as one example where data “were clearly missed/ignored during the peer review,” resulting in questions and concerns from the US Food and Drug Administration.
However, he said he is reassured by the “open peer review” platform afforded by social media. “The critique of ODYSSEY and VEST (and CABANA presented at HRS last week) was primarily driven by the Twitterati and healthcare blogs,” Kaul noted.
DeMaria added that moderators and discussants provide a de facto peer review after a trial’s presentation by asking questions about things that concern them, but they do so within a limited time frame and without the anonymity that most peer review affords.
“If you think something is bad, you can feel quite comfortable saying it’s bad in peer review,” he observed. “When you’re a discussant, you get up there, the author has just presented, and generally you try to temper your remarks. Not that you wouldn’t say if there is a problem, but you might not make as big a point of it in a presentation venue as you would as an anonymous reviewer of a manuscript.”
Pocock SJ, Collier TJ. Critical appraisal of the 2018 ACC scientific sessions late-breaking trials from a statistician’s perspective. J Am Coll Cardiol. 2018;Epub ahead of print.
- Pocock reports serving on steering committees or data monitoring committees for trials sponsored by AstraZeneca, Bayer, Boehringer Ingelheim, Boston Scientific, Idorsia, Janssen, Medtronic, Novartis, Novo Nordisk, and Vifor; and receiving grant funding from AstraZeneca and Merck.
- Collier reports serving on data monitoring committees for trials sponsored by Daiichi-Sankyo and Zoll.
- Kaul reports consulting for Boehringer Ingelheim, Novo Nordisk, and Biotronik.
- DeMaria reports consulting fees/honoraria from Bracco and ResMed; and serving on the speaker’s bureau of Zoll.