Central to the perspective I present in this blog post is my
work supervising psychiatric residents and medical students at a
university-based psychiatry clinic where our patient population includes a good
number of adults suffering from mild to moderate depression.
In 2010, JAMA's publication of a single study
challenged and upended a major assumption that had guided clinical work like
ours for over three decades (Barrett 2001; Qaseem 2008). This was the widely
covered meta-analysis of antidepressant (AD) trials conducted by Fournier and
colleagues (2010), which drew the far-reaching conclusion that ADs produce a
significant response in very severely depressed patients but are no more
effective than placebo in less severe cases.
Fournier et al. was not the first study to take aim at the
foundation of treatment guidelines for depression, which in essence recommend
treating depression with antidepressants. In 2008, the Kirsch et al. meta-analysis
of clinical trial data submitted to the Food and Drug Administration ended with
a rather strongly worded conclusion:
“Drug–placebo
differences in antidepressant efficacy increase as a function of baseline
severity, but are relatively small even
for severely depressed patients” (emphasis added). (Kirsch et al.,
2008)
After reading their findings, a neutral conclusion for Kirsch
et al. would have been that (1) ADs are statistically better than placebo, and
(2) response correlates with the severity of patients’ symptoms.
This is not an earth-shattering conclusion by any means, as both
results had been common knowledge for anyone prescribing ADs
since 2002, when Khan et al. published their meta-analysis of FDA-submitted
AD trial data, based on 45 studies. Their conclusion?
“The magnitude of
symptom reduction was significantly related to […] initial depression […]
scores; the higher the […] initial […] score, the larger the change.” (Khan
et al. 2002)
Therefore, one can look at the Kirsch (2008) study findings
as a replication of earlier findings, a continuation of an already established
line of knowledge, which is most often the way scientific
knowledge expands. Given this, one would be hard-pressed to understand how a
study that essentially replicated prior positive findings would become the
poster child for the anti-antidepressant movement that followed. But that is
exactly what happened.
Interestingly, Khan et al. (2002) was not cited by Kirsch et
al. (2008), in itself a remarkable oversight considering the similarities
between the two studies. But I found it even more troubling that, instead of
explaining their findings conservatively and offering as neutral and
tentative an explanation as possible, in keeping with established scientific
communication tradition, Kirsch et al. appeared to
formulate their conclusion from a position of commitment to an
anti-antidepressant view:
“The relationship
between initial severity and antidepressant efficacy is attributable to
decreased responsiveness to placebo [even] among very severely depressed
patients, rather than to increased responsiveness to medication.” (Kirsch
et al., 2008)
And that strongly worded conclusion made the Kirsch study an
almost overnight media hit. Front-page newspaper and radio coverage followed,
and criticism was dismissed (Horder et al., 2011).
To date, the Kirsch study remains one of the most
popular papers on the PLoS Medicine website, as reflected in the following metrics:
282,219 views, 631 citations, 300 academic bookmarks, and 404 social shares
(data as of December 20th, 2012). A
number of critical commentaries followed.
Some directly criticized Kirsch et al. (2008) for methodology or
overstated conclusions (Kelly, 2008; Khan and Khan, 2008; McAllister-Williams,
2008a, 2008b; Moller, 2008; Nutt and Malizia, 2008; Parker, 2009; Turner and
Rosenthal, 2008). More interestingly, a few who decided to re-analyze Kirsch’s
data found they could not replicate Kirsch et al.’s pessimistic view of AD
efficacy (Fountoulakis and Möller, 2011; Horder et al., 2011). For unclear reasons, these
subsequent reports aimed at reestablishing ADs’ respectability got much less
media attention than Kirsch’s 2008 original.
“Déjà vu All Over Again”
In this context, when Fournier et al. came along in 2010, I
had a sense of déjà vu, not so much in terms of the study’s conclusions but
rather in terms of the emotional intensity and dramatic flavor with which it
was greeted by the mass media. I first heard about it on NPR, and this
surprised me, as I usually see the studies I am interested in before the media
does.
Over the next couple of days headlines such as these
appeared in print and online media around the world:
From the NY Times: “Popular Drugs May Help Only Severe Depression” (Carey B, 2010)
From the LA Times: “Antidepressant medications probably provide little or no benefit to
people with mild or moderate depression” (Roan S, 2010)
Immediately following this media hoopla, I found that my
students – a new generation who had not been part of the Kirsch
antidepressant wars – began to routinely question the wisdom of continuing or
starting antidepressant treatment for our patients suffering from mild or
moderate depression.
And it did not take long for our patients themselves to
express their doubts about the efficacy of antidepressants — even for severe
depression.
I was troubled at the time by the unquestioning coverage of
Fournier et al., which implied that this single study was in fact “settled
science” on the subject of antidepressants when it was not. I was equally
troubled by the inattention, in both the professional literature and the
popular press, to the complexities and long history of the debate (as
discussed above), or at least to the serious flaws in the study’s methodology,
which I summarize below.
Two years later, I’m equally concerned about the lack of
media coverage given to a 2012 publication in Archives of General Psychiatry,
a sister journal of JAMA, of a study by Gibbons
and colleagues (2012) which, history aside, refutes Fournier’s claim that
antidepressants are not more effective than placebo for mild to moderate
depression. Like Fournier et al. (2010), Gibbons et al. (2012) based their findings
on individual patient data with longitudinal measurement, which
makes their conclusions a strong counterpoint to those of Fournier et al. (2010).
Among the points I now make to my students when questions
arise about antidepressant efficacy as a result of the meta-analysis conducted
by Fournier et al. are the following:
The individual patient-level data approach used by Fournier
et al. represented an improvement over standard meta-analyses; however, their
results were based on only 6 studies that met their criteria, out of more than 200
relevant studies. Reducing 2164 citations to 6 is hardly representative,
especially when the 6 analyzed studies represent only two medications:
paroxetine and imipramine, the latter not recommended for first-line treatment
of depressive disorders. Furthermore, of the 6 studies, 5 specifically excluded
patients with very mild depression, which weakens the authors’ conclusions about
the lack of separation of ADs from placebo for mild depression.
Exclusion Criteria Raise Major Questions
The strength of a meta-analysis is based on applying a solid
statistical approach to all studies meeting a set of relevant
inclusion/exclusion criteria, and in this case it appeared that the authors
excluded too many relevant studies.
Specifically, 228 studies were excluded because they used a
“placebo washout lead-in” (a design in which all study
participants get a placebo to start with and only those who do not respond to
the placebo continue in the study). The placebo washout/lead-in represents a
common historical design used in antidepressant trials with the intent of
excluding patients who do not demonstrate symptom stability and are thus not likely
to benefit from a truly effective AD. Fournier et al. (2010) acknowledge that
“it is not clear that placebo washouts actually enhance the statistical power
of antidepressant medication/placebo comparisons”; nevertheless, they proposed
that in order to evaluate the rates of “true placebo response” one should
exclude all studies using a placebo washout/lead-in design.
While it is true that a placebo washout might limit accurate
estimates of placebo response and might not improve the probability of an AD
being more effective than a placebo, this design for studies of depression
would not affect the validity of an active AD – placebo separation, were one to
be found. The exclusion of washout studies was especially problematic precisely
because this represents a common design for AD clinical trials, meaning that
numerous relevant studies were excluded. In other words, Fournier et al.
imposed a seemingly arbitrary (i.e. not evidence-based) exclusionary criterion
that effectively filtered out the majority of the relevant studies. This is a
very bright red flag and potential source of bias, which greatly limits the
validity of the authors’ conclusions.
Assuming these easily excluded studies were otherwise methodologically
sound, the number of study investigators contacted would have increased from 23
to 251; and likely significantly more than 6 would have contributed to the
final analysis.
Considering the potentially grave implications of either
mental health providers or patients accepting the headlines generated by
widespread publication of these results at face value, the study’s
methodological weaknesses – which were not treated in any depth by
comments accepted for publication by JAMA – warrant further critical review.
Overlooked and Highly Relevant Research
Likely because it received dramatically less coverage, far
fewer of my students are aware of the study by Gibbons et al. (2012) who,
after reviewing 43 fluoxetine and venlafaxine trials, concluded that, contrary
to the Fournier et al. (2010) findings, these two antidepressants are in fact
efficacious for major depressive disorder in all age groups, regardless of
depression severity at baseline.
As noted, Gibbons et al. (2012), like Fournier et al. (2010),
used patient-level data, which makes the point against Fournier et al. even
more significant. In addition, compare the final set in Gibbons et al. (2012)
of 43 studies, with a meta-analysis population of 4303 patients in the
fluoxetine trials and 4882 patients in the venlafaxine trials (more than
9000 patients in total), to the Fournier et al. (2010) final set of 6 studies (3
paroxetine and 3 imipramine trials) with a total of 718 patients. The
significantly larger number of studies and patients in Gibbons et al. (2012)
makes for a more believable conclusion.
Both studies are limited in that they focused on only 2 ADs:
paroxetine and imipramine for Fournier et al. (2010) versus fluoxetine and
venlafaxine for Gibbons et al. (2012). At the same time Gibbons et al. (2012)
used an all-inclusive set of studies, whereas,
as noted above, the Fournier et al. (2010) study used a highly selective
group of studies. There are also important differences in data analytic methods
that could explain the differences in results. For example, Gibbons et al.
(2012) defined severity differently than Fournier et al. (2010).
To expert eyes, the main effects for the drug versus placebo
differences can actually be seen as similar in the two data sets. And that is
the very reason for engaging in this debate.
Which study is more convincing?
The Gibbons study reminds us that it is our duty as
physicians and society at large to carefully screen and aggressively treat
depression, including with medications if so recommended. The Fournier study
makes us aware that there might be more to the story of AD response than a
straightforward active ingredient effect.
We can all speculate about why the Gibbons study received so
much less media coverage than did Fournier and colleagues.
The Sequel
In the antidepressant wars, we have seen the pendulum’s full
swing from the early nineties when Elizabeth Wurtzel’s “Prozac Nation” was
thrilled to be “Listening to Prozac” with Peter Kramer, and into the early
millennium years when Healy’s tongue-in-cheek advice was to “Let Them Eat
Prozac”. By the time Carl Elliott’s “Prozac as a Way of Life” hit the stands in
2003, some thought we were at the end of an era. But ADs came back strong, only to engender
renewed debate and, as argued above, uneven and thus inaccurate media coverage
in the current decade.
Unintended Consequences of an Unevenly Covered Debate
As my esteemed colleague Michael Thase adeptly put it to me,
“There is no ‘last word’ in the science of this debate.” He is undoubtedly
correct. And, as a physician, I find relief in the fact that we continue to
question ingrained assumptions and are reluctant to accept there is such a thing
as a last word or simple explanation when it comes to complex issues.
Depression, with its multidimensional tentacles equally anchored in nature and
nurture, will never be a good subject for simple explanations.
But, again, as a physician I am very concerned about major
unintended consequences of uneven coverage of the competing major findings
discussed above. Specifically, I fear that clinically depressed members of the
public at large will refuse a likely efficacious treatment option. And while all
may be well if that depressed
patient makes the informed alternative choice of starting treatment with
cognitive behavioral therapy (CBT), a validated form of therapy for depression
that compares well with SSRIs for mild or moderate depression, all is certainly
NOT well if the patient’s decision not to accept treatment with antidepressants
is based primarily on media-delivered misinformation.
Given the stigma against acknowledging or treating a mental
illness with a psychotropic medication, the media saturation given to one study
only worsens an already difficult situation for many patients who fear the
personal and social consequences of admitting their illness and seeking
treatment.
In closing: my hope
is that members of the media who cover this debate will realize that “first do
no harm” is not only the duty of physicians; it is also the responsibility of
anyone trusted with giving health information to the public at large.
Acknowledgements: I would like to thank Lawrence Faziola and
Steven Potkin for critically discussing Fournier et al. and Michael Thase for
his critical read of the draft to this article.
References:
Barrett JE, Williams JW Jr, Oxman TE; et al. (2001)
Treatment of dysthymia and minor depression in primary care: a randomized trial
in patients aged 18 to 59 years. J Fam Pract. 50(5):405-412.
Carey B (2010) Popular Drugs May Help Only Severe
Depression. New York Times, January 5, 2010
Fournier JC, DeRubeis RJ, Hollon SD; et al. (2010)
Antidepressant Drug Effects and Depression Severity: A Patient-Level
Meta-analysis. JAMA 303(1):47-53.
Fountoulakis KN, Möller HJ (2011) Efficacy of
antidepressants: a re-analysis and re-interpretation of the Kirsch data. Int J
Neuropsychopharmacol. 14(3):405-12. Epub 2010 Aug 27.
Gibbons RD, Hur K, Brown CH, Davis JM, Mann JJ (2012)
Benefits from antidepressants: synthesis of 6-week patient-level outcomes from
double-blind placebo-controlled randomized trials of fluoxetine and
venlafaxine. Arch Gen Psychiatry 69(6):572-9.
Horder J, Matthews P, Waldmann R. (2011) Placebo, prozac and
PLoS: significant lessons for psychopharmacology. J Psychopharmacol.
25(10):1277-88. Epub 2010 Jun 22.
Kelly BD (2008) Do new-generation antidepressants work? Ir
Med J 101: 155–155.
Khan A, Leventhal RM, Khan SR, Brown WA (2002) Severity of
depression and response to antidepressants and placebo: an analysis of the Food
and Drug Administration database. J Clin Psychopharmacol 22: 40–45.
Khan A, Khan S (2008) Placebo response in depression: a
perspective for clinical practice. Psychopharmacol Bull 41: 91–98.
Kirsch I, Deacon BJ, Huedo-Medina TB, Scoboria A, Moore TJ,
Johnson BT (2008) Initial severity and antidepressant benefits: a meta-analysis
of data submitted to the Food and Drug Administration. PLoS Med 5: e45–e45.
McAllister-Williams RH (2008a) Do antidepressants work? A
commentary on ‘Initial severity and antidepressant benefits: a meta-analysis of
data submitted to the Food and Drug Administration’ by Kirsch et al. Evid Based
Ment Health 11: 66–68.
McAllister-Williams RH (2008b) Misinterpretation of
randomized trial evidence: Do antidepressants work? Br J Hosp Med (Lond) 69:
246–247.
Moller HJ (2001) Methodological aspects in the assessment of
severity of depression by the Hamilton Depression Scale. Eur Arch Psychiatry
Clin Neurosci 251(suppl 2): II13–20.
Moller HJ (2008) Isn’t the efficacy of antidepressants
clinically relevant? A critical comment on the results of the metaanalysis by
Kirsch et al. 2008. Eur Arch Psychiatry Clin Neurosci 258: 451–455.
Nutt DJ, Malizia A (2008) Why does the world have such a
‘down’ on antidepressants? J Psychopharmacol 22: 223–226.
Qaseem A, Snow V, Denberg TD, Forciea MA; et al. (2008)
Clinical Efficacy Assessment Subcommittee of American College of Physicians.
Using second-generation antidepressants to treat depressive disorders: a
clinical practice guideline from the American College of Physicians. Ann Intern
Med. 149(10):725-33.
Parker G (2009) Antidepressants on trial: how valid is the
evidence? Br J Psychiatry 19: 1–3.
Turner EH, Rosenthal R (2008) Efficacy of antidepressants.
Br Med J 336: 516–517.
Roan S (2010) Study finds medication of little help to
patients with mild, moderate depression. Los Angeles Times. January 06, 2010.