Validating Controlled Research Part II: Accounting for Complex Process

It is commonly assumed that if a medical practice, in particular homeopathy, claims to have real effects on real illnesses in the real world, then it ought to be possible to demonstrate those effects in a placebo-controlled trial. But in Part I of this series, we found that controlled research itself has certain intrinsic limitations in its capacity to record real differences between real objects and real processes in the real world.6 This raises the question, of course, whether, in a failed trial, the effects claimed by the physician were not there, or whether, on the contrary, instead of an axe it would have been prudent for the experimentalist to grind a new lens for his spectacles – in other words, whether the controlled trial was inadequately adapted to its task.

In the present paper, we will look at factors that differentiate the task of research in complex alternative practices from the challenges encountered in efficacy trials of conventional medicines. In this connection, it must be conceded that the subject is perhaps more a curiosity than an essential element serving to advance knowledge, at least concerning homeopathic medicine, since, increasingly, statistical research is producing dramatic experimental confirmation of the efficacy of homeopathic treatments.2,6 Yet, it should be noted, although the homeopathic side may relish such a dramatic shift in fortunes in the experimental theatre, the point is that, finally, there is clear evidence of an emerging agreement in outcomes between empirical and quantitative practice. This state of affairs reflects that, at long last, the true “gold standard” in scientific research is being approached: independent corroboration of findings.

This unfolding, monumental reversal of fortune is truly a stunning, epochal event in the making. It reflects, on the one hand, the vindication of empirical practice, warts of bias and all. On the other hand, it reminds us – for only the latest in a long, long series of similar experiences – of the seemingly limitless capacity of the human intellect to delude itself into thinking it had learned, practically speaking, all there was to learn.

The fall from grace of the hubristically acclaimed “gold standard” of medical research, the controlled trial, is obviously and unavoidably implied in the very success of homeopathy. For decades and even centuries, homeopathy has been the preferred whipping boy of the cadres of the Cochrane Contraption; yet the developing scientific and clinical successes of homeopathy reveal that, for those decades and centuries, those who claimed to speak for “science” in actuality spoke only for error and prejudice.

This is not the time to explore the ramifications of this long-lasting exercise in scientific self-deception – an important topic that will invite us back for a closer look at a later time. For the moment, however, we will focus our attention on specific technical problems to be addressed by those proposing to design a truly efficacious protocol for testing homeopathy, and to place controlled research into a more appropriately delimited frame of reference, so that it might better fulfill its promise as a useful tool in the armamentarium of scientific investigations.

Counting and Controlled Research

The controlled trial has but one purpose: to eliminate bias as an influence in forming our impression of whether a medicine has a determining role in the healing process. But as should by now be clear, bias is not the only factor in the experimental situation that may contribute to inaccurate results. Indeed, this basic fact has always been clear, even going back to the origins of controlled research, when modifications to the blinded trial were introduced – including, for example, randomization and the crossover design. But these modifications to method only accounted for a limited number of confounders, those most commonly encountered in conventional (allopathic) practice.

These modifications, however, did not touch the more complex array of confounders encompassed by homeopathic medicine, and it is this that accounts for the persistent failure of controlled research to accurately gauge efficacy of homeopathic remedies. And this is the reason that controlled research, so well adapted to the allopathic model, was doomed to failure when applied to alternative practices, for, as has been truly said, “The need to perform adjustments for covariates…weakens the findings [of research].”3(p.95)

In this connection, homeopaths often claim that individualization is the problem that undermines performance of the RCT in measuring homeopathy, or length of treatment, or the fact that homeopathy is an energy medicine. All of these concerns are legitimate, and yet none are basic. Individualization can be factored into a trial easily, by simply allowing the therapist to prescribe according to his usual methods; length of treatment can obviously be accommodated by permitting treatment to proceed at its own pace, for as long as required – potentially a costly process, but nevertheless easy to incorporate into protocol design; and the fact that homeopathy is an energy medicine shouldn’t impact findings at all – if we claim an effect, at the end of treatment it should be there, whether it was achieved via energetic channels or with a hammer and nails.

The underlying problem is that all of these concerns, and others, can and do interfere with measurement in an RCT in one trial or another, depending on protocol design. In short, homeopathic treatment is so variegated in its nature, that the number and variety of potential confounders have undermined most efforts to measure homeopathic effect, because there has never been an effort to approach the problem of confounding variables systematically. Therefore, in designing protocols researchers have simply failed to account sufficiently for the range of potential confounding influences, missing one variable in this trial, and missing another variable in that trial.

By comparison, in a conventional efficacy trial the arms of the trial may include balanced populations – or they might not. But such uncertainty is not sufficient for the purposes of establishing reliable practices in scientific research. For that end, it is necessary to guarantee as far as possible that the two arms are balanced, and in the conventional trial, as noted, from an early date this was accomplished by endorsement of the systematically applied procedure of randomization.

What this suggests, indeed what it demands, is that in order to bring controlled research of homeopathic practice into line with best scientific practice, it is necessary to devise formal, systematically applied procedures to control for: length of treatment; individualization; numbers of symptoms targeted; range of symptoms harvested and not harvested; influence of dosing regimen (size of dose, frequency of dose, potency) on participant response; and the like. Lacking such systematic safeguards, at the least the published report of such a trial must include a comprehensive evaluation of the effects of such influences on the final tabulation of trial outcomes; as is demonstrated in the example provided at the end of this paper, such factors can obviously have profound effects on reported (perceived) results.

Further, these considerations suggest the question, why hasn’t this been accomplished already? For 200 years, why have the advocates of controlled research failed to address the question of systematic measures to guarantee accuracy of measurement in trials of alternative practices, in particular homeopathy? The answer is simply that mainstream (allopathically-grounded) medical practitioners and researchers disbelieved and even disliked the claims and practices of homeopathic medicine, and therefore were just not interested in finding research methods that would confirm those claims – indeed, they were content simply to accept the findings of controlled research, which confirmed their own expectations that homeopathy does not work. We are happy to concede that this was not, for the most part, a conscious conspiracy; indeed, it is clear that this course of events is the natural outcome of the normal workings of bias in human commerce – unnoticed and unintended, as bias always is.

Counting in Homeopathic Research

Although it is something of an oversimplification, it is nevertheless true that, archetypically, allopathic medicines aim at a single symptom or syndrome, and the expected result of their application is the elimination of that symptom (or syndrome). Further, the time allotted for the medicine to work is typically brief. Compare this to the expectations we have for the action of homeopathic remedies, which may eliminate a symptom or numerous symptoms, or produce them, or aggravate existing symptoms, or introduce symptoms that had not previously been experienced. Furthermore, the time frame for action of the homeopathic remedy is very broad – certainly extending to days, weeks, and even months, depending not only on the characteristic action of the specific remedy, but also on the susceptibility of the patient.

Returning to the question of modifications to research methodology, for the purpose of controlling for variables, it becomes apparent that strategies such as randomization and crossovers account only for the simplest situations. Such modifications of method will have little or no impact on medicinal action such as we find in homeopathic remedies. Thus, in terms of the caveat proffered by the editors of the BMJ series, randomization systematically controls for a particular range of confounding variables; be it noted, however, that no such systematic modifications to controlled methodology have ever been introduced in order to control for variables found in alternative practices. Instead, the literati of the blinded trial seem to think it quite satisfactory to leave it to each investigator to re-invent the wheel, every time he sets out to design a new protocol.

Other than being enormously inefficient, this strategy obviously leaves the door wide open for confounders to undermine research through an “open admissions policy” regarding ignorance of practice methods, bias, and even simple lack of imagination. In short, implementation of the principle of blinded research took place in the absence of systematization of technique. As illustrated by the analysis that follows, of a small but representative sampling of controlled trials in homeopathy, such weaknesses characterized most controlled trials conducted until recently.2,7

There are many ways to count, but the best of them is to simply count everything, once. Considering the range of mathematical instruments strewn about in the statistician’s toolbox, it may seem perverse to begin this discussion with consideration of such a rudimentary task. But, if this be perverse, then its justification is to be found in the surprising inability – or unwillingness – of some of the most sophisticated of our culture’s intelligentsia to count from one to two accurately, and in the ingeniously wrought fabrics within which they stitch their numerical masquerades, putting on a ballet – to switch metaphors – when a two-step would do quite nicely.

In any case, the essential element in counting is to count everything, once. This is not rocket science. But even in this, it does not take long to discover the fancy footwork that finds other ways to tabulate outcomes. For example, in Brien et al.1 the experimental group (and the control as well, of course) was given a list of symptoms and instructed to note when they experienced any of the symptoms. Out of 10 symptoms in the list, 5 were “true symptoms” (of Belladonna) and 5 were “false symptoms,” that is, symptoms that did not appear in the Materia Medica as proving symptoms for this drug.

The authors surely deserve kudos for the nonchalance with which they slip in the following ingenious maneuver, designed to justify subtracting verum responses from the total symptom count produced by the experimental group:

The primary outcome measure was an individual proving reaction to Belladonna 30C based on the … proving definition … as at least two true symptoms on at least 2 consecutive days with no more than one false symptom during the 21 days of the study period. (p. 564)

By this nifty little device, the authors succeed in subtracting two verum responses (the production of two different proving symptoms of the given remedy), each of which occurred on at least two consecutive days: in other words, a total of four symptomatic responses are neatly shuffled off the docket because of the simultaneous presence of a placebo response!

Aside from the “face absurdity” of this procedure, a number of unsubstantiated assumptions and arbitrary guidelines are revealed by even a brief examination of this definition, for example:

1. Though the authors did not specify, their definition of proving presumably assumed that the appearance of a false symptom in a participant’s report demonstrated his susceptibility to placebo, thus justifying the assumption that his apparent proving response was also merely placebo, rather than an actual response to the drug.

Out of a number of objections to this “theory” that leap to mind, it may simply be observed that everyone in the world is susceptible to the power of suggestion, but that does not imply that they do not also react to and obtain benefit from real medicines!

2. Clinically, it is well known that placebo response often supports and even enhances patient response to medical interventions. In other words, from this perspective also, the presence of a placebo response has no bearing whatsoever on the question of the legitimacy of an apparent verum response. Indeed, the only way to evaluate the legitimacy of a verum response is clinically, either through an exhaustive case analysis, or through labs – but that would be an “inconvenient” procedure for a researcher who considered only numbers to represent “evidence.”

3. In any case, we should also note that the definition of “proving” adopted by these authors creates a situation in which there could be any number of proving reactions that never even approach being counted: for example, a participant could easily produce 5 or more true symptoms, none of which would be counted, if none of them occurred on consecutive days.
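The effect of this definition on the count can be sketched in a few lines of code. The sketch below is one reading of the quoted definition, not the authors' own procedure: it assumes that a "true symptom" qualifies only if it is reported on at least two consecutive days, and that "no more than one false symptom" refers to distinct false symptoms over the whole study period. The symptom names (t1 to t5, f1 to f5) and the diaries are hypothetical placeholders.

```python
from typing import Dict, Iterable, Set

# A diary maps a study day (1..21) to the set of symptoms reported that day.
Diary = Dict[int, Set[str]]


def has_consecutive_days(diary: Diary, symptom: str) -> bool:
    """True if the symptom was reported on at least two consecutive days."""
    days = sorted(day for day, reported in diary.items() if symptom in reported)
    return any(later - earlier == 1 for earlier, later in zip(days, days[1:]))


def is_proving_reaction(diary: Diary,
                        true_symptoms: Iterable[str],
                        false_symptoms: Iterable[str]) -> bool:
    """Apply the assumed reading of the proving definition to one participant."""
    # True symptoms that meet the consecutive-day requirement.
    qualifying = [s for s in true_symptoms if has_consecutive_days(diary, s)]
    # Distinct false symptoms reported at any time during the study.
    false_set = set(false_symptoms)
    false_reported = {s for reported in diary.values()
                      for s in reported if s in false_set}
    return len(qualifying) >= 2 and len(false_reported) <= 1


# Hypothetical symptom lists: 5 "true" and 5 "false" symptoms.
TRUE_SYMPTOMS = ["t1", "t2", "t3", "t4", "t5"]
FALSE_SYMPTOMS = ["f1", "f2", "f3", "f4", "f5"]

# Two true symptoms on consecutive days, plus two false symptoms: not counted.
diary_a = {3: {"t1"}, 4: {"t1"}, 5: {"f1"}, 6: {"f2"}, 10: {"t2"}, 11: {"t2"}}
# Five true symptoms, none on consecutive days: not counted.
diary_b = {1: {"t1"}, 3: {"t2"}, 5: {"t3"}, 7: {"t4"}, 9: {"t5"}}
# Two true symptoms on consecutive days, one false symptom: counted.
diary_c = {3: {"t1"}, 4: {"t1"}, 5: {"f1"}, 10: {"t2"}, 11: {"t2"}}

result_a = is_proving_reaction(diary_a, TRUE_SYMPTOMS, FALSE_SYMPTOMS)
result_b = is_proving_reaction(diary_b, TRUE_SYMPTOMS, FALSE_SYMPTOMS)
result_c = is_proving_reaction(diary_c, TRUE_SYMPTOMS, FALSE_SYMPTOMS)
```

Under this reading, participants a and b both drop out of the count, the first because of a second false symptom and the second because no true symptom fell on consecutive days, which are the two situations described in the objections above.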

To be blunt about it, this is not science, nor research, nor medicine. Frankly, it is unmitigated nonsense, and the fact it is published under the authoritative-sounding banner of so-called “systematic research” adds not a jot of credibility to its findings. One might as well ignore verum responses for any participant who was observed to have worn a blue shirt on Thursday, for all the relevance these deliberations have to the question of medical efficacy.

Walach et al8 run into similar problems, though, paradoxically, they demonstrate a good awareness of the difficulties involved in their own research project, and are able to recognize indications, embedded in the outcomes of their research, suggesting positive effects of homeopathy: “…the effect is very small and at the same time it seems to be there.”

This trial was designed to provide a randomized, blinded experiment on an individual case. Randomization was achieved by giving each participant both verum (Belladonna) and placebo in a random sequence (over an 8-week period, 4 weeks of Belladonna and 4 weeks of placebo) – symptoms presenting during weeks participants received Belladonna were counted as verum responses; symptoms presenting during weeks participants received placebo were, correspondingly, counted as placebo responses.

The experienced homeopath will immediately object that response to homeopathic remedies typically can be expected to persist, easily, for days or weeks, and that, therefore, symptoms occurring during “placebo weeks” represent, or at least could represent, continuing action of the remedy. In fact, to their credit, the authors of this paper are aware of this problem, and conclude, therefore, that since “we cannot exclude the presence of carryover effects … [we] recommend not to use this kind of randomization design…” in homeopathic proving trials.

It is interesting to note, however, that if placebo is administered in the first week of the trial (or during the initial two weeks), then and only then can we be certain that symptoms produced in that week (or weeks) were the product of suggestibility. But as soon as verum is administered to a participant, all subsequent symptomatic responses – because of what the authors call a “carryover effect” – must necessarily be counted as verum responses.

Indeed, although the trial is badly flawed as a trial of individual response – as the authors themselves observe – it has nevertheless produced a substantial body of evidence – that is, data – that could be mined for a new perspective on the nature of homeopathic action. Such a prospect is especially appealing in this situation, because it is possible, by reconsidering the design in light of the difficulties introduced by carryover effects, to construct a series of new hypotheses as to what might be discovered on analysis of the trial data within a frame more adequately adapted to homeopathic theory and practice:

Hypothesis 1 – comparing rate of “true placebo” responses (produced in a placebo week at the start of the sequence) with rate of verum responses will reflect that homeopathy outperformed placebo; the difficulty with this hypothesis is that it assumes adequacy of the dosing routines for purposes of producing proving reactions, an assumption that is by no means a foregone conclusion.4

Hypothesis 2 – comparing symptom rate in carryover weeks, with symptom rate in the initial placebo week, will show that carryover weeks outperformed initial weeks: if confirmed, this would represent a proof of both efficacy of homeopathic remedies and of carryover effects.
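The re-analysis implied by these two hypotheses can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used by Walach et al.: it assumes that every placebo week preceding the first verum week counts as "true placebo," and that every placebo week after the first verum week counts as a carryover week. The function names, the week sequence, and the weekly symptom counts are all invented for illustration and are not trial data.

```python
from typing import Dict, List


def classify_weeks(sequence: List[str]) -> List[str]:
    """Label each week of one participant's sequence.

    sequence uses "V" for a verum week and "P" for a placebo week.
    Placebo weeks before the first verum week are "true_placebo";
    placebo weeks after it are "carryover" (an assumption of this sketch).
    """
    labels = []
    verum_seen = False
    for arm in sequence:
        if arm == "V":
            verum_seen = True
            labels.append("verum")
        elif verum_seen:
            labels.append("carryover")
        else:
            labels.append("true_placebo")
    return labels


def mean_rate_by_label(sequence: List[str],
                       weekly_counts: List[int]) -> Dict[str, float]:
    """Mean number of symptoms per week within each label."""
    grouped: Dict[str, List[int]] = {}
    for label, count in zip(classify_weeks(sequence), weekly_counts):
        grouped.setdefault(label, []).append(count)
    return {label: sum(counts) / len(counts) for label, counts in grouped.items()}


# Invented 8-week sequence (4 verum, 4 placebo) and invented symptom counts.
sequence = ["P", "P", "V", "P", "V", "V", "P", "V"]
weekly_counts = [1, 0, 3, 2, 4, 2, 3, 1]

labels = classify_weeks(sequence)
rates = mean_rate_by_label(sequence, weekly_counts)
print(labels)
print(rates)
```

Hypothesis 1 would then compare the "verum" rate with the "true_placebo" rate, and Hypothesis 2 the "carryover" rate with the "true_placebo" rate. Participants whose sequence begins with verum contribute no "true_placebo" weeks at all, which limits the data available for either comparison.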

The Statement of Efficacy: Mapping the Controlled Trial

The essential feature of the allopathic medical system, that adapts its medicines and methods so nicely to the controlled format, is found in the fact that, archetypically, the allopathic medicine targets a single symptom, or a well-defined group of symptoms, the allopathic disease state. In this context, the problem set before the experimentalist is simply this: does the medication make the symptom go away? The answer – “yes” or “no” – is perfectly matched to the verum/placebo duality of the blinded trial: does the SSRI reduce depression; does Lipitor reduce cholesterol; does aspirin reduce pain?

But as soon as the experimental equation grows more complex, the reasonably perfect match falls apart. For example, if a medicine is more effective with one class of patient, such as youth or women, then a trial will return misleading results if the control and experimental groups are not well-matched in this dimension. Early in its history, considerations such as these led to the introduction of randomization and other modifications, or safeguards, to the controlled trial, to ensure that such variables balanced out between the two arms of the trial, thus eliminating these confounders as a potential influence over experimental measurements.

In earlier papers,4,5,6 numerous confounding variables were identified that interfered with trial results in one experiment or another. In Part I of the present series6 we suggested that, in view of some of these variables, it was essential that the experimentalist produce a formal “Statement of Efficacy,” specifying how sensitive a measurement was required to produce reliable (trustworthy, credible) evidence regarding any particular medical practice. An example was examined concerning estimation of the number of symptomatic responses that might be lost to the experimental count, depending on a number of factors that combined to affect apparent and real response rates, including especially the size of dose and the sensitivity of individual participants. In particular, we explored the effect of what we called the “complex responder” in reducing verum performance by masking real responses behind placebo responses.

It is beyond the scope of this paper to propose design modifications that, comparably to randomization or crossover, would have the effect of systematically protecting a controlled trial from being influenced by the effects of these confounders. Indeed, it has to be confessed that the present author has no suggestions to offer in that connection. However, as a beginning, I would argue that it is essential to at least make a conscientious effort to identify those factors that call into question the credibility of statistical findings. In this regard, my proposal for a “Statement of Efficacy”, to be demanded of the research scientist, would have the effect of “mapping the controlled trial,” that is, identifying factors that interfere with efficacious application of the placebo control. As an example, we will explore one case in which adjustments to the raw count in a homeopathic proving trial may dramatically affect our conclusions regarding homeopathic efficacy:

Size of dose. A small dose – such as ordinarily used in homeopathic practice as well as in the homeopathic proving trial – typically produces a mild effect if it produces any effect at all. But Hahnemann indicated that individual sensitivity to remedies varied widely, as 1:1000. Similarly, it is commonly known that not all participants in clinical provings respond to the experimental drug, and that those who do respond at different times: some after a single dose, some after several doses, some only after the size of the dose has been substantially increased.

Therefore, the experimentalist must establish what percentage of participants in the verum arm of a trial are likely to respond to the specific size of dose administered to them during the trial. Then, the Statement of Efficacy must, in this regard, provide a statistically derived formula to correct for the measuring error induced by the small dose. For example, let us assume that 28 out of 100 participants in the control (placebo) arm of a trial produce symptomatic responses, and that 30 out of 100 participants in the verum arm could be expected to respond to the dose administered during the trial. Then, if 28 of the 100 verum participants actually do produce a symptomatic response, that is 28 of the expected 30, or about 93%; the statistical implication is that, at increasingly large doses, 93% of participants (in the verum arm) would have shown a symptomatic response.

Therefore, we have the following outcomes (where vs = number of symptoms returned by the verum group and ps = number of symptoms returned by the placebo group):

(fig. 1: raw count)

Such an outcome traditionally suggests that “verum performed no better than placebo,” and therefore leads to the conclusion that the medicine is ineffective against the symptoms under investigation. However, in this scenario, the raw count of participant responses does not reflect the fact that a portion of the respondents, according to theory and clinical experience, would have responded at larger doses. Therefore, correcting the count statistically, according to the response rates suggested (hypothetically) at the beginning of this example, we arrive at this (realizing that our example predicted 30 verum respondents, and that 28 actually did produce a symptomatic response, which translates to 93 responders out of a group of 100 participants):

(fig. 2: statistically corrected count)

Et voilà! According to statistically corrected calculations, verum has significantly outperformed placebo! Furthermore, the corrected figures offer evidence to confirm the fact that symptom production by verum is lost to experimental count because of the phenomenon of the “complex responder.”


In this paper I have examined in detail some of the ways in which mistakes in trial design dramatically affect experimental outcomes. I have also offered an example of one way in which such mistakes can be identified, and their actual effects on outcomes estimated statistically. The conscientious research scientist will take the opportunity to re-examine his assumptions, and hopefully move the theory and art of his methodology into the twenty-first century, where it belongs, by the development of systematic measures to enhance the reliability and credibility of outcomes from placebo-controlled research. It has been roughly 200 years since the introduction of randomization, used to enhance accuracy of count in placebo-controlled research in conventional medicine; now it is time – at long last – to provide comparable assurances in our experimental approach to alternative practices.


1. Brien, S., et al. 2003. Ultramolecular homeopathy has no observable clinical effects. A randomized, double-blind, placebo-controlled proving trial of Belladonna 30C. Br J Clin Pharmacol, 56:562-568.

2. ENHR. November, 2006. An Overview of Positive Homeopathy Research and Surveys.

3. Godlee, F., et al., editors. 2004. Clinical Evidence: Mental Health, Vol. 11. BMJ Publishing Group LTD, London.

4. Shere, N. 2005. Proving Homeopathy. Homeopathy for Everyone, April.

5. Shere, N. 2006a. Is the Randomized Double Blind Placebo Controlled Trial an Objective Scientific Instrument? Homeopathy for Everyone, January.

6. Shere, N. 2006d. Validating Controlled Research – Part I: Measuring the Measuring Rod. Homeopathy for Everyone, November.

7. Shere, N. 2006f. Book Review: Homeopathy – The Scientific Proofs of Efficacy. Homeopathy for Everyone, November.

8. Walach, H., et al. 2003. Effects of Belladonna 12CH and 30CH in Healthy Volunteers. A Multiple, Single-Case Experiment in Randomization Design. Monaco International Talks. (viewed December 6, 2006).

About the author

Neil D. Shere

Neil is a Board Certified clinical social worker, specializing in psychotherapy with children, individual therapy with adults, and marital counseling. Neil has worked as a therapist, supervisor, and administrator in the public schools, in family service and mental health agencies.

Presently, Neil works in his own practice, Neil D. Shere & Associates, in suburbs near Chicago. Neil also serves voluntarily with the LAN (Local Area Network), a local, state-sponsored inter-agency committee that awards grants to families of children experiencing emotional and behavioral problems and situational distress.
