Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 Psych1lo ical Flullelin 1994, V7. I I i, No. 1, 19- 24 C("right 1994 by 1he American Psychfilogical Anociation, Inc, 0033-29(YJ/94113.00 Anomaly or Artifact? Comments on Bem and Honorton Ray Hyman Bern and Honorton imply that the I I autoganzfeld experiments demonstrate the existence of psi- a communications anomaly. They claim that the autoganzfeld results are consistent with previous parapsychological findings and constitute evidence for a replicable psi effect. Although the auto- ganzfeld experiments are methodologically superior to previous parapsychological experiments. the tests of their randomization procedures were inadequate. Me autoganzfeld experiments consistently produced positive hit rates, whose combined effect was highly significant. However, these experi- ments produced important inconsistencies with the previous ganzfeld experiments. They also showed a unique pattern in the data that may reflect a systematic artifact. Because of these unique features, we have to wait for independent replications of these experiments before we can conclude that a replicable anomaly or psi has been demonstrated. Bern and Honorton (1994) imply that if psychologists were familiar with the most recent parapsychological research, they would be more willing to accept the possibility that a commu- nications anomaly existed. In particular, Bem and Honorton focus on the experiments that are based on the ganzfeld proce- dure. They "believe that the replication rates and effect sizes achieved by one particular experimental method, the ganzfeld procedure, are now sufficient to warrant bringing this body of data to the attention of the wider psychological community" (Bern & Honorton, 1994, p. 4). They review the debate between Honorton and me over the original ganzfeld experiments. Hy- man (1985) found that these studies suffered from statistical, methodological, and documentation problems. Honorton (1985) responded that these flaws were not sufficient to account for the observed hit rates. Bern and Honorton (1994) review this controversy and cite reviewers who apparently agree with Honorton's position. The implication is that despite the defi- ciencies in the ganzfeld experiments, the results support the ex- istence of psi-a communications anomaly. To Honorton's credit, he initiated a new series of experiments that would be free from the flaws of the earlier ganzfeld database (Honorton et al., 1990). These I I new experiments, called the autogan;feldsiudies yielded consistently positive hit ratesand a highly significant overall effect. Because these new experiments showed positive results and allegedly were consistent with the earlier ganzfeld database and other psi research, Bern and Hon- orton implied that parapsychology had found its previously elu- sive repeatable experiment. Since the beginnings of psychical research in the mid-nine- teenth century, its investigators have believed that they have sci- entific evidence sufficiently strong to place before the general scientific community. Each generation has tried to get the atten- tion of the scientific community with findings that they claim to be irrefutable. The particular evidence put forth has changed from generation to generation. What a previous generation of parapsychologists considered to be a solid case for psi was aban- doned by later generations in favor of a more current candidate. This shifting database for parapsychology's best case may be why parapsychology still has not achieved the recognition it de- sires from the general scientific community. Now Bern, and Honorton (1994) believe that they have a strong case to put before the psychological community. They admit that the autoganzfeld findings still require independent confirmation. To their credit, they specify the conditions and the required sample size needed to provide adequate power. The informed critic of parapsychology might ask what makes the current situation different from the past claims for psi? Why should we now believe that Honorton and his colleagues have finally found a way to consistently produce evidence for psi? We must wait for future attempts at replication before we have an answer to the question. Bern and Honorton appear con- fident that this time is different. Their review of the ganzfeld and autoganzfeld databases encourages them to believe that consis- tent psi results are within reach. In this commentary, I provide reasons for believing that the autoganzfeld results contain in- consistencies and some unique patterns that raise doubts about their replicability.' Agreements and Differences Although my commentary focuses on my disagreements with Bern and Honorton's (1994) presentation, I would like to briefly specify some points of agreement. The autoganzfeld studies do comply with most of the "stringent standards" (p. 353) spelled out in thejoint communiqué by Hyman and Honorton (1986). I commend Honorton and his colleagues ( 1990) for creating a protocol that eliminates most of the flaws that plagued the orig- inal ganzfeld experiments. The I I autoganzfeld studies consis- tently yield positive effects that, taken together, are highly sig- nificant. I concur with Bern and Honorton's admission that Correspondence concerning this article should he addressed to Ray Hyman, Department of Psychology, University of Oregon, Eugene, Or- egon 97403. 1 Although I take a pessimistic position regarding future replications, I think it is good that Bern and the parapsychologists are optimistic. Such optimism should encourage investigators to attempt replications. These replications will eventually decide the issue. 19 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 20 RAY HYMAN "the autoganzfeld studies by themselves cannot satisfy the re- quirement that replications be conducted by a'broader range of investigators' " (p. 13). 1 also support their suggestion that sev- eral parapsychologists pool their resources and plan a large- scale ganzfeld replication in which each laboratory contributes a set of trials to the total pool. So what is there to disagree about? I disagree with Bern and Honorton about how strongly the autoganzfeld studies support the hope for a replicable psi experiment. Where they see consis- tency between the autoganzfeld studies and previous parapsy- chological findings, I see inconsistency. Although I agree that the autoganzfeld studies meet most of the stringent standards that Honorton and I spelled out, I disagree that they meet all of' those standards. Our disagreements are a matter of degree. *rhe value of discussing our disagreements is to help clarify what should constitute adequate evidence for the existence of an anomaly. The-existence or nonexistence of psi will not be settled by debate. The existence issue will be settled by independent attempts at replication-at least four of which are currently un- derway (McCrone, 1993). In explaining my disagreements, I point to weaknesses in the autoganzfeld experiments. I want to emphasize that as a single contribution to the ganzfeld database, these are commendable experiments of high quality. But no single experiment or set of studies can be perfect in all respects. When such a series is given the responsibility of carrying a burden beyond its original purposes, then various deficiencies will inevitably become ap- parent. This is the case, I believe, with the autoganzfeld studies. Internal Consistency Within the Autogan 4-M Studies Bem and Honorton describe the autoganzfeld studies as I I separate experiments conducted by eight different experiment- ers. The hit rates are positive and consistent across the studies and the experimenters. Although this is encouraging, the con- sistency tells us little about potential replicability. Neither the studies nor the experimenters are independent. The studies vary in whether they use naive or experienced subjects. However, the target set, the selection and judging procedures, the laboratory, the setting, and the procedures are identical across studies and experimenters. No experimenter is associated with a single study, nor does an experimenter have independent input into the design and procedure as happens in an independent replica- tion. Indeed, the term experimenter in this context simply refers to a person who plays an already scripted role. Any unique fea- tures of the autoganzfeld procedure- i ncl udi ng possible arti- facts-would be the same for all I I studies and the eight differ- ent experimenters. Consequently the autoganzfeld studies should be looked on as I large experiment rather than I I sepa- rate contributions. Consistency With the Original Ganzfeld Database Bern and Honorton claim that -[the autoganzfeld] results are statistically significant and consistent with those in the earlier database" (p. 13). They cite only two reasons to support this claim. The overall effect size or hit rate is approximately the same in the two databases. This apparent agreement in overall effect size is meaningless. The overall effect size in the auto- ganzfeld studies is a composite of two significantly different effect sizes-that I-or the static targets and that for the dynamic targets. The overall effect size in the ganzfeld data base is an arbitrary composite of heterogeneous effect sizes, contributed in unequal numbers, from different laboratories. The fact that the two composites yield approximately the same effect size is accidental. Both numbers could easily have been larger or smaller, depending on the mix of the arbitrary sources from which they were composed. The dynamic targets yielded a significantly higher hit rate than did the static targets in the autoganzfeld studies. Bem and Honorton argued that this was consistent with the finding that the multiple-image targets (View Master stereoscopic slide reels) in the ganzileld database yielded significantly higher hit rates than did the single-image targets. I do not believe that multiple static images on a View Master reel can be equated to the dy- namic moving image on a videoclip. However, I will not argue this point. Clearly the dynamic targets outperform the static targets in the autoganzfeld studies. Even if this is consistent with the ap- parent superiority of the View Master targets over the single- image targets, Bem and Honorton (1994) overlook a seriousdis- crepancy. Single-image targets constituted 76% of the 835 ses- sions in the ganzfeld experiments. Their average hit rate was .346. Given this effect size and the 166 trials using static targets, the power or probability of replicating this effect in the auto- ganzfeld experiments was .82. This failure to find a significant effect with the static targets was even more notable given that these experiments were conducted in "the warm social ambi- ence" (p. 14) of Honorton's laboratory. Bem and Honorton acknowledge that the autoganzfeld stud- ies failed to replicate the predicted sender-receiver pairing effect. In the original ganzfeld database, the trials on which the receiver chose a friend as a sender produced a hit rate of .44 compared with a hit rate of only .26 for those trials on which the experimenter assigned a sender. I would emphasize that given this size effect with the 198 trials with friends as senders and 128 with someone else as senders, the power of getting a significant replication of the effect is over .92. Again, given the Ipsi conducive' atmosphere of Honorton's laboratory, this fail- ure to get significance is a noteworthy inconsistency. On two key comparisons with the original ganzfeld database, the autoganzfeld fails to replicate even with adequate power. The positive hit rate and overall significance of the autoganzfeld studies are due to an essentially new type of target, presented in a new way. Even if we agree that there is a kinship between the View Master reels of the ganzfeld experiments and the dynamic targets of the autoganzfeld, we cannot ignore the differences be- tween multiple images of a travel scene presented statically with' a slide projector and excerpts from motion pictures presented with their accompanying audio on videocassettes. The prob- lems of selecting, presenting, and controlling such targets pres- ent new challenges. During the judging procedure in the origi- nal ganzfeld experiments, the target and the decoys were dis- played simultaneously. The judging procedure for the autoganzfeld involves presenting the target and its decoys one at a time. Because the positive hit rate and significance are due to an essentially new type of target presented in a new way, the need for independent replication is especially urgent. Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 ANOMALY OR ARTIFACT ConsistencY With Previous Parapsychological rindings Bern and Honorton (1994) also claimed that "there are reli- able relationships between successful psi performance and con- ceptually relevant experimental and subject variables, relation- ships that also replicate previous findings" (p. 13). They point to three such "replications." One is a small, but statistically sig- nificant, correlation of. 18 between a measure of extroversion and "psi performance." This is consistent with a tendency found in previous psi studies. Second, they report the strong psi per- formance of the Julliard students that theysce as consistent with psi studies that found a relation between psi abilities and cre- ative and artistic abilities. This latter replication is not so im- pressive when one considers that only 20 students were involved and that their performance was not significantly different from the other participants in the two studies in which they partici- pated (Fisher's exact p = .262, two-tailed). In addition. as I point out below, the Julliard students were exposed to just those conditions that favored high hit rates-targets that were re- peated, a preponderance of dynamic targets, and active prompt- ing by the experimenter during judging, Thus, it is unclear whether their high hit rate was a function of their creativity or a function of the special targets and conditions with which they happened to be associated. The third correlate could not be demonstrated for the auto- ganzfeld studies. Berri and Honorton (1994) pointed out that the subjects in the autoganzfeld tended to believe in psi, re- ported psychic experiences, and had practiced meditation or related techniques. These variables were previously reported as correlates of psi. However, I do not see how they can claim that these attributes of their subject population are a replication of previous findings. They report no correlations between these variables and performance in the autoganzfeld studies. Indeed. they cannot report any correlation because they did not have subjects who lacked these properties. We do not know if nonbe- lievers and people without psychic experiences would have per- formed better or worse than the actual subjects. In other words, they can justify only one of the correlates that they use to claim consistency with previous psi studies. Even here the relationship is weak and is just one of many previously reported correlates that might have been found. At one time, for example, parapsychologists claimed that the decline effect was a pervasive and characteristic property of psi. However. when no decline effect is found in it parapqychological study, it does not deter the experimenter from pointing to some. other significant departure from chance as evidence for psi. Note that in the autoganzfeld studies, there is no decline effect. Randomization and Claims of Psi As I already stated, I agree that the autoganzfeld studies meet most of the requirements that Honorton and I specified in our joint communiqu6 (Hyman & Honorton, 1986). One surprising exception is the inadequate testing of the randomization proce- dures. The issue ofrandomization was central to the debate con- cerning the original ganzfeld findings (Hyman, 1985). Adequate randomization procedures are critical for parapsychological re- search because the evidence for psi is based on a low probability value for a departure from a chance baseline. Such probability 21 values have meaning within an idealized statistical model of the experimental situation. Whether this statistical model applies to a given situation is an empirical matter that must be adequately justified if the stated significance levels are to be taken seriously. Appropriate randomization procedures are one way to help en- sure that the statistical model applies to the experimental data. With respect to the autoganzfeld studies, this would entail se- lecting the targets for each trial and ordering the target and de- coys during judging in a demonstrably random manner In ad- dition, following the practice of a few past researchers. the para- psychologist can also provide some post hoc analyses to show that the distributions of targets and judging orders are consis- tent with the underlying probability model. Unfortunately, the autoganzfeldstudies fell short on this crit- ical requirement. The tests for adequacy of randomization were confined to showing a uniform distribution of outputs from I to 160 for target selection and a uniform distribution of the per- mutations of all possible orderings during the judging proce- dure. Emitting a uniform distribution of target choices is a nec- essary but hardly sufficient requirement for an adequate ran- dom generator. These randomization procedures are critical because we can expect strong systematic biases during the judging procedure. The fact that the items to be judged have to be presented se- quentially, when combined with what we know about subjective validation (Marks & Kammann, 1980), would lead us to expect a strong tendency to select the fist or second items during the judging series. We would also expect strong response biases within each target pool. Bern and Honorton show such a bias in the target pool used for Study 302. Both these biases may be strengthened by the fact that the experimenter interacts with the receiverduring theJudging process. Although most receivers participate in one session, each experimenter participates in several. The response biases of the experimenters can play an important role, especially in those studies in which the experi- menter deliberately prompts the receiver to choose a particular item during the judging. Such active prompting occurred in 6 of the I I studies (Honorton et al., 1990).' If the randomizing of the selection of targets and of the order- ing of items during judging is adequate, such response biases should not affect the validity of the statistical tests. One way to prevent response biases from distorting the hit rate is to use a randornizing procedure that makes sure that each item within a target pool occur-; equally ollen.The simple randomizing pro- cedure used in the autoganzfeld studies would guarantee that each target occurred an equal proportion (not number) of times only in the very long run. In any finite number of trials, the individual targets would occur with varying frequencies. Again, if the randomization was adequate, this inequality of occur- rence would not bias the hit rate. The items in some target pools that occurred most frequently would be those that were favored 2One referee suggested that I make it clear that I am not claiming that sensory leakage occurred because of experimenter prompting. I agree. The experimenter, according to the protocol, was ignorant of which member of the target pool was the target during the judging pro- cedurc. The point is that by actively helping the subject to rate the mem- bers of the target pool. the experimenter let his or her own subjective biases enter the selection procedure. Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 22 RAY HYMAN 'Fable I Hit Rate as a Function qfihe P'requenc.v of0ccurrence (#'Targels Frequency Variable1 2 3 4 5 6 7 8 Total Hits 12 25 16 20 19 4 4 6 106 Misses36 65 26 48 36 8 3 2 224 n 48 90 42 68 55 12 7 8 330 [lit .250 .278 .381 .294.346 .331.571 .750 .321 rate by the response bias. This would bias the hit rate upward. The items in other target pools, however, that occurred most fre- quently would be those that were avoided by the response bias. 'This would bias the hit rate downward, With adequate random- ization, these two tendencies would balance each other. . Achi-ev-ing adequate raridomization is not easy. Much can go wrong-as some parapsychologists, among others, have shown. This is why it is disappointing that the autoganzfeld studies did not show the same concern for randomizing that they showed for other aspects of the methodology. This is also why, in my role of devil's advocate, I was interested in directly checking the actual distribution of target positions among the decoys during judging. Daryl Bern kindly agreed to supply me with this infor- mation along with other data from the autoganzfeld database. Unfortunately, the variable labeled position on the data sheet turned out to be the original position ofthe target in its target pool rather than its position during judging. This latter infor- mation was unavailable to either Bein or me at the time of this writing. Hit Rate and Target Frequency Because I could not directly check the adequacy of the ran- domization procedures, I tried to find some indirect indicators. If randomizing was inadequate and targets occurred with vary- ing frequency, possible biases might show up as differential hit rates for targets occurring with various frequencies. For exam- ple, if targets favored by response biases were also favored by a deficient target selection procedure, then we would find a posi- tive correlation between hit rate and target frequency. It would be possible, of course, for a deficient randomization procedure to yield a negative correlation. To see if actual repetitions of targets had any observable consequences, I tabulated the pro- portion of hits as a function of how many times a target oc- curred in this database.' As Table I shows, the relation between hit rate and target frequency was strong. The test for a linear trend among the pro- portions (Snedecor & Cochran, 1967, pp. 246-248) was positive and significant, (z = 2.49, p = .0 13, two-tailed). An indication of the strength of this trend is given by the Spearman rank order correlation between the hit rate and target frequency, which was .83. Another way to look at this relationship would be to com- pare the hit rate of targets that occurred once or twice (.27) with those that appeared three or more times (.36). This pattern exists separately for the static and dynamic targets, although it is stronger among the dynamic targets. The static targets that occurred once or twice had a hit rate of .22 compared with a hit rate of .31 for those that occurred more than twice. The hit rate was .32 for those dynamic targets that occurred once or twice as compared with a hit rate of .41 for those that occurred three or more times. Target Occurrence and Experimenter Prompting What accounts for this peculiar relationship? Is the correla- tion between target frequency and hit rate determined by which particular targets get repeated? Or does replication itself some- how increase the hit rate? If the relation is diAe to response bi- ases, I would expect experimenter prompting to affect the later occurrences of targets rather than their first occurrences. With these questions in mind, I conducted a multinomial analysis of variance (Woodward, Bonett, & Brecht, 1990). In this analysis, hit rate was the dependent variable, and 3 two-level factors were the independent variables: target type (static, dynamic), target occurrence (first, later), and experimenter prompting (no, yes). Of the interactions, only that between target occurrence and experimenter prompting was significant, X'(1, N = 330) = 6.83, p .009. The two significant main effects were target type, X2( 1, N 330) = 4.76, p = .030, and target occurrence, X2( 1, N 330) = 11.56,p <.001. The difference between the hit rate for dynamic targets (.356) and that for static targets (.249) does not interact with the other two factors and will be ignored for the present discussion.' The meaning of the interaction between target occurrence and prompting can be seen in the simple effects of target occurrence within each level of prompting. With no experimenter prompt- ing, the effects of target occurrence were minimal: The hit rate for first occurrences of targets was .291 and that for later occur- To be consistent with Rem and Honorton, I treated the basic data- base as the 330 sessions irl Studies I through 301. Study 302, which used a single target pool, was treated as a special case. " These hit rates are slightly different from those used by Honorton et al. (1990) and Rem and Honorton (1994). This is because they com- puted hit rates for any category by simply dividing the number of hits by the total number in that category. The hit rate for dynamic targets obtained with this approach is 61/164 = .372 and that for static targets is 45/166 = .271. These rates are means weighted by the number of cases in the cells for each combination of levels of the factors. For the purposes of additivity of effects, I am using the unweighted means (each cell ofthe design is weighted equally). This removes distortions and con- lbunds that are due to unequal cell sizes. In the present case, the differ- ences are small and inconsequential. I am supplying this footnote to explain some discrepancies that might confuse the reader. Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 ANOMALY OR ARTIFACT renceswas.334, X2(1, Ar= 181) = .396, p = .534. Theeffect. of target repetition combined with experimenter prompting, how- ever. was very large. The hit rate for first occurrences of targets with experimenter prompting was only .140. The hit rate for later occurrences of targets when combined with experimenter promptingjumped to.445. This gain was significant, X'(1, Ar = 149) = 14.702, p = .0001. These results suggest that experi- menter prompting depresses hit rates for first occurrences of targets and enhances hit rates for subsequent occurrences of targets. Internal Checks on the Validi~y of This Pattern Is this peculiar relation among hit rate, target frequency, and experimenter prompting merely a fluke? I broke the data into subsets in several ways to see if the pattern was consistent in the different subcategories. I checked this pattern within the dy- namic and static targets separately. I compared Targets I to 80 with Targets 81 to 160. Likewise, I looked for the pattern within Studies 101 through 103 taken as a group as compared with Studies 104 through 301 considered as a group. I also checked for this pattern for each of the five experimenters who contrib- uted the most sessions. Although the numbers became small in some of these comparisons, the hit rate was consistently larger for later as opposed to first occurrences of a target. I found just one nonsignificant exception in the trials for one experimenter. Likewise, wherever meaningful comparisons were possible, the interaction between prompting and target occurrence oc- curred.' For this database, then, the dependence of hit rate on frequency of target occurrence and experimenter prompting was a robust effect, Implications As far as I know, this dependence of hit rate on target occur- rence and experimenter coaching has never been previously re- ported in parapsychological research. One referee suggests that the dependence of hit rate on target frequency and prompting may reveal important moderator variables rather than artifacts. The referee may be correct. The skeptic, however, might point to the long history of alleged "moderator" variables in parapsy- chology-such as the decline effect, displacement effects, sheep-goat effects, and others. The problem is that when such moderators are discovered in the data, they are put forth as im- portant indicators or characteristics of psi. The absence of such characteristics in subsequent data, however, does not deter para- psychologists from claiming evidence for psi if they find a sig- nificant hit rate. This is the troublesome problem of boundary conditions. The parapsychologists have been unable to specify what would constitute the absence of psi. The positive effect for repeated occurrences of a target may eventually turn out to be an important property of psi-if psi exists. However, the fact that first occurrences of a target pro- duce a hit rate consistent with chance raises questions. All of the positive effect in the ganzfeld experiments rests on those targets that have occurred more than once. Tho prompting cffect is even more curious. On first occurrences of a target, active coaching by the experimenter seems to depress the hit rate-.28 without prompting versus .15 with prompting. For 23 second or later occurrences of a target. active coaching appears to enhance the hit rate-33 without prompting versus.45 with prompting. If the prompting by the experimenter is intended to increase reliability by reminding the receiver of ganzfeld asso- ciations that he or she might overlook during judging, why should the effects of such prompting show up only for the sub- sequent occurrences of a target? That hit rate correlates with frequency of target occurrence could mean that the "better- targets are somehow selected more often by the randomizing procedure. Or it could mean that fre- quency of occurrence, itself, is the determinant of a higher hit rate. The data suggest the latter possibility. The 48 targets that occurred exactly once in the database had a hit rate of 22. The first occurrence of the targets that occurred more than once had a hit rate of .23. The combined hit rate for second or later oc- currences of targets was .39. Another way of examining this re- lation would be to look separately at the hit rates for first and subsequent occurrences of targets that appeared exactly twice, three times, four times, and so on. Only the targets that oc- curred from two to five times could be used because only one or two targets appeared with frequencies of six or more. In all these comparisons, the first occurrences consistently had a lower hit rate than subsequent occurrences of the same targets. Whatever the source for this pattern, it raises questions about interpretations of other findings in the database. For example, Bern and Honorton (1994) pointed to the high rate of.50 for the 20 Julliard students as evidence for the effect of artistic cre- ativity on hit rate. However, all of the sessions in which the Jul- hard students appeared were prompted, and 15 of the 20 used second or later occurrences of a target. On the five targets that were occurring for the first time, these students got one hit. Con- scquently, we cannot tell if the hit rate for these students reflect any special abilities or if they are due to whatever makes hit rate a function of target frequency and coaching in this database. Are these findings due to an artifact, or do they point to some new, hitherto unrecognized property of psi? We cannot say. The existence of this pattern in the database. however, strongly sup- ports the need to replicate the findings before we can be confi- dent that the parapsychologists have finally found a way to cap- ture and tame their elusive quarry. Conclusions The autoganzfeld experiments are a praiseworthy improve- ment in methodological sophistication and experimental rigor over the previous ganzfeld experiments. Despite these improve- ments, the experiments fall disappointingly short in the critical area of justifying the randomization procedures. Even though all but one of the individual studies produced a positive effect size and the overall effect was significant, the autoganzfeld ex- periments do not constitute a successful replication of the orig- inal ganzfeld experiments. As noted in Footnote 5 of the Bem-Honorton article, a recent review of the original computer files uncovered a duplicate record in the auto- ganzfeld database. This has now been eliminated, reducing by one the number of sessions on which my analysis was based. Some experiment- ers contributed only unprompted sessions, and some contributed mainly prompted sessions. Approved For Release 2000/08/08 CIA-RDP96-00789ROO3200100001-5 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 24 RAY HYMAN Although Bern and Honorton point to consistencies between the autoganzfeld results and those of previous parapsychologi- cal research, these consistencies are more apparent than real. On the other hand, as I have argued, important inconsistencies exist between the two databases. Three robust effects in the autoganzileld database are the de- pendence of hit rate on type of target (dynamic or static), target occurrence (first or subsequent), and experimenter prompting (yes or no). Although my looking for effects of the latter two factors was motivated by my concern for possible randomiza- tion deficiencies, their existence should interest both parapsy- chologists and critics. This is because the existence of an effect depends on these factors. The combination of dynamic targets, repeated occurrences of a target, and experimenter prompting produces a hit rate of .471 with 95% confidence limits from .305 to.629. The combination of static targets, first occurrences of a target, and no prompting yields a hit rate of. 178 with 95% confidence limits from.066 to.336. We do not have enough information to know if the depen- dence of hit rate on target frequency and experimenter prompt- ing involves response preferences for items within a target pool. One way to ensure that such preferences do not bias the hit rate is to present each member of a target pool equally often. I tried to get some idea what the hit rate might be if each member of a target pool had occurred equally often. I restricted myself just to those target pools in which each member occurred at least once.' The hit rate for first occurrences of a target in these target pools was .275 with 95% confidence limits from .167 to .399. (The hit rate for the targets in these target pools that occurred a second or later time was .427.) This finding does not prove anything, but it suggests that if the targets within each target pool had occurred equally often, the results might have been consistent with chance. The autoganzfeld studies failed to replicate key findings of the original ganzfeld experiments, even though the power was sufficient. The positive effect size and significance depended on a new type of target whose presentation involves a new technol- ogy and on target repetition and experimenter coaching. What- ever their source, these effects are new to the psi literature. We do not know how much of this is unique to this experimental setup and laboratory. For these reasons, we have to wait for fu- ture attempts at replication to see if a replicable psi effect is at hand. I could not use higher frequency of occurrence because only three target pools existed in this database that had at least two occurrences of each of its members. References Bem, D.J., & Honorton, C. (1994). Does psi exist? Replicable evidence for an anomalous process of information transfer. Psjvhological Bul- letin, 115. 4-18. Honorton, C. (1985). Meta-analysisof psi ganzfeld research: A response to Hyman. Journal oj'Paraps.Khology 49, 51-9 1. Honorton, C., Berger, R.E., Varvoglis, M.P., M., Derr, P., Schechter E.I., & Ferrari, D.C, (1990). Psi communication in the ganzfeld: Experi- ments with an automated testing system and a comparison with a meta-anaiysis of earlier studies. Journal of Parapsychoiogy 54, 99- 139. Hyman, R. (1985). The ganzfeld psi experiment: A critical appraisal. Journal ofParaps.Khology, 49.3-49. Hyman, R., & Honorton, C. (1986). Ajoint communiqu6: The psi pnz- feld controversy. Journal ofParapsjchology, 50, 351-364. Marks, D., & Kammann, R. (1980). The ps)rhology of the ps)rhic. Buffalo, NY: Prometheus Books. McCrone, J. (1993, May 1). Roll up for the telepathy test. New Scientist, pp. 29-33. Snedecor, G.W., & Cochran, W.G. (1967). Statistical methods (6th ed., pp. 246-248). Ames: Iowa State University Press. Woodward, J.A., Bonett, D.G., & Brecht, M. L. (1990). Introduction to linear models and experimental design. San Diego, CA: Harcourt Brace Jovanovich. Received July 26, 1993 Accepted July 27, 1993 n Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 Psychok) ical Bulletin Cmyrighl 1994 by the American 1`4ychotogical Association, Inc. or -27 1994, v.H 5. No. 1. 25 0033-29M/94/$3.00 Response to Hyman Daryl J. Bem R. Hyman (1994) raises two major points about D. J. Bern and C. Honorton's (1994) article on the psi ganzfeld experiments. First, he challenges the claim that the results of the auloganzfeld experi- ments are consistent with the earlier database. Second, he expresses concerns about the adequacy of the randomization procedures. In response to the first point, I argue that our claims about the con- sistency of the autoganzfeld results with the earlier database are quite modest and challenge his counterclaim that the results are inconsistent with it. In response to his methodological point, I present new analyses that should allay apprehensions about the adequacy of the randomization pro- cedures. I am pleased that Ray Hyman, one of parapsychology's most knowledgeable and skeptical critics, concurs with Charles Hon- orton and me on so many aspects of the autoganzfeld experi- ments: the soundness of their methodology, the clear rejection of the null hypothesis, and. of course, the need for further repli- cation. I hope this brief response will further augment our areas of agreement. Hyman raises two major points about our article. First, he challenges our claim that the results of the autoganzfeld studies are consistent with those in the earlier database. Second, he ex- presses concerns about the "incomplete justification of the ade- quacy of the randomization procedures" and speculates that inadequate randomization may have interacted with subject or experimenter response biases to produce artiflactual results. Consistency With the Earlier Database The earlier ganzfeld database comprised studies whose meth- lods and results were quite heterogeneous. Consequently, one cannot justify any strong claims that some subsequent finding is either consistent or inconsistent with that database. For this reason, Honorton and I were careful not to make such claims. With regard to the major finding, we simply observed that ear- lier studies had achieved an overall bit rate of about 3317v (25% would be expected by chance) and noted that the autopnzfeld experiments achieved approximately the same effect size. End ofclaim. In general, the earlier database served primarily to suggest the kinds of variables that needed to be examined more systemati- cally or more rigorously in the new studies. For example, previ- ous ganzfeld studies that had used multi-image View Master slide reels as target stimuli obtained significantly higher hit rates I am grateful to Richard Broughton of the Institute for Parapsychol- ogy in Durham, North Carolina, for going through the original auto- ganzfeld computer files with me to unearth the data necessary for the additional analyses presented in this response. Correspondence concerning this article should be addressed to Daryl J. Bern, Department of Psychology, Uris Hall, Cornell University, Ith- aca, New York 14853. Electronic mail may be sent to d.bem@cor- nell.edu. than did studies that had used single-image photographs. This finding prompted Honorton and his colleagues to include both video film clips and single-image photographs in the autoganz- feld experiments to determine whether the former were supe- rior. They were. Our only claim about methodological compa- rability was the modest observation that "by adding motion and sound, the video clips might be thought of as high-tech versions of the View Master reels." But Hyman argues at length that video clips are not really like View Master reels, Surely this is a matter of interpretation, but does it really matter? Usually in psychology, successful con- ceptual replications inspire more confidence about the reality of the underlying phenomenon than do exact replications. I be- lieve that to be the case here. An example of a variable selected from the earlier database for more rigorous reexamination was sender-receiver pairing. Previous ganzfeld studies that permitted receivers to bring in friends to serve as senders obtained significantly higher hit rates than did studies that used only laboratory-assigned senders. But as we emphasized in our article, "there is no record of how many participants in the former studies actually brought in friends," and hence these studies do not provide a clean test of the sender-receiver variable. Moreover, the two kinds of studies differed on many other variables as well. In the autoganzfeld studies, all participants were free to bring in friends, and it was found that sender-receiver pairs who were friends did, in fact, achieve higher hit rates than did sender- receiver pairs who were not friends (35% vs. 29%). But the reli- ability of this finding is equivocal. In the archival publication of the autoganzfeld studies, Honorton et al. ( 1990) presented this finding as a marginally significant point-biscrial correlation of .36 (p = .06). In our article, however, we chose to apply Fisher's exact test to the hit rates themselves. Because this yielded a non- significant p value, we thought it prudent simply to conclude that "sender-receiver pairing was not a significant correlate of psi performance in the autoganzfeld studies." But to Hyman, "this failure to get significance is a noteworthy inconsistency." (In part, he makes it appear more inconsistent than it is by erroneously stating that the earlier database yielded a significant difference in performance between friend pairs and nonfriend pairs. As noted earlier, this is an indirect inference at best.) 25 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 26 DARYL J. BEM I submit that Hyman is using a double standard here. If* the successful replication of the relation between target type and psi performance is not analogous to the earlier finding with the View Master reels, then why is this near miss with a method- ologically cleaner assessment of the sender-receiver variable a noteworthy inconsistency"? Hyman cannot have it both ways. If the heterogeneity of the original database and the methodological dissimilarities be- tween its variables and those in the autoganzfeld studies pre- ciude strong claims of consistency, then these same factors pre- clude strong claims of inconsistency. Randomization As we noted in our article, the issue of target randomization is critical in many psi experiments because systematic patterns in inadequately randomized target sequences might be detected by subjects during a session or might match their preexisting response biases. In a ganzfeld study, however, randomization is less problematic because only one target is selected during the session and most subjects serve in only one session. The primary concern is simply that all the stimuli within each judging set be sampled uniformly over the course of the study. Similar consid- erations govern the second randomization, which takes place after the ganzfeld period and determines the sequence in which the target and decoys are presented to the receiver for judging. In the 10 basic autoganzfeld experiments, 160 film clips were sampled for a total of 329 sessions; accordingly, a particular clip would be expected to appear as the target in only about 2 ses- sions. This low expected frequency means that it is not possible to statistically assess the randomness of the actual distribution observed. Accordingly, Honorton et al. ( 1990) ran several large- scale control series to test the output of the random number generator. These control series confirmed that it was providing a uniform distribution of values through the full target range. Statistical tests that could legitimately be performed on the ac- tual frequencies observed confirmed that targets were, on aver- age, selected uniformly from among the four film clips within each judging set and that the four possible judging sequences were uniformly distributed across the sessions. Nevertheless, Hyman remains legitimately concerned about the adequacy of the randomizations and their potential interac- tions with possible receiver or experimenter response biases. Two kinds of response bias are involved: differential preferences for video clips on the basis of their content and differential pref- erences for clips on the basis of their position in the judging sequence. Conleni-Relaled Response Bias Because the adequacy of target randomization cannot be sta- tistically assessed owing to the low expected frequencies, the possibility remains open that an unequal distribution of targets could interact with receivers' content preferences to produce artifactually high hit rates. As we reported in our article, Hon- orton and I encountered this problem in an autoganzfeld study that used a single judging set for all sessions (Study 302), a prob- lem we dealt with in two ways. To respond to Hyman's concerns, I have now performed the same two analyses on the remainder of the database. Both treat the four-clip judging set as the unit of analysis, and neither requires the assumption that the null baseline is fixed at 25% or at any other particular value. In the first analysis, the actual target frequencies observed are used in conjunction with receivers' actual judgments to derive a new, empirical baseline for each judging set. In particular, I multiplied the proportion of times each clip in a set was the target by the proportion, of times that a receiver rated it as the target. This product represents the probability that a receiver would score a hit if there were no psi effect. The sum of these products across the four clips in the set thus constitutes the em- pirical null baseline for that set. Next, I computed Cohen's mea- sure of effect size (h) on the difference between the overall hit rate observed within that set and this empirical baseline. For purposes of comparison, I then reconverted Cohen's h back to its equivalent hit rate for a uniformly distributed judging set in which the null baseline would, in fact, be 25%. Across the 40 sets, the mean unadjusted hit rate was 31.5%. significantly higher than 25%, one-sample t(39) = 2.44, p = .01, one-tailed. The new, bias-adjusted hit rate was virtually identi- cai (30.7%), t(39) = 2.37, p = .0 1, td,fr(39) = 0.85, p = .40, indi- cating that unequal target frequencies were not significantly in- flating the hit rate. The second analysis treats each film clip as its own control by comparing the proportion of times it was rated as the target when it actually was the target and the proportion of times it was rated as the target when it was one of the decoys. This procedure automatically cancels out any content-related target prefer- ences that receivers (or experimenters) might have. First, I cal- culated these two proportions for each clip and then averaged them across the four clips within each judging set. The results show that across the 40 j udgi ng sets, clips were rated as targets significantly more frequently when they were targets than when they were decoys (29% and 22%, respectively), paired t(39) = 2.03, p = .025, one-tailed. Both of these analyses indicate that the observed psi effect cannot be attributed to the conjunction of unequal target distributions and content-related response bi- ases. Sequence-Related Response Bias Hyman is also concerned about the randomization of the judging sequence because we can expect strong systematic biases during the judging procedure. The fact that the items to bejudged have to be presented sequentially, when combined with what we know about subjective validation . . . would lead us to expect a strong tendency to select the first or second items during thejudging series. Hyman's hypothesis is correct: As shown in Table 1, receivers do display a position bias in their judgments X2 (3, N = 354) = 8.64, p < .05, tending to identify as targets clips appearing either first or last in the judging sequence. Moreover, the actual distri- bution of targets across the judging positions also departs sig- nificantly from a uniform distribution, X2 (3, N = 354) = 7.83, p < .05, with targets appearing most frequently in the third po- sition. To determine whether the conjunction of these two unequal distributions might contribute artifactually to the hit rate, one Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5 RESPONSE TO HYMAN can again combine the observed frequencies to derive an empir- ical null baseline. As shown in Table 1, each proportion in the second column can be multiplied by the corresponding propor- tion in the third column to yield the hit rate expected if there were no psi effect. As shown, the expected hit rate across all four judging positions is 24.7%. The pertinent fact here is that this is lower than the 25% that would have been obtained if the target positions had been uni- formly distributed across the sessions. In other words, the con- junction of receivers' position biases with the imperfect ran- domization of target positions works against successful psi per- formancc in these data. Again, inadequate randomization has not contributed artifactually to the hit rates. Alternative Randomizing Strategies? Hyman suggests that "one way to prevent response biases from distorting the hit rate is to use a randomizing procedure I that makes sure that each item within a target pool occurs equally often." Coming from a critic as sophisticated as Hyman, this is a very puzzling suggestion, because he appears to be sug- gesting some variant of sampling without replacement, a proce- dure that would virtually guarantee response-bias artifacts. For example, if receivers tend to avoid selecting targets that ap- peared in previous sessions, this response bias would coinci e with the actual diminishing probabilities that a previously seen target would reappear. The experimenters-who participate in many sessions and discuss them with one another-are in an even better position to detect and possibly to exploit the dimin- ishing probabilities of target repetition. Sampling without re- placement is precisely what enables card counters to improve their odds at blackjack. Alternatively, perhaps Hyman is advocating a procedure in which the experiment continues until each clip within ajudging set appears as a target a predesignated minimum number of times. For purposes of analysis. the investigator then randomly discards excess sessions until the target frequencies are equal- ized at that minimum number. This would solve the response- bias problem but would be enormously wasteful. Suppose, for example, that only 4 sessions from each judging set would have to be discarded, on average, to equalize the target frequencies. With 40 judging sets, the investigator would end up discarding 160 sessions, equal to nearly half of the sessions that took Honorton and his colleagues 61/2 years to collect! Only a study with many fewer judging sets could reasonably implement this strategy. Hit Rates as a Function of Target Repetition In his post hoc excursion through the autoganzfeld data, Hy- man uncovered an unexpected positive relationship between hit rates and the number of times targets had been targets in previ- ous sessions. (Ironically, Hyman has been one of the most out- spoken critics of parapsychologists who search through their data without specific hypotheses and then emerge with unex- pected "findings.") If this finding is reliable and not just a fluke of post hoc ex- ploration, then it is difficult to interpret because target repeti- tion is confounded with the chronological sequence of sessions: 27 Table I Proportion qfSessions in Which Each Clip Was Selected as the Target and Proportion in Which It Appeared as the Target Position in Expected judging Selected Appeared hit rate sequence as target as target M 1 .30 .25 7.5 2 .20 .24 4.9 3 .22 .31 6.7 4 .28 .20 5.6 Total 1.00 1.00 24.7 Note. N = 354 sessions. Higher repetitions of a target necessarily occur later in the se- quence than lower repetitions. In turn, the chronological se- quence of sessions is confounded with several other variables, including more experienced experimenters, more "talented" receivers (e.g., Juilliard students and receivers being retested be- cause of earlier successes), and methodological refinements in- troduced in the course of the program in an effort to enhance psi performance (e.g., experimenter "prompting"). Again, however, Hyman's major concern is that this pattern might reflect an interaction between inadequate target random- ization and possible response biases on the part of those receiv- ers or experimenters who encounter the same judging set more than once. This seems highly unlikely. In the entire database, only 8 subjects saw the samejudgingset twice, and none of them performed better on the repetition than on the initial session. Similar arithmetic applies to experimenters: On average, each of the eight experimenters encountered a given judging set only 1.03 times. The worst case is an experimenter who encountered the same judging set 6 times over the 61/2 years of the program. These six sessions yielded three hits, two of them in the first two sessions. At the end of his discussion, Hyman wonders whether this relationship between target repetition and hit rates is "due to an artifact or [does it] point to some new, hitherto unrecognized property of psi?" If it should turn out to be the latter, then I believe it only appropriate that parapsychologists reward his serendipity by calling it the Hyman Effect. References Bern, D. J., & Honorton, C. (1994). Does psi exist? Replicable evidence for an anomalous process of information transfer. Psychological Bul- letin. 115.4-18. Honorton, C., Berger, R. E., Varvoglis, M. R. Quant, M., Derr, P., Schechter, E. L, & Ferrari, D. C. (1990). Psi communication in the ganzfeld: Experiments with an automated testing system and a com- parison with a meta-analysis of earlier studies. Journal of Parapsy- chology, 54,99-139. Hyman. R. (1994). Anomaly or artifact? Comments on Bern and Hon- orton. P5.0iological Bulletin. 115,19-24. Received August 9, 1993 Accepted August 9, 1993 0 Approved For Release 2000/08/08 : CIA-RDP96-00789ROO3200100001-5