
    UNITED STATES of America, v. Robert N. ANGLETON, Defendant.
    No. CR. H-02-0040.
    United States District Court, S.D. Texas, Houston Division.
    June 26, 2003.
    See also 221 F.Supp.2d 696.
    
      Michael Wayne Ramsey, Houston, TX, for Robert Nicholas Angleton, defendant.
    Edward F Gallagher, U.S. Attorneys Office, Terry Clark, U.S. Attorneys Office, Melissa Annis, U.S. Attorneys Office, U S Marshal — H, U S Probation — H, Pretrial Services — H, Financial Litigation, U S Attorney’s Office, Southern District of Texas, James L Turner, U.S. Attorneys Office, Houston, for U.S. Attorneys.
   MEMORANDUM AND OPINION

ROSENTHAL, District Judge.

Defendant Robert Angleton seeks to introduce expert testimony of Stephen Cain on the identity of an individual speaking on a tape recording. This recording was seized from Roger Angleton by Las Vegas, Nevada police officers on July 23, 1997. The government has moved to strike Cain’s expert testimony on the ground that it does not meet Federal Rule of Evidence 702 and the case law following Daubert v. Merrell Dow Pharm., 509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993).

This court has carefully considered the record, including the evidence presented at the hearings held on April 28-29, 2003 and May 9-10, 2003; the motions and responses; the parties’ submissions; and the applicable law. Based on this review, this court concludes that Stephen Cain’s testimony does not meet the standard necessary for admission under Rule 702. The government’s motion to exclude the testimony of Stephen Cain is GRANTED. The reasons for this ruling are set out below.

I. The Applicable Law

Rule 702 provides:
If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise, if (1) the testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles and methods, and (3) the witness has applied the principles and methods reliably to the facts of the case.

Rule 702 requires district judges to ensure that testimony resting on specialized knowledge is sufficiently reliable to assist the factfinder. Daubert, 509 U.S. at 597, 113 S.Ct. at 2799. The district judge must first determine whether the proffered testimony is the product of reliable principles and methods. Second, the district judge must determine that the testimony is relevant, that is, that the reasoning or methodology is reliably applied to the facts in issue. Id. at 592-93, 113 S.Ct. at 2796.

The Supreme Court has set out nonexclusive and nondispositive factors to aid a trial court in determining whether methodology is reliable. They are:

(1) whether the theory or technique has been tested; (2) whether the theory or technique has been subjected to peer review and publication; (3) the known or potential rate of error of the method used and the existence and maintenance of standards controlling the technique’s operation; (4) the existence and maintenance of standards and controls; and (5) whether the theory or method has been generally accepted by the scientific community.

Daubert, 509 U.S. at 593-94, 113 S.Ct. at 2796-97. The Advisory Committee Notes to Rule 702 emphasize that “the trial court must scrutinize not only the principles and methods used by the expert, but also whether those principles and methods have been properly applied to the facts of the case.” As the Fifth Circuit stated in Watkins v. Telsmith, Inc., 121 F.3d 984, 991 (5th Cir.1997), “whether an expert’s testimony is based on ‘scientific, technical or other specialized knowledge,’ Daubert and Rule 702 demand that the district court evaluate the methods, analysis, and principles relied upon in reaching the opinion. The court should ensure that the opinion comports with applicable professional standards outside the courtroom and that it ‘will have a reliable basis in the knowledge and experience of [the] discipline.’ ” Id. (quoting Daubert, 509 U.S. at 592, 113 S.Ct. at 2796).

The government contends that the aural spectrographic method for voice identification in general, and Cain’s application of that method in particular, do not meet the Rule 702 and Daubert standards of admissibility. Defendant urges that the government is imposing overly rigid criteria. This court must determine whether Cain’s testimony had a sufficient basis; used a reliable method; and properly applied the method to the facts of the case.

II. Analysis

Cain testified that he used an enhanced copy of the Q-1 tape seized from Roger Angleton and compared the unknown voice on that tape recording with an exemplar recording of Robert Angleton reading portions of what is contained on Q-1. Cain reached a finding of “possible elimination,” meaning that at least 80 percent of the comparable words on the exemplar and the Q-1 recording were dissimilar, with at least ten words that did not match.

A. The Record

“Aural spectrographic” voice identification, as its name suggests, is a two-step method applying both aural and visual components to determine the identity of an unknown recorded speaker. First, the investigator must check the recording of the unknown speaker to determine whether it has a sufficient amount of speech for analysis. (Defendant’s Exhibit 12, p. 1). The investigator then obtains a recording of an exemplar of the known speaker’s speech, in which the subject repeats the recorded statements of the unknown speaker. (Id. at p. 2).

The investigator aurally compares the recording of the unknown speaker and the exemplar of the suspect. The investigator listens for such factors as accent and dialect, inflection, syllable grouping and breath patterns, and the presence of speech pathologies or other unusual speech habits. (Id. at pp. 2-3). In the spectrographic comparison, the examiner visually compares a spectrogram of recordings of the known and unknown speakers. A spectrogram is “a graphic display of the recorded signal on the basis of time and frequency with a general indication of amplitude.” (Id. at p. 3). The investigator looks for both similarities and differences in various psychoacoustical features of speech, such as bandwidth, mean frequencies, distribution of formant energy, and nasal resonances. (Id.). The investigator then integrates the findings from the aural and spectrographic comparisons to reach a conclusion of identification, probable identification, possible identification, inconclusive, possible elimination, probable elimination, or elimination. (Id.).

Stephen Cain is president of Forensic Tape Analysis. He holds two degrees in forensic science. (Docket Entry No. 160, p. 6, l.13-l.18). He trained in voice identification at Michigan State University for two years under Professor Oscar Tosi, an influential researcher in voice spectrographic analysis. He later worked for the United States Secret Service and began performing voice identification in 1980. (Id. at p. 11, l.21-22). The Secret Service sent Cain to train with Lieutenant Lonnie Smrkovski of the Michigan State Police, who ran a voice spectrographic identification laboratory for the Michigan State Police. Cain was certified as a voiceprint analyst by the International Association for Identification in 1981. (Id. at p. 16, l.2). He later moved to the Internal Revenue Service, where he continued to conduct voice identification. In 1989, he went into a private consulting practice, performing voice identification and tape enhancement and authentication for use in litigation. (Id. at p. 20, l.10-l.24).

Cain conducted an aural spectrographic analysis of the unknown speaker on the tape labeled Q-1 that was seized from Roger Angleton. Cain testified that he followed the protocol of the American Board of Recorded Evidence (“ABRE”) in his analysis. (Docket Entry No. 160, p. 66, l.10-p. 67, l.9). Cain testified that his opinion was a possible elimination and stated “that it was unlikely” that the unknown speaker on the Q-1 recording was Robert Angleton. (Docket Entry No. 160, p. 69, l.11-l.21).

The government presented Dr. Hirotaka Nakasone as a witness on the use and reliability of the spectrographic methods of voice identification. Nakasone has been working in the field of speech recognition since 1977. He obtained a master’s degree in audio speech sciences from Michigan State University in 1978 and a doctorate in speech sciences from Michigan State in 1984. (Docket Entry No. 166, p. 7, l.10-l.20). While at Michigan State, Nakasone worked with Professor Oscar Tosi. Id. Nakasone conducted voice spectrographic research for the Los Angeles County Sheriff’s Department and on several occasions testified in courts and administrative tribunals as to the results of the spectrographic method of voice identification. Since 1992, Nakasone has worked for the Federal Bureau of Investigation, conducting research in audio forensic identification. His current research is in developing computer-assisted voice identification systems. (Docket Entry No. 167, p. 98, l.7-p. 100, l.12).

The parties have also submitted a number of publications on voice identification techniques, including the aural spectrographic technique. The testimony and submissions are examined against the case law and Rule 702.

B. The Case Law on Voice Identification Techniques

Before the Supreme Court’s decision in Daubert, several circuits admitted expert voice identification testimony under the standard of Frye v. U.S., 293 F. 1013 (D.C.Cir.1923). See, e.g., U.S. v. Smith, 869 F.2d 348 (7th Cir.1989); U.S. v. Jenkins, 525 F.2d 819 (6th Cir.1975); U.S. v. Baller, 519 F.2d 463 (4th Cir.1975). Since Daubert, no federal appellate court has approved the admission of voice spectrographic expert testimony into evidence. The Fifth Circuit has stated that the state of the law concerning expert voice identification is “ambiguous” in the wake of Daubert. See U.S. v. Drones, 218 F.3d 496, 503 (5th Cir.2000). The issue in Drones, a collateral challenge to a criminal conviction, was whether the defendant received ineffective assistance of counsel because his counsel failed to obtain and introduce expert voice identification testimony that defendant’s voice was not the one recorded in a telephone conversation discussing a narcotics purchase. The trial court vacated defendant’s sentence, finding that defense counsel’s failure to investigate and present voice identification evidence was unreasonable and constituted ineffective assistance. The government appealed on two grounds: that defense counsel’s decision not to investigate the identity of the voice on the disputed recording was reasonable; and that even if this decision was unreasonable, it did not prejudice defendant. Id. at 500.

The appellate court found that counsel’s failure to investigate voice identification analysis could be constitutionally deficient only if it excluded competent evidence. Id. at 503. The court began its analysis of whether voice identification testimony was competent evidence by noting that four circuits had upheld the admissibility of voice spectrographic analysis under Frye v. U.S., 293 F. 1013 (D.C.Cir.1923) but that as of 2000, no federal court had considered the admissibility of expert voice identification testimony under Daubert. Id. The court reviewed the expert testimony presented to the trial court and concluded that “spectrographic analysis is ... of questionable scientific validity.” Id. The court particularly noted the testimony of Bruce Koenig, a private voice identification consultant who had previously worked for the Federal Bureau of Investigation. After years of study, Koenig had concluded that there was no demonstrable scientific basis for the assumption on which voice spectrograph analysis rests: that each person’s voice is unique and that intraspeaker variations can reliably be distinguished from interspeaker variations. Koenig testified that the number of voice spectrograph analysts had dropped significantly from fifty to sixty practitioners in the 1970s to approximately a dozen at the time of defendant’s trial in 1995. Stephen Cain, defendant’s proffered expert in this case, also testified in Drones. He admitted that he did not know whether spectrographic evidence was widely accepted by the relevant scientific community. Cain also testified that several factors, such as the possibility that a defendant would disguise his voice, could affect the reliability of voice spectrographic analysis.

The Drones court did not decide whether voice spectrographic analysis meets the Daubert admissibility standard. Id. at 504 n. 9. Instead, the court held that given “the uncertainty of the state of the law regarding the reliability and admissibility of expert voice identification evidence,” defense counsel’s failure to put on voice identification evidence did not constitute ineffective assistance of counsel. Id. at 504.

Although the Drones court did not decide whether expert voice spectrograph analysis testimony was admissible under Daubert, it recognized problems in the reliability and scientific validity of the voice spectrograph method. No other circuit court has analyzed voice spectrograph identification techniques at the same level of detail as Drones. In U.S. v. Bahena, 223 F.3d 797 (8th Cir.2000), the court upheld the trial court’s decision to strike a voice spectrographic expert’s testimony because of the expert’s lack of experience and credentials. Nakasone testified in Bahena as to flaws in the voice identification testimony. Nakasone testified that the witness did not know how the original tape used in the analysis was produced and did not use that original tape to perform his analysis. The Bahena court assumed that voice spectrographic analysis could be admitted under Daubert, but held that the credentials of, and the methodology used by, the expert proffered in that case did not meet the admissibility standards.

The Alaska Supreme Court in State v. Coon, 974 P.2d 386 (Alaska 1999) is apparently the only appellate court that has affirmed the admissibility of spectrographic evidence under a Daubert analysis. Cain was the expert witness in that case; the trial court admitted his testimony on the results of aural spectrographic voice identification analysis. The Alaska Supreme Court applied the Daubert standard and found that the trial court’s decision was not clear error. The appellate court stated that “it is not clear that voice spectrographic analysis has gained general acceptance in the relevant scientific community,” and cited the same Koenig article presented to this court as Government Exhibit 11. 974 P.2d at 402. The Alaska Supreme Court also noted that the scientific literature “permit[ted] a conclusion that there is significant disagreement among experts in the field of voice spectrographic analysis regarding the reliability of the technique.” Id. The Drones court similarly noted the “uncertainty of the current state of the law regarding the reliability and admissibility of expert voice identification evidence.” 218 F.3d at 504.

In summary, few reported cases have subjected voice spectrograph identification methods to a Daubert analysis. Even the cases that found the admission of such expert testimony not to be clear error express doubts about the reliability, scientific validity, and acceptance of voice spectrographic analysis. See generally Michelle Meyer McCarthy, Admissibility and Weight of Voice Spectrographic Analysis Evidence, 95 A.L.R. 471 § 17 (2002).

C. The Record as to the Reliability of the Methodology

1. Whether Voice Spectrographic Analysis Has Been Tested and Subjected to Peer Review and Publication

The record in this case exemplifies the concerns that have led courts to find voice spectrographic identification techniques of questionable scientific validity and reliability. Drones, 218 F.3d at 503. The studies both parties presented document significant long-standing, unresolved doubts about the reliability of voice spectrographic analysis.

A 1970 survey by Richard Bolt concluded that there is no scientific foundation to support the assumption, critical to the voice spectrographic analysis method, that intraspeaker variability can reliably be distinguished from interspeaker variability. (Government Exhibit D-1; Docket Entry No. 166, p. 21, l.24-p. 22, l.7). Bolt and his coauthors reviewed several different studies of voice spectrographic analysis, concluding that “differences in [spectrographic] pattern, when the words are the same, may reflect differences of speaker or only normal variations in the utterances of a single speaker.” (Government’s Exhibit D-1, p. 602). The Bolt survey also concluded that voice identification cannot approach the accuracy of fingerprint identification, because fingerprints are unchanged throughout an individual’s lifetime, while spectrographs can vary each time they are produced due to distortions of frequency, energy, and time occurring during the transmission, recording, and analysis. (Id. at p. 600). In 1975, James Atkinson, a researcher at the Naval Underwater Systems Center, concluded that intraspeaker variability of the fundamental voice frequency was as great as interspeaker variability among members of the same sex. (Government’s Exhibit D-6). A monograph produced in 1979 by the National Academy of Sciences entitled “On the Theory and Practice of Voice Identification” agreed that voice spectrographs are not analogous to fingerprints in determining identity and that “the statistical relations between the intraspeaker variability and interspeaker variability have not been established.” (Government’s Exhibit D-7, p. 59).

No study since the 1970 Bolt report has laid a scientific foundation for the assumption that interspeaker variability is greater than intraspeaker variability. (Docket Entry No. 166, p. 22, l.8-l.12). The evidence necessary to validate the fundamental hypothesis of voice spectrographic analysis, that interspeaker variations are greater than intraspeaker variations, and can be reliably distinguished, is lacking.

Other factors also diminish the reliability of voice spectrograph analysis. Voice characteristics can and do change frequently, based on factors such as mood and health, as well as age. Voice characteristics can be disguised. A 1971 study by Endres, Bambach, and Flosser showed that the frequency position of formants and pitch shift to lower frequencies with age and that formants are difficult to detect when speakers are attempting to disguise their voices. (Government’s Exhibit D-2). Nakasone testified that when a speaker disguises his or her voice, the mouth muscles have a different tension that produces a different spectrogram than if the speaker spoke naturally. (Docket Entry No. 166, p. 26, l.10-l.18).

Research has also shown that the context in which words are spoken affects the reliability of the voice spectrographic method of speaker identification. A 1973 study by Barry Hazen compared spectrograms of a word spoken by an unknown speaker with sample spectrograms of the same word spoken in a different context by known speakers. Subjects were asked to determine which of the known speakers’ spectrograms matched the unknown speaker’s spectrogram. The analysis was then repeated, except the unknown speaker and known speaker uttered the same words in the same context. Hazen determined that the identification error rate was approximately 12 percent where the known and unknown speakers spoke a specific word in the same context, but between 52 percent and 57 percent where the known and unknown speakers spoke the subject word in a different context. (Government’s Exhibit D-5, p. 656 and Table III). Hazen concluded that “contextual speech in general, as opposed to isolated speech, confounds identification;” that “the forensic value of sound spectrograms is quite limited;” and that “spontaneous speech itself decreases reliability.” (Id. at pp. 657, 659).

Aspects of voice spectrographic analysis are subjective, which further diminishes the repeatability and reliability of the results. Cain acknowledged that there are elements of subjectivity in reading spectrographs and in choosing the phrases to compare between the subject recording and the exemplar recording. (Docket Entry No. 160, p. 158, l.15-l.22). The 1973 Bolt study found that different examiner panels disagreed over what constituted matching spectrograms, even when examining the same sets of spectrograms. (Government Exhibit D-4, p. 533).

The studies, by different researchers, performed over decades, show that the voice spectrographic technique has been tested and found wanting in aspects critical for admission under Rule 702. The studies emphasize the subjective nature of the voice spectrographic analysis, even when combined with an aural analysis component, which is subjective. Several variables, difficult to detect or control, affect the analysis. Although aspects of the voice spectrographic method have been subject to review in published studies, many of the studies conclude that voice spectrographic analysis is of questionable scientific validity as a method of identifying an unknown speaker.

2. Error Rates

The studies show that a number of factors affect the accuracy and reliability of voice spectrographic analysis. These factors include the recording conditions and recording quality; whether the recorded speech was spontaneous or read; and whether the subject is disguising his or her voice. In part because of this multitude of factors, error rates vary considerably among applications of the method and depend heavily on the protocol used by the analyst. The 1979 National Academy of Sciences report found that:

the degree of accuracy, and the corresponding error rates, of aural-visual voice identification var[ies] widely from case to case, depending upon several conditions including the properties of the voices involved, the conditions under which the voice samples were made, the characteristics of the equipment used, the skill of the examiner making the judgments, and the examiner’s knowledge about the case. Estimates of error rates now available pertain to only a few of the many combinations of conditions encountered in real-life situations.

(Government’s Exhibit D-7, p. 60). The report noted that error rates could accurately be determined only under controlled laboratory conditions. (Id. at p. 62). The Bolt survey noted the wide variability of error rates across the various studies performed before 1970. (Government’s Exhibit D-1, p. 603).

The 1973 study performed by Oscar Tosi at Michigan State University is often cited in support of the voice spectrographic method of analysis. Tosi selected 250 students as volunteers. Trained examiners were asked to identify an unknown speaker based only on a comparison of spectrograms of these speakers’ voices with spectrograms of known speakers’ voices. Tosi observed error rates of 6 percent false identifications and 13 percent false eliminations. (Government Exhibit D-3, p. 2041). Tosi stated that had the examiners omitted the answers that they deemed “uncertain” from the error rate calculation, the error rates would have dropped to 2 percent false identifications and 5 percent false eliminations. (Id.). Bolt criticized the Tosi study on several grounds in a 1973 publication. Bolt and his coauthors noted that Tosi had failed to account for the changes in speech that occur depending on the emotional state of the speaker. (Government Exhibit D-4, p. 532). Bolt noted that in Tosi’s study, error rates increased to 16 percent when the words were embedded in random sentence contexts. (Id.). Bolt cited work done by Hazen in 1972 that indicated “high identification errors for words from conversation.” (Id. at p. 533). Both Bolt and Nakasone noted that the Tosi study was done under laboratory conditions, and not real-world conditions, eliminating sources of error otherwise difficult to detect or eliminate. (Id.; Docket Entry No. 166, p. 24, l.11-l.25).

In 1986, the Federal Bureau of Investigation published a survey by Bruce Koenig of 2,000 cases applying voice spectrographic analysis to identify unknown speakers. The investigators rejected 65 percent of the recordings as too poor to permit voice spectrographic analysis. Voice spectrographic analysis was only applied to 35 percent of the recordings of unknown speakers. Koenig reported error rates of less than one percent for those cases. Since that study, other analysts have emphasized that it is misleading to cite the one percent error rate as a repeatable rate for voice spectrographic methods. Thomas Shipp, Thomas Doherty, and Harry Hollien criticized the Koenig study for not providing a complete description of the Federal Bureau of Investigation’s methods for applying voice spectrographic analysis. (Government’s Exhibit D-9). Nakasone testified that the process by which the Federal Bureau of Investigation determined the tapes that were of sufficiently high quality to apply voice spectrographic analysis was very conservative. (Docket Entry No. 166, p. 20, l.14). Nakasone discovered after he came to the Federal Bureau of Investigation that the agency policy was to render no opinion if there was any evidence of voice disguise, distortions or reverberations, or other anomalies on the voice recording. (Id. at p. 20, l.14-l.21). The Federal Bureau of Investigation minimized error rates in part by applying these stringent criteria. The one percent error rate Koenig observed at the Federal Bureau of Investigation has not been reliably repeated.

3. Whether Voice Spectrographic Analysis Has Been Generally Accepted in the Scientific Community

The parties submitted research articles representing over thirty years of research into and commentary on voice spectrographic analysis. These articles show that neither voice spectrography nor aural spectrographic analysis has been generally accepted as a method of identifying unknown recorded speakers. “Confusion still exists to this very day about the nature and merit of the ‘voiceprint/gram’ technique.” (Government Exhibit D-16, p. 133). Parsons wrote in 1986 that “identification by visual inspection of spectrograms is regarded with suspicion by a large segment of the speech community.” (Government Exhibit D-14, p. 343). Hollien noted that proponents of voice spectrographic analysis added aural comparisons to their method, creating the aural spectrographic method, to increase accuracy. Hollien stated that it was unclear what aural comparisons added to the reliability of spectrographic comparisons. (Government Exhibit D-16, p. 129). Hollien recognized that adding the admittedly subjective aural-perceptual approach to voice spectrographic analysis did not improve the results unless the examiner used a highly structured protocol and rigorously carried it out. (Id.). The use of voice spectrographic analysis as an identification tool has been limited by researchers’ persistent inability to formulate an accurate, reliable methodology. A 1993 article by Koenig showed that the number of practitioners of voice spectrographic analysis has dropped significantly since the 1970s, when Nakasone began his work in the field. (Government’s Exhibit D-11, p. 80). Koenig particularly noted that the Federal Bureau of Investigation adopted a policy to use voice spectrographic analysis only for investigative purposes and would not provide court testimony on spectrographic comparisons “due to the inconclusive nature of the examination and the unknown error rate under specific investigative conditions.” (Id.). That same article concluded that “the use of expert witnesses does not improve the accuracy of aural-only voice comparisons.” (Id.).

Nakasone testified in the May 9-10, 2003 hearing. Earlier in his career, he was a proponent of the voice spectrographic method. As late as 1988, Nakasone defended voice spectrographic analysis as a basis for courtroom identification. (Defendant’s Exhibit No. 1; Government’s Exhibit D-9). Nakasone testified in a 1989 federal case, U.S. v. Smith, 869 F.2d 348, 353-54 (7th Cir.1989), that the voice spectrographic technique was reliable and had a low error rate. In Smith, Nakasone cited the 1970 Tosi study as showing that the error rate for false identifications was 2.4 percent and for false eliminations was 6 percent, and the 1986 Federal Bureau of Investigation report as showing a 0.31 percent false identification rate and a 0.53 percent false elimination rate. Id.

Nakasone testified that his initial belief that the voice spectrographic technique is sufficiently reliable for courtroom purposes has eroded over time, as research efforts have failed to support the underlying premises of the voice identification techniques or to produce reliable testing for error rates. Nakasone testified credibly that this failure is the basis of the Federal Bureau of Investigation’s approach to voice spectrographic analysis. The Federal Bureau of Investigation practice is to render no opinion based on the voice spectrographic method when the recordings at issue bear any evidence of voice disguise, distortion, background reverberations, or unclear speaking. Even with the lower error rates that result from such an approach, error rates and reliability still vary, in part based on the examiner’s threshold determination of the tape quality, which varied among different examiners. (Docket Entry No. 166, p. 20, l.7-l.21). The Federal Bureau of Investigation does not permit the use of voice spectrographic analysis for courtroom identification, but only for investigation. (Id.).

Defendant has presented evidence showing, at best, that the validity of aural spectrographic voice identification techniques is disputed by practitioners. (Defendant’s Exhibit 1). In a 1988 article on which defendant relies, the authors, including Nakasone, relied in part on the fact that 70 scientists approved of the technique if practiced by trained examiners following standards promulgated by the International Association for Identification (“IAI”). Nakasone testified that the IAI has since dismantled its voice identification practitioner certification board. Nakasone testified that he himself, once a proponent of voice spectrographic analysis, has now concluded that the technique is unreliable for several reasons, including the inconsistency in error rates across different studies.

The record before this court shows that the remaining proponents of the use of aural spectrographic voice identification for courtroom testimony are a handful of consultants who apply the techniques for the purpose of litigation. The proponents, including Cain, are not performing scientific research in aural spectrographic voice identification and testifying as experts as an aspect of their research work. Hollien criticized the voice spectrographic investigations of Smrkovski as “not research but rather the kind of inquiry that could be carried out by any curious layman given access to certain kinds of equipment.” (Government Exhibit D-16, p. 126). These consultants are certified to perform aural spectrographic analysis largely by organizations they created, such as the ABRE. (Docket Entry No. 166, p. 41, l.7-l.14; Docket Entry No. 166, p. 42, l.13-p. 43, l.8). The IAI, an eighty-five-year-old organization in the field of forensic identification, has ceased certifying aural spectrographic examiners. (Docket Entry No. 166, p. 41, l.22-p. 42, l.5). The court’s responsibility as “gatekeeper” under Daubert “is to make certain that an expert, whether basing testimony upon professional studies or personal experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire, 526 U.S. at 152, 119 S.Ct. at 1176. Cain is employing techniques that are recognized as deficient by persons who practice as experts in the field outside of the courtroom.

General problems with the reliability of aural spectrographic voice identification raise considerable doubt as to whether these techniques meet the Daubert standard. The potential rate of error of the aural spectrographic method is unknown and may vary considerably, depending on the conditions of the particular application. The method has not been generally accepted by the scientific community. Peer review is increasingly difficult in a field in which there is a dwindling number of practitioners and “sparsely attended” professional meetings. (Government Exhibit 11 at p. 80). The Rule 702 indicia of reliability (whether the theory or technique has been tested; whether the theory or technique has been subjected to peer review and publication; the known or potential rate of error of the method used and the existence and maintenance of standards controlling the technique’s operation; and whether the theory or method has been generally accepted by the scientific community) are not satisfied.

D. Specific Problems of Cain’s Methodology

In addition to the problems with aural spectrographic voice identification generally, the record reveals several flaws in Cain’s application of the method in analyzing tape Q-1. The flaws provide additional grounds for excluding his testimony under Rule 702.

1. The Standards for Application of the Voice Spectrographic Identification Technique

The IAI and the ABRE provide some standards for examiners to follow in conducting voice spectrographic analysis. Those standards require an examiner to provide a list of words on the unknown recording and the exemplar recording that were usable for comparison purposes, whether those words are similar or dissimilar. (Docket Entry No. 167, p. 125, l.20-p. 126, l.1). The ABRE Voice Comparison Standards state that “whenever possible, an impartial individual knowledgeable of the known speaker’s voice should be present to minimize attempts at disguise, changes in speech rate, adding or deleting accents, and other alterations.” (Defendant’s Exhibit 18, ¶ 3.1). Nakasone testified that the IAI protocols written in 1992 required the person taking an exemplar recording of a person’s voice to be familiar with that person’s voice. (Docket Entry No. 167, p. 132, l.4-l.9). Cain testified that having a person familiar with a subject’s voice take the exemplar is not required, but is helpful to determine whether the subject is speaking naturally or is disguising his or her voice. (Docket Entry No. 160, p. 178, l.8-l.10).

The ABRE and IAI standards require the exemplar to be taken under conditions similar to the original recording. The same type of microphone system should be used to make the subject and exemplar voice recordings whenever possible, and if the recording was made over a telephone, the exemplar should be recorded using telephone instruments and transmission lines. (Defendant’s Exhibit 18, ¶¶ 3.2.1, 3.2.3).

The ABRE standards also state that “ideally, the exemplar should be spoken in a manner that replicates the unknown speaker, to include speech rate, accent (whether real or feigned), hoarseness, or any abnormal vocal effect.” (Id. at ¶ 3.3.1). This requirement is directed to the concern identified by the Hazen study that spontaneous speech “confounds identification.” (Government Exhibit D-5, p. 657).

2. Cain’s Application of the Voice Spectrographic Identification Method

Cain used an enhanced version of the Q-1 recording provided by the government to conduct his most recent analysis of the voices on the tape. To conduct a voice spectrographic comparison of the voices on Q-1 with a recording of a known voice, he had to make an exemplar recording of a known voice. The exemplar was recorded by a Harris County Police detective based on instructions from Cain. Cain did not instruct the Harris County District Attorney’s office, which took the exemplar of Robert Angleton, to have a third person familiar with Angleton’s voice present when the exemplar was recorded. (Docket Entry No. 160, p. 179, l.1-l.4). Nakasone testified that to be “familiar” with an individual’s voice, a person must know the individual for a long time. (Docket Entry No. 167, p. 72, l.6-l.9). Defendant has presented no evidence that the person who made the exemplar recording of Robert Angleton’s voice was familiar with Angleton’s voice to the extent necessary to determine if Angleton was disguising his voice.

Nor did Cain take steps to ensure that the environment in which the Q-1 tape was recorded and the environment for recording the exemplar of Angleton’s voice were similar. On Q-1, the unknown speaker’s distance from the recording microphone varied over the course of the tape, while on the exemplar Angleton was close to the microphone. (Docket Entry No. 160, p. 163, l.1-l.12). A background reverberation was present in the Q-1 recording, but not in the exemplar, making Q-1 of “marginal quality.” (Docket Entry No. 167, p. 69, l.25-p. 70, l.2). Cain instructed the person taking the Angleton exemplar to recite specific phrases and have Angleton repeat them in a natural, conversational voice. (Docket Entry No. 160, p. 138, l.16-l.19). Nakasone, who had previously listened to the exemplar, testified that he believed the exemplar sounded as if it was read speech, not spontaneous speech, and lacked a “natural conversational tone.” (Docket Entry No. 167, l.12-l.16). Cain admitted that “there were times [in the exemplar] where [Robert Angleton] is a bit monotone and he lacks the inflection or intonation that he would likely say.” (Docket Entry No. 160, p. 136, l.6-l.10). Lonnie Smrkovski, who provided a second opinion of Cain’s findings, stated that he “found the majority of the questioned phrases lack sufficient information for comparison purposes due to insufficient quality.” (Defendant’s Exhibit 20). An examiner’s report should also contain a statement of accuracy, but Cain did not include a statement of accuracy in his report. (Docket Entry No. 167, p. 126, l.7-l.9, p. 179, l.10-l.14; Defendant’s Exhibit 18, ¶ 9.1). There was no evidence that Cain used similar recording equipment to make the exemplar as was used to make the Q-1 tape.

The reliability of Cain’s application of the aural spectrographic method is further undermined by the changes in his conclusions over the course of the case, and by the differing second opinion from another voice spectrograph analyst, Lonnie Smrkovski. Cain provided his first voice spectrographic analysis for the Harris County District Attorney’s Office on July 1, 1998. Cain gave an opinion of “possible elimination,” stating that 80 percent of the comparable words on the Q-1 recording and exemplar were different, with no fewer than ten words not matching. (Defendant’s Exhibit 14). On April 7, 2003, Cain stated that after using enhanced filtering techniques on the Q-1 recording, a preliminary examination caused him to change his opinion to “probable elimination,” stating that 80 percent of the comparable words on the Q-1 recording and exemplar were different, with no fewer than fifteen words not matching. (Defendant’s Exhibit 19). Smrkovski provided a second opinion differing with Cain’s conclusion, stating that it was “possible” that the speakers on the Q-1 recording and the Angleton exemplar recording were different. (Defendant’s Exhibit 20). Smrkovski, however, qualified his conclusion by stating that he “found the majority of the questioned phrases lack sufficient information for comparison purposes due to insufficient quality.” (Id.). Cain acknowledged that it is not uncommon for voice spectrograph analysts reviewing the same recordings to have differing opinions as to the identity of the speakers on the recordings. (Docket Entry No. 160, p. 168, l.10-l.14).

In his report analyzing his results, Cain provided only a list of the dissimilar words identified in the comparison of the Q-1 recording and the exemplar recording. He did not provide a list of the similar words he found in the analysis. (Defendant’s Exhibit 212). Cain’s list of dissimilar words showed the dissimilar words “sure, haven’t” on the spectrogram he numbered 11 in his analysis. (Id.). At the hearing, however, Cain stated that spectrogram 12, not spectrogram 11, was the spectrogram containing “sure, haven’t.” (Docket Entry No. 160, p. 195, l.19-p. 196, l.15; Defendant’s Exhibit 17). Cain’s basic documentation error further undermines the reliability of his aural spectrographic analysis.

Nakasone testified that the omissions and differences between the conditions under which Q-1 and the exemplar were made increase the potential for error in Cain’s analysis. Nakasone’s testimony is supported by the National Academy of Sciences report, which states that the conditions under which the voice samples were made and the characteristics of the equipment used are important factors in assessing the reliability of voice spectrographic analysis. (Government Exhibit D-7, p. 60).

Cain’s testimony is unreliable under Rule 702. He is applying a technique that, in general, lacks the reliability necessary for admission under Rule 702. His application of the technique was flawed by deficiencies in making the exemplar recording and in the comparison analysis with Q-1. Cain’s testimony does not meet the standards necessary for admission. It is properly excluded as unhelpful and confusing to the jury. Fed.R.Evid. 403, 702; U.S. v. Newman, 849 F.2d 156, 164-65 (5th Cir.1988); U.S. v. Schmidt, 711 F.2d 595, 599 (5th Cir.1983).

III. Conclusion

The testimony and evidence show that voice identification techniques using the aural spectrographic method are not widely accepted by the scientific community. The evidence and testimony show that there is great dispute among researchers and the few practitioners in the field over the accuracy and reliability of voice spectrographic analysis to determine the identity of recorded speakers. The evidence also shows that error rates for voice spectrographic techniques are unknown and vary widely depending on the conditions under which the analysis is made. The post-Daubert case law casts doubt on the reliability and admissibility of voice spectrograph analysis. The protocol Cain followed did not protect against several sources of error, further reducing the reliability of the voice spectrographic analysis conducted in this case. This court GRANTS the government’s motion to exclude the testimony of Stephen Cain.
      
      . In Kumho Tire Co. v. Carmichael, 526 U.S. 137, 119 S.Ct. 1167, 143 L.Ed.2d 238 (1999), the Supreme Court held that Rule 702 and the Daubert principles apply to all expert testimony, not merely scientific subject matters.
     
      
      . The categories of voice identification are as follows:
      (1) Identification-at least 90 percent of all the comparable words must be very similar aurally and spectrally, producing not less than 20 matching words, where each word has at least three usable formants;
      (2) Probable Identification-at least 80 percent of the comparable words must be very similar aurally and spectrally, producing not less than 15 matching words, where each word has at least two usable formants;
      (3) Possible Identification-at least 80 percent of the comparable words must be very similar aurally and spectrally, producing not less than 10 matching words, where each word has at least two usable formants;
      (4) Inconclusive-falls below either the Possible Identification or Possible Elimination confidence levels and/or the examiner does not believe that a meaningful decision is obtainable due to various limiting factors;
      (5) Possible Elimination-at least 80 percent of the comparable words must be very dissimilar aurally and spectrally, producing not less than 10 words that do not match, where each word has at least two usable formants;
      (6) Probable Elimination-at least 80 percent of the comparable words must be very dissimilar aurally and spectrally, producing not less than 15 words that do not match, where each word has at least two usable formants;
      (7) Elimination-at least 90 percent of all the comparable words must be very dissimilar aurally and spectrally, producing not less than 20 words that do not match, where each word has at least three usable formants.
      (Defendant’s Exhibit 18, ¶¶ 7.3.1-7.3.7). A formant is a band of acoustic energy produced by spoken vowels and resonant consonants, which appears on a spectrogram and can be visually compared by an examiner. (Defendant’s Exhibit 18, ¶ 7.1.5(a)).
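      For illustration only, the numeric thresholds in the seven categories above can be sketched as a small decision procedure. The sketch below is not drawn from the record or from the ABRE standards themselves. The function name `classify` and its parameters are hypothetical, and it assumes that the examiner has already counted the comparable words, the words judged very similar, the words judged very dissimilar, and the smallest number of usable formants in any counted word. It does not model the examiner's discretion, under category (4), to call a comparison inconclusive because of other limiting factors.

```python
def classify(comparable: int, similar: int, dissimilar: int, min_formants: int) -> str:
    """Map word counts to one of the seven categories listed above.

    All parameter names are illustrative assumptions, not terms from the standards.
    """
    if min(comparable, similar, dissimilar, min_formants) < 0:
        raise ValueError("counts must be non-negative")
    if similar + dissimilar > comparable:
        raise ValueError("similar + dissimilar cannot exceed comparable words")

    def level(count: int) -> int:
        # 3 = strongest tier (90 percent, 20 words, three formants),
        # 2 = "probable" tier (80 percent, 15 words, two formants),
        # 1 = "possible" tier (80 percent, 10 words, two formants),
        # 0 = thresholds not met.
        if comparable == 0:
            return 0
        if count * 100 >= 90 * comparable and count >= 20 and min_formants >= 3:
            return 3
        if count * 100 >= 80 * comparable and count >= 15 and min_formants >= 2:
            return 2
        if count * 100 >= 80 * comparable and count >= 10 and min_formants >= 2:
            return 1
        return 0

    ident, elim = level(similar), level(dissimilar)
    # Both shares cannot reach 80 percent at once, so at most one is non-zero.
    if ident:
        return ["Possible Identification", "Probable Identification", "Identification"][ident - 1]
    if elim:
        return ["Possible Elimination", "Probable Elimination", "Elimination"][elim - 1]
    return "Inconclusive"
```

      Under these assumptions, a comparison with 12 comparable words, 10 of them very dissimilar and each with at least two usable formants, falls in the Possible Elimination category, while 25 comparable words with 23 very dissimilar but only two usable formants per word reaches Probable Elimination rather than Elimination.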
     
      
      . The ABRE is a part of the American College of Forensic Examiners (“ACFE”). Cain and Nakasone testified that the ABRE was formed after a dispute arose among the members of the voice identification board of the International Association for Identification (“IAI”) over the standards for aural spectrographic analysis. (Docket Entry No. 160, p. 113, l.25-p. 114, l.4). Nakasone testified that the group that left the IAI to form the ABRE felt the IAI’s standards for voice identification were too stringent, because they required examiners to obtain a second opinion and to include a statement of accuracy in their reports. (Docket Entry No. 166, p. 42, l.13-p. 43, l.8). Nakasone testified that the IAI ceased certifying voice identification examiners in 1999 and ceased all voice identification activity in December 2002. (Id. at p. 41, l.22-p. 42, l.5).
     
      
      . In “closed” trials, the examiner knows that a spectrograph of the unknown speaker was among the collection of spectrographs of known speakers. In “open” trials, the examiner does not know whether a spectrograph of the unknown speaker was included in the list of the spectrographs of the known speakers.
     