
    THE STATE OF NEW JERSEY v. PAUL GORDON CARY, DEFENDANT.
    Superior Court of New Jersey Law Division—Criminal
    Decided February 9, 1968.
    
      
      Mr. Michael Diamond, Assistant Prosecutor, for the State (Mr. Leo Kaplowitz, Union County Prosecutor).
    
      Mr. Oscar F. Laurie, for defendant (Mr. Howard Schwartz on the brief).
   Barger, J. S. C.

This cause has been remanded for a pretrial hearing. The hearing is to determine whether the voiceprint technique of voice identification, and the equipment producing the print, have sufficient scientific acceptance that they produce uniform and reasonably reliable results and will contribute materially to the ascertainment of the truth, and are therefore admissible as evidence. State v. Cary, 49 N. J. 343, 352 (1967).

Spectrogram voice identification is a recently developed technique that allegedly can identify a person from graphic representations of his voice. If it is reliable as an identification tool, it will have enormous potential as a forensic aid.

The sound spectrograph with which we are here concerned was first developed at Bell Laboratories in this State about 1941. The instrument produces a permanent spectrogram, which is a graphic display of complex signals. The spectrograph is a basic research instrument used in many laboratories for research studies of sound, music and speech.

Voiceprint identification is the method by which a person can be identified from a spectrographic examination of his taped voice, the spectrograph being capable of reproducing graphic impressions from tapes of human utterances. Specifically, ten frequently used cue words are normally involved: the, to, and, me, on, is, you, I, it, a. When the source material from which the voice is to be identified is contextual, these specific cue words are excerpted and compared with previously recorded voiceprints of, or containing, the same cue words.

The basic principle underlying the use of the voiceprint method is that whenever a sound is uttered, an energy output is required to transform it into an intelligible word. This energy output is electronically recorded on a sound spectrograph from the tape recording in the one-tenth of a second that it takes to utter the sound, and is thereafter transferred by the spectrograph into a “contour” or “bar” print. The print is a visual representation of the utterance. The voiceprint can then be used for comparison and identification purposes.

There are two basic types of voiceprints: (a) “bar” and (b) “contour.” Either type may result from a person uttering a cue word or other words as taped. The “bar” voiceprint shows the resonance bars of the person’s voice. The pattern of the bars determines what word is being said. In addition, the voiceprint has three dimensions. Time is plotted from left to right, i. e., the beginning of the word is at the left and the end is at the right. Frequency is plotted along the vertical axis, with lower-pitched sound at the bottom and higher-pitched sound toward the top. Loudness is ascertained by examining the blackness of the printing: the darker lines of the bar represent greater intensity of sound at each frequency for a particular time.

The “contour” voiceprint is identical with the “bar” print with regard to time and frequency measurements. The level of loudness, however, is represented somewhat differently from the “bar” print. The various contours or “peaks” indicate the changes in intensity of sound at each frequency for a particular time. An expert in the field of voiceprinting has suggested that patterns are easier to detect in the “bar” voiceprint, but that the “contour” voiceprint is more easily analyzed and more easily reproduced in print.

The inquiry in this area is whether identification by a voiceprint has the claimed validity and, if so, to what degree. In other words, given several voiceprints, two of which were made by the same person, can a trained individual, by reading, comparing and analyzing the spectrograms, determine with a high degree of certainty which of those voiceprints are of the same person? Given a known print to be compared with an unknown print, can it be reliably determined whether or not it is the voice of the same person? Given a pre-identified print that is to be compared with an unidentified print, can it be reliably determined whether or not the two prints are voiceprints of the same person’s voice? Can the utterances of two or more persons produce the same print? Is the technique generally recognized by the scientific community involved?

It is contended that the voiceprint technique (a spectrograph producing a spectrogram) is not affected by either the physiological or the emotional condition of the speaker. The emotional element is often present when the polygraph is employed and is one of the major complaints about that technique. It is argued, therefore, that the voiceprint technique is based primarily upon fixed and constant physiological mechanisms such as the vocal cavities and articulators. The major cavities affecting speech are the throat, the nasal cavities, and the two oral cavities formed by positioning the tongue. The articulators include the lips, teeth, tongue, soft palate and jaw muscles. It is contended that one begins to form a speech pattern and its uniqueness in infancy. With the polygraph, the subject’s body may undergo certain changes in responding to certain questions, and those changes are capable of affecting the graphic recording and its interpretation. The voiceprint technique, it is claimed, is not so affected, and the speaker has no ability to limit the efficiency of the process. The particular mechanical process that generates the impulses is not capable of change even though the person’s emotional mood may change. In other words, unlike polygraph results, the spectrograph voiceprint, it is contended, will be accurate regardless of any act or emotion on the part of the speaker.

It is said that the method is efficient and reliable in producing the spectrogram because it does not check the sound or pitch of the voice, over which the person speaking or sought to be identified may have some control, but instead merely records the impulses created by the speaker’s vocal cavities and articulators described above. These impulses retain their characteristics even if the voice itself is impaired, e.g., by laryngitis or a head cold. Another apparent advantage of the voiceprint technique is that a person with reasonable technical skill can be trained in a short time to compare and interpret the spectrogram, as with the comparison and interpretation of fingerprints.

It is contended that even a voice mimic or impersonator will not be able to prevent proper identification when the spectrograph is employed; that basic identifiable features in a person's voiceprint will not be altered by disguising the voice either by whispering, holding the nose, or muffling the voice. It is further contended that "each voice is uniquely different enough to make it identifiable with the same accuracy that fingerprint identification enjoys."

Lawrence G. Kersta testified for the State. He is an electrical engineer and physicist and was employed for many years by the Bell Telephone Laboratories, retiring in 1966. He established the Voiceprint Laboratories at Somerville, N. J., and is referred to as the innovator of the technique. Initially he was engaged in research concerned with a faster means of transmitting communication circuit information, and in research relating to the coding of speech for various types of speech coding systems in the field of spectrography. During the course of this research he observed that spectrograms of the same person’s voice showed a similarity when compared. About 1960 he commenced various experiments and tests in this field of voice identification, initially using colleagues known to him in the laboratories. Later, about 16,000 spectrograms of the speech of about 123 subjects were made as a controlled speech population. The spectrograms were made from magnetic tape recordings and were screened for observable speech characteristics so that they could not be easily identified without the use of the spectrogram. The ten cue words mentioned above were used, being the words most frequently used in English conversation and on the telephone. Thereafter, certain high school girls were used as panelists after about 40 hours of comparison and identification training. The training consisted principally of learning to recognize speech characteristics and similarities from spectrograms. In all, about 12 panelists were used, and they were able to identify better than 97% of the speakers.

Kersta has written many papers and articles on the technique. As a result of his research and experience in the field, he is definitely of the opinion that each person’s voice has unique characteristics, that no two people have voice characteristics so similar that they would produce similar voiceprints if properly compared and analyzed, and that the technique has the accuracy of fingerprint identification. However, he qualifies his opinion by acknowledging that the technique is new and not of the same maturity as fingerprint identification.

Dr. Oscar I. Tosi, a professor teaching experimental phonetics, analysis of sound, mathematics for speech science and related subjects at Michigan State University, was a witness for the State. Dr. Tosi became interested in this type of voice identification about January 1967, when the Michigan State Police Department asked him for an opinion as to its reliability. With two members of the Michigan State Police Department he went to the Voiceprint Laboratories at Somerville, N. J., and spent a period of time there observing the technique and, with the officers, comparing and identifying voiceprints furnished by the laboratory. The technique impressed him, and as a result he has recommended to the Department of Justice of Michigan a two-year project of further experimentation and testing to determine its reliability. He was of the opinion that the technique has considerable potential as an aid to law enforcement, but before he would give a firm scientific opinion he felt that further experimentation and testing were required because of its infancy in the related scientific fields.

Dr. Louis J. Gerstman, an associate professor of psychology and speech at Queens College, City University of New York, was a witness for defendant. He also was previously employed by the Bell Telephone Laboratories. He was of the opinion that identification by the voiceprint technique does not at present have a sufficiently proven scientific basis to support an opinion that it produces uniform and reasonably reliable results.

Dr. Peter N. Ladefoged, a professor of phonetics at the University of California, Los Angeles, was a witness for defendant. On the basis of his experience and some limited testing of the technique, he was of the opinion that he could not say that the technique was scientifically reliable, and he testified that it had not been generally accepted scientifically. In his research he had difficulty determining which aspects of the spectrogram uniquely conveyed the knowledge that a certain person was speaking and which aspects conveyed the vowel quality, and he could not separate the two. Because identifying the speaker could not be disentangled from identifying the vowel, reliable identification was not possible. In his opinion, the technique of identification by voiceprint has not been adequately proven by either experimentation or testing so as to be scientifically accepted.

Part of the evidence was a test performed at the request of the assistant prosecutor handling this matter, by arrangement with an officer of a banking institution. Five employees of the bank, whose identities were unknown to the assistant prosecutor and to the laboratory, called the laboratory by telephone, and their voices were tape recorded. Each speaker repeated the following words: “I am participating in a voiceprint experiment that has been requested by the Prosecutor of Union County. My wife’s name is ...................” The same persons then repeated these words again in a different order. The tapes were run through the spectrograph, and the voiceprints were compared and analyzed by a trained employee of Voiceprint Laboratories. The identification was accurate as to the order in which the speakers spoke. This matching experiment indicates to a limited degree the identification potential of the technique, and it is through experiments of this type, and enlargements of them, that general scientific acceptance may come about.

Defendant offered in evidence some 39 letters from various persons engaged in varying degrees in the science of speech sounds, their production and transcription. These letters are answers to an inquiry sent by defendant’s witness, Dr. Ladefoged, seeking opinions as to the scientific acceptance and reliability of the voiceprint technique under consideration. The letters are not admissible to prove the opinions of their writers, as they are hearsay, Rules of Evidence, Rule 63, and do not fall within any of the exceptions to the rule. The court concludes, however, that the letters have a limited value as evidence. They are relevant in indicating that a controversy exists in the related scientific fields as to whether the reliability of the technique is scientifically accepted. The letters are therefore received in evidence for the limited purpose of showing that considerable scientific controversy does in fact exist. The other evidence referred to also indicates that at this time there is a lack of general scientific acceptance as to the accuracy of the voiceprint technique. Rules of Evidence, Rule 6.

As indicated in Cary, supra, 49 N. J. 343 and State v. Walker, 37 N. J. 208, 215 (1962), when scientific aids to the discovery of the truth receive general scientific recognition as to their accuracy, courts do not hesitate to take judicial notice of this fact and to admit evidence obtained through their use. Several such known and accepted aids are specifically referred to in Walker, at page 215. As pointed out there, the polygraph, or as it is commonly called, the “lie detector,” has not yet attained scientific acceptance as a reliable means of ascertaining truth. It has never been established by adequate evidence that lying produces a reaction which can be accurately measured. The techniques that have been developed appear to enhance the possibility of error from interpretation, emotional factors also being involved.

The legal criterion of “general scientific acceptance as a reliable means of ascertaining the truth,” which must be met before judicial notice can be taken of the technique or aid involved and its results admitted as evidence, accords with the standard set by almost all of the courts in this country that have passed on the issue, and most of them have. It is not for the law to experiment but for science to do so. It is generally known that the polygraph is widely used in police investigations, and that may eventually be the limit of its value and use as a scientific aid in law enforcement. It has not attained the required scientific status even though it has been in use since about 1923. State v. Arnwine, 67 N. J. Super. 483, 493 (App. Div. 1961). All scientific aids and devices go through an experimental and testing stage, and during these stages there may be considerable scientific controversy. If some less exacting rule were substituted for the time-honored rule of general scientific acceptance during such a period of controversy, the danger is that a trial may actually become a trial of the technique rather than of the issues involved in the case. Even after general acceptance, some lesser degree of doubt may always remain, which time may or may not clarify. United States v. Wright, hereinafter cited, is an example.

The court is satisfied that the spectrograph as a machine for producing the spectrogram is an efficient and accurate piece of equipment which produces an accurate spectrogram from the tape recording. Whether the spectrogram produced thereby contains all of the detail required in order to provide a reasonably certain individualistic identification appears also to be scientifically questioned at this time. How long it may take to reach an ultimate conclusion scientifically as to the dependability of the technique one cannot say. In our society today scientific research is of such breadth and depth and moving at such a fast pace, that it would be unwise to predict. It is evident that this type of identification technique has been brought to the attention of scientists and the public, particularly law enforcement authorities, so that one can conclude that in the immediate future greater experimentation and testing will occur. One such contemplated experiment is referred to herein. This technique has not, however, as of this date attained such degree of scientific acceptance and reliability as to be acceptable as evidence. The need for a high degree of scientific acceptance, and particularly reliability, is vital when a criminal case is involved where the individual’s freedom or, in fact, his life may be at stake.

It is noted that this type of evidence has recently been offered and received in trials in several states. To date its admissibility has not undergone appellate review except in one case, United States v. Wright, U. S. C. M. A. (1967), affirming its admissibility with a dissent citing Cary, supra. Also, there is pending in the State of California an appeal in State v. King, an arson conviction resulting from the Watts civil disturbances in 1965. This court is informed that the appeal involves a constitutional question and that the appellate tribunal, in resolving that paramount issue, may never reach the question of whether the voiceprint identification evidence was properly admitted at trial. It is also noted that in many of the cases in which such evidence was received, its general scientific acceptance was not really raised as an issue, since the defendant offered no opposing evidence.

As indicated, before a court can take judicial notice of a scientific process there must be general scientific acceptance of its reliability in the field concerned. We have here only the testimony of the witness Kersta as to its scientific acceptance and accuracy, which is the situation referred to in Cary, supra. It is simply the opinion of one expert on a scientific matter, without the required proof of general scientific acceptance. Even after general scientific acceptance, experts may in a given case differ in their comparison and identification of the spectrogram. This is not generally true of fingerprinting, but it is true of handwriting and ballistics. After scientific acceptance, any controversy usually relates to the accuracy of the particular comparison and identification, not to the scientific reliability and acceptance of the technique or aid involved. Just when a scientific principle or discovery passes from the experimental to the demonstrable stage is hard to define. Somewhere in this twilight zone the principle must cross a threshold before it can be acceptable to the courts, and it can be said that it must be sufficiently established to have gained general acceptance in the particular field in which it belongs. The sole evidence in this case which meets this test is the opinion of the person who is apparently the innovator of the technique and who claims that it is virtually infallible in producing voice identification. As indicated, this is the very situation mentioned in Cary, supra. All of the other evidence indicates, not that the technique is inaccurate or unreliable, but rather that it is simply too early to tell and that at this time it lacks the required scientific acceptance.

For the reasons indicated, this court finds that even though a tape recording of this defendant’s voice was made of the words desired and authorized under the order of this court, constitutionally affirmed in Cary, supra, and even though it was compared with the tape recording of the voice message recorded at police headquarters, any identification opinion resulting therefrom would not, as of this time, be admissible as evidence in this case.
      
A. J. Presti (Bell Telephone Laboratories, Inc., Murray Hill, N. J.), “High-Speed Sound Spectrograph,” The Journal of the Acoustical Society of America, vol. 40, No. 3, pp. 628-634 (September 1966).

“Spectrogram Voice Identification,” 19 Am. Jur., Proof of Facts, p. 423 (1967).

Kersta, “Speaker Recognition and Identification by Voiceprints,” 40 Conn. B. J. 586 (1966).