
    UNITED STATES of America, v. Charles O. SHONUBI, Defendant.
    No. CR 92-0007.
    United States District Court, Eastern District of New York.
    Aug. 4, 1995.
    
      Zachary W. Carter, United States Attorney, Eastern District of New York, Brooklyn, NY (Karen A. Popp, Assistant U.S. Attorney, of counsel), for U.S.
    David G. Secular, New York City, for defendant.
   AMENDED MEMORANDUM AND JUDGMENT

WEINSTEIN, Senior District Judge.

CONTENTS

I. Introduction

II. Facts

III. Procedural History

A. Aborted plea bargain

B. Trial

C. Sentencing

D. Appeal

E. Proceedings on remand utilizing Rule 706 of the Federal Rules of Evidence

F. Summary of arguments on remand

G. Statement of the issue before the court

IV. Sentencing Law

A. Burden of proof

1. Generally

2. Burden of proof at sentencing

B. Operation of the Guidelines

1. Drug cases

2. Non-drug cases

C. Caselaw on estimation and specific evidence

1. Estimation

2. Estimates based on extrapolation

3. Specific evidence

V. Available Sources of Information

A. Background knowledge

B. Knowledge of the drug trade

C. Demeanor

1. How judges use demeanor

2. Demeanor and appellate review

D. Assumptions about criminal behavior

VI. How Decision-Makers Learn and Decide

A. Inferences based on prior information and training

B. Methods of reaching conclusions

1. Classical step-by-step analysis

2. Bayesian and statistical analysis

3. Biases

4. Storytelling

VII. Application of Law to Facts on Original Sentence

A. Evidence from trial

B. Demeanor and character

C. Knowledge of the drug trade: the trip effect

D. Storytelling analysis

VIII. Desirability of Further Analysis

IX. The General Federal Rule Favoring Admissibility Of and Reliance on All Helpful Evidence

A. Mechanistic rules versus flexible general principles

B. Development of twentieth century conceptions of evidence codes

C. Model Code of Evidence

D. Federal Rules of Evidence

E. Recent developments

X. Additional Material Available to Sentencing Judge On Remand

A. Experts’ reports

1. Government expert

2. Defense expert

3. Rule 706 Panel

a. Use of fictions

b. Problems with government’s assumptions

c. Comments on defendant’s report

d. Non-statistical analysis

e. Statistical analysis

f. Simulations accounting for trip effect

g. Conclusion

B. Survey of the Eastern District bench

C. Testimony on economics of heroin smuggling

XI. Law Applicable to Statistical and Other Information Supplied After Remand

A. Admissibility of probabilistic evidence

B. Use of bare statistics

1. Generally

2. Criminal cases

XII. Application of Law to Facts After Remand

A. Conclusions about experts’ reports

B. Random versus non-random sampling

C. Use of statistics to illustrate non-statistical decision-making

D. Conclusions on proper role of statistics in this case

E. Cross-checking

F. Conclusion in light of statistics and other information provided on remand

XIII. Obstruction of Justice Enhancement

A. Purposes of sentencing and of § 3C1.1

B. Double-counting

C. Discretion to enhance under Dunnigan

D. Unconstitutionality of automatic enhancement

E. Particularized finding of perjury

F. Conclusion

XIV. Additional Sentencing Considerations

A. “Penalty” for going to trial

B. Sentencing within prescribed Guidelines range

C. Added time in prison required by Guidelines system

XV. Conclusion

I. Introduction

The defendant was caught entering the country with 427.4 grams of heroin in his digestive tract. It is believed that he made seven prior drug smuggling trips, but it is not known how much he carried on those trips. The court is required, under the Sentencing Guidelines, to estimate the total quantity of heroin imported.

At the original sentence, the court multiplied 427.4 by eight, arriving at a total of 3419.2 grams. The sentence was 151 months in prison, the low end of the Guidelines range for importation of 3,000 grams of heroin or more. See United States v. Shonubi, 802 F.Supp. 859 (E.D.N.Y.1992) [Shonubi I], conviction affirmed, sentence reversed, 998 F.2d 84 (2d Cir.1993) [Shonubi II].

The court of appeals rejected this solution and remanded. On reconsideration, the trial court now concludes that the defendant should be sentenced for importing between 1,000 and 3,000 grams of heroin. For reasons indicated in Parts XIII, XIV, and XV, infra, the term of imprisonment is unchanged.

This memorandum is largely devoted to explaining how a sentencing judge — and a trier of fact generally — reaches a decision. The case presents an opportunity to observe, explain, and discuss forensic decision-making. The absence of exclusionary rules of evidence at sentencing, the availability of statistics against which to “check” non-statistical proof, and the assistance of skilled experts have permitted the court to examine the decision-making process more fully than circumstances usually permit.

It should come as no surprise that in addition to rational analysis, the forensic factfinder depends upon assumptions and methods of thinking that may introduce biases and errors. Articulating the assumptions used in this case may provide valuable perspectives on decision-making practice and theory.

Because the sentencing judge found the problems raised by this case difficult, he issued a draft memorandum before resentencing and circulated it for comment. The parties and a number of scholars responded. These responses are on file with the court. The sentencing judge also posed questions to his colleagues on the court; their responses are described below. The final memorandum benefits from all of this assistance. While it generally follows the earlier draft, it includes a number of revisions.

The memorandum is written with great diffidence since those of us involved in the daily quest for the truth in the courtroom are out of close touch with those academics endeavoring to understand and explain decisionmaking. See, e.g., Ronald J. Allen, David J. Balding, Peter Donnelly, Richard Friedman, David H. Kaye, Lewis Henry LaRue, Roger C. Park, Bernard Robertson & Alexander Stein, Probability and Proof in State v. Skipper: An Internet Exchange, 35 Jurimetrics Journal 277 (1995) [hereinafter, Ronald J. Allen et al., An Internet Exchange ]. The separation between courts on the one hand and scientists, including statisticians and students of the human mind and emotions, on the other needs to be reduced by such projects as those of the Carnegie Foundation and the Federal Judicial Center described below. How to make available to triers, in a way they can understand, the work of scientists remains a central task of science and the law.

II. Facts

Charles O. Shonubi, a 34-year-old Nigerian citizen, lived in New Jersey while studying architecture and working as a toll collector at the George Washington Bridge. His salary was $12,000 a year. On December 10, 1991, Shonubi flew from Lagos, Nigeria, to Amsterdam, and then on to New York.

A customs service officer at John F. Kennedy Airport noticed Shonubi near a baggage carousel; he was “turning rapidly” and “scanning the customs area” rather than looking for his luggage. The officer examined Shonubi’s passport and then questioned him about the frequent trips to Nigeria indicated on the passport. The defendant’s answers were contradictory and confusing. For example, Shonubi said the purpose of his last trip was to get married. Minutes later, he said the object was to visit his ailing mother. Reminded of his previous statement, he told the officer that he had forgotten that he had just said that he had gone to Nigeria to get married. Shonubi was “visibly nervous, avoided eye contact, and was wringing his hands”; he was “trembling and sweating.” (The interesting question of what “avoiding eye contact” means in different cultures is not at this stage of the case a matter of concern.)

In response to the agent’s request, Shonubi consented to an X-ray. Taken to a “search room,” he was patted down, read his Miranda rights, and handcuffed. A search of his person turned up a slip of paper bearing the name of a Nigerian customs official. An X-ray revealed a number of foreign bodies in his intestine.

Shonubi was escorted to a trailer designed for the observation of passengers suspected of carrying drugs internally. The trailer has two levels. Material expelled into a toilet on the upper level drops into a holding tank, where agents on the lower level can observe it. Shonubi passed a total of 103 balloons over two days.

A forensic chemist found that four of the balloons, selected at random, contained a heroin mixture. The mixture represented 60.49 percent of the gross weight of the four balloons. Cf. United States v. Esieke, 940 F.2d 29, 32 (2d Cir.1991) (weight of heroin mixture “is usually several hundred grams lower than the gross weight [of heroin plus balloons]”).

Multiplying the average weight of the heroin mixture in the four tested balloons by 103, the chemist arrived at an aggregate weight of 427.4 grams. Chemical testing established that the purity of the heroin mixture in the four balloons was 53 percent.
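The chemist’s extrapolation from four tested balloons to all 103 can be illustrated with a short Python sketch. It is illustrative only and rests on stated assumptions: the opinion does not report the individual balloon weights, so the per-balloon mean below is back-calculated from the reported 427.4 gram total, and the function and variable names are hypothetical.

```python
# Illustrative sketch of sample-based extrapolation (names are hypothetical).
TOTAL_BALLOONS = 103           # balloons passed, per the opinion
SAMPLED_BALLOONS = 4           # balloons tested, per the opinion
REPORTED_TOTAL_GRAMS = 427.4   # chemist's aggregate figure, per the opinion


def extrapolate_total(sample_weights, population_size):
    """Scale the mean weight of the tested items up to the whole population."""
    mean_weight = sum(sample_weights) / len(sample_weights)
    return mean_weight * population_size


# ASSUMPTION: the four measured weights are not in the record reproduced here,
# so the mean is inferred by dividing the reported total by 103.
implied_mean = REPORTED_TOTAL_GRAMS / TOTAL_BALLOONS

# ASSUMPTION: a stand-in sample in which every tested balloon weighs the mean.
illustrative_sample = [implied_mean] * SAMPLED_BALLOONS

estimate = extrapolate_total(illustrative_sample, TOTAL_BALLOONS)
print(f"implied mean per balloon: {implied_mean:.2f} g")
print(f"extrapolated total: {estimate:.1f} g")
```

The sketch shows only the arithmetic. Whether four balloons are a sufficient sample depends on the assumption, discussed later in the opinion, that there is no ostensible reason for variation among the balloons.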

A customs service agent interviewed Shonubi while he was in the trailer. When she asked him where he had obtained the narcotics, Shonubi answered that he had bought them from “an ordinary man on the street.” She asked him what he was supposed to do with the drugs, and he answered that they were his, and that he “wasn’t giving them to anyone else.”

III. Procedural History

A. Aborted plea bargain

The defendant attempted to plea bargain. However, during a plea allocution before the chief magistrate judge, he alternately admitted and denied knowledge of the drugs found in his system. The allocution was aborted. Had Shonubi accepted the plea agreement, it is likely — given the disposition of similar cases in this district — that he would have been sentenced to a maximum of thirty-six months in prison. Instead, he elected to go to trial.

B. Trial

At trial, two customs agents described Shonubi’s behavior at Kennedy Airport. Another customs agent testified that 427.4 grams of heroin, at 53 percent purity, would have a wholesale value of $44,000 and, if cut following the normal procedures of heroin distributors in New York, would produce 20,000 “hits.” That the heroin had to be diluted was, according to the agent, certain: ingestion of 53 percent heroin can be fatal. The heroin, he concluded, could not have been for Shonubi’s personal use. Cf. United States v. Martinez, 54 F.3d 1040, 1043 (2d Cir.1995) (jury properly inferred, “viewing the evidence in its totality,” that cocaine found on defendant was not for defendant’s own use).

The date stamps on Shonubi’s passport conflicted with his accounts of when he had traveled, as well as with employment records at the George Washington Bridge. Those records showed eight absences from work. Entries on Shonubi’s passport strongly supported the inference that eight trips had, in fact, been made, but that Shonubi had made some legs of the trips using a second passport.

Warned by his attorney not to testify, Shonubi, after visibly remonstrating with his attorney, chose to take the stand. He proved an effective witness for the prosecution.

Confronted with his passport and work records proving that he had made at least eight trips to Nigeria in the previous fifteen months, Shonubi nonetheless denied making more than four trips to Nigeria. He also denied traveling abroad except to see his family. He was unable to explain how he could afford eight round-trip tickets, costing a minimum of $900 each, while earning $12,000 a year and paying college expenses. He also denied defecating in the medical van.

Shonubi was convicted by the jury of heroin importation, 21 U.S.C. §§ 952(a) and 960(b)(2)(A), and possession of a controlled substance with intent to distribute, 21 U.S.C. § 841(b)(1)(B).

C. Sentencing

At his sentencing hearing, Shonubi reiterated, under oath, that he had made only four trips to Nigeria and had not imported heroin on any of those trips. The sentencing judge found that Shonubi had lied throughout the trial and was lying at the sentencing proceeding. His fabrications included his statements about the number of trips he had made, about the purposes of those trips, about his employment history, about his use of multiple passports, and about events inside the customs service trailer.

Based on evidence at the trial and at sentencing, the judge found that the defendant had made a total of eight smuggling trips to Nigeria between September 1, 1990 and December 10, 1991. Multiplying the amount of heroin found on Shonubi by eight produced a total of 3,419.2 grams. Under the Sentencing Guidelines, possession of at least 3,000 but less than 10,000 grams of heroin is a “level 34” offense. See U.S.S.G. § 2D1.1 (1994) (drug quantity table).
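The arithmetic behind the original offense level can be sketched in Python. This is a hedged illustration, not the Guidelines’ full drug quantity table: only the level 34 bracket is stated in the paragraph above, the level 32 bracket is inferred from the opinion’s discussion of a 1,000 to 3,000 gram finding, and all names are hypothetical.

```python
from decimal import Decimal

GRAMS_ON_FINAL_TRIP = Decimal("427.4")  # quantity seized, per the opinion
TRIPS = 8                               # trips found by the sentencing judge

# Partial table only. Level 34 is stated in the opinion; level 32 is an
# ASSUMPTION inferred from its discussion of 1,000 to 3,000 grams.
HEROIN_BRACKETS = [
    (Decimal("3000"), Decimal("10000"), 34),
    (Decimal("1000"), Decimal("3000"), 32),
]


def offense_level(grams):
    """Return the base offense level, or None if outside the partial table."""
    for low, high, level in HEROIN_BRACKETS:
        if low <= grams < high:   # "at least" low but "less than" high
            return level
    return None


total = GRAMS_ON_FINAL_TRIP * TRIPS
print(total, offense_level(total))
```

Decimal arithmetic is used so that the product matches the 3,419.2 gram figure exactly rather than a binary floating-point approximation. The sketch makes visible how much turns on the multiplier: the same per-trip quantity over fewer trips, or a smaller per-trip quantity over eight trips, can fall into the lower bracket.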

The sentencing court rejected the defendant’s argument that he deserved a reduction for his “minor” role in the offense. See U.S.S.G. § 3B1.2. It held that a

preponderance of the evidence at his trial established that defendant imported large quantities of heroin on eight occasions, using two passports. Defendant was not the foolish, occasionally even unwitting, drug mule who frequently comes before this court. He regularly and systematically engaged in the business of narcotics trafficking between Nigeria and the United States. He deserves no leniency for being a “minor” or “minimal” criminal.

Shonubi I, 802 F.Supp. at 864.

The government contended that the defendant’s perjury at trial and sentencing constituted per se obstruction of justice, requiring enhancement under § 3C1.1 of the Sentencing Guidelines. The sentencing court held the enhancement provision should not be imposed whenever a defendant lies in court, but only in furtherance of the goals of sentencing enumerated by Congress at 18 U.S.C. § 3551 et seq. In this case, the defendant’s lies had led to the discovery of seven previously unknown smuggling trips. The sentencing judge observed:

The Guidelines’ predilection for incarceration should be satisfied by adding ten years for defendant’s exercising his right to a trial and to testify. Penalizing defendant additionally for what amounts to the same conduct (the maintenance of his own defense) would be inappropriate and cruel.

802 F.Supp. at 863.

The sentencing range for a level 34 offense, for an offender in defendant’s criminal history category—“I”—is 151-188 months. Shonubi was sentenced to 151 months, plus five years’ supervised release and a $100 assessment.

D. Appeal

Shonubi appealed from his sentence on two grounds. First, he argued that the government had failed to prove how much, if any, heroin he had imported on trips prior to the offense of conviction. Second, he contended that the court should have reduced his offense level by four points for his “minimal role in the offense.” U.S.S.G. § 3B1.2.

The government cross-appealed. It argued that the court was obligated, in light of Shonubi’s perjury, to impose a two-point obstruction of justice enhancement under § 3C1.1.

The court of appeals found that the record amply supported the trial court’s determination that Shonubi had made a total of eight related smuggling trips in 1990 and 1991.

The conflicting accounts in Shonubi’s passport and his work attendance records amply prove that he made eight trips to Nigeria on more than one passport. It may also be inferred that appellant imported heroin during each of these journeys because he used two passports, traveled frequently, avoided using direct flights ... and could not have afforded the air fare on his toll collector’s salary.

Shonubi II, 998 F.2d at 89. The court of appeals also found that, for sentencing purposes, the eight trips should be considered a single course of conduct, which the court defined as “an identifiable pattern of criminal conduct.” Id. In making this finding, the court considered “such factors as the nature of defendant’s acts and how frequently the ... acts have been repeated.” Id. (citing United States v. Perdomo, 927 F.2d 111, 115 (2d Cir.1991)). It also found that Shonubi was not a minor participant in the offense. In reaching this conclusion, the court noted that “appellant regularly engaged in drug smuggling ... [and that his] pattern of travel ... suggests broad knowledge and culpability in the heroin trade.” Shonubi II, 998 F.2d at 90.

The court of appeals was not satisfied with the sentencing judge’s finding that Shonubi had imported more than 3,400 grams of heroin on his eight trips. It pointed to a lack of “specific evidence,” which it defined as “e.g., drug records, admissions or live testimony.” Id. at 89. It found that on the issue of quantity the evidence “does not constitute proof by the requisite preponderance of the evidence,” id. at 86, and that the sentencing court had relied on “surmise and conjecture.” Id. at 90.

Finally, the court of appeals held that the sentencing judge was required to impose the obstruction of justice enhancement once it had determined that Shonubi had lied under oath. In light of United States v. Dunnigan, ___ U.S. ___, 113 S.Ct. 1111, 122 L.Ed.2d 445 (1993), decided after Shonubi’s initial sentence, the court held that enhancement is mandatory when a defendant lies about a material matter on the stand. Id. at 88.

The court of appeals vacated and remanded for resentencing.

E. Proceedings on remand utilizing Rule 706 of the Federal Rules of Evidence

On remand the government indicated that it would rely on statistical evidence to establish the total quantity of heroin imported by Shonubi. The defendant was authorized to retain an expert at government expense to respond.

To help the court evaluate the statistical evidence proffered by the parties’ experts, the court appointed a panel of experts. Federal Rule of Evidence 706 provides that “the court may appoint expert witnesses of its own selection.” Judges are encouraged to avail themselves of this procedure. See, e.g., Daubert v. Merrell Dow Pharmaceuticals, ___ U.S. ___, ___, 113 S.Ct. 2786, 2797-98, 125 L.Ed.2d 469 (1993) (Judges should be “mindful” of the Rule 706 power); see also Kenneth R. Foster et al., Phantom Risk: Scientific Inference and the Law 437 (1994) (encouraging judges to appoint experts); The Evolving Role of Statistical Assessments as Evidence in the Courts 171 (Report of the Panel on Statistical Assessments as Evidence in the Courts) (Stephen E. Fienberg ed., 1989) [hereinafter Evolving Role] (recommending “the increased use of court-appointment procedures”); cf. Science and Technology in Judicial Decision Making: Creating Opportunities and Meeting Challenges 16 (Report of the Carnegie Commission on Science, Technology, and Government, March 1993) (urging judges to “take an active role” in evaluating scientific proof); Federal Judicial Center, Reference Manual on Scientific Evidence 531-42 (1994) (describing uses of court-appointed experts) [hereinafter Reference Manual].

Pursuant to Rule 706, both the defense and the government were given opportunities to respond to the report of the Rule 706 Panel. See generally Joe S. Cecil & Thomas E. Willging, “Court-Appointed Experts,” in Reference Manual, supra, at 527-73 (describing Rule 706 procedures). Cf. Edwin J. Wesely, Practical Workings of the Special Master, N.Y.L.J., Mar. 22, 1994, at 1 (recommending circulation of reports of special masters). The parties were also permitted to respond to the draft memorandum. Based on responses to that draft, the court requested the parties to provide additional information on the economics of heroin smuggling. This occurred at a final Fatico hearing described in Part X.C, infra.

F. Summary of arguments on remand

The government statistician, Dr. David Boyum, obtained customs service data on the amount of heroin recovered from 117 Nigerian heroin swallowers apprehended at John F. Kennedy Airport between September 1, 1990 and December 10, 1991, the dates of the first and last trips indicated on Shonubi’s passport. Using this data, statistical methods, and a computer program, Dr. Boyum determined that it was “99 percent probable” that Shonubi had imported more than 2090.2 grams of heroin. This analysis, the government argued, satisfied its burden of proving that Shonubi had smuggled enough heroin to constitute a level 32 offense. It did not attempt to prove importation of at least 3,000 grams of heroin, the amount found by the court at the first sentence.

The defense, relying on an analysis by Professor Michael O. Finkelstein, argued that no statistics about other smugglers could satisfy the government’s burden of proof vis-a-vis Shonubi. It also disputed Dr. Boyum’s methodology. His calculations, it argued, failed to account for the likelihood that smugglers carry more heroin with each successive trip, a putative learning curve it termed the “trip effect.” Finally, it argued that smugglers of large amounts of heroin are more likely than smugglers of small amounts to be caught, suggesting that the DEA data overstate average quantities smuggled.

The Rule 706 Panel agreed with the defense that statistics about other smugglers, taken alone, could not satisfy the government’s burden of proof. The Panel, however, suggested a number of possible uses for the statistical evidence in conjunction with evidence already available to the court. The combination of statistical and non-statistical proof, the Panel suggested, would support a finding that Shonubi had imported, at a minimum, 1,000 grams of heroin on his eight trips.

The experts’ reports will be examined in detail in Part X.A, infra.

G. Statement of the issue before the court

As the Second Circuit Court of Appeals recently observed

“direct evidence” is described as evidence tending to show, without resort to inference, the existence of a fact in question... But ... “all knowledge is inferential.”

Tyler v. Bethlehem Steel Corp., 958 F.2d 1176, 1183 (2d Cir.1992) (quoting Visser v. Packer Engineering Assoc., 924 F.2d 655, 659 (7th Cir.1991) (en banc) (emphasis added)).

The relevance of any evidentiary fact depends on the premises adopted by the trier.

Whenever an item of evidence is offered as tending circumstantially — that is, inferentially — to establish a proposition the truth of which is at issue in a case, it is essential to articulate honestly and fully the inference or series of inferences invited. Each specific step of reasoning must invariably match a premise, usually unarticulated, which the judge judicially notices.

“Relevancy and Related Problems” in John H. Mansfield, Norman Abrams, Margaret E. Berger, et al., Cases and Materials on Evidence 1, 3 (8th ed. 1988) (emphasis added).

No rule prevents a judge or jury from “drawing an inference upon an inference.” Fenner v. General Motors Corp., 657 F.2d 647, 650 (5th Cir.1981). Without this ability

hardly a single trial could be adequately prosecuted. For example, on a charge of murder, the defendant’s gun is found discharged; from this we infer that he discharged it; and from this we infer that it was his bullet which struck and killed the deceased.... All departments of reasoning, all scientific work, every day’s life and every day’s trial, proceed upon such data.

1 Wigmore on Evidence § 41, at 435-36 (Peter Tillers rev. ed. 1983) (footnote omitted).

Estimation is a type of inference. As Professor Alan Wolf observed in his comparison of science and the law, both require “ ‘[a]pproximations of many types.... Often it is necessary to make assumptions that are not testable ... simply to be able to proceed with a computation.’” See Eva H. Hanks, Michael E. Herz & Steven S. Nemerson, Elements of Law 505 (1994) (quoting Wolf). Under the Sentencing Guidelines, approximations are not only permitted but are frequently required. See infra Part IV.B.

One variety of estimation is extrapolation, broadly defined as the application of knowledge about an event to a series of similar events, using “inferences based on an assumed continuity, correspondence, or other parallelism.” Webster’s Third New International Dictionary 807 (1967). Extrapolation requires circumstances that permit the inference that known conduct will recur, or has recurred, under similar conditions. See, e.g., United States v. McMillen, 8 F.3d 1246 (7th Cir.1993) (using extrapolation to determine drug quantity under the Guidelines), cert. denied, ___ U.S. ___, 114 S.Ct. 1649, 128 L.Ed.2d 368 (1994); United States v. Sklar, 920 F.2d 107 (1st Cir.1990) (same), described in Part IV.C, infra.

Extrapolation has been used by courts in a variety of contexts. In mass torts cases, extrapolation is required to predict the quantity and quality of future claims. See, e.g., In re Joint E. & S. Dist. Asbestos Litig. (Findley v. Falise), 878 F.Supp. 473 (E. & S.D.N.Y.1995) (“Courts cannot wait for the near certainty that may come with a post hoc perspective.”). For other examples of extrapolation in forensic decision-making, see, e.g., Integrated Circuits Unlimited v. E.F. Johnson Co., 691 F.Supp. 630 (E.D.N.Y.1988) (purchaser’s actions in rejecting shipment of radio parts, based on sampling of eight percent of shipped parts, was proper use of statistical methods) (citing James, Relevancy, Probability and the Law, 29 Calif.L.Rev. 689, 692 (1941)), rev’d on other grounds, 875 F.2d 1040 (2d Cir.1989); United States v. 449 Cases Containing Tomato Paste, 212 F.2d 567 (2d Cir.1954) (approving sampling of defective shipment as meeting burden of proof); In re Estate of Ferdinand E. Marcos Human Rights Litigation, MDL No. 840, Report of Special Master Sol Schreiber (Dec. 30, 1994) (depositions of 137 randomly selected claimants used to predict validity rates of 10,059 claims).

Extrapolation has already played a role in the instant case. Only four of the 103 balloons expelled by Shonubi were tested for heroin. Yet it is highly probable that all 103 balloons contained heroin mixture at a purity of roughly 53 percent. See People v. Argro, 37 N.Y.2d 929, 930, 379 N.Y.S.2d 840, 841, 342 N.E.2d 601, 602 (1975) (approving toxicologist’s procedure of choosing envelopes of heroin at random from a large batch to estimate total weight seized), approved, People v. Hill, 85 N.Y.2d 256, 624 N.Y.S.2d 79, 648 N.E.2d 455 (1995) (statistical sampling may be used to determine quantity of drugs). Sampling, as practiced in Hill, is common in drug cases. In the Second Circuit, where sentences are not reduced in accordance with reasonable belief, see, e.g., Fred A. Bernstein, The Denigration of Mens Rea in Drug Sentencing, 7 Fed.Sent.Rep. 121 (1994), sampling is often the basis of drug quantity determinations.

Conclusions based on extrapolation require assumptions about human behavior (i.e., no one carries sugar in balloons from Nigeria to New York) and about statistics (i.e., 4 balloons chosen at random out of 103, when there is no ostensible reason for variation, is a sufficient sample). Moreover, the trier’s sense of “story,” see infra Part VI.B.4, suggests that the balloons were all filled from the same batch of heroin, using the same filling technique. This combination of thought processes permits the rejection as highly unlikely of any scenario in which the 103 balloons did not contain heroin in roughly similar amounts.

The same combination of beliefs about how people act in the real world and assessments of probability may permit a finding that each of Shonubi’s eight trips involved similar quantities of heroin. This is true despite the differences between a random sample — the choice of four out of 103 balloons — and the court’s non-random reliance on the eighth trip as a basis for conclusions about earlier offenses. This matter is discussed at length in Part XII.B, infra.

Understood this way, the question before the court is how likely it is, based on all available information, that the seven smuggling trips about which little is known were similar to the eighth, about which a great deal is known. The sentencing judge had previously concluded that the defendant’s first seven trips were substantially like the trip leading to his arrest. This memorandum explores the bases for that conclusion. See infra Parts V through XII.

IV. Sentencing Law

A. Burden of proof

1. Generally

In a judicial proceeding, “all the factfinder can acquire is a belief of what probably happened.” In re Winship, 397 U.S. 358, 370, 90 S.Ct. 1068, 1075-76, 25 L.Ed.2d 368 (1970) (Harlan, J., concurring) (emphasis in original). For that belief to become a decision, the court must utilize a burden of proof.

[A] standard of proof represents an attempt to instruct the factfinder concerning the degree of confidence our society thinks he should have in the correctness of factual conclusions.

Id.; see also, e.g., Ronald J. Allen, Burdens of Proof, Uncertainty, and Ambiguity in Modern Legal Discourse, 17 Harv.J.L. & Pub.Pol’y 627, 633 (1994) (burden of proof “provides the decision rule under uncertainty”).

Courts do not generally quantify burdens of proof as a matter of law. See John M. Maguire, James H. Chadbourn, et al., Cases and Materials on Evidence 871-73 (6th ed. 1973) (collecting literature on quantification of burdens of proof); see also Richard H. Gaskins, Burdens of Proof in Modern Discourse 20 (1992) (observing that implications of burdens of proof have been “neglected” in legal discourse). But see Terence Anderson & William Twining, Analysis of Evidence 338 (1991) (correlating verbal and mathematical measures of certainty and doubt). Despite the absence of explicit quantification,

the phrases ‘preponderance of the evidence’ and ‘proof beyond a reasonable doubt’ ... communicate to the finder of fact different notions concerning the degree of confidence he or she is expected to have in the correctness of [his or her] conclusions.

In re Winship, 397 U.S. at 370, 90 S.Ct. at 1076.

Judges, who deal with burdens of proof on a daily basis, are inclined to think of those burdens in probabilistic terms. A survey of judges in the Eastern District of New York found general agreement that “a preponderance of the evidence” translates into 50+ percent probability. Eight judges estimated “clear and convincing” as between 60 and 70 percent probable (while two found this standard unquantifiable). Estimates for “beyond a reasonable doubt” ranged from 76 to 90 percent, with 85 percent the modal response. See United States v. Fatico, 458 F.Supp. 388, 410 (E.D.N.Y.1978) (chart), aff'd on other grounds, 603 F.2d 1053 (2d Cir.1979), cert. denied, 444 U.S. 1073, 100 S.Ct. 1018, 62 L.Ed.2d 755 (1980); C.M.A. McCauliff, Burdens of Proof: Degrees of Belief, Quanta of Evidence, or Constitutional Guarantees?, 35 Vand.L.Rev. 1293 (1982); Vern R. Walker, Direct Inference in the Lost Chance Cases: Factfinding Constraints Under Minimal Fairness to Parties, 23 Hofstra L.Rev. 247, 257 (1994) (“A number of courts have ... explained the meaning of preponderance ... using the quantitative terminology of mathematical probability.”). As Fatico demonstrated, there is a consensus among judges that burdens of proof can be stated in numerical terms. Moreover, with some variation, there is agreement as to what those numbers are.

2. Burden of proof at sentencing

At sentencing, a defendant is entitled to a full evidentiary hearing. See United States v. Fatico, 458 F.Supp. 388, 409-11 (E.D.N.Y.1978), aff'd, 603 F.2d 1053 (2d Cir.1979), cert. denied, 444 U.S. 1073, 100 S.Ct. 1018, 62 L.Ed.2d 755 (1980); United States v. Naugle, 879 F.Supp. 262 (E.D.N.Y.1995) (Appendix) (listing defendant’s rights at sentencing proceeding). The government must prove disputed facts by a preponderance of the evidence. See, e.g., United States v. Concepcion, 983 F.2d 369, 388 (2d Cir.1992), cert. denied, ___ U.S. ___, 114 S.Ct. 163, 126 L.Ed.2d 124 (1993); see also U.S.S.G. § 6A1.3 (1994) (commentary) (endorsing preponderance standard).

A fact has been proven by a preponderance of the evidence, according to the court of appeals, if “ ‘the scales tip, however slightly, in favor of the party with th[e] burden of proof.’” Ostrowski v. Atlantic Mut. Ins. Cos., 968 F.2d 171, 187 (2d Cir.1992) (quoting Leonard B. Sand et al., Modern Federal Jury Instructions ¶ 73.01, at 73-4 (1992)). See also Herman & MacLean v. Huddleston, 459 U.S. 375, 390, 103 S.Ct. 683, 691-92, 74 L.Ed.2d 548 (1983) (preponderance standard allows “both parties to share the risk of error in roughly equal fashion”); McCormick, Evidence § 339, at 794 (2d ed. 1972) (preponderance standard requires proof that “the existence of the contested fact is more probable than its nonexistence”). Although these definitions derive from civil cases, the courts have not defined the standard differently in criminal litigation. See, e.g., United States v. Gigante, 39 F.3d 42, 47 (2d Cir.1994) (in sentencing, “preponderance standard is ... a tie-breaker dictating that when the evidence on an issue is evenly balanced, the party with the burden of proof loses”) (citing Sand’s civil instruction).

Commentators have argued that a preponderance standard provides the defendant with inadequate protection at sentencing under the Guidelines. See, e.g., Joseph P. Sargent, Comment, The Standard of Proof Under the Federal Sentencing Guidelines: Raising the Standard to Beyond a Reasonable Doubt, 28 Wake Forest L.Rev. 463 (1993); W. Crews Lott, Balancing the Burden of Proof and Relevant Conduct: At What Point Is Due Process Violated?, 45 Baylor L.Rev. 877 (1993); Richard Husseini, Note, The Federal Sentencing Guidelines: Adopting Clear and Convincing Evidence as the Burden of Proof, 57 U.Chi.L.Rev. 1387 (1990); Susan N. Herman, The Tail That Wagged the Dog: Bifurcated Fact-Finding Under the Federal Sentencing Guidelines and the Limits of Due Process, 66 S.Cal.L.Rev. 289, 309 (1992) (advocating burden higher than preponderance).

Several appellate courts have expressed concern about the burden of proof at sentencing. See, e.g., United States v. Restrepo, 946 F.2d 654, 659-60 (9th Cir.1991) (en banc) (suggesting that a clear and convincing standard be applied to facts that will dramatically increase a sentence), cert. denied, 503 U.S. 961, 112 S.Ct. 1564, 118 L.Ed.2d 211 (1992); United States v. Townley, 929 F.2d 365, 370 (8th Cir.1991) (same); see also United States v. Sepulveda, 15 F.3d 1161, 1198 (1st Cir.1993) (that “drug quantity has a dramatic leveraging effect [under the Guidelines] ... informs the preponderance standard, requiring that ... where uncertainty reigns, [district courts] must ‘err on the side of caution’ ”) (quoting United States v. Sklar, 920 F.2d 107, 113 (1st Cir.1990)), cert. denied, — U.S. -, 114 S.Ct. 2714, 129 L.Ed.2d 840 (1994). Cf. O’Neal v. McAninch, — U.S. -, -, 115 S.Ct. 992, 993, 130 L.Ed.2d 947 (1995) (rejecting the application of a civil burden of proof to a habeas action, although it is “technically” a civil proceeding, because “the errors being considered by a habeas court occurred in a criminal proceeding [where] ... someone’s custody, rather than mere civil liability, is at stake”) (emphasis in original).

In the Second Circuit, the Chief Judge has stated that “a strong argument can be made that the ‘clear and convincing evidence’ standard should be used, at least for substantial enhancements [under the Guidelines].” See United States v. Concepcion, 983 F.2d 369, 394 (2d Cir.1992) (Newman, J., concurring), cert. denied, — U.S. -, 114 S.Ct. 163, 126 L.Ed.2d 124 (1993). Another panel of the court of appeals has suggested that, at the very least, “there is a constitutional requirement of some rough proportionality between the weight of the evidence of ... uncharged conduct and the degree of adjustment or departure [to which that evidence leads].” United States v. Gigante, 39 F.3d 42, 47 (2d Cir.1994). Although it noted that “the preponderance test continues to govern,” id. at 48, the Gigante court suggested that some correspondence between the evidentiary standard and the consequence of the fact in question is warranted:

We find it difficult to believe that a district judge who had to resort to a tie-breaking rule to find facts justifying an upward departure would actually depart in any substantial way. We also believe that an appellate court would and should take the weight of the evidence into account in reviewing the reasonableness of a departure.

Id. The court further warned of “the danger of substantial error [that] would permeate a substantial upward departure based on a finding of, say, six uncharged crimes ... proven by a bare preponderance,” id. at 47, and noted that this problem is particularly acute under the Guidelines, which “mandat[e] that [judges] take uncharged conduct into account.” Id. at 48.

In light of these concerns, many sentencing judges employ a sliding scale, adjusting burdens of proof in proportion to the significance of the fact at issue. See United States v. Fatico, 458 F.Supp. 388, 409-11 (E.D.N.Y.1978) (surveying sentencing practices of judges in the Eastern District of New York), aff'd on other grounds, 603 F.2d 1053 (2d Cir.1979), cert. denied, 444 U.S. 1073, 100 S.Ct. 1018, 62 L.Ed.2d 755 (1980); see also A Trial Judge’s Second Impression of the Federal Sentencing Guidelines, 66 S.Cal.L.Rev. 357, 360 (1992) (surveying judges). This court follows the Second Circuit practice, described most recently in Gigante, of assessing the weight of evidence in light of its impact on the sentence.

B. Operation of the Guidelines

Sentencing for nearly all federal crimes is controlled by the Guidelines. United States v. DeRiggi, 45 F.3d 713 (2d Cir.1995); see also United States Sentencing Commission, Guidelines Manual 10 (Nov. 1, 1994) (Guidelines apply to 90 percent of federal crimes) [hereinafter Guidelines Manual]. In creating the Guidelines, the Sentencing Commission relied on “statistical analyses” of “40,000 convictions [and] ... 10,000 augmented pre-sentence reports.” Guidelines Manual at 10. See also Stephen G. Breyer, The Federal Sentencing Guidelines and the Key Compromises Upon Which They Rest, 17 Hofstra L.Rev. 1, 7 n. 50 (1988) (describing computer program used by the Commission to analyze data about 100,000 “criminal dispositions”); Reena Raggi, Foreword to Tony Garoppolo, The Sentencing Reform Act: A Guide for Defense Counsel at v (2d ed. 1994) (describing Guidelines as a “statistically based” system). The goal was to tie the Guidelines ranges to “typical, or average ... past practice.” Breyer, supra, 17 Hofstra L.Rev. at 17. In this respect, the Guidelines are at heart probabilistic.

The centerpiece of the Guidelines system is a one-page “Sentencing Table.” The table resembles a tax chart. Its vertical axis lists “offense levels,” numbered from 1 to 43; its horizontal axis lists “criminal history categories,” from I to VI. The meeting point of offense level and history category determines the applicable sentencing range. Both horizontally and vertically, each range “overlap[s] ... the ranges in the preceding and succeeding levels.” Guidelines Manual at 9.

The objective of overlapping ranges, according to the Sentencing Commission, was to “discourage unnecessary litigation.” Id. For example, the Commission explained, “little purpose will be served in ... trying to determine ... whether $10,000 or $11,000 was obtained as a result of fraud.” Id. at 9-10. Instead of discouraging litigation, the Guidelines have, of course, generated tens of thousands of appeals, required resentencing in a large number of cases, and caused unacceptable distortions, disparities, and irrationalities in sentencing. See, e.g., Steven L. Chanenson, Consistently Inconsistent: Circuits Rulings on the Guidelines in 1991, 7 Fed.Sent.Rep. 224, 225 (1995) (in 1994, 8,400 cases, some 59.5% of all federal criminal appeals, presented Guidelines issues). The proliferation of appeals and reversals illustrates the dangers of using statistical data for substantive law-making purposes. This use is quite different from the function of statistics in the instant and like cases as a means of improving fact-finding in litigation.

1. Drug cases

In drug cases, offense level is based on quantity possessed, sold, or imported. U.S.S.G. § 2D1.1(c) (1994) (drug quantity table). For example, possession of more than 1,000, but less than 3,000, grams of heroin is a Level 32 offense. Within such ranges, estimation is encouraged. However, at the margins, a difference of a few grams can make a large difference in prison time. See, e.g., United States v. Rivera, 821 F.Supp. 868, 869 (E.D.N.Y.1993) (describing effects on sentence of possible errors in weighing); cf. United States v. Ekwunoh, 12 F.3d 368 (2d Cir.1993) (applying mandatory minimum sentencing statute requiring 10 years in prison for possession of more than 1,000 grams to offense involving 1,013 grams); modified on remand, 888 F.Supp. 369 (E.D.N.Y.1994). When the amounts are close to a critical step-up, at, for example, 1,000 or 3,000 grams, the possibility of a substantial increase in sentence suggests the need for extreme care in estimation.

Under the “modified real offense” system adopted by the Sentencing Commission, relevant conduct, in drug cases, is not limited to the offense of conviction. See Guidelines Manual at 4-5. Instead, the quantity of drugs includes any amounts “that were part of the same course of conduct” or “common scheme or plan.” U.S.S.G. § 1B1.3(a)(2) (1994); United States v. Madkour, 930 F.2d 234, 237 (2d Cir.), cert. denied, 502 U.S. 911, 112 S.Ct. 308, 116 L.Ed.2d 251 (1991); United States v. Cousineau, 929 F.2d 64, 67 (2d Cir.1991). In the Second Circuit, inclusion of such quantities is mandatory. See, e.g., United States v. Schaper, 903 F.2d 891, 898 (2d Cir.1990) (reversing trial judge’s decision to base sentence solely on quantity seized); United States v. Telesco, 962 F.2d 165, 168 (2d Cir.1992) (such quantities “must be considered in determining the base offense level”) (emphasis in original); United States v. Vazzano, 906 F.2d 879, 882 (2d Cir.1990) (same; citing, inter alia, William J. Wilkins, Jr. & John R. Steer, Relevant Conduct: The Cornerstone of the Federal Sentencing Guidelines, 41 S.C.L.Rev. 495, 513-17 (1990)).

Given their obligation to consider drugs from transactions that predate the offense of conviction (in some cases by over a year), judges must sometimes estimate the weight of drugs that neither the court nor the government has seen. See, e.g., United States v. Santiago, 906 F.2d 867, 872 (2d Cir.1990) (offenses 14 months apart included in quantity computation). In such cases, the Guidelines offer this instruction:

Where there is no drug seizure or the amount seized does not reflect the scale of the offense, the court shall approximate the quantity of the controlled substance.

U.S.S.G. § 2D1.1, application note 12 (1994) (emphasis added). The Guidelines give examples of the kinds of evidence on which approximations may be based:

[T]he court may consider, for example, the price generally obtained for the controlled substance, financial or other records, similar transactions in controlled substances by the defendant, and the size or capability of any laboratory involved.

Id., application note 12 (emphasis added). This non-exclusive list is consistent with the rule that sentencing judges may consider any evidence that has “sufficient indicia of reliability to support its probable accuracy.” U.S.S.G. § 6A1.3 (1994); see also 18 U.S.C. § 3661 (“[N]o limitation shall be placed on the information concerning the background, character, and conduct” of the defendant which the court may consider in sentencing).

For many drugs, including marijuana, mescaline, and PCP, the Guidelines provide further instructions for estimation.

If the number of doses, pills, or capsules but not the weight of the controlled substance is known, multiply the number of doses, pills or capsules by the typical weight per dose in the table below.... The Typical Weight Per Unit Table, prepared from information provided by the Drug Enforcement Administration, displays the typical weight per dose, pill or capsule for certain controlled substances. Do not use this table if any more reliable estimate of the total weight is available from case-specific information.

U.S.S.G. § 2D1.1, application note 11 (1994) (emphasis added). The Guidelines thus provide for estimation based on the activities of typical offenders (as determined by the Drug Enforcement Administration) when “case-specific” evidence is lacking.

Similarly, in cases involving clandestine laboratories “in which the manufacture of a controlled substance has not been completed ... the court must estimate the amount of controlled substance that would have been manufactured....” 60 Fed.Reg. 2430, 2456 (Jan. 9, 1995) (construing U.S.S.G. § 2D1.1 (1994)). For this purpose, the Drug Enforcement Administration provides a formula for estimating theoretical yields from the quantities of precursor chemicals seized.

Because actual yield depends on a multiplicity of factors, according to the Sentencing Commission, “[t]he use of theoretical yield frequently [results] in a higher offense level for someone who ... does not produce any controlled substance than for someone who [does].” 60 Fed.Reg. at 2456-57. The Sentencing Commission has proposed adding a rebuttable presumption that actual yield is half of theoretical yield. Id. This amendment, even if it improves the result, will still require courts to impose sentences based on data culled from other cases.
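The proposed presumption can be illustrated with a brief sketch. It assumes, for illustration only, that the presumption operates as a simple default that gives way when rebuttal evidence of actual yield is offered; the function name and the 1,000 gram input are hypothetical and do not appear in the Federal Register notice.

```python
from typing import Optional


def presumed_actual_yield(theoretical_grams: float,
                          rebuttal_grams: Optional[float] = None) -> float:
    """Return the yield a court would start from under the proposed presumption.

    Assumption: absent rebuttal evidence, actual yield is presumed to be
    half of theoretical yield; rebuttal evidence, if any, displaces the default.
    """
    if theoretical_grams < 0:
        raise ValueError("theoretical yield cannot be negative")
    if rebuttal_grams is not None:
        return rebuttal_grams
    return theoretical_grams / 2


# Hypothetical laboratory with a theoretical yield of 1,000 grams.
print(presumed_actual_yield(1000.0))
print(presumed_actual_yield(1000.0, rebuttal_grams=300.0))
```

Either way the starting figure is derived from a formula built on other cases, which is the point made in the text.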

2. Non-drug cases

The use of approximations under the Guidelines is not limited to drug cases. Such provisions can be found throughout the Guidelines. For example, in tax evasion cases, the “tax loss shall be treated as equal to 28% of the unreported gross income ... unless a more accurate determination ... can be made.” U.S.S.G. § 2T1.1(c)(1)(A) (1994) (emphasis added). Similarly, in sentences for fraud,

the loss need not be determined with precision. The court need only make a reasonable estimate of the loss, given the available information. This estimate, for example, may be based on the approximate number of victims and an estimate of the average loss to each victim, or on more general factors, such as ... the revenues generated by similar operations.

U.S.S.G. § 2F1.1, application note 8 (1994) (emphasis added); see also U.S.S.G. § 2B1.1, application note 3 (1994) (permitting estimation of loss in theft and embezzlement cases). See United States v. Stanley, 54 F.3d 103, 106 (2d Cir.1995) (approving “reasonable” calculation under § 2F1.1 because “[t]he Guidelines require no more”); see also, e.g., United States v. Whitlow, 979 F.2d 1008 (5th Cir.1992) (estimating loss in odometer rollback case); United States v. Hernandez, 952 F.2d 1110 (9th Cir.1991), cert. denied, — U.S. -, 113 S.Ct. 334, 121 L.Ed.2d 252 (1992) (estimating loss in audiotape counterfeiting case). In Hernandez, the court rejected a method of loss calculation, suggested by the defendant, which would have “required detailed information and complex ... calculations.” Id. at 1119. The court held that “the Guidelines do not require such precision.” Id.

In short, the Guidelines recognize that precise quantification may be impossible in many cases. In such instances the trier, relying on practical good sense and experience, must estimate from the best available information, including information culled from similar cases.
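The two non-drug approximations quoted in this subsection reduce to simple arithmetic, sketched here. The 28 percent rate is taken from the quoted tax provision; the function names and all dollar figures are hypothetical inputs chosen only to show the calculation.

```python
from decimal import Decimal
from typing import Optional

# Rate quoted from U.S.S.G. § 2T1.1(c)(1)(A) (1994).
TAX_LOSS_RATE = Decimal("0.28")


def tax_loss(unreported_income: Decimal,
             more_accurate: Optional[Decimal] = None) -> Decimal:
    """Use 28% of unreported gross income unless a more accurate figure exists."""
    if more_accurate is not None:
        return more_accurate
    return unreported_income * TAX_LOSS_RATE


def fraud_loss(approx_victims: int, avg_loss_per_victim: Decimal) -> Decimal:
    """Reasonable estimate: approximate number of victims times average loss."""
    return approx_victims * avg_loss_per_victim


# Hypothetical inputs, not drawn from any case discussed in the text.
print(tax_loss(Decimal("100000")))
print(fraud_loss(40, Decimal("2500")))
```

In both functions the default is an average drawn from other cases, displaced only when case-specific information is available.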

C. Caselaw on estimation and specific evidence

In reversing the sentencing judge’s finding in the instant case, the court of appeals stated that estimates of drug quantity must be based on “specific evidence.” Shonubi II, 998 F.2d at 89. As examples of such evidence it listed “drug records, admissions or live testimony,” that is, written or spoken statements. Id. This rule, it asserted, is a “careful practice” necessitated by a system in which drug quantity determinations can significantly affect terms of incarceration. 998 F.2d at 89-90.

1. Estimation

The court of appeals pointed to six cases that illustrate this “careful practice”: United States v. Lasanta, 978 F.2d 1300, 1309 (2d Cir.1992); United States v. Rivera, 971 F.2d 876, 892-93 (2d Cir.1992); United States v. Burnett, 968 F.2d 278, 280 (2d Cir.1992); United States v. Colon, 961 F.2d 41, 43 (2d Cir.1992); United States v. Santiago, 906 F.2d 867, 871-72 (2d Cir.1990); and United States v. Schaper, 903 F.2d 891, 896-97 (2d Cir.1990). Examination of the evidence and inferences relied on in these cases is instructive.

In Lasanta, a government analysis of wiretapped conversations led to a finding that the defendant had sold 14.7205 kilograms of cocaine and .336 kilograms of heroin. The court adopted these amounts after reviewing the conversations. By “converting” heroin to cocaine (using the “drug equivalency table” of U.S.S.G. § 2D1.1 and the conversion practices described in application notes 6 and 10 of that section), the court determined that Lasanta had sold 16.4005 kilos of “cocaine equivalent.” This, it determined, was a “level 34” offense, which requires possession of “[a]t least 15 KG but less than 50 KG of Cocaine.” See U.S.S.G. § 2D1.1 (1994). The court of appeals affirmed. 978 F.2d at 1309.
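The conversion arithmetic in Lasanta can be checked with a short sketch. The 5-to-1 heroin-to-cocaine ratio used here is an assumption inferred from the figures reported above, since it reproduces the 16.4005 kilogram total; the function and constant names are illustrative only.

```python
from decimal import Decimal

# Assumed ratio: 1 kg of heroin counted as 5 kg of cocaine. Inferred from the
# totals reported in Lasanta, not quoted from the opinion.
HEROIN_TO_COCAINE = Decimal("5")

# Level 34 range for cocaine as quoted in the text.
LEVEL_34_MIN_KG = Decimal("15")
LEVEL_34_MAX_KG = Decimal("50")


def cocaine_equivalent_kg(cocaine_kg: Decimal, heroin_kg: Decimal) -> Decimal:
    """Express a mixed cocaine and heroin quantity as kilograms of cocaine."""
    return cocaine_kg + heroin_kg * HEROIN_TO_COCAINE


def is_level_34(equivalent_kg: Decimal) -> bool:
    """True when the quantity is at least 15 kg but less than 50 kg."""
    return LEVEL_34_MIN_KG <= equivalent_kg < LEVEL_34_MAX_KG


total = cocaine_equivalent_kg(Decimal("14.7205"), Decimal("0.336"))
print(total, is_level_34(total))
print(is_level_34(Decimal("14.7205")))  # cocaine alone, without the heroin
```

On these figures the cocaine alone falls short of the 15 kilogram threshold; the converted heroin carries the total over it.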

In Rivera, the sentencing court found that defendants who worked at one of a conspiracy’s five “distribution points” were responsible for quantities estimated to have been sold at the other four points. Finding that the defendants’ familiarity with the “management team to which they reported” and knowledge of the sophisticated wrapping of the heroin packets, along with the fact that they had attended a party with co-conspirators, indicated awareness of the scope of the conspiracy, the court of appeals affirmed the sentence. 971 F.2d at 893.

In Burnett, a government witness testified that he had sold the defendant “[t]wo kilos, three kilos” of cocaine. 968 F.2d at 280. He later “clarified” his testimony, indicating that he had sold three kilos to the defendant. Id. The court sentenced the defendant to a term based on possession of more than three kilograms of cocaine. Deferring “to [the] sentencing judge’s credibility determinations,” the court of appeals affirmed. 968 F.2d at 280.

In Colon, the defendant told a probation officer that he had sold 80 glassine envelopes of heroin every two or three days for a “few years.” The court estimated, “conservatively,” that the defendant must have sold 8,000 envelopes. It multiplied this figure by the average estimated amount of heroin (.05 grams) in the 149 bags that had been seized, arriving at a gross weight of 400 grams. The court of appeals affirmed. 961 F.2d at 43.
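The Colon multiplication, and its relation to the amount actually seized, can be sketched as follows. The seized weight is an assumption: it is computed by applying the same 0.05 gram average to the 149 seized bags, a figure the opinion does not itself state.

```python
from decimal import Decimal

ENVELOPES_ESTIMATED = 8_000            # the court's "conservative" estimate
GRAMS_PER_ENVELOPE = Decimal("0.05")   # average estimated from the seized bags
BAGS_SEIZED = 149


def total_grams(envelopes: int, grams_each: Decimal) -> Decimal:
    """Multiply a count of envelopes by the average weight per envelope."""
    return envelopes * grams_each


estimated = total_grams(ENVELOPES_ESTIMATED, GRAMS_PER_ENVELOPE)
seized = total_grams(BAGS_SEIZED, GRAMS_PER_ENVELOPE)  # assumed, see lead-in
ratio = estimated / seized

print(estimated, seized, round(ratio, 1))
```

On these assumptions the estimated quantity is somewhat more than fifty times the quantity seized.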

In Santiago, the defendant admitted that he had made five heroin sales to an informant named Shattuck. Shattuck, in turn, testified that he bought heroin from Santiago on “approximately a dozen occasions.” 906 F.2d at 869 (emphasis added). The court multiplied the amount found in one bag of heroin by five and twelve to arrive at low and high estimates — either of which, it noted, would translate into a level 26 offense. The court of appeals held that a finding “that Santiago had in fact made a dozen prior sales to Shattuck was amply supported.” Id. at 872.

In Schaper, the sentencing court considered only amounts seized. The appeals court remanded for estimation of drug quantity above the amount known to have been possessed. 903 F.2d at 899.

A few “rules” can be derived from this series of cases. The court of appeals will affirm a sentence based on an estimated quantity 50 times greater than the amount actually seized. Colon. It will accept an estimate that pushes a defendant just over the “line” into a higher offense level. Lasanta. It will accept figures that, in their apparent precision, mask the inexactitude of human interactions. Lasanta (analysis of phone conversations translates into finding that 14.7205 kilograms of cocaine was sold). It will permit estimates based on judges’ inferences about other people’s inferences about what third parties are doing. Rivera. It will base sentences on quantity estimates proffered by witnesses whose statements are inconsistent, Burnett, or equivocal. Santiago. Finally, it will accept, without discounting, estimates of drug quantity based on statements of witnesses whose credibility is doubtful. Burnett.

Referring to the six cases, the court of appeals stated that “[c]ase law uniformly requires” such “specific evidence” for quantity determinations. Shonubi II, 998 F.2d at 89. The practice, however, is less “uniform” than the court believes. In United States v. Mickens, 926 F.2d 1323 (2d Cir.1991), cert. denied, 502 U.S. 1060, 112 S.Ct. 940, 117 L.Ed.2d 111 (1992), the sentencing court “approximated that [a] conspiracy distributed in excess of fifty kilograms of cocaine,” a finding it “based on [the defendant’s] unexplained income of over $2,000,000 during the operation of the conspiracy.” Id. at 1331. The court of appeals held that this estimate, based upon inferences from financial records, was “proper.” Id. at 1332; see also United States v. Jacobs, 955 F.2d 7, 9 (2d Cir.1992) (endorsing estimation procedure used in Mickens); cf. United States v. Perrone, 936 F.2d 1403, 1419 (2d Cir.), clarified, 949 F.2d 36 (2d Cir.1991) (disapproving estimation because “[t]here is no evidence here of any conversations about the amount of drugs, ... no records of past sales, no money”) (emphasis added).

In drug manufacturing cases, the court has permitted estimation based on seized chemical bottles whose contents were discarded before testing. See, e.g., United States v. Macklin, 927 F.2d 1272 (2d Cir.), cert. denied, 502 U.S. 847, 112 S.Ct. 146, 116 L.Ed.2d 112 (1991). In Macklin, the court found an estimate based primarily on the bottles’ labels to be “thoroughly reasonable,” id. at 1282, despite significant testimony that the government’s expert had overstated the laboratory’s capability. By contrast, in Perrone, 936 F.2d at 1403, the court rejected an estimate of the amount of cocaine a laboratory could produce. Distinguishing Macklin, the Perrone court noted that neither cocaine base nor hydrochloric acid — two crucial ingredients — had been found in the laboratory. 936 F.2d at 1419. The court was careful to note, however, that “we are not holding that a court can never reach a reliable estimate as to quantity on the basis of some drug ingredients if other drug ingredients are missing,” especially “[i]f the missing ingredients are demonstrated to be readily available to the defendant.” Id.; see also United States v. Pirre, 927 F.2d 694, 696-97 (2d Cir.1991) (upholding sentence based on possession of more than 15 kilograms of cocaine, when chemist estimated that the offense involved 15.09 kilograms of cocaine, then discarded much of the evidence, in order to save space, prior to trial).

In another estimation case, the court relied heavily on the defendant’s phone bill. See United States v. Prescott, 920 F.2d 139, 142 (2d Cir.1990) (in finding that defendant distributed 150 kilograms of cocaine, “[p]erhaps most telling was the fact that in the two and one-half months that [he] owned a cellular telephone ... telephone company records revealed a total of 12,611 incoming and outgoing calls in 79 days or nearly 160 calls a day on average”).
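The daily average quoted in Prescott follows directly from the two figures in the telephone records, as this minimal check shows. The variable names are illustrative.

```python
TOTAL_CALLS = 12_611   # incoming and outgoing calls, as quoted
DAYS = 79              # period covered by the records, as quoted

calls_per_day = TOTAL_CALLS / DAYS
print(round(calls_per_day, 1))
```

The call volume is a precise figure; the step from call volume to 150 kilograms of cocaine is an inference that the arithmetic itself does not supply.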

Other circuits have followed the Mickens approach, translating sums of money into quantities of drugs. See, e.g., United States v. Ortiz-Martinez, 1 F.3d 662, 675 (8th Cir.1993) (approving estimate of cocaine quantity by dividing sum of money by cost of cocaine), cert. denied, — U.S. -, 114 S.Ct. 355, 126 L.Ed.2d 319 (1993); United States v. Duarte, 950 F.2d 1255, 1265 (7th Cir.1991) (same), cert. denied, — U.S. -, 113 S.Ct. 174, 121 L.Ed.2d 120 (1992); United States v. Jackson, 3 F.3d 506, 511 (1st Cir.1993) (same; large amounts of cash in proximity to drug supply can be assumed to represent drug profits). They have also utilized the Macklin technique for estimating the capacity of laboratories. See, e.g., United States v. Evans, 891 F.2d 686, 687-88 (8th Cir.1989) (sentence may be based on amount defendant is capable of producing), cert. denied, 495 U.S. 931, 110 S.Ct. 2170, 109 L.Ed.2d 499 (1990); United States v. Smallwood, 920 F.2d 1231, 1236-37 (5th Cir.) (same, despite findings that facility was not operational and that several “necessary precursor [chemicals]” were missing), cert. denied, 501 U.S. 1238, 111 S.Ct. 2870, 115 L.Ed.2d 1035 (1991).
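The money-to-quantity technique can be sketched using the Mickens figures. The opinion as summarized here does not state the per-kilogram price the sentencing court used, so the sketch instead computes the highest price at which $2,000,000 would still correspond to fifty kilograms. Treating unexplained income as gross sale proceeds is a simplifying assumption, and the function names are hypothetical.

```python
def kilograms_from_cash(cash_dollars: float, price_per_kg: float) -> float:
    """Estimate drug quantity by dividing a sum of money by a price per kilogram."""
    if price_per_kg <= 0:
        raise ValueError("price per kilogram must be positive")
    return cash_dollars / price_per_kg


def max_price_for_quantity(cash_dollars: float, kilograms: float) -> float:
    """Highest price per kilogram at which the cash still buys the stated quantity."""
    if kilograms <= 0:
        raise ValueError("quantity must be positive")
    return cash_dollars / kilograms


# Mickens: over $2,000,000 of unexplained income, more than fifty kilograms found.
ceiling = max_price_for_quantity(2_000_000, 50)
print(ceiling)
print(kilograms_from_cash(2_000_000, ceiling))
```

The result is only as reliable as the assumed price, which is why the technique depends on evidence of what the drug generally sold for.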

2. Estimates based on extrapolation

Estimates based on extrapolation from known events have been approved in a number of cases. For example, in United States v. McMillen, 8 F.3d 1246, 1249-51 (7th Cir.1993), cert. denied, — U.S. -, 114 S.Ct. 1649, 128 L.Ed.2d 368 (1994), the defendant was a member of a conspiracy for eight weeks, during which approximately 34 heroin sales took place. An average of 39 grams of heroin changed hands in each transaction. The trial court, with some rounding, multiplied the number of transactions by the average size of the transactions to arrive at a total quantity of 1,280 grams. The court of appeals approved the extrapolation:

Because these transactions were similar in nature, the court inferred that these transactions involved a similar quantity of heroin. It was the district judge’s function to ... make a finding supported by a preponderance of the evidence....

8 F.3d at 1250-51.
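The McMillen extrapolation can be set out in a few lines. The unrounded product is computed from the two averages stated above; how the trial court rounded to reach 1,280 grams is not described here, so the sketch only compares the two figures. The 1,000 gram threshold is the Level 32 floor for heroin mentioned earlier in this memorandum.

```python
SALES = 34                  # approximate number of heroin sales
AVG_GRAMS_PER_SALE = 39     # average quantity per transaction
COURT_FINDING_GRAMS = 1_280 # total found by the trial court after rounding
LEVEL_32_FLOOR_GRAMS = 1_000

unrounded = SALES * AVG_GRAMS_PER_SALE
print(unrounded, COURT_FINDING_GRAMS)
print(unrounded > LEVEL_32_FLOOR_GRAMS, COURT_FINDING_GRAMS > LEVEL_32_FLOOR_GRAMS)
```

The court's figure is lower than the unrounded product, and both exceed 1,000 grams, so the rounding did not affect which side of that threshold the finding fell on.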

Similarly in United States v. Thomas, 12 F.3d 1350, 1369 (5th Cir.), cert. denied, — U.S. -, -, 114 S.Ct. 1861, 2119, 128 L.Ed.2d 483, 676 (1994), ledgers showed that 56 kilograms of cocaine were distributed during one third of the duration of a conspiracy. The court multiplied by three to arrive at a sentence based on “more than 150 kilograms” of cocaine.
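The Thomas calculation is a straight-line extrapolation, sketched below. It assumes, as the multiplication by three necessarily does, that distribution continued at the same rate during the two thirds of the conspiracy not covered by the ledgers. The function name is hypothetical.

```python
from fractions import Fraction


def extrapolate(observed_quantity: int, fraction_of_period: Fraction) -> Fraction:
    """Scale an observed quantity up to the full period, assuming a constant rate."""
    if not 0 < fraction_of_period <= 1:
        raise ValueError("fraction of period must be in (0, 1]")
    return observed_quantity / fraction_of_period


# Ledgers covered one third of the conspiracy and showed 56 kilograms.
total_kg = extrapolate(56, Fraction(1, 3))
print(total_kg, total_kg > 150)
```

The extrapolated total of 168 kilograms exceeds the 150 kilogram figure on which the sentence was based.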

In United States v. Sklar, 920 F.2d 107 (1st Cir.1990), the defendant was arrested after mailing a package of cocaine. His relevant conduct included sending eleven other packages, none of which was seized. The government knew the weights of the eleven packages. It multiplied those weights by the percentage of cocaine (by weight) of the seized package, rounded down. The court of appeals affirmed “this conservative approach,” noting that, by rounding down, the court had “insulated the challenged finding from clear-error attack.” 920 F.2d at 113.

United States v. Hilton, 894 F.2d 485 (1st Cir.1990), involved a sailboat which caught fire during a mid-ocean interception by government agents. A member of the boarding party saw ten packages in the bilge and retrieved one of them; he observed twelve additional packages floating away. The retrieved bundle had a gross weight of 14 pounds and was found to contain marijuana. The sentencing court “concluded that it was reasonable to assume that the remaining twelve packages each contained at least 10 pounds of marijuana, notwithstanding that they were never seized.” 894 F.2d at 486. The court of appeals affirmed, “agree[ing] with the district court that the evidence offered by the government ... satisfies the sufficiency test, even beyond a reasonable doubt.” Id. at 488 (emphasis added).

Such cases follow Guidelines policy by basing estimates on “similar transactions in controlled substances by the defendant.” U.S.S.G. § 2D1.1, application note 12 (1994) (emphasis added).

Such estimates have been disapproved where the court overlooks evidence that the defendant’s prior transactions were not similar to the offense of conviction. In United States v. Zimmer, 14 F.3d 286 (6th Cir.1994), the defendant possessed 802 marijuana plants at the time of arrest; he also admitted to having grown marijuana for seven or eight years. Based on this evidence, the district court found that the defendant had produced an “absolute minimum” of 200 additional plants over the last eight years — for a total of 1,000 plants. Id. at 289. The court of appeals reversed, finding that “the court’s determination that the defendant grew an additional 200 plants is not supported anywhere in the record.” Id. at 290. In reaching this conclusion, the court pointed to the district court’s failure to consider strong evidence that the defendant’s “‘sophisticated’ indoor growing operation was but a few months old.” Id.

Courts have also rejected estimates based entirely on the conduct of others. For example, in United States v. Garcia, 994 F.2d 1499 (10th Cir.1993), the district court relied on an FBI agent’s knowledge of the average size of shipments being trafficked by dealers along the route followed by the defendant. The agent testified at the sentencing hearing that, based on past experience, “the loads ... that come from El Paso to Oklahoma City ... average between [sixty] and a hundred pounds, and we went with the low end.” Id. at 1508. No evidence was introduced relating the shipments in question to average shipments along the route. Reversing, the court of appeals held that the

average size shipment of all marijuana traffickers is simply not evidence of the size of these particular shipments of marijuana. To find that these particular shipments were of average size is nothing more than a guess.

Id. at 1509 (citation omitted).

Courts have sometimes rejected estimates by trial judges even while encouraging estimates by witnesses and parties. In United States v. Sepulveda, 15 F.3d 1161, 1197 (1st Cir.1993), cert. denied, — U.S. -, 114 S.Ct. 2714, 129 L.Ed.2d 840 (1994), an informant estimated that a defendant had made between fifteen and twenty buys, each of which involved between four ounces and one kilogram of cocaine. The court averaged the number of buys and the size of each buy suggested by the testimony, and multiplied the numbers. Citing Shonubi II, 998 F.2d at 89-90, the court rejected this “wholly conclusory” approach. Id.

[B]y using not one, but two, unsupported averages ... the court compounded the error of its ways. This is, after all, not a case in which a witness testified that, in his or her estimation, “X” equalled the average drug quantity involved in a specific number of drug transactions. That kind of estimate from a lay witness may itself be troublesome, but at least the witness can be cross-examined....

Sepulveda, 15 F.3d at 1199. Apparently, had the informant said “about 17” instead of “between 15 and 20,” the estimation would have been affirmed.

3. Specific evidence

In Shonubi II, the court of appeals called for “specific” — i.e., “direct” — evidence of drug transactions. This recalls the discredited notion that in a murder prosecution, the government must produce the body. See, e.g., Morris, Corpus Delicti and Circumstantial Evidence, 68 L.Q.Rev. 391 (1952); Regina v. Onufrejczyk, 1 All.E.R. 247 (Crim.App.1955). The belief that only direct evidence suffices in criminal cases is a shibboleth. See, e.g., O’Brien v. National Gypsum Co., 944 F.2d 69, 72 (2d Cir.1991) (“[I]t is beyond any doubt that circumstantial evidence alone may suffice [at trial].”) (citation omitted); Holland v. United States, 348 U.S. 121, 139-40, 75 S.Ct. 127, 137-38, 99 L.Ed. 150 (1954) (no “circumstantial evidence instruction” is required when the jury is properly instructed on burden of proof). In Tyler v. Bethlehem Steel Corp., 958 F.2d 1176, 1183 (2d Cir.), cert. denied, — U.S. -, 113 S.Ct. 82, 121 L.Ed.2d 46 (1992), the court of appeals reviewed the history of the direct-circumstantial distinction, concluding that “[i]f a jury can give equal or greater weight to circumstantial evidence, then requiring only ‘direct’ evidence to sustain a ... burden of proof is not only unhelpful, it is baffling.” 958 F.2d at 1184 (emphasis added).

Today, even non-lawyers, for whom trials are becoming daily television fare, understand that circumstantial evidence can be as persuasive and reliable as “direct” proof. See, e.g., DNA Tests Clear Man of Rape Nearly 8 Years After Conviction, N.Y. Times, Jan. 31, 1995, at B5 (defendant convicted after victim identified him in police photographs and two line-ups was released after DNA testing); Daniel Goleman, Studies Point to Flaws in Lineups of Suspects, N.Y. Times, Jan. 17, 1995, at C1 (eyewitness errors constitute largest single factor in wrongful convictions, based on review of 1,000 cases). See also, e.g., Edwin Borchard, Convicting the Innocent (1932); P. Wall, Eye-Witness Identification Testimony in Criminal Cases (1965); Elizabeth F. Loftus, Eyewitness Testimony, 81 Colum.L.Rev. 441 (1981) (book review).

Perhaps the appellate court, embarrassed either by the Guidelines’ excessive reliance on quantity as a surrogate for culpability or by the low burden of proof required at sentencing, see supra Part IVA, believes that a specific or direct evidence requirement offers defendants some necessary protection. See, e.g., Marc Miller & Daniel J. Freed, The Disproportionate Imprisonment of Low-Level Drug Offenders, 7 Fed.Sent.Rep. 3 (1994) (describing inequities in quantity-based sentencing scheme); Jon O. Newman, Five Guideline Improvements, 5 Fed.Sent.Rep. 190 (1993) (urging Sentencing Commission to “[a]bandon the current excessive reliance on the drug quantity table”). If so, it has taken a road that leads to less accurate fact-finding with haphazard over- and under-“protection” of defendants. The “specific evidence” rule does not guard against injustice. Instead, it introduces, in the language of the court of appeals, a “baffling” distinction.

The court of appeals’ statement that the sentencing judge relied on “surmise and conjecture,” Shonubi II, 998 F.2d at 90, is better understood as a statement that the chain of inferences leading to the quantity determination was impermissibly weak, or appeared so as summarized in Shonubi I. This memorandum, accordingly, provides a more comprehensive recounting of the sentencing judge’s analysis. The sections that follow contain: 1) a description of the information used (albeit not fully described) in reaching the initial sentencing decision; 2) a statement of the need for further information and analysis after remand; 3) a description of the evidence provided by the parties and 706 Panel on remand; and 4) a discussion of the appropriate uses of that evidence in reaching a decision.

V. Available sources of information

A sentencing judge relies heavily on evidence introduced at trial and post-trial proceedings. Such evidence, however, is not the judge’s only source of information. Common sense, knowledge of criminal behavior and of behavior generally, and real proof (notably, the demeanor of witnesses and parties) are also employed. These will be explored in turn.

A. Background knowledge

Background knowledge — also called common sense — has long been recognized as necessary in evaluating forensic proof.

[W]hen a juryman takes up his office his mind is already ... stocked with a vast number of commonplace generalizations about acts, attitudes, intentions, etc. ... Without this stock of information ... he could understand very little.

L. Jonathan Cohen, The Probable and the Provable 274-76 (1977); see also Michael & Adler, Real Proof, 5 Vand.L.Rev. 344, 365 (1952); United States v. Cortez, 449 U.S. 411, 418, 101 S.Ct. 690, 695, 66 L.Ed.2d 621 (1981) (recognizing that factfinders must reach “commonsense conclusions about human behavior”); David Binder & Paul Bergman, Fact Investigation 85 (1984) (describing use of generalizations — “vast storehouses of commonly-held notions about how people and objects generally behave” — in drawing inferences from facts); Richard O. Lempert & Steven A. Saltzburg, A Modern Approach to Evidence 990 n. 5 (2d ed. 1982) (“judicial notice” of common experience is an exception to the general requirement that evidence be formally offered and admitted). A judge is, in this respect, no different from a juror, except perhaps in the quantity and reliability of relevant experience on which he or she can draw and in awareness of possible biases that may require discounting.

B. Knowledge of the drug trade

In this case, the court was aided by its knowledge of the drug trade. “Judges in the district courts in the Second Circuit have had extensive experience with the nuances of the drug trade.” United States v. Ekwunoh, 813 F.Supp. 168, 178 (E.D.N.Y.1993), rev’d on other grounds, 12 F.3d 368 (2d Cir.1993); see also Reena Raggi, Local Concerns, Local Insights: Further Reasons for More Flexibility in Guideline Sentencing, 5 Fed.Sent. Rep. 306, 306 (1993) (“[F]ew judges have as broad an experience dealing with drug importation and large-scale distribution as my colleagues in the Eastern District of New York.”). Judges must use this knowledge in reaching decisions. See, e.g., United States v. Jessup, 757 F.2d 378, 386 (1st Cir.1985) (Breyer, J.) (knowledge of drug trade improves decision-making); Stein v. New York, 346 U.S. 156, 181, 73 S.Ct. 1077, 1091, 97 L.Ed. 1522 (1953) (the “jury and the trial judge—knowing local conditions, close to the scene of events, hearing and observing the witnesses and parties—have ... undeniable advantages over any appellate tribunal”). In this case, on remand, the sentencing judge confirmed his beliefs about the drug trade by consulting with his colleagues on the bench. See infra Part X.B.

C. Demeanor

“The tongue of the witness,” it has been said, “is not the only organ for conveying testimony.” Jerome Frank, Law and the Modern Mind 109 (1931). Given this obvious truth, jury instructions routinely state that consideration of demeanor is not merely permitted, but expected. Cf. Henriod v. Henriod, 198 Wash. 519, 524-25, 89 P.2d 222, 225 (1938) (to watch attitude and demeanor “is not only the right but also the duty of the trier of facts”); 3A Wigmore on Evidence § 946, at 783 (James H. Chadbourn rev. ed. 1970) (demeanor, “without any definite rules as to its significance, is always assumed to be in evidence”); Fed.R.Evid., art. VIII, Advisory Committee’s introductory note (hearsay rule) (“The demeanor of the witness traditionally has been believed to furnish trier ... with valuable clues.”); NLRB v. Dinion Coil Co., 201 F.2d 484, 487-90 (2d Cir.1952) (Frank, J.) (recounting the history of “demeanor evidence” from Roman times); Olin Guy Wellborn, Demeanor, 76 Cornell L.Rev. 1075 (1991) (summarizing social science data relating to potential uses of demeanor).

Demeanor includes facial expressions, body language, and such matters as a defendant’s forcefulness in consulting with an attorney during court proceedings. See, e.g., United States v. Schipani, 293 F.Supp. 156 (E.D.N.Y.1968) (judge, at a bench trial, confirmed conclusions about susceptibility to coercion by observing and assessing demeanor), aff'd, 414 F.2d 1262 (2d Cir.1969), cert. denied, 397 U.S. 922, 90 S.Ct. 902, 25 L.Ed.2d 102 (1970). Triers must decide cases on the basis of evidence, which is on the record, interpreted in light of demeanor, which is not.

A judge may consider the demeanor and actions of a person even when that person is not testifying. In Schipani, the defendant whose demeanor influenced the trial judge’s conclusions never took the stand. 293 F.Supp. at 163. The court of appeals agreed with the district court that “[t]he trier’s observation of the non-witness defendant’s demeanor and general appearance may be—and almost invariably is—considered by him in evaluating evidence introduced at the trial.” Schipani, 414 F.2d at 1268. The Supreme Court has also implicitly recognized that triers can rely on non-witness demeanor. See Stein, 346 U.S. at 181, 73 S.Ct. at 1091 (noting advantages of “observing the witnesses and parties ”) (emphasis added); United States v. Grayson, 438 U.S. 41, 50, 98 S.Ct. 2610, 2615-16, 57 L.Ed.2d 582 (1978) (“[O]ne serious study has concluded that the trial judge’s ‘opportunity to observe the defendant, particularly if he chose to take the stand in his defense, can often provide useful insights into an appropriate disposition.’ ” (quoting ABA Project on Standards for Criminal Justice, Sentencing Alternatives and Procedures § 5.1, at 32 (App. Draft 1968))). See also Michael & Adler, Real Proof, supra, at 365 (“Demeanor, whether that of a witness while testifying or of a person who is not giving testimony, is ... a kind of event ... which occurs in the presence of a jury and is sensibly apparent to it. Consequently, such an event need not be offered and cannot be excluded.”). Cf. United States v. Nichols, 56 F.3d 403, 412-13 (2d Cir.1995) (“We must ... defer to the judgment of the district court, which had the benefit of examining [the defendant] ... that [the defendant had] ... the ability to assist his lawyer in mounting his defense.”).

1. How judges use demeanor

Gauging credibility is the best known application of demeanor. Ironically, this may be one of its least effective uses. See, e.g., Victor Barall, Book Review, Thanks for the Memories: Criminal Law and the Psychology of Memory, 59 Brook.L.Rev. 1473, 1482 (1994) (“[T]he proverbial sweaty palms and averted gaze may be more indicative of language difficulties, cultural differences in etiquette or stowage than of fabrication.”); Jeremy A. Blumenthal, A Wipe of the Hands, A Lick of the Lips: The Validity of Demeanor Evidence in Assessing Witness Credibility, 78 Neb.L.Rev. 1157 (1993) (similar). Cf. Quercia v. United States, 289 U.S. 466, 53 S.Ct. 698, 77 L.Ed. 1321 (1933) (reversible error for trial judge to tell jury that wiping hands during testimony “is almost always an indication of lying”).

Other applications of demeanor may produce more reliable results. In Stein, supra, the issue before the trial court was whether the defendants’ confessions had been coerced. Demeanor was relevant to the susceptibility of the defendants to coercion. Likewise, in an action for an order of physical protection, no judge would ignore a threatening gesture made by the respondent on his way out of the courtroom. In such situations, behavior in the courtroom properly supports inferences about behavior out of court.

2. Demeanor and appellate review

As Professors Michael and Adler have observed:

In observing ... demeanor and in coming to a conclusion on that demeanor a judge does not use any approved formulas; nor does he practice any art; nor does he proceed in any scientific way. So there is no terminology that he may employ to explain to an appellate court what has taken place.

Michael & Adler, Real Proof, supra, at 102. Thus, while demeanor evidence must be used by the trier in evaluating evidence, it is generally unavailable to the court of appeals in reviewing that evaluation. The trial court cannot “transmit” demeanor to the court of appeals. At most, it can attempt to briefly describe that demeanor, and assume that the court of appeals will give its judgments adequate deference. It is desirable at times, for purposes of assisting appellate review, for the judge to describe demeanor that departs from that suggested by the transcript.

The opportunity, which exists in the trial court alone, to observe and assess credibility explains in part the reluctance of appellate courts to overturn findings of fact. See Olin Guy Wellborn, Demeanor, 76 Cornell L.Rev. 1075, 1077 (1991) (“The opportunity of the trier to observe the demeanor of the witnesses is a principal basis for the deference accorded by reviewing courts to factual determinations of trial courts....”); see also United States v. Pfeiffer, No. 94-1331, at 2, 50 F.3d 3 (2d Cir. Feb. 23, 1995) (unpublished) (“[D]eference is owed to the district court’s determinations with respect to competence when they are based on observation of the defendant during the proceedings.”). Since appellate courts cannot withhold “permission” to consider demeanor, they must allow for the existence of factors not discernible from the written record.

D. Assumptions about criminal behavior

Every sentencing judge receives daily instruction in criminology. He or she generally takes from this continuing education the strong belief that many criminals lack flexibility. Some will repeat the same crime, in the same way, until caught. Criminologists have long noted this tendency. See, e.g., Arnold Miles, How Criminals Are Caught 32 (1940) (tendency to form habits well-known to police); Charles E. O’Hara, Fundamentals of Criminal Investigation 597 (1970) (criminals are “loath to alter ... operational procedure”). This propensity to repeat criminal conduct combines two distinct phenomena. First, criminals, like many others, are creatures of “habit.” See generally, Don C. Gibbons, Society, Crime, and Criminal Careers: An Introduction to Criminology (1973). Second, criminals, like rational businesspeople, tend to adhere to methods that have worked. See Mark Moore, “The Pusher As a Rational Business Man,” in Crime in Society 716, 741 (Leonard D. Savitz & Norman Johnston eds., 1978) (describing criminals as choosing “portfolios of tactics” that maximize profits); C. Peter Rydell & Susan S. Everingham, Controlling Cocaine: Supply Versus Demand Programs 59 (Report of Rand Drug Policy Research Center, 1994) (“[S]uppliers have learned how to run their business more efficiently over time.”). See also discussion of economies of smuggling in Part X.C, infra.

The inflexibility of many criminals was described by Wigmore. See, e.g., J.H. Wigmore, The Principles of Judicial Proof or the Process of Proof as Given by Logic, Psychology, and General Experience 161 (2d ed. 1931) (“It is now well known to detective police officials that professional criminals tend to limit their misdeeds to special kinds and methods. In short, they develop a habit with respect to the crime”) (emphasis in original); id. at 161 n. 1 (“[This] fact has of course been known ever since the institution of organized police.”). Such observations underlie Federal Rule of Evidence 404, which permits introduction of evidence of prior acts for a number of purposes, including identity — that is, the likelihood that a crime was committed by a defendant known to have committed a similar crime in a similar manner in the past.

The reverse inference — that a defendant known to have committed previous crimes used the same method of operation on each occasion — is permissible in some cases. Every decision by a court allowing the introduction of evidence of prior crimes is an endorsement of the theory that criminals develop routine methods of operation. See, e.g., People v. Peete, 28 Cal.2d 306, 169 P.2d 924, cert. denied, 329 U.S. 790, 67 S.Ct. 356, 91 L.Ed. 677 (1946) (strikingly similar murders committed 20 years apart); id., 28 Cal.2d at 328, 169 P.2d 924 (Carter, J., dissenting) (arguing, unconvincingly, that the defendant would not have used the same “m.o.” that resulted in earlier arrest). Assumptions about modus operandi underlie much of the work of law enforcement and judicial personnel.

VI. How decision-makers learn and decide

A. Inferences based on prior information and training

Recently, Judge Frank M. Coffin applauded another judge’s “candid recognition of the judge’s ‘hunch’ ... [,] that intuitive flash of understanding which makes the jump spark connection between question and decision.” Frank M. Coffin, Judging Books, A.B.A. Judge’s J., Spring 1994, at 47 (reviewing Edward F. Hennessey, Judges Making Law (1994) (quoting Joseph Hutcheson, The Judgment Intuitive: The Function of the “Hunch” in Judicial Decision, 14 Cornell L.Q. 274, 278 (1929))). What is sometimes called a hunch is an important part of judging, given the need to reach initial determinations quickly. See 1A Wigmore on Evidence § 25, at 955-56 (Peter Tillers rev. ed. 1983) (time constraints make judging as much art as science). A trial judge’s “hunch” is generally based on the evidence and experience, albeit with the inferential chain unstated. As Justice Holmes recognized, “[m]any honest and sensible judgements ... express an intuition of experience which outruns analysis and sums up many unnamed and tangled impressions, ... which may be beneath consciousness without losing their worth.” Chicago, B & Q Ry. v. Babcock, 204 U.S. 585, 598, 27 S.Ct. 326, 329, 51 L.Ed. 636 (1907); see also Oliver W. Holmes, Book Review, 5 Am.L.Rev. 539, 540 (1871), reprinted in Holmes Uncollected Papers 90 (Shriver ed. 1936) (“[J]udges know how to decide a good deal sooner than they know why.”). Thus a hunch, when its underlying assumptions are examined and articulated, may be validated as a sound conclusion. As Justice Shientag observed 50 years ago:

During the course of a trial or the argument of a motion or an appeal, there is a constant interplay of intelligence, sensibility and intuition. When at the conclusion of a trial or argument the judge has what is called a “first impression,” that may be the result of quick reasoning by a disciplined mind; it may be an inference made so rapidly as to be almost instantaneous; it may be an intuition, depending for its strength on the background of learning and experience behind it; it may be shrewd, native horse sense; or it may be little more than a guess. The conscientious judge checks up his first impressions.

Bernard L. Shientag, The Personality of the Judge 74 (1944). Justice Shientag concluded:

[T]he term “judicial hunch” ... is misleading. ... The use of that term tends to underestimate the logical quality of the judge’s thought and judgment. ... Properly used and understood, it is a great aid to reflective reasoning, and instead of merely giving the judge an easy way out of his difficulty, it adds clarity and strength and validity to the conclusion he finally reaches.

Id. at 73-74.

B. Methods of reaching conclusions

In using available information to arrive at an estimate of the probability that a material proposition is true, a judge may employ a number of analytic methods. Several are described below. None in itself fully explains how the mind of the trier works in any case. Each, however, provides a useful partial view.

1. Classical step-by-step analysis

A trier’s conclusion — whether presented as the “hunch” or elucidated in a full opinion— utilizes observed evidence, draws inferences from that evidence based upon hypotheses predicated upon knowledge of the world (or supplied by experts), and results in an informed decision. Wherever practicable, a conclusion should be tested and explained. See, e.g., John H. Mansfield, Norman Abrams, Margaret E. Berger, et al., Cases and Materials on Evidence, ch. 1 (8th ed. 1988); James Bradley Thayer, Select Cases on Evidence at the Common Law (1892); Edward M. Morgan, John M. Maguire, et al., Cases and Materials on Evidence (4th ed. 1957); John M. Maguire, James H. Chadbourn, et al., Cases and Materials on Evidence (6th ed. 1973). In the language of classical forensic analysis, a line of proof necessarily consists of a series of minor and major premises and conclusions — a sequence of inferential steps ending in an estimate of the probability of a material (ultimate) fact. See, e.g., Jerome Michael & Mortimer Adler, The Nature of Judicial Proof (Colum.L.School 1931); 1A Wigmore on Evidence § 37.3, at 1026-30 (Peter Tillers rev. ed. 1983); Jerome Michael & Mortimer Adler, The Trial of an Issue of Fact: I and II, 34 Colum.L.Rev. 1224, 1252, 1462 (1934).

Except when provided by experts, major premises, or hypotheses, are generally based on background knowledge, which will vary from trier to trier. See, e.g., Peter Tillers, Webs of Things in the Mind: A New Science of Evidence, 87 Mich.L.Rev. 1225, 1226 (1989) (“[T]heory of relevancy and inference ... put[s] great emphasis on the role of experience and on generalizations based on experience.”).

The classical analysis of scholars such as Michael and Adler relies on a series of inferential steps, each utilizing a probans, a hypothesis, and a probandum to prove or disprove a material proposition. The analysis employs the following terminology:

Material proposition:

Statement about a matter of fact which is a specific example falling within the general class which is one of the elements of the applicable rule of law.

Evidence:

Perceptive objects — e.g., persons, things and events presented to the senses of the tribunal; all evidence must be perceived through senses.

Probandum:

Proposition being proved; if ultimate, a material proposition.

Probans:

Proposition being used to prove another; if ultimate, an immediate proposition (i.e., based on sense impression with minimal inference).

Evidentiary hypothesis:

General proposition employed as a probans.

Step of proof:

A syllogism containing a probandum, an evidentiary hypothesis, and a probans.

Line of proof:

Series of steps of proof ... ending in a material proposition.

Syllogism:

Minor premise (probans) → Major premise (general proposition, evidential hypothesis) → Conclusion (probandum).

John H. Mansfield, Norman Abrams, Margaret Berger, et al., Cases and Materials on Evidence 10 (7th ed., 1983) (with minor alterations); see also id. at 11 (illustrative diagram); Jerome Michael & Mortimer Adler, The Nature of Judicial Proof (1931); Jerome Michael & Mortimer Adler, The Trial of an Issue of Fact, 34 Colum.L.Rev. 1224, 1252 (1934) (using diagram for analysis); John Henry Wigmore, The Principles of Judicial Proof (2d ed. 1931).

Once it is established that a defendant made prior smuggling trips, as in this case, an intermediate probans could be: “The defendant smuggled drugs internally on trip number X.” The evidentiary hypothesis could be: “A drug smuggler carries as much as he can on each trip.” The probandum: “The defendant smuggled as much as he could on trip X.”

The next step would start with the probandum of the prior step as a probans. The evidentiary hypothesis here could be: “The defendant has demonstrated a capacity to swallow and deliver Y grams of heroin.” The conclusion or probandum would be: “On X trip, the defendant smuggled Y grams.” The material proposition under the Guidelines would constitute a summation of trips: i.e., Y1 + Y2 + ... + Y8. Since under that theory of the case the amounts are the same on each trip, the total would be 8Y. The validity of this analysis would depend upon the probability that the hypothesis that a smuggler carries the maximum possible amount on each trip reflects the real world situation. It also assumes that his capacity does not vary appreciably from trip to trip, i.e., that there is no learning curve (the “trip effect” discussed by the experts in Part X.A, infra).
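
By way of illustration only, the two steps of proof just described can be laid out schematically. The Python sketch below is not drawn from the record: the class and function names, the probability assigned to each evidentiary hypothesis, and the value chosen for Y are all hypothetical. It further assumes, as a simplification, that the steps are independent, so that the strength of the line of proof is the product of the step probabilities.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class StepOfProof:
    probans: str        # minor premise
    hypothesis: str     # major premise (evidentiary hypothesis)
    probandum: str      # conclusion
    probability: float  # hypothetical chance that the hypothesis holds


def line_strength(steps):
    """Multiply the step probabilities (assumes the steps are independent)."""
    return prod(step.probability for step in steps)


def total_quantity(grams_per_trip, trips):
    """Equal-amounts theory: Y1 + Y2 + ... + Y8 collapses to 8Y."""
    return sum(grams_per_trip for _ in range(trips))


line_of_proof = [
    StepOfProof(
        probans="The defendant smuggled drugs internally on trip X.",
        hypothesis="A drug smuggler carries as much as he can on each trip.",
        probandum="The defendant smuggled as much as he could on trip X.",
        probability=0.8,  # hypothetical
    ),
    StepOfProof(
        probans="The defendant smuggled as much as he could on trip X.",
        hypothesis="The defendant has demonstrated a capacity of Y grams.",
        probandum="On trip X, the defendant smuggled Y grams.",
        probability=0.9,  # hypothetical
    ),
]

Y_GRAMS = 400.0  # hypothetical value of Y, not a finding
TRIPS = 8

print(f"strength of the line of proof: {line_strength(line_of_proof):.2f}")
print(f"total under the equal-amounts theory: {total_quantity(Y_GRAMS, TRIPS)} grams")
```

The sketch makes one point visible: each additional inferential step with a probability below one weakens the line of proof, which is why the soundness of each evidentiary hypothesis matters.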

2. Bayesian and statistical analysis

A judge, in the position of factfinder, enters the courtroom in much the same position as a juror. Early on, he or she may form an opinion of the likelihood that a proposition is true. This perceived likelihood then rises and falls as each item of evidence is introduced. Despite admonitions not to “decide” until all of the evidence is in, triers tend to keep a running estimate of probabilities.

As Professor David Schum has noted:

[T]he structure of arguments we make from evidence [has] to undergo revision if we are to keep abreast of recognized changes in the world. Structural revisions ... involve changes in our hypotheses or possible conclusions, changes in our assumptions, premises and generalizations, and, of course, changes in the amount and kind of evidence we obtain.

1 David Schum, Evidence and Inference for the Intelligence Analyst 318 (1987), quoted in Peter Tillers, Webs of Things in the Mind: A New Science of Evidence, 87 Mich.L.Rev. 1225, 1250 (1989).

Bayes’ Theorem is useful in describing this dynamic process. Analysis begins with a “subjective probability” — the factfinder’s “opening” estimate of the likelihood of the end proposition. After each item of proof is introduced, this “subjective probability” is recalculated based on the degree to which the new evidence supports or contradicts the proposition to be proven. This recalculated probability is then treated as a “prior” assessment in processing a further item of proof, and so on. The calculation is repeated over and over, until all the items in a line of proof have been considered. The end result of all these calculations is called a “posterior probability” — a judgment accounting for each successive element of proof. See David H. Kaye, “What is Bayesianism?,” in Probability and Inference in the Law of Evidence: The Uses and Limits of Bayesianism 1 (Peter Tillers & Eric D. Green eds. 1988); Federal Judicial Center, Reference Manual on Scientific Evidence 386-87 (1994); Brian Skyrms, Choice and Chance: An Introduction to Inductive Logic (3d ed. 1986); Michael O. Finkelstein, Quantitative Methods in Law 87 (1978) (describing utility of Bayesian analysis in combining quantitative assessment with intuitive assessment of probabilities).
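
The running recalculation described above may be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how any trier actually computes: the function names are hypothetical, each item of proof is reduced to an assumed “likelihood ratio” (how much more probable the item is if the proposition is true than if it is false), and the items are treated as independent of one another.

```python
def update(prior, likelihood_ratio):
    """One application of Bayes' Theorem in odds form.

    posterior odds = prior odds * likelihood ratio
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def running_estimate(opening, likelihood_ratios):
    """Return the opening estimate followed by the estimate after each item.

    Each posterior is treated as the prior for the next item of proof.
    """
    estimates = [opening]
    for ratio in likelihood_ratios:
        estimates.append(update(estimates[-1], ratio))
    return estimates


# Hypothetical line of proof: two supporting items and one contrary item.
history = running_estimate(0.5, [3.0, 0.5, 4.0])
for step, probability in enumerate(history):
    print(f"after item {step}: {probability:.3f}")
```

A ratio above one raises the estimate and a ratio below one lowers it, which mirrors the rise and fall of perceived likelihood as evidence comes in. The last entry of `history` corresponds to the “posterior probability.”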

Building on the work of Professor Leonard Savage, who demonstrated that the rules of probability could be applied to “personal” or “subjective” judgments, Professor Michael Finkelstein and others have favored use of Bayesianism in legal decision-making. See Leonard J. Savage, The Foundations of Statistics (1st ed. 1954); Michael O. Finkelstein & William B. Fairley, A Bayesian Approach to Identification Evidence, 83 Harv.L.Rev. 489 (1970) (advocating use of Bayes’ Theorem in jury trials). They have convinced much of the current generation of evidence scholars that Bayes’ Theorem provides a helpful description of appropriate legal fact-finding techniques. See, e.g., Ward Edwards, Comment, The Society of Bayesian Trial Lawyers, 66 B.U.L.Rev. 937, 937 (1986) (Bayesian theory is “orthodoxy” among evidence professors); David H. Kaye, Apples & Oranges: Confidence Coefficients and the Burden of Persuasion, 73 Cornell L.Rev. 54, 54 (1987) (Bayesianism is “reigning theory”); see also The Evolving Role of Statistical Assessments as Evidence in the Courts 193 (Report of the Panel on Statistical Assessments as Evidence in the Courts) (Stephen E. Fienberg ed., 1989) (arguing that Bayesian analysis comports with the assumptions underlying Rule 401 of the Federal Rules of Evidence).

These ideas have traveled from the university to the courtroom as judges have observed that, among academicians, “Bayes’ Theorem ... is practically universally accepted as valid.” State v. Spann, 130 N.J. 484, 505, 617 A.2d 247 (N.J.1993) (applying Bayesian analysis to paternity determination); see also State v. Klindt, 389 N.W.2d 670 (Iowa 1986) (statistical evidence properly used, in conjunction with non-statistical evidence, to prove identity of mutilated body). But see State v. Skipper, 228 Conn. 610, 637 A.2d 1101 (1994) (criticizing application of Bayes’ Theorem in paternity hearing), discussed in Ronald J. Allen et al., Probability and Proof in State v. Skipper: An Internet Exchange, 35 Jurimetrics Journal 277 (1995).

The problem for the courts is not so much whether this type of analysis is valid, but how to make it comprehensible to the trier. See, e.g., Ronald J. Allen et al., An Internet Exchange, supra, at 277. Research on the utility of explicit Bayesianism in jury trials remains “in an incunabular phase.” D.H. Kaye & Jonathan J. Koehler, Can Jurors Understand Probabilistic Evidence?, 154 J. Royal Stat.Soc’y (A), 75, 80 (1991).

Professors Finkelstein and Fairley wisely advocate the application of Bayes’ Theorem primarily to cases in which a probability assessment based on non-mathematical evidence is modified by the later introduction of explicitly quantitative proof. In fact, they have argued that Bayes’ Theorem is most useful where statistical evidence is “added to the mix” after “particularistic” evidence has already been considered. In such situations, they believe, “Bayesian analysis would demonstrate that the evidentiary weight of an impressive figure like one in a thousand— which might otherwise exercise an undue influence — would depend on the other evidence in the case, and might well be relatively insignificant if the prior suspicion were sufficiently weak.” Finkelstein & Fairley, supra, at 514. If, by contrast, the probability of guilt based on the initial, non-statistical evidence is high,

jurors may be surprised at the strength of the inference of guilt flowing from the combination of their prior suspicions and the statistical evidence. But this, if the suspicions are correctly estimated, is no more than the evidence deserves.

Id.
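
The point can be illustrated with simple arithmetic. The Python sketch below is a hypothetical worked example, not a calculation taken from Finkelstein and Fairley: it assumes that the statistical evidence is certain to appear if the defendant is guilty and appears with a frequency of one in a thousand otherwise, and the two prior probabilities are invented for illustration.

```python
def posterior(prior, frequency_if_innocent, frequency_if_guilty=1.0):
    """Bayes' Theorem for a single item of statistical evidence.

    P(G | E) = P(E | G) P(G) / [P(E | G) P(G) + P(E | not G) P(not G)]
    """
    numerator = frequency_if_guilty * prior
    denominator = numerator + frequency_if_innocent * (1.0 - prior)
    return numerator / denominator


ONE_IN_A_THOUSAND = 0.001

weak_prior = 0.001   # hypothetical: very weak prior suspicion
strong_prior = 0.25  # hypothetical: substantial prior suspicion

print(f"weak prior   -> {posterior(weak_prior, ONE_IN_A_THOUSAND):.3f}")
print(f"strong prior -> {posterior(strong_prior, ONE_IN_A_THOUSAND):.3f}")
```

On these assumed figures, the same one-in-a-thousand statistic leaves the proposition at roughly an even chance when the prior suspicion is very weak, but brings it close to certainty when the prior suspicion is substantial.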

It is possible to accept the validity of Bayes’ Theorem as a guide to, and description of, decision-making, without attempting the complex calculations literal application of the Theorem would require. Bayes’ Theorem is useful, even without quantification, to explain how triers think. Cf. Allen, Burdens of Proof, supra, at 642 (“Bayes’ Theorem [is] the best available explanation of reasoning under the conditions at trial, but it imposes impossible computational demands.”); Cohen, Response, supra, at 91, 95 (Bayesian concepts should be used “heuristically”; focus should be “on the conceptualization rather than the mathematics of legal decision-making”). As utilized subconsciously by laypersons, or even consciously by judges, to structure their decision-making, Bayes’ Theorem is often not a mathematical formula but a heuristic device. Cf. John Maynard Smith, Life at the Edge of Chaos?, N.Y. Rev. Books, Mar. 2, 1995, at 28 (proposing use of “dynamics,” Darwinian theory requiring computation, in “loose and metaphorical sense”).

Professor Neil Cohen has developed a Bayesian gloss on confidence intervals as used by statisticians. He suggests that at least some judicial decision-making involves more than determining the most likely probability that an event has occurred. See Neil B. Cohen, Confidence in Probability: Burdens of Persuasion in a World of Imperfect Knowledge, 60 N.Y.U.L.Rev. 385 (1985). For example, in a preponderance setting, according to Professor Cohen, “[n]ot only must factfinders determine that their best estimate of the probability in question exceeds the threshold level — 0.5 ... — but they also must have a certain level of confidence that the true probability, based on all possible evidence, exceeds that threshold.” Id. at 399; see, e.g., DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 948 (3d Cir.1990) (proposing use of confidence intervals, as indicative of the “risks inherent” in decision-making), cert. denied, — U.S. -, 114 S.Ct. 691, 126 L.Ed.2d 658 (1994); DeLuca, 911 F.2d at 959 n. 24 (discussing relationship of confidence intervals to burden of proof). In O’Neal v. McAninch, — U.S. -, -, 115 S.Ct. 992, 994, 130 L.Ed.2d 947 (1995), the Supreme Court discussed the proper disposition of habeas cases in which the reviewing court is left in “grave doubt” about whether an error was harmless. The Court’s analysis suggests that this doubt is not the knife-edge of equipoise, but a “band” or interval of uncertainty.

Professor Cohen’s theory of “confidence intervals” may be useful in understanding decision-making in cases where a range of values rather than a precise figure is required, such as the Manville asbestos case, In re Joint E. & S. Dist. Asbestos Litig. (Findley v. Falise), 878 F.Supp. 473, 490 (E. & S.D.N.Y.1995) (discussing problem of projecting future claims), or Guidelines sentencing cases requiring estimation. However, some statisticians take issue with Cohen’s linking of Bayesianism and the concept of “confidence intervals”. Cf. Letter from Professor D.H. Kaye 2 (Apr. 30, 1995) (on file in the instant case) (disputing propriety of Bayesian interpretation of frequentist confidence intervals as used by statisticians).

Although Professor Cohen’s article on confidence intervals prompted a heated response from Professor Kaye, see Apples & Oranges, supra, their disagreement does not diminish the value to triers of their work. See Neil B. Cohen, Conceptualizing Proof and Calculating Probabilities: A Response to Professor Kaye, 73 Cornell L.Rev. 78, 81, 86 (1987) (acknowledging common ground); Kaye, Apples & Oranges, supra, at 77 (same). Professor Cohen’s belief that a court must sometimes find not only a “point probability,” but a range of most likely probabilities, the lower limit of which must satisfy the relevant burden of proof, is consistent with the trier’s obligation to maintain an oscillating doubt in tension and to appreciate the range of possible conclusions about a given set of facts.
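
The idea that the lower limit of a range, rather than the point estimate alone, must satisfy the burden of proof can be sketched as follows. The Python example is a simplified illustration only. It treats the trier’s estimate as if it were a sample proportion drawn from n observations and uses a two-sided normal-approximation interval, which is an assumption made here for convenience and not a method endorsed by Professor Cohen or Professor Kaye. The function names, the sample sizes, and the 95 percent confidence level are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lower_limit(estimate, n, confidence=0.95):
    """Lower end of a two-sided normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    standard_error = sqrt(estimate * (1.0 - estimate) / n)
    return estimate - z * standard_error


def satisfies_burden(estimate, n, threshold=0.5, confidence=0.95):
    """True only if the whole interval, not just the point estimate, clears the threshold."""
    return lower_limit(estimate, n, confidence) > threshold


# The same point estimate (0.6) resting on thin and on ample information.
for n in (20, 200):
    print(n, round(lower_limit(0.6, n), 3), satisfies_burden(0.6, n))
```

In both runs the point estimate exceeds the preponderance threshold of 0.5. Only the estimate resting on the larger body of information has a lower limit that does so as well, which is the distinction between a “point probability” and a range of most likely probabilities.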

3. Biases

A great deal of research has been done on how people think, remember, reason, and come to conclusions. Amos Tversky and Daniel Kahneman have offered descriptions of systemic biases in human decision-making. See, e.g., Amos Tversky & Daniel Kahneman, Judgment Under Uncertainty: Heuristics and Biases, 185 Science 1124 (1974). While their work needs to be applied with some skepticism in the courtroom setting, they do identify a number of common biases, including “availability” (the tendency to assess the frequency of an event based on the ease with which occurrences can be brought to mind) and “anchoring” (the tendency to stick too closely to initial estimates of probability, after contrary evidence has been introduced). Professors Tversky and Kahneman suggest that these “biases are not restricted to laymen,” but affect professional decision-makers as well. Id. at 1130. Factfinders who are aware of such biases can, they suggest, make valuable efforts at correction. Id.

Given the complexity of analysis in even the simplest cases, rational analysis of every strand of evidence is almost never possible. As Professor Richard Friedman points out: [W]e naturally batch hypotheses, [as] a way to make the problem intellectually tractable and the demand for information economically feasible....

Ronald J. Allen et al., An Internet Exchange, supra, at 295 (citing Richard Friedman, Infinite Strands, Infinitesimally Thin: Storytelling, Bayesianism, Hearsay and Other Evidence, 14 Cardozo L.Rev. 79 (1992)). See also, Norman Davies, The Misunderstood Victory in Europe, N.Y.Rev. Books 7, 8 (May 25,1995) (“It’s generally understood ... that we can only conceive of human activities by using collective names and stereotypes.... Without collective names, we could not process the otherwise unmanageable material.”).

Building partly on the work of Professors Tversky and Kahneman, Professors Michael J. Saks and Robert F. Kidd have assessed the role of heuristic biases in forensic decision-making. See Michael J. Saks & Robert F. Kidd, Human Information Processing and Adjudication: Trial by Heuristics, 15 Law & Soc’y Rev. 123 (1980-1981). In recognizing the fallibility of human fact-finders, they note:

[I]n many areas, decision aids [can be] developed to compensate for the fact that “... man’s cognitive capacities are not adequate for the tasks which confront him.” These aids range from the advice to engage in explicit calculation of probabilities, to decision analysis, to human/machine systems. We hope it is not unduly optimistic to suppose that the law could be another area in which less than reliable, less than accurate decision making can be identified and corrective strategies developed.

Id. at 134.

Trying to understand and correct for the biases of decision-makers is an essential part of trial practice. See, e.g., Donald E. Vinson, Jury Persuasion: Psychological Strategies & Trial Techniques (1993); Donald E. Vinson, Jury Trials: The Psychology of Winning Strategy (1986). Dr. Vinson, echoing the language of Professors Tversky and Kahneman, observes that jurors tend to be swayed by

relatively small numbers of anchoring ideas which they use to form their own understanding of a case. These ideas can be legal points, but they are often psychological or emotional issues which may not be immediately evident from a legal standpoint.

Id. at xiii (emphasis added).

Judges are subject to the same biases as jurors. Unlike jurors, however, judges tend to explain their decisions. Explanation, in turn, encourages analysis and efforts at bias correction. Biases cannot be ignored; rather, “fairness must take into account, and correct for, the judge’s sympathies and background.” Essay, Limits on Judges’ Learning, Speaking, and Acting: Part II Speaking and Part III Acting, 20 U.Dayton L.Rev. 1, 5 (1994).

4. Storytelling

One standard heuristic device is the tendency to weave discrete facts into stories that “explain” evidentiary data in light of the trier’s background knowledge. “[F]act-finders ... make ‘holistic’ judgments — they are persuaded by stories.... From that premise, some holists have argued that atomistic [presentations are] not only wasteful but ... counterproductive.” Terence Anderson & William Twining, Analysis of Evidence 168 (1991); cf. James L. Kainer, The Rationalist Tradition at Trial, 60 Fordham L.Rev. 1085, 1085 (1992) (distinguishing between “narrative holists” and “logical atomists”) (reviewing Anderson & Twining, supra); Peter Tillers, Mapping Inferential Domains, 66 B.U.L.Rev. 883 (1986); James S. Liebman, “Relevance and Scientific Proof Revisited: Probabilistic Proof and Decisionmaking,” in Second Supplement: The Rules of Evidence 5 (Fall 1994 course materials, Columbia University); James S. Liebman, “Psychological Perspectives on Lay Factfinding Under Conditions of Uncertainty: Heuristics and Holistics,” in Second Supplement, supra, at 236. This tendency influences trial practice. See Allen, Burdens of Proof, supra, at 642-43 (arguing that advocates prove stories, not facts, and that the preponderance standard “simply refers to which [story] is more plausible”); Michael E. Tigar, Examining Witnesses 5 (1993) (“[J]udges and jurors ... understand and restate events in terms of stories. If pieces are missing, they will fill in the gaps based on intuition.”).

In light of this inevitable tendency, lawyers are advised not to present items of proof that are “logically inconsistent with the [story].” Anderson & Twining, supra, at 166; see also James W. McElhaney, Putting the Case Together, A.B.A.J., March 1995, at 81, 85 (urging litigators to “[p]lan your story line”). Thus the danger is not only that decision-makers will ignore facts that do not fit their “story,” but that advocates will fail to present those facts in the first instance. Nevertheless, bias in favor of coherent stories can be helpful to the trier in reconciling disparate facts with his or her own experience. The judge, no less than the juror, properly seeks a coherent story.

One way to minimize bias is to attempt to analyze the same problem using both statistical and non-statistical techniques. For example, “[i]t is nearly always instructive for decision makers to compare their subjective impressions to data objectively summarizing the actual events they are deciding about.” Saks & Kidd, supra, at 139; see also Barbara D. Underwood, Law and the Crystal Ball: Predicting Behavior with Statistical Inference and Individualized Judgment, 88 Yale L.J. 1408, 1432 (1979) (describing differences between “clinical,” i.e. individual, and statistical modes of decision-making, and the biases that accompany each). As the Panel on Statistical Assessment as Evidence in the Courts has noted, the vividness of anecdotal evidence may outweigh statistical evidence in some situations, the specificity of statistical evidence may outweigh anecdotal evidence in other situations, and “[b]oth possibilities may occur ... within the context of a single case.” The Evolving Role of Statistical Assessments as Evidence in the Courts 154 (Report of the Panel on Statistical Assessments as Evidence in the Courts) (Stephen E. Fienberg ed., 1989). Applying both statistical and other forms of reasoning in one case can permit cross-checking and correction. The advantages of cross-checking in this case will be re-examined in Part XII.E, infra, after the available statistical material has been described in greater detail.

VII. Application of Law to Facts on Original Sentence

A. Evidence from trial

Evidence introduced at trial established that the defendant made eight heroin smuggling trips from Nigeria to the United States; the court of appeals found that these trips were “amply prove[n]”. Shonubi I, 998 F.2d at 89.

That the defendant is capable of carrying 103 balloons, containing approximately 430 grams of heroin, in his digestive tract for more than a day was also proven to the satisfaction of the court. (The jury was not required to make a quantity determination in finding the defendant guilty.)

It is also clear that Shonubi is not averse to using false documents and telling lies both in and out of court in an attempt to conceal his unlawful activities. The predisposition of the defendant — a mature person, set in his ways — to lie is unlikely to have appreciably changed between his first and final trips. Yet the connection between perjury and other criminal conduct varies from case to case; not every criminal lies under oath and not every liar is a criminal. The continuing controversy regarding Rule 609 of the Federal Rules of Evidence illustrates the doubt surrounding this “perjury therefore criminality” hypothesis. See West’s Federal Criminal Code and Rules 247-51 (notes following Fed. R.Evid. 609) (West 1993); Margaret A. Berger et al., Evidence, 1990 Amendment, 1987 Amendment, Initial Congressional Action, ¶¶ 609[01]-609[04] (1994). Invocations of the perjury hypothesis require skepticism by the trier. But cf. Samuel Yochelson & Stanton E. Samenow, The Criminal Personality 348 (1975) (describing habitual lying as concomitant of criminal behavior); Jeremy A. Blumenthal, Demeanor Evidence, 72 Neb.L.Rev. 1157, 1167 (1993) (“[A] defendant’s ‘truthfulness or mendacity while testifying on his behalf ... [is] probative of his attitudes toward society.’”) (quoting United States v. Grayson, 438 U.S. 41, 50, 98 S.Ct. 2610, 2616, 57 L.Ed.2d 582 (1978)) (alterations supplied).

In the instant case the defendant’s particular form of perjury suggests persistence, rigidity, and stoicism. These qualities, in turn, support an inference of uniform criminal conduct—in this case, the smuggling of the largest amount he believed he was capable of carrying on every trip. See discussion in Part VII.B, infra.

B. Demeanor and character

Both the trial and post-conviction proceedings provided the court with a clear view of the defendant’s demeanor. His decision to go to trial, his control of the defense through forceful conversations with his attorney, and his insistence on taking the stand—despite his counsel’s admonitions—revealed an extremely low level of risk aversion and an overconfidence in his own powers. It is not inappropriate to infer from these observations, which the sentencing judge could not have failed to make, that the defendant was reckless in other situations.

Carrying heroin internally is a highly risky endeavor. See United States v. Gonzalez-Rincon, 36 F.3d 859 (9th Cir.1994) (Kozinski, J., concurring) (“[T]he containers in which drugs are carried are not foolproof; packages have been known to rupture or leak, severely intoxicating or killing the carrier.”), cert. denied, - U.S. -, 115 S.Ct. 1323, 131 L.Ed.2d 203 (1995); Punit S. Ramrakha & Ian Burton, Drug Smuggler’s Delirium: Suspect Cocaine Intoxication in Travellers With High Fever and a Bizarre Mental State, 306 Brit.Med.J. 470 (1993); Sue Leeman, Many Nigerian Women Pay With Lives in Tricky British Drug Traffic, L.A. Times, Feb. 7, 1993, at A24. The risk of a burst balloon increases with the number of balloons swallowed.

Individuals’ risk-aversion has a significant effect on their behavior. See, e.g., Louis Kaplow, Rules Versus Standards: An Economic Analysis, 42 Duke L.J. 557, 605 (Dec.1992) (describing risk aversion); Peter Reuter, Robert MacCoun & Patrick Murphy, Money from Crime: A Study of the Economics of Drug Dealing in Washington, D.C. 111 (1990) (describing importance of risk-taking in facilitating the drug trade). Observations about the defendant’s attitude toward risk permit inferences about his activities as a smuggler.

Similarly, observations about the defendant’s consistency of behavior permit inferences about his criminal conduct. The views of commentators on the appropriate uses of “prior-act” evidence in criminal cases support this view. See Edward J. Imwinkelried, Undertaking the Task of Reforming the American Character Evidence Prohibition: The Importance of Getting the Experiment Off on the Right Foot, 22 Fordham Urban L.J. 285, 286-87 (1995) (drive to revise the American character evidence doctrine has been “fueled,” in part, by theory of “interactionism,” which holds that a person’s known behavior can be used to “forecast ... [his] behavior in analogous settings”); see also Susan Marlene Davies, Evidence of Character to Prove Conduct: A Reassessment of Relevancy, 27 Crim.L.Bull. 504, 516-17 (1991) (concluding, from social science literature, that even “[t]he most outspoken critic of trait theory has conceded that traits exist and that trait theorists ‘can predict many things about people at levels of confidence that are reasonable for various goals and purposes’”) (emphasis in original). These views are reflected in the “inclusionary approach” to the admission of prior-act evidence under the Federal Rules of Evidence. “Under the ‘inclusionary’ approach [of the Second Circuit] ..., ‘evidence of prior crimes, wrongs, or acts is admissible for any purpose other than to show a defendant’s criminal propensity.’ ” United States v. Lasanta, 978 F.2d 1300 (2d Cir.1992) (quoting United States v. Brennan, 798 F.2d 581, 589 (2d Cir.1986) (emphasis added), cert. denied, 490 U.S. 1022, 109 S.Ct. 1750, 104 L.Ed.2d 187 (1989)). See also United States v. Gelzer, 50 F.3d 1133, 1139 (2d Cir.1995) (“Rule 404(b) does not bar all ‘other crime’ evidence; it bars [such evidence] only to prove the propensity to commit the crime charged”). See Fed.R.Evid. 404 and notes following the Rule, West’s Federal Criminal Code and Rules 227-29 (West 1993); Margaret A. Berger et al., Evidence, ¶¶ 404[04], 404[08]-404[18] (1994). Such purposes commonly include knowledge, ability, and motive. Here, information about Shonubi’s eighth trip is evidence of knowledge (of swallowing techniques), ability (to perform those techniques), and motive (to maximize the economic benefits of each smuggling trip).

The sentencing judge has no alternative in sentencing but to try to assess the defendant’s character and danger to society based on all available information, including in-court observation. The ability to observe the defendant at trial and during argument of motions and at sentence over a period of days or weeks is useful, so long as the trier is careful not to overstate his or her limited capacity to predict the character of another person in the artificial courtroom setting. In this case, observations of the defendant supported the sentencing judge’s conclusion that, given Shonubi’s ability and character and the nature of the drug trade, he would have carried some 430 grams, or close to it, on each of his smuggling trips.

C. Knowledge of the drug trade: the trip effect

The defense has argued that, as a general proposition, amounts of heroin found on smugglers who are caught are greater than amounts usually carried. Several theories are advanced by the defense for why this might be true.

One theory is that the more a smuggler has swallowed, the more likely that smuggler is to be caught. If true, this would mean that seized quantities are greater, on average, than quantities that evade detection, and that the Customs Service data would, therefore, overstate average quantities carried. No available information supports this conclusion, nor does it comport with the sentencing judge’s observations. Neither Shonubi nor any other defendant who has come before the court has shown outward signs of internal heroin smuggling at the time he or she was arrested. There is also no reason to conclude that swallowers of large quantities tend to look more nervous than swallowers of small amounts.

A theory put forward by both the defense and the 706 Panel (see infra Part X.A.3) and worthy of more credence is that there is a learning curve—that smugglers adjust to carrying larger amounts of heroin on each successive trip. The underlying assumptions would be that swallowers carry only the amount of heroin they are comfortable carrying, and that the quantity they can comfortably carry increases over time.

It seems highly improbable that smugglers carry only the amount of heroin they are comfortable carrying. Who would be “comfortable” holding any number of heroin-filled condoms in his or her body, given the associated dangers? More likely, smugglers carry the amount their associates tell them it is possible to carry. That they have associates is clear. Even assuming Shonubi was working for himself—that is, assuming his testimony that he owned the heroin was true—he could not have operated without substantial assistance in the Nigerian heroin network. Although, according to some reports, 80 percent of the heroin seized at John F. Kennedy Airport in recent years has passed through Lagos, Nigeria, none of it was produced there. See Scott Kraft, A Gaping Gateway for Drugs: Corruptions and Lax Security Make Nigeria One of the World’s Top Transit Points for Cocaine and Heroin; Nightly, Couriers Fly off With Illegal Stashes—Many Bound for the United States, L.A. Times, Feb. 17, 1994, at A1 (describing shipment of heroin from the Far East to Lagos, and then from Lagos to the United States); United States Department of Justice, Bureau of Justice Statistics, A National Report: Drugs, Crime, and the Justice System 48 (1992) (describing Nigeria’s role as “transshipment location”).

It is highly unlikely that any smuggler can prepare to cross the Atlantic Ocean with large amounts of heroin in his digestive tract without assistance. The task of swallowing involves substantial preparation. First, the heroin must be purchased from underworld connections. Second, balloons must be filled and sealed in a way that minimizes air content; otherwise, they will expand during the plane’s ascent. Third, the balloons must be cleaned before swallowing, lest the small amounts of heroin spilled on the outside enter the swallower’s system. Fourth, the swallower must ingest the balloons, which may entail coating the stomach (with an ulcer medication) and the esophagus (with jelly). Suzanne Bilello & Adam Z. Horvath, ‘It’s a Risky Business’; Drug ‘Mules’ Desperate for Cash Endanger Their Lives, Newsday, Feb. 19, 1990, at 3 (describing swallowing techniques). Fifth, the smuggler must make his way to the airport, and onto the plane, without giving himself away. Since he cannot know exactly how many hours will elapse before he arrives in New York, he will want to swallow as close to boarding as practicable; this may require helpers, including, conceivably, airline and customs employees, in expediting movement through the airport. In this regard, it may be relevant that Shonubi was found with the name of a Nigerian customs official in his pocket. Each of these steps will require the assistance of others familiar with smuggling procedures.

A smuggler who is preparing for his or her first trip will be instructed and most likely controlled by experienced people. It is their institutional memory, skills, and practice, not the swallower’s comfort, that dictates how much he or she will carry. Cf. Nicholas Dorn, Karim Murji & Nigel South, Traffickers: Drug Markets and Law Enforcement 47 (1993) (describing hierarchical, commercially organized trafficking operations); Edwin H. Sutherland & Donald R. Cressey, Criminology 275 (1970) (describing tendency of criminals to share knowledge). If, in fact, it is difficult for a potential smuggler to swallow 103 balloons on the first try, there may be extensive practice sessions over a period of days. As one investigative reporter observed:

[d]rug bosses wrap little balls of heroin in condoms and feed 80 to 100 of them to each courier, often washed down with a thick green okra soup. Arrested couriers have told of pre-flight training sessions in which recruits were given oversized grapes to swallow. Those who gagged were rejected.

Joseph B. Treaster, Nigerian Connection Floods U.S. Airports With Asian Heroin, N.Y. Times, Feb. 15, 1992 at 1 (emphasis added); see also Bilello & Horvath, supra.

These accounts not only suggest that smugglers learn to carry large amounts before they take their first trip, but also that those who cannot quickly learn to swallow large amounts will not be useful in the drug trade. The supply of willing swallowers far outstrips demand. See, e.g., United States v. Ekwunoh, 813 F.Supp. 168, 178-79 (E.D.N.Y.1993) (describing lack of deterrent effect of U.S. drugs laws because of vast supply of willing smugglers), rev’d on other grounds, 12 F.3d 368 (2d Cir.1993); Bilello & Horvath, supra, at 3 (in Nigeria, which is “increasingly filled with people out of work and often desperate to get to America, there has been no shortage of people willing to take the risk of being caught — or killed”); Donal Conaty, Plight of the ‘Drug Mules,’ Irish Times, Nov. 25, 1992, at 13 (to mules, payment represents “a small fortune”); see also Jessica C. Vapnek, Portrait: The Ghanaian Drug Smuggler, The Accra Weekly Advertiser, Mar. 7, 1994, at 10 (describing impotence of smugglers in dealings with drug kingpins).

The defense theory that the economics of heroin smuggling create an incentive for smugglers to carry more heroin with each successive trip (which implies that, on an appreciable number of trips, they will carry less than the maximum possible amount) is inconsistent with the available data. In fact, the economics and sociology of heroin smuggling suggest an incentive to smuggle as much heroin as possible on every trip.

D. Storytelling analysis

The defendant’s story is that while earning $12,000 a year and paying for college he took eight trips to Nigeria, in a period of fifteen months, to visit his family. Unfortunately for him, there is another story that is more believable, considering all of the available data. That story is that the defendant paid or was paid for each of his eight trips by smuggling as much heroin as he could carry. This story is consistent with evidence interpreted in light of demeanor, as well as the court’s knowledge of the drug trade, its understanding of criminal behavior, and its background knowledge.

Based on all of these sources of information, the sentencing judge inferred that the defendant, who was capable of carrying some 430 grams on each trip, was encouraged, ordered, or had a strong economic incentive to do so on every trip, beginning with his first trip. Thus the court concluded that, by a preponderance of the evidence, the best estimate of the total quantity of heroin imported was 427.4 times 8.
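The arithmetic behind that estimate can be set out explicitly. The following is a minimal illustrative sketch in Python and is not part of the court's analysis; the function and variable names are hypothetical, while the per-trip figure of 427.4 grams and the count of eight trips are taken from the text above.

```python
# Illustrative arithmetic only; all names here are hypothetical.
GRAMS_PER_TRIP = 427.4  # per-trip quantity used in the estimate above
TRIPS = 8               # number of smuggling trips established at trial


def estimated_total_grams(grams_per_trip: float, trips: int) -> float:
    """Multiply a uniform per-trip quantity by the number of trips."""
    return grams_per_trip * trips


total = estimated_total_grams(GRAMS_PER_TRIP, TRIPS)
print(f"{total:.1f} grams")  # prints "3419.2 grams"
```

The sketch assumes the same quantity on every trip, which is the inference the sentencing judge drew; it does not model any variation among trips.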

VIII. Desirability of Further Analysis

Having conducted, although not fully described, the above analysis, the sentencing court believes that its original conclusion satisfied the preponderance standard. On remand, however, the court has elected not only to explain its rationale for the initial sentence, but to consider additional factors. It does so for at least three reasons:

First, given the operation of the Guidelines, a sentencing judge’s analysis need not produce a specific figure. For example, a finding that the defendant carried just over 1,000 grams is, for Guidelines purposes, the equivalent of a finding that he carried just under 3,000 grams. This sentencing scheme — encouraging estimation and discouraging over-quantification — suggests that the court should, whenever possible, determine the most likely range, rather than a specific number.
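The point about ranges can be illustrated with a short sketch. It is a hypothetical illustration in Python, not a reproduction of the Guidelines table: the only boundaries used are the 1,000 and 3,000 gram figures mentioned above, the treatment of a quantity falling exactly on a boundary is an assumption, and the function names are introduced solely for the example.

```python
from bisect import bisect_right

# Only the two boundaries mentioned in the text are used. The actual
# Guidelines table contains other levels that are not reproduced here.
BAND_EDGES_GRAMS = [1000.0, 3000.0]


def band_index(grams: float) -> int:
    """Return the band a quantity falls in (0 = below the first edge).

    Assumption: a quantity exactly on an edge belongs to the higher band.
    """
    return bisect_right(BAND_EDGES_GRAMS, grams)


def same_band(a: float, b: float) -> bool:
    """True when two quantity findings fall within the same band."""
    return band_index(a) == band_index(b)


print(same_band(1001.0, 2999.0))  # True: both lie between the two edges
print(same_band(2999.0, 3001.0))  # False: the findings straddle an edge
```

On this view, what matters is on which side of a boundary a finding falls, not the precise figure within a band.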

Second, the sentencing judge must take account of potential sources of error, including the possibility that the amount of heroin seized from Shonubi was misweighed; the possibility that he made fewer than eight trips; the possibility that some of his trips involved lesser or greater amounts; and the possibility that not every trip involved attempted or completed smuggling. Such doubts must be resolved in the defendant’s favor.

Third, the sentencing judge may take note of the possibility of heuristic pitfalls, including “availability,” which would suggest to the trier that every trip involved the amount known from the one trip that is familiar, and “anchoring,” which would lead to the persistence of early decisions. These possibilities support the court’s decision to seek additional data, to use supplementary and complementary analytical approaches, and to reconsider its original conclusions.

On remand, the government properly sought additional methods of meeting its burden of proof, and pursued one of those methods with the help of a statistician. This new evidence, as well as responses by defense and court-appointed experts, assisted the court in the process of revisiting its initial decision.

IX. The General Federal Rule Favoring Admissibility Of and Reliance on All Helpful Evidence

Before addressing the issue of the courts’ ability to rely upon the statistical data proffered on remand, it is desirable to examine the strong federal policy favoring admissibility of, and reliance on, all helpful information. This policy supports not only admissibility but openness and receptivity to developing analytical techniques. A fortiori, this inclusive approach applies to sentencing proceedings, where exclusionary rules of evidence do not apply. See, e.g., Federal Rule of Evidence 1101(d)(3).

The successive compilations and codifications of the American law of evidence in the nineteenth and twentieth centuries collectively demonstrated a rejection of rigid, mechanistic rules, which often varied by type of case, in favor of general principles of free admissibility and rational evaluation of probative force that can be flexibly applied through judicial discretion to ensure fairness in individual cases. Evidence law, through the nineteenth century, was a mix of complicated, confusing, and often contradictory common law rules developed to protect against juror prejudices and incompetence. As Professor Edmund Morgan explained in his foreword to the American Law Institute’s Model Code of Evidence in 1942:

The rules of evidence have been developed in myriads of cases, wherein the later judges have felt themselves bound by the doctrine of stare decisis to adhere to the pronouncements of their predecessors but bound also to avoid the absurdities which the simple application of these pronouncements would produce. In attempting to escape this dilemma they have engrafted qualifications, refinements and exceptions upon the earlier rules, so that the law of evidence has grown irregularly and in haphazard fashion, one rule seeming to have no relation in reason to another. If an observer confines his view to a single compartment of the subject, he may not be shocked by the sight of the creature he finds therein; it may be curious but it will probably have some semblance of unity or uniformity. If, however, he breaks down the partitions between the compartments, he is amazed that anyone should contemplate turning into a single arena such diverse and antagonistic creatures.

Edmund M. Morgan, Foreword to American Law Institute, Model Code of Evidence 5 (1942) [hereinafter Morgan].

Responding to problems with the common law, several scholars in the late nineteenth century developed influential compilations of the common law that advanced normative views as to what the law of evidence should be. Perhaps most influential among these scholars were Professors James Bradley Thayer and John Henry Wigmore, whose divergent ideologies shaped all subsequent debate on the desirable scope and contours of evidence codes.

Thayer asserted that as the law “furnishes no test of relevancy,” the goals of adjudication would be best served by recognition of a few broad principles of admissibility and exclusion of all nonrelevant evidence. Wigmore, in contrast, sought to advance a comprehensive, scientifically precise evidentiary scheme that would anticipate every possible scenario, but whose potential harshness would be mitigated by application of judicial discretion. With the Model Code of Evidence, the Uniform Rules of Evidence and finally the Federal Rules of Evidence, Thayer’s conception of evidence law as a collection of flexible, broad evidentiary principles applied liberally by trial judges possessing discretion prevailed.

A. Mechanistic rules versus flexible general principles

Justifications for an evidence code comprised of rigid, mechanistic rules, or for a other forms of analysis, derive from historical circumstances which no longer apply. Perhaps the most significant such circumstance was the desire to protect jurors from their own prejudices:

Excluding information on the ground that jurors are too ignorant or emotional to evaluate it properly may have been appropriate in England at a time when a rigid class society created a wide gap between royal judges and commoner jurors, but it is inconsistent with the realities of our modern American informed society and the responsibilities of independent thought in a working society.

1 Margaret A. Berger et al., Preface to Evidence at iii (1994) [hereinafter Berger et al., Evidence ]; see also James Bradley Thayer, Select Cases on Evidence at the Common Law 1 (1892) [hereinafter Thayer, Select Cases ] (“ ‘Reasoning, the rational method of settling disputed questions, is the modern substitute for certain formal and mechanical tests which flourished among our ancestors for centuries, and in the midst of which the trial by jury emerged.’” (quoting Thayer, “Law and Fact” in Jury Trials, 4 Harv.L.Rev. 147, 157 (1890))); Thayer, Select Cases, supra, at 3 (“ ‘[T]he rejection of what is really probative, on one or another practical ground, ... is the characteristic thing in the law of evidence, marking, as it does, the influence of the jury system which gave rise to it.’” (quoting Thayer, Presumptions and the Law of Evidence, 3 Harv.L.Rev. 141-47 (1889))); John Henry Wigmore, The Principles of Judicial Proofs (2d ed. 1931) (distinguishing “Proof” from “Admissibility,” the latter of which is comprised of “procedural rules devised by the law, and based on litigious experience and tradition, to guard the tribunal (particularly the jury) against erroneous persuasion”).

The justifications for a more flexible approach based on a few general principles with great discretion in the trial court are, briefly stated: 1) the impossibility of foreseeing and accounting for every potential fact scenario and associated evidentiary issue; and 2) the need for sufficient judicial power to permit tailoring in individual cases to ensure fair outcomes. In rejecting an approach to codification of the rules of evidence that would have “canvass[ed] all the situations in which pertinent questions have been answered by the courts,” the drafters of the American Law Institute’s Model Code of Evidence explained that “no draftsmen or body of draftsmen, be they ever so wise and ever so ingenious, could possibly foresee all the new situations or all the variations of old situations which will be presented in the future.” Morgan, supra, at 12. In his foreword to the Model Code, Professor Morgan explained:

[I]t must never be forgotten that in the settlement of disputes in a court room, as in all other experiences of individuals in our society, the emotions of the persons involved — litigants, counsel, witnesses, judge and jurors — will play a part. A trial cannot be a purely intellectual performance.

Id. at 4.

The experience of judges in busy, metropolitan districts further supports utilization of a flexible system that turns on judicial discretion:

Those unfamiliar with the operations of a busy metropolitan federal district court cannot fully appreciate the subtle interplay of a wide variety of factors that make a trial of any length take on a personality as strongly differentiated from other trials as one human being is from another. The character and style of the attorneys, the jurors individually and collectively, the judge, and the witnesses all have their effect.

1 Berger et al., Evidence, supra, at iv. The unique factors — the individual attributes of the lawyers, jurors, and judge — that converge in a given case require that the evidence law have sufficient flexibility to ensure that fairness is not subordinated to or lost in blind adherence to rules. See id. ¶ 401[01] (“[A] mechanical formula cannot reasonably be applied because it is impossible to formulate a rule covering all the particular circumstances and factors that may arise.”).

B. Development of twentieth century conceptions of evidence codes

Differences in the works of James Bradley Thayer and John Henry Wigmore, two of the leading turn-of-the-century evidence theorists, reflect competing views of the law of evidence.

The preface to Thayer’s 1892 evidence casebook, Select Cases on Evidence at Common Law, signalled unequivocally Thayer’s belief that the future of the law of evidence lay in rejecting outdated mechanistic systems of evidence in favor of broad principles of relevancy.

‘Observe at this point one or two fundamental conceptions. There is one precept to be mentioned, which is not so much a rule of evidence as a presupposition involved in the very conception of a rational system of evidence as contrasted with the old formal and mechanical systems; viz., that nothing which is not supposed to be relevant, i.e., logically probative, shall be received. How are we to know what these things are? Not by any rule of the law. The law furnishes no test of relevancy. For this, it tacitly refers to logic, assuming that the principles of reasoning are known to its judges and ministers; just as a vast multitude of other things are assumed as already sufficiently known.’

Thayer, Select Cases, supra, at 2 (quoting Presumptions and the Law of Evidence, 3 Harv.L.Rev. 142-47); see also 1 Berger et al., Evidence, supra, ¶ 401[01] (“Thayer ... realized that ‘the law furnishes no test of relevancy’ and that the judge in making determinations must be allowed flexibility in drawing on personal experience to evaluate the probabilities on which relevancy turns.” (quoting Thayer, A Preliminary Treatise on Evidence at the Common Law 278-79 (1898))). Thayer’s second presupposition was that “ ‘unless excluded by some rule or principle of law, all that is logically probative is admissible.’ ” Thayer, Select Cases, supra, at 2 (quoting Presumptions and the Law of Evidence, 3 Harv.L.Rev. 142-47). Fundamentally, according to Thayer, “[a]dmissibility is determined, first, by relevancy, — an affair of logic and not of law; second, but only indirectly, by the law of evidence which, in strictness, only declares whether matter which is logically probative is excluded.” Id. at 3 (quoting Presumptions and the Law of Evidence, 3 Harv.L.Rev. 142-47).

So confident was Thayer in the approach he advocated that he asserted that any common law conflicting with presuppositions identified by him should be eliminated or modified:

“These rules of exclusion have had their exceptions; and so the law has come into the shape of a set of primary rules of exclusion; and then a set of exceptions to these rules.... And yet, while this is historically true, the main propositions which I have stated should, in the order of thought, be first laid down and always kept in mind. If the doing of this shall require a restatement of some material parts of the law of evidence, that, perhaps, will only turn out as it should.”

Id. at 3 (quoting Presumptions and the Law of Evidence, 3 Harv.L.Rev. 142-47).

John Henry Wigmore, in contrast to Thayer, sought to advance a system of precise and clear rules to govern any evidentiary issue that might arise. In pursuit of that goal, he developed an influential multi-volume treatise on the topic. Perhaps ironically, the length, intricacy and complexity of Wigmore’s treatise was cited by his opponents to support their contention that a rigid, rule-based system was unworkable. Cf. Morgan, supra, at 4-5 (“When reminded that it takes Mr. Wigmore nine volumes of text to expound the intricacies of the subject, [most lawyers] are likely to shrug their shoulders and dismiss the matter with shrewd remarks about law book publishers or uncomplimentary observations concerning law professors or legal theorists.”).

Wigmore’s perspective is evident in the terminology that he adopted to explain his own work. In 1909, he published the first edition of Wigmore’s Code of the Rules of Evidence in Trials at Law. The preface to that edition explained that it was intended “to provide the practitioner with a handy summary of the existing rules of Evidence; and at the same time to state them in a scientific form capable of serving as a code.” John Henry Wigmore, Preface to the First Edition of Wigmore’s Code of the Rules of Evidence in Trials and at Law, reprinted in John Henry Wigmore, Wigmore’s Code of the Rules of Evidence in Trials and at Law at xi (2d ed. 1935) (emphasis added). He referred to his own work as a “book of rules,” id., a collection of “principles of scientific statement” identified by the author, id. at xii, to be mastered by hypothetical specialist attorneys Wigmore termed “masters of the rules.” Id.

Professor Wigmore recognized in his first edition of the code that blind application of a precise set of rules could potentially lead to undesirable results. He saw the solution to the problem not in the nature or number of the rules themselves, but in creating room for judicial flexibility in their application:

A closing word about ‘technicalities,’ which nowadays are a subject of reformatory consideration. Evidence ought to have rules knowable before trial; and those rules ought to be fairly precise. That much may easily be conceded by all. But a rule need not be inherently a steel-clad formula. The evil nowadays is that nevertheless we treat it so, — a defect due in part to our traditional attitude towards statutes, and in part to our modern American attitude towards judges. A formulated rule tends unwholesomely to be the judge’s tyrannous master, not his ministrant tool. What the system of Evidence needs is, not so much another set of rules, or fewer rules, as a judicial flexibility of rules.

Id. at xiii.

With the publication in 1942 of the third edition of Wigmore’s Code of the Rules of Evidence in Trials at Law, Wigmore asserted his continuing antagonism to generalities and abstractions in the law of evidence, and his belief that concrete and precise rules were required. He identified six “Postulates” to control the formulation of his Code. Significantly, the Sixth Postulate, entitled “Details,” stated:

This Code, aiming as it does to serve as a practical guide in trials, must not be content with abstractions, but must specifically deal with all the concrete rules, exemplifying the application of an abstraction, that have been ruled upon or enacted in a majority of jurisdictions; the Code specifically either repudiating or affirming these rules.
If the objection be made that the law of Evidence should no longer remain a network of petty detailed rules, the answer is, first, that both Bench and Bar need concrete guidance in order that a normal routine be ordinarily followed for speedy dispatch without discussion; secondly, that the Bar needs them in order to prepare evidence for trial along normal expected lines; and thirdly, that the really effective way to eliminate the present frequent overemphasis on detailed concrete rules is to provide that they shall be only guides, not chains — directory, not mandatory — by forbidding the review of the Trial Court’s application of them except in extreme instances.

John Henry Wigmore, Wigmore’s Code of the Rules of Evidence in Trials at Law, at xii (3d ed. 1942) (quotation marks omitted). To be fair, the third edition also reaffirmed Wigmore’s view that evidence rules “need not be inherently ... steel-clad,” that “[a] formulated rule tends unwholesomely to be the judge’s tyrannous master, not his ministrant tool,” and that “judiciary flexibility” was key in any evidence system. Id. at xiii. Moreover, the third edition, even as revised, remains a cache of sparkling examples that are both fascinating and instructive on the need for imagination and flexibility by lawyers and trial judges.

While the above-quoted language might suggest some openness on Professor Wigmore’s part to flexibility within the application of the rules of evidence, Wigmore’s experience in drafting the Model Code of Evidence, as described below, indicates the steadfastness of his view that the rules themselves should be precise and seek to anticipate every eventuality. His view is not without current adherents. The revisors of the rules of civil procedure and of evidence provided relatively simple guidelines which are gradually being replaced by more restrictive rules. See, e.g., Federal Rules of Evidence, Rule 412, dealing with sex crimes (added Pub.L. 95-540, § 2(a), Oct. 28, 1978, amended Pub.L. 100-690, Title VII, § 7046(a), Nov. 18, 1988); see also Edward J. Imwinkelried, Undertaking the Task of Reforming the American Character Evidence Prohibition: The Importance of Getting the Experiment Off on the Right Foot, 22 Fordham Urban L.J. 285, 302 (1995); Violent Crime Control and Law Enforcement Act of 1994, Pub.L. No. 103-322, § 320935, 108 Stat. 1796, 2135-2136 (1994) (to be codified as Federal Rules of Evidence 413, 414); but see Judicial Conference of the United States, Report of the Judicial Conference on the Admission of Character Evidence in Certain Sexual Misconduct Cases, 871 F.Supp. cxxv, cxxv (Feb. 9, 1995) (“urg[ing] Congress” to reject the new rules); James Joseph Duane, The New Federal Rules of Evidence on Prior Acts of Accused Offenders: A Poorly Drafted Version of a Very Bad Idea, 157 F.R.D. 95 (1994); James S. Liebman, Proposed Evidence Rules 413 to 415—Some Problems and Recommendations, 20 U.Dayton L.Rev. 753 (1995) (suggesting alternatives to proposed rules).

C. Model Code of Evidence

In the American Law Institute’s view, “what was needed was a thorough revision of existing law.” ALI, Introduction to Model Code of Evidence, supra, at viii (1942). According to Edmund Morgan’s foreword to the Code, the method finally adopted as the “Theory of Drafting the Code” was a compromise, leaning in Thayer’s direction:

to draw a series of rules in general terms covering the larger divisions and subdivisions of the subject without attempting to frame rules of thumb for specific situations and to make the trial judge’s rulings reviewable for abuse of discretion. This leaves to the trial judge much room for the exercise of a sound judgment; it does not hamper him with detailed restrictions, and tends to discourage useless appeals. In short ... the choice is between a catalogue, a creed, and a Code. The Institute decided in favor of Code.

Id. at 13.

The drafters of the Model Code explicitly acknowledged the debt that they owed to Professor Thayer’s theory of trial court discretion.

How much weight shall be given to evidence is a question of judgment incapable of a priori determination. The Anglo-American law has, as Professor Thayer has said, no mandate to the logical faculty. It does, in a few instances, refuse to receive one class of evidence where an obviously better one is available; but it always concerns itself with admissibility as distinguished from weight. In any rational inquiry, logically irrelevant evidence is excluded, and in judicial inquiries some logically relevant material may be either absolutely or conditionally rejected. A code of evidence should concern itself primarily with admissibility, and in this respect it should be complete in itself. Consequently it should begin with a sweeping declaration that all relevant evidence is admissible.... Then it should set up specific exceptions to this fundamental rule. The Code follows this plan.

Id. at 11.

Explaining the desirability of flexibility in the Code, Morgan’s foreword rejected, though not by name, Wigmore’s view that “there is, and there must be, a probative science — the principles of proof — independent of the artificial rules of procedure.” See Wigmore, Principles of Judicial Proof, supra, at 3. Morgan explained:

Thoughtful lawyers realize that a lawsuit is not, and cannot be made, a scientific investigation for the discovery of truth. The matter to be investigated is determined by the parties. They may eliminate many elements which a scientist would insist upon considering. The court has no machinery for discovering sources of information unknown to the parties or undisclosed by them.... If the data leave the mind of the trier in equilibrium, the decision must be against the party having the burden of persuasion. No scientist would think of basing a conclusion upon such data so presented. The court is not a scientific body.... [S]ociety and the litigants must be content with a rather rough approximation of what a scientist might demand.

Morgan, supra, at 3-4.

While scientific investigation of truth might be impossible, Morgan explained that a “rational” investigation of truth was within reach, and that it relied on liberal rules of admissibility:

All this is not to say that the rules for conducting the investigation of the facts cannot be, or need not be, rational. Quite the contrary. In such a setting it is especially important that artificial barriers to logically persuasive data be removed. With such a tribunal acting in such circumstances the exercise of superlative psychological powers is not to be expected. Rules calling for nice intellectual discriminations and unusual intellectual and emotional controls are impossible of application. Speaking generally, the tribunal should hear and consider those data which reasonable men confronted with the necessity of acting in a matter of like importance in their everyday life would use in making up their minds what to do.

Id. at 4.

The ALI’s Model Rules were further simplified, generalized, and stripped down by the Commissioners on Uniform State Laws (1954). The Uniform Rules — adopted in New Jersey and Kansas — were the prototype for the work of the drafters of the Federal Rules of Evidence. 1 Berger et al., Evidence, supra, at vi (1994). The Federal Rules of Evidence then provided the pattern for the rules of evidence adopted by the majority of states. See, e.g., id. ¶¶ 101[03], 101[04].

D. Federal Rules of Evidence

The Federal Rules of Evidence “provide[d] a new basis for growth by sweeping away much of the case-law encrustations, paving the way for innovations responsive to new forces and events.” 1 Berger et al., Evidence, supra, at v; cf. Daubert v. Merrell Dow Pharmaceuticals, ___ U.S. ___, ___, 113 S.Ct. 2786, 2794, 125 L.Ed.2d 469 (1993) (identifying ways in which the Federal Rules’ broad approach to admissibility supersedes common law doctrines).

Federal Rule of Evidence 401, which defines “relevant evidence,” is the primary example of Thayer’s ongoing influence in this respect. Rule 401 defines relevant evidence as “evidence having any tendency to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence.” One evidence treatise explains that “the drafters of Rule 401 endorsed Thayer’s view ... which recognizes that relevancy issues are sui generis and must be decided on a case by case basis.” 1 Berger et al., Evidence, supra, ¶ 401[10]. A comparative analysis of Rule 401 and its predecessors demonstrates the drafters’ conscious efforts to eschew mechanical tests of relevancy in favor of an open-ended approach:

Rule 401 is silent as to what factors the court must consider in determining whether the requisite standard of probability is met. This silence emphasizes that the Federal Rules reject Wigmore’s theory that legal precedents determine relevancy, in favor of Thayer’s view that “the law furnishes no test of relevancy.” Both the Model Code of Evidence and the Uniform Rule[, predecessors to the Federal Rules,] use the phrase, ‘evidence having any tendency in reason’ in defining relevant evidence. The Committee eliminated this succinct phrase as “perhaps emphasizing unduly the logical process and ignoring the need to draw upon experience or science to validate the general principle upon which relevancy in a particular situation depends.”
In an earlier internal working draft of Rule 401, relevant evidence was defined in terms of “evidence having any tendency on the basis of logic and experience or technical or other specialized knowledge....” The elimination of any reference to a basis for determining relevance underscores the conclusion that, in the final analysis, relevancy depends on the individual trial judge’s evaluation of probability in the individual case.

1 id. ¶ 401[08] (citations omitted); id. ¶ 401[01] (quoting Thayer, A Preliminary Treatise on Evidence at the Common Law 278-79 (1898)) (“Thayer ... realized that ‘the law furnishes no test of relevancy’ and that the judge in making determinations must be allowed flexibility in drawing on personal experience to evaluate the probabilities on which relevancy turns. Rule 401, by furnishing no standards for the determination of relevancy, implicitly recognizes that questions of relevancy cannot be resolved by mechanical resort to legal formulae.”); see also Edmund M. Morgan et al., Basic Problems of State and Federal Evidence 167 (5th ed. 1976); Margaret A. Berger et al., Basic Rules of Relevancy in the Proposed Federal Rules of Evidence, 4 Ga.L.Rev. 43, 60-61 (1969) [hereinafter Berger et al., Basic Rules]; James William Moore, Federal Practice ¶ 43.02(3) (2d ed. 1968), quoted in Berger et al., Basic Rules, supra, at 70 (explaining “the rationale for considering ‘relevancy and materiality as the only universal rules of evidence’ ”).

Federal Rule of Evidence 402 adopts Thayer’s formulation. As described above, Thayer identified two broad presuppositions that underlie all evidence law: 1) “ ‘nothing which is not supposed to be relevant, i.e., logically probative, shall be received’ ”; and 2) “ ‘unless excluded by some rule or principle of law, all that is logically probative is admissible.’ ” Thayer, Select Cases, supra, at 2 (quoting Presumptions and the Law of Evidence, 3 Harv.L.Rev. 142-47); see also Berger et al., Basic Rules, supra, at 69. Rule 402 adopts these presuppositions:

All relevant evidence is admissible, except as otherwise provided by the Constitution of the United States, by Act of Congress, by these rules, or by other rules prescribed by the Supreme Court pursuant to statutory authority. Evidence which is not relevant is not admissible.

Fed.R.Evid. 402. See generally Berger et al., Basic Rules, at 69-70.

Rule 403 is another example of how the Federal Rules of Evidence sought to build flexibility into the evidentiary scheme. It is the mechanism by which the significant discretionary role of the trial judge in admitting evidence is recognized and given effect. The Rule provides that otherwise probative evidence may be excluded where there is “danger of unfair prejudice [or] confusion of the issues.” Fed.R.Evid. 403. The language of earlier versions of Rule 403 was carefully drafted to respond to concerns of those who “believe[d] mechanical nondiscretionary rules of evidence can be drafted,” and it therefore distinguished between “mandatory” and “discretionary” exclusions. Berger et al., Basic Rules, supra, at 83. See generally id. at 79-82. The version that was finally enacted did not make this distinction, preferring to leave such balancing entirely to judicial discretion. See generally 1 Berger et al., Evidence, supra, ¶ 403[02]. Rule 403 enables judges to avoid the unfairness that would result from “mechanical applications of legal formulae”:

In applying these rules, the trial judge wields substantial power. Although the sphere in which he may properly act is not boundless, it is necessarily broad and incapable of exact delineation. Our system of justice recognizes uniqueness of circumstances, as well as uniqueness of the individual. To avoid the harsh, unjust results that might flow from mechanical applications of legal formulae, judges must be granted discretion: that is, they must be permitted flexibility in dealing with situations involving numerous special factors.

Berger et al., Basic Rules, supra, at 109.

The Federal Rules of Evidence do provide more specific guidance with respect to some evidentiary issues that recur so frequently that “one can generally assume that a certain item of evidence will be considered relevant.” 1 Berger et al., Evidence, supra, ¶ 401[03]. “[I]t is ... true, as Wigmore stressed, that some types or categories of facts appear on a recurring basis in the case law to prove the same or similar issues. The Federal Rules of Evidence single out a number of areas for specific evidentiary treatment in Rules 404 through 412.” 1 id. ¶ 401[10]. These provisions represent an exception to the general principle that each case is so unique that evidence rules must be sufficiently flexible to ensure just outcomes in individual cases.

E. Recent developments

The Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals, ___ U.S. ___, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993), is a recent illustration of the continuing preference for broad general principles favoring admissibility and openness to new scientific information and analytic methods. Daubert rejected the Frye “general acceptance” test for admissibility of scientific evidence, which predated the Federal Rules by “half a century,” as an example of common law superseded by the Federal Rules of Evidence. See Daubert, ___ U.S. at ___, 113 S.Ct. at 2794. The Court emphasized instead the role of the trial judge in “ensuring that an expert’s testimony both rests on a reliable foundation and is relevant to the task at hand.” ___ U.S. at ___, 113 S.Ct. at 2799. See generally The Effect of Daubert on the Work of Federal Trial Judges, Shepard’s Expert & Sci. Evidence Q., Summer 1994, at 1; McCulloch v. H.B. Fuller Co., 61 F.3d 1038, 1042 (2d Cir.1995) (“In Daubert, the Supreme Court ... conclud[ed] that Frye’s rigid standard was inconsistent with the liberal thrust of the Federal Rules.”).

It is with this central discretionary role of the judge in controlling the flow of evidence in mind that the sentencing court considers the additional evidence presented on remand.

X. Additional Material Available to Sentencing Judge On Remand

A. Experts’ reports

1. Government expert

On remand the government argued that “there is a 99% probability that the total weight of heroin imported by Shonubi during the seven trips ... equalled more than 2090.2 grams.” Government’s Memorandum of Law on Re-Sentencing Issues (June 6, 1994), at 4-5 [hereinafter Government Memorandum]. It based this conclusion on an analysis conducted by Dr. David Boyum. Dr. Boyum, who holds a doctorate in public policy and a degree in applied mathematics from Harvard, lectures at Harvard’s John F. Kennedy School of Government and serves as a Research Fellow in the Department of Neurobiology of the Harvard Medical School.

Dr. Boyum obtained United States Customs Service data on 117 Nigerian heroin swallowers arrested at Kennedy Airport between September 1, 1990 and December 10, 1991 (the dates of the first and last trips shown on Shonubi’s passport). For each swallower, the data included name, age, sex, and gross weight of heroin seized (i.e., the weight of heroin plus balloons). Deducting the estimated weight of the balloons — based on DEA averages—Dr. Boyum produced a list of 117 net weights. These ranged from 42.156 grams to 1225.45 grams. Dr. Boyum distributed these numbers in 100-gram ranges (commonly known as bins). See Table 1.

Table 1
Net weight in grams per internal smuggling trip known to DEA agents at Kennedy Airport (September 1, 1990 to December 10, 1991)
Net weight in grams    Number of occurrences
0-100                  1
100-200                7
200-300                13
300-400                32
400-500                31
500-600                21
600-700                6
700-800                1
800-900                2
900-1000               2
1000-1100              0
1100-1200              0
1200-1300              1

Chart A presents these data in the form of a bar graph.

CHART A

Over 99 percent of the seizures involved 100 grams of heroin or more. The mean net weight was 432.1 grams and the median 414.5. The government noted that “Shonubi’s importation of 427.4 grams of [a heroin mixture] on December 10, 1991, falls squarely within the mean/median range of the data.” Government Memorandum at 7.
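As a rough check on these summary figures, the binned counts in Table 1 can be tallied directly. The sketch below is illustrative only. It works from the 100-gram bins rather than from the 117 underlying weights, which are not reproduced in this opinion, and its use of bin midpoints to approximate the mean is an assumption of the sketch, not a description of Dr. Boyum's method.

```python
# Counts from Table 1, keyed by the lower edge of each 100-gram bin.
TABLE_1 = {
    0: 1, 100: 7, 200: 13, 300: 32, 400: 31, 500: 21, 600: 6,
    700: 1, 800: 2, 900: 2, 1000: 0, 1100: 0, 1200: 1,
}

total_seizures = sum(TABLE_1.values())

# Share of seizures falling in bins that start at 100 grams or above.
share_100_plus = sum(n for low, n in TABLE_1.items() if low >= 100) / total_seizures

# Grouped-data approximation (ASSUMPTION): treat every seizure as sitting
# at the midpoint of its bin, i.e. lower edge + 50 grams.
approx_mean = sum((low + 50) * n for low, n in TABLE_1.items()) / total_seizures

print(total_seizures)
print(round(share_100_plus, 3))
print(round(approx_mean, 1))
```

The midpoint approximation comes out close to the reported mean of 432.1 grams, which suggests that the table and the summary statistics are consistent with one another.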

Dr. Boyum then set out to see what, if anything, he could learn about the total weight of heroin imported by Shonubi on his first seven trips. He utilized a computer program to “simulate” 100,000 possible sets of seven trips. The computer was first “fed” a list of the 117 net weights. It was then instructed to choose seven of those numbers at random, and to calculate their sum. This produced a total weight for a simulated seven-trip series. The computer was instructed to perform this operation 100,000 times. The “output” consisted of 100,000 seven-trip totals. Graphed, these figures produced a bell-shaped distribution shown by Chart B.

CHART B

Explaining Dr. Boyum’s use of a computer program, the government stated that “[w]ith the simulations of 100,000 trips, ... the ‘law of averages’ insures that the influence of chance is negligible.” Government Memorandum at 16-17. The government went on to define the law of averages as

an axiom in statistics that states that the more times one repeats an event that has a random outcome, the closer one gets statistically to the true probability distribution of the outcome.

Id. at 17 n. 6. For example, the more times one flips an evenly-weighted coin, the more likely it is that the cumulative results will approach 50-50.
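The coin-flip example can be sketched in a few lines. This is an illustration under stated assumptions only: the fair coin is simulated with Python's standard pseudo-random generator, and the function name and fixed seed are hypothetical choices made so that the run is repeatable.

```python
import random


def heads_share(flips, seed=0):
    """Share of heads in `flips` simulated tosses of an evenly weighted coin."""
    rng = random.Random(seed)  # fixed, arbitrary seed so runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


# The share of heads tends to settle near one half as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, heads_share(n))
```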

From the generated data, Dr. Boyum reached the following conclusions:

According to the generated distributions, there is a 99% chance that Shonubi carried at least 2090.2 grams of heroin on the seven trips combined; there is a 95% chance that he imported more than 2341.4 grams; a 75% chance that he carried more than 2712.6 grams; and a 55% chance that he carried more than 3039.3 grams.

Government Memorandum at 8.
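The procedure described above, together with the way the percentage statements are read off the simulated totals, might be sketched as follows. This is not Dr. Boyum's program. The 117 weights are hypothetical stand-ins placed at the midpoints of the Table 1 bins, so the output will not reproduce the figures quoted from the Government Memorandum. Whether the seven numbers are drawn with or without replacement is exposed as a parameter, and its default here is an assumption of the sketch.

```python
import random

# HYPOTHETICAL stand-in for the 117 net weights: each seizure is placed at
# the midpoint of its Table 1 bin. The real Customs figures are not used.
TABLE_1 = {0: 1, 100: 7, 200: 13, 300: 32, 400: 31, 500: 21, 600: 6,
           700: 1, 800: 2, 900: 2, 1200: 1}
WEIGHTS = [low + 50 for low, n in TABLE_1.items() for _ in range(n)]


def simulate_totals(weights, trips=7, runs=100_000, with_replacement=True, seed=0):
    """Return `runs` simulated totals, each the sum of `trips` random weights."""
    rng = random.Random(seed)
    if with_replacement:
        return [sum(rng.choices(weights, k=trips)) for _ in range(runs)]
    return [sum(rng.sample(weights, trips)) for _ in range(runs)]


def exceeded_with_probability(totals, p):
    """A simulated total that at least a share `p` of the runs meet or exceed."""
    ordered = sorted(totals)
    return ordered[int((1 - p) * len(ordered))]


totals = simulate_totals(WEIGHTS)
for p in (0.99, 0.95, 0.75, 0.55):
    print(p, exceeded_with_probability(totals, p))
```

Read this way, a statement such as "a 99% chance that he carried at least X grams" means that 99 percent of the simulated seven-trip totals were X or more.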

Had the government chosen to rely on a bare preponderance standard (i.e., 50 percent-plus probability), it could have argued that over 3,000 grams were smuggled on the seven trips. See id. (55 percent probability that Shonubi carried more than 3039.3 grams on the seven trips). With the 430 grams seized from Shonubi, this would have resulted in a total close to that found by the court at the initial sentence. The Government was, however, content to advance the more conservative view that approximately 2,500 grams (2090 + 430) were smuggled.
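The arithmetic behind these totals is simple addition and is set out below for convenience. The variable names are hypothetical; the figures are those quoted above, with the seizure shown both at the 427.4 grams noted by the government and at the rounded 430 grams used in the text.

```python
SEIZED_EXACT = 427.4    # grams seized on December 10, 1991
SEIZED_ROUNDED = 430    # rounded figure used in the text

# Simulated seven-trip totals quoted from the Government Memorandum.
AT_99_PERCENT = 2090.2
AT_55_PERCENT = 3039.3

conservative_total = round(AT_99_PERCENT + SEIZED_EXACT, 1)
preponderance_total = round(AT_55_PERCENT + SEIZED_EXACT, 1)
rounded_total = 2090 + SEIZED_ROUNDED

print(conservative_total, preponderance_total, rounded_total)
```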

In sum, the government argued, “reliance on [Dr.] Boyum’s statistical analysis, which is based upon relevant data ..., meets [the government’s] burden of proof [as to 2,500 grams].... ” Government Memorandum at 18.

2. Defense expert

To evaluate Dr. Boyum’s study, the defense retained Professor Michael Finkelstein, a distinguished attorney and law professor who is prominent for his work in applying statistical analysis to legal problems. Professor Finkelstein has taught at Harvard, Yale, and Columbia Law Schools, lectured widely, served on the National Academy of Sciences’ Committee on Statistical Assessments as Evidence in Courts, and written standard works on applications of statistics to the law. Professor Finkelstein reviewed Dr. Boyum’s analysis. He proffered two objections.

First, according to Professor Finkelstein, Dr. Boyum’s study dealt entirely with the range of quantities carried by different smugglers (“interperson variation”) but did not account for the range of quantities carried by one smuggler during his or her smuggling career (“intertrip variation”). Such variation could occur, according to Professor Finkelstein, for a variety of reasons. “For example, ... one might reasonably believe that [Shonubi began his smuggling career] with smaller amounts and increased the amount carried as he became more practiced. ... Or trips on which less is carried may more frequently escape detection.” Affidavit of Michael O. Finkelstein at 4 (Nov. 4, 1994). According to Professor Finkelstein, an analysis of intertrip variation was required not only by appropriate statistical practice, but by the court of appeals’ mandate: “By refusing to accept the assumption that the amounts of heroin brought in on successive trips were the same, the Court [of Appeals] ... determined that intertrip variation cannot be ignored.” Id.

To account for this “trip effect,” Professor Finkelstein suggested using multiple regression analysis as a method of exploring the relationship between trip number and quantity carried. See, e.g., Daniel Rubinfeld, “Reference Guide on Multiple Regression,” in Reference Manual on Scientific Evidence 415-69 (1994) (describing multiple regression analysis as “a statistical tool for understanding the relationship between two or more variables”); Barbara S. Meierhoefer, The Role of Offense and Offender Characteristics in Federal Sentencing, 66 S.Cal.L.Rev. 367, 392-94 (1992) (using regression analysis to analyze effect of race on sentencing under the Guidelines).

Professor Finkelstein noted that such an analysis could be performed using the government’s data only “if that data included the trip number at which [each] arrest was made.” Finkelstein Aff. at 4. Such data, he suggested, would enable him to develop a more accurate picture of Shonubi’s conduct over a series of trips. Id. at 5.
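The kind of analysis Professor Finkelstein had in mind can be illustrated in its simplest form, a least-squares line relating quantity carried to trip number. The sketch below is an assumption-laden illustration, not his proposed model. The five data points are invented for the example because no trip-number data were in the record, and a true multiple regression would add further explanatory variables.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# HYPOTHETICAL data for illustration only: trip number at arrest, net grams.
trip_number = [1, 2, 3, 4, 5]
net_grams = [300, 350, 420, 450, 520]

intercept, slope = fit_line(trip_number, net_grams)
print(intercept, slope)
```

A positive slope in real data of this kind would be consistent with a trip effect; the invented figures here show only how the calculation works.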

Professor Finkelstein’s second objection was broader. Referring to the language of the court of appeals mandate in the instant case, he wrote:

[S]tatistics relating to others would not usually be characterized as “specific evidence” relating to Shonubi.... [I]f one Shonubi trip is not a sufficient basis for inferring what was brought in on other Shonubi trips, then, a fortiori, trips by others could not be sufficient to make that inference.... Dr. Boyum’s analysis is not an adequate basis for estimating the total amount brought in by Shonubi in the seven trips made prior to his arrest.

Finkelstein Aff. at 3-5. Professor Finkelstein concluded: “I do not question Dr. Boyum’s calculations, but only [their] application to Shonubi.” Id. at 3.

3. Rule 706 Panel

The “Panel” appointed by the court pursuant to Rule 706 consisted of Professors David Schum and Peter Tillers. Professor Schum, who teaches information technology at George Mason University, has published widely on probabilistic reasoning and its relationship to forensic decision-making. Professor Tillers is an evidence professor at Cardozo Law School who has lectured and taught at many universities in the United States and abroad. He has published extensively on such subjects as inferential reasoning, relevancy, and the uses of statistics in litigation. Professors Schum and Tillers reviewed the Boyum and Finkelstein reports.

The Panel began its analysis by noting that under the Sentencing Guidelines, Shonubi’s conduct “falls ... within base offense level 32 ... if the aggregate weight of the heroin imported by Shonubi in his eight trips is at least 1,000 grams.... Hence ... it may not matter whether the heroin imported by Shonubi weighs 1,000 grams, 1,500 grams, 2,000 grams, or 2,500 grams.” Report of Rule 706 Panel at 6 (Dec. 24, 1994) [hereinafter Panel Report]. “Stated most starkly, [the issue is whether] Shonubi imported a total of only 572.6 grams of heroin (or more) in his seven drug-smuggling trips before December 10, 1991.” Id. at 7.
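The 572.6-gram figure follows from subtracting the 427.4 grams seized on the eighth trip from the 1,000-gram minimum stated by the Panel. The short sketch below, with hypothetical variable names, also expresses the remainder as an average over the seven earlier trips.

```python
LEVEL_32_MINIMUM = 1000.0   # grams, the minimum stated by the Panel
EIGHTH_TRIP = 427.4         # grams seized on December 10, 1991

# Quantity the seven earlier trips would have to supply in total.
needed_from_prior_trips = round(LEVEL_32_MINIMUM - EIGHTH_TRIP, 1)

# The same quantity expressed as an average per earlier trip.
average_per_prior_trip = needed_from_prior_trips / 7

print(needed_from_prior_trips, round(average_per_prior_trip, 1))
```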

To the Panel, this observation was relevant to the question of how useful statistics could be to the court, since

it is intuitively obvious that the resolution of a question about the sufficiency of evidence to show the importation of a particular quantity of heroin may depend on just how much heroin must be shown to have been imported. For example, if the government were obligated to show only that Shonubi imported an aggregate of one gram of heroin in his seven drug-smuggling trips ... it is doubtful in the extreme that anyone would question the sufficiency of the evidence in the record....

Id. at 8.

a. Use of fictions

The Panel disputed the government’s claim that, “[a]ccording to [its] generated distributions, there is a 99% probability that Shonubi carried at least 2090.2 grams of heroin on the seven trips combined.” Affidavit of David A. Boyum at 5 (June 3, 1994). According to the Panel,

there is no such thing as “the” probability distribution for Shonubi or for anyone else.... There is an infinity of possible probability distributions that might be applied in the present case. Each of these probability distributions rests on different sets of assumptions.

Panel Report at 25-26. Moreover, according to the Panel, since

[t]here were only 117 data points provided by the U.S. Customs Service.... Boyum’s 100,000 cases are all fictitious.... Boyum could have generated 100,000 trillion numbers and yet have come no closer to determining “the” probability distribution regarding Shonubi.

Id. at 22 (emphasis in original).

The Panel did not rule out the use of statistical models that accord with real-world patterns:

What we see in Boyum’s study is an effort to compare the behavior of an individual (Shonubi) with the functioning of a computer in generating fictitious episodes of drug smuggling. Fictions can be useful if they are sensible and if they are used with appropriate caution.... When courts ... use ... simulation evidence they have to decide the extent to which any given simulation succeeds in capturing matters that are relevant to a determination of matters such as Shonubi’s probable past behavior.

Id. at 29-30. Dr. Boyum’s simulation, according to the Panel, “rests on various assumptions. If the assumptions make no sense, the simulation cannot represent how the events of interest [occur] in ... ‘real life.’ ” Id. at 16.

b.Problems with government’s assumptions

The Panel advanced two concerns about Dr. Boyum’s assumptions. First, in each simulation of seven trips Dr. Boyum’s computer chose the seven numbers independently. (This was done “with replacement,” the equivalent of simulating a game of poker by generating cards at random without accounting for the cards already removed from the deck.) The Panel recognized the appeal of this “independence assumption,” which can greatly simplify the statistician’s work, but concluded that the assumption was improper in this case. According to the Panel, “it is hard to believe that for Shonubi or any other balloon-swallower there is no dependence at all among the amounts ingested on successive trips.” Id. at 23-24. The Panel noted: “[D]ependencies are especially likely to be found in problems involving a sequence of similar human actions.” Id. at 24 (emphasis added); cf., e.g., Terence Anderson & William Twining, Analysis of Evidence 427 (1991) (independence so simplifies the statistician’s task that there is “a certain temptation to assume independence, even when this is not justifiable”).
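
The independence assumption the Panel criticized can be illustrated with a minimal sketch. It is not Dr. Boyum’s program: the function names are assumptions made for illustration, and the weights are hypothetical stand-ins, not the 117 Customs Service figures. Each simulated trip is drawn from the observed weights with no regard to what was drawn for the other trips, and a low percentile of the simulated totals is then read off.

```python
import random

def simulate_seven_trip_totals(observed_weights, n_sims, trips=7, seed=0):
    """Simulate total grams over `trips` trips, each trip drawn independently."""
    rng = random.Random(seed)
    # rng.choices samples WITH replacement: a weight already drawn stays
    # available, so no draw depends on any earlier draw.
    return [sum(rng.choices(observed_weights, k=trips)) for _ in range(n_sims)]

def lower_percentile(values, fraction):
    """Return the value below which roughly `fraction` of the values fall."""
    ordered = sorted(values)
    return ordered[int(fraction * (len(ordered) - 1))]

# Hypothetical weights in grams, for illustration only (NOT the Customs data).
hypothetical_weights = [50.0, 150.0, 250.0, 350.0, 450.0, 550.0]

totals = simulate_seven_trip_totals(hypothetical_weights, n_sims=10_000)
floor_99 = lower_percentile(totals, 0.01)  # about 99% of simulated totals are at least this
print(floor_99)
```

A model of the kind the Panel had in mind would instead let each draw depend on the trips before it, which this sketch deliberately does not do.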

One such dependency could be the trip effect, relied on by the defense expert. See supra Part X.A.2. The Panel agreed with Professor Finkelstein that “[i]t is unlikely that the man who swallowed ... [the highest of the 117 DEA figures in Table 1] was on his first trip.” Panel Report at 37. It went on to suggest that smugglers would learn to carry more on each trip: “[I]t is likely that balloon swallowing is a gastronomic ‘art’ that may improve over time.” Id. at 37.

The second type of dependence discussed by the Panel is the “interperson effect.” The Panel noted:

it may not make any sense to assume that there is complete independence from swallower to swallower.... [I]t is reasonable to assume that some of the swallowers know each other and compare notes about their experiences with their risky behavior. Perhaps there are groups of couriers who work from the same [wholesalers].

Id. at 24.

c. Comments on defendant’s report

The Panel agreed with Professor Finkelstein that data about trip number would have been helpful. Given such data, instead of performing a regression analysis, as Professor Finkelstein proposed, the Panel suggested calculating the standard deviation for each such smuggler. From those numbers, one could calculate the average standard deviation for all 117 smugglers. This would provide a measure of intertrip variation that could, the Panel suggested, be useful in approximating Shonubi’s behavior prior to the trip that led to his arrest.
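
The measure the Panel described can be sketched as follows, assuming per-trip data of the kind that was not in fact available. The smugglers and weights below are hypothetical, the function name is illustrative, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not say which the Panel intended.

```python
from statistics import mean, stdev

def average_intertrip_sd(trips_by_smuggler):
    """Average, across smugglers, of each smuggler's standard deviation in grams per trip."""
    # A standard deviation needs at least two trips; single-trip smugglers are skipped.
    per_smuggler = [stdev(w) for w in trips_by_smuggler.values() if len(w) >= 2]
    return mean(per_smuggler)

# Hypothetical per-trip weights in grams, for illustration only.
hypothetical_trips = {
    "smuggler_A": [200.0, 300.0, 400.0],  # amounts rise from trip to trip
    "smuggler_B": [350.0, 350.0, 350.0],  # amounts constant across trips
}

print(average_intertrip_sd(hypothetical_trips))
```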

d. Non-statistical analysis

In the absence of statistics or field data, the Panel set out to learn as much as it could about balloon-swallowing techniques through its own efforts:

[Professor] Schum therefore conducted a kind of experiment. He distributed slightly less than a pound of powdered sugar in roughly equal amounts among 103 ... balloons .... It took Schum three hours to fill the 103 balloons, one at a time, using a funnel.... He began to think that at least some of the persons who swallow balloons must have developed a [method] for filling them faster.... Schum drew the [further] inference that balloon swallowers must have figured out that any air in the balloons will expand to some extent, even in a well-pressurized aircraft [and must therefore have developed methods for expelling air from the balloons before sealing them]. Schum tied the ends of each balloon. There may be other ways of closing balloons once they are filled. Since some spillage is inevitable, it seems likely that the balloons are washed before they are swallowed. Otherwise, the swallower might ingest a fair amount of heroin ... that had been accidentally deposited on the outside of the balloons.

Id. at 46-47.

After examining the 103 balloons and reflecting on the task of swallowing them, Professor Schum came to believe that there must be a “learning curve” involved.

[I]t seems very unlikely that anyone could swallow 103 balloons containing nearly a pound of any ... substance on his or her first try.... [There may be] an extended series of practice trials using other substances.

Id. at 48. The Panel also undertook an analysis of the likely incentives for smuggling greater quantities of heroin with each successive trip:

The evidence at trial suggests that 427.4 grams ... of 53% heroin — the ... amount ... Shonubi was carrying when he was apprehended — has a wholesale value of $44,000. Cut in purity, this represents 20,000 doses whose total street value is $200,000 (at $10 a bag).... The dollar amounts ... suggest ... that if a swallower’s payoff depends on the amount he carries on a single trip, he has an incentive to try to increase the amount of heroin he carries in his stomach. However, ... if the swallower were paid a fixed fee per trip, he or she might have less of an incentive to try to swallow more balloons on each successive trip. (The swallower bears the physical risk and discomforts (in addition to the risk of apprehension) and ... it is reasonable to assume that the risks and discomfort increase as the amounts ingested increase.) ... [T]he payoff ... is likely to be structured in a way that encourages the swallower to swallow as much heroin as possible.

Id. at 50-52.

e. Statistical analysis

Unable to obtain additional data and uncomfortable with “fictions,” the Panel attempted to provide a useful statistical model without performing simulations. It began by looking at the distribution among weight ranges in the customs service statistics. The modal (i.e., most common) weight range, it observed, was 300-400 grams; 32 of the 117 reported seizures fell within this range. Panel Report at 39. Given the likelihood, as it assumed, that these seizures were from smugglers who had made different numbers of past trips, this “modal weight range of 300-400 grams of heroin might be taken as a range of heroin weight typical across people who had taken different numbers of prior trips.” Id. Using the lower end of this weight range, to give Shonubi the “benefit of the doubt,” the Panel postulated that he carried 300 grams on each of 7 trips, for a total of (7 × 300) + 427.4 = 2527.4 grams.

There are two reasons why this is a conservative figure. First, we have taken the lowest possible value in the weight range that occurred most frequently in the 117 persons whose data form the reference class to which Shonubi can reasonably be compared. Second, this approach gives a smaller typical weight than taking either the mean, median, or mode....

Id. at 41.

The Panel went on to note that while Dr. Boyum’s class interval size — 100 grams — was arbitrary, it was also favorable to the defendant. If the interval size were larger or smaller, the Panel observed, the “conservative estimate for seven trips” would have been higher. The following table, taken from the Panel’s report, illustrates this point:

Table 3
Effect of changing class interval size

Class Interval    Modal Weight     Number      Conservative Estimate    Conservative Estimate
Size (grams)      Range (grams)    of Cases    per Trip (grams)         for Seven Trips (grams)
100               300-400          32          300                      2100
200               400-600          52          400                      2800
50                350-400          14          350                      2450
25                350-375          12          350                      2450

Using the largest of the class interval sizes (200 grams, which yields a modal weight range of 400-600 grams) produces a seven-trip total of 2,800 grams. This, the Panel noted, is close to the 2991.8 grams that the sentencing court attributed to Shonubi’s first seven trips in its original opinion. Id. at 22.
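
The arithmetic behind these figures can be checked with a short sketch. The numbers are those reported above; the variable and function names are assumptions made for illustration and do not come from the Panel’s report.

```python
# Figures taken from the text above; names are illustrative only.
LAST_TRIP_GRAMS = 427.4  # net weight seized on the eighth and final trip
PRIOR_TRIPS = 7

# Class interval size (grams) -> lower bound of the modal weight range (grams).
MODAL_LOWER_BOUND = {100: 300, 200: 400, 50: 350, 25: 350}

def seven_trip_estimate(per_trip_grams, trips=PRIOR_TRIPS):
    """Total for the prior trips, assuming the same amount was carried on each."""
    return per_trip_grams * trips

# Last column of Table 3, one entry per class interval size.
table3 = {size: seven_trip_estimate(low) for size, low in MODAL_LOWER_BOUND.items()}

# Panel's eight-trip total: seven trips at 300 grams plus the seized amount.
eight_trip_total = round(seven_trip_estimate(300) + LAST_TRIP_GRAMS, 1)

# Original sentencing assumption: the seized amount on each of the seven prior trips.
original_seven_trip = round(seven_trip_estimate(LAST_TRIP_GRAMS), 1)

print(table3)
print(eight_trip_total, original_seven_trip)
```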

f. Simulations accounting for trip effect

Finally, the Panel presented two simulations of Shonubi’s behavior allowing for the trip effect. For both simulations, the Panel imagined that on his first trip Shonubi carried only 42.156 grams. This amount is the lowest of any of the 117 seizures reported by the customs service. For his eighth and last trip, the Panel used the amount Shonubi actually carried on the trip on which he was arrested.

The Panel connected these points using two different approaches. The first (Chart C) assumes an arithmetic progression (that is, an increase from 42.156 to 427.4 in seven even steps). Under this assumption, the total quantity for eight trips is 1,930 grams. The second assumes a shallower learning curve. This simulation (Chart D) results in a total weight of 1,479 grams.

CHART C

CHART D

According to the Panel, it is not possible to say which of the projections best simulates Shonubi’s behavior. Yet, the Panel observed, “if the issue is whether Shonubi imported an aggregate amount of 1,000 grams, under either of our conjectures, he did that.” Panel Report at 59 (emphasis added). The court’s own view, based on all the evidence available, is that the learning curve is much steeper — and levels off at close to the final amount more quickly — than in either of the Panel’s projections. Yet the analysis suggested in Charts C and D deserves substantial weight.

g. Conclusion

The Panel concluded that “the uncritical acceptance of any statistical method invites us to be misled. By the same token, the rejection of all statistical evidence may leave us ill-informed when we do not have to be.” Id. at 44.

[I]t is true, on the one hand, that the 117 cases provided by the U.S. Customs Service cannot yield “specific evidence” about Shonubi. On the other hand, the Customs Service data do provide evidence about a reference class of individuals to which Shonubi can reasonably be said to belong.... [I]t is sometimes reasonable to make use of a particular reference class. The decision whether to do so boils down ... to the question of whether the arguments for ... the use of [that] reference class do or do not seem reasonable and whether the [required] arbitrary choices ... do or do not strike the decision maker as outlandish.

Id. at 45.

B. Survey of the Eastern District bench

In the Eastern District of New York, before adoption of the Sentencing Guidelines, three judges and the chief probation officer reviewed every presentence report and then discussed the appropriate sentence. As a result of this practice, experienced sentencing judges in this district have studied the cases of hundreds of drug swallowers, including many who brought in hundreds of grams per trip. This is shared institutional knowledge.

Most cases that come before this court involve only “mules.” The belief that “mules” are treated kindly, and are allowed to carry any less than their full “quota,” is contradicted by hundreds of cases.

The judges of this district — which once contained a great port and now contains a great international airport — have had wide experience with the importation of drugs, in all its variations. To draw upon this institutional experience, a questionnaire was sent to members of the Eastern District bench. Cf. United States v. Fatico, 458 F.Supp. 388, 409-11 (E.D.N.Y.1978) (surveying judges), aff'd, 603 F.2d 1053 (2d Cir.1979), cert. denied, 444 U.S. 1073, 100 S.Ct. 1018, 62 L.Ed.2d 755 (1980). The judges were asked to rate, “based on their trial and sentencing experience,” their confidence in five hypotheses about heroin swallowers, using a scale of 1 (unlikely) to 5 (likely). Nearly all the judges responded. Their replies, with names redacted, are on file with the court. The hypotheses, and the average “scores” received, were:

1. They start with smaller amounts and increase the amounts on later trips. (Average score: 2.4)
2. They start with an amount close to the maximum amount they can carry and keep carrying that amount until caught. (Average score: 4.3)
3. There is no relation between trip number and amount. (Average score: 2.2)
4. Those carrying more are more likely to be caught. (Average score: 2.2)
5. First-time smugglers are more likely to be caught. (Average score: 3.8)

Proposition 2 (which suggests that the trip effect, if it exists, is minimal) received the highest “rating.” Proposition 1 (a “strong” version of the trip effect) received a significantly lower score. These results parallel the sentencing judge’s assumption about the behavior of heroin swallowers. They are also confirmed by the testimony of the Drug Enforcement Administration expert described in Part X.C, infra.

C. Testimony on economics of heroin smuggling

Based on responses to its draft memorandum, the court asked the parties to provide information on the economics of heroin smuggling in 1990 and 1991, specifically the minimum amount that was economically feasible to carry. The letter to the parties and the experts read as follows:

Please be prepared by expert testimony or other method to answer the following question at the hearing on the resentencing:
What is the minimum commercially feasible amount in grams that could be smuggled through swallowing in the period when the courts found that the defendant Charles Shonubi was smuggling, given the cost in Nigeria of heroin, and of preparation and of transportation and the sales price of wholesale amounts in New York and any other relevant factors?

Letter, April 11, 1995.

In response, the government proffered the testimony of James Glauner, a special agent with the DEA, who has been qualified as an expert on smuggling in a number of cases in this and other districts. Glauner testified that in 24 years with the DEA he has been involved in over 1,000 investigations — nearly half involving heroin — and has interviewed more than 100 heroin swallowers. He has also “debriefed” informants who had worked within heroin smuggling operations on such topics as methods, organizations, and costs. He was stationed at Kennedy Airport from 1988 to 1990.

Glauner outlined the economics of heroin transshipment through Nigeria, during the period in question, as follows: Heroin was purchased either in Southeast Asia (generally Thailand) or western Asia (generally Pakistan). From September 1990 to December 1991, the period of Shonubi’s heroin smuggling trips, the price of heroin in those countries ranged from $3,800 to $10,000 per kilogram. However, a Nigerian trafficker incurred a number of other costs in obtaining heroin for shipment to the United States. The trafficker would have to

maintain someone in [the source] country to purchase it and prepare it for shipment and obtain couriers.... [In Nigeria, the trafficker would have to] maintain a force of people who recruit couriers ... [and other] people to package it in the balloons or condoms.... [Other people] teach them how to swallow and supervise the swallowing....

In addition, Glauner testified, a trafficker would have to pay bribes both in the country of origin and in Nigeria, to government as well as to airline officials. Such bribes averaged $10,000 per kilogram in the relevant time period, according to Glauner.

The bribes and other expenses brought the total price of obtaining and preparing a kilogram of heroin for shipment to the United States to around $20,000. During the same period, payments from purchasers in the United States upon delivery in this country averaged $60,000 per kilogram, with $65,000 the “high end.” (This is somewhat higher than the estimate adduced at trial. See supra Part III.B.) Thus the trafficker stood to earn $40,000 per kilogram, or $40 per gram, upon delivery of the heroin in New York.

According to Glauner, one Nigerian trafficker might send as many as ten or fifteen swallowers on a single flight. Each swallower would be given approximately $2,000 for airfare, and up to $3,000 in expense money. Given an average profit of $40 per gram, the swallower would need to carry 125 grams to pay his way, that is, to break even for the trip. Using the low-end wholesale price, $3,800, instead of the average, $6,000, brings the break-even point down to 118 grams. Using this low wholesale price and the high-end retail price, $65,000, further lowers the break-even point, to 106 grams. As Glauner pointed out, however, a “trafficker wants to do much more than break even. He wants to make as much profit as he can.”
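
The break-even figures in Glauner’s testimony can be reproduced with a brief sketch. It rests on one assumption not stated in the testimony: that the $20,000 total cost per kilogram includes the $6,000 average source price, so that substituting the $3,800 low-end price lowers the total by $2,200. The function and variable names are illustrative only.

```python
def break_even_grams(courier_cost, delivery_price_per_kg, total_cost_per_kg):
    """Grams a swallower must deliver for the trafficker to recover the courier's costs."""
    profit_per_gram = (delivery_price_per_kg - total_cost_per_kg) / 1000
    return courier_cost / profit_per_gram

COURIER_COST = 2_000 + 3_000         # airfare plus the maximum expense money
AVERAGE_TOTAL_COST = 20_000          # dollars per kilogram, source price and bribes included
# ASSUMPTION: swapping the $6,000 average source price for the $3,800 low end
# reduces the total cost per kilogram by the $2,200 difference.
LOW_TOTAL_COST = AVERAGE_TOTAL_COST - (6_000 - 3_800)

average_case = break_even_grams(COURIER_COST, 60_000, AVERAGE_TOTAL_COST)
low_source_price = break_even_grams(COURIER_COST, 60_000, LOW_TOTAL_COST)
low_source_high_sale = break_even_grams(COURIER_COST, 65_000, LOW_TOTAL_COST)

print(average_case, low_source_price, low_source_high_sale)
```

Under that assumption the three results come to 125 grams, about 118.5 grams, and about 105.9 grams, in line with the 125, 118, and 106 grams stated above.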

On cross-examination, Glauner observed that the economics are different for every smuggler, some of whom, for example, may travel to Nigeria for purposes other than smuggling. “There are many variables,” he stated. He also agreed that not every smuggler breaks even on every trip. However, he stated: “If you don’t do the job right the first time, you’re not going to be employed very long.” That a swallower has made many trips makes clear that he was economically worth employing, according to Glauner. “Anyone who was used eight times would be a very reliable individual, who was capable of making a lot of money for the traffickers and bringing [in] considerable amounts.” He observed that, while a trafficker can often absorb the cost of a courier being unsuccessful, “that courier would not be reemployed.”

Glauner described how couriers are “worked”:

They’re usually taken to a hotel room or a home, where they are closely supervised when they swallow the balloons. A count is kept, so that the trafficker’s people know exactly how much is being shipped out, and someone is with them the entire time until they get on the airplane.

Glauner also estimated that of those swallowers who enter the United States from Nigeria, 25 percent are caught (requiring the other carriers to bring in more than a bare minimum to make up for these losses). He reported knowing of no evidence that swallowers who carry larger amounts are more likely to be caught, since “the only way [heroin] shows is through an X-ray.”

Asked the same questions as on the judges’ survey, see supra Part X.B., Glauner responded that swallowers tend to “start with smaller amounts ... because the balloons are very difficult to swallow ... [but] usually by about the third trip they’re up to the maximum that they can swallow, and [the amount] would level off by then.” He estimated that, on a first trip, a swallower might carry 150 to 250 grams, and that “if they couldn’t swallow close to 200 grams [on the first trip], I don’t think that they would be reemployed.” Thus, while he accepted the possibility of a learning curve, he estimated its effects on a series of trips by a successful swallower as minor.

Glauner noted that swallowers generally received $7 to $8 per gram for delivering heroin in this country in 1990 to 1991. That means that a smuggler who paid his own airfare of $2,000 would have had to carry 250 to 300 grams to break even.

It is not necessary to accept the precise figures provided by the witness. It is apparent from his testimony that there is a floor amount below which heroin smuggling does not make economic sense.

XI. Law Applicable to Statistical and Other Information Supplied After Remand

A. Admissibility of probabilistic evidence

There is no certainty. “Time is irreversible, events unique, and any reconstitution of the past at best an approximation.... [I]t is inescapable that a trier’s conclusions be based on probabilities.” John Maguire, James H. Chadbourn, John H. Mansfield, et al., Cases and Materials on Evidence 1 (6th ed. 1973). As Jeremy Bentham observed:

Certainty, absolute certainty, is a satisfaction which on every ground of inquiry we are continually grasping at, but which the inexorable nature of things has placed forever out of reach. Practical certainty, a degree of assurance sufficient for practice, is a blessing, the attainment of which, as often as it lies in our way to attain it, may be sufficient to console us under the want of any such superfluous and unattainable acquisitions.

5 Jeremy Bentham, Rationale of Judicial Evidence 351 (J.S. Mill ed. 1827).

As already noted, Rules 401 and 402 of the Federal Rules of Evidence provide for the admissibility of any evidence which can logically influence the trier’s assessment of the probability of a material fact. From these rules “one might infer that the court wishes and expects to have its judgments about facts at issue ... expressed in terms of probabilities.” The Evolving Role of Statistical Assessments as Evidence in the Courts 193 (Report of the Panel on Statistical Assessments as Evidence in the Courts) (Stephen E. Fienberg ed., 1989) [hereinafter Evolving Role ].

The value of statistical evidence has been recognized, at least in theory, for several hundred years. See Peter Tillers, Intellectual History, Probability, and the Law of Evidence, 91 Mich.L.Rev. 1465, 1473-74 (1993) (reviewing Barbara J. Shapiro, “Beyond Reasonable Doubt” and “Probable Cause”: Historical Perspectives on the Anglo-American Law of Evidence (1991)) (describing 17th-19th century mathematical approaches to problems of proof); The Evolving Role of Statistical Assessments as Evidence in the Courts 211-18 (Report of the Panel on Statistical Assessments as Evidence in the Courts) (Stephen E. Fienberg ed., 1989) (historical survey of use of statistical evidence in the courts).

In 19th century England, debate flourished over the uses of probability in the courtroom. Jeremy Bentham proposed that witnesses and judges numerically estimate degrees of persuasion. He envisioned a “moral thermometer,” using a “decigrade scale”; a juror would say, for instance, “my persuasion is at ... nine.” 1 Jeremy Bentham, Rationale of Judicial Evidence 75 (J.S. Mill ed., 1827); see also William Twining, Theories of Evidence: Bentham & Wigmore 56-60 (1985) (describing Bentham’s “thermometer of persuasion”). Responding to Bentham, and presaging views of a few modern doubters, Thomas Starkie wrote: “The notions of those who have supposed that mere moral probabilities or relations could ever be represented by numbers ... and thus be subjected to arithmetical analysis, cannot but be regarded as visionary and chimerical.” Thomas Starkie, A Practical Treatise on the Law of Evidence 752 (London, J. & W.T. Clarke ed. 1883), quoted in Evolving Role, supra, at 212. The Federal courts, as indicated above, have wisely rejected such limiting views, while taking appropriate steps to limit the risks of misuse. See, e.g., John H. Mansfield, Norman Abrams, Margaret A. Berger, et al., Cases and Materials on Evidence 47-73 (8th ed., 1988); William Fairley & Frederick Mosteller, A Conversation About Collins, 41 U.Chi.L.Rev. 242 (1974); Robert P. Charrow & Robert L. Smith, A Conversation About “A Conversation About Collins”, 64 Geo.L.J. 669 (1976).

Today, complex statistical evidence is being introduced in American courts in a wide variety of both civil and criminal matters. Evolving Role, supra, at 7-9; see also Cimino v. Raymark Indus., 751 F.Supp. 649, 661 (E.D.Tex.1990) (“Acceptance of statistical evidence is now commonplace in the courts.”); id. (listing wide variety of cases in which statistical evidence has been accepted); David W. Barnes & John M. Conley, Statistical Evidence in Litigation 13 (1986) (“There is at present virtually no area of the law in which properly conceived and executed statistical proof cannot be admitted.”). As one measure of the rise of statistical proof in litigation, a LEXIS search of district court opinions using the words “statistic,” “statistics,” or “statistical” turned up 608 cases in the years 1960 to 1969; 2,786 cases from 1970 to 1979; 4,364 cases from 1980 to 1989; and 3,015 cases from 1990 through July 31, 1995. A similar search on Westlaw turned up nearly identical numbers. In fact, with the rise of “public law” litigation involving discrimination and mass torts, statistical and epidemiological evidence have become essential to legal fact-finding. At the same time, the adoption of the Federal Rules of Evidence permits judges to exercise broad discretion in admitting useful statistical evidence. See, e.g., Fed.R.Evid. 102, 401, 402, 403, 702-706, 803(18); see also supra Part IX.

A few commentators have argued that triers need to curtail evidentiary uses of statistics. See Laurence Tribe, Trial By Mathematics: Precision and Ritual in the Legal Process, 84 Harv.L.Rev. 1329 (1971); Charles Nesson, The Evidence or the Event? On Judicial Proof and the Accessibility of Verdicts, 98 Harv.L.Rev. 1357 (1985). This thesis has been rejected both by judges and academicians. As Professor Rosenberg has pointed out, the exclusion of probabilistic evidence is impossible, because all evidence is probabilistic:

The entire notion that “particularistic” evidence differs in some significant qualitative way from statistical evidence must be questioned. The concept of “particularistic” evidence suggests that there exists a form of proof that can provide direct and actual knowledge of [the parties’ conduct]. “Particularistic” evidence, however, is in fact no less probabilistic than is the statistical evidence that courts purport to shun.... “Particularistic” evidence offers nothing more than a basis for conclusions about a perceived balance of probabilities.

David Rosenberg, The Causal Connection in Mass Exposure Cases: A “Public Law” Vision of the Tort System, 97 Harv.L.Rev. 849, 870 (1984) (footnotes omitted), quoted in In re “Agent Orange” Prod. Liab. Litig., 597 F.Supp. 740, 835-36 (E.D.N.Y.1984); id., 597 F.Supp. at 836 (even with seemingly non-probabilistic evidence, “issues of credibility and varying inferences drawn by the trier based upon varying assessment of probative force may cause reasonable people to assess these percentages in a range”). See also, e.g., Michael J. Saks & Robert F. Kidd, Human Information Processing and Adjudication: Trial by Heuristics, 15 Law & Soc’y Rev. 123, 151 (1980-1981) (“Much of the testimony that is commonly thought of as particularistic only seems so. It is far more probabilistic than we normally allow jurors (or judges) to realize”) (citations omitted); cf. The Evolving Role of Statistical Assessments as Evidence in the Courts 78-79 (Report of the Panel on Statistical Assessments as Evidence in the Courts) (Stephen E. Fienberg ed., 1989) (noting the contradiction between the court’s insistence on evidence that seems certain, and such “probabilistic” institutions as plea bargaining, in which decisions are made on the basis of “probable” outcome).

Nevertheless, some scholars fear that the seeming precision of numerical evidence tends to overshadow evidence not expressed in quantitative form. See, e.g., Laurence Tribe, Trial by Mathematics, 84 Harv.L.Rev. 1329, 1330 n. 2 (1971) (arguing for exclusion of statistics despite acknowledgement that “all factual evidence is ultimately statistical, and all legal proof ultimately probabilistic”) (emphasis in original). These scholars argue that

[t]he apparent precision of statistical evidence often stands in marked contrast to the uncertainties of other testimony.... The danger is that such evidence will overshadow equally probative but admittedly unscientific and anecdotal nonstatistical evidence.

Evolving Role, supra, at 150 (citations omitted).

This limited view of the intellectual powers of judges and jurors when properly advised by experts and counsel has been rebutted. In one of the leading analyses of how decision-makers process information, Professors Saks and Kidd noted that triers are more likely to underestimate than overestimate the probative force of statistical analysis and quantitative proof. They wrote:

Research demonstrates ... that people ... cannot integrate ... statistical and anecdotal evidence and consequently tend to ignore that statistical information. Intuitive, heuristic, human decision makers must dispense with certain information, and that tends strongly to be the quantitative information. While commentators’ arguments have been that the [statistical] data are inordinately persuasive, the evidence says that the reverse is true.

Saks & Kidd, supra, at 149 (emphasis in original). The Committee on Statistical Assessments as Evidence in the Courts, in a report written at the request of the National Science Foundation, reached a similar conclusion:

When statistical evidence conflicts with anecdotal evidence that bears on the same issue, highly probative statistical data may be rejected in favor of a less probative but more striking anecdotal instance.... [A]necdotal evidence is vivid and reaches us in a way that ... statistical information cannot.

Evolving Role, supra, at 153-54.

As Professors Kaye and Koehler suggest, research is not decisive on this point. See generally D.H. Kaye & Jonathan J. Koehler, Can Jurors Understand Probabilistic Evidence?, supra; see also Edward J. Imwinkelried, The Next Step After Daubert: Developing a Similarly Epistemological Approach to Ensuring the Reliability of Nonscientific Expert Testimony, 15 Cardozo L.Rev. 2271, 2286 (1994) (“[G]iven the research data currently available, it would be dishonest to make any purportedly scientific claim about the impact of scientific or nonscientific testimony on lay jurors”). But cf. Evolving Role, supra, at 154 (citing R. Nisbett & L. Ross, Human Inference: Strategies and Shortcomings of Social Judgment (1980)); R.M. Reyes et al., Judgmental Biases Resulting from Different Availabilities of Arguments, 2 J. Personality & Soc. Psych. 39 (1980); see also United States v. Starzecpyzel, 880 F.Supp. 1027, 1048-49 (S.D.N.Y.1995) (urging triers “not to overreact” to supposed dangers of scientific proof); Edward J. Imwinkelried, The Standard for Admitting Scientific Evidence: A Critique from the Perspective of Juror Psychology, 28 Vill.L.Rev. 554, 566-68 (1982) (reviewing studies showing that jurors are not overly influenced by scientific proof); Michael S. Jacobs, Testing the Assumptions Underlying the Debate About Scientific Evidence: A Closer Look at Juror “Incompetence” and Scientific “Objectivity”, 25 Conn.L.Rev. 1083 (1993) (reviewing recent studies that show jurors capable of deciding complex cases involving scientific and technical matters); Joe S. Cecil et al., Citizen Comprehension of Difficult Issues: Lessons from Civil Jury Trials, 40 Am.U.L.Rev. 727, 764 (1991) (same); Elizabeth Loftus, Psychological Aspects of Courtroom Testimony, 347 Annals of the New York Academy of Sciences 27, 34 (1980) (jurors more willing to convict on the basis of lay testimony than on high-caliber scientific proof); see also United States v. Jakobetz, 955 F.2d 786, 797 (2d Cir.) (“[A]lthough scientific and statistical evidence may seem complicated, we do not think that a jury will be so dazzled or swayed as to ignore evidence suggesting that an experiment was improperly conducted or that testing procedures have not been established.”), cert. denied, - U.S. -, 113 S.Ct. 104, 121 L.Ed.2d 63 (1992). Studies of juror behavior, while not decisive as to judges, are probably representative of professional as well as lay decision-making.

Statistical evidence must be presented with care. See Margaret A. Berger, “Evidentiary Framework,” in Federal Judicial Center, Reference Manual on Scientific Evidence 95, 97 (1994) (“‘[P]rosecutor’s fallacy’ occurs when a prosecutor presents statistical evidence to suggest that the evidence indicates the likelihood of the defendant’s guilt rather than the odds of the evidence having been found in a randomly selected sample.”); William C. Thompson & Edward L. Schumann, Interpretation of Statistical Evidence in Criminal Trials: The Prosecutor’s Fallacy and the Defense Attorney’s Fallacy, 11 Law & Hum. Behav. 167, 181-82 (1987). Powerful tools such as DNA evidence require particular care. But to reject them is to shackle the courts in their search for the truth. See, e.g., United States v. Jakobetz, 955 F.2d 786 (2d Cir.) (approving use of DNA evidence in kidnapping trial), cert. denied, — U.S. -, 113 S.Ct. 104, 121 L.Ed.2d 63 (1992); Eric S. Lander, DNA Fingerprinting on Trial, 339 Nature 501 (1989); National Research Council of the National Academy of Sciences, DNA Technology in Forensic Science (1992); cf. Ronald J. Allen et al., An Internet Exchange, supra (discussing error requiring reversal in State v. Skipper, 228 Conn. 610, 637 A.2d 1101 (1994), but finding fault with broad negative dicta in decision). Courts which deny themselves the help of statistical tools increase the risks of incorrect conclusions. See Mirjan Damaska, “Approaches to the Evaluation of Evidence: A Comparative View,” in John Henry Merryman: A Festschrift (Berlin 1988) (critiquing “atomistic” approach to admissibility in American system).
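
The prosecutor’s fallacy described by Professor Berger can be made concrete with a short sketch. The numbers are hypothetical and drawn from no case; the sketch further assumes that every possible source is equally likely beforehand and that the true source always matches. The function name is illustrative.

```python
def posterior_source_probability(random_match_probability, other_possible_sources):
    """Probability that the defendant is the source, given a match.

    ASSUMPTIONS: the defendant and each other possible source are equally
    likely a priori, and the true source matches with certainty.
    """
    expected_coincidental_matches = random_match_probability * other_possible_sources
    return 1.0 / (1.0 + expected_coincidental_matches)

# Hypothetical figures: a trait found in 1 of every 1,000 people, and
# 9,999 other people who could have left the evidence.
random_match_probability = 0.001
posterior = posterior_source_probability(random_match_probability, 9_999)

# The fallacy treats (1 - random match probability) as the probability of guilt.
fallacious_reading = 1.0 - random_match_probability

print(round(posterior, 4), fallacious_reading)
```

On these hypothetical figures the fallacious reading is 0.999, while the probability that the defendant is the source is roughly 0.09, because about ten other people would be expected to match by coincidence.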

Effective techniques for developing and presenting scientific evidence to juries do exist. They will be further developed and refined in the wake of the Supreme Court’s decision in Daubert. See, e.g., The Effect of Daubert on the Work of Federal Trial Judges, Shepard’s Expert & Sci. Evidence Q., Summer 1994, at 1. Putting aside questions of cost and availability to both sides, there is no reason to deny factfinders reliable information or analytic techniques.

B. Use of bare statistics

1. Generally

Once introduced, how much can statistics prove? “[T]he interrelationship between an opinion couched in probabilistic terms and the applicable burden of proof” has proved to be a “perplexing problem for the courts.” Margaret A. Berger, “Evidentiary Framework,” in Federal Judicial Center, Reference Manual on Scientific Evidence 95 (1994).

Law professors properly dote on hypotheticals in which triers must decide cases on the basis of statistical evidence alone. Popular examples include the “blue bus case” (percentage of blue and white buses passing a point is only evidence of which company’s bus hit plaintiff) (see, e.g., John H. Mansfield, Norman Abrams & Margaret A. Berger, et al., Cases and Materials on Evidence, 45-73 (8th ed. 1988) (discussing Smith v. Rapid Transit Inc., 317 Mass. 469, 58 N.E.2d 754 (1945)); Charles Nesson, Agent Orange Meets the Blue Bus: Factfinding at the Frontiers of Knowledge, 66 B.U.L.Rev. 521 (1986)); the gatecrasher hypothetical (percentage of gatecrashers at rodeo is over 50 percent; can all be found liable?) (see, e.g., L. Cohen, The Probable and the Provable 77-81 (1977); Richard Lempert, Symposium, Probability and Inference in the Law of Evidence: I. Theories of Inference and Adjudication: The New Evidence Scholarship: Analyzing the Process of Proof, 66 B.U.L.Rev. 439, 454 (1986)); and the prison yard hypothetical (999 prisoners out of one thousand have rioted; can each be found guilty?) (see, e.g., Terence Anderson & William Twining, Analysis of Evidence 39-40 (1991); Daniel Shaviro, Statistical-Probability Evidence and the Appearance of Justice, 103 Harv.L.Rev. 530, 533-36 (1989)). As Prof. Green has noted, “[i]n the ensuing debate [on “naked” statistical evidence], numerous blue buses have run untold numbers of near-sighted elderly ladies off the road; hundreds of alleged gatecrashers have been collared; dozens of murderous prisoners have been brought to justice, and countless articles, books, and opinions have been written on the subject.” Eric D. Green, Symposium: Probability and Inference in the Law of Evidence: Foreword, 66 B.U.L.Rev. 377, 378 (1986) (footnotes omitted).

In at least two classes of cases, “naked,” or nearly naked, statistical evidence has proven essential. In mass torts, proof of causation often requires the use of statistically based epidemiological proof. See, e.g., Steve Gold, Causation in Toxic Torts: Burdens of Proof, Standards of Persuasion, and Statistical Evidence, 96 Yale L.J. 376 (1986). Given that determining the causation of many diseases, particularly those with latent effects and no “signature” relationship, is extremely difficult, plaintiffs in many mass tort cases would be unable to prove that a defendant caused an illness were it not for statistical epidemiological data. See Joseph Sanders, From Science to Evidence: The Testimony on Causation in the Bendectin Cases, 46 Stanford L.Rev. 1, 14-18 (1993) (discussing use of statistics to prove causation in mass torts). In the Agent Orange litigation, the court noted two possible responses to this problem:

Under the “strong” version of the preponderance rule, statistical correlations alone indicating that the probability of causation exceeds fifty percent are insufficient; some “particularistic” or anecdotal evidence, that is, “proof that can provide direct and actual knowledge of the causal relationship between the defendant’s tortious conduct and the plaintiff’s injury is required.” The “weak” version of the preponderance rule would allow a verdict solely on statistical evidence....

In re “Agent Orange” Prod. Liability Litig., 597 F.Supp. 740, 835 (E.D.N.Y.1984) (citations omitted). The court went on to explain its decision to reject the strong version of the preponderance rule in mass exposure cases:

[W]here the chance that there would be particularistic evidence would be quite small, the consequence of retaining the requirement might be to allow defendants who, it is virtually certain, have injured thousands of people and caused billions of dollars in damages, to escape liability.

Id. at 836. The court concluded: “Except where it appears that the absence of anecdotal evidence may be due to spoliation, probabilities based upon quantitative analysis should support a recovery.” Id. Thus, in mass tort cases the decision to rely on “naked” statistical proof if that is all that can be presented is consistent with the goal of providing the most justice for the most people. See, e.g., Deborah Hensler, Resolving Mass Toxic Torts: Myths and Realities, 1989 U.Ill.L.Rev. 89, 90 (concluding that aggregative procedures provide the best possible match between victims’ losses and compensation).

“Naked” statistical evidence has also been decisive in discrimination cases. For example, under Title VII of the Civil Rights Act of 1964, a plaintiff can prove employment discrimination by introducing statistical data showing that the defendant’s hiring practices had a racially disparate impact. See Griggs v. Duke Power Co., 401 U.S. 424, 429-30, 91 S.Ct. 849, 852-53, 28 L.Ed.2d 158 (1971). But see Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 660, 109 S.Ct. 2115, 2126-27, 104 L.Ed.2d 733 (1989) (restricting use of disparate-impact studies in Title VII cases); see also, e.g., Castaneda v. Partida, 430 U.S. 482, 97 S.Ct. 1272, 51 L.Ed.2d 498 (1977) (statistical evidence drawn from census data and grand jury records can establish a prima facie case of discrimination in grand jury selection); Machetti v. Linahan, 679 F.2d 236 (11th Cir.1982) (statistical evidence of disparity between percentage of females in adult population and percentage of females on jury lists sufficient to prove discrimination), cert. denied, 459 U.S. 1127, 103 S.Ct. 763, 74 L.Ed.2d 978 (1983); Ramona L. Paetzold, The Statistics of Discrimination: Using Statistical Evidence in Discrimination Cases (1994).

In other types of litigation, “pure” statistical cases rarely arise. Statistical evidence is almost always evaluated in the light of non-statistical proof. See In re “Agent Orange” Prod. Liability Litig., 597 F.Supp. 740, 836 (E.D.N.Y.1984) (in cases other than those involving mass torts, anecdotal “evidence is almost always available”). Courts expect parties to proffer anecdotal as well as statistical evidence. As Professor Berger has pointed out, “failure by experts to consider [individual case histories] could lead a court to conclude that the proffered opinion failed to satisfy Rule 703 [requiring a proper basis for expert testimony].” Berger, “Evidentiary Framework,” supra, at 96; cf. Richard Lempert, The New Evidence Scholarship: Analyzing the Process of Proof, 66 B.U.L.Rev. 439, 450-62 (1986) (discussing spoliation inference in litigation). Thus, the issue of “naked” statistical evidence is more hypothetical than real.

2. Criminal cases

Several commentators have expressed particular concern about the use of explicitly probabilistic evidence in criminal cases. See, e.g., Ronald Dworkin, Taking Rights Seriously 13 (1977); Andrew von Hirsch, Prediction of Criminal Conduct and Preventive Confinement of Convicted Persons, 21 Buff.L.Rev. 717, 744-50 (1972), cited in Barbara D. Underwood, Law and the Crystal Ball: Predicting Behavior with Statistical Inference and Individualized Judgement, 88 Yale L.J. 1409, 1412 (1979); Saks & Kidd, supra, at 152; Tribe, supra; Nesson, supra; L. Jonathan Cohen, Subjective Probability and the Paradox of the Gatecrasher, 1981 Ariz.St.L.J. 627, 632 (rejecting use of statistics in criminal cases); Alex Stein, On the Unbearable Lightness of “Weight” and the Refoundation of Evidence Law 48-49 (forthcoming 1995, on file in the instant case) (arguing that the problem with “naked” statistical evidence in criminal cases is not that it is unreliable, but that its “weight” is insufficient to support conviction).

The better view is that no special rule of exclusion is required in criminal cases. In criminal as in civil cases, factfinders need all available information. Significantly, the Federal Rules of Evidence do not distinguish between civil and criminal cases in their pertinent provisions. See, e.g., Rules 401 to 403, Rule 1101, and Article VII.

Protection of defendants in criminal cases warrants special concern, but burdens of proof and existing rules of evidence, as well as constitutional and statutory protections, rather than exclusions of highly probative evidence that happens to be in statistical form, are the best means of avoiding injustice. See, e.g., Daniel Shaviro, Statistical-Probability Evidence and the Appearance of Justice, 103 Harv.L.Rev. 530, 538 (1989) (possibility of unwarranted conviction “suggests raising the burden of proof for all cases. It does not support a special rule for statistical probability cases.”). Rather than excluding statistics, courts should provide for defense and court-appointed experts to ensure that statistics, when available, are properly used. These devices were utilized in this case.

Even were attempts to impose blanket exclusion of statistical evidence in criminal cases not contrary to Rules 401, 402, 403, and 1101, which encourage use of all available probative evidence, the law could not afford to exclude highly probative statistical evidence and useful quantitative methods. Courts ignore whole categories of evidence only at their peril. Thus the court of appeals for this circuit has held that “doubts about whether an expert’s testimony will be useful should generally be resolved in favor of admissibility unless there are strong factors such as time or surprise favoring exclusion.” United States v. Jakobetz, 955 F.2d 786, 797 (2d Cir.) (citation omitted), cert. denied, — U.S. -, 113 S.Ct. 104, 121 L.Ed.2d 63 (1992).

XII. Application of Law to Facts After Remand

The government’s analysis is enticingly simple: using a short computer program and real data about real smugglers, it simulated 100,000 possible seven-trip series. The resulting statistics were impressive.

Nonetheless, in contending, based on the simulations alone, that there is a 99 percent chance that Shonubi carried more than 2090.2 grams of heroin on his first seven trips, the government overstated its case. While Dr. Boyum’s analysis is useful, it is not decisive for the reasons suggested by the other experts.
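
The kind of simulation the government describes can be sketched in a few lines. This is a minimal illustration and not Dr. Boyum’s program: the seizure weights below are hypothetical placeholders (the 117 DEA records are not reproduced here), and the function names, the fixed seed, and the choice to draw with replacement are assumptions made for the sketch. Each simulated series draws seven net weights at random from the list and sums them; the quantity of interest is the total that 99 percent of the simulated series reach or exceed.

```python
import random

# Hypothetical net weights in grams; stand-ins for the DEA seizure records,
# which are not reproduced in this memorandum.
HYPOTHETICAL_SEIZURES = [
    42.0, 236.4, 280.0, 312.5, 345.0, 367.9, 390.1, 405.3,
    427.4, 440.0, 455.8, 470.2, 488.6, 495.0, 512.3, 603.7,
]


def simulate_series_totals(seizures, trips=7, runs=100_000, seed=0):
    """Return the total carried in each simulated series of `trips` trips.

    Assumption: each trip's amount is drawn independently, with replacement,
    from the observed seizures (that is, no trip effect is modeled).
    """
    rng = random.Random(seed)
    return [sum(rng.choices(seizures, k=trips)) for _ in range(runs)]


def exceeded_by(totals, share=0.99):
    """Return a total that at least `share` of the simulated series reach or exceed."""
    ordered = sorted(totals)
    cutoff_index = int(len(ordered) * (1.0 - share))
    return ordered[cutoff_index]


totals = simulate_series_totals(HYPOTHETICAL_SEIZURES, runs=10_000)
floor_99 = exceeded_by(totals)
print(f"99% of simulated seven-trip totals are at least {floor_99:.1f} grams")
```

The independence assumption written into `simulate_series_totals` is the very point the other experts contest: the output is only as sound as the premise that any one smuggler’s trips resemble random draws from other smugglers’ seizures.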

By contrast, the defense expert, Professor Michael Finkelstein, seems to have overstated the case against statistics. He argues that no statistical analysis would constitute “specific evidence” of Shonubi’s behavior. Since he finds the data imperfect, he gives them no weight whatsoever.

The 706 Panel took a middle view. Although asserting that Dr. Boyum’s analysis alone cannot meet the government’s burden, it did not end its inquiry there. It acknowledged the interdependence of statistical and non-statistical analyses, declaring: “[T]here is no intrinsic or inherent incompatibility between statistical methods of inference or argument and other methods of inference or argument.” Panel Report at 6. It determined that the numbers make it highly likely that Shonubi imported more than 1,000 grams of heroin, enough to constitute a level 32 offense. It also pointed out, correctly, that the available statistics could explain, or confirm, decisions reached without statistics. Some statistical tools, the Panel noted, are not evidence, but ways people “organiz[e] their thoughts about evidence.” Id. These observations comport with the court’s understanding of the proper uses of the available statistics in the instant case.

Professors Finkelstein, Schum, and Tillers make powerful and persuasive arguments for treating the statistical analysis of Dr. Boyum as less than conclusive. Certainly, the additional analysis they suggest, linking quantities to trip numbers, would be useful. It is doubtful, however, whether such data can ever be obtained. Because foreign nationals engaged in crime routinely use false passports, available documents do not reliably reveal the number of trips. Nor are smugglers likely to provide these data because revealing prior criminal conduct could a) lead to increased penalties and b) jeopardize family members. As a result, there is no available means of determining with reasonable certainty what trip number is represented in any significant number of the 117 seizures.

That is not to say that the government’s statistical data are unhelpful. A powerful gestalt impression supporting the prosecution’s view is provided by Chart A. Consistent with human variations in swallowing capacity, and differences in the skills of smuggling organizations, some limited variation in the amounts smuggled is to be expected. Even keeping in mind the other experts’ important observations on the role of statistical methodology, including placement of the bins in determining the shape of the curve, the form of the curve strongly suggests one underlying explanation for the distribution of the DEA data: the desire to maximize the amount of heroin carried on every trip. All of the evidence and institutional experience of the court points to the same explanation. Thus, even though the government’s statistical data are less complete than we might like, they support and confirm the conclusions that would be reached without statistics.

A conclusion is only as sound as the assumptions on which it rests, yet “[s]tatistical assumptions ... are almost always open to plausible attack.” United States v. Jessup, 757 F.2d 378, 386 (1st Cir.1985) (Breyer, J.), Campbell v. Board of Educ., 310 F.Supp. 94, 102 (E.D.N.Y.1970). See also In re “Agent Orange” Prod. Liability Litig., 597 F.Supp. 740, 836 (E.D.N.Y.1984) (“[S]tatistical proof will almost never be as complete or as free from confounding factors as desirable.”). Thus, it is no surprise that there were disagreements among the experts who participated in this case. Nonetheless, each of their analyses was useful, and the court is grateful for their assistance.

A. Conclusions about experts’ reports

At the time of its initial opinion, the sentencing court did not have records of the quantities seized from 117 drug swallowers arrested at Kennedy Airport. It did, however, recall some of those cases. The DEA data provided the sentencing judge with more reliable information. Replacing recollections with more precise data can only improve decision-making.

More than 99 percent of the seizures of heroin from Nigerian swallowers, apprehended where Shonubi was apprehended during the period of his smuggling activity, involved 100 grams or more. The only seizure below 100 grams was an “outlier” that would commonly be discounted in analyzing data. See, e.g., Federal Judicial Center, Reference Manual on Scientific Evidence, 362 (1994) (describing statistical treatment of outliers).

These data suggest that each of Shonubi’s trips was likely to have involved 100 grams or more. It would be helpful to know more about how Shonubi compares to the smugglers in the sample. In fact, a good deal is known: It is evident from the trial and other proceedings that Shonubi is unusually brazen even for a smuggler. It is also known that he is capable of carrying 427.4 grams of heroin — which may not be true of all 117 smugglers in the DEA sample.

The government, for purposes of its analysis, “assumed that Mr. Shonubi was a typical heroin swallower.” Boyum Aff. at 6. This was a conservative assumption; Shonubi appears to have been a more-effective-than-typical swallower.

Further, the concentration of the numbers in the 300-500 gram range confirms the sentencing judge’s conclusion regarding the trip effect — that, if it exists at all, its influence is minor. If swallowers were independent actors, carrying whatever amount they felt comfortable carrying on any given trip, there would surely be a wider range of quantities, including small amounts representing the first efforts of neophyte smugglers. The fact that there are few seizures of more than 500 grams in the Customs Service data — but many seizures just below 500 grams — suggests that there is a natural physical limit, and that most smugglers try to approach that limit as soon as they can.

This information strengthens the court’s findings in the instant case. To put it in Bayesian terms, it increases the “posterior” probability that Shonubi carried some 430 grams on each of his eight trips.
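
The Bayesian framing can be made concrete with the odds form of Bayes’ rule: posterior odds equal prior odds multiplied by the likelihood ratio of the new evidence. The sketch below is illustrative only. The prior of 0.60 and the likelihood ratio of 3 are hypothetical values chosen for the example, not findings of the court, and the function name `update_probability` is an assumption.

```python
def update_probability(prior, likelihood_ratio):
    """Apply Bayes' rule in odds form and return the posterior probability.

    prior: probability of the hypothesis before the new evidence (0 < prior < 1).
    likelihood_ratio: P(evidence | hypothesis) / P(evidence | not hypothesis).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: a trier who starts at 60 percent and then sees evidence
# three times likelier under the hypothesis than under its negation.
posterior = update_probability(prior=0.60, likelihood_ratio=3.0)
print(f"posterior probability: {posterior:.2f}")
```

Any evidence with a likelihood ratio above one raises the posterior above the prior; how far it rises depends on both inputs, neither of which can be fixed with precision on this record.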

B. Random versus non-random sampling

In the earlier draft of this memorandum, circulated for comment, the court took the position that “[t]here is no qualitative difference between the two extrapolations [in this case, from four balloons to 103 balloons and from one trip to eight trips], nor does any rule forbid extrapolation in the latter context — in fact, the Sentencing Guidelines demand it. The difference is entirely probabilistic.” Draft memorandum at 43 (March 20, 1995). The memorandum further stated that: “While the extrapolation from the amounts in one to eight trips may arguably rest on weaker inferences than the extrapolation of amounts from 4 to 103 balloons ... how much weaker — bearing in mind the applicable burden of proof — is a question the court must answer based on the policy of protecting defendants in criminal cases against unjust conclusions.” Id.

Professor Finkelstein properly criticized this view as overlooking

an important qualitative difference between the two “extrapolations.” The difference is that the first involves a statistical sample while the second involves an observational study. In a statistical sample the mechanism for selection is known — it is randomization. The fact of randomization justifies the statistician in assuming that the selection was unbiased because it was uncorrelated with biasing factors, and permits him or her to calculate a confidence interval for the estimate. By contrast, the data taken from Shonubi’s trip on which the arrest occurred are the result of an observational study because the trip was not selected at random from the universe of smuggling trips. As a result, we do not have the same assurance that the method of selection was not correlated with biasing factors, referred to as confounders.
Observational studies are often used in statistical science, but estimates based on such studies must be adjusted for possible confounders. The basic principle described in another context applies here as well: “[I]n observational studies unadjusted treatment effects and adjusted treatment effects generally differ, and to guard against spurious effects adjustment is required. In randomized studies, by contrast, adjusted and unadjusted treatment effects are equal, in theory, because of the orthogonality (uncorrelatedness) of treatment allocation with confounders built in by the randomization.” Michael O. Finkelstein, Bruce Levin & Herbert Robbins, Clinical Trials With Assured New Treatment for Sicker Patients, Part I: Introduction and Theory 15 [(forthcoming 1995) (on file in the instant case)].
Applying that theory here, it was appropriate for the forensic chemist to select bags at random.... But it was not appropriate to extrapolate the results from the arrest trip to other trips without accounting for possible confounders, since the mechanism of selection was not random, but possibly correlated with the amount carried.
If trip number is a factor in apprehension — as it was in Shonubi’s case because of his failure to account convincingly for his past trips [which led to his detention at the airport] — then the arrests in the government’s data would tend to be of higher-trip-number smugglers. A trip effect would then be a confounder that could cause Dr. Boyum’s data to understate the variation in amounts carried from trip to trip by Shonubi. Under the principle stated above, adjustment for trip number would be required.

Affidavit of Michael O. Finkelstein at 1-3 (April 5, 1995) (paragraph numbers deleted).

Professor Finkelstein’s caveat is important. The trip effect as a possible confounder cannot be overlooked. As demonstrated in the remainder of this memorandum, however, the statistical data, viewed in conjunction with other evidence, are helpful even if the trip-number confounder renders them imperfect. Cf. Richard Lempert, DNA, Science and the Law: Two Cheers for the Ceiling Principle, Jurimetrics Journal, Fall 1993, at 41 (noting that even imperfect data can have great utility in forensic decision-making).
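
The concern about non-random selection can be illustrated with a small simulation. Everything in it is hypothetical: the assumed trip effect (the amount carried rising with trip number), the assumed chance of arrest (also rising with trip number), and the function names. It is a sketch of the direction of the problem only, not an estimate of its size in this case.

```python
import random


def amount_carried(trip_number, rng):
    """Hypothetical trip effect: 250 g on trip 1, rising 25 g per trip, plus noise."""
    return 250.0 + 25.0 * (trip_number - 1) + rng.uniform(-20.0, 20.0)


def arrest_probability(trip_number):
    """Hypothetical selection mechanism: later trips are likelier to end in arrest."""
    return 0.02 * trip_number


def simulate(smugglers=50_000, max_trips=8, seed=0):
    """Return (mean over all trips made, mean over trips ending in arrest)."""
    rng = random.Random(seed)
    all_amounts, seized_amounts = [], []
    for _ in range(smugglers):
        for trip in range(1, max_trips + 1):
            grams = amount_carried(trip, rng)
            all_amounts.append(grams)
            if rng.random() < arrest_probability(trip):
                seized_amounts.append(grams)
                break  # an arrested smuggler makes no further trips
    mean_all = sum(all_amounts) / len(all_amounts)
    mean_seized = sum(seized_amounts) / len(seized_amounts)
    return mean_all, mean_seized


mean_all, mean_seized = simulate()
print(f"mean over all trips:    {mean_all:.1f} g")
print(f"mean over seized trips: {mean_seized:.1f} g")
```

Under these assumed mechanisms the seized trips over-represent later, heavier trips, so an average drawn from seizures alone sits above the average across all trips. If either assumption is removed (no trip effect, or arrest unrelated to trip number), the gap disappears, which is why the size of the trip effect matters to the weight the seizure data can bear.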

C. Use of statistics to illustrate non-statistical decision-making

Table 4 simulates the decision-making process using all information available to the trier in this case. It describes the defendant’s likely behavior, as determined by the sentencing judge using traditional (i.e., non-statistical) methods of decision-making. Numerical values are estimates of probability for illustrative purposes only.

The purposes of the table are several: to assist in the decision-making process; to explain that process; and to open up that process for discussion. See, e.g., Ward Edwards, Comment, The Society of Bayesian Trial Lawyers, 66 B.U.L.Rev. 937, 938 (1986) (“[I]t seems reasonable that probability assessments should be explicit and open to inspection and evaluation.”); Saks & Kidd, supra, at 152 (“Burying [the problems of evaluating evidence] in implicitness is no solution.... Making [the quantifiability of evidence] explicit does not create the problems, it only forces us to ... begin dealing with them.”); id. at 156 (“[c]andid announcements of unavoidable margins of error may be a greater service to individual defendants and to the legal system” than professions of certainty).

In broad outline, the Table describes each of Shonubi’s smuggling trips, starting with the December 10, 1991 trip, about which a great deal is known, and working backwards, through seven additional trips, about which less is known. Although Shonubi may have made more than eight trips, the Table considers only the eight trips found by the trial and appellate courts to have occurred. It assumes, for purposes of this illustration only, the following correlation between probabilities and legal standards:

Beyond a reasonable doubt: 95% plus
Clear and convincing: 70% plus
Preponderance of the evidence: 50% plus

See United States v. Fatico, 458 F.Supp. 388, 410 (E.D.N.Y.1978) (quantifying burdens of proof for purposes of discussion), aff'd on other grounds, 603 F.2d 1053 (2d Cir.1979), cert. denied, 444 U.S. 1073, 100 S.Ct. 1018, 62 L.Ed.2d 755 (1980). Thus, the likelihood of each trip, see Column II, will determine whether it meets the reasonable doubt, clear and convincing, or preponderance of the evidence standard.

TABLE 4
Court’s Model of Amounts Smuggled by Defendant Based on Intuitive Assessment of Probabilities

Column I:   Trip number (in reverse order)
Column II:  Probability trip was made by defendant and was a narcotics smuggling trip (%)
Column III: Probable range of weight of heroin mixture carried (in grams) (assuming a learning curve for low end of range)
Column IV:  Estimate of quantity most favorable to defendant (based on multiplication of Col. II and low end of Col. III)

I.      II.     III.        IV.
8       99      400-450     396
7       95      375-450     356
6       90      350-450     315
5       85      325-450     276
4       80      300-450     240
3       75      275-450     206
2       70      250-450     175
1       65      225-450     146
Total                       2110 grams
This non-Bayesian and non-statistical model is a simplified summary of the court’s analysis. It does not contain verifiable numbers. Given the information available, precise quantification is not possible.

Column I provides the trip number, in reverse chronological order. Eight trips were proven at trial. Other trips may have escaped detection.

Column II represents the probability that the trip was made and that it was made for purposes of smuggling narcotics. For the most recent trip, this is slightly less than 100 percent. The defendant could be innocent: the victim of a mistaken identification, for example, or a frame-up. However, the probability is at least 95 percent; Shonubi was convicted by a jury “beyond a reasonable doubt.” For each preceding trip, the likelihood is somewhat lower. Trips 5, 6, and 7, which the defendant admitted making (although not for smuggling heroin), are quite likely to have occurred. Trips 1, 2, 3, and 4 were proved by less compelling evidence at trial, and are less likely to have occurred. The likelihood of earlier trips drops off, though not to zero.

Column III represents the amount most likely to have been carried on the trip. For the most recent trip, the amount is that found on the defendant at the time of his arrest, with some allowance for measurement error. For the previous trips, the amount is a range that includes the amount found on the defendant on December 10, 1991, reflecting the court’s view that the defendant probably carried roughly the same amount on every trip. The low end of the range declines with each preceding trip, to reflect the “trip effect” posited by the experts. For reasons discussed earlier, the court does not consider this effect to have been a significant factor. Thus, the drop-off at the low end is less steep than in the Rule 706 Panel’s projections.

Column IV represents the amount fairly attributable to the defendant from each trip. It is derived by multiplying the amount at the low end of the range in Column III by the probability in Column II. It represents an estimate of the amount carried, discounted to account for the uncertainty of the event. This method of calculation properly favors the defendant.

From the analysis reflected in Table 4, it can be concluded that Trips 7 and 8 satisfy the beyond a reasonable doubt standard; Trips 2 through 8 satisfy the clear and convincing standard; and Trips 1 through 8 satisfy the preponderance of the evidence standard. Adding the amounts in Column IV would lead to the conclusion that the amounts carried, under each standard of proof, are:

Beyond a reasonable doubt (95% + probability) about 752 grams
By clear and convincing evidence (70% + probability) about 1,964 grams
By a preponderance (50% + probability) about 2,110 grams

This analysis, while only illustrative, suggests that the defendant was proven by clear and convincing evidence to probably have imported a total of more than 1,000 grams of heroin on his eight trips.
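
The arithmetic behind these totals can be sketched as follows. The sketch takes the low end of Column III and the Column IV figure for each trip from Table 4 and backs out the probability that, multiplied by the low end, yields Column IV. That back-calculation, the rounding to whole percentage points, and the names `implied_probability` and `total_under` are assumptions of the sketch; the thresholds are the illustrative ones stated above.

```python
# Rows of Table 4: (trip number, low end of Column III in grams, Column IV in grams).
TABLE_4 = [
    (8, 400, 396),
    (7, 375, 356),
    (6, 350, 315),
    (5, 325, 276),
    (4, 300, 240),
    (3, 275, 206),
    (2, 250, 175),
    (1, 225, 146),
]

# Illustrative thresholds (in percent) assumed in the text for each standard.
STANDARDS = {
    "beyond a reasonable doubt": 95,
    "clear and convincing": 70,
    "preponderance of the evidence": 50,
}


def implied_probability(low_end, discounted):
    """Back out Column II (in whole percent) from Column IV / low end of Column III.

    Assumption: Column II is stated in whole percentage points.
    """
    return round(100 * discounted / low_end)


def total_under(threshold):
    """Sum Column IV over the trips whose implied probability meets `threshold`."""
    return sum(
        discounted
        for _trip, low_end, discounted in TABLE_4
        if implied_probability(low_end, discounted) >= threshold
    )


totals_by_standard = {name: total_under(pct) for name, pct in STANDARDS.items()}
for name, grams in totals_by_standard.items():
    print(f"{name}: about {grams:,} grams")
```

Under these assumptions the sketch reproduces the figures stated in the text: about 752, 1,964, and 2,110 grams. Because each trip’s amount is both taken at the low end of its range and discounted by its probability, the totals err in the defendant’s favor twice over.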

D. Conclusions on proper role of statistics in this case

Statistical analysis alone could not satisfy the government’s burden. It has, however, interacted with inferences reached through non-statistical analysis in the following ways: 1) strengthening the trier’s confidence in those inferences; 2) permitting the trier to “cross-check” those inferences; and 3) helping the court to illustrate those inferences.

E. Cross-checking

Statistical and non-statistical proof are intertwined. Although there are few “pure” statistics cases, there are many cases in which some quantification is useful.

The effect of heuristic biases present in “conventional” decision-making can be reduced by careful and appropriate statistical reasoning, and vice versa. See, e.g., Deborah Hensler, Resolving Mass Toxic Torts: Myths and Realities, 1989 U.Ill.L.Rev. 89, 90 (describing judicial practice of measuring statistical models against “traditional” measures of proof); J.G. Thornton, R.J. Lilford & N. Johnson, Decision Analysis in Medicine, 304 British Med.J. 1099 (1992) (describing use of statistical methods to correct for biases in human decision-making).

By checking one mode of analysis against the other, triers improve the quality of their decisions. This implies that in evaluating non-statistical proof, courts should not lose sight of the potential utility of statistics. Similarly, in using statistics and other scientific techniques, courts must not ignore the fundamentals of inferential analysis, standard heuristic devices, and other techniques traditionally used by triers of fact in assessing the probative force of evidence. Cf. Cory Dean, Computer Edges Doctors in Heart Cases, N.Y.Times, Mar. 31, 1988, at 13 (“[D]ecisions [incorporating a] computer program, rather than ‘the unaided judgment of physicians,’ could avoid 175,000 unnecessary admissions to coronary care units ... each year.”); Saks & Kidd, supra, at 147 (“The mathematical model of a person’s own decision policies is more accurate than [decision-making by] the person.”). The comparison is not between triers using traditional fact-finding methods or statistics, but between triers deciding with or without the help of valuable statistical tools.

The fact that, as used in litigation, statistical and non-statistical proof are inextricably intertwined has implications for appellate review. Even in largely statistical cases, the superior fact-finding ability of the trial court has been acknowledged. As the Panel on Statistical Assessments as Evidence in the Courts concluded,

[w]hen it comes to understanding and coping with statistical evidence, the institutional advantage appears to be with the trial court. In dealing directly with the expert witnesses and acquiring an intimate knowledge of the facts of the case, the trial judge has an opportunity that the appellate judge lacks for instruction on the role of statistical evidence in the case in question. ... [This may lead] appellate judges ... to resolve statistical issues through law-like pronouncements ... without either a full understanding of the issues raised or an appreciation of what good statistical methodology implies given the specific facts of the case.

Evolving Role, supra, at 82; see also United States v. Casamento, 887 F.2d 1141, 1156 (2d Cir.1989) (when a finder of fact has “piece[d] together circumstantial evidence,” a “reviewing court must view ... [the] evidence not in isolation but in conjunction”), cert. denied, 493 U.S. 1081, 110 S.Ct. 1138, 107 L.Ed.2d 1043 (1990). This is no less true when that conjunction involves statistical and non-statistical proof.

F. Conclusion in light of statistics and other information provided on remand

The statistical evidence, in conjunction with proof already introduced, increases the sentencing judge’s confidence in the conclusion that the defendant probably carried between 1,000 and 3,000 grams of heroin on his eight trips. The data on 117 other defendants, along with evidence that this defendant was more able and determined than the average swallower, support the court’s earlier inferences. Moreover, the high proportion of seizures in the 300-500 gram range confirms the court’s conclusion that knowledge of how much heroin it is possible to swallow is common among smugglers, and that most swallowers attempt to reach that limit. As the 706 Panel noted, any series of related human actions is likely to involve dependencies. Panel Report at 24. In this case, the most significant dependency is not the trip effect but the tendency of smugglers to share information so as to maximize profit. By no means independent actors, they endeavor to conform to the expectations of the larger community of drug smugglers and the commercial realities of their pernicious trade.

The court of appeals has ordered the sentencing judge to rely on “drug records, admissions or live testimony.” Shonubi II, 998 F.2d at 89. The sentencing judge has, in fact, relied on a combination of drug records (including DEA and Customs Service records) and the records of Shonubi’s trial, sentencing hearing, and presentence report; the defendant’s admissions at the time of his arrest; and his demeanor at trial and sentencing. The sentencing judge has also relied upon the statistical analysis introduced on remand as well as testimony on the economics of heroin swallowing, and he has drawn on his acquired knowledge of the drug trade — knowledge common to the judges in the district. These “evidentiary elements” cannot be separated; they are inextricably linked in the decision-making process.

The sentencing judge’s obligation in estimation cases under the Guidelines is to analyze the evidence to obtain a range within which the “true” value appears to lie, and then to sentence the defendant for the amount near or at the bottom of that range. The rule arrived at is a rule of lenity as well as of humility. Because triers can only deal with probabilities, the defendant is entitled to have quantities reduced in recognition of possible misconceptions, misapprehensions, or biases by the trier. Much has to be left to the discretion of the trial judge. Cf. United States v. Moore, 54 F.3d 92, 102 (2d Cir.1995) (approving estimation, in drug conspiracy case, where district court estimated “conservatively” and made “reasonable deductions for losses and disruptions in the organization,” thereby choosing “a middle ground” between defense and government numbers).

A rational analysis would support the court’s original finding of a total of some 3,400 grams. Utilizing the subsequently obtained information and applying the analysis described above, the court now concludes, giving the defendant the benefit of every doubt, that the government has proven for Guidelines purposes that the defendant smuggled between 1,000 and 3,000 grams on his eight trips.

XIII. Obstruction of Justice Enhancement

The court of appeals has ordered enhancement of Shonubi’s offense level by two points for obstruction of justice. See U.S.S.G. § 3C1.1 (1994). It was the sentencing judge’s view that to impose such an enhancement, under the facts and circumstances of this case, would be both “inappropriate and cruel.” See Shonubi I, 802 F.Supp. at 863.

A. Purposes of sentencing and of § 3C1.1

In United States v. Dunnigan, decided after the original sentence was imposed, the Supreme Court explained that the purpose of a § 3C1.1 enhancement is to promote

legitimate sentencing goals relating to the principal crime, including ... retribution and incapacitation.... The perjuring defendant’s willingness to frustrate judicial proceedings to avoid criminal liability suggests that the need for incapacitation and retribution is heightened.

Dunnigan, — U.S. -, -, 113 S.Ct. 1111, 1119, 122 L.Ed.2d 445 (1993) (citing 18 U.S.C. § 3553(a)(2)). The Court’s language supports the sentencing judge’s conclusion that such enhancements should be imposed only in light of the purposes of sentencing enumerated by Congress at 18 U.S.C. § 3551 et seq. See Shonubi I, 802 F.Supp. at 863; see also United States v. DeRiggi, 45 F.3d 713, 716 (2d Cir.1995) (“Section 3551 ... bids courts to consider, before they impose sentence, to what end they do so.”). Imposing the enhancement where the purposes of sentencing will not be served would ignore Congress’ clearly expressed intent, as well as such real-life concerns as the effects of prison overcrowding, see Farmer v. Brennan, — U.S. -, 114 S.Ct. 1970, 128 L.Ed.2d 811 (1994), and the expense, borne by the taxpayers, of unwarranted incarceration. It would also ignore the concepts of parsimony — which requires imposition of a sentence no greater than required to accomplish relevant sentencing goals — and lenity. See Bifulco v. United States, 447 U.S. 381, 387, 100 S.Ct. 2247, 2252, 65 L.Ed.2d 205 (1980) (principle of lenity — that statutory ambiguities should be resolved in favor of defendant — “applies not only to interpretations of the substantive ambit of criminal prohibitions, but also to the penalties they impose”); see also William N. Eskridge, Jr. & Philip P. Frickey, Foreword: The Supreme Court, 1993 Term: Law as Equilibrium, 108 Harv. L.Rev. 97, 104 (describing rule of lenity as important principle in recent Supreme Court jurisprudence).

In the instant case, where the defendant’s refusal to plead and insistence on taking the witness stand, where he lied, have effectively added 10 years to his sentence, the goals of § 3C1.1, as expressed by the Court in Dunnigan, have already been served. While, as the Dunnigan Court stated, it was “rational” for the Sentencing Commission to conclude that “a defendant who commits a crime and then perjures herself in an unlawful attempt to avoid responsibility is more threatening to society ... than a defendant who does not so defy the trial process,” — U.S. at -, 113 S.Ct. at 1118, a § 3C1.1 enhancement is not required in Shonubi’s case. He has already paid a far higher price for his perjury than the Commission envisioned. (In Criminal History Category I, the enhancement permitted under § 3C1.1 adds, at most, 36 months to a defendant’s sentence).

B. Double-counting

Under the circumstances of this case, a § 3C1.1 enhancement constitutes double-counting, raising due process concerns. See Ball v. United States, 470 U.S. 856, 105 S.Ct. 1668, 84 L.Ed.2d 740 (1985) (prohibiting cumulative punishment for same offense); United States v. Lindsay, 985 F.2d 666 (2d Cir.) (adapting Ball holding, with modification), cert. denied, — U.S. -, 114 S.Ct. 103, 126 L.Ed.2d 70 (1993). Cf. United States v. Greenfield, 44 F.3d 1141, 1146 (2d Cir.1995) (considering the same conduct in imposing enhancements under § 3B1.1 — “organizer, leader, manager or supervisor” — and § 2F1.1(b)(2) — “more than minimal planning” — constitutes “impermissible double-counting”).

The commentary to Section 3C1.1 expressly forbids such double-counting. See U.S.S.G. § 3C1.1, application note 6 (1994) (instructing judge that, where defendant is convicted of an underlying offense as well as perjury, offense level for underlying offense should be increased by 2-point perjury enhancement “or the offense level for [perjury], whichever is greater” (emphasis added)); U.S.S.G. § 3C1.2, application note 1 (1994) (Reckless Endangerment During Flight) (“Do not apply this enhancement where the offense guideline ... or another adjustment ... results in an equivalent or greater increase in offense level solely on the basis of the same conduct.”). Cf. also United States v. Werlinger, 894 F.2d 1015, 1018 (8th Cir.1990) (declining to “construe section 3C1.1 ... so broadly as to multiply punishment of conduct already punished through the application of other sections of the Guidelines”).

The court of appeals has rejected claims of double-counting when it was shown that the sentencing court had considered “different facets of defendant’s conduct.” United States v. Cawley, 48 F.3d 90, 95 (2d Cir.1995) (citing United States v. Marsh, 955 F.2d 170, 171 (2d Cir.1992)). Here, the sentencing judge is being compelled to twice consider the same facet of Shonubi’s conduct. The court probably would not have been alerted to all seven of his other trips if not for his lying on the stand. Thus, considerations of both fairness and utility preclude considering this same behavior again in the context of a § 3C1.1 enhancement.

C. Discretion to enhance under Dunnigan

In its opinion the court of appeals denied the sentencing court the discretion not to enhance. It relied on its reading of Dunnigan as requiring enhancement “upon a proper determination that the accused has committed perjury at trial.” Shonubi II, 998 F.2d at 87-88 (quoting Dunnigan, — U.S. at -, 113 S.Ct. at 1119).

This is by no means the only possible reading of Dunnigan. Dunnigan held that enhancement of a sentence under § 3C1.1 does not unconstitutionally burden a defendant’s right to testify; it did not suggest that such enhancements are required. The Court’s own summary of its holding states: “Held: Upon a proper determination that the accused has committed perjury at trial, a court may enhance the accused’s sentence under § 3C1.1.” — U.S. at -, 113 S.Ct. at 1111 (emphasis added). Supreme Court headnotes are prepared under the supervision of the Court.

The court of appeals for the Second Circuit, in a decision issued shortly after its opinion in Shonubi, stated that “[a] district court may enhance a defendant’s sentence for obstruction of justice if the defendant commits perjury.” United States v. Onumonu, 999 F.2d 43, 45 (2d Cir.1993) (emphasis added). In another case decided after Shonubi, the same court of appeals observed:

Dunnigan does not say that every time a defendant is found guilty despite his exculpatory testimony, the court must hold a hearing to determine whether or not the defendant committed perjury. On the contrary, that opinion clearly states that when the court wishes to impose the enhancement over the defendant’s objection, the court “must review the evidence and make independent findings necessary to establish a willful impediment to or obstruction of justice....”

United States v. Vegas, 27 F.3d 773, 782-83 (2d Cir.1994) (emphasis in original) (quoting Dunnigan, — U.S. at -, 113 S.Ct. at 1117), corrected, No. 93-1375, (2d Cir. June 13, 1994), cert. denied, — U.S. -, 115 S.Ct. 284, 130 L.Ed.2d 200 (1994); see also United States v. Cawley, 48 F.3d 90, 94 (2d Cir.1995) (holding § 3C1.1 “enhancement appropriate” and “proper” — not required). Vegas — in particular the emphasized language — and Onumonu suggest that the decision to make a finding of perjury, leading to the enhancement, is within the trial court’s discretion.

Other courts of appeals have read Dunnigan as permitting, but not requiring, § 3C1.1 enhancements. See, e.g., United States v. Cabral-Castillo, 35 F.3d 182, 186 (5th Cir.1994) (“A sentence may ... be enhanced if a defendant testifies untruthfully.”) (citation omitted), cert. denied, — U.S. -, 115 S.Ct. 1157, 130 L.Ed.2d 1113 (1995); United States v. Yost, 24 F.3d 99, 106 (10th Cir.1994) (“trial judge is entitled” to consider enhancement) (citation omitted); United States v. Dillard, 43 F.3d 299, 308 (7th Cir.1994) (“[D]efendant can be given a two-level enhancement ... if he committed perjury.”); United States v. Osobor, No. 91-5792, 991 F.2d 792, 1993 WL 96900, at *1 (4th Cir. Apr. 2, 1993) (unpublished) (trial court is “permitted ... to enhance a defendant’s sentence”).

Commentators, relying on the Supreme Court’s language, have noted that Dunnigan does not compel enhancement.

[N]othing in [Dunnigan] prevents a sentencing judge from holding a due process hearing before he enhances a sentence for trial perjury, or even from requiring that perjury be proved beyond a reasonable doubt. There is still interpretive room, in other words, for restoring equilibrium to the criminal process....

Barbara Allen Babcock, The American Criminal Justice System Approaching the Year 2000—Introduction: Taking the Stand, 35 Wm. & Mary L.Rev. 1, 19 (1993).

Some circuits have, however, held that the enhancement for perjury is automatic. See United States v. Austin, 948 F.2d 783, 788-89 (1st Cir.1991); United States v. Alvarez, 927 F.2d 300, 303 (6th Cir.1991), cert. denied, 500 U.S. 945, 111 S.Ct. 2246, 114 L.Ed.2d 487 (1991). In United States v. Avila, 905 F.2d 295, 297 (9th Cir.1990), the court held that the enhancement is mandatory, but mitigating circumstances could determine placement within the sentencing range. See also Kevin J. Kelley, To Enhance or Not to Enhance: A Guide to Uniformity in Applying Perjury Enhancements Under Section 3C1.1 of the United States Sentencing Guidelines: United States v. Dunnigan, 27 Creighton L.Rev. 585 (1994) (describing “circuit splits” in Dunnigan interpretation). The preferred view is that Dunnigan — as the opinion itself states, and as the Second Circuit Court of Appeals has acknowledged in cases decided since Shonubi II — does not compel enhancement.

D. Unconstitutionality of automatic enhancement

Automatic enhancement strongly discourages defendants from exercising their constitutional right to testify in their own defense. See Vincent L. Broderick, Flexible Sentencing and the Violent Crime Control Act of 1994, 7 Fed.Sent.Rep. 128, 131 (1995) (noting dangers to defendants of going to trial).

A rule of automatic enhancement recalls the medieval belief that a defendant could not be sworn as a witness because perjury would inevitably consign him or her to hell. See, e.g., Thomas Starkie, Evidence 83 (1824) (“This rule of exclusion ... is founded on the known infirmities of human nature, which is too weak to be generally restrained by religious or moral obligations, when tempted and solicited in a contrary direction by temporal directions”); English Common Law Practice Commissioners, Second Report 10 (1853) (“Every [interested] person ... was presumed to be incapable of resisting the temptation to perjury.”); 2 Wigmore on Evidence § 578, at 818-19 (James H. Chadbourn rev. ed. 1979); 6 id. § 1834, at 436 n. 1 (1976) (describing practice). But see English Common Law Practice Commissioners, supra, at 10 (“It is painful to contemplate the amount of injustice which must have taken place under [this exclusionary rule].”). Well into the last century, when defendants were permitted to testify, they were often not permitted to take an oath. See 6 Wigmore on Evidence, supra, § 1827, at 411 (“[I]n many jurisdictions, the concession of an accused person’s right to testify was preceded by a stage in which he was allowed to make a ‘statement,’ but not under oath.... [T]he practice was looked upon as a refusal to concede the ordinary guarantee of credibility.”).

As a matter of policy as well as of constitutional right, defendants, who already have much to lose by testifying, should not be further discouraged from taking the witness stand at the trial or sentencing hearing. Cf. Rudolph B. Schlesinger, Comparative Law 441-55 (1980) (quoted in Mary Ann Glendon, Michael Wallace Gordon & Christopher Osakwe, Comparative Legal Traditions 181, 189 (1985)) (under continental systems, “a defendant who decides to testify ... [does not] have to dread that a prosecution for perjury might arise out of such testimony. What he says in his defense is not under oath.”). The court has the discretion to consider perjury in determining the sentence under § 3C1.1 or in fixing the sentence within the Guidelines range. If the court chooses not to enhance, the prosecutor retains the ability to initiate a separate prosecution for perjury, affording the defendant the usual protections afforded by our criminal laws.

E. Particularized finding of perjury

Despite considerable doubt about the court of appeals’ reading of Dunnigan in Shonubi — doubt engendered, in part, by its own opinions in Onumonu and Vegas — this court is bound to follow the “law of the case.” See, e.g., United States v. Ekwunoh, 888 F.Supp. 369, 371-72 (E.D.N.Y.1994) (discussing “law of the case” doctrine). Thus, it must enhance Shonubi’s sentence, assuming it has made a “proper determination that [the defendant] committed perjury at trial.” Shonubi II, 998 F.2d at 87-88.

In Dunnigan, such a determination was defined as “a finding of an obstruction or impediment of justice that encompasses all of the factual predicates for a finding of perjury.” — U.S. at -, 113 S.Ct. at 1117. “[I]t is preferable,” the Court added, “for a district court to address each element of the alleged perjury in a separate and clear finding.” Id.

This language hardly suggests that a finding that a defendant lied, made in another context, will automatically satisfy § 3C1.1. In Vegas, the court of appeals stated that findings made for purposes of § 3C1.1 are a prerequisite to the enhancement. 27 F.3d at 783. The Vegas court distinguished Shonubi by observing that the sentencing judge in Vegas did not make a finding that the defendant had lied, while this court, in Shonubi, did. Id. That is an important distinction. The sentencing court has stated that Shonubi lied.

Yet this court’s statements in Shonubi are not the element-by-element finding suggested by Dunnigan. There is no way for the court of appeals to know, a priori, whether the district court would have made such a finding. The commentary to § 3C1.1 provides that “[i]n applying this provision in respect to alleged false testimony or statements by the defendant such testimony or statements should be evaluated in a light most favorable to the defendant.” U.S.S.G. § 3C1.1, comment 1 (1994). This comment would be meaningless if the trial court did not have the discretion to employ different standards in evaluating conduct in light of the purpose of that evaluation. In Onumonu, the court of appeals ruled that judges should apply a standard “indistinguishable from a clear-and-convincing standard” for evaluating conduct under § 3C1.1. 999 F.2d at 45. See also United States v. Montague, 40 F.3d 1251, 1256 (D.C.Cir.1994) (application note to § 3C1.1 “create[s] an exception to the usual ... preponderance-of-the-evidence standard”). A finding made in the course of sentencing, by a preponderance standard, see supra Part IV.A, will not in itself satisfy § 3C1.1.

Similarly, when a sentencing factor is in dispute, the defendant must be given an “adequate opportunity to present information to the court regarding that factor.” U.S.S.G. § 6A1.3 (1994). The Second Circuit Court of Appeals has

held, in the context of upward departures, that the sentencing court, before sentence is imposed, must give a defendant both notice of the court’s intention to depart, including “the factors that the judge is planning to rely upon and ... some brief explanation as to why these factors warrant a departure,” and an opportunity to be heard as to why the contemplated departure is unwarranted.

United States v. Jagmohan, 909 F.2d 61, 63 (2d Cir.1990) (alteration in original) (quoting United States v. Kim, 896 F.2d 678, 681 (2d Cir.1990)); accord United States v. Palta, 880 F.2d 636, 640 (2d Cir.1989); United States v. Cervantes, 878 F.2d 50, 55-56 (2d Cir.1989). See also United States v. Zapatka, 44 F.3d 112, 115-16 (2d Cir.1994) (defendant entitled to notice before sentencing hearing that court planned to sentence her under harsher guideline than that used in presentence report). In an “automatic enhancement” environment, this procedural protection would cease to have meaning.

Procedural law and practice preclude the imposition of a § 3C1.1 enhancement without notice of the sentencing judge’s intention to do so. As this court noted in Shonubi I, “[w]ooden application of § 3C1.1 would amount to punishment for perjury by summary bench trial.” 802 F.Supp. at 863. Under this rule a finding, made in another context, would become a verdict in a trial that neither the court nor the prosecutor elected to conduct.

F. Conclusion

Given the mandate of the court of appeals, the sentencing judge has no choice but to increase the defendant’s offense level by two points.

XIV. Additional Sentencing Considerations

A. “Penalty” for going to trial

At one time, the defendant contemplated entering a plea of guilty. Subsequently, the defendant, after consulting his attorney, chose to go to trial. In so doing, he risked receiving a harsher penalty than under the plea bargain. See Vincent L. Broderick, Flexible Sentencing and the Violent Crime Control Act of 1994, 7 Fed.Sent.Rep. at 131 (“[I]f the facts are unfavorable, going to trial will make them abundantly clear to the sentencing judge.”); Tony Garoppolo, The Sentencing Reform Act: A Guide for Defense Counsel 39-40 (1994) (describing benefits of guilty pleas, especially where Guidelines will permit the sentencing judge to consider related conduct).

Having put the government to its proof, the defendant cannot now ask the court to ignore what it has seen and heard. Shonubi’s own testimony alerted the court to the full ambit of his criminal behavior. He is not being penalized for going to trial. He is being penalized for making eight smuggling trips.

B. Sentencing within prescribed Guidelines range

The sentencing judge has the discretion to sentence the defendant anywhere within the applicable Guidelines range. See, e.g., United States v. Reed, 914 F.2d 1288, 1290 (9th Cir.1990). If the sentence exceeds 24 months, the court must explain its decision. 18 U.S.C. § 3553(c)(1). In choosing where, within the range, to sentence the defendant, the court could take one of several approaches.

One approach would be to rely solely on quantity of drugs — that is, a defendant at the low end of the base offense quantity range would be sentenced at the low end of the Guidelines range, and so on. Quantity of drugs reveals something about defendant’s danger to society. See U.S.S.G. § 1B1.3, comment 2 (1994); United States v. Colon, 961 F.2d 41, 43 (2d Cir.1992). This approach, however, has several flaws. First, it further mechanizes a system that is already largely bereft of human considerations, requiring a precision in quantification that the Guidelines themselves recognize is often unattainable. Second, it ignores the other indicia of culpability that may be far more important than quantity in determining the appropriate sentence. Over-reliance on quantity has been perhaps the most frequently criticized aspect of the Guidelines for drug offenses. See, e.g., Marc Miller & Daniel J. Freed, The Disproportionate Imprisonment of Low-Level Drug Offenders, 7 Fed.Sent.Rep. 3 (1994); Jon O. Newman, Five Guideline Improvements, 5 Fed.Sent.Rep. 190 (1993); Douglas A. Berman, The Second Circuit: Attributing Drug Quantities to Narcotics Offenders, 6 Fed.Sent.Rep. 247 (1994).

Another approach would be to consider the number of instances on which the defendant smuggled drugs, thus distinguishing between a person who smuggled 400 grams eight times, and a person who smuggled 3,200 grams once. Cf. United States v. Lara, 47 F.3d 60 (2d Cir.1995) (adopting “quantity/time factor” to gauge the culpability of persistent offenders). The defendant who smuggled a large quantity on one occasion might, given the scale of his operation, appear to pose the greater threat. Yet a strong argument could be made that the defendant who handles smaller quantities repeatedly presents the greater danger. As Professor Brownlee has wisely observed, in choosing to repeat the same behavior, the defendant makes clear that the offense of conviction is not an aberration, but is consistent with his character. The likelihood that he will repeat the crime is high. See generally Ian D. Brownlee, Taking the Strait-Jacket Off: Persistence and the Distribution of Punishment in England and Wales, 14 Legal Stud. 295, 307 (1994). Incapacitation of convicted felons is one of the roles of sentencing enumerated in 18 U.S.C. § 3553(a). The need for incapacitation in a given case is based in part on the danger the defendant will pose to society after release. The repeat offender would, under this approach, receive the higher sentence.

In evaluating persistence, the court’s observation of the defendant’s demeanor can be useful. In the same way that the defendant’s behavior in court can support inferences about his past behavior, see supra Part V.C, it can support inferences about his probable future behavior. See Brownlee, supra, at 307 (discussing connection between “an undesirable attitude toward authority” and the tendency to persist in criminal behavior). Shonubi’s undesirable attitude toward authority — exhibited in each of his appearances before this court — suggests a high likelihood of recidivism, with attendant costs and dangers to society. Thus a lengthy period of incapacitation through incarceration is warranted.

The operation of the Guidelines renders this goal all too easy to achieve. Guidelines sentences for drug offenses were based on the onerous mandatory minimums enacted in the early 1980’s. See, e.g., Barbara S. Vincent & Paul J. Hofer, The Consequences of Mandatory Minimum Prison Terms: A Summary of Recent Findings, 7 Fed.Sent.Rep. 33 (1994) (describing relationship between minimums and Guidelines). They are not only longer than the sentences imposed in this district prior to enactment of the Guidelines, but they are longer than are required for any rational sentencing purpose. Consequently, as in the majority of drug cases that come before this court, the most appropriate sentence under the Guidelines is the low end of the applicable range.

C. Added time in prison required by Guidelines system

Before enactment of the Guidelines, under the system in place in the Eastern District of New York, one-time mules were sentenced to approximately 30 months in prison; evidence of repeated trips generally added 18 months. Thus Shonubi would, most likely, have been sentenced to 48 months in prison. A precise finding of the total amount of heroin smuggled would not have been required. Under the parole system then in effect he could have served less than two thirds of this time, or about 32 months, before being deported. The additional 100-plus months defendant will serve in prison over what would have been imposed before the Guidelines will have little or no deterrent effect on the drug trade and will burden the taxpayers with hundreds of thousands of dollars in superfluous costs of incarceration.

XV. Conclusion

The defendant’s offense level, based on a finding that he smuggled over 1,000 but less than 3,000 grams of heroin on eight related trips, is 32. The 2-point enhancement under § 3C1.1, which the court of appeals has ordered this court to impose, brings his offense level to 34. The range for a level 34 offense in the defendant’s criminal history category — “I”—is 151-188 months. A prison term of 151 months — twelve and one half years — more than satisfies the sentencing goals listed at 18 U.S.C. § 3551 et seq.

The defendant is re-sentenced to 151 months in prison plus five years’ supervised release and a $100 assessment.

SO ORDERED.  