
    UNITED STATES of America, ex rel. Patrick J. LOUGHREN, Plaintiff, v. UNUMPROVIDENT CORP., et al., Defendants.
    Civil Action No. 03-11699-PBS.
    United States District Court, D. Massachusetts.
    Feb. 24, 2009.
    
      Colette G. Matzzie, Mary Louise Cohen, Claire M. Sylvia, Phillips & Cohen LLP, Kit A. Pierson, Law Office of Kit A. Pierson, P.L.L.C., Washington, DC, Jeffrey Mark Cohen, Jeremy M. Sternberg, United States Attorney’s Office, Peter B. Krupp, Sara A. Laroche, Lurie & Krupp, LLP, Boston, MA, for Plaintiff.
    Byrne J. Decker, Geraldine G. Sanchez, Louise K. Thomas, Lucus A. Ritchie, Mark E. Porada, Gavin G. McCarthy, Robert H. Stier, Jr., Pierce Atwood LLP, Portland, ME, John E. Meagher, Shutts & Bowen LLP, Miami, FL, William J. Kayatta, Jr., Stephen Herbert Galebach, Pierce Atwood LLP, Boston, MA, for Defendants.
   MEMORANDUM AND ORDER

SARIS, District Judge.

I. Introduction

Whistleblower plaintiff, Patrick Loughren, brings this qui tam action against UnumProvident Corporation and Genex Services, Inc. (collectively “Unum”) alleging violations of the False Claims Act (“FCA”), 31 U.S.C. § 3729 et seq. Loughren proposes to submit expert testimony from Matthew G. Mercurio, Ph.D., in which Mercurio uses statistical techniques to extrapolate from the number of false claims within a sample of claims to an estimation of the total number of false claims filed. Unum has moved to exclude the testimony under Fed.R.Evid. 702. After hearing and two rounds of briefing, Unum’s motion [Docket No. 282] is ALLOWED.

II. Background

1. Procedural

Plaintiff contends that Unum caused many of its insureds to file applications to the Social Security Administration (“SSA”) for Social Security Disability Insurance (“SSDI”) benefits that falsely state that the claimants were “unable to work” or were “disabled” when Unum knew or should have known that these insureds did not meet the statutory definition of disability required to qualify. At issue are the 468,641 insureds who have submitted long term disability (“LTD”) claims to Unum and whom Unum allegedly caused to apply for SSDI benefits between January, 1997 and July, 2007. Given the enormous number of claims and the significant time and resources it would take to determine if a single claim were false, the plaintiff understandably deemed it impractical to examine each one by one, and so turned to statistical sampling and extrapolation.

Prior to trial, Unum challenged the reliability of the extrapolation on a number of grounds. The Court held a bellwether trial on six claimants (one of whom filed two claims) and deferred ruling on the Daubert motion. The jury returned a split verdict. After the Court directed a verdict for the defendants on one claim, the jury found that two of the remaining claims were false, three claims were not false (including two claims filed by a single claimant), and hung on the final claim.

Prior to trial, the record was unclear as to whether each Unum examiner made a separate subjective evaluation regarding the decision whether to require a claimant to file an application with the SSA, or whether Unum had a general policy of requiring a claimant to file an application whenever the disability was expected to last more than six months. At trial, plaintiff presented evidence from which a jury could reasonably find that Unum had a policy and practice of coercing its insureds to file for SSA benefits as soon as they were disabled for six months. See, e.g., Trial Tr. vol. 14, 38-41, Oct. 15, 2008 (testimony of Unum claim administrator regarding a letter sent to a claimant “based on the time frame she’s been out of work” stating that “[s]ince your disability has extended beyond five months, to receive an unreduced disability benefit, we encourage you to apply for Social Security Disability Insurance benefits.”); Trial Tr. vol. 3, 122-23, Sept. 24, 2008 (Unum employee testifying that claims handlers had access to a manual instructing them that “[i]f it is anticipated that the disability will be more than a short duration, the claimant will be asked to apply for SSDI.”); Trial Tr. vol. 4, 22-23, Sept. 25, 2008 (testimony regarding a document stating that, for at least one major claim site, “[g]enerally, if disabled over six months, SSDI advocacy pursued,” and describing a similar policy at another site); Trial Tr. vol. 4, 64-67, Sept. 25, 2008 (testimony of former Unum employee that Unum “would say to the insured, if they believed that the disability was going to last more than six months, they would tell them that they needed to apply for Social Security Disability.... It was just simply a duration analysis” and other eligibility requirements were not considered); Trial Tr. vol. 5, 39-45, Sept. 26, 2008 (testimony of former Unum employee that Unum’s policy was to tell insureds that they were required to apply for SSDI with “no assessment with respect to the Social Security requirements” so long as a claimant’s “disability was going to extend beyond five months.”); Trial Tr. vol. 9, 139-142, Oct. 3, 2008 (testimony of Unum employee that internal review indicated that claimants whose disabilities were expected to last more than six months were told to apply for SSDI). As such, the Court concludes that extrapolation is a reasonable method for determining the number of false claims so long as the statistical methodology is appropriate. See, e.g., United States v. Lahey Clinic Hosp., Inc., 399 F.3d 1, 18 n. 19 (1st Cir.2005) (noting that “sampling of similar claims and extrapolation from the sample is a recognized method of proof.”); Hilao v. Estate of Marcos, 103 F.3d 767, 782-87 (9th Cir.1996) (approving the use of random sampling and statistical evidence to determine damages); United States v. Cabrera-Diaz, 106 F.Supp.2d 234, 240-41 (D.P.R.2000) (approving the use of a statistical sample and extrapolation in a False Claims Act case).

2. The Experts

Plaintiff retained Dr. Mercurio to select a statistically valid random sample of the claims. According to his expert report, Mercurio considered and rejected using simple random sampling, the most basic sampling procedure (the one familiar even to lawyers and judges), and stratified sampling, a process in which the population is divided into several subpopulations, which are then each randomly sampled. (Expert Report of Dr. Matthew G. Mercurio, Ph.D. (“Mercurio Report”) 5-8.) In stratified sampling, the subpopulations are mutually exclusive and, together, represent every element in the population. (Id. at 6.) Instead, Mercurio chose to utilize cohort sampling, which he called “the most efficient and suitable approach” given the situation. (Id. at 8.) In cohort sampling, groups that share a specific trait thought to make them more likely to possess the sought-after characteristic are more heavily sampled, and each group’s results are then reweighted to account for the group’s relative size in the overall population. (Id. at 6-7.) The cohorts in cohort sampling are not necessarily exclusive and they do not necessarily represent every element in the population. (Id. at 7.)

As an example of cohort sampling, Mercurio suggests a case where the goal is to determine what percentage of the population suffers from Alzheimer’s disease. Instead of simply sampling the entire population, one could create age-based cohorts of 60-70 year olds, 70-80 year olds, and 80+ year olds, and sample those particular cohorts, later reweighting the results based on the percentage of the total population that those cohorts represent. According to Mercurio, this would result in an accurate estimate, while being a “much more efficient use of resources.... ” (Id. at 7-8.) The cohorts in Mercurio’s example, however, do not overlap.
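For illustration, the reweighting step Mercurio describes can be sketched in Python. Every figure below is invented for the sketch (none appears in the Mercurio Report), and the function assumes non-overlapping cohorts, as in Mercurio’s own example:

```python
# Hypothetical sketch of cohort sampling with reweighting, in the spirit
# of Mercurio's Alzheimer's example. All figures are invented.
def reweight(cohorts):
    """Estimate total positives from per-cohort samples.

    cohorts: list of (cohort_size, sample_size, positives_in_sample).
    Assumes the cohorts do not overlap, as in Mercurio's own example.
    """
    total = 0.0
    for cohort_size, sample_size, positives in cohorts:
        rate = positives / sample_size   # rate observed in the sample
        total += rate * cohort_size      # reweight by the cohort's size
    return total

# (people in cohort, people sampled, sampled people with the condition)
age_cohorts = [
    (100_000, 100, 5),   # 60-70 year olds: 5% of the sample
    (60_000, 100, 12),   # 70-80 year olds: 12% of the sample
    (40_000, 100, 25),   # 80+ year olds: 25% of the sample
]
print(round(reweight(age_cohorts)))  # 5,000 + 7,200 + 10,000 = 22200
```

The efficiency Mercurio claims comes from sampling heavily where the condition is thought to be concentrated, then scaling each sample rate back up by its cohort’s true size.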

In this case, plaintiff based the cohorts on factors such as the claimant’s disease classification (identified by an “ICD” number), age, disability date, and whether a claimant’s SSDI claim was denied. (Id. at 8-9.) A total of 22 cohorts were specified, although one was later determined to contain no claims. (Id. at App. A.) Because the cohorts were based on different characteristics, and the ranges for certain characteristics varied, many of the cohorts overlap significantly. (Id. at 14, App. C.) Cohort 22 includes all claims, and thus all the other cohorts overlap with cohort 22 completely. Cohort 13 includes all claimants with an age less than 50 and an ICD from 401 to 405; cohort 4 includes claimants with an age of 40 or less and an ICD from 390 to 459 (reflecting circulatory problems), and thus includes many of the claims in cohort 13. (Id. at App. A, App. C; Daubert Hr’g Tr. 23.) Cohorts 11 and 12 were based solely on ICD number, whereas Cohorts 18-21 were based solely on disability date; as they were based on distinct characteristics, these groups of cohorts naturally overlap. (Mercurio Report App. A, App. C.) Other cohorts overlap for similar reasons.

Attempting to achieve a 95% confidence level and a ±5.6% level of precision for each cohort, Mercurio calculated the necessary sample size for each cohort. (Id. at 9-12.) For each of the 21 cohorts, the necessary sample size was between 71 and 77, adding up to a total sample size of 1,593 claims. (Id. at 12.) The appropriate number of claims was selected at random from the cohorts, but because the same two claims were randomly selected from two different cohorts, only 1,591 distinct claims were selected. (Id.) At this point, Mercurio stepped aside as other experts hired by the plaintiff reviewed the data from the selected claims. Based on the data available at the time, the experts concluded that 101 of the 1,591 selected claims were false. (Relator’s Opp’n to Defs.’ Daubert Mot. To Exclude the Test. of Expert Matthew G. Mercurio, Ph.D. (“Relator’s Opp’n”) 2.)
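The opinion does not state which inputs Mercurio used to arrive at per-cohort sample sizes of 71 to 77, so those figures cannot be reproduced from the record. For context only, a standard sample-size formula of the kind found in Cochran’s text, with a finite-population correction, looks like this:

```python
import math

def cochran_sample_size(z, p, e, N):
    """Cochran's sample-size formula with finite-population correction.

    z: z-score for the confidence level (1.96 for 95%)
    p: assumed proportion of units with the trait of interest
    e: desired precision (half-width of the interval, e.g. 0.056)
    N: size of the cohort being sampled
    """
    n0 = z ** 2 * p * (1 - p) / e ** 2          # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / N))   # correct for finite N

# Textbook check: 95% confidence, p = 0.5, +/-5% precision, huge population.
print(cochran_sample_size(1.96, 0.5, 0.05, 10**9))  # 385
```

The result depends heavily on the assumed proportion p; the assumptions behind Mercurio’s 71-to-77 figures are not in the opinion, so this sketch is generic rather than a reconstruction of his calculation.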

To determine the percentage of false claims within each cohort, Mercurio divided the number of verified false claims in the sample by the total sample size for each cohort. (Mercurio Report 14.) He multiplied that percentage of verified false claims in the sample by the total population size of the cohort to derive the “weighted percent.” (Id.) Mercurio then added up the “weighted percents” for all the cohorts. (Id.) As Mercurio notes, were there no overlaps between cohorts, this sum would have been the total number of false claims filed. (Id.) In order to account for the overlaps, Mercurio took this sum of “weighted percents,” divided it by the sum of the number of claims in all the cohorts, counting overlapping claims in multiple cohorts multiple times, and multiplied the result by the total number of unique claims in the entire population. (Id.) Using this “weighted average” extrapolation technique, Mercurio calculated that, in the total population of 468,641 claims, there were 19,945 false claims, ±8,105.5 claims, with 95% confidence. (Id. at 14-15.)

Following the submission of Mercurio’s report, the Social Security Administration provided additional information to the plaintiff that led the plaintiff’s other experts to downgrade their conclusion; instead of 101 false claims out of the 1,591 claim sample, they concluded that there were 62 false claims out of the 1,591 claim sample. (Relator’s Opp’n 3 n. 4.) Mercurio then revised his calculation based on this new information, but using his same technique, finding that in the total population of 468,641 claims, there were 13,979 false claims, ±7,438 claims, with 95% confidence. (Supplemental Expert Report of Dr. Matthew G. Mercurio, Ph.D. (“Mercurio Supplemental Report”) 2-3.) At some point, even more data were revealed, leading the other experts to conclude that there were only 58 false claims in the sample. (Second Supplemental Expert Report of Dr. Matthew G. Mercurio, Ph.D. (“Mercurio Second Supplemental Report”) 2-3.) Taking this new information into account, Mercurio calculated that there were a total of 11,827 false claims, ±6,501.2 claims, with 95% confidence. (Id. at 3.)

In response, defendants proffered the testimony of their expert, Roger M. Hayne, Ph.D. Hayne has criticized Mercurio’s work on a number of grounds. In addition to criticizing Mercurio’s choice of cohort sampling over simple random sampling, Hayne finds fault with Mercurio’s calculation of the confidence interval and his use of weighted averages to account for the overlaps between cohorts. (Hayne Report 5-13.) Mercurio rejects Hayne’s criticisms. However, in response to Hayne’s criticism of his use of weighted averages, Mercurio replaced cohort 22, the cohort which contained the entire population, with cohort 22*, which contained only those claims which were not included in any other cohort. (Mercurio Supplemental Report 10-11.) Additionally, Mercurio took into account Hayne’s criticisms regarding his calculation of the confidence interval. (Id. at 12-14.) With the new sampling data, and accepting Hayne’s criticisms, Mercurio calculated that there were 8,027 false claims, ±5,868.3 claims, with 95% confidence. (Mercurio Second Supplemental Report 4.)

III. Discussion

1. The Daubert Standard

The admission of expert evidence is governed by Fed.R.Evid. 702, which codified the Supreme Court’s holding in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993), and its progeny. See United States v. Diaz, 300 F.3d 66, 73 (1st Cir.2002); see also Fed.R.Evid. 702 advisory committee’s note. Rule 702 states:

If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise, if (1) the testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles and methods, and (3) the witness has applied the principles and methods reliably to the facts of the case.

Fed.R.Evid. 702.

The trial court must determine whether the expert’s testimony “both rests on a reliable foundation and is relevant to the task at hand” and whether the expert is qualified. Daubert, 509 U.S. at 597, 113 S.Ct. 2786; Diaz, 300 F.3d at 73 (“[A] proposed expert witness must be sufficiently qualified to assist the trier of fact, and [ ] his or her expert testimony must be relevant to the task at hand and rest on a reliable basis”). An expert’s methodology is the “central focus of a Daubert inquiry,” but a court “may evaluate the data offered to support an expert’s bottom-line opinions to determine if that data provides adequate support to mark the expert’s testimony as reliable.” Ruiz-Troche v. Pepsi Cola of P.R. Bottling Co., 161 F.3d 77, 81 (1st Cir.1998); see Bonner v. ISP Techs., Inc., 259 F.3d 924, 929-930 (8th Cir.2001) (deeming it clear that “it is the expert witnesses’ methodology, rather than their conclusions, that is the primary concern of Rule 702” and suggesting that a court cannot exclude testimony asserting a “novel” conclusion if the methodology and its application are reliable).

Because “the admissibility of all expert testimony is governed by the principles of Rule 104(a),” the proponents of the expert testimony must establish these matters by a preponderance of the evidence. Fed.R.Evid. 702 advisory committee’s note (citing Bourjaily v. United States, 483 U.S. 171, 107 S.Ct. 2775, 97 L.Ed.2d 144 (1987)). “The proponent need not prove to the judge that the expert’s testimony is correct, but she must prove by a preponderance of the evidence that the testimony is reliable.” Moore v. Ashland Chem., Inc., 151 F.3d 269, 276 (5th Cir.1998).

Daubert itself listed five factors which should guide judges in this determination: (1) whether the theory or technique can be and has been tested; (2) whether the technique has been subject to peer review and publication; (3) the technique’s known or potential rate of error; (4) the existence of standards controlling the technique’s operation; and (5) the level of the theory’s or technique’s acceptance within the relevant discipline. Daubert, 509 U.S. at 593-94, 113 S.Ct. 2786. “These factors, however, are not definitive or exhaustive, and the trial judge enjoys broad latitude to use other factors to evaluate reliability.” United States v. Mooney, 315 F.3d 54, 62 (1st Cir.2002) (citing Kumho Tire Co. v. Carmichael, 526 U.S. 137, 153, 119 S.Ct. 1167, 143 L.Ed.2d 238 (1999)); see United States v. Vargas, 471 F.3d 255, 261 (1st Cir.2006) (“The trial court enjoys broad latitude in executing its gate-keeping function; there is no particular procedure it is required to follow.”); Hollander v. Sandoz Pharm. Corp., 289 F.3d 1193, 1206 (10th Cir.2002) (noting that “different courts relying on essentially the same science may reach different results” when evaluating evidence under Daubert).

In Kumho Tire, the Supreme Court was careful to emphasize that the trial judge must exercise her gate-keeping role with respect to all expert evidence, but that how she might exercise that role would necessarily vary depending on the type of testimony at issue. See Kumho Tire, 526 U.S. at 150, 119 S.Ct. 1167; United States v. Frazier, 387 F.3d 1244, 1262 (11th Cir. 2004) (“Exactly how reliability is evaluated may vary from case to case, but what remains constant is the requirement that the trial judge evaluate the reliability of the testimony before allowing its admission at trial.”); Amorgianos v. Nat’l R.R. Passenger Corp., 303 F.3d 256, 266 (2d Cir. 2002) (recognizing that “the Daubert inquiry is fluid and will necessarily vary from case to case.”).

Under Kumho Tire, the critical inquiry is whether the expert “employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” 526 U.S. at 152, 119 S.Ct. 1167; Rider v. Sandoz Pharm. Corp., 295 F.3d 1194, 1197 (11th Cir.2002). When, for example, “the factual basis of an expert’s testimony is called into question, the district court must determine whether the testimony has ‘a reliable basis’ in light of the knowledge and experience of the relevant discipline.” Crowe v. Marchand, 506 F.3d 13, 17 (1st Cir.2007).

The Court’s vigilant exercise of this gate-keeper role is critical because of the latitude given to expert witnesses to express their opinions on matters about which they have no firsthand knowledge, and because an expert’s testimony may be given greater weight by the jury due to the expert’s background and approach. See Daubert, 509 U.S. at 595, 113 S.Ct. 2786; Kumho Tire, 526 U.S. at 148, 119 S.Ct. 1167 (noting that experts enjoy “testimonial latitude unavailable to other witnesses”); United States v. Hines, 55 F.Supp.2d 62, 64 (D.Mass.1999) (noting that “a certain patina attaches to an expert’s testimony unlike any other witness; this is ‘science,’ a professional’s judgment, the jury may think, and give more credence to the testimony than it may deserve.”).

The Court must, however, keep in mind the Supreme Court’s admonition that, “[v]igorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence.” Daubert, 509 U.S. at 596, 113 S.Ct. 2786. If an expert’s testimony is within “the range where experts might reasonably differ,” the jury, not the trial court, should be the one to “decide among the conflicting views of different experts....” Kumho Tire, 526 U.S. at 153, 119 S.Ct. 1167. “Only if the expert’s opinion is so fundamentally unsupported that it can offer no assistance to the jury must such testimony be excluded.” In re Viagra Prods. Liability Litig., 572 F.Supp.2d 1071, 1078 (D.Minn.2008) (quoting Bonner, 259 F.3d at 929-930). As the First Circuit has stated:

Daubert does not require that a party who proffers expert testimony carry the burden of proving to the judge that the expert’s assessment of the situation is correct. As long as an expert’s scientific testimony rests upon “good grounds, based on what is known,” it should be tested by the adversary process — competing expert testimony and active cross-examination — rather than excluded from jurors’ scrutiny for fear that they will not grasp its complexities or satisfactorily weigh its inadequacies. In short, Daubert neither requires nor empowers trial courts to determine which of several competing scientific theories has the best provenance. It demands only that the proponent of the evidence show that the expert’s conclusion has been arrived at in a scientifically sound and methodologically reliable fashion.

Ruiz-Troche, 161 F.3d at 85 (quoting Daubert, 509 U.S. at 590, 113 S.Ct. 2786) (internal citations omitted). It is with these principles in mind that the Court assesses Defendants’ motion to exclude.

2. The Challenge

Although Dr. Hayne has attacked Dr. Mercurio’s testimony on numerous grounds, the Court’s primary concerns with Mercurio’s testimony center on (1) his use of overlapping cohorts and his methodology to account for the overlaps; and (2) the size of his conclusion’s level of precision, ±5,868.3.

Despite the fact that it is the plaintiff’s burden to establish that Mercurio’s testimony is reliable, neither Mercurio’s expert report, nor his supplemental expert report, nor his second supplemental expert report cites any texts or articles that support the reliability of using his method of extrapolation from overlapping cohorts. W.G. Cochran’s Sampling Techniques (3d ed.), which Mercurio describes as an “authoritative text,” makes no mention of cohort sampling. (Mercurio Supplemental Report 7; Hayne Report 5.) In his deposition, Mercurio pointed to a one and a half page section of Cochran’s text which describes “controlled selection,” a technique Hayne admits involves overlapping claims. (Hayne Report 5.) The text specifically notes that the technique is designed for “small samples,” and the example Cochran gives to illustrate the method features a sample size of two and a total population of nine units, a far cry from the over 1,500 claim sample and total population of over 450,000 claims here. William G. Cochran, Sampling Techniques 126-27 (3d ed. 1977). More importantly, the cited section does not discuss the use of weighted averages to deal with the fact that the cohorts overlap.

At the hearing, Mercurio failed to cite any peer-reviewed literature to support his novel approach to overlapping cohorts. Only after the Court discommoded the plaintiff at the hearing with a request for publications referencing the use of overlapping samples did the plaintiff provide any peer-reviewed literature, necessary for the Court to evaluate such well-established factors as whether the technique has been subject to peer review and publication and the level of the technique’s acceptance within the relevant discipline. These articles with pages of incomprehensible formulae were provided without further explanation or citations to relevant sections, leaving the Court to decipher their complex hieroglyphics on its own without a statistical Rosetta stone. Despite having the burden to persuade the Court of the reliability of Mercurio’s method, the plaintiff failed to highlight any portions of the articles supporting Mercurio’s method of using weighted averages to account for the overlapping nature of the cohorts, and the Court was unable to find any such support on its own. Although Mercurio, in his supplemental report, cites to Cochran to support his use of weighted averages (Mercurio Supplemental Report 7), the referenced text deals primarily with cohorts that do not overlap, and does not appear to support Mercurio’s use of weighted averages to account for the overlapping nature of the cohorts. Cochran, supra, at 142-44. And where Cochran and others discuss sampling from overlapping populations, they appear to use different methods from Mercurio’s to compensate for the overlaps, methods which require independent analysis of the overlapping segment of the populations, not simply using weighted averages. See, e.g., Cochran, supra, 144-46; Graham Kalton & Dallas W. Anderson, Sampling Rare Populations, 149 J. Royal Stat. Soc’y A 65, 75-77 (1986).

Without any peer-reviewed literature supporting Mercurio’s weighted average approach, the Court is left with Hayne’s criticism, which holds significant intuitive appeal. As described earlier, for each cohort, Mercurio would divide the number of false claims by the sample size for that cohort to get a percentage of false claims for that cohort. (Mercurio Report 14.) Mercurio would then multiply that percentage by the total number of claims in the cohort to get a “weighted percent.” (Id.; Mem. in Supp. of Defs.’ Daubert Mot. To Exclude the Test. of Pl.’s Proposed Expert Matthew G. Mercurio (“Defs.’ Mem.”) 4.) For instance, assume that cohort A was made up of 1,000 claims. If the sample for cohort A was made up of 100 claims, 10 of which proved to be false, Mercurio’s percentage of false claims for cohort A would be 10% (10 false claims divided by 100 claims in the sample). Mercurio’s “weighted percent” for cohort A would be 100 (10% multiplied by 1,000 claims, the number of claims in cohort A). Mercurio calculated a weighted percent for each of the 21 cohorts. (Mercurio Report 14; Defs.’ Mem. 4.) Once these weighted percents were calculated, Mercurio added them all together. (Mercurio Report 14; Defs.’ Mem. 4.) For the first set of data, this number was 30,186. (Defs.’ Mem. 4.) As Mercurio notes, were the cohorts not overlapping, this number would be the total number of projected false claims. (Mercurio Report 14.) Were the cohorts not overlapping, this process would find support in the Cochran text.

Because the cohorts did overlap, Mercurio’s process needed to account for the fact that the sum of the weighted percents double counted claims appearing in multiple cohorts. As mentioned earlier, Mercurio’s method to deal with this problem was to take the total sum of the false claims calculated by using the percentages described above and divide it by the total number of claims in all the cohorts, 709,276 for the first set of data, counting a claim that appears in multiple cohorts multiple times. (Mercurio Report 14; Defs.’ Mem. 4-5.) Take for example 2,000 total claims which are divided into two cohorts of 1,250 claims each. In each cohort of 1,250 claims, 750 claims are unique and 500 claims overlap, appearing in both cohorts. In such a scenario, Mercurio would have divided the sum of the weighted percents of the two cohorts by 2,500 (1,250 plus 1,250). Mercurio’s calculation (30,186 divided by 709,276) amounted to a percent of 4.2559%. (Defs.’ Mem. 4-5.) Finally, Mercurio took this percent, which he also calls a “weighted percent,” and multiplied it by the number of unique claims in the entire claim population, 468,641. (Mercurio Report 14; Defs.’ Mem. 4-5.) For instance, in the previous example, Mercurio would have multiplied the cumulative weighted percent by 2,000. Mercurio’s calculation (468,641 multiplied by 4.2559%) gave Mercurio his final answer for that set of data: 19,945 total false claims. (Mercurio Report 14-15; Defs.’ Mem. 4-5.)
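The “weighted percent” arithmetic described above can be expressed compactly. The sketch below illustrates the Court’s description; it is not Mercurio’s actual code, and since the opinion reports only the aggregate figures (30,186; 709,276; 468,641), the final check uses those aggregates alone:

```python
def mercurio_estimate(cohorts, unique_population):
    """The "weighted percent" extrapolation as the Court describes it.

    cohorts: list of (cohort_population, sample_size, false_in_sample).
    Overlapping claims are counted once per cohort they appear in.
    """
    weighted_sum = 0.0        # sum of the per-cohort "weighted percents"
    claims_with_repeats = 0   # cohort sizes summed, overlaps double counted
    for cohort_pop, sample_size, false_count in cohorts:
        rate = false_count / sample_size
        weighted_sum += rate * cohort_pop
        claims_with_repeats += cohort_pop
    return weighted_sum / claims_with_repeats * unique_population

# The cohort A hypothetical from the text: 1,000 claims, 10 of 100 sampled
# claims false. With one cohort covering the whole population, the
# estimate equals that cohort's weighted percent, 100.
assert round(mercurio_estimate([(1_000, 100, 10)], 1_000), 6) == 100.0

# The reported aggregates for the first data set:
# 30,186 / 709,276 * 468,641, i.e. roughly 19,945 false claims.
print(round(30_186 / 709_276 * 468_641))  # 19945
```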

Dr. Hayne contends that Mercurio’s method for dealing with the overlapping nature of the cohorts is not reliable. Hayne attacked Mercurio’s method with an example featuring black and white marbles: assume a population of 10,000 marbles, 5,000 black and 5,000 white. (Hayne Report 6-9.) Assume further that all 5,000 black marbles are in bin one, and all 5,000 white marbles are in bin two. We desire to calculate the number of black marbles through statistical sampling. We create two cohorts: cohort A includes all the marbles in bin one (5,000 black marbles) and cohort B includes all the marbles in both bins one and two (5,000 black marbles and 5,000 white marbles). We take a simple random sample of 100 marbles from cohort A and a simple random sample of 100 marbles from cohort B. Our sample from cohort A will give us 100 black marbles. Our sample from cohort B will give us approximately 50 black marbles and 50 white marbles.

Following Mercurio’s method, to get a percentage of black marbles (instead of false claims) per cohort, we divide the number of black marbles in the sample by the sample size for each cohort. For cohort A, this is 100% (100 black marbles divided by a sample size of 100). For cohort B, this is 50% (50 black marbles divided by a sample size of 100). Next, we multiply those percentages by the total number of marbles in the cohort to get a weighted percent. For cohort A, this is 5,000 (100% times 5,000 marbles in the cohort). For cohort B, this is 5,000 (50% times 10,000 marbles in the cohort). Following Mercurio’s method, we add these two weighted percents together, getting 10,000 (5,000 for cohort A plus 5,000 for cohort B).

According to Mercurio, the next step is to divide this number by the total number of marbles in all the cohorts, counting a marble that appears in multiple cohorts multiple times, to get a cumulative weighted percent. Here, the total number of marbles in all the cohorts is 15,000 (5,000 marbles in cohort A plus 10,000 marbles in cohort B, 5,000 of which are also in cohort A). Dividing the sum of the weighted percents, 10,000, by the total number of marbles in all the cohorts, 15,000, we get a cumulative weighted percent of approximately 66.6667% (two thirds). Finally, we multiply this cumulative weighted percent by the number of unique marbles in the entire population, 10,000. This calculation leads us to an answer of approximately 6,667 black marbles, a far cry from the correct answer of 5,000 black marbles.
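Hayne’s example can be reproduced step by step from the Court’s description; the following sketch simply carries out that arithmetic:

```python
# Hayne's marble example, run through the "weighted percent" steps
# described above.
population = 10_000       # 5,000 black and 5,000 white marbles in total
true_black = 5_000

cohort_a = 5_000          # bin one: all black marbles
cohort_b = 10_000         # bins one and two together

rate_a = 100 / 100        # sample of 100 from cohort A: all black
rate_b = 50 / 100         # sample of 100 from cohort B: about half black

weighted_a = rate_a * cohort_a            # 5,000
weighted_b = rate_b * cohort_b            # 5,000
cumulative = (weighted_a + weighted_b) / (cohort_a + cohort_b)  # 2/3
estimate = cumulative * population

print(round(estimate), true_black)  # 6667 versus the true 5000
```

The overcount arises because cohort B’s overlap with cohort A has a different black-marble rate than the rest of cohort B, exactly the situation the weighted-average step fails to handle.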

In response to this compelling example, Mercurio merely asserts that Hayne’s criticism is “incorrect” without explaining how it is possible that his method could result in such a patently wrong answer. (Mercurio Supplemental Report 6-11.) Mercurio also provides three examples where his method works, but the examples rely on situations distinct from the scenario that Hayne presented and the case at hand. (Id. at 7-9.) His first example involves cohorts that do not overlap, and thus his example does not involve the problematic calculations that Hayne criticizes. (Id. at 7-8.) His second example does not involve cohorts at all, but rather involves taking two samples from one entire population. (Id. at 8-9.) These two examples, ironically, are closer to the situation that would have arisen had Mercurio adopted either stratified sampling or simple random sampling, respectively. His final example does involve overlapping cohorts, but the cohorts in the example appear to be understood as random cross-sections of the entire population, with approximately identical percentages of black marbles in both of them. (Id. at 9.) This example merely dodges the power of Hayne’s criticism, and the problem in Mercurio’s method, which arises when the overlapping portion of a cohort has a different percentage of black marbles or false claims than the rest of the cohort. Mercurio has not shown that Hayne has lost his marbles.

Despite insisting that Hayne’s criticism is “incorrect,” Mercurio attempts to accommodate it by transforming cohort 22, the cohort which had contained all of the claims in the entire population, into cohort 22*, which includes only those claims that do not appear in any other cohort. (Id. at 7, 10-11.) Mercurio asserts that Hayne’s criticism should “have little, if any, impact on the conclusions” in his report, but replacing cohort 22 with cohort 22* reduced the number of false claims estimated by more than 5% when applied to the second set of data, hardly a trivial impact. (Id. at 10-11.) Even having replaced cohort 22 with cohort 22*, Hayne’s criticism remains powerful: cohorts 13 and 4 still overlap, and cohorts 15 through 21 still overlap with each other and every other cohort besides cohort 22*. (Mercurio Report App. C.)

As a last shot, plaintiff points out that Unum’s critique has no application to cohorts 1-12, which do not overlap. While non-overlapping cohorts do not have this serious methodological flaw, the Court is troubled by the size of the confidence interval, ±5,868.3 claims, in Mercurio’s final calculation of 8,027 false claims, with 95% confidence. ±5,868.3 claims is an extremely wide confidence interval. As Mercurio himself states:

the level of precision ... is the range within which the true value of the population is estimated to fall.
... Thus, if a researcher [in a political campaign poll] finds that 60% of likely voters in the sample support a particular candidate with a precision rate of ± 5%, then he or she can conclude that between 55% and 65% of the respondents in the population actually support that candidate [with 95% confidence].

(Id. at 9.) Viewed in this manner, Mercurio’s result amounts only to a conclusion that somewhere between 2,158.7 and 13,895.3 false claims were filed, with 95% confidence. As the Reference Manual on Scientific Evidence states, “a broad interval signals that random error is substantial”; “the standard error measures the likely size of the random error.... If the standard error is large, the estimate may be seriously wrong.” David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence 83, 119 n. 120, 118 (Fed. Judicial Ctr. 2d ed. 2000). This leaves the Court’s confidence in the reliability of Mercurio’s result shaken.
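Reading the precision figure as an interval around the point estimate, the Court’s arithmetic is straightforward:

```python
# The interval implied by 8,027 false claims +/- 5,868.3 at 95% confidence:
# the upper bound is more than six times the lower bound.
point_estimate = 8_027
precision = 5_868.3

low = point_estimate - precision
high = point_estimate + precision
print(round(low, 1), round(high, 1))  # 2158.7 13895.3
```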

Even were the size of the confidence interval smaller, Mercurio’s flawed attempt to use weighted averages and to compensate for the overlapping nature of the cohorts renders his method unreliable. It is the plaintiff’s burden to prove by a preponderance of the evidence that Mercurio’s testimony is reliable, and the plaintiff has failed to establish that Mercurio’s method of using weighted averages to compensate for the overlapping nature of the cohorts has been subject to peer review and publication, or has gained acceptance within the relevant discipline. More fundamentally, Hayne has presented convincing evidence that the technique is susceptible to manipulation and significant error. Hayne’s critique was made in his first expert report, filed in February of 2008. Since then, Mercurio has filed two supplemental reports and testified at a lengthy hearing before the Court. At no point has Mercurio sufficiently explained how Hayne’s example could be produced if Mercurio’s method were reliable or adequately pointed the Court to scientific literature that supports it in the face of this criticism. As such, his testimony must be excluded.

ORDER

The Defendants’ motion to exclude Mercurio’s testimony [Docket No. 282] is ALLOWED. 
      
      Dr. Mercurio is a litigation consultant employed by Freeman, Sullivan & Company. He holds a Ph.D. in economics from Princeton University and has a substantial education and employment background in econometrics, the application of statistics in an economic context. He has never testified as an expert in court before. However, he has done significant consulting in the field of econometrics, and much of his work has centered on false claims in the Medicare context. (Daubert Hr’g Tr. 7-21, Dec. 22, 2008.) I find he is qualified.
     
      
      Dr. Hayne is a principal and consulting actuary employed by Milliman, Inc. He holds a Ph.D. in mathematics from the University of California and has extensive experience with statistics. He is a Fellow of the Casualty Actuarial Society and a Member of the American Academy of Actuaries. (Expert Report of Roger M. Hayne, Ph.D., FCAS, MAAA ("Hayne Report”) 2-3.) I find he is qualified.
     
      
      Mercurio has not provided an alternative calculation if only the non-overlapping cohorts are used.
     