
    Louis Vuitton MALLETIER, Plaintiff, v. DOONEY & BOURKE, INC., Defendant.
    No. 04 Civ. 2990(SAS).
    United States District Court, S.D. New York.
    Dec. 13, 2007.
    
      See, also, 2007 WL 1498323.
    
      Steven Kimelman, Esq., Michael A. Grow, Esq., Alison Arden Besunder, Esq., Arent Fox LLP, Theodore C. Max, Esq., Sheppard Mullin Richter & Hampton LLP, New York, NY, for Plaintiff.
    Douglas D. Broadwater, Esq., Roger G. Brooks, Esq., Darin P. McAtee, Esq., Cravath, Swaine & Moore LLP, New York, NY, Thomas J. McAndrew, Esq., Thomas J. McAndrew & Associates, Providence, RI, for Defendant.
   OPINION & ORDER

SHIRA A. SCHEINDLIN, District Judge.

I. INTRODUCTION

District courts are tasked with the “special obligation” of serving as the “gatekeepers” of expert evidence, and must therefore decide which experts may testify and present evidence before the jury. Recognizing that a purported expert’s opinion often carries special weight with the jury even when unwarranted, the Supreme Court has directed district courts to “ensure that any and all scientific testimony or evidence admitted is not only relevant, but reliable.” Courts are given “broad latitude” in deciding “how to determine reliability” and in making the “ultimate reliability determination.” In doing so, however, courts are reminded that the Federal Rules of Evidence favor the admissibility of expert testimony, and their “role as gatekeeper is not intended to serve as a replacement for the adversary system.” Indeed, “[w]here the expert’s conclusion is drawn from a reliable methodology ... the correctness of that conclusion is still an issue for the finder of fact.” As a result, excluding expert testimony is the exception rather than the rule, particularly since “[v]igorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof” can serve as the means to “attack[ ] shaky but admissible evidence.”

In cases arising under the Lanham Act, the Court’s gatekeeper function is of heightened importance because the “pivotal legal question ... virtually demands [expert] survey research ... on [issues such as] consumer perception ....” Indeed, expert survey evidence is used more frequently in trademark law cases than in other areas of law, and courts have been advised to carefully scrutinize survey evidence particularly where a jury rather than a bench trial is contemplated.

While errors in a survey’s methodology usually go to the weight accorded to the conclusions rather than its admissibility, the Second Circuit has made clear that this is “subject, of course, to Rule 403’s more general prohibition against evidence that is less probative than prejudicial or confusing.” Although it is the exception, “there will be occasions when the proffered survey is so flawed as to be completely unhelpful to the trier of fact ....” and “its probative value is substantially outweighed by its prejudicial effect.”

As evident from the Report and Recommendation (“R & R”) issued by Professor Daniel J. Capra of Fordham University School of Law and Professor Barton Beebe of Cardozo School of Law (collectively, the “Special Masters”), much of the expert testimony proffered by the parties here warrants exclusion. The Special Masters acknowledged that their recommendation to exclude the majority of the expert testimony may seem “drastic.” They justify their conclusions, inter alia, on the ground that while methodological flaws in a survey generally raise questions of weight rather than admissibility, “questions of weight, when sufficiently accumulated, become so serious as to require exclusion.” The Special Masters further noted that the majority of the testimony presented “easy cases” for exclusion, but nevertheless, they aimed to “give each submission a fair reading with an evenhanded application of the law.”

Although the parties might regard the R & R to be severe in the scope of its recommended exclusions, the Second Circuit Court of Appeals and the lower courts within this Circuit provide support for the exclusion of survey evidence primarily under Rule 403 but also under Rule 702 where flaws are deemed to cumulatively undermine its relevance and reliability. Additionally, other courts considering the admissibility of expert survey evidence in trademark suits have reached similar conclusions.

Upon review of the R & R, it is beyond cavil that the Special Masters discharged their duty with careful consideration and thoughtful analysis of the parties’ opposing positions, the factual details of the expert reports and testimony at issue, the relevant evidentiary rules, and the case law. The Special Masters considered each expert’s survey on its own terms and while the number of exclusions may seem large, that is more properly attributed to the number of experts proffered by the parties than to over-exclusion by the Special Masters. Subject only to the modifications set forth in this Opinion, the Special Masters’ R & R is adopted and will be published as the Memorandum and Order of the Court.

II. BACKGROUND

On March 16, 2007, defendant Dooney & Bourke, Inc. (“Dooney & Bourke”) filed motions in limine to exclude the testimony and reports of plaintiff Louis Vuitton Malletier’s (“Louis Vuitton” or “LV”) experts: Drs. Richard A. Holub, Eugene Ericksen, Jacob Jacoby, and Mr. Weston Anson. On March 19, 2007, Louis Vuitton filed motions in limine to exclude the testimony and reports of Dooney & Bourke’s experts: Drs. Robert N. Reitter and Bradford Cornell.

In light of the volume of the submissions on these motions, the Court appointed the Special Masters pursuant to Federal Rule of Civil Procedure 53(a)(1)(A) and (a)(1)(C) and by Order dated May 18, 2007 (the “May 18 Order”). Pursuant to the May 18 Order, the Special Masters were directed to submit to the Court a collaborative R & R on the pending motions no later than thirty days from May 18, 2007. On June 15, 2007, the Special Masters issued an extensive R & R spanning one hundred and ninety-two pages.

On July 5, 2007, Louis Vuitton objected to the R & R on the ground that the Special Masters had erred in excluding in their entirety the testimony and reports of its three survey experts. Louis Vuitton also objected to the exclusion, in part, of the testimony and report of its damages expert. Dooney & Bourke moved to adopt the R & R as to five of the six experts at issue, and conditionally objected to the exclusion of its survey expert’s report and testimony on the level of consumer confusion in late 2006, as well as the exclusion of his trademark dilution study.

III. APPLICABLE LAW

A. Federal Rule of Civil Procedure 53

Pursuant to Rule 53(g)(1), “in acting on a [special] master’s order, the court must afford an opportunity [for the parties] to be heard and may receive evidence, and may: adopt or affirm; modify; wholly or partly reject or reverse; or resubmit to the master with instructions.” As set forth in the May 18 Order and consistent with Rule 53(g)(3)-(4), the Court reviews de novo all objections to conclusions of law made or recommended by the Special Masters. All findings of fact made by the Special Masters are reviewed by the Court for clear error. Any rulings made by the Special Masters on procedural matters are to be set aside only if the Court finds an abuse of discretion.

B. Admission of Expert Testimony

The proponent of expert evidence must establish admissibility under Rule 104(a) of the Federal Rules of Evidence by a “preponderance of proof.” Rule 702 of the Federal Rules of Evidence states the following requirements for the admission of expert testimony:

If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise, if (1) the testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles and methods, and (3) the witness has applied the principles and methods reliably to the facts of the case.

Under Rule 702 and Daubert, the trial judge must determine whether the proposed testimony “both rests on a reliable foundation and is relevant to the task at hand.” A district court must act as “a gatekeeper to exclude invalid and unreliable expert testimony.” In doing so, the court’s focus must be on the principles and methodologies underlying the expert’s conclusions, rather than on the conclusions themselves.

Expert testimony may not usurp the role of the court in determining the applicable law. Although an expert “may opine on an issue of fact,” an expert “may not give testimony stating ultimate legal conclusions based on those facts.” Expert testimony is inadmissible when it addresses “lay matters which [the trier of fact] is capable of understanding and deciding without the expert’s help.”

In addition, Rule 403 states that relevant evidence “may be excluded if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury.” “Expert evidence can be both powerful and quite misleading because of the difficulty in evaluating it. Because of this risk, the judge in weighing possible prejudice against probative force under Rule 403 ... exercises more control over experts than over lay witnesses.”

IV. DISCUSSION

As an initial matter, I address Louis Vuitton’s argument that Special Master Beebe’s previously undisclosed interactions with a former Dooney & Bourke attorney, Jeremy Sheff (who continues to practice with counsel for defendant but is no longer involved with the instant litigation), warrant his disqualification and the “disregard[ ] in its entirety” of the R & R. By letter dated July 6, 2007, Louis Vuitton informed the Court of its “recent[ ] discover[y]” that, in 2006, Special Master Beebe had commented on a draft of Sheff’s legal article on trademark law and dilution, but had failed to disclose this prior to his appointment. Louis Vuitton contends that Special Master Beebe’s “prior relationship” with a Dooney & Bourke attorney and their “collaboration on a project specifically regarding trademark law and dilution” constitute material facts that should have been disclosed prior to his appointment. According to Louis Vuitton, Special Master Beebe’s failure to disclose the “relationship” has created an appearance of impropriety that casts doubt on the impartiality of the R & R, particularly in light of the R & R’s “heavy weight in Dooney [ & Bourke]’s favor.”

Dooney & Bourke acknowledges that Special Master Beebe and Sheff made contact, but disputes the existence of any “relationship” that might warrant disqualification or the wholesale disregard of the R & R. Dooney & Bourke states that Sheff “has not had any responsibility” in its representation since May 2005, and was not aware of Special Master Beebe’s appointment in this action. Moreover, no Dooney & Bourke attorney was aware that Sheff had contacted Special Master Beebe for comments on a draft law review article in the past, nor did they know that Sheff had later thanked Special Master Beebe in the final version of that article.

By letter dated July 11, 2007, Special Master Beebe informed the Court that in August 2006, at Sheff’s initiative, the two had briefly corresponded regarding Sheff’s legal article on issues of trademark law as well as the legal market for professorships. Special Master Beebe confirms that he has never met or spoken with Sheff, and that his limited review of Sheff’s article and the correspondence itself stemmed from his duties as part of Cardozo Law School’s Hiring Committee, as well as his own sense of obligation as “a member of the legal academic community.” Special Master Beebe further wrote that he does not recall Sheff’s paper, and is not aware of Sheff’s current employment situation.

Under Rule 53(a)(2), a special master “must not have a relationship to the parties, counsel, action, or court that would require disqualification of a judge under 28 U.S.C. § 455 (“section 455”) unless the parties consent with the court’s approval to appointment ... after disclosure of any potential grounds for disqualification.” Section 455(a) requires a judge’s disqualification “in any proceeding in which his impartiality might reasonably be questioned.” The Second Circuit has stated that section 455(a) “sets out an objective standard for recusal,” that is, “ ‘whether an objective, disinterested observer fully informed of the facts underlying the grounds on which recusal was sought would entertain a significant doubt that justice would be done in the case.’ ” Section 455(b)(1) requires the disqualification of a judge “[w]here he has personal bias or prejudice concerning a party, or personal knowledge of disputed evidentiary facts concerning the proceeding.”

Section 455 neither requires the disqualification of Special Master Beebe nor the disregard of the R & R. The Court’s review of the correspondence between Special Master Beebe and Sheff confirms that their “connection” was isolated, brief, and limited to the discussion of Sheff’s draft article and the legal market for professorships. The fact that the article’s subject matter is also trademark law and dilution is unremarkable given that Special Master Beebe specializes in intellectual property law. As such, it follows that his comments in that area of law are frequently solicited. Significantly, at no point in their brief correspondence did Sheff mention the instant litigation, his law firm’s representation of Dooney & Bourke, or his own prior participation in that representation. Considering the facts underlying the request to disqualify Special Master Beebe, no objective observer “would entertain a significant doubt that justice would be done in the case” or that the closely-reasoned and well-supported R & R should be wholesale rejected. Moreover, disqualification is not warranted under section 455(b)(1) as Louis Vuitton has offered absolutely no facts to demonstrate that Special Master Beebe harbors a “personal bias or prejudice concerning a party” or has “personal knowledge of disputed evidentiary facts” regarding this action.

Indeed, courts in this circuit have held that section 455 does not require disqualification or recusal in far closer cases. While the pre-appointment disclosure of any such interactions is ideal, I am confident that this particular disclosure would not have precluded Special Master Beebe’s appointment had it been disclosed at the outset. For the foregoing reasons, Louis Vuitton’s motion to disqualify Special Master Beebe and to disregard the R & R is denied.

A. Plaintiff’s Experts

1. Dr. Eugene Ericksen

Dr. Eugene Ericksen “conducted a hybrid consumer confusion and trademark dilution survey for Louis Vuitton between December 6, 2006 and December 31, 2006 ([the] ‘Ericksen Survey’).” The Special Masters recommended the exclusion of the Ericksen Survey in its entirety under Federal Rules of Evidence 403 and 702 due to the cumulative effect of a number of flaws. These flaws include the use of an improper stimulus, the poor choice of a control bag, the failure to instruct respondents against guessing, the improper classification of respondents, as well as other significant methodological errors. Moreover, with respect to the trademark dilution component of the Ericksen Survey, the Special Masters found that Dr. Ericksen’s analysis was plagued by the same methodological flaws present in the confusion component, and also “proceeds from a fundamental misunderstanding of the theory of dilution by blurring,” improperly conflating it with consumer confusion.

Finding no clear error in the Special Masters’ factual findings and reviewing their legal conclusions de novo, I adopt the Special Masters’ recommendation that Dr. Ericksen’s report and testimony be excluded in their entirety under Rules 702 and 403. The cumulative effect of the flaws outlined in the R & R renders the report and testimony unreliable, and any probative value is substantially outweighed by the danger of unfair prejudice and misleading the jury.

2. Dr. Jacob Jacoby

Dr. Jacob Jacoby was retained by Louis Vuitton to conduct a trademark confusion survey (the “Jacoby Confusion Survey”) and a dilution survey (the “Jacoby Dilution Survey”), both of which have been discussed by this Court in a prior opinion. As I have previously noted and as the Special Masters have also found, the “Jacoby Confusion Survey [Report] does not, in fact, describe the actual survey that was undertaken.” The Special Masters found that the facts strongly suggest that the Jacoby Confusion Survey was “not reported in an accurate manner and ... was not conducted in an objective manner,” and is therefore not reliable under Rule 702. The Special Masters also found significant flaws in the survey including the improper definition of its universe, the use of a survey question that asked respondents for a legal conclusion, and the improper classification of certain respondents as confused “based on factors not relevant to the marks at issue.”

In addition, the Special Masters found numerous, fundamental deficiencies in the Jacoby Dilution Study such as a lack of fit between the survey’s questions and the law of dilution, the improper coding and classification of several responses to the survey thus resulting in an “overstate[ment] of the number of respondents who were ‘diluted,’ ” and the unexplained inconsistency between the results of Dr. Jacoby’s pilot dilution study, which found “little to no net dilution” and the subsequent survey which found dilution.

Finding no clear error in the Special Masters’ factual findings and reviewing their legal conclusions de novo, I adopt the Special Masters’ recommendation that Dr. Jacoby’s report and testimony be excluded in their entirety under Rules 702 and 403. In considering the cumulative effect of the numerous flaws identified by the Special Masters, it is clear that Dr. Jacoby’s report and testimony on the issues of both trademark confusion and dilution are unreliable. Moreover, any probative value is substantially outweighed by the danger of unfair prejudice and misleading the jury.

3. Dr. Richard A. Holub

Dr. Richard Holub was retained by Louis Vuitton “to study and compare the use of color in the multicolor handbags of Dooney & Bourke and Louis Vuitton.” Specifically, Louis Vuitton offers Dr. Holub’s testimony “for two purposes: (1) to prove the likelihood of confusion presented by Dooney [ & Bourke]’s multicolor logo; and (2) to prove Dooney [ & Bourke]’s willful intent to copy the Louis Vuitton Multicolore Monogram mark.” The Special Masters recommended that Dr. Holub’s report and testimony be excluded in their entirety.

With respect to the first purpose, the Special Masters concluded that Dr. Holub’s highly technical report and testimony on the similarity of colors between the parties’ handbags would not be helpful to the jury because the jury members can observe for themselves whether Dooney & Bourke’s mark is confusingly similar to Louis Vuitton’s Multicolore Monogram mark. The jury can make that determination without the help of an expert and certainly without the expert’s “digital photography ... technical jargon and colorimeter approximation.” Finding no clear error in the Special Masters’ factual findings and reviewing their legal conclusions de novo, I adopt the Special Masters’ recommendation that Dr. Holub’s report and testimony should be excluded to the extent they are offered for the purpose of proving the likelihood of confusion.

With respect to Dr. Holub’s report and testimony offered for the purpose of proving intent to copy, the Special Masters noted that this presented a closer question as “Dooney & Bourke’s intent is potentially important in this case because, among other things, Louis Vuitton is seeking an accounting of Dooney & Bourke’s profits, and such an accounting is possible only upon a finding of willful intent on Dooney & Bourke’s part.” Further, with respect to damages, if Louis Vuitton can establish that Dooney & Bourke’s actions were intentionally deceptive, that gives rise to a rebuttable presumption of confusion. Indeed, the Special Masters recognized the potential probative value of Dr. Holub’s testimony on the issue of intent, noting “while it is true that Dooney & Bourke would not be liable for using even the same exact colors as Louis Vuitton ..., it is also true that evidence of copying the colors is at least probative of an intent to copy the Louis Vuitton mark itself.” Accordingly, the Special Masters stated that “on the question of intent[,] it could be helpful for the jurors to know exactly how much of the palette was used in each mark, and the exact extent to which the colors used by Dooney & Bourke overlap with the colors used by Louis Vuitton.” “If Dooney & Bourke chose identical or very similar color combinations as were chosen by Louis Vuitton, that fact at least tends to prove an intent to infringe on Louis Vuitton’s mark.”

Ultimately, however, the Special Masters concluded the risk is too great that the jury may be influenced by Dr. Holub’s testimony and improperly use it toward resolving the likelihood of confusion issue, even if instructed by the Court to consider it only for the issue of intent. Moreover, according to the Special Masters, the probative value of expert testimony on the overlapping colors, even on the intent issue, is diminished by the fact that the colors are not themselves the mark that Louis Vuitton seeks to protect. Additionally, the similarity between the colors may be supported by inferences other than an intent to copy, such as fashion trends.

There is no clear error in the Special Masters’ factual findings. Reviewing their legal conclusions de novo, however, I hold that Dr. Holub’s testimony and report are admissible for the purpose of proving intent. Dr. Holub, however, may only testify to the extent of the overlapping use of colors in the Dooney & Bourke and Louis Vuitton multicolored monogram handbags. In doing so, I adopt the Special Masters’ suggestion that, should the Court allow Dr. Holub to testify on intent, he should “not be permitted to testify that his findings in fact indicated that Dooney & Bourke intentionally copied Louis Vuitton’s colors.”

Although I recognize that there is a risk that the jury “may take [Dr. Holub’s] testimony [on the intent issue] as an instruction on how to decide the question of likelihood of confusion,” that risk can be minimized by a limiting instruction. In fact, although the Special Masters ultimately rejected this approach, they did note that the risk of prejudice and confusion posed by Dr. Holub’s testimony may be “lessened somewhat by a limiting instruction.” This Court has routinely relied upon limiting instructions to “remind[ ] the jury of its role and of the limits of expert testimony [and] clarify the extent of their consideration of such testimony.” Further, the Second Circuit has recognized a “strong presumption that juries follow limiting instructions.” With respect to the factors that the Special Masters found to diminish the testimony’s probative value, including the fact that the mark at issue consists of more than just colors, as well as inferences that may support color overlap other than intent to copy, those are relatively minor and can be addressed on cross-examination and opening and closing statements.

The probative value of Dr. Holub’s report and testimony, limited to the overlap of colors between the Louis Vuitton and Dooney & Bourke marks, is not substantially outweighed by the dangers of unfair prejudice or confusing the jury. As a result, I will allow limited testimony by Dr. Holub.

4. Mr. Weston Anson

Mr. Weston Anson was retained by Louis Vuitton to review financial documents and accounting information produced by Dooney & Bourke during the course of discovery and “to prove the amount of net profit that Dooney & Bourke derived from its assumed infringement on the Louis Vuitton Multicolore [Monogram] mark; and [ ] to prove that Louis Vuitton suffered dilution of its Multicolore Monogram mark as a result of Dooney & Bourke’s infringement.” The Special Masters recommended that Mr. Anson’s report and testimony be excluded but for his testimony on the “amount of net profits that Dooney & Bourke obtained from the allegedly infringing sales.”

Mr. Anson’s method of calculating net profits deducted only those costs directly incurred in the production of the allegedly infringing Dooney & Bourke handbags, but did not deduct a proportionate amount of general expenses such as overhead. Because the deduction of general expenses is required only if defendant can prove the connection between those expenses and the sales of the allegedly infringing items, the Special Masters qualified their recommendation to permit Mr. Anson’s testimony. Specifically, they recommended that Mr. Anson be permitted to testify to his calculation of Dooney & Bourke’s net profits, but only if Dooney & Bourke is unable to connect any general expenses to the sales generating those profits. If, however, Dooney & Bourke is able to do so, then Mr. Anson’s testimony must be adjusted to reflect the deduction.

The Special Masters found that Mr. Anson’s dilution study does not “fit” with the substantive law of dilution because he “does not purport to connect a loss of sales [by Louis Vuitton] in the United States to a loss of reputation on the part of Louis Vuitton.” The Special Masters further found that Louis Vuitton had failed to cite any case “to support the proposition that a plaintiff’s loss of sales coincident with a defendant’s achieve[ment of a] ‘critical mass’ in the marketplace necessarily implies a loss of reputation.” Moreover, the Special Masters identified a number of flaws in the dilution study. Chief among those flaws is a lack of fit between the basic premise of the study (i.e., that Louis Vuitton suffered a loss of sales in the United States resulting from Dooney & Bourke’s marketing efforts) and Louis Vuitton’s oft-expressed position in this litigation (i.e., that it does not claim lost profits). The Special Masters also found Mr. Anson’s opinion “that there is a statistically significant difference between sales of [Louis Vuitton’s] handbags in the United States and the rest of the world ....” to be unreliable under Rule 702 due to his complete reliance on another expert who has not been produced in this action. The Special Masters concluded that Mr. Anson’s testimony on dilution constitutes “nothing but conduit testimony from an expert on a matter outside his field of expertise.”

Finding no clear error in the Special Masters’ factual findings and reviewing their legal conclusions de novo, I adopt the Special Masters’ recommendation that Mr. Anson’s report and testimony be admitted in part and excluded in part, as set forth above, under Rules 702 and 403. In light of the number of serious flaws that plague Mr. Anson’s report and testimony, specifically on the issue of dilution, the probative value is substantially outweighed by the prejudicial effect and the serious potential to mislead the jury.

B. Defendant’s Experts

1. Dr. Bradford Cornell

I find no clear error in the Special Masters’ factual findings with respect to Dr. Bradford Cornell’s testimony and report on the issue of Louis Vuitton’s damages and Dooney & Bourke’s profits, and come to the same legal conclusions upon a de novo review. For those reasons and because neither party objects to the adoption of the R & R with respect to Dr. Cornell, I adopt the Special Masters’ recommendation and allow his testimony and report subject to the limitations set forth in the R & R.

2. Dr. Robert N. Reitter

Dr. Robert N. Reitter was retained by Dooney & Bourke to conduct a trademark confusion survey and trademark recognition survey in 2004. Both surveys have been previously discussed by the Court. In 2006, Dr. Reitter was once again commissioned to conduct a second trademark confusion survey and a dilution survey. The Special Masters recommended that Dr. Reitter’s reports and testimony on all of these surveys be excluded in their entirety.

Dooney & Bourke “does not challenge the Special Masters’ recommendation with respect to the 2004 [trademark confusion survey] or with the efforts to revive it” through the 2006 confusion survey. Rather, Dooney & Bourke objects on the ground that the Special Masters considered the 2006 confusion survey solely as a means to revive the 2004 confusion survey previously criticized by the Court. As such, Dooney & Bourke contends that the Special Masters failed to regard the 2006 confusion survey, standing alone, “as an independent study that found de minimis confusion as of late 2006” — specifically, November and December 2006.

Although the Special Masters primarily regarded the 2006 confusion survey as a response to the Court’s criticisms of the 2004 survey, the Special Masters did conduct a careful analysis of the 2006 confusion survey, independent of the 2004 survey. In doing so, the Special Masters identified a number of flaws in the methodology of the 2006 confusion survey and explicitly noted that the 2006 survey “suffers from some of the same methodological flaws that beset the 2004 survey.” These flaws include: inappropriate selection of non-upscale malls for the survey; a low screening standard for respondents; a “far from ideal” sampling method that precluded within-location comparisons among respondents who were exposed to the control bag versus Dooney & Bourke bags with name hangtags versus Dooney & Bourke bags without name hangtags; the low number of respondents participating in the survey; and the failure to employ a methodology involving sequential presentation or “line-up” of stimuli which better approximates marketplace conditions.

While the Special Masters remarked that each methodological flaw, standing alone, may not mandate exclusion, each flaw diminished the 2006 confusion survey’s reliability and probative value. Although a survey measuring the level of consumer confusion in late 2006 is of some probative value, the cumulative effect of the methodological flaws identified by the Special Masters so diminishes the reliability and probative value of the 2006 confusion survey that its exclusion is warranted under Rules 403 and 702.

With respect to Dr. Reitter’s dilution study conducted in 2006, the Special Masters recommended that it be excluded in its entirety because “as designed, it could provide no reliable indication of whether the [Louis Vuitton] Multicolore mark was diluted.” Finding that Dr. Reitter’s dilution study “ ‘reveals little except that there is a high consumer recognition of the Louis Vuitton Monogram Multicolore marks’ ” and failed to measure dilution, the Special Masters recommended exclusion under Rules 702 and 403. Specifically, they concluded that Dr. Reitter failed to “preclude the possibility that ... the recognition level of the Louis Vuitton Multicolore [m]ark might have been higher but for the existence in the marketplace of Dooney & Bourke It-Bags.” As such, it was conducted pursuant to “fundamentally flawed” reasoning that rendered it irrelevant to the dilution question and inadmissible. In addition, the Special Masters addressed other flaws that do not, standing alone, require exclusion but “quell[ ] any doubt about [the study’s] exclusion.” These flaws include the improper grouping of handbags in the study, which failed to simulate market conditions, and the failure to utilize follow-up questions to prompt respondents to explain their answers.

Taken cumulatively, the methodological flaws identified by the Special Masters and the “fundamentally flawed reasoning” of the Reitter dilution study warrant its exclusion. Dooney & Bourke objects to the Special Masters’ recommended exclusion on the ground that “surely it is ‘relevant’ to know how strong the claimed mark is as a source-identifier.” But the study’s conclusion that the Louis Vuitton Multicolore Monogram mark is still very strong does not address whether it has been diluted; rather, it speaks more to the strength of the mark itself and its fame, neither of which is disputed. To admit it as a “dilution” study poses a real threat of unfair prejudice and of misleading the jury. Rather than “set[ting] too high a bar for what evidence is ‘relevant,’ ” as Dooney & Bourke contends, the Special Masters properly found that the Reitter dilution study “casts no light” on the issue of dilution and amounts to an “ad hoc use of a new theory of testing dilution” that is unreliable. Finding no clear error in the Special Masters’ factual findings and reviewing their legal conclusions de novo, I adopt the Special Masters’ recommendation that Dr. Reitter’s report and testimony, including the 2006 confusion study and the dilution study, be excluded in their entirety.

V. CONCLUSION

For the reasons stated above, plaintiff’s motions are granted in part and denied in part, and defendant’s motions are denied. The Clerk of the Court is directed to close the following motions: [Docket Entry Nos. 188, 194, 199, 204, 206]. A teleconference is scheduled for December 21, 2007, at 11:45 a.m.

SO ORDERED.

REPORT AND RECOMMENDATION OF THE SPECIAL MASTERS

Professors BARTON BEEBE and DANIEL CAPRA, Special Masters.

I. Legal Standards for Determining Admissibility of Expert Testimony

A. Rule 702 and Daubert

B. Rule 403

C. Survey Evidence

D. Daubert Hearings

II. The Survey Experts

A. Dr. Eugene Ericksen

1. Facts

a. The Ericksen Survey Universe

b. The Ericksen Survey Stimuli

c. The Ericksen Survey Questions

d. Dr. Ericksen’s Confusion Analysis

e. Dr. Ericksen’s Blurring Analysis

2. Discussion

a. Dr. Ericksen Used an Improper Stimulus

b. Dr. Ericksen’s Control Was Flawed

c. Dr. Ericksen’s Confusion Analysis Is Flawed

d. Dr. Ericksen’s Blurring Analysis Is Flawed

3. Summary on Admissibility of Testimony, Survey, and Expert Report of Dr. Ericksen

B. Dr. Jacob Jacoby

1. The Jacoby Confusion Survey

2. The Jacoby Dilution Survey

a. Facts

i. The Jacoby Dilution Survey Universe

ii. The Jacoby Dilution Survey Stimuli

iii. The Jacoby Dilution Survey Questions

iv. The Jacoby Dilution Survey Dilution Analysis

v. The Jacoby Dilution Survey Pilot Survey

b. Discussion

i. The Jacoby Dilution Survey is Not Relevant to the Issue of Dilution

ii. The Flaws in the Jacoby Dilution Survey’s Categorization of Results

iii. The Objectivity of the Jacoby Dilution Survey

C. Robert N. Reitter

1. Facts

a. The 2004 Reitter Confusion Survey

i. The 2004 Reitter Confusion Survey Universe and Sample

ii. The 2004 Reitter Confusion Survey Stimuli

iii. The 2004 Reitter Confusion Survey Questions

iv. The 2004 Reitter Confusion Survey Confusion Analysis

b. The 2006 Reitter Confusion Survey

i. The 2006 Reitter Confusion Survey Universe and Sample

ii. The 2006 Reitter Confusion Survey Stimuli

iii. The 2006 Reitter Confusion Survey Questions

iv. The 2006 Reitter Confusion Survey Confusion Analysis

c. The 2006 Reitter Dilution Survey

i. The 2006 Reitter Dilution Survey Universe and Sample

ii. The 2006 Reitter Dilution Survey Stimuli

iii. The 2006 Reitter Dilution Survey Questions

iv. The 2006 Reitter Dilution Survey Dilution Analysis

2. Discussion

a. The Reitter Confusion Surveys

b. The Flaws in the 2004 Survey

i. Reading Test

ii. Ineffective Control Bag

iii. Coding Errors

iv. Choice of Malls and Universe of Respondents

v. The “Permission” Question

vi. Preliminary Summary on the Admissibility of the 2004 Reitter Confusion Survey

c. Flaws in Methodology of the 2006 Confusion Survey

i. Mall Selection

ii. Sampling Method

iii. Sample Sizes

iv. Eveready Presentation

v. Other Alleged Methodological Flaws in the 2006 Confusion Survey

(a) Poor Choice of Control Bag

(b) Reading Test

(c) Close Viewing Range

d. The Relation Between the 2004 Reitter Confusion Survey and the 2006 Reitter Confusion Survey

e. The 2006 Reitter Dilution Survey

i. The Relevance of the 2006 Reitter Dilution Survey

ii. The 2006 Reitter Dilution Survey Stimuli

iii. Reitter’s Failure to Ask the 2006 Dilution Survey Respondents to Explain Their Answers

3. Summary on Reitter Surveys

III. Dr. Richard A. Holub

A. Facts

B. Discussion

1. Opinion offered to prove the likelihood of confusion

a. Qualifications

i. Colorimetry

ii. Statistical probability

b. Reliability of conclusions on statistical probability

c. Proper Subject Matter

2. Opinion offered to prove intent

a. Reliability

b. Proper Subject Matter

c. Rule 403

C. Summary on Dr. Holub

IV. The Damages Experts

A. Weston Anson

1. Facts

2. Discussion

a. Opinion Offered to Prove Dooney & Bourke’s Net Profits

i. Qualifications

ii. Statements in Anson’s Report Concerning the Existence of Infringement and Dilution

iii. Reliability of Methods Used to Determine Dooney & Bourke’s Profits

(a) Use of “incremental method” of deducting costs

(b) Attributing 100 percent of the net profits to the alleged infringement

b. Opinion on Dilution

i. Lack of “Fit”/ Problem of Prejudice and Jury Confusion

ii. Lack of “fit” with the substantive law of dilution

iii. Improper reliance on another expert

iv. Unreliability of regression analysis

3. Summary

B. Dr. Bradford Cornell

1. Facts

2. Discussion

a. Qualifications

b. The Challenge to Dr. Cornell’s Four-Factor Test for Assessing Damages

c. The Challenge to Cornell’s Statistical Analysis of Louis Vuitton’s United States Sales

d. The Challenge to Cornell’s Report as Exceeding the Proper Scope of Expert Testimony

e. The Challenge to Cornell’s Use of the “Full Absorption” Method

3. Summary

V. Conclusion

Louis Vuitton Malletier (“Louis Vuitton”) brings this action against Dooney & Bourke, Inc. (“Dooney & Bourke”) alleging trademark infringement, trademark dilution, and unfair competition under the Lanham Act, 15 U.S.C. §§ 1051 et seq., and Section 301 of New York General Business Law. In her opinion dated August 27, 2004, Louis Vuitton Malletier v. Dooney & Bourke, Inc., 340 F.Supp.2d 415 (S.D.N.Y.2004) (“Vuitton I”), Judge Shira A. Scheindlin denied Louis Vuitton’s motion for a preliminary injunction. In its opinion dated June 30, 2006, Louis Vuitton Malletier v. Dooney & Bourke, Inc., 454 F.3d 108 (2d Cir.2006) (“Vuitton II”), the Second Circuit affirmed in part and vacated and remanded in part Judge Scheindlin’s ruling in light of the Second Circuit’s opinion in Louis Vuitton Malletier v. Burlington Coat Factory Warehouse Corp., 426 F.3d 532 (2d Cir.2005). In anticipation now of a jury trial, the parties have fully submitted five motions in limine seeking to exclude the reports and testimony of six proposed experts. By Order dated May 18, 2007, Judge Scheindlin appointed the undersigned as Special Masters in this case and directed us to submit a collaborative Report and Recommendation to aid the Court in resolving these motions in limine.

For the reasons given below, we respectfully recommend that (1) Dooney & Bourke’s Motion in Limine to Exclude the Reports, Testimony, and Opinions of Dr. Eugene Ericksen and Dr. Jacob Jacoby be granted in its entirety; (2) Louis Vuitton’s Motion in Limine to Exclude Dooney & Bourke’s Proposed Expert Opinions, Testimony, and Surveys of Dr. Robert N. Reitter be granted in its entirety; (3) Dooney & Bourke’s Motion in Limine to Exclude the Report, Testimony, and Opinions of Richard A. Holub be granted in its entirety; (4) Dooney & Bourke’s Motion in Limine to Preclude the Report, Testimony, and Opinions of Mr. Weston Anson be granted in part and denied in part; and (5) Louis Vuitton’s Motion to Exclude Defendant Dooney & Bourke’s Proposed Expert Testimony of Dr. Bradford Cornell be granted in part and denied in part.

The underlying facts of this case are set forth in Vuitton I, 340 F.Supp.2d at 424-428, and Vuitton II, 454 F.3d at 112-13, familiarity with which is assumed. In what follows, we use the term “Louis Vuitton Monogram Multicolore Mark” to denote the Louis Vuitton mark consisting of “(1) the interlocking initials [“L” and “V”] interspersed in a repeating pattern with the registered geometric shapes, (2) used in combination with the thirty-three special Murakami colors, (3) set against a white or black background.” Vuitton I, 340 F.Supp.2d at 438. See also Vuitton II, 454 F.3d at 115 (defining the Louis Vuitton mark at issue as “consisting of a design plus color, that is, the traditional Vuitton Toile pattern design — entwined LV initials with the three already described motifs — displayed in the 33 Murakami colors and printed on a white or black background.”). We use the term “Dooney & Bourke Multicolor Monogram Mark” to denote the pattern of interlocking “D” and “B” initials used by Dooney & Bourke on its It-Bags and imprinted on a white or black background.

We proceed by setting forth the basic legal standards applicable to the admissibility of expert testimony in general and survey evidence in particular. We then move to the motions in limine for each expert. The law governing trademark infringement and dilution claims is interspersed throughout the discussion of the expert testimony.

I. Legal Standards for Determining Admissibility of Expert Testimony

A. Rule 702 and Daubert

The admissibility of expert testimony is governed by the Federal Rules of Evidence. Federal Rule of Evidence 702 requires that a challenged expert must be qualified to testify on the basis of scientific, technical or other specialized knowledge, on a subject matter that “will assist the jury to understand the evidence or determine a fact in issue.” Thus, expert testimony is excluded under Rule 702 if it addresses “lay matters which the jury is capable of understanding and deciding without the expert’s help.” United States v. Lumpkin, 192 F.3d 280, 289 (2d Cir.1999). As amended in 2000, Rule 702 further requires that the expert’s testimony must be (1) based on sufficient facts or data, (2) the product of reliable principles and methods, (3) reliably applied to the facts of the case. These three reliability-based requirements are intended to codify Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993), and its progeny. See Advisory Committee Note to the 2000 Amendment to Evidence Rule 702. Under Daubert, a court is required to ensure that challenged expert testimony “is not only relevant, but reliable.” 509 U.S. at 589, 113 S.Ct. 2786. The Court in Daubert charged trial judges with the responsibility of acting as gatekeepers to exclude unreliable expert testimony. Subsequently the Court in Kumho Tire Co. v. Carmichael, 526 U.S. 137, 119 S.Ct. 1167, 143 L.Ed.2d 238 (1999), made clear that the gatekeeping function applies not just to scientific expert testimony as discussed in Daubert, but also to testimony based on technical or other specialized knowledge.

Daubert set forth a non-exclusive list of factors for trial courts to use in assessing the reliability of scientific expert testimony. The specific factors explicated by the Daubert Court are (1) whether the expert’s technique or theory can be or has been tested — that is, whether the expert’s theory can be challenged in some objective sense, or whether it is instead simply a subjective, conclusory approach that cannot reasonably be assessed for reliability; (2) whether the technique or theory has been subject to peer review and publication; (3) the known or potential rate of error of the technique or theory when applied and the existence and maintenance of standards and controls that govern the application of the expert’s process; and (4) whether the technique or theory has been generally accepted in the relevant community of experts. 509 U.S. at 592-94, 113 S.Ct. 2786. The Court in Kumho declared that “the factors identified in Daubert may or may not be pertinent in assessing reliability, depending on the nature of the issue, the expert’s particular expertise, and the subject matter of his testimony.” Kumho, 526 U.S. at 150, 119 S.Ct. 1167. The Kumho Court emphasized that district courts have wide discretion both in determining the relevant factors to be employed in assessing the reliability of an expert’s testimony, and in determining whether that testimony is in fact reliable. Id. at 153, 119 S.Ct. 1167. See also Zuchowicz v. United States, 140 F.3d 381, 386 (2d Cir.1998) (decisions to admit or exclude expert testimony are evaluated under the “highly deferential abuse of discretion standard”). The ultimate inquiry for the district court is “to make certain that an expert, whether basing testimony upon professional studies or personal experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho, 526 U.S. at 151, 119 S.Ct. 1167.

The proponent of the expert testimony must prove by a preponderance of the evidence that it is reliable. Daubert, 509 U.S. at 590, 113 S.Ct. 2786. Admissibility does not depend on whether the judge agrees with the expert’s conclusion; the focus is instead on the expert’s methodology. Id. at 595, 113 S.Ct. 2786. Yet as the Court has recognized, “conclusions and methodology are not entirely distinct from one another.” General Elec. Co. v. Joiner, 522 U.S. 136, 146, 118 S.Ct. 512, 139 L.Ed.2d 508 (1997). When an expert purports to apply principles and methods in accordance with professional standards, and yet reaches a conclusion that other experts in the field would not reach, “the trial court may fairly suspect that the principles and methods have not been faithfully applied.” Committee Note to 2000 Amendment to Rule 702 (citing Lust v. Merrell Dow Pharmaceuticals, Inc., 89 F.3d 594, 598 (9th Cir.1996)).

B. Rule 403

Rule 403 provides another source for excluding expert testimony. Under Rule 403, evidence “may be excluded if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury.” The Court in Daubert emphasized that expert testimony “can be both powerful and quite misleading because of the difficulty of evaluating it.” 509 U.S. at 595, 113 S.Ct. 2786 (quotation omitted). Accordingly, the judge in applying Rule 403 must exercise “more control over experts than over lay witnesses.” Id.

C. Survey Evidence

Several of the experts challenged in this case are relying on surveys of potential purchasers of the handbags at issue. Survey evidence is generally admissible in cases alleging trademark infringement under the Lanham Act. See Schering Corp. v. Pfizer Inc., 189 F.3d 218, 227-28 (2d Cir.1999) (endorsing the “modern view” that evidence of the state of mind of persons surveyed is not inadmissible as hearsay). To assess the admissibility of survey evidence, the court should consider a number of criteria, including whether:

(1) the proper universe was examined and the representative sample was drawn from that universe; (2) the survey’s methodology and execution were in accordance with generally accepted standards of objective procedure and statistics in the field of such surveys; (3) the questions were leading or suggestive; (4) the data gathered were accurately reported; and (5) persons conducting the survey were recognized experts.

Vuitton I, 340 F.Supp.2d at 433 (citation and alterations omitted). A trademark survey must also approximate marketplace conditions. Trouble v. Wet Seal, 179 F.Supp.2d 291, 308 (S.D.N.Y.2001) (“Although no survey can construct a perfect replica of ‘real world’ buying patterns, a survey must use a stimulus that, at a minimum, tests for confusion by roughly simulating marketplace conditions.”). “While errors in survey methodology usually go to weight of the evidence, a survey should be excluded under Rule 403, Fed. R.Evid., when its probative value is substantially outweighed by its prejudicial effect or potential to mislead the jury.” MasterCard Int’l Inc. v. First Nat’l Bank of Omaha, No. 02 Civ. 3691, 2004 WL 326708, at *11 (S.D.N.Y. Feb. 23, 2004) (citing Schering, 189 F.3d at 228). Thus, a survey “should be excluded under Rule 403 when it is so flawed in its methodology” that the survey proves little and the jury is very likely to be misled. Cache, Inc. v. M.Z. Berger & Co., 2001 WL 38283 at *6 (S.D.N.Y.). See also Starter Corp. v. Converse, Inc., 170 F.3d 286, 297 (2d Cir.1999) (“The District Court correctly found ... that a survey may be kept from the jury’s attention entirely by the trial judge if it is irrelevant to the issues.” (citation omitted)). While courts in the Second Circuit rely mainly on Rule 403 to exclude unreliable surveys, we note that Rule 702 is clearly applicable as well, because the result of a survey is essentially expert testimony, and Rule 702 requires that such testimony must be reliable. The bottom line is that if the survey suffers from substantial methodological flaws, it will be excluded under both Rule 403 and Rule 702.

D. Daubert Hearings

Courts often hold pretrial evidentiary hearings — known as Daubert hearings — to determine whether challenged expert testimony is reliable under Rule 702 and admissible under Rule 403. Whether to hold a Daubert hearing is within the discretion of the court. See Committee Note to 2000 Amendment to Rule 702 (noting that the Rule “makes no attempt to set forth procedural requirements for exercising the trial court’s gatekeeping function over expert testimony”). The failure to hold a Daubert hearing may be an abuse of discretion when the admissibility ruling is tantamount to a ruling on summary judgment and there are substantial disputed issues of fact that are pertinent to the reliability inquiry. See, e.g., Padillas v. Stork-Gamco, Inc., 186 F.3d 412, 418 (3d Cir.1999). But see Oddi v. Ford Motor Co., 234 F.3d 136, 154 (3d Cir.2000) (no error in failing to hold Daubert hearing before excluding evidence and granting summary judgment; distinguishing Padillas as a case involving a thin record under which the court could not have evaluated the expert’s methods).

A Daubert hearing is unnecessary when the evidentiary record pertinent to the expert opinions is already well-developed. For example, in Miller v. Baker Implement Co., 439 F.3d 407 (8th Cir.2006), the court found that the trial judge did not abuse discretion in excluding the plaintiff’s expert testimony without a Daubert hearing. It noted that the plaintiff submitted affidavits, a detailed explanation of the proposed expert testimony and a legal memorandum addressing the expert evidence issues. The court noted that while Daubert hearings “may be necessary in some cases, the basic requirement under the law is that parties have an opportunity to be heard before the district court makes its decisions.” Miller v. Baker Implement, 439 F.3d at 412. See also Nelson v. Tennessee Gas Pipeline Co., 243 F.3d 244, 249 (6th Cir.2001) (Daubert hearing not required where the record was extensive and the Daubert issue was fully briefed by the parties).

In this case, the parties have extensively briefed the issues pertinent to each expert’s testimony. Each of the challenged experts has been subject to a lengthy deposition. Each of the expert’s reports (as well as each of the reports of experts challenging the reliability of some of those reports) has been submitted to the court. It is difficult to think of anything missing from the presentation by the parties that could be pertinent to these in limine motions. Accordingly, we find that a Daubert hearing is unnecessary.

II. The Survey Experts

A. Dr. Eugene Ericksen

Dr. Eugene Ericksen conducted a hybrid consumer confusion and trademark dilution survey for Louis Vuitton between December 6, 2006 and December 31, 2006 (“Ericksen Survey”). In essence, the Ericksen Survey took the form of a mall intercept survey in which 308 respondents were shown one of three videos. In the first and second videos, the same woman was shown carrying a Dooney & Bourke bag bearing the Dooney & Bourke Multicolor Pattern imprinted on either a white or black background. In the third video, the woman was shown carrying a Coach bag as a control. After viewing the video, respondents were asked three key questions: (1) “Who do you think makes the handbag you saw on the video?”; (2) “Does the handbag you saw on the video call to mind any other brands?”; (3) “Do you think the maker of the handbag you saw in the video needed to get permission, authorization or licensing from any other company for the use of the multicolored design pattern of this handbag?” Overall, having found no confusion with respect to the Coach control bag, the Ericksen Survey found that 20.2 percent of 104 respondents who saw the first video believed that Louis Vuitton made the bag shown, and that 22.3 percent of the 103 respondents who viewed the second video believed that Louis Vuitton made the bag shown. With respect to dilution, Dr. Ericksen found that “29.7 and 27.1 percent, respectively, of the qualified handbag consumers considered the white and black multicolored monogram patterns of the Dooney & Bourke handbags to be similar to the white and black Louis Vuitton multicolored monogram trademarks.”

We find that the Ericksen Survey used a severely flawed methodology and is unreliable; as such its probative value is substantially outweighed by its prejudicial effect and potential to mislead the jury. Thus it is inadmissible under Rules 403 and 702. We therefore recommend that the Ericksen Survey be excluded in its entirety. We first describe the facts of the Ericksen Survey in more detail and then explain the reasoning behind our recommendation.

1. Facts

Dr. Ericksen is a Professor of Sociology and Statistics at Temple University. He holds a Ph.D. in Sociology and an M.A. in Mathematical Statistics from the University of Michigan. Dr. Ericksen has published numerous articles on statistical sampling methods and census-taking, among other topics, and has submitted written testimony on the reliability of methods used in the United States Census before United States Senate and House of Representative Committees. Before conducting the survey at issue in this case, Dr. Ericksen had conducted at least ten trademark confusion surveys. The Ericksen Survey constituted, however, the first time in his career that Dr. Ericksen used a video of a product being tested as the survey stimulus. The Ericksen Survey also constituted the first time in his career that Dr. Ericksen tested on the question of trademark dilution. Despite this possible lack of specialization, we find that Dr. Ericksen is sufficiently qualified to provide expert testimony under Rule 702. See, e.g., Stagl v. Delta Air Lines, Inc., 117 F.3d 76 (2d Cir.1997) (error to exclude expert testimony on whether an airport baggage claim area was unsafe due to its design; the witness had a master’s degree in mechanical engineering and had consulted on the design of a number of public spaces; he was not unqualified simply because he had never designed an airport baggage claim area).

a. The Ericksen Survey Universe

In his report, Dr. Ericksen states that he designed the Ericksen Survey to test for initial interest and post-sale confusion and for dilution by blurring, but not for point-of-sale confusion or dilution by tarnishment. Dr. Ericksen oversaw the administration of a mall intercept survey conducted in five malls located in the four U.S. Census regions (with two malls being used in the more populous “South” region). To serve as a location for the survey, a mall was required either to have a store in which Louis Vuitton handbags were sold or to have “upscale stores and be located in an area with higher than average income.” The Ericksen Report does not specify how many of the five malls used in the Ericksen Survey contained a freestanding Louis Vuitton store or a store that sold Louis Vuitton handbags at the time of the survey, and Dr. Ericksen was unable to specify the number in his deposition testimony. Each of the malls contained a Coach store. To qualify for the survey, a potential respondent was required to be a female aged sixteen or older who had either bought a handbag valued at $100 or more in the preceding year or planned to buy a handbag valued at $100 or more in the succeeding year. Dr. Ericksen reports that 89.3 percent of the respondents ultimately included in the survey sample qualified under both criteria.

b. The Ericksen Survey Stimuli

Upon being qualified for the survey, the survey respondent was taken into an interviewing room and seated three to five feet away from a 19- or 20-inch, standard-definition (i.e., non-high-definition) television screen. The respondent was then randomly assigned the number 1, 2, or 3. Each of these numbers was assigned to respondents at each of the five malls. The respondent was then shown one of three videos according to the number assigned to the respondent. Each of these videos showed the same young dark-haired woman wearing a white coat with a white fur-lined trim around the neck, walking first to the right and then to the left before a white-painted cinderblock wall with a bag slung over the shoulder facing the camera. The camera was located approximately twenty-five feet from the model; the videographer used a lens which magnified the image by a factor of twelve. In each of the videos, the woman carried a different bag. The parties strenuously dispute the degree to which a respondent could perceive, if at all, the details of each of the bags shown in the videos. We first describe each of the three bags that the woman was actually carrying when she was being videotaped. We then describe the degree to which the videos showed the details of the bags.

In Video # 1, the woman was carrying a Dooney & Bourke It-Bag bearing the Dooney & Bourke Multicolor Monogram Mark imprinted on a white background. This bag was similar in its structural design to the Louis Vuitton “Papillon” bag. As the woman walked to the left, a pink enameled heart hanging by a leather strap from one of the handles of the bag was visible to the camera. (The heart contained a Dooney & Bourke imprint, but as discussed below, the imprint could not be seen in the video). In Video # 2, the woman was carrying a Dooney & Bourke It-Bag bearing the Dooney & Bourke Multicolor Monogram Mark imprinted on a black background. This bag was somewhat similar in its structural design to the Louis Vuitton “Aurelia” bag, though the Dooney & Bourke bag did not display a central outer pocket. As the woman walked both to the right and to the left, the pink enameled heart was visible to the camera, but the lettering on it was not. In Video # 3, which showed the control bag, the woman carried a “Holiday Patchwork Totebag” produced by Coach, Inc. (“Coach Patchwork Bag”) and featured by Coach in its stores and catalogue at the time that Dr. Ericksen conducted his study. Dooney & Bourke asserts, and Louis Vuitton does not dispute, that the Coach Patchwork Bag is not similar in its patchwork design to any Louis Vuitton bag. The Coach Patchwork Bag was significantly larger than the Dooney & Bourke bags used in Videos # 1 and # 2. The fabric design of the Coach Patchwork Bag consisted of a patchwork of various sewn-together rectangles of leather and other material in green, burgundy, varying shades of brown, and one patch featuring a design akin to zebra stripes. On some of these rectangles was imprinted in a single color a stylized “C” in a repeating pattern. This “C” was significantly larger relative to the rest of the bag than the “D” and “B” on the bags shown in Videos # 1 and # 2. The bag did not otherwise bear any repeating monogram pattern. As the woman walked both to the right and to the left, a small Coach hangtag was visible to the camera.

The parties dispute whether Dr. Ericksen could have used a different Coach bag, if not a different bag altogether, as a control. Dooney & Bourke has submitted into evidence numerous images of handbags produced by third parties featuring a repeating monogram pattern, some of which featured a multicolored repeating monogram pattern. Louis Vuitton does not dispute that these bags were available to Dr. Ericksen as possible controls. Louis Vuitton does, however, dispute Dooney & Bourke’s assertion that Coach itself produced a multicolored monogram pattern bag that could have been used as a control. Specifically, Dooney & Bourke asserts that Coach produced a 2006 version of the 2007 “Hamptons Weekend Scribble Tote,” both of which versions feature a multicolor monogram pattern on a white background. However, the 2006 version was discontinued by Coach in early 2006, prior to when Dr. Ericksen conducted his survey, and the 2007 version was not available in stores until March 2007.

Another issue in dispute is what could be seen by respondents in the videos. We separately viewed each of Videos # 1, # 2 and #3 repeatedly — one of us on a 20-inch standard-definition television and the other on a 20-inch computer screen — and independently found on each viewing that, in Videos # 1 and # 2, the details of the Dooney & Bourke bags were blurred beyond comprehension. Specifically, the “D” and “B” are illegible, as is the writing on the pink enameled heart. In fact, it is not clear in Videos # 1 and # 2 what, if any, written design is imprinted on the Dooney & Bourke bags. Instead, all that is perceptible in Videos # 1 and # 2 is a pattern of vaguely-defined colored shapes rendered on a white or black background. It is therefore not surprising that a number of respondents stated that they were unable to distinguish the initials on the bag shown in Video # 1, in particular. As Louis Vuitton points out, numerous respondents explicitly stated that they were able to perceive the “D” and “B” initials on the bags shown in Videos # 1 and # 2. As Dr. Ericksen admitted, however, his methodology does not make it clear whether these respondents were able to do so because they were already familiar with the overall design of Dooney & Bourke It-Bags.

In this connection, it is particularly noteworthy that Dr. Ericksen made a variety of videos prior to choosing the three videos actually used, and that the rejected alternatives included a set of videos, labeled “Close,” that featured close-ups of the bags. We each viewed these “Close” videos and concluded that the “D” and “B” are easily legible. Dr. Ericksen’s explanation for why he did not use these videos is that “only seeing part of a woman is unrealistic ... [and] it overemphasizes the bag.”

As for Video #3, showing the Coach Patchwork Bag, the stylized “C” is legible on those patches on which it was imprinted. In the final frames of the video, as the woman walks to the left, the “C” is especially visible and legible. Any writing on the hangtag is not legible. Numerous survey respondents stated that they could read the “C” on the bag. Other respondents stated that they had seen the Coach Patchwork Bag in Coach stores, with one respondent stating that she had seen the bag in a Coach store only twenty minutes prior to her interview. Dr. Ericksen did not explicitly question the control respondents about their level of previous exposure to the control bag.

c. The Ericksen Survey Questions

After viewing the videos, the respondents were asked a series of questions. The most relevant questions consisted of the following, which, for ease of reference, are numbered here as they were numbered on Dr. Ericksen’s questionnaire:

Q2. What did you see on the video?
Q3. Who do you think makes the handbag you saw in the video?
Q3a. What makes you say [that]?
Q4. Does the handbag you saw in the video call to mind any other brands?
Q4a. Which brand or brands?
Q4b. What makes you say [that]?
Q5. Do you think the maker of the handbag you saw in the video makes any other products or brands?
Q5a. Which other products or brands do you think they make?
Q5b. What makes you say [that]?
Q6. Do you think the maker of the handbag you saw in the video needed to get permission, authorization or licensing from any other company for the use of the multicolored design pattern of this handbag?
Q7. Who do you think they needed to get permission, authorization or licensing from for the use of the multicolored design pattern?
Q8. What makes you say [that]?

At no time was the respondent told not to guess or explicitly offered the option to answer “I don’t know.”

d. Dr. Ericksen’s Confusion Analysis

We review here Dr. Ericksen’s coding decisions with respect to consumer confusion in some detail because the parties strenuously dispute whether his coding decisions were proper. With respect to Video # 1, of the 104 respondents who viewed the video, Dr. Ericksen classified twenty-one as “confused,” i.e., as incorrectly believing that the bag shown in the video was made by Louis Vuitton. Specifically, six respondents volunteered in response to Question 2 that they saw a Louis Vuitton bag — though one of these respondents expressed uncertainty. An additional eleven respondents named Louis Vuitton in their answer to Question 3 — though two of these respondents expressed uncertainty. Finally, an additional four respondents answered “yes” to Question 6 and gave Louis Vuitton as their answer to Question 7 — though one expressed uncertainty. Looking more closely at these twenty-one respondents’ answers to Questions 3a and 8, nine respondents explained that they named Louis Vuitton at least in part because of the “lettering” or “initials” on the bag. Eight other respondents did not mention the lettering or initials on the bag but explained that they named Louis Vuitton because of the “design” of the bag, the “color and the pattern” on the bag, or the way the bag “looks.” The responses of the remaining four respondents resist reliable classification.

With respect to Video # 2, of the 103 respondents who viewed the video, Dr. Ericksen classified twenty-three as confused. Specifically, eight respondents volunteered in response to Question 2 that they saw a Louis Vuitton bag — none expressed uncertainty. An additional twelve respondents answered Louis Vuitton to Question 3 — one expressed uncertainty. Finally, an additional three respondents answered “yes” to Question 6 and gave Louis Vuitton as their answer to Question 7 — none expressed uncertainty. As with Video # 1, the Video # 2 respondents’ verbatim responses to Questions 3a and 8 show a variety of reasons why the respondents gave the answers they did. Of the twenty-three Video # 2 respondents Dr. Ericksen classified as confused, seven referred at least in part to the “lettering,” “initials,” or “Louis Vuitton patterns” on the bag. Eight other respondents made no reference to the lettering or initials on the bag but rather referred to the “design,” “style,” or “colors and structure” of the bag — with one respondent expressing uncertainty.

Eight of the respondents gave responses that resist reliable classification.

With respect to Video # 3, of the 101 respondents who viewed the video, one respondent speculated that the Coach Patchwork Bag shown “could be anywhere from Dolce & Gabbana to Louis Vuitton.” Dr. Ericksen did not classify this control respondent as confused. No other respondent referenced Louis Vuitton as the producer of the bag. Overall, sixty-five of the 101 respondents identified the bag as made by Coach — with one respondent expressing uncertainty. An additional fifteen respondents also expressed uncertainty as to the maker of the bag.

Thus, with no control respondents registering confusion, Dr. Ericksen determined that the level of consumer confusion with respect to the bag shown in Video # 1 was 20.2 percent and the level of consumer confusion with respect to the bag shown in Video # 2 was 22.3 percent. Dr. Ericksen declared that there is no statistically significant difference between these percentages.
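The percentages reported above can be checked from the respondent counts. The following is a minimal sketch only: the helper names are hypothetical, and the pooled two-proportion z-test is an assumed textbook method, since the report as described here does not say which significance test Dr. Ericksen applied.

```python
import math


def rate_percent(count, total):
    """Share of respondents as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic (assumed test, not taken from the report)."""
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / standard_error


# Counts taken from the text: 21 of 104 (Video # 1) and 23 of 103 (Video # 2).
video1_confusion = rate_percent(21, 104)
video2_confusion = rate_percent(23, 103)
z_statistic = two_proportion_z(21, 104, 23, 103)

print(video1_confusion, video2_confusion)
# A magnitude below 1.96 means no significant difference at the 5 percent level.
print(abs(z_statistic) < 1.96)
```

Under these assumptions the two rates come out at 20.2 and 22.3 percent, and the difference between them falls well short of the conventional 5 percent significance threshold, which is consistent with the statement attributed to Dr. Ericksen.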

e. Dr. Ericksen’s Blurring Analysis

As above with his confusion analysis and for the same reason, we review Dr. Ericksen’s coding decisions and reasoning with respect to his blurring analysis in detail. With respect to Video # 1 and the 104 respondents who viewed that video, Dr. Ericksen counted the twenty-one respondents he classified as confused as also demonstrating blurring. He reasoned that “a respondent who was confused by the appearance of the Dooney & Bourke handbag demonstrates that there was a blurring of the distinctiveness of the Louis Vuitton trademarks.” Dr. Ericksen counted an additional respondent as demonstrating blurring — but not as being confused — because she responded to Question 3 that “I couldn’t tell, it kind of looked like Gucci, but maybe it was Louis Vuitton.” Dr. Ericksen reasoned that this respondent “thought that the bags were similar which therefore blurred the distinctiveness of the Louis Vuitton trademarks.” Finally, Dr. Ericksen classified a further thirteen respondents as demonstrating blurring because they answered “yes” to Question 4 and named Louis Vuitton in answer to Question 4a. Though his reasoning was not explicit on this point, it appears that Dr. Ericksen reasoned that to the extent that the Dooney & Bourke bag shown in Video # 1 “call[s] to mind” the Louis Vuitton brand, it blurs the Louis Vuitton trademark. It is worthwhile to look more closely at the verbatim responses that these thirteen respondents gave to Question 4b. Two of these respondents arguably made specific reference to the repeating monogram pattern on the bag. Eight made no reference to the repeating monogram pattern, but instead said that the Dooney & Bourke bag called to mind Louis Vuitton because of similarities in “style,” “shape,” or “colors.” The remaining three resist classification — such as the respondent who named Louis Vuitton because, she said, this was the “only flashy brand that I could think of off hand.” In sum, Dr. Ericksen concluded that thirty-five or 33.7 percent of the respondents viewing Video # 1 demonstrated blurring.

With respect to Video # 2 and the 103 respondents who viewed that video, Dr. Ericksen counted the twenty-three respondents he classified as confused as also demonstrating blurring. He did so for the same reasoning given above with respect to Video # 1. Dr. Ericksen classified an additional two respondents as demonstrating blurring — but not confusion — because they named Louis Vuitton in response to Question 3 but otherwise volunteered that they thought the bag shown in the video was a “fake” or a “knock off.” Finally, Dr. Ericksen classified an additional seven respondents as demonstrating blurring because they answered “yes” to Question 4 and named Louis Vuitton in answer to Question 4a. The verbatim responses given by these seven respondents to Question 4b are again revealing. Two respondents arguably made specific reference to the repeating monogram pattern on the bag. The remaining five referred instead to similarities in “look,” “style,” or “design.” In sum, Dr. Ericksen concluded that thirty-two or 31.1 percent of the respondents viewing Video # 2 demonstrated blurring.

It is not surprising that so many respondents who viewed Videos # 1 and # 2 referred to similarities in “look” or “style.” As Dr. Ericksen explained in his deposition testimony in answer to a question posed to him by Louis Vuitton’s counsel, the blurring component of his study “measured whether or not the look of the Dooney & Bourke handbags called to mind some other brand.” When Louis Vuitton’s counsel then asked what he meant by look, he answered: “Its design. Its style.” When Louis Vuitton’s counsel invited a narrow answer by asking Dr. Ericksen “[a]re you referring to the Louis Vuitton trademark pattern on the bag when you refer to look?”, Dr. Ericksen responded more broadly: “Well, I am simply referring to the overall look of the bag.”

With respect to the 101 respondents who viewed the control Video # 3, four stated that the Coach Patchwork Bag called to mind Louis Vuitton. Dr. Ericksen explained in his report that “[i]t seems reasonable that some respondents thought that the bag they saw was some expensive brand, and when asked what brand came to mind simply named a ‘high end’ brand they knew.” In any event, Dr. Ericksen adjusted his estimates of blurring with respect to Videos # 1 and # 2 by subtracting four percent from both of them. As a result, he ultimately estimated that 29.7 percent and 27.1 percent of the respondents who viewed Videos # 1 and # 2, respectively, demonstrated blurring.
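The blurring figures can be reproduced from the counts recited above. This is an illustrative sketch with hypothetical variable names. It assumes that the control adjustment is a simple subtraction of the control group's rate (4 of 101, roughly four percent) from each gross rate, as the text describes.

```python
def percent(count, total):
    """Unrounded percentage of respondents."""
    return 100 * count / total


# Gross blurring counts as described in the text.
video1_blurring = 21 + 1 + 13  # confused + "Gucci/Louis Vuitton" answer + Question 4a
video2_blurring = 23 + 2 + 7   # confused + "fake"/"knock off" answers + Question 4a

gross1 = round(percent(video1_blurring, 104), 1)
gross2 = round(percent(video2_blurring, 103), 1)

# Control adjustment: subtract the share of control respondents naming Louis Vuitton.
control_rate = percent(4, 101)
net1 = round(percent(video1_blurring, 104) - control_rate, 1)
net2 = round(percent(video2_blurring, 103) - control_rate, 1)

print(video1_blurring, gross1, net1)
print(video2_blurring, gross2, net2)
```

On these assumptions the gross rates are 33.7 and 31.1 percent and the net rates are 29.7 and 27.1 percent, matching the figures in the report whether the control rate is subtracted exactly or rounded to four percent first.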

2. Discussion

Dooney & Bourke argues that the Ericksen surveys, and any testimony about those surveys, are unreliable and not probative because of a number of methodological flaws including:

1. Use of an improper stimulus
2. Poor choice of a control bag
3. The confusion analysis focused on the look of the bag rather than the marks at issue in this case, and also was undermined by flawed questions and the failure to instruct respondents not to guess.
4. The dilution analysis was equally flawed by focusing on the look of the bag, and in addition it is not probative because it did not properly measure dilution as determined by the substantive law.

We turn to each of these contentions, and add a number of other concerns during the course of our analysis.

a. Dr. Ericksen Used an Improper Stimulus

“[A] survey must use the proper stimulus, one that tests for confusion by replicating marketplace conditions.” Conopco, Inc. v. Cosmair, Inc., 49 F.Supp.2d 242, 253 (S.D.N.Y.1999). A survey that uses a stimulus that makes no attempt to replicate how the marks are viewed by consumers in real life may be excluded on that ground alone. See American Footwear Corp. v. General Footwear Co., 609 F.2d 655, 661 n. 4 (2d Cir.1979) (survey that failed even to come close to replicating “actual marketing conditions” was properly rejected by district court). A flaw in the choice of stimulus may not on its own warrant exclusion, but it certainly diminishes the reliability and the probative value of the survey and increases the risk of prejudice and jury confusion, problems which if added to other errors may warrant exclusion. See Vista Food Exch., Inc. v. Vistar Corp., No. 03 Civ. 5203, 2005 WL 2371958, at *5 (E.D.N.Y. Sept. 27, 2005) (rejecting plaintiff’s survey under Rule 403 for, inter alia, its use of an altered form of the defendant’s mark, with the result that “the survey failed to replicate actual marketing conditions and improperly skewed the results in favor of responses indicating confusion”); Sears, Roebuck & Co. v. Menard, Inc., No. 01 Civ. 9843, 2003 WL 168642 (N.D.Ill. Jan. 24, 2003) (granting defendant’s motion in limine under Daubert and Rule 403 where, inter alia, plaintiff’s survey used an altered form of defendant’s advertisement).

In evaluating the propriety of the Ericksen Survey’s stimulus, “it is useful to be aware of the contours and limits of what [Louis] Vuitton asserts as its trademark.” Vuitton II, 454 F.3d at 115. Fortunately, Judge Scheindlin and the Second Circuit have repeatedly and clearly — and quite emphatically — defined these contours and limits. The Louis Vuitton Monogram Multicolore Mark consists of “a design plus color, that is, the traditional Vuitton Toile pattern design — entwined LV initials with the three already described motifs — displayed in the 33 Murakami colors and printed on a white or black background.” Vuitton II, 454 F.3d at 115. See also id. at 116 (describing the mark at issue as “consisting of styled shapes and letters — the traditional Toile mark combined with the 33 Murakami colors”). Louis Vuitton may not claim exclusive trademark rights in “all uses of a multicolored logo against a white or black background because the use of multiple colors, when divorced from the geometric shapes and ‘LV’ monogram, lack secondary meaning.” Vuitton I, 340 F.Supp.2d at 440 (emphasis added); id. at 421 n. 6 (“There is no proof whatsoever that anyone believes that all colorful, monogrammed bags emanate from Louis Vuitton.” (emphasis added)); id. at 439. Louis Vuitton “has made numerous statements that this is not a trade dress case,” id. at 438 n. 118, and may not claim protection in a “look,” id. at 421. See also Vuitton II, 454 F.3d at 115 (“Notably, plaintiff does not claim a separate trademark in the colors alone. If it were to claim such a trademark, it would be required to show that the multicolors, set on a white or black background, create a separate and distinct commercial impression, apart from the monogram motif design, and that the colors serve to indicate Vuitton as the source.”). This issue of what is protected — the multicolor logo and not the look — has been fully ventilated in this litigation.

In essence, then, Louis Vuitton may not claim trademark rights in all designs-plus-colors combinations as applied to handbags, but only in its particular “traditional Vuitton Toile” plus Murakami colors combination. A competitor is free to develop its own particular combination of initials and/or designs imprinted in various colors, as Dooney & Bourke and many others have, so long as its particular combination is not so similar to Louis Vuitton’s (in both designs and colors) as to mislead consumers as to the true source of the competitor’s goods. Indeed, because Louis Vuitton does not and cannot claim trademark rights in the Murakami colors alone, a competitor is free to use precisely those colors so long as it displays those colors in imprinted initials and/or designs sufficiently dissimilar to the traditional Vuitton Toile as not to cause consumer confusion. Cf. J. Thomas McCarthy, 4 McCarthy on Trademarks and Unfair Competition § 23:52 (4th ed. 2007) (“If defendant has used plaintiff’s mark in the same lettering style, color, format, etc., then the likelihood of confusion is increased, whereas if the lettering style is dissimilar, confusion is less likely.”).

The stimulus Dr. Ericksen used in his survey is improper because he did not, in Videos # 1 and # 2, expose the survey respondents both to the colors Dooney & Bourke used to imprint its designs on its It-Bags and to the imprinted designs themselves. See Dreyfus Fund, Inc. v. Royal Bank of Canada, 525 F.Supp. 1108, 1117 (S.D.N.Y.1981) (“It is the impression which the mark as a whole creates on the average reasonably prudent buyer and not the parts thereof which is important”). Instead, Videos # 1 and # 2 show the bag so far away that the interlocking “D” and “B” initials cannot be seen, in a way that appears “blatantly designed to skew the survey’s results.” Conopco, 49 F.Supp.2d at 255. As noted, several respondents complained of the blurriness of Video # 1 in particular. Dr. Ericksen could have used the “Close” set of videos, which featured legible images of the “D” and “B” initials, but declined to do so, despite some precedent for the use of close-ups in video stimuli. See Lois Sportswear, U.S.A. v. Levi Strauss & Co., 799 F.2d 867, 873 (2d Cir.1986) (admitting but ultimately giving “limited weight” to a survey in which “[t]he videotape [stimulus] allowed one of the back pockets to be seen at a distance of about six feet and was then followed by a zoom shot of the pocket.”). Ultimately, as Dr. Ericksen admitted in describing the purpose of his hybrid confusion-dilution survey, Videos # 1 and # 2 tested, if anything, the degree to which the “overall look” of the Dooney & Bourke It-Bags was perceived as similar to the overall look of the Louis Vuitton Murakami bags. But the record in this case could not be clearer: Louis Vuitton may not claim trademark rights in a “look.” This is why it is peculiar that in defending Dr. Ericksen’s choice of stimulus, Louis Vuitton cites to Louis Vuitton Malletier v. Burlington Coat Factory Warehouse Corp., 426 F.3d 532 (2d Cir.2005), in which the Second Circuit explained that, in comparing trademarks, “it is the general overall impression that counts.” Id. at 538 (quoting Harold F. Ritchie, Inc. v. Chesebrough-Pond’s, Inc., 281 F.2d 755, 762 (2d Cir.1960)). Yet the respondents to Videos # 1 and # 2 could not perceive a “general overall impression” of the Dooney & Bourke Multicolor Monogram Mark because they could not distinguish the initials that form a critical element of that mark — an element that was likely to have disabused the respondents of whatever confusion they may have experienced. See, e.g., Nabisco, Inc. v. Warner-Lambert Co., 220 F.3d 43, 46 (2d Cir.2000) (defendant’s “prominent use of its well-known house brand therefore significantly reduces, if not altogether eliminates, the likelihood that consumers will be confused as to the source of the parties’ products”). Louis Vuitton then cites to Burlington’s statement that:

Even if a consumer can differentiate between two products, the question is whether, and to what degree, the look of the junior user’s product calls to mind the senior user’s product. It follows that, in the context of the case before us, the handbags need not be identical, but only similar, for there to be a likelihood of confusion.

Burlington, 426 F.3d at 538 n. 3. But in Burlington, Louis Vuitton was claiming trade dress infringement in addition to trademark infringement, id. at 536, with the result that the Burlington Opinion references both forms of infringement throughout. See, e.g., id. at 538 (quoting Fun-Damental Too, Ltd. v. Gemmy Indus. Corp., 111 F.3d 998, 1004 (2d Cir.1997) (“[W]e must ask whether they create the same general overall impression such that a consumer who has seen plaintiff’s trade dress would, upon later seeing defendant’s trade dress alone, be confused.” (alterations omitted))). And, of course, the Burlington language that Louis Vuitton quotes refers to the “look,” i.e., the trade dress, of a product.

Louis Vuitton falls back on the theories of initial interest and post-sale confusion to defend the lack of visibility of the logos in Videos # 1 and # 2, and Dr. Ericksen writes in his report that he did not intend to test for point-of-sale confusion. Louis Vuitton argues that, from some distance, consumers will not be able to distinguish the “D” and “B” initials, but will instead perceive a blur of colors and outlined designs, and thus, that from that distance, consumers may believe that a Dooney & Bourke It-Bag was made by Louis Vuitton. To be sure, there is some tension between, on the one hand, the “precision,” Vuitton I, 340 F.Supp.2d at 438, with which the record has defined the trademark at issue and, on the other, Louis Vuitton’s claims of initial interest and post-sale confusion. But Louis Vuitton seeks to take advantage of the theories underlying initial interest and post-sale confusion indirectly to assert rights in a “look.” To the extent that Louis Vuitton asserts that consumers are confused initially or in the post-sale context by the look of a multicolored bag from a distance (when the initials or designs on the bag are obscured), this confusion is being caused by the look of the bag, in which, again, Louis Vuitton has disavowed any trademark rights. Accepting Louis Vuitton’s argument would lead to the absurd premise that its Multicolore Monogram trademark would be infringed by any bag with relatively the same color and shape of the Louis Vuitton handbag when viewed from a few blocks away — even a bag without any monograms at all. That is obviously an unacceptable result, making it critical to remember that Louis Vuitton may not in this action claim trademark rights in a “look.”

Louis Vuitton also defends Dr. Ericksen’s choice of stimulus on the ground that if Dr. Ericksen had used a legible close-up of the Dooney & Bourke Multicolor Monogram Mark, then his survey “would not be a real world test. Rather, it would have been merely an irrelevant reading test of the type previously rejected by this Court” in the context of the 2004 confusion survey conducted by Robert N. Reitter. See Vuitton I, 340 F.Supp.2d at 445. However, in that survey, a “heart shaped brass name sign” bearing the name “Dooney & Bourke” in full was prominently featured on the bag during the respondent’s initial, close-up viewing of the bag. See id. at 446. Here, by contrast, a legible close-up of the Dooney & Bourke Multicolor Monogram Mark would have exposed respondents to a monogram that is at the heart of this case. This would not constitute the kind of reading test that courts traditionally reject. Compare Franklin Resources Inc. v. Franklin Credit Mgt. Corp., 988 F.Supp. 322, 335 (S.D.N.Y.1997) (“[I]n the court’s view, this survey tested the participants’ ability to read [the name ‘Franklin’] and little else.”) with Conopco, Inc. v. Cosmair, Inc., 49 F.Supp.2d 242, 254-55 (S.D.N.Y.1999) (surveys where respondents look at the mark are “helpful” when “the source of the alleged confusion is not just a name, word or phrase”).

In sum, we find that the videos viewed by the respondents did not even come close to replicating the conditions under which the Dooney & Bourke Multicolor Monogram logo might be confused by consumers with the Louis Vuitton Multicolore Monogram logo. The error in methodology is especially troubling because, as stated above, Dr. Ericksen had prepared but did not show respondents a video in which the Dooney & Bourke lettering could have been seen. We believe that use of the improper stimulus renders the survey so fundamentally unreliable that the flaw is enough on its own to justify exclusion under Rules 702 and 403. At the very least, the flawed choice of stimulus significantly diminishes the reliability and probative value of the survey; and when that flaw is combined with other flaws in the methodology discussed immediately below, the survey is without doubt inadmissible.

b. Dr. Ericksen’s Control Was Flawed

A control stimulus is used in trademark surveys to “sufficiently account for factors legally irrelevant to the requisite confusion,” Cumberland Packing Corp. v. Monsanto Co., 32 F.Supp.2d 561, 574-75 (E.D.N.Y.1999), such as the “background noise,” id. at 574, generated by the “[b]efuddlement” that “is part of the human condition.” Reed-Union Corp. v. Turtle Wax, Inc., 77 F.3d 909, 912 (7th Cir.1996) (“No matter how clear the markings, no matter how different the names, no matter how distinctive the bottles, some confusion is inevitable.”). “Many courts have required control questions in order to filter out” this background confusion. Ironclad, L.P. v. Poly-America, Inc., No. 3:98 Civ. 2600, 2000 WL 1400762, at *8 (N.D.Tex. July 28, 2000). “A control product is one that is a non-infringing product which is similar to the products at issue.” Nabisco v. Warner-Lambert Co., 32 F.Supp.2d 690, 700 (S.D.N.Y.1999) (citation omitted); Shari Seidman Diamond, Reference Guide on Survey Research, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE at 258 (Federal Judicial Center 2000) (“In designing a control group study, the expert should select a stimulus for the control group that shares as many characteristics with the experimental stimulus as possible, with the key exception of the characteristic whose influence is being assessed.”). The use of an improper control may produce “an artificially low estimate of the normal degree of confusion affecting the purchase of products” bearing the mark at issue. Reed-Union, 77 F.3d at 912. See also Cumberland, 32 F.Supp.2d at 575 (“Given the inadequacy of the controls, one cannot determine from the data the extent of the relevant type of confusion by indirectly approximating the background noise.”).

Despite the availability of numerous handbags produced by third parties featuring a repeating monogram pattern, some of them multicolored, Dr. Ericksen chose as his control stimulus a bag quite dissimilar in shape and pattern to the bags shown in Videos # 1 and # 2. A control stimulus closer in design to the Louis Vuitton Monogram Multicolore Mark would have gone far towards isolating the amount of confusion attributable to the similarities in Dooney & Bourke’s and Louis Vuitton’s marks, rather than to the similarities in the “look” of their bags. Instead, Dr. Ericksen chose a control stimulus that had little in common with the bags at issue in this case, and what it did have in common quite likely resulted in the underreporting of background “noise.” While the Coach Patchwork Bag was multicolored in nature, it did not feature a multicolored logo pattern covering the bag. Instead, certain patches on the Coach Patchwork Bag, one of which was clearly visible in the final frames of the control video, bore a logo pattern consisting of a relatively large and legible “C” imprinted in essentially a single dark color on a light background. It is telling that while the Dooney & Bourke Multicolor Monogram Mark was not visible in Videos # 1 and # 2, the Coach “C” was legible in Video # 3.

The flawed choice of control bag is probably not on its own dispositive of the admissibility of the survey. The Coach bag was not a very good “noise” reducer, but it seems to have been better than no control at all. See Shari Seidman Diamond, Reference Guide on Survey Research, in MANUAL ON SCIENTIFIC EVIDENCE 2d at 258 (Federal Judicial Center 2000) (“[A] survey with an imperfect control group generally provides better information than a survey with no control group at all, but the choice of the specific control group requires some care and should influence the weight that a survey receives”).

However, while the poor choice of control is not dispositive of inadmissibility, both Rules 702 and 403 require the court to look at the cumulative effect of all of the flaws in a survey. See Mastercard Int’l Inc. v. First Nat’l Bank of Omaha, Inc., No. 02 Civ. 3691, 2004 WL 326708, at *10, 2004 U.S. Dist. Lexis 2485, at *30 (S.D.N.Y. Feb. 23, 2004) (assessing the cumulative impact of flaws in survey methodology and concluding that the “flaws in the Survey diminish its relevance in predicting actual confusion ... such that the potential for the Survey’s results to prejudice unfairly, to confuse, and to mislead the jury substantially outweighs any limited relevance”). Thus the poor choice of bag is an important factor cutting toward exclusion of the 2004 survey — especially when added to the problematic stimulus, which made no attempt to allow respondents to view the lettering on the Dooney & Bourke bag.

c. Dr. Ericksen’s Confusion Analysis Is Flawed

“In order to prove actual confusion, the confusion must stem from the mark in question,” General Motors Corp. v. Lanard Toys, Inc., 468 F.3d 405, 414 (6th Cir.2006), in this case, the Dooney & Bourke Multicolor Monogram Mark. Confusion caused by stimuli irrelevant to the trademark at issue should be disregarded. See Malaco Leaf, AB v. Promotion In Motion, Inc., 287 F.Supp.2d 355, 375 (S.D.N.Y.2003) (criticizing plaintiff’s survey for attributing “the balance of the reported confusion (8%) to other indicia of confusion which are irrelevant to this Court’s trade dress analysis, including, inter alia, consumers’ belief that both products are the ‘same type of candy.’ ”); Cumberland Packing Corp. v. Monsanto Co., 32 F.Supp.2d 561, 573-75 (E.D.N.Y.1999) (determining that numerous survey respondents’ verbatim responses showed that these respondents’ confusion was caused by factors not relevant to the trade dress at issue). Furthermore, consumer confusion surveys should be designed to discourage guessing. See Conopco, Inc. v. Cosmair, Inc., 49 F.Supp.2d 242, 255 (criticizing survey that “was designed to exacerbate confusion by encouraging participants to guess”); Cumberland, 32 F.Supp.2d at 575 (surveys flawed for not discouraging guessing); Jacob Jacoby, A Critique of Rappeport’s “Litigation Surveys — Social ‘Science’ As Evidence,” 92 Trademark Rep. 1480, 1486 (2002) (“[H]ighly credible and substantiated empirical evidence exists to reveal that the absence of an explicit [“Don’t Know”] response category can substantially affect survey findings, often in the order of 20 percentage points or more.”). Cf. Schieffelin & Co. v. Jack Co. of Boca, Inc., 850 F.Supp. 232, 240 (S.D.N.Y.1994) (“Also excluded from the 47 percent figure were an additional two respondents who mentioned DOM PÉRIGNON in a manner suggesting that they were guessing.”).

Dr. Ericksen failed to limit his confusion analysis to the confusion, if any, that was caused specifically by the Dooney & Bourke Multicolor Monogram Mark — despite the clear record in this case that Louis Vuitton may not claim trademark rights in the “look” of the Murakami bags. With respect to Video # 1, of the 21 respondents whom Dr. Ericksen classified as confused, at least eight explained that they named Louis Vuitton in response to Questions 3a or 7 because of similarities in design, colors, or overall look. With respect to Video #2, of the twenty-three respondents classified as confused, at least eight referred to similarities in design, colors, or overall look.

Furthermore, Dr. Ericksen classified as confused respondents who named Louis Vuitton in answer to Question 7. Specifically, Dr. Ericksen classified four respondents exposed to Video # 1 as confused on this basis, and three exposed to Video # 2. Yet, as Judge Scheindlin has already noted in this case, questions akin to those asked in Questions 6, 7, and 8 have been “rejected by courts because they improperly ask respondents for a legal conclusion.” Vuitton I, 340 F.Supp.2d at 445. Regarding a survey conducted by Dr. Jacob Jacoby, Judge Scheindlin determined that the respondent’s answers to a “needed to get permission” question “carry little weight.” Id.

Finally, the Ericksen Survey respondents were not explicitly instructed against guessing. This is particularly disturbing in light of the number of respondents who expressed uncertainty in their verbatim answers to certain of the survey questions and in light of the degree to which the lettering of the Dooney & Bourke Monogram Multicolor Mark could not be seen in Videos # 1 and # 2. Because the Ericksen Survey did not instruct against guessing, we have no way of knowing how many respondents named Louis Vuitton because, as one respondent put it, this was the “only flashy brand that I could think of off hand.” See J. Thomas McCarthy, 6 McCarthy on Trademarks and Unfair Competition, § 32:172 (4th ed. 2007) (“Caution must be exercised in evaluating the results of some open-ended survey questions about brands because respondents who merely guess will likely just play back the names of the best-known and dominant brands.”). Dr. Ericksen’s failure explicitly to provide a “don’t know” option is especially troubling in the context of this particular survey, in which respondents were screened according to, inter alia, whether they said that they had bought or planned to buy a luxury handbag, and in which respondents were then essentially tested on their familiarity with luxury handbags. See Cumberland, 32 F.Supp.2d at 576 (the average person “has a motive to figure out the purpose of the survey and will feel some pressure to ‘answer correctly.’ The position in which the respondent is placed make[s] it important explicitly to instruct the respondents not to guess.”); Jacob Jacoby, A Critique of Rappeport’s “Litigation Surveys — Social ‘Science’ As Evidence,” 92 Trademark Rep. 1480, 1485-86 (2002) (“Knowledge questions such as whether or not the respondent knows the source of a product raise issues of social presentation. The respondent does not wish to appear foolish or ill informed by giving obviously incorrect answers or admitting to not knowing something that everyone else knows. Explicitly mentioning ‘I don’t know’ as an answer category also reduces perceived threat. These procedures indicate that a ‘don’t know answer’ is acceptable even if it is not the most desirable answer.”) (quoting Seymour Sudman and Norman M. Bradburn, Asking Questions 112-13 (1982) (alterations omitted)).

Though we recommend that Dr. Ericksen’s confusion analysis be rejected in its entirety, we note that if the Court chooses (1) to disregard respondents who were classified as confused even though they made reference only to the general style of the bag and (2) to disregard respondents who were classified as confused solely because of their response to Question 7, then Dr. Ericksen’s survey would show nine respondents confused in response to Video # 1 (for a confusion rate of 9/104 or 8.7 percent) and twelve respondents confused in response to Video # 2 (for a confusion rate of 12/103 or 11.7 percent).
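The alternative confusion rates stated above follow from simple subtraction. The sketch below uses a hypothetical helper and assumes that the two disregarded groups (respondents who cited only general style, and respondents classified solely on Question 7) do not overlap, which is what the stated totals of nine and twelve imply.

```python
def adjusted_confusion(classified, style_only, question7_only, viewers):
    """Return (remaining respondents, rate in percent rounded to one decimal).

    Assumes the style-only and Question 7-only groups are disjoint.
    """
    remaining = classified - style_only - question7_only
    return remaining, round(100 * remaining / viewers, 1)


# Video # 1: 21 classified as confused, 8 style-only, 4 Question 7-only, 104 viewers.
video1_adjusted = adjusted_confusion(21, 8, 4, 104)
# Video # 2: 23 classified as confused, 8 style-only, 3 Question 7-only, 103 viewers.
video2_adjusted = adjusted_confusion(23, 8, 3, 103)

print(video1_adjusted)
print(video2_adjusted)
```

Under that assumption the remaining counts are nine and twelve, giving rates of 8.7 and 11.7 percent respectively.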

d. Dr. Ericksen’s Blurring Analysis Is Flawed

As with his confusion analysis, Dr. Ericksen’s blurring analysis is based on the use of an improper stimulus and flawed control. Moreover, Dr. Ericksen failed to limit his blurring analysis to blurring caused specifically by the Dooney & Bourke Multicolor Monogram Mark, rather than by the overall “look” of the Dooney & Bourke bags shown in Videos # 1 and # 2. For example, considering the videos together, Dr. Ericksen counted three respondents as demonstrating blurring because, in Dr. Ericksen’s words, “they thought the bags were similar.” Dr. Ericksen counted a further thirteen respondents as demonstrating blurring when these respondents made no reference to the Dooney & Bourke Multicolor Monogram Mark, but instead spoke of the style, shape, color, look, or design of the bags shown to them. As with his confusion analysis, Dr. Ericksen also failed to instruct against guessing. For these reasons alone, we find Dr. Ericksen’s blurring analysis to be fatally flawed.

But beyond these previously-discussed substantial flaws in methodology in determining confusion, Dr. Ericksen made a further critical error by counting “confused” respondents as also demonstrating blurring. Dr. Ericksen’s blurring analysis proceeds from a fundamental misunderstanding of the theory of dilution by blurring. It is axiomatic in trademark doctrine that a consumer (or, as here, a survey respondent) who is confused as to source cannot also demonstrate blurring. Consumer confusion occurs when consumers perceive two similar marks as referring to the same source. Trademark dilution by blurring occurs when consumers perceive two identical (or very similar) marks as referring to different sources. In blurring, the harm to the senior mark is that the link between the senior mark and the senior mark’s source is “blurred” by the presence in the marketplace of the identical (or very similar) junior mark linking to the junior mark’s source. See J. Thomas McCarthy, 6 McCarthy on Trademarks and Unfair Competition, § 24:69 (4th ed. 2007) (“Dilution by blurring consists of a single mark identified by consumers with two different sources. One mark: two sources. Traditional trademark infringement involves mistakenly connecting similar marks with the same source or an affiliate source. Similar marks: one source. The ordinary situation of no dilution and no infringement is: two different marks: two different sources.” (footnote omitted)). No individual consumer can believe both that the two marks refer to the same source and to different sources. Because Dr. Ericksen’s survey so fundamentally misunderstands the theory of “blurring,” McCarthy deserves to be quoted on this issue at length:

A given unauthorized use by defendant can cause confusion in some people’s minds and in other people’s minds cause dilution by blurring, but in no one person’s mind can both perceptions occur at the same time. Either a person thinks that the similarly branded goods or services come from a common source (or are connected or affiliated) or not. In that sense they are inconsistent states of customer perception. But viewing the relevant customer group en masse, while some customers may be confused as to source or connection, other customers recognize the independence of source. For the former group, the legal claim is the traditional one of a likelihood of confusion. For the latter group, the legal claim is one of dilution.
It is important to see that as legal theories, a traditional likely confusion claim and a dilution claim look to separate and distinct harms to a trademark and can be pleaded as alternative legal counts. Both infringement by a likelihood of confusion and dilution can coexist as legal findings only if it is proven that a significant number of customers are likely to be confused and that among a significant number of other customers who are not confused, the defendant’s use will illegally dilute by blurring or tarnishment, but one state of mind does not overlap with the other in one person.

McCarthy, § 24:72 (emphasis in original). See also RESTATEMENT (THIRD) OF UNFAIR COMPETITION § 25, comment f (1995), Reporter’s Note (“Although in a particular case the use of another’s mark may confuse some consumers and dilute the value of the mark in the minds of other consumers, the state of mind required for confusion and dilution are distinct and inconsistent. The confused consumer believes that the actor’s use of the mark is connected with the trademark owner, and thus for such consumers the use does not dilute the distinctiveness of the mark.”). We recognize that dicta in the Second Circuit has confused the binary relationship between confusion and dilution. Compare Nabisco, Inc. v. PF Brands, Inc., 191 F.3d 208, 219 (2d Cir.1999) (“A junior use that confuses consumers as to which mark is which surely dilutes the distinctiveness of the senior mark.”), with McCarthy, § 24:72 (referring to the Nabisco dictum as a “misunderstanding of the nature of ‘dilution’ by blurring.”). Nevertheless, the basic logic of the theory of dilution by blurring requires that any consumer (or survey respondent) who is confused cannot also demonstrate blurring. To the extent that the respondents whom Dr. Ericksen classified as “confused” make up a large majority of those whom he also counted as demonstrating blurring, we find this to be an additional independent basis to reject Dr. Ericksen’s survey results and testimony with respect to blurring.

3. Summary on Admissibility of Testimony, Survey, and Expert Report of Dr. Ericksen

We conclude that Dr. Ericksen’s survey, when offered to prove confusion and especially when offered to prove dilution, is inadmissible in its entirety under Rules 702 and 403. Accordingly, Dr. Ericksen’s expert report should be excluded and he should not be permitted to testify to any aspect of his survey.

B. Dr. Jacob Jacoby

Louis Vuitton retained Dr. Jacob Jacoby to conduct a trademark confusion survey (the “Jacoby Confusion Survey”), which Judge Scheindlin has previously reviewed, Vuitton I, 340 F.Supp.2d at 442-45, and a trademark dilution survey (the “Jacoby Dilution Survey”), which Judge Scheindlin has also previously discussed, id. at 449-51. The Jacoby Confusion Survey consisted of a mall intercept survey in which respondents were shown an advertisement for Louis Vuitton’s Murakami bags and then exposed to a Dooney & Bourke It-Bag and two control bags made by third-party manufacturers. The respondent was then asked a series of questions designed to elicit whether the respondent believed that any of the bags placed before her (1) “came from the same company whose bag was shown in the ad,” (2) was “put out” by a company that has “some business relationship” with the company whose bag was shown in the advertisement, or (3) came from a company that needed to “get permission or a license from the company whose bags were shown in the ad.” Based on the respondents’ answers to these and related follow-up questions, and taking into account their responses to the control bags, Dr. Jacoby found that a net of eleven percent of the respondents incorrectly believed that the It-Bag “came from” Louis Vuitton, a net of seven percent believed that the It-Bag was “put out” by a company that has “some business relationship” with Louis Vuitton, and a net of seven percent believed that the It-Bag came from a company that needed to obtain permission or a license from Louis Vuitton.

The Jacoby Dilution Survey consisted of a mall intercept survey in which ninety-six respondents were shown two It-Bags, two Murakami bags, and a control bag in the form of a Dooney & Bourke bag bearing the interlocking initials “D” and “B” imprinted in red on a red background. While being shown these bags, the respondents were asked a lengthy series of questions designed, in essence, to elicit how knowledge of the availability in the marketplace of the It-Bags influenced the respondent’s perception of the Murakami bags and vice-versa. Dr. Jacoby found that twenty-three percent of the respondents “reported” “feeling one or more” of the following “aspects of dilution” for reasons relating specifically to the It-Bags’ use of the Dooney & Bourke Multicolor Monogram Mark: nineteen percent reported a “negative reaction” to the Murakami bags or otherwise thought the It-Bags were a “copy” of the Murakami bags; one percent were “less likely to buy” the Murakami bags; five percent were “more likely to buy” the It-Bags; fourteen percent felt the Murakami bags were “less distinctive;” three percent felt they were “less valuable;” and seven percent felt they were “less exclusive.”

With respect to the Jacoby Confusion Study, we find that there are very serious discrepancies between that Study Report’s account of the conduct of the survey and the survey that was actually conducted, as Dr. Jacoby described it in his deposition testimony — discrepancies that Judge Scheindlin has already noted. Vuitton I, 340 F.Supp.2d at 444. The Jacoby Confusion Study Report did not accurately report the data collected and moreover the Jacoby Confusion Study was not conducted in a manner to ensure objectivity. See MANUAL FOR COMPLEX LITIGATION, FOURTH § 11:493 (Federal Judicial Center 2004) (factors relevant to determining whether a survey “conform[s] to generally recognized statistical standards” include whether “the data gathered were accurately reported” and whether “the process was conducted so as to ensure objectivity.”). Because of the seriousness of these discrepancies, as well as other significant methodological flaws in the survey, we will devote relatively brief attention to it. With respect to the Jacoby Dilution Survey, we find that the survey is not relevant to the issue of dilution and that it suffers from several serious methodological flaws, rendering it inadmissible under Rules 702 and 403. We therefore recommend that Dr. Jacoby’s testimony be excluded in its entirety and that his reports if offered be excluded as well.

1. The Jacoby Confusion Survey

In its initial overview of the survey’s “principal findings and conclusions,” the Jacoby Confusion Study Report describes the survey process as follows:

After the respondent indicated she was finished looking at the advertisement, it was removed from view. Next, the respondent was shown a set of three bags, one of the allegedly infringing Dooney bags and two “control” bags (a Coach bag and either an Etienne Aigner or Guess bag). With these bags in view, the respondent was then asked a series of questions designed to determine the degree to which (if at all) she was confused either as to source, affiliation or connection, and/or sponsorship or approval.
After eliminating two respondent data sheets that failed the quality checks at check-in and another respondent data sheet that failed the post-survey validation, data analysis was confined to the remaining 109 respondent data sheets.

The Jacoby Confusion Study Report subsequently states: “To be able to show that the effects obtained were not due to having used a single, perhaps atypical, Dooney bag, respondents were shown one of two Dooney Test bags. Approximately half the respondents were shown Dooney’s large ‘It Bag’ Wristlet; the other half was shown Dooney’s ‘It Bag’ ID/coin purse.” The clear implication of this statement is that the survey used two bags as stimuli, with the “Wristlet” bag exposed to approximately half of the 109 respondents, and the “ID/coin purse” bag shown to the other half. Notably, the report does not provide in an appendix any photographs of the Dooney & Bourke bags used as stimuli in the survey, though it does provide a photograph of the advertisement used to prime the respondents. Thus, the reader of the report has no way of knowing, for example, what the background color of the “Wristlet” bag was or what the background color of the “ID/coin purse” bag was. Also notable is that the Jacoby Confusion Study does not specify what proportion of respondents exposed to the “Wristlet” were coded as confused, nor does it specify what proportion of the respondents exposed to the “ID/coin purse” were coded as confused. Instead, the report sets forth a table that purports to summarize the survey’s results with respect to the 109 respondents surveyed. This table contains a column headed simply “D & B.” The clear implication of this heading is that it refers to results obtained with respect to both the “Wristlet” and the “ID/coin purse,” and only the “Wristlet” and “ID/coin purse.”

As Judge Scheindlin has recognized, Vuitton I, 340 F.Supp.2d at 444, the Jacoby Confusion Survey Report does not, in fact, describe the actual survey that was undertaken. As Judge Scheindlin explained:

Dr. Jacoby testified that the survey was conducted in two parts. During phase one, fifty-eight respondents were interviewed, and shown one of four different Dooney & Bourke bags or accessories. A percentage of the respondents expressed confusion as to each of the four bags as follows: 10% (Bag 33); 7% (Bag 34); 6% (Bag 35); 5% (Bag 36). During phase two, fifty-one additional interviews were performed, and rather than being shown four bags, respondents were shown only one of the four Dooney & Bourke items, Bag 33. Dr. Jacoby explained that, although phase two of the study should have been run with two bags, the white and black wristlets, due to a “mistake” on the part of Dr. Kaplan (who was responsible for the interview process), only the white wristlet was used.

Id. Louis Vuitton offers no explanation for why the report failed to state (1) that the survey was conducted in two phases, the first of which used four different It-Bags, or (2) that Dr. Kaplan made a “mistake” in his presentation of the survey stimuli to the survey respondents, so that in the second phase only one Dooney & Bourke bag was used: the one that happened to yield the highest confusion level in the first phase of the survey. See Shari Seidman Diamond, Reference Guide on Survey Research, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE at 270 (Federal Judicial Center 2000) (“The completeness of the survey report is one indicator of the trustworthiness of the survey and the professionalism of the expert who is presenting the details of the survey.”); and id. (noting that a survey report should provide in detail all visual exhibits used and a description of special scoring).

In his deposition testimony, Dr. Jacoby offers an explanation for the disjointed nature of the survey: the survey initially took the form of an extended questionnaire that sought to test both for confusion and dilution, but after it became apparent that respondents were fatigued by the number of questions, the survey was broken into a confusion component and a dilution component. Dr. Jacoby then retained the confusion-related data he had already collected with respect to the fifty-eight respondents interviewed up to that time, and sought to interview an additional set of respondents to achieve a sufficient sample size. This may be a reasonable explanation for the peculiar conduct of the survey itself, but the Jacoby Confusion Survey Report alludes to none of this, and appears to have been written in a deliberately ambiguous manner. More significantly, the Jacoby Confusion Survey Report’s failure to explicate the survey’s results with respect to “Bag 35” and “Bag 36” and the survey’s failure to continue to use those stimuli, as well as Bag 34, very strongly suggest that the survey was not reported in an accurate manner and that the survey was not conducted in an objective manner. We therefore conclude that Louis Vuitton has not proved that the Jacoby Confusion Survey was produced through a reliable application of reliable methods, and accordingly the Survey should be excluded under Rule 702. Furthermore, because of the serious questions surrounding the implementation of survey methods and the reporting of its results, the Jacoby Confusion Study’s probative value, if any, is substantially outweighed by its potential to mislead the jury and to create unfair prejudice. Therefore it should be excluded under Rule 403 as well. See Sterling Drug, Inc. v. Bayer AG, 14 F.3d 733, 741 (2d Cir.1994) (“To be probative on the issue of confusion, a survey must have been fairly prepared and its results directed to the relevant issues.” (citation omitted)).

While we find that the above concerns are alone sufficient to require the exclusion of the Jacoby Confusion Survey, we note other fundamental problems with the survey that, taken together with the problems of disjointed implementation and ambiguous reports, leave no doubt that the cumulative errors render the survey inadmissible. See Mastercard Int’l Inc. v. First Nat’l Bank of Omaha, No. 02 Civ. 3691, 2004 WL 326708, at *30 (S.D.N.Y. Feb.23, 2004) (court excludes survey on the basis of cumulative errors, even if each error considered alone might be thought a question of weight). First, the survey improperly defined its universe. The Jacoby Confusion Survey Report states that “the relevant population was defined as females who were 16 years of age or older and were potential Louis Vuitton or Dooney & Bourke customers (or potential customers of both).” The report continues:

To be classified as a potential Louis Vuitton customer, in the past year or two, the respondent had to have bought a handbag costing more than $350, or a purse or wallet costing more than $100, or say she was likely to do so in the next year or so. To be classified as a potential Dooney & Bourke customer, in the past year or two, the respondent had to have bought a handbag costing $100 to $350, or a purse or wallet costing $50 to $100, or say she was likely to do so in the next year or so. Some respondents qualified for both groups.

As Judge Scheindlin has previously noted, Vuitton I, 340 F.Supp.2d at 443, the qualifying questions used to filter for this universe employed the ambiguous term “purse,” which is synonymous with “handbag.” See id. Furthermore, an unknown proportion of respondents may have qualified only because they had bought a “handbag” or “purse or wallet” costing more than the required amount “in the past year or two,” but were not likely to do so in “the next year or so.” It is well-established that only potential future purchasers, not past purchasers, are relevant to a confusion study. See Universal City Studios, Inc. v. Nintendo Co., Ltd., 746 F.2d 112, 118 (2d Cir.1984) (“[T]he survey utilized an improper universe in that it was conducted among individuals who had already purchased or leased Donkey Kong machines rather than those who were contemplating a purchase or lease.”); American Footwear Corp. v. General Footwear Co., 609 F.2d 655, 660-661, n. 4 (2d Cir.1979) (holding that a survey was defective where survey participants, although former purchasers of the product at issue, did not necessarily have any present purchasing interest in the particular matter being surveyed); Jordache Enterprises, Inc. v. Levi Strauss & Co., 841 F.Supp. 506, 518-519 (S.D.N.Y.1993) (survey that “interviewed participants who had purchased or worn jeans within the past six months” but “did not inquire as to whether those participants intended to purchase jeans in the future ... does not necessarily include potential purchasers of jeans” and “does not constitute acceptable evidence of actual confusion”). Consequently, the Jacoby Confusion Survey failed in establishing the proper universe of respondents.

Second, the Jacoby Confusion Survey asked improper questions, and of the respondents whom the survey classified as confused, more than half were classified as confused based on their responses to these questions. The survey’s second main question consisted of the following:

If you have any thoughts about it, do you think the company that put out this bag is not part of and has no business relationship with the company whose bags were shown in the ad, or do you think the company that put out this bag is part of, or does have some business relationship with the company whose bags were shown in the ad?

The phrase “some business relationship” is highly ambiguous. See Shari Seidman Diamond, Reference Guide on Survey Research, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 2d at 248 (Federal Judicial Center 2000) (“When unclear questions are included in a survey, they may threaten the validity of the survey by systematically distorting responses if respondents are misled in a particular direction, or by inflating random error if respondents guess because they do not understand the question. If the crucial question is sufficiently ambiguous or unclear, it may be the basis for rejecting the survey.”). Notably, if the respondent answered yes to this question, the follow-up question did not ask about the nature of this “business relationship,” but rather asked about the bag: “What, in particular, makes you think that bag number [ ] come from a company that is part of, or has some business relationship with, the company whose bags were shown in the ad?” Thus, the respondents’ understanding of the term “business relationship” cannot be evaluated.

The survey’s third main question consisted of the following:

If you have any thoughts on this, do you think that, in order to come out with this bag the company did not need to get permission or a license from the company whose bags were shown in the ad, or do you think the company that put out this bag did need to get permission or a license from the company whose bags were shown in the ad?

As Judge Scheindlin has noted, this kind of question has been criticized by courts because it “improperly ask[s] respondents for a legal conclusion.” Vuitton I, 340 F.Supp.2d at 445. Indeed, the question asked the respondents the very question that this litigation seeks to answer.

Finally, the Jacoby Confusion Survey classified several respondents as confused based on factors not relevant to the marks at issue. For example, one respondent was classified as confused based on her identification of the It-Bag as “[coming] from the same company whose bag was shown in the ad.” Yet when asked what made her say so, the respondent responded: “white background, it has capital letter initials.” Two other respondents identified the It-Bag in response to the survey’s second main question. When asked what made them say so, one referred only to “the initial on the bag” while the other responded: “The style is familiar because of the lettering.” But Louis Vuitton has not argued that the mere interlocking initials “D” and “B” infringe on any Louis Vuitton mark. Rather, the mark at issue is the Dooney & Bourke Monogram Multicolor Mark. These respondents failed to refer to the multicolor aspect of this mark and are not properly classified as showing confusion. See Revlon Consumer Products Corp. v. Jennifer Leather Broadway, Inc., 858 F.Supp. 1268, 1275-76 (S.D.N.Y.1994) (inconsistent scoring and subjective coding rendered survey of no probative value).

We conclude that the cumulative effect of all the errors discussed above — errors of methodology, implementation, and reporting of results — renders the Jacoby Confusion Survey inadmissible under Rules 702 and 403.

2. The Jacoby Dilution Survey

a. Facts

i. The Jacoby Dilution Survey Universe

Dr. Jacoby defined the relevant universe for his dilution study exactly as he defined it for his confusion study, as “potential Louis Vuitton or Dooney & Bourke customers (or potential customers of both) who had bought a ‘handbag’ or ‘purse or wallet’ within the price ranges specified above.” While the Jacoby Confusion Survey Report stated ambiguously that “[s]ome respondents qualified for both groups,” the Jacoby Dilution Survey Report notes specifically that “[n]early seven in ten respondents (69%) qualified for both groups”; that is, they qualified both as Louis Vuitton and as Dooney & Bourke customers. The Jacoby Dilution Survey sampled this universe by conducting a mall intercept survey in four malls distributed by U.S. Census region.

Respondents were interviewed from March 9 to March 30, 2004.

ii. The Jacoby Dilution Survey Stimuli

As noted above, the Jacoby Dilution Survey had three basic stimuli: the It-Bags, the Murakami bags, and the control bag. More precisely, respondents were exposed to two of four different It-Bags: the “Large Wristlet” with either a white or a black background and the “ID/Coin purse” with either a white or a black background. Similarly, respondents were exposed to two of four different Murakami bags: the “Pochette Accessoire” with either a white or a black background and the “Pochette Cles” with either a white or black background. As the names of these Dooney & Bourke and Louis Vuitton bags suggest, these bags are all relatively small in size. Finally, all respondents were exposed to a Dooney & Bourke “Small Zip Top” bag as the control.

As above with the Jacoby Confusion Survey, the Jacoby Dilution Survey had no separate control group. All respondents were shown the test bags and the control bag.

iii. The Jacoby Dilution Survey Questions

The Jacoby Dilution Survey used eight versions of the main questionnaire to rotate the order in which respondents were exposed to the bags and to rotate the order of the various questions asked of the respondents. Because the parties dispute whether the Jacoby Dilution Survey’s questions were proper, we review them in some detail. We take the first version of the main questionnaire as an example and, for the sake of clarity, refer to “stages” of the interview — even though Dr. Jacoby himself does not use this term. In each stage, the respondent was asked a series of questions that required the respondent to compare each of the basic stimuli to the other two.

The first stage of the interview proceeded as follows. After being seated in the testing room and told not to guess, the respondent was first exposed to the Large Wristlet and the ID/Coin purse and asked to verify the identification numbers on the bags. Then the respondent was exposed to the Pochette Accessoire and Pochette Cles bags and asked to verify their identification numbers. Thenceforth, the interviewer referred to the bags by these numbers. The interviewer then touched the It-Bags and asked the following: “Does knowing that these bags are being sold in this pattern make it less likely you would want to buy [the Murakami bags], more likely that you would want to buy [the Murakami bags], or does this not affect whether you would want to buy [the Murakami bags]?” If the respondent answered that she would be less likely to want to buy the Murakami bags, the respondent was then asked: “What, in particular, is it about the appearance of [the It-Bags] that makes it less likely you would want to buy [the Murakami bags]?” This was followed by a probe in the form of the question: “Anything else?” If the respondent mentioned “look(s),” “design,” “colors,” “pattern,” or “style,” in her response to these follow-up questions, the interviewer then asked: “What, in particular, is it about the [design, color, pattern etc.]”? This was followed by the probe “Can you describe that in more detail?” and then by the probe “Anything else?”

The respondent was then asked essentially the same series of questions, but the interviewer began by touching the Murakami bags and asking about the It-Bags.

Finally, the It-Bags were removed from view and the control bag was placed in front of the respondent next to the Murakami bags. After asking the respondent to verify the identification number on the control bag, the interviewer then asked the respondent essentially the same questions as above, by first touching the control bag and asking about the Murakami bags, and then by touching the Murakami bags and asking about the control bag.

In the second stage of the interview, with the control bag still present in front of the respondent, the interviewer handed the Murakami bags to the respondent and asked the following question, recorded here as it was in the questionnaire’s instructions to the interviewer:

For my next questions, I’d like you to suppose you actually owned one or both of the bags I am handing to you. PAUSE FOR THREE SECONDS, THEN CONTINUE: If you actually owned one or both of these bags, how would knowing that [the control bag] was being sold in this red-on-red pattern make you feel? RECORD VERBATIM. FOLLOW UP WITH: Why do you say that? PROBE ONCE WITH: Anything else?

The interviewer then asked a series of questions, the first of which was as follows: “Would the fact that [the control bag] was being sold in the red-on-red pattern make you feel that the bag you owned was less distinctive, was more distinctive, or would it not affect how you felt about the distinctiveness of the bag?” The respondent was not asked to explain her answer. Then the interviewer asked the same question, but referred to whether the availability in the marketplace of the control bag would make the respondent feel that the bags she was holding were “more valuable,” “less valuable,” or it would not affect how she “felt about the value of your bag.” This process continued with a third question (“less exclusive,” “more exclusive,” or no effect on the “exclusiveness of your bag”) and a fourth question (“more desirable,” “less desirable,” or no effect on the “desirability of your bag”).

This process was then repeated, but the interviewer removed the control bag from view and instead placed the It-Bags in front of the respondent. The respondent was again told to imagine that she “actually owned” the Murakami bags she was holding and was asked how knowledge that the It-Bags “were also being sold in the multi-color pattern on white or black” would make her feel about the Murakami bags.

Finally, in the third stage of the interview, the respondent was shown only the It-Bags and asked the following, as recorded in the questionnaire:

Without handling them, if you know, which company puts out these bags? RECORD VERBATIM.
IF RESPONDENT GIVES INITIALS, SAY: What do those initials stand for? IF RESPONDENT SAYS SOMETHING LIKE: “I don’t know how to spell it,” SAY: Please spell it as best you can.

The respondent was then shown only the control bag and asked this line of questions and then shown the Murakami bags and asked this line of questions.

iv. The Jacoby Dilution Survey Dilution Analysis

In the Jacoby Dilution Survey Report, Dr. Jacoby defined dilution as follows: “Dilution was assessed as exerting an adverse impact on perceptions of the Louis Vuitton Multicolore bags, as exerting an adverse impact on the desire to buy the Louis Vuitton Multicolore bags, or as exerting a positive impact on the desire to buy the Dooney & Bourke ‘It-Bags.’ ” With respect in particular to the questions asked of respondents in stage one of the interview, which were addressed to the respondent’s purchasing intentions, Dr. Jacoby specified that respondents would be classified as indicating dilution only if they “explained these intentions by referring to one or more elements of the multi-color monogram pattern or design.” Here, Dr. Jacoby found that a net of one percent of the respondents (which given the sample size, translates into one person) stated that they were less likely to want to buy the Murakami bags in light of the availability in the marketplace of the It-Bags. Meanwhile, Dr. Jacoby found that a net of five percent of the respondents stated that they were more likely to want to buy the It-Bags in light of the availability in the marketplace of the Murakami bags.

With respect to the questions asked in stage two of the interview, Dr. Jacoby found that, in response to the first open-ended “how would it make you feel?” question, a net of nineteen percent of the respondents said either (1) that the availability in the marketplace of the It-Bags would cause them to have, in Dr. Jacoby’s words, “negative feelings” towards the Murakami bags, or (2) that the It-Bags were knock-offs of the Murakami bags, which Dr. Jacoby coded as indicating dilution. But as Dooney & Bourke points out, many of the verbatim responses that Dr. Jacoby coded as indicating dilution did no such thing. Remarkably, Dr. Jacoby coded the following verbatims as indicating dilution: “Even though they have the same color pattern and letter pattern it wouldn’t make a difference.” “I’m not big on the multi-color pattern but I do like the wrist bag. I like the little [Dooney & Bourke] heart.” “It would make me want to buy one of these [It-Bags]. They are more my style. I like the pattern better.” “It really wouldn’t make a difference. I like all of them. The pattern or color do not make a big difference. I like them all and I like all the purses.” “It wouldn’t matter because I like the multicolors on both of them.” “It wouldn’t matter to me one little bit. I do own a Louis Vuitton purse but I still don’t care if some other company want[s] to make their purses like LV. Why should I care?”

With respect to the questions going to the “distinctiveness,” “value,” “exclusivity,” and “desirability” of the Murakami bags, Dr. Jacoby reported that a net of fourteen percent of the respondents stated that the It-Bags made the Murakami bags “less distinctive,” and three percent stated that the former made the latter “less valuable.” However, Dr. Jacoby reported, but neglected to include in his principal findings, that a roughly equivalent net of two percent of the respondents stated that the It-Bags made the Murakami bags “more valuable.” Dr. Jacoby also reported, but neglected to include in his principal findings, that a net of seven percent of the respondents stated that the It-Bags made the Murakami bags “more desirable.”

With respect to the third stage of the interview, Dr. Jacoby reports that “consumers of the sort tested in this investigation, in overwhelming proportion, are able to correctly identify the Louis Vuitton Multicolore bag as emanating from Louis Vuitton.” Specifically, seventy-six percent of the respondents recognized the Louis Vuitton Multicolore Mark as referring to Louis Vuitton.

v. The Jacoby Dilution Survey Pilot Survey

Finally, we note that, before initiating the Jacoby Dilution Survey, Dr. Jacoby ran a pilot survey, the results of which showed “little to no net dilution.” Vuitton I, 340 F.Supp.2d at 450 n. 196. Dr. Jacoby states that he discarded the results of the pilot survey on the ground that the respondents were fatigued by the length of the interview. Louis Vuitton has produced no other evidence to support this assertion. The Jacoby Dilution Survey used essentially the same questions as the pilot survey, but with slight modifications such as the insertion of the words “also” or “the appearance of” and the addition of “in this pattern” to the question asked in stage one. Id.

b. Discussion

Dooney & Bourke criticizes the Jacoby Dilution Survey on a variety of grounds, the most significant of which are the following:

1. The Jacoby Dilution Survey used an improper universe.
2. The Jacoby Dilution Survey is not relevant to the issue of dilution.
3. The Jacoby Dilution Survey’s analysis of dilution was fundamentally flawed.
4. The Jacoby Dilution Survey was subject to methodological flaws that overstated the number of respondents who could fairly be categorized as subject to some diluting influence.

In reviewing the Jacoby Confusion Survey, supra, we found that the universe was improperly defined and that this error diminished the reliability and probative value of the survey. This discussion is equally applicable to the Jacoby Dilution Survey. We now consider the second, third, and fourth of Dooney & Bourke’s criticisms of the Jacoby Dilution Survey.

i. The Jacoby Dilution Survey is Not Relevant to the Issue of Dilution

As Judge Scheindlin repeatedly pointed out in Vuitton I, the central question informing a dilution analysis in this case is the following: “does Dooney & Bourke’s use of the multicolored monogram printed on the white and black backgrounds of its handbags lessen the capacity of Louis Vuitton’s Monogram Multicolore trademarks to identify and distinguish goods or services?” Vuitton I, 340 F.Supp.2d at 448 (quotation omitted). See also id. at 450-51 n. 198 (criticizing a coding decision made by Dr. Jacoby on the ground that “this answer does not reflect ‘dilution’ because it says nothing about whether It-Bags diminish the capacity of the Monogram Multicolore trademarks to identify Louis Vuitton as the source of its bags.”). Both dilution by blurring and dilution by tarnishment are now clearly defined under current law. See 15 U.S.C. § 1125(c)(2)(B) (“ ‘dilution by blurring’ is association arising from the similarity between a mark or trade name and a famous mark that impairs the distinctiveness of the famous mark”); id. at § 1125(c)(2)(C) (“‘dilution by tarnishment’ is association arising from the similarity between a mark or trade name and a famous mark that harms the reputation of the famous mark”). Both were also clearly defined by Judge Scheindlin in 2004. See Vuitton I, 340 F.Supp.2d 415, 437 (“Blurring occurs where the defendant uses or modifies a plaintiff’s trademark to identify the defendant’s goods and services, raising the possibility that the mark will lose its ability to serve as a unique identifier of the plaintiff’s product.” (quotation and citation omitted)); id. (“Tarnishment occurs where a trademark is linked to products of shoddy quality, or is portrayed in an unwholesome or unsavory context, with the result that the public will associate the lack of quality or lack of prestige in the defendant’s goods with the plaintiff’s unrelated goods.” (quotation and citation omitted)).

Several aspects of the Jacoby Dilution Survey are irrelevant to dilution as defined above. Most obviously, a respondent’s statement that she is more likely to want to buy the It-Bags in light of the availability in the marketplace of the Murakami bags does not indicate dilution by either blurring or tarnishment of the Louis Vuitton Monogram Multicolore Mark — and the Jacoby Dilution Survey Report offers no reasoning to support this link. Specifically, an increased interest in the Dooney & Bourke bag does not show that the Dooney & Bourke Monogram Multicolor Mark has somehow impaired the ability of the Louis Vuitton Monogram Multicolore Mark to serve as a source identifier of Louis Vuitton. See Savin Corp. v. Savin Group, 391 F.3d 439, 455 (2d Cir.2004) (discussing dilution by “blurring of a mark’s product identification”). It also fails to show that the Dooney & Bourke Monogram Multicolor Mark has “tarnished” the reputation of Louis Vuitton or otherwise imbued the Louis Vuitton Multicolore Monogram mark with negative associations or derogatory connotations. See Hormel Foods Corp. v. Jim Henson Productions, Inc., 73 F.3d 497, 507 (2d Cir.1996) (“The sine qua non of tarnishment is a finding that plaintiff’s mark will suffer negative associations through defendant’s use.”); Maureen Morrin & Jacob Jacoby, Trademark Dilution: Empirical Measures for an Elusive Concept, 19 J. Publ. Pol. & Marketing 265, 267 (2000) (tarnishment “is distinguished from blurring, however, because there generally are some derogatory connotations, and therefore it is referred to as dilution by tarnishment.”). A consumer’s increased willingness to buy an It-Bag on its own says nothing at all about the status of Louis Vuitton’s mark.

We note once again that Louis Vuitton has the burden of proving by a preponderance of the evidence that the Jacoby study is reliable and will assist the jury. Thus Louis Vuitton must make the case for why the willingness of a consumer to buy an It-Bag either dilutes or tarnishes the Multicolore Monogram Mark. Louis Vuitton has not done so.

The same may be said of respondents’ statements that expressed the belief that the It-Bags were a knock-off of the Murakami bags. Though Dr. Jacoby does not make it clear in his report, perhaps his assumption is that a knockoff of a Murakami Bag links that bag with a product of “shoddy quality.” Yet Louis Vuitton has not argued that the It-Bags are of inferior quality, nor does the Jacoby Dilution Survey Report or the respondents’ verbatim answers say anything about inferior quality.

Finally, Dr. Jacoby does not explain why a respondent’s statement that she is less likely to want to buy the Murakami bags in light of the availability in the marketplace of the It-Bags indicates blurring or tarnishment. Instead, it may simply indicate that the respondent preferred the design of the Dooney & Bourke bags to the design of the competing Louis Vuitton bags, or that the respondent preferred Dooney & Bourke products in general to those manufactured by Louis Vuitton.

We recognize that dilution may be a “dauntingly elusive concept,” Ringling Bros.-Barnum & Bailey Combined Shows, Inc. v. Utah Div. of Travel Development, 170 F.3d 449, 451 (4th Cir.1999), but these survey questions do not test for blurring or tarnishment. Cf. Nabisco, Inc. v. PF Brands, Inc., 191 F.3d 208, 224 n. 6 (2d Cir.1999) (“The antidilution statutes do not prohibit all uses of a distinctive mark that the owner prefers not be made.”). Because these survey questions and answers do not indicate dilution under the substantive law, there is accordingly no “fit” between Dr. Jacoby’s findings with respect to them and the facts of the case.

ii. The Flaws in the Jacoby Dilution Survey’s Categorization of Results

Also troubling are the significant flaws in Dr. Jacoby’s categorization of respondents as evidencing dilution. First, as Judge Scheindlin has recognized, see Vuitton I, 340 F.Supp.2d at 450-51 n. 198, Dr. Jacoby miscoded several of the respondents’ verbatim responses. It is true, as we note in our discussion of the Reitter Survey, that miscoding is generally a question of weight and is subject to correction. But we also note that the miscoding of even a small number of verbatim responses could have a very significant impact on the survey’s overall findings. Second, Dr. Jacoby “counted a respondent as ‘diluted’ if she provided one answer evincing dilution, even if all of her other answers suggested a positive influence of the Dooney & Bourke bags on the perception of the Louis Vuitton bags.” Vuitton I, 340 F.Supp.2d at 451 n. 199 (emphases in original). Third, Dr. Jacoby claims overall that “a ‘net’ of twenty-three percent of the respondents provided an answer indicating one or more aspects of dilution, and gave as their reason for their answer one or more elements of a monogram with multiple bright colors in a repeating pattern.” Yet, as Judge Scheindlin recognized, four of the questions asked of respondents in stage two of the interview, going to the “distinctiveness,” “value,” “exclusivity,” and “desirability” of the bags, did not permit the respondents to explain their answers. Id. Louis Vuitton responds that the questions included a reference to “the multi-color pattern on white or black,” but this only points up the degree to which many of Dr. Jacoby’s questions were leading in the sense that they directed the respondents’ attention to the similarities between the parties’ marks.
See Shari Seidman Diamond, Reference Guide on Survey Research, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 2d at 268 (Federal Judicial Center 2000) (“Coding of answers to open-ended questions requires a detailed set of instructions so that decision standards are clear and responses can be scored consistently and accurately. Two trained coders should independently score the same responses to check for the level of consistency in classifying responses. When the criteria used to categorize verbatim responses are controversial or allegedly inappropriate, those criteria should be sufficiently clear to reveal the source of disagreements.”).

In sum, the Jacoby Dilution Survey was subject to methodological flaws that overstated the number of respondents who were “diluted.” These methodological flaws diminish the reliability and probative value of the study.

iii. The Objectivity of the Jacoby Dilution Survey

A final concern about the reliability of the Jacoby Dilution Survey is that Dr. Jacoby ran a pilot survey, the results of which showed “little to no net dilution,” Vuitton I, 340 F.Supp.2d at 450 n. 196, but made no reference to this survey in the Jacoby Dilution Survey Report. As Judge Scheindlin has noted, “[i]t is legitimate to run a pilot survey for purposes of improving a study.” Id. (emphasis in original). However, we agree with Judge Scheindlin that “in this case the circumstances hint at a darker purpose.” Id. Louis Vuitton has not produced evidence of interviewee fatigue — the purported reason for discarding the results of the pilot survey. Moreover, Dr. Jacoby did not substantially change the questions asked of the respondents in the pilot survey, other than to add the phrase “in this pattern,” which is arguably leading. See id. Dr. Jacoby’s failure to report the results of the pilot survey and his decision to restart the survey suggest that the whole process may not have been conducted in a manner to ensure objectivity.

Where the results of a pilot survey are inconsistent with those of the subsequent survey, the obvious questions that are raised would ordinarily be those of weight and not admissibility. The expert witness can be cross-examined about the disparity between the surveys, much the same as any witness who has made a prior inconsistent statement can be cross-examined and impeached. Yet there are only so many questions of weight that can be tolerated; as each flaw in a survey diminishes its reliability and probative value, and correspondingly increases the risk of jury confusion and prejudice, eventually the cumulative effect of the flaws mandates exclusion. This is such a case. Given (1) the lack of fit between the survey questions and the law of dilution, (2) the methodological flaws that overstated the results, and (3) the unexplained inconsistency between the results of the pilot survey and the subsequent survey, we conclude that Dr. Jacoby should not be permitted to testify about the results of his dilution survey, and that the survey itself should be excluded if proffered at trial.

C. Robert N. Reitter

In 2004, Dooney & Bourke retained Robert N. Reitter to conduct a trademark confusion survey (the “2004 Reitter Confusion Survey”), which Judge Scheindlin previously discussed in Vuitton I, 340 F.Supp.2d at 445-46, and a trademark recognition survey (the “2004 Reitter Recognition Survey”), which Judge Scheindlin has also previously discussed, id. at 439 n. 124. In 2006, Dooney & Bourke commissioned Reitter to conduct a second trademark confusion survey (the “2006 Reitter Confusion Survey”) as well as a trademark dilution survey (the “2006 Reitter Dilution Survey”). The 2004 Reitter Confusion Survey took the form of a mall intercept survey in which 349 respondents were shown one of four It-Bags or a control bag from a third party manufacturer and an additional 224 respondents were shown either a set of four It-Bags or a set of four bags from a third-party manufacturer. The respondents were then asked a series of questions, the most important of which consisted of the following: (1) “What company do you think makes this handbag[/these handbags]?”, (2) “Do you think the company that makes this handbag[/these handbags] is connected or affiliated with any other company?,” and (3) “Do you think the company that makes this handbag[/these handbags] has authorization, permission or approval from another company to do so?” Overall, after controlling for responses to the control bag or set of bags, the survey found that 1.2 percent of the 349 respondents who were shown an It-Bag believed that the bag was made by Louis Vuitton or by a company connected with or authorized by Louis Vuitton, and that -2.1 percent of the 224 respondents who were shown a set of It-Bags believed the same with respect to the set of bags.

In Vuitton I, Judge Scheindlin determined that the 2004 Reitter Confusion Survey was “essentially a reading test” because when respondents were first shown the It-Bags, they were able to read the lettering on the heart-shaped brass name sign hanging from one of the handles of the bag or bags and bearing the words “Dooney & Bourke” (the “Dooney & Bourke Name Sign”). Vuitton I, 340 F.Supp.2d at 445. Judge Scheindlin further determined that due to “several mis-coded responses” and “Reitter’s poor choice of one of the control bags,” the survey improperly “inflate[d] the number of responses illustrative of ‘noise,’ which in turn erroneously deflate[d] the confusion figures.” Id. at 446. Judge Scheindlin therefore accorded “little weight” to the survey’s results. Id. at 442.

In an effort to address Judge Scheindlin’s criticisms of the 2004 Reitter Confusion Survey, Dooney & Bourke commissioned Reitter to conduct the 2006 Reitter Confusion Survey. It is important to emphasize that Dooney & Bourke submits the 2006 Reitter Confusion Survey not as evidence on the likelihood of confusion, but solely as evidence that the flaws found by Judge Scheindlin in the 2004 Reitter Confusion Survey did not in fact affect the reliability of that survey.

The 2006 Reitter Confusion Survey consisted of three components. In the first, Reitter sought to replicate the methodology of one aspect of the 2004 Reitter Confusion Survey by showing 218 respondents one of the four It-Bags bearing the Dooney & Bourke Name Sign that had been shown to the respondents in the 2004 survey (the “2006 Reitter Name Sign Survey”). In the second component, Reitter showed 222 respondents one of the same four It-Bags, but these bags did not bear the Dooney & Bourke Name Sign (the “2006 Reitter No Name Sign Survey”). In the third component, Reitter showed 298 respondents either the single, brown-colored control bag that he had used in the 2004 survey or a blue-colored control bag (the “2006 Reitter Control Survey”). Based on the results of these three components of the 2006 Reitter Confusion Survey, Reitter concluded that “there is no reason to doubt” the 2004 Reitter Confusion Survey either for using It-Bags bearing the Dooney & Bourke Name Sign or for using an improper control.

Finally, the 2006 Reitter Dilution Survey took the form of a mall intercept survey in which 428 respondents were shown one of two large photographs depicting, in roughly life-size, five handbags. One of the photographs included a Louis Vuitton bag bearing the Louis Vuitton Monogram Multicolore Mark, a Dooney & Bourke bag bearing a single-colored pattern of the initials “D” and “B,” and three bags from third-party manufacturers. The other photograph included instead a Louis Vuitton bag bearing Louis Vuitton’s classic brown and chestnut monogram pattern (the “Louis Vuitton Classic Pattern”) among the same four bags. After the photograph was removed from view, the respondents were prompted by the interviewer to “tell me the brand or company names of the five handbags you saw.” Reitter found no statistically significant difference between the percentage of respondents who named Louis Vuitton after having been shown the photograph including the Louis Vuitton Multicolore Monogram Mark (74.2 percent) and the percentage of respondents who named Louis Vuitton after having been shown the photograph including the Louis Vuitton Classic Pattern (70.1 percent). Reitter concluded that because respondents perceived the Louis Vuitton Multicolore Monogram Mark as a source-indicator of Louis Vuitton as strongly as they perceived the Louis Vuitton Classic Pattern as a source-indicator for Louis Vuitton, the Louis Vuitton Multicolore Monogram Mark “has not suffered any distinctive dilution particular to its multi-color character.”

With respect to the two Reitter confusion surveys, we find that the 2006 Reitter Confusion Survey does not sufficiently resolve the methodological flaw of the 2004 Reitter Confusion Survey criticized by Judge Scheindlin as the “reading test.” It does, however, indicate with sufficient reliability that the control bag problem of the 2004 Reitter Confusion Survey did not affect the results of that survey. Nonetheless, the cumulative effect of the several methodological flaws of the 2004 Reitter Confusion Survey (the most important of which has not been answered by the 2006 survey) renders it so unreliable that (even ignoring the control bag problem) its probative value is substantially outweighed by the risks of jury confusion and prejudice. This means, of course, that both the 2004 and 2006 confusion surveys should be excluded under Rules 702 and 403 — along with any report or testimony about such surveys.

With respect to the 2006 Reitter Dilution Survey, we find that as designed, it could not provide any reliable indication of whether the Multicolore Monogram mark was diluted. Thus its probative value, if any, is substantially outweighed by its prejudicial effect and potential to mislead the jury, and as such it should be excluded under Rules 702 and 403, along with any report or testimony about the survey.

We review the facts of the Reitter surveys in more detail and then explain our reasoning.

1. Facts

Reitter is Senior Vice-President of Guideline, formerly named Guideline Associates, a market research firm. He has testified or been deposed in connection with numerous federal trademark infringement cases. The parties do not dispute his qualifications to conduct any of the surveys discussed here. That said, his resume is rather truncated. It shows that he received a Master’s degree in Industrial Administration from Yale, but there is no indication of how that education qualifies him to conduct surveys. Nor is there any indication that he has published in the field, and nothing is mentioned about membership in any pertinent associations. See Shari Seidman Diamond, Reference Guide on Survey Research, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE at 238 (Federal Judicial Center 2000) (“Publication in peer-reviewed journals, authored books, membership in professional organizations ... and membership on scientific advisory panels for government agencies or private foundations are indications of a professional’s area and level of expertise.”). Reitter’s resume does state that he has been accepted as an expert witness in federal and state courts, but upon objection, that would not be a sufficient indication of qualifications for a court exercising its gatekeeper function under Daubert. See, e.g., Thomas J. Kline, Inc. v. Lorillard Inc., 878 F.2d 791, 800 (4th Cir.1989) (noting that “one cannot become an expert simply by accumulating experience in testifying”). Moreover, experience in testifying does not provide much assurance if the witness’s testimony has been rejected as unreliable. See, e.g., Citizens Fin. Group, Inc. v. Citizens Nat’l Bank, 383 F.3d 110, 121 (3d Cir.2004) (affirming trial court’s exclusion of survey conducted by Reitter; trial court did not abuse discretion in finding Reitter’s survey “fundamentally flawed”).

Nonetheless, Reitter has been conducting market research and designing surveys for almost 40 years. Extensive experience can be a sufficient basis for expert testimony on matters such as consumer surveys. See Diamond, supra at 238 (“In some cases, professional experience in conducting and publishing survey research may provide the requisite background.”).

We find that Reitter meets the minimal standards for qualification as an expert under Rule 702. See Steven A. Saltzburg et al., Federal Rules of Evidence Manual at 702-12 (9th ed. 2006) (“Courts have not required a party to show that the witness is an outstanding expert, or to show that the witness is well-known or respected in the field; these are generally questions of weight.”). We note, however, as we have previously in this opinion, that the degree of qualification of an expert is relevant to the reliability inquiry. That is, the more qualified the expert, the more likely that expert is using reliable methods in a reliable manner — highly qualified and respected experts don’t get to be so by using unreliable methods or conducting research in an unreliable manner. See, e.g., Ambrosini v. Labarraque, 101 F.3d 129, 140 (D.C.Cir.1996) (the strength or weakness of a witness’s qualifications is “circumstantial evidence as to whether the expert employed a scientifically valid methodology or mode of reasoning”); United States v. Downing, 753 F.2d 1224, 1239 (3d Cir. 1985) (Becker, J.) (“The qualifications and professional stature of expert witnesses ... may also constitute circumstantial evidence of the reliability of the technique.”). We therefore find Reitter’s relatively thin qualifications as circumstantial evidence of the unreliability of his surveys.

a. The 2004 Reitter Confusion Survey

Louis Vuitton states that Judge Scheindlin both “unequivocally rejected” the 2004 Reitter Confusion Survey and “accorded little weight to [its] conclusions.” These two statements are not consistent. If Judge Scheindlin had “unequivocally rejected” the survey, it would not have been given “little weight”; it would have been given “no weight.” Even a quick reading of Judge Scheindlin’s opinion should have indicated to Louis Vuitton that Reitter’s survey was not “categorically rejected.” Instead, Judge Scheindlin pointed to specific (and serious) flaws in the survey and, based on those flaws, accorded the survey “little weight,” Vuitton I, 340 F.Supp.2d at 442, in the context of Louis Vuitton’s preliminary injunction motion. Thus, the question of admissibility of the survey at a trial on the merits, and before a jury, remains an open one.

Because Dooney & Bourke continues to proffer the 2004 Reitter Confusion Survey, and because the 2006 Reitter Confusion Survey was designed to evaluate the effect of two of the basic errors of the 2004 Reitter Confusion Survey, it is useful to review some of the specifics of that 2004 survey.

i. The 2004 Reitter Confusion Survey Universe and Sample

Reitter defined the relevant universe for his 2004 confusion survey as “females 13 or older, who are likely to purchase, within the next year or so, a purse or handbag costing more than $100.” To sample this universe, Reitter oversaw a mall intercept survey conducted in each of the four U.S. Census regions. For the single-handbag component of the survey, Reitter used three malls per region; for the multiple-handbag component of the survey, he used two malls per region. No single mall was used for both components of the survey. In the 2004 Reitter Confusion Survey Report, Reitter does not state that a mall needed to be an “upscale” mall in order to be selected. Respondents were interviewed from May 21, 2004 through June 1, 2004.

ii. The 2004 Reitter Confusion Survey Stimuli

After being screened and seated, the survey respondents were exposed to the following stimuli. Of the 349 respondents who were exposed to a single bag, 55 were exposed to an It-Bag with a white background, 53 were exposed to an It-Bag with a black background, 51 were exposed to an It-Bag with a “bubblegum” (or pink) background, and 46 were exposed to an It-Bag with a “periwinkle” (or light blue) background. Each of the bags included the Dooney & Bourke Name Sign. The remaining 144 respondents were exposed to a control bag in the form of a Dooney & Bourke bag bearing a single-colored pattern of the interlocking initials “D” and “B” imprinted on a light brown background. The control bag did not include the Dooney & Bourke Name Sign.

Of the 224 respondents who were exposed to a set of four bags, 103 were exposed to a set of four It-Bags with white, black, bubblegum, and periwinkle backgrounds, respectively. Certain of these bags were different, though in insignificant ways, from the It-Bags used in the single bag study. Each of them included the Dooney & Bourke Name Sign. The remaining 121 respondents were exposed to a control set of Dooney & Bourke bags each of which bore a single-colored pattern of the interlocking initials “D” and “B” imprinted on a brown, bubblegum, periwinkle, or gold background. None of these bags included the Dooney & Bourke name sign.

In the context of the 2006 Reitter Confusion Survey, the parties dispute the relevance of that study’s results relating to the bubblegum and periwinkle bags. Specifically, Louis Vuitton claims in its sur-reply in support of its motion to exclude Reitter’s testimony that “Louis Vuitton’s complaint did not include bubblegum, periwinkle, or any other colored bags and they have never been the subject of this litigation.” Dooney & Bourke responds that Reitter included the bubblegum and periwinkle bags in his 2004 confusion survey because it was not clear at the time whether Louis Vuitton would seek injunctive relief with respect to those bags, and that Reitter again included those bags in his 2006 confusion survey in order to replicate the conditions of the 2004 confusion survey.

We fail to see the basis for Louis Vuitton’s claim that the bubblegum and periwinkle bags “have never been the subject of this litigation.” At the May 28, 2004 hearing before Judge Andrew J. Peck, which Louis Vuitton itself brings to our attention to support its claim, Judge Peck stated to Louis Vuitton: “If the scope of the sought injunction in any way, shape, or form says, well, because of the black and white infringe[ment], everything should be enjoined or everything in periwinkle should also be enjoined, then we have to have discovery on it.” Counsel for Louis Vuitton responded: “I understand, your Honor. We are willing to narrow this issue, request for an injunction, to the white and black bags at the preliminary injunction stage. That doesn’t preclude us from seeking an injunction at the end of the case if we determine that the others are also infringing.” Louis Vuitton cites no other evidence from the record to support its claim that the subject of this litigation, and of a possible injunction, has always been limited to Dooney & Bourke It-Bags bearing the Dooney & Bourke Multicolor Monogram Mark on a black or white background. Indeed, in light of those parts of the record that have been brought to our attention, we find Louis Vuitton’s assertion puzzling.

iii. The 2004 Reitter Confusion Survey Questions

Upon being shown a handbag or set of handbags, each respondent was prompted “to look at [it] as if you saw it in a store, or being carried by a woman walking near you.” Respondents were further requested to “examine the outside of the bag, but please do not handle it,” and to inform the interviewer when she was finished looking at the bag. The interviewer also explained that respondents should state if they did not know the answer to any question asked of them.

For interviews involving a single non-control bag, the interviewer was instructed then to do the following: “When respondent indicates she is finished examining the bag, place the heart shaped zipper pull so that the name Dooney & Bourke faces against the bag, and the plain shiny side faces out, and place the bag about 5 feet away from the respondent.” For interviews involving a non-control set of bags, the instructions were essentially the same. Reitter’s methodology did not apparently record, nor does the record in this case indicate, whether (1) the interviewers first turned the Dooney & Bourke Name Sign or Signs over and then placed the bag or bags five feet away from the respondent or (2) the interviewers first moved the bags and then flipped the name signs. However, the interviewer instructions suggest that interviewers did the former.

Respondents were then asked a series of questions, the most relevant of which are given here and numbered as they were numbered by Reitter. For respondents who viewed a set of bags, the questions were modified to include “these handbags”:

1a. What company do you think makes this handbag?
1b. Even though you don’t know the name of the company, what if anything can you tell me about the company that makes this handbag, other than its name? OR
1c. Why do you think this bag is made by that company? (Probe) Anything else?
2a. Do you think the company that makes this handbag is connected or affiliated with any other company?
2b. What company? OR
2c. Even though you don’t know the name of the company, what if anything can you tell me about the company connected or affiliated with the maker of this handbag?
2d. [Asked regardless of whether respondent was asked 2b or 2c] Why do you think the maker of this bag is connected or affiliated with that company? (Probe) Anything else?
3a. Do you think the company that makes this handbag has authorization, permission or approval from another company to do so?
3b. What company? OR
3c. Even though you don’t know the name of the company, what if anything can you tell me about the company that gave the maker of this handbag authorization?
3d. [Asked regardless of whether respondent was asked 3b or 3c] Why do you think that that maker of this handbag has authorization from that company? (Probe) Anything else?

Interviewers were explicitly instructed to record verbatim all respondent answers to the above questions with the exception of Questions 2a and 3a.

iv. The 2004 Reitter Confusion Survey Confusion Analysis

In the 2004 Reitter Confusion Survey Report, Reitter classified as confused only those respondents who gave “pattern, style, or color” reasons for naming Louis Vuitton in response to Questions 1a, 2a, or 3a. According to the report, of the 205 respondents shown a single It-Bag, three named Louis Vuitton in response to Question 1a and gave “pattern, style, or color” reasons for their answer, eight named Louis Vuitton in response to Question 2a and gave such reasons for their answer, and seven named Louis Vuitton in response to Question 3a and gave such reasons for their answer. Taking into account respondents who named Louis Vuitton in response to more than one question, Reitter found that only fourteen (or 6.8 percent) of the 205 respondents shown a single It-Bag named Louis Vuitton in response to one or more questions posed to them and gave “pattern, style, or color” reasons for their answer.

Of the 103 respondents shown a set of It-Bags, one named Louis Vuitton in response to Question 1a and gave “pattern, style, or color” reasons for her answer, one named Louis Vuitton in response to Question 2a and gave such reasons for her answer, and two named Louis Vuitton in response to Question 3a and gave such reasons for their answer. Again taking into account respondents who named Louis Vuitton in response to more than one question, Reitter found that only three (or 2.9 percent) of the 103 respondents shown a set of It-Bags named Louis Vuitton in response to one or more questions posed to them and gave “pattern, style, or color” reasons for their answer.

As for the control bags, which did not carry the Dooney & Bourke Name Sign, of the 144 respondents exposed to the single brown Dooney & Bourke control bag, one named Louis Vuitton in response to Question 1a and gave “pattern, style, or color” reasons for her answer, three named Louis Vuitton in response to Question 2a and gave such reasons for their answer, and four named Louis Vuitton in response to Question 3a and gave such reasons for their answer. Of the 121 respondents who were exposed to a set of Dooney & Bourke control bags, none named Louis Vuitton in response to Question 1a and gave “pattern, style, or color” reasons for her answer, three named Louis Vuitton in response to Question 2a and gave such reasons for their answer, and five named Louis Vuitton in response to Question 3a and gave such reasons for their answer. Taking into account respondents who named Louis Vuitton in response to more than one question, Reitter found that eight (or 5.6 percent) of the 144 respondents exposed to the single control bag and six (or 5.0 percent) of the 121 respondents exposed to the set of control bags named Louis Vuitton and gave “pattern, style, or color” reasons for their answer.

Reitter reasoned that, “[s]ubtracting the control group from the test group results,” 1.2 percent of the respondents exposed to a single It-Bag incorrectly believed that the bag was made by Louis Vuitton or by a company connected with or authorized by Louis Vuitton, and that -2.1 percent of the respondents who were shown a set of It-Bags believed the same with respect to the set of bags.
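The subtraction Reitter describes can be checked with a short calculation. The following is a minimal sketch and not Reitter's own computation: the counts are those reported above, the function and variable names are illustrative, and it assumes that each percentage was rounded to one decimal place before the control figure was subtracted, which is the order of operations that reproduces the reported 1.2 and -2.1.

```python
# Counts as reported in the 2004 Reitter Confusion Survey Report (see text above).
# All names below are illustrative; none comes from the report itself.

def pct(confused: int, total: int) -> float:
    """Percentage of respondents classified as confused, to one decimal place."""
    return round(100 * confused / total, 1)

single_test = pct(14, 205)     # single It-Bag
single_control = pct(8, 144)   # single brown control bag
set_test = pct(3, 103)         # set of It-Bags
set_control = pct(6, 121)      # set of control bags

# Assumption: net confusion = rounded test percentage - rounded control percentage.
net_single = round(single_test - single_control, 1)
net_set = round(set_test - set_control, 1)

print(f"single bag:  {single_test} - {single_control} = {net_single}")
print(f"set of bags: {set_test} - {set_control} = {net_set}")
```

Subtracting the unrounded rates instead would shift each net figure by about a tenth of a percentage point, which does not change the sign of either result.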

Reitter also notes that the accurate identification of the Dooney & Bourke bags was “very high.” Sixty-five percent of the 308 respondents exposed to a single It-Bag or set of It-Bags accurately identified the bag or bags as Dooney & Bourke’s, while 50 percent identified the control bag or bags as Dooney & Bourke’s.

b. The 2006 Reitter Confusion Survey

i. The 2006 Reitter Confusion Survey Universe and Sample

As with the 2004 Reitter Confusion Survey, Reitter defined the relevant universe for his 2006 confusion survey as “females 13 or older, who are likely to purchase, within the next year or so, a purse or handbag costing more than $100.” The parties dispute whether the method Reitter used to sample this universe in his 2006 survey is proper, which requires that we review this sampling method in some detail. For each of the three components of the 2006 survey, Reitter conducted a mall intercept survey across each of the four U.S. Census regions, with two malls being used per region. Specifically, for the 2006 Reitter Name Sign Survey, Reitter used a total of eight malls from eight different markets. Of these eight markets, four of them had been used by Reitter in his 2004 survey, and in these four markets, three of the malls had been used in that survey. For the 2006 Reitter No Name Sign Survey, Reitter also used a total of eight malls from eight different markets. Of these markets, four of them had been used by Reitter in 2004, and in each of these markets, Reitter used the same mall that he had used in 2004. Notably, none of the malls used in the 2006 Reitter Name Sign Survey were also used in the 2006 Reitter No Name Sign Survey. In other words, Reitter did not take a population of respondents at a given mall location and expose some of them to It-Bags bearing the Dooney & Bourke Name Sign and others of them to It-Bags not bearing the Dooney & Bourke Name Sign — and then compare the two groups’ responses. Instead, all respondents at a given mall location were either exposed to bags bearing the name sign or to bags not bearing the name sign. Thus, with respect to any particular mall location he used, Reitter could not and cannot compare the answers of respondents exposed to It-Bags bearing the name sign to the answers of respondents exposed to It-Bags not bearing the name sign.
In his deposition testimony, Reitter defended this survey methodology on the grounds that, first, if the various components of the 2006 Reitter Confusion Survey were conducted at the same mall locations, then interviewers could have mistakenly shown respondents the wrong bag, and second, the three components of the overall survey could not be conducted at the same locations under the time constraints originally imposed on him.

Finally, for the 2006 Reitter Control Survey, of the eight markets Reitter used, each had been used in 2004, and in all but one, Reitter used the same mall. Here again, none of the malls used in the 2006 Reitter Control Survey were used in the other two components of the 2006 Reitter Confusion Survey. This means that, again, Reitter is unable to compare, within any given location’s population, the answers of some respondents who were exposed to an It-Bag bearing or not bearing the Dooney & Bourke Name Sign to the answers of other respondents who were exposed to one of the control bags.

As with the 2004 Reitter Confusion Survey, Reitter does not state in his 2006 report that a mall needed to be an “upscale” mall in order to be selected. In his deposition testimony, Reitter admitted that the anchor stores connected with several of the malls he used indicated that those malls were not in fact “upscale.”

Interviews were conducted from November 14 to November 28, 2006.

ii. The 2006 Reitter Confusion Survey Stimuli

In the 2006 Reitter Name Sign Survey, of the 218 respondents surveyed, 55 were exposed to an It-Bag with a cream-colored background, 55 were exposed to an It-Bag with a black background, 56 were exposed to an It-Bag with a bubblegum background, and 52 were exposed to an It-Bag with a periwinkle background. The It-Bags used in the 2006 Reitter Name Sign Survey were not the same It-Bags used in the 2004 Reitter Confusion Survey; nevertheless, the bags differed only in insignificant ways. Each of the It-Bags used in the 2006 Reitter Name Sign Survey carried the Dooney & Bourke Name Sign.

In the 2006 Reitter No Name Sign Survey, of the 222 respondents surveyed, 54, 58, 56, and 54 respondents were exposed to the same It-Bags as were used in the 2006 Name Sign Survey with a cream-colored, black, bubblegum, and periwinkle background, respectively. However, the Dooney & Bourke Name Sign was removed from these bags before they were shown to respondents.

Finally, in the 2006 Control Survey, of the 298 respondents surveyed, 147 were exposed to a Dooney & Bourke bag bearing a single-colored pattern of the interlocking initials “D” and “B” imprinted on a light-brown background (the “Brown Control Bag”). This bag appears to be the same bag as was used as a control bag in the single-bag component of the 2004 Reitter Confusion Survey; in any event, the bags do not differ in significant ways. The remaining 151 respondents were exposed to a Dooney & Bourke canvas bag bearing a single-colored white pattern of the interlocking initials “D” and “B” on a blue background (the “Blue Control Bag”). Louis Vuitton states that it is currently producing at least two bags with a blue background, which it identifies as “Monogram Denim and Monogram Denim Cruise”; but no photographs of those bags have been submitted in the record provided to us. Dooney & Bourke counters that Louis Vuitton is currently producing bags in a wide variety of colors, photographs of which it did submit.

iii. The 2006 Reitter Confusion Survey Questions

The interview process of the 2006 Reitter Name Sign Survey exactly replicated the interview process of the single-bag component of the 2004 Reitter Confusion Survey. The interview processes of the 2006 Reitter No Name Sign Survey and the 2006 Reitter Control Survey did so as well, except that the interviewer did not flip the facing of the Dooney & Bourke Name Sign because the bags did not carry one.

iv. The 2006 Reitter Confusion Survey Confusion Analysis

In the 2006 Reitter Confusion Survey, Reitter classifies as confused only those respondents who gave an “issue-specific” reason for naming Louis Vuitton in response to Questions 1a, 2a, or 3a. The 2006 Reitter Confusion Report does not define “issue-specific.” It is not clear, therefore, whether “issue-specific” refers to “pattern, style, or color” reasons, as was the standard in the 2004 Reitter Confusion Survey Report, or to reasons relating only to the Dooney & Bourke Multicolor Monogram Mark — or indeed to some other reasons.

In any event, according to the 2006 Reitter Confusion Survey Report, of the 218 respondents participating in the 2006 Reitter Name Sign Survey, two, seven, and four respondents named Louis Vuitton in response to Questions 1a, 2a, and 3a, respectively, for “issue-specific” reasons. Taking into account respondents who named Louis Vuitton in response to more than one of these questions, Reitter concludes that only twelve (or 5.5 percent) of the 218 respondents participating in the survey named Louis Vuitton for an “issue-specific” reason.

Of the 222 respondents participating in the 2006 Reitter No Name Sign Survey, two, seven, and one respondent(s) named Louis Vuitton in response to Questions 1a, 2a, and 3a, respectively, for “issue-specific” reasons. There were apparently no such respondents who named Louis Vuitton in response to more than one of these questions, as Reitter concluded that only ten (or 4.5 percent) of the 222 respondents participating in the survey named Louis Vuitton for an “issue-specific” reason.

Because there is no statistically significant difference between the percentage of respondents classified as confused in the 2006 Reitter Name Sign Survey and the percentage of respondents classified as confused in the 2006 Reitter No Name Sign Survey, Reitter concludes that the presence or absence of the Dooney & Bourke Name Sign “did not significantly affect the level of possible confusion” between the two surveys. Reitter further notes that there is no statistically significant difference among the levels of confusion found in these two surveys and the levels of confusion found in the 2004 Reitter Confusion Survey. Therefore, Reitter explains, “the concern that a test conducted according to the protocol used in the [2004 Reitter Confusion Survey] would be a ‘reading test,’ and hence distorted, was not substantiated.”
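The record as summarized here does not say which statistical test Reitter applied. As one illustration of how such a comparison is commonly made, the sketch below runs a pooled two-proportion z-test on the reported counts (12 of 218 in the Name Sign Survey, 10 of 222 in the No Name Sign Survey). The choice of test, the conventional 0.05 threshold, and the function name are all assumptions made for this example.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z statistic and its two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Name Sign Survey (12 of 218) versus No Name Sign Survey (10 of 222).
z, p_value = two_proportion_z(12, 218, 10, 222)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")

# Assumed threshold: a difference is called significant when p < 0.05.
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

Under these assumptions the z statistic falls well inside the usual cutoff of 1.96, which is consistent with the statement that the two percentages do not differ significantly.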

With respect to the 2006 Reitter Control Survey, of the 151 respondents shown the brown Dooney & Bourke bag, none named Louis Vuitton in response to Question 1a, two named Louis Vuitton in response to Question 2a for “issue-specific” reasons, and none named Louis Vuitton in response to Question 3a for “issue-specific” reasons. By comparison, of the 147 respondents shown the blue Dooney & Bourke bag, none named Louis Vuitton in response to Question 1a, four named Louis Vuitton in response to Question 2a for “issue-specific” reasons, and three named Louis Vuitton in response to Question 3a for “issue-specific” reasons. Overall, taking into account respondents who named Louis Vuitton in response to more than one of these questions, Reitter found that two (or 1.3 percent) of the 151 respondents shown the brown Dooney & Bourke bag named Louis Vuitton for “issue-specific” reasons, and six (or 4.1 percent) of the 147 respondents shown the blue Dooney & Bourke bag named Louis Vuitton for “issue-specific” reasons.

From these data, Reitter concludes that the control bag used in the single-bag component of the 2004 Reitter Confusion Survey “did not increase the apparent noise level,” and that “the concern that a test conducted according to the protocol used in the [2004 Reitter Confusion Survey] would inflate the estimated noise level was not substantiated.” Additionally, after subtracting the level of confusion found among respondents exposed to the blue Dooney & Bourke bag (from the 2006 Reitter Control Survey) from the level of confusion found among respondents exposed to an It-Bag not bearing the Dooney & Bourke Name Sign (from the 2006 Reitter No Name Sign Survey), Reitter concludes that “Likelihood of Confusion Net of Noise” in the 2006 Reitter Confusion Survey is -0.4 percent, which he describes as “far below the level of any material significance.”

c. The 2006 Reitter Dilution Survey

i. The 2006 Reitter Dilution Survey Universe and Sample

Reitter defined the relevant universe for the 2006 Reitter Dilution Survey as “females 25 or older who have purchased a designer handbag costing more than $350 in the last 3 years or who are likely to purchase a designer handbag costing more than $350 in the next 12 months.” To sample this universe, Reitter conducted a mall intercept survey at sixteen malls located in sixteen different markets, with four used from each of the four U.S. Census regions. The 2006 Reitter Dilution Survey report states that “[b]ecause the Louis Vuitton monogram patterns being studied appear on expensive handbags, the sampling frame was focused in Metropolitan Areas that have upscale shopping malls with interviewing facilities.” There are numerous discrepancies between the names of the malls as given in the 2006 Reitter Confusion Survey Report and the names as given in the 2006 Reitter Dilution Survey Report, but it appears that all but one of the sixteen malls Reitter used for his dilution study were also used for his confusion study.

Interviews were conducted from November 29 through December 11, 2006.

ii. The 2006 Reitter Dilution Survey Stimuli

As previously noted, 428 respondents were exposed to one of two large photographs depicting five handbags in roughly life-size. Specifically, 217 respondents were exposed to a photograph showing a Louis Vuitton bag bearing the Louis Vuitton Multicolore Monogram Mark on a white background, a Dooney & Bourke bag bearing a single-colored pattern of the interlocking initials “D” and “B,” and three bags from third-party manufacturers. These third-party manufacturers were Jimmy Choo, Kate Spade, and Chanel. An additional 211 respondents were exposed to a photograph showing the classic Louis Vuitton single-colored monogram pattern in gold and chestnut (the “Louis Vuitton Classic Pattern”) among the same four bags. The 2006 Reitter Dilution Survey Report states, and Louis Vuitton does not dispute, that “the other 4 bags were selected to create a lineup that was balanced in terms of light and dark colors, and in terms of solid and multi-color bags.” Reitter thereby sought to insure that “neither the lighter Multicolore Louis Vuitton bag nor the darker Classic Louis Vuitton bag stood out more than the other in their respective photographs.”

iii. The 2006 Reitter Dilution Survey Questions

After being seated, the respondent was shown one of the photographs and invited “to look at these 5 handbags as if you were seeing them in stores, or being carried by women walking near you.” The respondent was then told to inform the interviewer when she was finished looking at the handbags, at which time the photograph was removed from view.

Reitter asked both groups of respondents the same, relatively simple set of two questions. After being instructed not to guess, the respondent was asked two questions:

1a. Now, I just showed you a photo of five handbags. I’d like you to please tell me the brand or company names of the five handbags you saw. If you don’t know or can’t remember the brand or company names of all five handbags, please just name the ones that you do remember seeing.
1b. Any others?

Respondents were not asked follow-up questions in the nature of “What makes you say so?”

iv. The 2006 Reitter Dilution Survey Dilution Analysis

Because the parties dispute whether Reitter’s dilution analysis is proper, we review it here in some detail. In the 2006 Reitter Dilution Survey Report, Reitter asserts that dilution surveys “most commonly” take the form of surveys in which one group of respondents is exposed to the defendant’s mark while another group is not. Both groups are then exposed to the plaintiff’s mark and asked questions to measure their ability to identify the plaintiff as the sole source of goods associated with that mark. If the group exposed to the defendant’s mark is significantly less able to identify the plaintiff as the sole source of goods associated with its mark, then this may constitute evidence that the defendant’s mark blurs the distinctiveness of the plaintiff’s mark. Louis Vuitton’s expert Dr. Itamar Simonson argues that Reitter’s description of a standard dilution survey is incorrect. In his view, the standard survey exposes the respondent to the offending product and asks the respondent if any company comes to mind. A second group is exposed to a non-offending control product and asked the same question. The difference between the two sets of answers shows the likelihood that the defendant’s product is diluting the plaintiff’s mark.

We do not need to resolve the argument as to how a standard dilution survey is to be conducted, because both sides agree that a standard dilution survey, whatever it is, cannot be conducted in the circumstances of this case. Reitter explains that the standard methodology was “infeasible” in the present case, primarily because by 2006 multicolored bags produced by Dooney & Bourke as well as many other manufacturers had been on the market for several years. Thus, it would not have been possible to find respondents “from the relevant universe who have not already been exposed to multi-color patterns on bags that do not originate from Louis Vuitton.” Furthermore, Reitter asserts, it would not have been possible to measure across time the degree to which any particular mark blurred the Louis Vuitton Multicolore Monogram Mark because of the inability to control for changes in advertising, press coverage, and “on the street” sightings of the mark due to changing fashions, each of which factors might cause a blurring of the distinctiveness of the mark.

In response to the problem of changed market conditions, Reitter adopted what he called “an alternative survey design.” Dooney & Bourke argues that Reitter’s alternate survey design provides a way “to at least probe” whether Louis Vuitton’s Multicolore Monogram trademark has been diluted. In essence, Reitter sought to compare, in November and December, 2006, the trademark recognition level of the Louis Vuitton Multicolore Monogram Mark to the trademark recognition level of the traditional Louis Vuitton Toile Monogram Mark in brown and chestnut. The recognition level of the latter mark was thought to serve as a “benchmark” or control against which to compare the recognition level of the Louis Vuitton Multicolore Monogram Mark. Reitter reasoned:

The idea behind this design is that, if the claimed Multicolore trademark has been diluted in some way particular to its multi-color character, members of the relevant universe should perceive it as a source-indicator for Louis Vuitton at a lower rate than they perceive the non-diluted, non multi-color Classic Louis Vuitton trademark pattern as a source-indicator for Louis Vuitton.

In this connection, Reitter points to the results of the 2004 Reitter Recognition Survey, which showed that “a monochromatic Louis Vuitton monogram pattern and a Multicolore monogram pattern had equivalent abilities to indicate to consumers that Louis Vuitton was the source of the bags.” Ultimately, Reitter reasons, “if the Classic and Multicolore looks are still equally recognized in 2006, it would indicate that the Multicolore look could not have been meaningfully diluted.”

Working from this “alternative” methodology, the 2006 Reitter Dilution Survey found that 148 (or 70.1 percent) of the 211 respondents shown the photograph including a bag bearing the Louis Vuitton Classic Pattern named Louis Vuitton in response to either Question 1a (145 respondents) or 1b (3 respondents). By comparison, 161 (or 74.2 percent) of the 217 respondents shown the photograph including a bag bearing the Louis Vuitton Multicolore Monogram Mark named Louis Vuitton in response to either Question 1a (157) or 1b (4). From these results, Reitter concludes: “[T]he Multicolore monogram pattern is as effective a trademark for Louis Vuitton as the very famous Classic brown monogram pattern. Therefore, it would be fair to conclude that the Multicolore monogram pattern has not lost any ability to function as a trademark for Louis Vuitton, and, therefore, could not have been meaningfully diluted.”
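The two recognition rates can be reproduced from the reported counts, and the size of the gap between them can be put in context with an approximate interval. The sketch below is illustrative only: nothing in the report as described here indicates that Reitter computed a confidence interval, and the unpooled (Wald) formula, the 1.96 multiplier for a 95 percent interval, and all names are assumptions of this example.

```python
import math

# Counts reported in the 2006 Reitter Dilution Survey (see text above).
classic_named, classic_shown = 148, 211          # Classic Pattern photograph
multicolore_named, multicolore_shown = 161, 217  # Multicolore photograph

classic_rate = classic_named / classic_shown
multicolore_rate = multicolore_named / multicolore_shown
diff = multicolore_rate - classic_rate

# Assumption: unpooled (Wald) standard error for a difference of two proportions.
se = math.sqrt(
    classic_rate * (1 - classic_rate) / classic_shown
    + multicolore_rate * (1 - multicolore_rate) / multicolore_shown
)
low, high = diff - 1.96 * se, diff + 1.96 * se

print(f"Classic:     {100 * classic_rate:.1f}%")
print(f"Multicolore: {100 * multicolore_rate:.1f}%")
print(f"difference:  {100 * diff:.1f} points, "
      f"approx. 95% interval {100 * low:.1f} to {100 * high:.1f}")
```

Under these assumptions the interval spans zero, so the counts alone do not show either pattern to be recognized at a reliably higher rate than the other.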

2. Discussion

a. The Reitter Confusion Surveys

Louis Vuitton challenges the admissibility of both the 2004 Reitter Confusion Survey and the 2006 Reitter Confusion Survey. As to the 2004 survey, it reiterates the arguments it previously made before Judge Scheindlin. The objections that were credited by Judge Scheindlin were, as stated above: (1) it was a reading test; (2) the choice of the control bag was improper; and (3) there were coding errors.

As to the 2006 survey, Louis Vuitton raises a number of arguments, including:

1. The three components of the 2006 Reitter Confusion Survey were not uniformly conducted at “upscale” malls.
2. The three components of the 2006 Reitter Confusion Survey were conducted at different sets of locations.
3. The control bag used in the 2006 Reitter Control Survey improperly resembled a Louis Vuitton bag.
4. The sample sizes used in each cell of the three components of the 2006 Reitter Confusion Survey were too small to produce meaningful results.
5. The 2006 Reitter Confusion Survey improperly analyzed confusion.
6. The 2006 Reitter Confusion Survey cannot vindicate the results of the 2004 Reitter Confusion Survey, given changes in the marketplace.
7. The survey was result-oriented because counsel for Dooney & Bourke “gave Mr. Reitter a mandate to prove [Judge Scheindlin] wrong.”

We focus first on the problems of methodology and application that are evident in the 2004 survey. We address these problems quickly because Judge Scheindlin has already discussed the most important flaws in the methodology. Dooney & Bourke does not argue that Judge Scheindlin was incorrect in finding flaws in Reitter’s methodology. Rather it argues that the 2006 study shows that those flaws had no effect on the results.

We then determine whether the 2006 survey reliably shows that the flaws in the 2004 survey had no effect on the results.

b. The Flaws in the 2004 Survey

i. Reading Test

As noted above, Judge Scheindlin found that the 2004 Reitter Confusion Survey was flawed because it was nothing but a reading test, because the Dooney & Bourke nametag was readily evident to all the respondents. We note that the respondents were not reading the Dooney & Bourke Multicolor Monogram Mark itself. Rather, they were reading a sign that stated the provenance of the bag. The survey thus could not have been more flawed if the interviewer had said, “Now, look at the Dooney & Bourke bag and tell me who you think makes that bag.” The courts have, rightly we believe, held that surveys in which the respondents are able to read the name of the manufacturer on the product are not probative of consumer confusion. See Franklin Res., Inc. v. Franklin Credit Mgmt. Corp., 988 F.Supp. 322, 335 (S.D.N.Y.1997) (“Surveys which do nothing more than demonstrate the respondents’ ability to read are not probative on the issue of likelihood of consumer confusion.”) That proposition is illustrated by American Greetings Corp. v. Dan-Dee Imports, Inc., 619 F.Supp. 1204, 1216 (D.N.J.1985), modified on other grounds, 807 F.2d 1136 (3d Cir.1986), where two surveys allowed the respondents to view toys in packaging which clearly contained the name of the manufacturers. The court found that the surveys were not probative of an absence of confusion:

The results of these two surveys do not establish that there was no confusion between the Care Bears and New Goodtime Gang, either by the children or their mothers. Rather, in the court’s view, this survey tested the participants’ ability to read and little else. To put the products side by side and ask for a selection to be made by product name, when the product name appears on the package does not adequately test the existence of confusion.

Id. at 1216.

We conclude that the presence of the Dooney & Bourke nametags is a flaw so fundamental that it is enough on its own to render the 2004 Reitter Confusion Survey inadmissible — unless the 2006 Reitter Confusion Survey Report can somehow come to its rescue. We recognize that Judge Scheindlin did not find the 2004 Confusion Survey “inadmissible” because it was a reading test. But she did not have to do so under the circumstances, as the survey was offered in a preliminary injunction proceeding with the court as factfinder. It is well-established that the Daubert gatekeeping standard is applied more flexibly when the judge is the factfinder, and accordingly more rigorously when the expert testimony is to be presented to a jury. See, e.g., Deal v. Hamilton County Board of Educ., 392 F.3d 840, 852 (6th Cir.2004) (“The ‘gatekeeper’ doctrine was designed to protect juries and is largely irrelevant in the context of a bench trial.”); United States v. Brown, 415 F.3d 1257, 1269 (11th Cir.2005) (“There is less need for the gatekeeper to keep the gate when the gatekeeper is keeping the gate only for himself.”)

Therefore it is incumbent upon Dooney & Bourke to prove by a preponderance of the evidence that this serious flaw did not affect the outcome of the 2004 Reitter Confusion Survey. This it seeks to do through the 2006 Name Sign and 2006 No Name Sign Surveys, which are discussed below.

ii. Ineffective Control Bag

With respect to the 2004 Confusion Survey, Judge Scheindlin also noted that “Reitter’s failure to take into account the similarity between his brown, monochrome Dooney & Bourke control bag and Louis Vuitton’s traditional gold and chestnut bag decreases the probative value of his [2004] study.” Louis Vuitton I, 340 F.Supp.2d at 446. This error in judgment on Reitter’s part would not on its own be enough to exclude the 2004 Confusion Survey. As discussed above with respect to Dr. Ericksen’s survey, the control bag chosen by Reitter was not a very good “noise” reducer, but it seems to have been better than no control at all. See Shari Seidman Diamond, Reference Guide on Survey Research in MANUAL ON SCIENTIFIC EVIDENCE 2d at 258 (2000) (quoted above).

However, while the poor choice of control bag is not dispositive of inadmissibility, we note again that both Rule 702 and 403 require the court to look at the cumulative effect of all of the flaws in a survey. See Mastercard Int’l Inc. v. First Nat’l Bank of Omaha, Inc., 2004 WL 326708 at 10-11, 2004 U.S. Dist. Lexis 2485 at * 30 (excluding a survey because of the cumulative impact of flaws in survey methodology). Thus the poor choice of control bag is an important factor cutting toward exclusion of the 2004 survey — unless the 2006 survey proves that the error in choice of control bag did not affect the results.

iii. Coding Errors

The coding errors mentioned by Judge Scheindlin in Vuitton I may have resulted in an undercounting of confusion, but as we discussed above in connection with the Ericksen and Jacoby surveys, coding errors can be corrected by a review of the underlying data, and can be inquired into on cross-examination. Nonetheless, the more serious the coding errors, the less probative and reliable is the survey because its results are overstated. While the coding errors are not dispositive of admissibility, we find they add to the cumulative effect of the methodological errors in the 2004 Reitter Confusion Survey.

iv. Choice of Malls and Universe of Respondents

As discussed above, many of the malls used by Reitter in both the 2004 and 2006 surveys were not “upscale.” The parties disagree over the significance of this fact. Reitter argues that “upscale” people can be found in midscale and downscale malls — it just takes a little longer to find them. Louis Vuitton argues that the kind of upscale people who condescend to shop at midscale and downscale malls are not typical of the consumers who are relevant in this litigation, and so the “universe” for the survey was improperly skewed.

“A ‘universe’ is that segment of the population whose perceptions and state of mind are relevant to the issues in the case.” Citizens Financial Group, Inc. v. Citizens Nat. Bank of Evans City, 383 F.3d 110, 118-19 (3d Cir.2004) (quoting J. Thomas McCarthy, McCarthy on Trademarks and Unfair Competition, § 32:159 (4th ed.2003)). “A flawed universe minimizes the probative value of a survey” and may lead to “skewed results.” Conopco, Inc. v. Cosmair, Inc., 49 F.Supp.2d 242, 253 (S.D.N.Y.1999).

We find Reitter’s selection of malls to be troubling, especially because his explanation — it took us longer but we got our universe — is dependent on the questions that he used to weed out potential respondents who are not representative of the consumers whose confusion would matter in this case. The screening criteria used by Dr. Reitter were mild indeed. Reitter screened for respondents who were likely to purchase “within the next year or so” a purse or handbag costing more than $100. Louis Vuitton remarks that a $100 handbag is “hardly an indicia of prestige,” and notes that most of the bags Dooney & Bourke sells exceed this price. We agree that Reitter’s screening criteria did little to assure that he got the right consumers for the survey. Therefore there is serious doubt that Reitter’s rationale for using the malls he did was justified. Thus we have another flaw in the survey’s methodology that decreases its reliability and probative value and correspondingly increases its prejudicial effect — another factor to add to the accumulating case against admissibility of the 2004 Reitter Confusion Survey.

v. The “Permission” Question

In Louis Vuitton I, 340 F.Supp.2d at 444-45, Judge Scheindlin criticized the Jacoby survey, in which respondents were asked to indicate whether “to come out with this bag,” the company “needed to get permission or a license from the company whose bags were shown in the ad [Louis Vuitton].” Judge Scheindlin noted that “[s]imilar questions have been included in previous Jacoby studies and rejected by courts because they improperly ask respondents for a legal conclusion.”

Reitter’s 2004 Confusion Survey suffered from essentially the same flaw. Question 3c asked “what if anything can you tell me about the company that gave the maker of this handbag authorization?” and 3d asked “Why do you think that that maker of this handbag has authorization from that company?” It is true that the Reitter question does not use the word “license” but that is at best a distinction of degree and not kind.

The flaw in asking questions about permission or licensing does not infect the entirety of the survey and is unlikely to skew the results so dramatically as to be dispositive of admissibility on its own. But once again it is a flaw that is accumulated with the other flaws in methodology that must be considered under Rules 702 and 403.

vi. Preliminary Summary on the Admissibility of the 2004 Reitter Confusion Survey

Without some major assistance from the 2006 Reitter Confusion Survey, we have no doubt that the 2004 Reitter Confusion Survey should be excluded. Dooney & Bourke has not come close to proving that Reitter employed reliable methods in a reliable manner. Indeed at least one of the flaws in methodology — the reading test — is enough on its own under the applicable case law to render the survey inadmissible. And we have noted several other problems that, when accumulated, add even more to the case for exclusion. We now proceed to consider whether the 2006 Reitter Confusion Survey can save the 2004 Reitter Confusion Survey from exclusion.

c. Flaws in Methodology of the 2006 Confusion Survey

i. Mall Selection

We begin by noting that the 2006 Reitter Confusion Survey suffers from some of the same methodological flaws that beset the 2004 survey. Like the 2004 survey, the 2006 survey targeted a number of malls that were admittedly not upscale and used the same low screening standard by admitting respondents who were likely to purchase in the near future a purse or handbag costing more than $100.

ii. Sampling Method

Another indicator of unreliability of the 2006 Reitter Confusion Survey is that the sampling method Reitter employed was far from ideal. The sampling method prevented within-location comparisons among respondents who were exposed to the control bags used in the 2006 Reitter Control Survey, respondents who were exposed to the Dooney & Bourke Name Sign, and respondents who were not exposed to the name sign. Thus, to make inferences across the various components of the overall 2006 Reitter Confusion Survey, we must assume that the malls in which the various surveys were conducted were roughly similar, or at least that the respondents in those malls were roughly similar. That assumption is not obvious. See Keystone Camera Products Corp. v. Ansco Photo-Optical Products Corp., 667 F.Supp. 1221, 1233 (N.D.Ill.1987) (including use of different stimuli in different test locations is a flaw in the methodology of a survey). Reitter’s sampling method, insofar as it precludes within-location comparison, diminishes the reliability and probative value of the 2006 Confusion Survey, though it is not the kind of fundamental error that would mandate exclusion on its own.

iii. Sample Sizes

The individual cells of the 2006 Reitter Name Sign Survey and the 2006 Reitter No Name Sign Survey contained an average of 55 respondents. In Mastercard Intern. Inc. v. First Nat. Bank of Omaha, Inc., No. 02 Civ. 3691, 2004 WL 326708, at *10 (S.D.N.Y. Feb 23, 2004), Judge Cote excluded the defendant’s survey pursuant to Rules 403 and 702 of the Federal Rules of Evidence in part due to the survey’s low number of respondents. Specifically, “of the 52 respondents who completed the internet survey, 27 were shown the FNBO materials (the ‘test cell’), and 25 were shown the Fleet materials (the ‘control group’).” Id. at *3. In comparison, in Pilot Corp. of America v. Fisher-Price, Inc., 344 F.Supp.2d 349 (D.Conn.2004), involving a survey conducted by Reitter, the court recognized that “the Reitter Survey suffered from a small sample size” but credited it nevertheless. Id. at 359. There, the survey consisted of 160 respondents divided evenly into a test group of 80 respondents and a control group of 80 respondents. Id. at 353. In denying the plaintiff’s preliminary injunction motion, the court explained that the “small sample size only makes the results less precise, not wholly inaccurate, and ... because the results in this case showed 0% confusion, even if that result is somewhat imprecise, it is still strong evidence of lack of confusion.” Id. at 353 n. 9. Pilot Corp., however, involved a preliminary injunction motion; its reasoning is not as persuasive when applied to a survey that is intended to be used in a jury trial, where, as discussed previously, Daubert is more stringently applied.
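The observation in Pilot Corp. that a small sample makes results "less precise" can be illustrated with a confidence interval. The following is only a sketch under stated assumptions: it assumes simple random sampling and uses a 95% Wilson score interval, neither of which is stated in the opinion or attributed to any of the surveys, and the function name `wilson_interval` is hypothetical. It shows how the interval around an observed 0% result narrows as a cell grows from 25 to 55 to 80 respondents, the cell sizes mentioned above.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%).

    Assumption for illustration only: respondents are a simple random sample.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against tiny floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Cell sizes mentioned in the text: 25 (Mastercard control group),
    # 55 (average Reitter cell), 80 (Pilot Corp. test group).
    for n in (25, 55, 80):
        low, high = wilson_interval(0, n)
        print(f"n={n}: observed 0%, interval {low:.1%} to {high:.1%}")
```

Under these assumptions the upper bound shrinks as the cell size grows, which is the sense in which a larger cell is "more precise"; it says nothing about the other methodological flaws discussed in this section.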

The low number of respondents is one more factor that diminishes the reliability and probative value of the 2006 Reitter Confusion Survey. But the flaw is not dispositive on its own, in part because Reitter’s results are so strong that they counter to some extent a concern about the size of the sample. See Lon Tai Shing Co., Ltd. v. Koch + Lowy, 1991 WL 170734, at *9 (S.D.N.Y. June 20, 1991) (small sample size countered by strength of the results). Furthermore, the surveys’ results were roughly equivalent across the various test cells, which buttresses the reliability of the surveys’ overall findings. Cf. Schering Corp. v. Pfizer, Inc., 189 F.3d 218, 222-23 (2d Cir.1999) (five surveys with sample sizes from 74 to 200, all supporting finding of confusion); McNeilab, Inc. v. American Home Prods. Corp., 675 F.Supp. 819, 822-824 (S.D.N.Y.1987) (survey of 149 people, supported by two additional studies finding confusion).

iv. Eveready Presentation

Louis Vuitton asserts that Reitter should have employed a methodology involving a “ ‘sequential presentation’ or ‘line-up’ ” of stimuli, because, in the marketplace, “consumers have a reasonable likelihood of encountering the marks at issue one after the other in a store or mall.”

We believe this criticism has merit, especially given the Second Circuit’s instruction in this case — that likelihood of confusion may be determined in part by how the products appear to a prudent consumer “when viewed sequentially in the context of the marketplace.” Louis Vuitton Malletier v. Dooney & Bourke, Inc., 454 F.3d 108, 117 (2d Cir.2006). Thus a survey that purports to approach market conditions pertinent to the substantive standard of likelihood of confusion should try to take account of the possibility of sequential viewing. The use of the “Eveready” method of presentation — at least its exclusive use — diminishes the reliability and probative value of the survey and correspondingly raises the risk of jury confusion and prejudice.

v. Other Alleged Methodological Flaws in the 2006 Confusion Survey

(a) Poor Choice of Control Bag

Louis Vuitton claims that Reitter’s use of the Blue Control Bag in the 2006 Reitter Confusion Survey rendered the results just as questionable as those in 2004. Louis Vuitton argues that because the Blue Control Bag resembled at least two Louis Vuitton bags being sold at the time that Reitter conducted the 2006 Reitter Confusion Survey, respondents would be more likely to name Louis Vuitton upon seeing the Blue Control Bag, which would improperly inflate the level of “noise” detected by the 2006 Control Survey. We disagree; the Blue Control Bag marks a substantial improvement from the brown bag used in the 2004 Survey. Unlike the Brown Control Bag, the Blue Control Bag does not resemble a well-known Louis Vuitton bag, at least not one so well known as the iconic Louis Vuitton bag bearing the Louis Vuitton Classic Pattern. The Blue Control Bag also differed in important respects from the Louis Vuitton “Monogram Denim and Monogram Denim Cruise” bags. The Blue Control Bag was made of canvas, while the Louis Vuitton bags were made of denim. The Blue Control Bag was also of a different shade of blue. We reject Louis Vuitton’s complaints about the Blue Control Bag.

(b) Reading Test

Louis Vuitton argues that the 2006 Reitter No Name Sign Survey was as much a reading test as the 2004 Reitter Confusion Survey or the 2006 Reitter Name Sign Survey. But respondents in the 2006 Reitter No Name Sign Survey were exposed to no more than the interlocking initials “D” and “B.” As noted above, this does not constitute the kind of reading test traditionally criticized by courts. See Conopco, Inc. v. Cosmair, Inc., 49 F.Supp.2d 242, 254-55 (S.D.N.Y.1999) (surveys where respondents look at the mark are “helpful” when “the source of the alleged confusion is not just a name, word or phrase”).

(c) Close Viewing Range

Louis Vuitton further argues that because respondents were invited to view the bag placed before them at “close viewing range” for as long as they wished, Reitter failed to test for initial-interest or post-sale confusion. We have gone over this ground already in connection with the Ericksen surveys. The allegedly infringing mark is not the “look” of the It-Bags, but rather the Dooney & Bourke Multicolor Monogram Mark. To test whether consumers are confused with respect to this mark, rather than the overall “look,” survey respondents must be able to see the allegedly infringing mark. Reitter’s methodology did no more than meet this basic requirement. Moreover, such a methodology does approach market conditions, as it can be expected that consumers will investigate an expensive bag closely before purchasing it. It is true that close viewing does not replicate all aspects of the market that are pertinent to trademark infringement, but that flaw in the Reitter 2006 Confusion Survey is already discussed above in the context of the Eveready presentation of products.

d. The Relation Between the 2004 Reitter Confusion Survey and the 2006 Reitter Confusion Survey

We now assume that despite the methodological flaws referred to above, the 2006 Reitter Confusion Survey is reliable enough, standing on its own, to satisfy the strictures of Rules 702 and 403. But even given that assumption, the 2006 Confusion Survey by definition is only useful (and only offered) to shore up the substantial concerns expressed by Judge Scheindlin about the 2004 Confusion Survey. It is the 2004 Confusion Survey that would be doing the heavy lifting at trial.

The question we must address is whether the findings of the 2006 survey actually do answer the doubts about the 2004 survey. We start by addressing Louis Vuitton’s argument that the 2006 survey was a failed venture from the beginning, because its intent was just to prove Judge Scheindlin wrong, and Reitter was directed by counsel to provide that proof. This is an overstatement. First, neither Reitter nor Dooney & Bourke is arguing that Judge Scheindlin was wrong in her assessment that the 2004 survey was (1) a reading test with (2) a poor choice of control bag. Rather they argue that while Judge Scheindlin’s concerns had merit, the flaws she pointed out did not, in this specific case, affect the results of the 2004 survey. Second, contrary to Louis Vuitton’s assertion of a “mandate,” counsel for Dooney & Bourke did not direct Reitter to prove Judge Scheindlin wrong. Rather, the direction was “let’s conduct studies to determine whether the criticisms that the Court had were valid and to what extent they were valid.”

Louis Vuitton next argues that the findings of the 2006 Reitter Confusion Survey cannot vindicate the results of the 2004 Reitter Confusion Survey because the marketplace changed significantly during the two-year period separating the surveys. Specifically, with respect to the 2006 Reitter No Name Sign Survey, Louis Vuitton argues that a greater number of the relevant universe of consumers was familiar with the Dooney & Bourke Multicolor Monogram Mark in 2006 than was familiar with it in 2004. Thus, a substantial proportion of respondents viewing an It-Bag in 2006 may have been able to identify the bag as made by Dooney & Bourke even if the bag did not bear a Dooney & Bourke Name Sign, while the same cannot be said of respondents viewing an It-Bag in 2004. In essence, Louis Vuitton argues that by 2006, respondents were sufficiently familiar with Dooney & Bourke’s line of It-Bags so that the inclusion or exclusion of the name sign made no difference.

We agree that the results of the 2006 Reitter Name Sign Surveys must be excluded for the reasons stated by Louis Vuitton. The very reason that a standard dilution test cannot be reliable when conducted after the fact — the change in the market and in consumer knowledge of the products — is also the reason that the 2006 Reitter No Name Sign Survey is by definition unreliable and of no assistance to the trier of fact. Reitter could not control for the fact that the respondents looking at the Dooney & Bourke bag without a name tag had been subject to two years of seeing those bags in advertisements in stores, on the street, at the office, etc. These were not the same type of respondents who were looking at the bag in 2004. Moreover, even if Reitter could somehow control for the change in market conditions and consumer awareness between 2004 and 2006, he in fact made no attempt to do so. Dooney & Bourke certainly has not carried its burden of proving that the No Name Sign Survey is reliable, when it fails to account for the very confounding factor that, by Dooney & Bourke’s own admission, would render a standard dilution survey unreliable.

The failure to account for the likely effect of a change of market conditions and consumer awareness on the respondents thus renders the 2006 Reitter No Name Sign Survey inadmissible; and because the No Name Sign Survey is inadmissible, the Name Sign Survey becomes meaningless and is of no probative value. Accordingly, the Name Sign Surveys are inadmissible.

We now assess the impact of the fundamental flaw in the methodology of the Name Sign Surveys. We find that it is enough to dispose of both the 2004 and 2006 Reitter Confusion Surveys, for the following reasons:

1. The flaw of not controlling for (or being unable to control for) changed market conditions is so critical that it alone is enough to render the 2006 Name Sign Surveys inadmissible under Rules 702 and 403.
2. Even if that flaw is not dispositive, the other methodological flaws discussed above in the 2006 Confusion Survey, including choice of malls and small sample size, serve to further diminish the reliability of the Name Sign Surveys, removing any possible doubt about their inadmissibility.
3. Because the Name Sign Surveys are inadmissible, this means that a crucial flaw in the 2004 Reitter Confusion Survey — the reading test — remains unexplained. That is such a critical methodological flaw that it is enough under the case law to render the 2004 survey inadmissible under Rules 702 and 403.
4. Even if that flaw were not dispositive, the other methodological flaws in the 2004 Reitter Confusion Survey discussed above (including choice of malls, the skewed universe, and the permission question) serve to further diminish the reliability of the 2004 Reitter Confusion Survey, removing any possible doubt about its inadmissibility.

Finally, for the sake of completeness, we consider whether the 2006 Reitter Confusion Survey adequately answered the legitimate concerns about the poor choice of control bag in the 2004 survey. We find that it did. Unlike the 2006 Reitter No Name Sign Survey, we cannot think of a way in which the change in market conditions and consumer awareness would affect the use of a different (and better) control bag. So the critical flaw in the 2006 Reitter No Name Sign Survey does not infect the Control Bag Test. And while it is true that the other methodological flaws referred to above (choice of malls and small sample size) are probably pertinent to the results of the 2006 Reitter Control Survey, those flaws are not so serious on their own as to render this component of the overall 2006 Reitter Confusion Survey completely inadmissible.

But even if the 2006 Reitter Control Survey satisfies the standards of Rules 702 and 403, that provides no help to Dooney & Bourke. All it could mean is that the poor choice of control bag did not affect the results of the 2004 Reitter Confusion Survey. It provides no answer for the critical error of turning a survey into a reading test; nor does it explain away the other methodological flaws of the 2004 Reitter Confusion Survey which while not dispositive on their own, add significant weight to the case for exclusion.

e. The 2006 Reitter Dilution Survey

Louis Vuitton challenges the admissibility of the 2006 Reitter Dilution Survey on the following grounds:

1. The results of the 2006 Reitter Dilution Survey are not relevant to the issue of dilution.
2. The 2006 Reitter Dilution Survey used improper stimuli.
3. The 2006 Reitter Dilution survey did not ask respondents to explain their answers.

We consider each of these criticisms in turn.

i. The Relevance of the 2006 Reitter Dilution Survey

With respect to Louis Vuitton’s blurring claim, “the central issue [is] whether the It-Bag monogram diminishes the ability of the Monogram Multicolore marks to identify the Louis Vuitton bags.” Vuitton I, 340 F.Supp.2d at 451. The results of the 2006 Reitter Dilution Survey cast no light on this “central issue.” Like the dilution survey conducted by Dr. Wind and considered by Judge Scheindlin in Vuitton I, 340 F.Supp.2d at 451-52, the 2006 Reitter Dilution Survey “reveals little except that there is high consumer recognition of the Louis Vuitton Monogram Multicolore marks.” Id. at 451. Dooney & Bourke argues that because the 2006 Reitter Dilution Survey uses the recognition level of the Louis Vuitton Classic Pattern as a benchmark, the survey does more than merely show that the Louis Vuitton Multicolore Mark is well-recognized by consumers. Dooney & Bourke reasons that because, in 2006, the recognition level of the Louis Vuitton Multicolore Mark was not lower than the recognition level of the Louis Vuitton Classic Pattern (and was, in fact, higher), the ability of the former mark to identify Louis Vuitton bags was not diminished up to that point by anyone, let alone Dooney & Bourke.

Yet Dooney & Bourke’s reasoning is fundamentally flawed. It assumes that the recognition level of the Louis Vuitton Monogram Multicolore Mark could not have been higher than that recorded in the 2006 Reitter Dilution Survey. In fact it is possible that the recognition level of the Louis Vuitton Multicolore Monogram Mark may have been higher but for the existence in the marketplace of Dooney & Bourke’s It-Bags. The use of the Louis Vuitton Classic Pattern as a “benchmark” does nothing to control for this possibility. Dooney & Bourke argues that the 2004 Reitter Recognition Survey shows that at the time of that survey, the Louis Vuitton Classic Pattern and the Louis Vuitton Multicolore Monogram Mark “had equivalent abilities to indicate to consumers that Louis Vuitton was the source of the bags.” That may be so, but, again, it does not preclude the possibility that during the two-year period between the two studies, the recognition level of the Louis Vuitton Multicolore Monogram Mark might have been higher but for the existence in the marketplace of Dooney & Bourke It-Bags.

We also note that Reitter provides no support in the literature or in any other survey for the method he used. Reitter’s attempt to solve a problem (the inability to assess dilution in 2004 from the perspective of 2006) with an untested theory does not meet the standards of Rule 702 and Daubert. Certainly Dooney & Bourke has not met its burden of proving that Reitter’s ad hoc use of a new theory of testing dilution satisfies the reliability requirements of Rule 702 and Daubert. See, e.g., Braun v. Lorillard Inc., 84 F.3d 230, 235 (7th Cir.1996) (expert testimony properly excluded where it relied on unproven methods used after the established methodology led to an inconclusive result; if an expert “proposes to depart from the generally accepted methodology of his field and embark upon a sea of scientific uncertainty, the court may appropriately insist that he ground his departure in demonstrable and scrupulous adherence to the scientist’s creed of meticulous and objective inquiry.”).

We recognize that for many of the reasons Reitter himself outlined in the 2006 Reitter Dilution Survey Report, Reitter faced significant difficulties in crafting in 2006 a dilution survey that would test for whether dilution had occurred over the course of the preceding two years. But this cannot excuse the fundamental methodological flaw of failing to account for the possibility of dilution in what is supposed to be a dilution survey. Cabrera v. Cordis Corp., 134 F.3d 1418, 1422 (9th Cir.1998) (testing procedure used only by the expert was not shown to be reliable, where the relevant community did not recognize any method for resolving the problem presented, and the expert provided no explanation for why his methods were accurate).

We find this fundamental methodological flaw to be a sufficient basis to exclude the 2006 Reitter Dilution Survey under Rules 702 and 403. We nevertheless review Louis Vuitton’s other criticisms of the 2006 Dilution Survey. We find that these other criticisms do not identify flaws in the survey that would in themselves require exclusion, but they do diminish to some extent the reliability and probative value of the 2006 Reitter Dilution Survey.

ii. The 2006 Reitter Dilution Survey Stimuli

“Although no survey can construct a perfect replica of ‘real world’ buying patterns, a survey must use a stimulus that, at a minimum, tests for confusion by roughly simulating marketplace conditions.” Trouble v. Wet Seal, Inc., 179 F.Supp.2d 291, 308 (S.D.N.Y.2001). Louis Vuitton argues that the stimuli Reitter used in the 2006 Reitter Dilution Survey failed to simulate marketplace conditions because consumers rarely if ever see a Louis Vuitton bag next to the four bags that were shown in the two photographs Reitter used. This criticism has some validity. But if grouping the handbags was the only flaw, we believe that it would go to weight and not admissibility. While not an exact replication of the market place, Reitter’s grouping procedure was not so wildly out of place as to render the results completely unreliable. However, the flawed presentation adds to the case for inadmissibility — so even if the misguided nature of the enterprise were not enough to exclude the Dilution Survey, the flaw in presentation quells any doubts about its exclusion.

Louis Vuitton also argues that the presentation of the bags was flawed because the four bags shown in the photographs were not on the market in 2004 when Dooney & Bourke’s It-Bags allegedly began to blur the distinctiveness of the Louis Vuitton Monogram Multicolore. That criticism is a red herring. The 2006 Reitter Dilution Survey sought to test the level of recognition of the Louis Vuitton Multicolore Mark in 2006, and therefore sought to replicate marketplace conditions as they existed at that time, not as they existed in 2004.

iii. Reitter’s Failure to Ask the 2006 Dilution Survey Respondents to Explain Their Answers

It is well-established that trademark confusion surveys should ask follow-up questions in the nature of “What makes you say that?” See Sears, Roebuck and Co. v. Menard, Inc., No. 01 C 9843, 2003 WL 168642, at *3 (N.D.Ill.2003) (characterizing as a “major flaw” of the survey at issue its failure to ask a follow-up question); ConAgra, Inc. v. Geo. A. Hormel & Co., 784 F.Supp. 700, 725 (D.Neb.1992) (“The ‘[w]hat makes you say that?’ question is a typical question designed to isolate, and therefore explain, the real thought processes of the respondent who evidences confusion. In other words, this type of question is designed to determine whether a person is confused for relevant trademark reasons or for some other unrelated and therefore irrelevant reason. Well-designed studies typically employ the ‘[w]hat makes you say that?’ question or some variation.”); 6 McCarthy on Trademarks and Unfair Competition § 32:175 (2007) (“Often, an examination of the respondents’ verbatim responses to the ‘why’ question are the most illuminating and probative part of a survey, for they provide a window into consumer thought processes in a way that mere statistical data cannot.”)

The 2006 Reitter Dilution Survey failed to ask a follow-up question seeking an explanation for why the respondent named the brand she did. It is possible, therefore, that some proportion of the respondents named Louis Vuitton because, for example, they mistakenly believed that the Kate Spade bag shown was manufactured by Louis Vuitton or because Louis Vuitton was the only brand that came to mind. Yet as Reitter explained in the 2006 Reitter Dilution Survey Report, none of the four other bags used in the photographs reported an especially high or low recognition level. The respondents were also instructed not to guess. Though the failure to ask follow-up questions is a flaw in the methodology of the 2006 Dilution Survey, it is not under the circumstances so serious as to render the Survey inadmissible on its own. But, again, it does add strength to the case for exclusion on the more important ground of the essential irrelevance of the survey in proving dilution, not to speak of the flawed presentation of the handbags.

3. Summary on Reitter Surveys

We conclude that all of the Reitter surveys should be excluded in their entirety, and also that any testimony or report on the basis of such surveys should be similarly excluded.

III. Dr. Richard A. Holub

A. Facts

Dooney & Bourke moves to exclude the testimony and report of Dr. Richard Holub. Dr. Holub was retained by Louis Vuitton to study and compare the use of color in the multicolor handbags of Dooney & Bourke and Louis Vuitton. He relied on “color measurements that enable ‘apples-to-apples’ comparisons of data gathered from the two brands such that the comparisons are independent of the manufacturing process.” Dr. Holub also relied on “principles of Probability and Statistics” for his conclusions on the meaning of the findings of the use of color in the handbags. Dr. Holub first conducted a “macro-analysis” that “involved careful, qualitative observation of the samples, occasionally employing a loupe, or magnifying glass to reveal finer details of the printed textiles.” From this “macro-analysis” Dr. Holub drew an “immediate impression of similarity of the products” which he derived from the common textured canvas and background colors, as well as the “very similar” use of interlocking letters; in addition, “one can’t help but be struck by the similarities in the use of gold and leather accents.” Dr. Holub used the “macro-analysis” to determine the basic colors that were used in the monograms on the respective handbags. He concluded that the color pairings in the Dooney & Bourke sample were so similar to the Louis Vuitton sample that the pairings “are not random, numerically or statistically speaking.” According to Dr. Holub, the results he obtained from his “macro-analysis” “suggest that there was deliberation regarding which of the possible permutations to include in the design.”

Dr. Holub also conducted a “micro-analysis” which involved the following steps: (1) the samples were digitally photographed, with adjustments made to minimize glossy reflections and to “normalize the response across the camera’s photo-detector array”; (2) a Digital Color Checker was used which, together with the photographic calibrations, operated to simulate a colorimeter; (3) pictures of the samples were converted into Tagged Image File Format (“TIFF”); (4) the stored images were opened in Adobe Photoshop, and Photoshop’s “eyedropper” tool was used to sample areas of the handbags; (5) “CIELAB” coordinates were taken (according to Dr. Holub, “CIELAB” is a “three-coordinate color space (having variables L*, a* and b*) that is mathematically interconvertible with the TriStimulus Values, X, Y and Z”); (6) Dr. Holub then took readings of the samples (three readings from each monogram element) and he also recorded values for black and white backgrounds; (7) the readings for each monogram element were averaged as a means of estimating the “blended colors of the permutations when monogram elements are viewed under conditions in which individual letters cannot be resolved.” This meant that for the Dooney & Bourke products, “the palette comprises not only the 7 or 9 basic colors for printing letters, but also the additive effects of the colors of interlocking pairs of letters viewed at a distance or under circumstances in which visual acuity is reduced.” According to Dr. Holub, a blended colors adjustment was necessary for the Dooney & Bourke monogram because at distances of ten feet or more or under subdued illumination, the colors in the Dooney & Bourke monogram appear to blend. In Dr. Holub’s more technical jargon, “the various elements have different integrated color attributes depending on the viewing conditions.”
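The statement that CIELAB is "mathematically interconvertible" with the tristimulus values X, Y and Z, and the step of averaging three readings per monogram element, can be sketched in code. This is an illustration only, not a reconstruction of Dr. Holub's procedure: it uses the standard CIE conversion formulas, assumes a D65 reference white (the report's illuminant is not stated in the opinion), assumes the averaging is done on the L*, a*, b* coordinates, and uses hypothetical function names and made-up readings.

```python
from statistics import mean

# Assumption: CIE D65 reference white, 2-degree observer, Y scaled to 100.
WHITE_D65 = (95.047, 100.0, 108.883)
DELTA = 6 / 29


def _f(t):
    """CIE nonlinearity: cube root above a threshold, linear below it."""
    return t ** (1 / 3) if t > DELTA ** 3 else t / (3 * DELTA ** 2) + 4 / 29


def _f_inv(t):
    """Inverse of _f."""
    return t ** 3 if t > DELTA else 3 * DELTA ** 2 * (t - 4 / 29)


def xyz_to_lab(xyz, white=WHITE_D65):
    """Convert tristimulus values (X, Y, Z) to CIELAB (L*, a*, b*)."""
    fx, fy, fz = (_f(c / w) for c, w in zip(xyz, white))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_to_xyz(lab, white=WHITE_D65):
    """Convert CIELAB (L*, a*, b*) back to tristimulus values (X, Y, Z)."""
    l_star, a_star, b_star = lab
    fy = (l_star + 16) / 116
    fx = fy + a_star / 500
    fz = fy - b_star / 200
    return tuple(w * _f_inv(f) for f, w in zip((fx, fy, fz), white))


def average_readings(readings):
    """Average several (L*, a*, b*) readings coordinate by coordinate."""
    return tuple(mean(axis) for axis in zip(*readings))


if __name__ == "__main__":
    # Hypothetical readings for one monogram element (not from the report).
    readings = [(50, 10, -5), (52, 12, -7), (54, 14, -9)]
    averaged = average_readings(readings)
    print("averaged L*a*b*:", averaged)
    print("as XYZ:", lab_to_xyz(averaged))
```

The round trip from L*, a*, b* to X, Y, Z and back is what "interconvertible" means here; whether averaging in one space or the other better models blended colors at a distance is a separate question that this sketch does not address.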

The results of the “micro-analysis” indicated to Dr. Holub that the color differences between the Dooney & Bourke bags and the Louis Vuitton bags “are remarkably small in view of all the variables at work in manufacturing the products.” To quantify just how “remarkably small” the difference was, Dr. Holub resorted to probability theory:

Although it is difficult to estimate the exact probability of picking a cyan, for instance, as close to a Louis Vuitton cyan [as found by Dr. Holub in the Dooney & Bourke bags] from among all the colors that could be picked, the probability is relatively low. Now multiply that fraction by a similar fraction for each Basic Color selection that is similarly close to a Louis Vuitton color. My conclusion is that the overall probability of the degree of correspondence seen in the Tables and Figures is very low and unlikely to have occurred merely by chance.
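The arithmetic described in the quoted passage, multiplying a fraction for each Basic Color, can be sketched as follows. This is purely illustrative: the report as quoted gives no numerical values, so the per-color fraction of 0.1 and the count of seven colors below are hypothetical, as is the function name `joint_probability`. The multiplication is valid only if each color selection is statistically independent of the others, an assumption the quoted passage does not state.

```python
import math


def joint_probability(per_color_probs):
    """Product of per-color match probabilities.

    Assumes the color selections are independent of one another;
    without that assumption the product is not the joint probability.
    """
    for p in per_color_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return math.prod(per_color_probs)


if __name__ == "__main__":
    # Hypothetical inputs: seven basic colors, each with a 1-in-10 chance
    # of landing this close to the corresponding color by chance.
    example = [0.1] * 7
    print(joint_probability(example))
```

The sketch shows why the product shrinks quickly as colors are added, and also why the result depends entirely on the per-color fractions and the independence assumption, neither of which is quantified in the quoted passage.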

To confirm his findings of similarity of color usage, Dr. Holub prepared “6-way averages” for the color permutations that were not found in the Dooney & Bourke Multicolor Monogram Mark, and used the same procedure for the color combinations that were actually employed in the Dooney & Bourke design. The results of this procedure indicated to Dr. Holub that “permutations were chosen with the goal of favoring the representation of warm colors and flesh tones,” consistent with the data obtained in the “micro-analysis.”

Finally, Dr. Holub relied on tests conducted by Swain and Ballard with seeing robots that were taught to recognize objects based on color codes. These tests indicated that the lightness or dimension of color “is much less important than the chromatic attributes of hue and saturation.” As applied to the instant case, “Swain and Ballard’s results suggest that product confusion based on color cues occur even when Lightness values differ. Stated otherwise, the colors of a given multicolored monogram design pattern may appear lighter or darker depending on illumination conditions, but the hue will remain relatively constant and identifiable.”

B. Discussion

Dooney & Bourke argues that Dr. Holub’s testimony and report are inadmissible on a number of grounds, including the following:

1. Dr. Holub’s expertise in colorimetry does not qualify him to render an opinion on the basis of probability theory.
2. Dr. Holub’s testimony will not assist the jurors, who can assess for themselves whether the Dooney & Bourke multicolor pattern is confusingly similar to the Louis Vuitton multicolor pattern.
3. Dr. Holub’s conclusions are faulty because he considered only the use of color, and not the actual logos and shapes in which the colors appear; as such his opinion is “useless to jurors, who must evaluate Louis Vuitton’s complete claimed mark” and not just the colors.
4. Dr. Holub’s conclusions are unreliable because he relied upon “subjective and unreviewed methodologies” that are “impossible to objectively confirm or test for reliability.”
5. Dr. Holub’s report is so complicated and over-technical that it is “unduly confusing and far more likely to hinder than to help a jury.”

Louis Vuitton offers Dr. Holub’s report and testimony for two purposes: (1) to prove the likelihood of confusion presented by Dooney’s multicolor logo; and (2) to prove Dooney’s willful intent to copy the Louis Vuitton Multicolore Monogram mark. The admissibility questions depend in part on the purpose for which Dr. Holub’s testimony is offered, so we divide our analysis accordingly.

1. Opinion offered to prove the likelihood of confusion

a. Qualifications:

i. Colorimetry

Dr. Holub is sufficiently qualified to testify to issues pertaining to colorimetry. He has been, among other things, the Chief Color Scientist at Eastman Kodak, a principal engineer in color technology, and a professor at Boston University, researching and teaching issues pertinent to vision and color. Dooney & Bourke does not seriously suggest that Dr. Holub is unqualified to testify to the scientific and technical intricacies of color. Dooney & Bourke does, however, assert that Dr. Holub is unqualified to conclude, on the basis of probability theory, that the selection of colors for the Dooney & Bourke logo was a deliberate attempt to copy Louis Vuitton’s use of colors.

ii. Statistical probability

Dr. Holub’s expertise in colorimetry does not establish his expertise as a statistician. An expert qualified in one subject matter does not thereby become an expert for all purposes. Testimony on subject matters unrelated to the witness’s area of expertise is prohibited by Rule 702. See, e.g., Seatrax, Inc., v. Sonbeck Int’l, Inc., 200 F.3d 358 (5th Cir.2000) (in an infringement action, expert on marine cranes could not testify to the defendant’s profits from infringing activity). See also Eagleston v. Guido, 41 F.3d 865 (2d Cir.1994) (sociologist was qualified to testify about effects of domestic violence, but not about whether a police department provided sufficient training to its officers responding to domestic violence reports). Louis Vuitton certainly had the opportunity to retain an expert to interpret the statistical probabilities of random or deliberate choice of similar colors, but it did not do so. See Ancho v. Pentek Corp., 157 F.3d 512, 519 (7th Cir.1998) (“Just as a qualified and board certified heart surgeon does not possess sufficient knowledge of orthopaedic medicine to render an expert opinion on spine surgery, likewise we agree with the trial court’s ruling that a mechanical engineer such as Lobodzinski lacks qualifications to give expert testimony about plant reconfiguration.... Ancho should have retained a qualified plant engineer to testify at trial and his failure to do so was a mistake in judgment for which he has no one to blame but himself.”).

Louis Vuitton argues that Dr. Holub is indeed qualified to give an expert opinion on probability, and asserts that he “used his familiarity with statistics/probability for 40 years in connection with his studies and work.” In his deposition, however, Dr. Holub admitted that he was “maybe not” an expert in probability theory. While he may have used statistics in his work (as most people do to one extent or another) this does not mean that he is sufficiently qualified to testify to the statistical significance of a common choice of colors. See IMPACT v. Firestone, 893 F.2d 1189 (11th Cir.1990) (no error in excluding testimony from a political scientist regarding statistical disparities in employment decisions, where the witness did not have training or significant experience as a statistician).

It is notable that Dr. Holub gives no explanation for why the color similarities he found are statistically significant. He simply concluded that the degree of similarity is “highly unlikely by chance alone.” He gave no probability number and no standard for determining statistical significance, but rather simply concluded that the chance of an unintentional overlapping choice of colors was “very low.” He cites probability theory in the simplest terms (e.g., that 72 is the number of permutations of nine things taken two at a time), but that is Probability 101. Dr. Holub’s bare conclusion on statistical significance, bereft of explanation, certainly does not bespeak his qualification as an expert on statistical probabilities.
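For readers who wish to check the one figure cited from the report, the count of 72 follows from the standard formula for ordered selections of two distinct items out of nine, 9 × 8. The short sketch below is an illustration only and is not drawn from Dr. Holub’s report; the function name is hypothetical.

```python
import math
from itertools import permutations


def ordered_pairs(n_items: int) -> int:
    """Count ordered selections of two distinct items from n_items (n * (n - 1))."""
    return n_items * (n_items - 1)


# Nine things taken two at a time, as described in the opinion.
count_by_formula = ordered_pairs(9)

# Cross-check by brute-force enumeration and by the standard-library helper.
count_by_enumeration = sum(1 for _ in permutations(range(9), 2))
count_by_stdlib = math.perm(9, 2)

print(count_by_formula, count_by_enumeration, count_by_stdlib)
```

The count says only how many ordered pairings exist; it says nothing about how likely any particular pairing is, which is the gap the opinion identifies.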

b. Reliability of conclusions on statistical probability

Even if Dr. Holub were qualified as a statistician, his bare conclusion on probability should be excluded. The court does not fulfill its gatekeeper function if it simply accepts the ipse dixit of an expert. General Electric Co. v. Joiner, 522 U.S. 136, 146, 118 S.Ct. 512, 139 L.Ed.2d 508 (1997). Dr. Holub provided no explanation for why he found that the common use of colors was statistically significant, gave no indication that he employed reliable statistical methods to his findings, and even failed to place a figure on the degree of probability that he determined. His conclusion that the copying was deliberate on the basis of probabilities is the classic ipse dixit. It certainly does not establish by a preponderance of the evidence that he reliably applied a reliable methodology.

Finally, Dr. Holub’s conclusion that Dooney & Bourke’s choice of colors was unlikely to be “by chance alone” does not fit the facts of the case under Daubert. Dooney & Bourke does not contend that it chose colors out of a hat. Dr. Holub’s probability analysis (such as it was) looked only at whether the choice was made at random, and therefore did not take account of a number of obvious confounding factors that are pertinent to the dispute. For example, there are many colors that could be chosen “at random” but that would on sight be found unattractive or inappropriate for a handbag logo, most obviously the colors that are too close to the background color of the handbag itself. Moreover, general fashion trends may well influence the choice of palette. A statistical analysis conducted without attempting to rule out obvious alternative causes, as here, renders the expert opinion unreliable under Daubert. Raskin v. Wyatt Co., 125 F.3d 55 (2d Cir.1997) (statistician’s testimony properly excluded under Daubert when he failed to take account of other obvious causes for statistical deviation).

For these reasons, we find that Dr. Holub is qualified to testify to issues of colorimetry, but he should not be permitted to testify to the unlikelihood that Dooney & Bourke’s choice of colors was by chance, nor to what amounts to the same assertion, that the choice of colors was a deliberate attempt to copy Louis Vuitton’s color choices.

c. Proper Subject Matter

While Dr. Holub is qualified to testify about the colors used in the Dooney & Bourke and Louis Vuitton handbags, this does not mean that his testimony on similarity of colors will assist the jury. The jury must decide whether Dooney & Bourke’s Multicolor Monogram Mark is confusingly similar to the Louis Vuitton Multicolore Monogram Mark. “The crucial issue in an action for trademark infringement ... is whether there is any likelihood that an appreciable number of ordinarily prudent purchasers are likely to be misled, or indeed simply confused, as to the source of the goods in question.” Mushroom Makers, Inc. v. R.G. Barry Corp., 580 F.2d 44, 47 (2d Cir.1978). Thus the jury must put itself in the position of an ordinarily prudent purchaser of the products. Ordinarily prudent purchasers do not come armed with digital photography, CIELAB, technical jargon and colorimeter approximation in evaluating and comparing products. Holub’s sophisticated techniques and highly technical presentation therefore provide no assistance to the jury’s determination of whether ordinary purchasers would be misled by the Dooney & Bourke Multicolor Monogram Mark.

The Ninth Circuit’s analysis in United States v. Hanna, 293 F.3d 1080 (9th Cir.2002) is instructive on the problem raised by Dr. Holub’s testimony in this case when offered to prove confusing similarity. Hanna was charged with threatening the President, based on letters he sent to various acquaintances expressing his opinion that the President should be killed. The legal standard for conviction was whether a reasonable person could foresee that his statements would be interpreted as a serious threat by those to whom he sent the letters. The trial court allowed Secret Service agents to testify as experts on threats to the President. The agents testified that they thought the letters sent by the defendant constituted a serious threat. The court of appeals found an abuse of discretion and reversed the conviction. It stated that “[w]ithout additional assistance, the average layperson is qualified to determine what a ‘reasonable person’ would foresee under the circumstances.” 293 F.3d at 1086. The Hanna court emphasized that the experts were in fact “particularly unqualified to comment on what the ‘reasonable person’ would have foreseen.” Because of their extensive training, experience and expertise, Secret Service agents would be likely to see potential dangers to the President that a reasonable person receiving Hanna’s documents might not notice or would consider innocuous. It concluded that “using highly trained agents to determine what a reasonable person would foresee was like using a bloodhound to determine whether the average person would pick up a scent.” Id.

Similarly, in this case Dr. Holub is using sophisticated, hyper-sensitive methods and technical terminology to determine whether the average person would be confused by looking at the colors of the two marks. None of this would assist the jury’s determination of whether an ordinarily prudent purchaser would be confused as to the source of the Dooney & Bourke Multicolor Monogram Mark. See also Price v. Fox Entm’t Group, Inc., 499 F.Supp.2d 382, 389 (S.D.N.Y.) (Scheindlin, J.) (expert testimony on similarity of works would not assist the jury, which “can review the two works and decide for itself whether there are similarities that are probative of copying and how probative of copying those similarities are in light of plaintiff’s proof of access”).

Nor would Dr. Holub’s opinion on color-blending at a distance and visual acuity assist the jury. Jurors can be given the opportunity to view the bags at a distance, and can see for themselves whether the colors blend. There is no need for an expert to tell them what they can see. Cf. Qualitex Co. v. Jacobson Products Co., Inc., 514 U.S. 159, 167-68, 115 S.Ct. 1300, 131 L.Ed.2d 248 (1995) (rejecting the argument that “[b]ecause lighting (morning sun, twilight mist) will affect perceptions of protected color, competitors and courts will suffer from ‘shade confusion’ as they try to decide whether use of a similar color on a similar product does, or does not, confuse customers and thereby infringe a trademark.... We do not see why courts could not apply [legal] standards [relating to the similarity between word marks] to a color, replicating, if necessary, lighting conditions under which a colored product is normally sold.”).

Louis Vuitton asserts that “the very existence of colorimetry ... confirms that there is more to color than meets the eye.” That may be so, but under the applicable substantive law, the question of confusing similarity must be determined by what “meets the eye” of an ordinarily prudent purchaser.

Louis Vuitton cites a handful of trademark decisions in which experts have been permitted to testify to the similarity between the plaintiff’s mark and the defendant’s product. But all of these cases are distinguishable. For example, Sherrell Perfumers, Inc., v. Revlon, Inc., 483 F.Supp. 188 (S.D.N.Y.1980), involved a claim of false advertisement, specifically that the defendant falsely claimed that its “copycat” perfume smelled the same as the plaintiff’s more expensive import. The issue for the jury in such a case is whether the smells were the same in fact, not whether an ordinarily prudent person would find them to be so.

Louis Vuitton also relies on a number of cases involving injunctive relief in which the court and not a jury decided the issue of likelihood of confusion. For example, in Brennan’s Inc. v. Brennan’s Restaurant, LLC, No. 02 Civ. 9858, 2003 WL 1338681, at *2 n. 6 (S.D.N.Y.), the court allowed an expert to testify that people may truncate personal names included in a mark. But the court specifically noted that the expert was allowed to testify only for “purposes of the present motion (including the non-jury hearing)” without prejudice to defendants seeking a Daubert hearing in a subsequent proceeding before a jury. Injunction cases are distinguishable because there is no risk that a jury will be confused by experts telling them what to think, when the issue for the jury is what reasonable people think. See Allison v. McGhan Med. Corp., 184 F.3d 1300, 1310 (11th Cir.1999) (noting that the jury is “more likely than the judge to be awestruck by the expert’s mystique”). Many decisions recognize that the court’s obligation to scrutinize expert testimony is reduced when the judge is the factfinder. See, e.g., Gibbs v. Gibbs, 210 F.3d 491, 500 (5th Cir.2000) (no error in a bench trial in the court considering polygraph evidence: “Most of the safeguards provided for in Daubert are not as essential in a case such as this where a district judge sits as a trier of fact.”), and other cases cited above.

This is not to say that expert testimony is never admissible on the question of likelihood of confusion. The use of expert testimony on survey evidence is common in cases such as this one. But unlike Dr. Holub’s technical testimony on color choices, survey evidence, if reliably produced, can assist the jury’s determination of what an ordinarily prudent purchaser would be likely to find confusing. If the survey is properly conducted, the jury is informed about what ordinary purchasers out in the market actually found confusing or not. Through a reliable survey, jurors receive information that is central to their task — as distinct from technical information about color choices that has no bearing on what an ordinarily prudent purchaser would be seeing or thinking.

For all these reasons, we recommend that Dr. Holub not be permitted to testify to the likelihood of confusion between the Dooney & Bourke and Louis Vuitton multicolored monograms; nor should his report be admissible when offered to show that the color choices created a likelihood of confusion.

2. Opinion offered to prove intent

Louis Vuitton argues that Dr. Holub should nonetheless be permitted to testify insofar as his findings indicate that Dooney & Bourke may have intentionally copied the Louis Vuitton Multicolore Monogram Mark. The question of intent is separate from that of likelihood of confusion. Dooney & Bourke’s intent is potentially important in this case because, among other things, Louis Vuitton is seeking an accounting of Dooney & Bourke’s profits, and such an accounting is possible only upon a finding of willful intent on Dooney & Bourke’s part. Louis Vuitton Malletier v. Dooney & Bourke, Inc., 500 F.Supp.2d 276, 280 (S.D.N.Y.2007) (willful intent is a prerequisite for awarding profits). Moreover, while intent to infringe and likelihood of confusion are conceptually separate, the applicable law provides that if the infringement is unintentional, the plaintiff must prove actual consumer confusion, whereas if infringement is intentional, there is a presumption of confusion. See Louis Vuitton Malletier v. Dooney & Bourke, 500 F.Supp.2d at 279 n. 8. (“With respect to damages, it is well settled that a plaintiff will be entitled to damages upon establishing either actual consumer confusion or deception resulting from the violation, or that the defendant’s actions were intentionally deceptive thus giving rise to a rebuttable presumption of consumer confusion.” (quotations and citation omitted)). And while it is true that Dooney & Bourke would not be liable for using even the exact same colors as Louis Vuitton (because as discussed above, intent to copy colors is not the same as intent to infringe Louis Vuitton’s mark), it is also true that evidence of copying the colors is at least probative of an intent to copy the Louis Vuitton mark itself.

Even assuming, however, that Dr. Holub would be permitted to express an opinion pertinent to Dooney & Bourke’s intent to infringe on Louis Vuitton’s mark, he would not be permitted to testify that his findings in fact indicated that Dooney & Bourke intentionally copied Louis Vuitton’s colors. That conclusion would be based on the statistical unlikelihood of a so-called random match. For reasons expressed above, Dr. Holub is not qualified to provide an opinion on probability and his opinion on that subject is unreliable in any event. Thus, if permitted to testify on the question of intent, Dr. Holub would only be permitted to state his findings on the extent of the overlapping use of colors in the Dooney & Bourke and Louis Vuitton multicolored monogram handbags. Whether Dr. Holub’s testimony as so limited is admissible depends on three considerations:

1. Has Louis Vuitton proved by a preponderance of the evidence that Dr. Holub has employed reliable methods in identifying the colors that were used?
2. As applied to the question of intent to copy, does Dr. Holub’s testimony on color usage cover a topic that will assist the jury?
3. Would the probative value of Dr. Holub’s opinions, as bearing on Dooney & Bourke’s intent to copy Louis Vuitton’s mark, be substantially outweighed by the risk of prejudice and jury confusion, in light of the fact that the jury may use those opinions improperly as proof of likelihood of confusion?

We proceed to these questions.

a. Reliability

Dooney & Bourke argues that Dr. Holub’s conclusions on the overlapping use of color in the multicolor monograms are unreliable because (1) the “macro-analysis” amounted to little more than counting, and (2) his “micro-analysis” suffers from lack of testing and is unsupported by any studies or literature in the field.

We note that Dr. Holub, in a declaration dated March 29, 2007, cites a textbook and articles that support his conclusions on the phenomenon of “blended colors.” But if, as we recommend, Dr. Holub will only be permitted to testify insofar as his opinions are relevant to intent to copy, then his opinion on “blended colors” cannot be part of that testimony. Any “blended colors” phenomenon is not probative of Dooney & Bourke’s intent to copy the colors, because there is no basis for finding that Dooney & Bourke might have been aware of the blended colors phenomenon, and intended to exploit it, when developing its multicolored logo. The reliability question is therefore focused on Dr. Holub’s methodology for determining the similar use of colors in the respective marks.

Dr. Holub’s opinions on the use of colors in the respective marks (when isolated from the opinions on probability and the blended colors phenomenon) do raise some questions of reliability under Daubert. We are not persuaded, however, by Dooney & Bourke’s argument that the “macro-analysis” is little more than counting. Dr. Holub used his extensive experience in colorimetry to evaluate the samples and to determine the number of color permutations and combinations presented. He explained how he reached his conclusions on the colors used in the samples. This is a sufficient showing of reliability to satisfy Daubert and Rule 702. See McCullock v. H.B. Fuller Co., 61 F.3d 1038 (2d Cir.1995) (engineer properly permitted to testify on the basis of experience, where he explained the methods and reasoning that led him to his conclusion).

Dr. Holub’s “micro-analysis” is somewhat more questionable, because he admits that he does not use a colorimeter; his process is only “capable of simulating a colorimeter, approximately.” He did not calculate any rate of error for his colorimetric analysis. On the other hand, Exhibit 3 to Dr. Holub’s expert report cites a number of authorities that support his methods of systematizing colors.

On balance, we find that Dr. Holub’s “micro-analysis” methodology is sufficiently close to the generally accepted methods in the field as to provide a reliable means of determining the existence and mix of colors in the multicolored monograms. See Committee Note to 2000 Amendment to Rule 702 (“A review of the caselaw after Daubert shows that the rejection of expert testimony is the exception rather than the rule. Daubert did not work a “seachange over federal evidence law,” and “the trial court’s role as gatekeeper is not intended to serve as a replacement for the adversary system.”” United States v. 14.38 Acres of Land Situated in Leflore County, Mississippi, 80 F.3d 1074, 1078 (5th Cir.1996)). As the Court in Daubert stated: “Vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence.” 509 U.S. at 595, 113 S.Ct. 2786. See also Ruiz-Troche v. Pepsi Cola of P.R. Bottling Co., 161 F.3d 77, 85 (1st Cir.1998) (“Daubert neither requires nor empowers trial courts to determine which of several competing scientific theories has the best provenance. It demands only that the expert’s conclusion has been arrived at in a scientifically sound and methodologically reliable fashion.”).

b. Proper Subject Matter

Dr. Holub’s opinion on the use of colors, as applied to the question of intent to copy, can assist the jury. The jury does not need assistance to determine the similarity of the marks, but on the question of intent it could be helpful for the jurors to know exactly how much of the palette was used in each mark, and the exact extent to which the colors used by Dooney & Bourke overlap with the colors used by Louis Vuitton. That more technical assessment is something that not all jurors are likely to understand on their own. See generally United States v. Onumonu, 967 F.2d 782 (2d Cir.1992) (expert testimony can assist the jury if it is on a topic not likely to be within the ordinary understanding of some of the jurors); United States v. Mulder, 273 F.3d 91 (2d Cir.2001) (expert testimony on the operations and structure of a labor coalition is permissible because those matters are not widely known among the general public).

c. Rule 403

As discussed above, the Court in Daubert declared that the judge in applying Rule 403 must exercise “more control over experts than over lay witnesses.” 509 U.S. at 595, 113 S.Ct. 2786. In this case, if Dr. Holub testifies about the similarity in colors ostensibly offered only to show intent, the jury is likely to be influenced on the question of likelihood of confusion. If that happens, the jury will be improperly deferring to an expert on a question that the law leaves to reasonable jurors. Even if Dr. Holub is, as we recommend, prohibited from testifying to the statistical probability that the color overlap was intentional, the risk remains that the jury will be overcome by the references to CIELAB, colorimetry, Delta factors and the like, and that it will take the expert testimony as an instruction on how to decide the question of likelihood of confusion. Thus, Dr. Holub’s testimony, even if limited to issues of intent, carries a high risk of prejudice and confusion, though the risk would be lessened somewhat by a limiting instruction.

Balanced against this risk of confusion and prejudice that is caused by usurping the jury’s function is the probative value that Dr. Holub’s opinion will carry on the question of intent. If Dooney & Bourke chose identical or very similar color combinations as were chosen by Louis Vuitton, that fact at least tends to prove an intent to infringe on Louis Vuitton’s mark.

But the probative value of overlapping colors to show intent is severely diminished by a number of significant factors. First, the color is not itself the mark in this case. So to the extent that other aspects of the mark are different — most obviously the fact that Dooney & Bourke uses different characters in its logo — the probative value of the use of similar colors is weakened. This is especially so because Dooney & Bourke could candidly admit that it replicated the Louis Vuitton colors and yet still not be liable for infringement. Moreover, as discussed above, there may be a number of inferences, other than intent to copy the mark, that might be drawn by the similar use of colors, such as following fashion trends and the likelihood that some colors in the spectrum are simply inappropriate under the circumstances.

The probative value of Dr. Holub’s testimony on intent to copy is further diminished by the fact that Louis Vuitton is not bereft of other evidence on that subject. See Louis Vuitton Malletier v. Dooney & Bourke, 340 F.Supp.2d 415, 446-47 (S.D.N.Y.2004) (noting inferences that can be derived from timing, Peter Dooney’s awareness of Louis Vuitton’s mark, etc.). See also United States v. Awadallah, 436 F.3d 125, 133 (2d Cir.2006) (“Probative value is also informed by the availability of alternative means to present similar evidence.”).

Given the Daubert Court’s mandate to exercise special control over expert testimony under Rule 403 in jury trials, we recommend that Dr. Holub’s testimony be excluded even insofar as it could be offered to prove intent. The risk that Dr. Holub’s opinions will usurp the jury’s decisionmaking on the question of the likelihood of confusion substantially outweighs the attenuated probative value of similar color choice when offered only to prove intent to copy Louis Vuitton’s mark. Awadallah, 436 F.3d at 134 (district court did not abuse discretion in excluding evidence under Rule 403, where its probative value in proving a mental state was found to be substantially outweighed by the risk that the jurors would give undue deference to the witnesses and interpret the testimony as “advice on how to determine the central issue of the case”).

C. Summary on Dr. Holub

For the foregoing reasons, we recommend that Dr. Holub’s testimony be excluded in its entirety, and that the report, if offered at trial, be found inadmissible as well.

IV. The Damages Experts

A. Weston Anson

1. Facts

Dooney & Bourke moves to exclude the testimony and report of Weston Anson. Anson’s company, CONSOR, was retained by Louis Vuitton “to review, analyze and evaluate the trademark infringement and trademark dilution damages claimed by Louis Vuitton” and to express expert opinions on these issues. Anson’s company specializes in “consulting and valuing intellectual property and related assets for litigation support, transaction advice, and leverage strategies.”

Anson’s company reviewed financial documents and accounting information produced during the litigation, as well as the basic pleadings in the case, a hearing transcript, and the websites of Louis Vuitton and Dooney & Bourke. Anson bases his opinion on these documents as well as “the extensive experience of CONSOR’s professionals, related to the assessment of damages and valuation of trademarks, brand names, and other intangible assets.”

In his expert report, Anson states specifically that he and his company have “assumed that Dooney was found to have infringed Louis Vuitton’s monogram” and that his expert report is thereby limited to (1) ascertaining and quantifying the profits retained by Dooney & Bourke; and (2) determining the extent of dilution. The next section of his report provides “general considerations” about the value of trademarks and the concept of dilution. Then the report extols the Louis Vuitton Monogram Multicolore Trademarks as denoting “exclusivity, style and quality” — in contrast to Dooney & Bourke products “which do not enjoy the same level of quality.”

Anson calculated the amount of “infringing sales” by reviewing the business records produced by Dooney & Bourke. Anson calculated a total of $100.6 million in “infringing sales” for the period between July 2003 and October 2006. From this amount, Anson deducted costs by using standard costing, which he describes as “a management accounting system that assigns costs to products based on expected costs of resources used, which may differ from both normal and actual costs.” He determined the standard costs to be $37 million, leaving a “gross profit” of $63.6 million. Anson then calculated Dooney & Bourke’s “justified incremental costs,” which he defines as “only those expenses necessary for the manufacturing and distribution of the infringing sales.” For Anson, the relevant expense categories were: (1) cost of goods sold (purchases plus net change in inventory); (2) Freight-in (raw materials); (3) Freight-out (finished goods); (4) Payment of duties on the black and white multi-colored bags; (5) Direct labor and payroll taxes; (6) Direct labor - outside; and (7) Commissions - wholesale. After deducting only those expenses directly related to the “infringing” sales, Anson came to a figure of $45,325,976 in “Infringer’s Profits.”
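The relationship among the sales, gross-profit, and profit figures in the preceding paragraph can be checked by simple subtraction. The sketch below is an illustration only: it treats the rounded figures of $100.6 million and $63.6 million as exact dollar amounts (an assumption), and the variable names are hypothetical rather than taken from Anson’s report.

```python
# Figures as stated in the opinion; the two "million" figures are rounded,
# so the derived amounts below are approximate.
infringing_sales = 100_600_000      # "infringing sales"
gross_profit = 63_600_000           # "gross profit" after standard costs
infringers_profits = 45_325_976     # final "Infringer's Profits" figure

# Standard costs implied by sales less gross profit.
implied_standard_costs = infringing_sales - gross_profit

# Incremental costs implied by gross profit less the final profit figure.
implied_incremental_costs = gross_profit - infringers_profits

print(f"Implied standard costs:    ${implied_standard_costs:,}")
print(f"Implied incremental costs: ${implied_incremental_costs:,}")
```

On these figures, the standard costs consistent with the stated sales and gross profit are about $37 million, and the incremental costs deducted to reach the final figure are roughly $18.3 million.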

Anson’s report then proceeds to determine the injury suffered by Louis Vuitton from the dilution by blurring of its Multicolore Monogram mark, due to the assumed-to-be infringing sales of Dooney & Bourke. He opines that an “empirical indication” of the asserted blurring can be found in “the dissociation of the sales progression between the United States and the international sales of the Louis Vuitton Monogram Multicolore Trademarks, a gap which increased as sales of the Dooney infringing line of ‘It’ bags accumulated.” Anson measures this decreasing proportion from May 2004, the asserted time at which “infringing sales of the Dooney products reached the same level as the Louis Vuitton United States sales.” After this point and over time, the sales data indicated to Anson “a strong divergence between the United States and International sales patterns.” To establish the “statistical significance” of the shift in the direction of United States sales, “a regression analysis was performed.” Anson concludes that the change in proportion of Louis Vuitton Multicolore Monogram Mark sales in the United States as compared to the rest of the world means that “the distinctiveness of the Louis Vuitton Monogram Multicolore Trademarks is shown to have suffered from dilution.”

It is important to note that the regression analysis supporting Anson’s conclusion of dilution was not conducted by Anson himself. Rather, it was conducted by Fernando Torres, Senior Economist employed by CONSOR. Anson did not independently review or validate Torres’s statistical analysis, assuming for the sake of argument that he was qualified to do so.

2. Discussion

Dooney & Bourke argues that Anson’s testimony and report must be excluded on a number of grounds, including the following:

1. Anson’s opinion on the amount of Dooney & Bourke’s net profit on the multicolor handbags is unreliable because it fails to account for the actual allocable costs incurred by Dooney & Bourke that go into making and selling all of their products, including overhead, advertising, general administrative expenses, and taxes.
2. Anson’s opinion on the amount of profits that Dooney & Bourke should disgorge to Louis Vuitton is unreliable because it attributes 100 percent of the profit figure to the alleged infringement.
3. Anson is not qualified to testify to the existence of infringement or dilution. Therefore the many statements in the report on the existence of confusion or dilution (as distinct from profits and loss of sales) must be excluded, and testimony on these matters prohibited.
4. Anson’s opinion that dilution can be found through a proportionate decline in United States sales of the Louis Vuitton multicolor handbags as compared to worldwide sales is unreliable because it is based on a flawed methodology.
5. There is no relationship between loss of sales and dilution by blurring, and therefore the research relied upon by Anson, even if reliable, does not fit the facts of the case.

Louis Vuitton offers Anson’s report and testimony for two purposes: (1) to prove the amount of net profit that Dooney & Bourke derived from its assumed infringement on the Louis Vuitton Multicolore mark; and (2) to prove that Louis Vuitton suffered dilution of its Multicolore Monogram Mark as a result of Dooney & Bourke’s infringement. These two purposes differ in a number of respects, and so we will evaluate Anson’s report and testimony under both purposes.

a. Opinion Offered to Prove Dooney & Bourke’s Net Profits

i. Qualifications

Dooney & Bourke does not seriously contest Anson’s qualifications to determine the net profits of a business in a trademark matter. Anson has an M.B.A. from Harvard University, and has worked for twenty-five years on brand valuation and related financial matters. He is a member of a number of relevant associations and organizations, and lectures on a regular basis to industry and appraisal associations. The Internal Revenue Service has invited him to speak on several occasions on the subject of intellectual property royalty rates and evaluation. He has published more than 100 articles and a number of books and book chapters on corporate valuation and related matters. It is clear that Anson is sufficiently qualified under the standards of Rule 702 to testify to Dooney & Bourke’s profits from the sale of its multicolor monogram handbags. Compare United States v. Majors, 196 F.3d 1206 (11th Cir.1999) (witness employed as a financial analyst and who performed such analysis in more than 50 cases was qualified to provide a financial analysis of a small business even though he was not a certified accountant and had no experience in small business management or in the preparation of financial statements).

ii. Statements in Anson’s Report Concerning the Existence of Infringement and Dilution

As stated above, Anson’s report begins with the stated assumption that infringement has occurred. He does not purport to be an expert on the nuances of trademark law (as opposed to the valuation of trademarks). Nor does his expertise in brand valuation establish an expertise on the existence of infringement or dilution. Yet Anson’s report is replete with broad statements that go well beyond the initial reservations in his report, and far beyond his expertise as well. Among these statements are:

“a product line that imitates the highly distinctive and recognizable elements of the popular Louis Vuitton Monogram Multicolore Trademarks ... misappropriates the stimulant effect created by the Louis Vuitton Multicolore Trademarks, and creates a dissonance that blurs the stimulant effect of those trademarks on consumers.”
The Dooney & Bourke handbags are “confusingly similar in overall design and impression to the original Louis Vuitton Monogram Multicolore Trademarks line.”
Potential Louis Vuitton customers were “bombarded with repeated impressions” of that confusing similarity.
Consumers were “prompted to buy a handbag that emulated their recollection of the iconic handbags featured in fashion magazines favored by celebrities.”
“The unique selling position earned by the Louis Vuitton Monogram Multicolore Trademarks is diluted by the availability of ‘less expensive’ similar products, thus reducing the attractiveness of the original Louis Vuitton Monogram Multicolore Trademarks to potential customers. Trading on the extensive goodwill created by the successful Louis Vuitton Monogram Multicolore Trademarks, Dooney is unjustly benefiting and profiting. This effect does not only shift sales away from Louis Vuitton and towards Dooney, but also blurs the highly distinctive characteristics of the Monogram Multicolore Trademarks created by and associated exclusively with Louis Vuitton products.”

The above statements and similar statements in the Anson Report are improper on any number of grounds, including: (1) they are beyond his stated expertise in valuing brands and providing financial analysis; (2) even assuming Anson is qualified, these broad statements are made without any indication that any expert methodology at all has been applied; (3) many of the statements cross the border into unhelpful legal conclusions, essentially telling the jury that “there is infringement and dilution”; and (4) the statements are contradictory to, or at the very least in serious tension with, Anson’s assertion that he is assuming and not deciding that infringement occurred.

Louis Vuitton insists that Anson’s broad assertions on infringement and dilution are set forth “by way of background.” But this assertion must be rejected. A fair reading of the report indicates that Anson’s broad statements go far beyond what would be necessary to place Anson’s conclusions as to profits, sales, etc. in context. Anson’s report is twelve pages long. We find about five pages of material that is essentially unsupported assertions about the factual or legal merit of Louis Vuitton’s claims. See page 2 (runover paragraph and first full paragraph); page 3 (paragraphs with headings A and B, and bottom paragraph running over to page 4); page 4 in its entirety; page 5 in its entirety; page 6 (first half); page 7 (essentially the entire page); page 9 (bottom half); and page 10 (runover paragraph). That is a lot of “context.” Nor does Anson ever attempt to limit these assertions by, for example, using qualifiers such as “alleged”, “assuming without deciding”, and the like. Anson’s Report is rife with unqualified statements about the merits of the case.

Alternatively, Louis Vuitton argues that Anson is in fact sufficiently qualified to testify to such matters as trademark liability, confusion and dilution — and to their existence under the facts of this case. That argument is somewhat surprising, because Anson’s Report at its outset disavows any attempt to opine on those issues and as stated above, Anson admitted in his deposition that he was not an expert in matters pertinent to trademark law and liability. Moreover, he was proffered solely as a damages expert. But even if he were sufficiently qualified and properly disclosed, the broad statements in his report would not be admissible, because (1) Anson has not described or established any reliable methodology, nor any basis, for his conclusions about such matters as likelihood of confusion and the intricacies of trademark law — and under Daubert, as discussed above, the court acting as a gatekeeper is not to take the ipse dixit of an expert; and (2) most of his statements are nothing but unexplained and unhelpful conclusions of law. See United States v. Scop, 846 F.2d 135, 141 (2d Cir.1988) (“It is not for witnesses to instruct the jury as to applicable principles of law, but for the judge.”).

Louis Vuitton’s counsel could not have stated its case on the merits better than Anson has in the statements in his Report, referenced above. But an expert is not supposed to be doing the work of counsel; an expert must “bring to the jury more than the lawyers can offer in argument.” Salas v. Carpenter, 980 F.2d 299, 305 (5th Cir.1992). The statements highlighted above are fodder for a legal brief, not an expert’s report. We recommend, therefore, that if Anson is permitted to testify, he should be instructed in advance to confine his opinions to the matters of Dooney & Bourke’s profits and interpreting the sales data of Louis Vuitton — and that to the extent Anson’s report might be admissible, statements that provide comment beyond the profits and sales data should be struck.

iii. Reliability of Methods Used to Determine Dooney & Bourke’s Profits

(a) Use of “incremental method” of deducting costs

Dooney & Bourke complains that Anson used the wrong methodology to calculate the net profits from the sale of the allegedly infringing handbags. Specifically, Dooney & Bourke notes that Anson’s profit calculations did not take account of a proportionate amount of overhead expenses, which would be deducted by an expert who used the so-called “full absorption” method of determining costs. Instead, Anson used the “incremental approach” to cost allocation, under which only those costs that were incurred as a direct result of the production of the infringing items are to be deducted from profits. Anson’s response to the “full absorption” method of assessing expenses is that it makes no sense: for example, deducting the cost of office facilities or manufacturing equipment would in Anson’s view fail to recognize that the offices and equipment would still be there even if the handbags were not produced or sold.

The dispute over whether the “incremental approach” or the “full absorption” method should be used to determine costs is not a question of the reliability of expert methodology but is rather a question of substantive law. Dooney & Bourke cites Warner Bros., Inc. v. Gay Toys, Inc., 598 F.Supp. 424, 428 (S.D.N.Y.1984), New Line Cinema Corp. v. Russ Berrie & Co., 161 F.Supp.2d 293, 303-04 (S.D.N.Y.2001), and W.E. Bassett Co. v. Revlon, Inc., 435 F.2d 656, 665 (2d Cir.1970), all for the proposition that the “full absorption” method is the methodology that is mandated for determining costs against the profit made by infringing products. Warner v. Gay Toys, which was a contempt action brought after the infringer of a copyright violated a temporary restraining order, specifically rejects the incremental approach and states that the “full absorption” method is the law of the circuit. The New Line court relied upon Warner in a copyright infringement action, and upheld the use of the full absorption method after finding that the infringement was not willful. Specifically, the New Line court allowed the defendant to deduct from its profits the following: “direct selling, sales support, shipping, administrative (which ... includes customer service, computer operations, invoicing), design and product development, advertising, taxes and other general expenses, including ‘bad debt.’ ” 161 F.Supp.2d at 303-04. Finally, in W.E. Bassett, a case in which the defendant was found to have intentionally infringed on the plaintiff’s mark, the court decided as a matter of law that in determining the defendant’s profits, a deduction must be made for overhead, operating expenses, and federal income tax.

These cases would seem to draw Anson’s use of the “incremental approach” into question as a matter of substantive law. Put another way, if these cases are controlling, Anson’s cost assessment, even if reliably conducted, does not “fit” the facts of the case and would be subject to exclusion under Rule 702. See, e.g., Concord Boat Corp. v. Brunswick Corp., 207 F.3d 1039 (8th Cir.2000) (error to admit testimony by an economics professor who used an economic model that was inconsistent with controlling facts and law; the testimony failed the Daubert “fit” requirement). But Louis Vuitton counters that Warner Bros. and New Line Cinema are distinguishable as cases of copyright rather than trademark infringement, and that Bassett, while a trademark case, adopted the “full absorption” method without analysis and without specifically barring the possibility of using the “incremental” method. Next, and somewhat inconsistently, Louis Vuitton relies on Warner Bros. and New Line Cinema for the proposition that if the defendant wants to use the “full absorption” method, it is up to the defendant to prove the connection between a general expense and the production or sale of the infringing product. For example, in Warner Bros. the court denied the defendant any deductions for legal expenses, bonuses paid to its corporate officers, and costs of mold amortization and repair, finding that the defendant “had failed to show a connection between these expenses and the contemptuous products.” 598 F.Supp. at 431. See also Manhattan Indus., Inc. v. Sweater Bee by Banff Ltd., 885 F.2d 1, 7 (2d Cir.1989) (declaring that the infringer “must prove not only that it has borne the particular cost or expense but also that the cost or expense is attributable to its unlawful sales”; while the infringer “need not prove its overhead expenses and their relationship to the production of the contemptuous goods in minute detail, it still must carry its burden of demonstrating a sufficient nexus between each expense claimed and the sales of the unlawful goods” (citations omitted)).

Both parties are correct in their positions under the substantive law, insofar as they go. In assessing the defendant’s net profit from the infringing sales, courts are to use the “full absorption” method, but only if the defendant proves the connection between a general expense such as overhead and the infringing sales. Because of this substantive law condition on using general costs to offset profits, it cannot be said at this point that Anson’s use of the “incremental” method is improper. His opinion on net profits conditionally fits the facts of the case, the condition being Dooney & Bourke’s inability to prove a sufficient connection between any of the expenses not credited by Anson and the production and sale of its multicolor monogram handbags. At trial, Anson’s opinion could therefore be expressed conditionally, i.e., assuming that Dooney & Bourke cannot connect any of the general expenses that Anson refused to deduct, then the amount of net profits for the period investigated is X amount — subject to adjustment 1) for any particular expenses that Dooney & Bourke can connect and 2) for a time period later than October 2006. See Committee Note to the 2000 Amendment to Rule 702 (noting that experts are not barred from testifying on the basis of conditional or hypothetical facts).
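By way of illustration only, the conditional computation described above might be sketched as follows. Every figure and expense name in the sketch is hypothetical and is not drawn from the record, and the function `net_profit` is an assumed name used solely for the example. The sketch shows that the “incremental” figure and the “full absorption” figure are the two ends of a single calculation, with the result depending on which general expenses the defendant succeeds in connecting to the infringing sales.

```python
# Illustrative only: every figure and expense name below is hypothetical
# and is not taken from the record.

def net_profit(revenue, incremental_costs, general_expenses, connected):
    """Revenue less incremental costs, less only those general expenses
    that the defendant has connected to the infringing sales."""
    deductible = sum(
        amount for name, amount in general_expenses.items() if name in connected
    )
    return revenue - incremental_costs - deductible


revenue = 1_000_000
incremental_costs = 400_000
general_expenses = {"overhead": 150_000, "advertising": 50_000, "taxes": 100_000}

# No general expense connected: the "incremental" figure.
incremental_figure = net_profit(revenue, incremental_costs, general_expenses, set())

# Every general expense connected: the "full absorption" figure.
full_absorption_figure = net_profit(
    revenue, incremental_costs, general_expenses, set(general_expenses)
)

# Only advertising connected: an intermediate figure.
partial_figure = net_profit(
    revenue, incremental_costs, general_expenses, {"advertising"}
)

print(incremental_figure, partial_figure, full_absorption_figure)
```

On these assumed numbers the three results are 600,000, 550,000 and 300,000, which is the sense in which an opinion built on the incremental method remains subject to adjustment for each expense the defendant proves at trial.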

We note that Dooney & Bourke does not seriously challenge Anson’s calculation of net profits assuming the incremental method is proper. It does argue that Anson made some errors, such as overstating sales by omitting the parentheses that should be around sales discounts; that is, Anson mistakenly treated a negative as a positive. But these kinds of mistakes in arithmetic can be corrected (unlike the mistake of using an unreliable methodology), and they present questions of weight and not admissibility. See, e.g., Cummings v. Standard Register Co., 265 F.3d 56 (1st Cir.2001) (expert on damages was properly permitted to testify even though he made a computational error; the error was exposed in his testimony at trial and corrected).
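The nature of the asserted arithmetic error can be shown with a short sketch. The amounts are hypothetical and are not the figures in Anson’s report, and `net_sales` is an assumed name used only for illustration. In accounting notation a figure in parentheses is negative, so dropping the parentheses around a discount line adds the discount to sales instead of subtracting it.

```python
# Illustrative only: hypothetical amounts, not figures from the report.

def net_sales(gross_sales, sales_discounts):
    """Sum the two lines as recorded; a discount is recorded as a negative."""
    return gross_sales + sales_discounts


gross = 1_000_000
discounts = -50_000  # written "(50,000)" in accounting notation

correct = net_sales(gross, discounts)        # parentheses respected
mistaken = net_sales(gross, abs(discounts))  # parentheses dropped
overstatement = mistaken - correct

print(correct, mistaken, overstatement)
```

On these assumed amounts the overstatement is twice the discount, because the discount is both left undeducted and added back. An error of this kind is corrected by restoring the sign, without any change to the underlying method.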

Finally, Louis Vuitton argues that Dooney & Bourke has already failed to meet its burden of connecting its general expenses to the allegedly infringing sales, by failing to provide Anson with the necessary information to make that determination. That argument reflects a misunderstanding of when Dooney & Bourke’s burden must be met. The case law cited above establishes that the infringing party must prove the necessary connection to the fact-finder. See Warner Bros., 598 F.Supp. at 431. In this case that means that the burden will need to be met at trial before the jury, not in a motion in limine challenging an expert’s methodology. Louis Vuitton is correct, however, in asserting that “Dooney’s failure to produce detailed evidence on the remaining expenses for which it seeks a deduction is not grounds to strike Mr. Anson’s report.”

(b) Attributing 100 percent of the net profits to the alleged infringement

Dooney & Bourke contends that Anson’s determination of net profits from the allegedly infringing handbags is unreliable because Anson did not attempt to determine how many of the sales were actually attributable to the alleged consumer confusion. It is clear that not all of Dooney & Bourke’s multicolor handbag sales were made because of consumer confusion. Louis Vuitton’s experts put the confusion level at about 20-30 percent of the possible consumers. Dooney & Bourke argues that Anson’s conclusion that it owes 100 percent of its profits to Louis Vuitton must be excluded because he attributed 100 percent of the handbag sales to consumer confusion.

As with the dispute over the use of the “incremental” rather than the “full absorption” method of deducting costs from profits, Dooney & Bourke’s challenge is one about the substantive law rather than evidentiary reliability. Dooney & Bourke does not at this juncture complain that Anson’s figure is the result of an unreliable methodology if Louis Vuitton is entitled to 100 percent of the sales of an infringing product. The parties instead disagree about whether the plaintiff may be entitled to 100 percent of the net sales of an infringing product even if some of the sales were not attributable to confusion. Dooney & Bourke relies on Int’l Star Class Yacht Racing Ass’n v. Tommy Hilfiger U.S.A., 146 F.3d 66, 72 (2d Cir.1998), for its position that the plaintiff is entitled only to the proportion of sales directly attributable to consumer confusion. The lower court in Tommy Hilfiger had found infringement, but also found that the infringement was not in bad faith. It awarded the plaintiff all of the net sales of the infringing product, refusing to deduct the amount of sales attributable to the defendant’s own mark. The court of appeals first found an evidentiary error and remanded for a redetermination of whether the infringement was in bad faith. It then addressed the defendant’s argument that any award should be reduced by the amount of sales attributable to its own mark. The court began its analysis by surveying the relevant law on recovery of the defendant’s profits in trademark infringement cases:

A district court faced with a Lanham Act violation possesses “some degree of discretion in shaping [the] relief” according to the principles of equity and the individual circumstances of each case. George Basch Co. v. Blue Coral, Inc., 968 F.2d 1532, 1537 (2d Cir.1992) (citing 15 U.S.C. § 1117(a) (1994)). Nevertheless, that discretion must operate within the parameters for allowing an accounting of profits in this circuit. Id.
We have held that an accounting for profits is available, even if a plaintiff cannot show actual injury or consumer confusion, “ ‘if the accounting is necessary to deter a willful infringer from doing so again.’ ” Id. (quoting Burndy Corp. v. Teledyne Indus., Inc., 748 F.2d 767, 772 (2d Cir.1984)). As with the decision to award profits at all, the decision whether to award a full or partial accounting must be based on what is necessary to deter future misconduct. In W.E. Bassett Co. v. Revlon, Inc., 435 F.2d 656, 664 (2d Cir.1970), a case concerning particularly egregious infringement of a competitor’s mark, we stated that “the only way the courts can fashion a strong enough deterrent is to see to it that a company found guilty of willful infringement shall lose all its profits from its use of the infringing mark.” (emphasis in original). While this language could be read to suggest that a defendant must disgorge all of its profits any time willful infringement is proved, more recent cases establish that a district court has discretion to fashion an alternative remedy, or to award only a partial accounting, if the aims of equity would be better served. See George Basch, 968 F.2d at 1540 (stating that a finding of willful infringement is necessary but not sufficient to award an accounting for profits); Allen v. Men’s World Outlet, 679 F.Supp. 360, 371 (S.D.N.Y.1988) (declining to award an accounting for profits for willful use of the plaintiff’s likeness in an advertisement because a permanent injunction would adequately serve the goal of deterrence).

146 F.3d at 71-72. The Hilfiger court then specifically addressed the question on which the parties in this case disagree: whether the amount of profits subject to accounting should be reduced by the sales attributable to the defendant’s own mark:

Hilfiger further claims that the district court should have subtracted the percentage of profits attributable to Hilfiger’s mark rather than ISCYRA’s in assessing any award to ISCYRA. In Mishawaka Rubber & Woolen Mfg. Co. v. S.S. Kresge Co., 316 U.S. 203, 206, 62 S.Ct. 1022, 86 L.Ed. 1381 (1942), the Supreme Court held that a plaintiff “is not entitled to profits demonstrably not attributable to the unlawful use of his mark,” but that the burden of proving any deduction for sales not based on the infringing mark falls upon the infringer. Id. at 206-07, 62 S.Ct. 1022; see also 15 U.S.C. § 1117 (1994); George Basch, 968 F.2d at 1540 (listing the degree of certainty that the defendant benefitted from its unlawful conduct as one factor to consider in determining whether to order an accounting for profits in cases of willful infringement). Hilfiger presented evidence at trial through the testimony of Allan Zwerner, a buyer for a large chain of department stores, that some portion of the sales of its nautical sportswear line was attributable to the appeal of Hilfiger’s well-known mark and reputation. The district court may consider this evidence on remand in assessing whether Hilfiger has met its burden of proof.
However, where infringement is especially malicious or egregious, allowing a defendant, especially a dominant competitor who has made use of the mark of a weaker entity, to deduct profits due to its own market dominance in some circumstances inadequately serves the goal of deterrence. See Truck Equipment Service Co. v. Fruehauf Corp., 536 F.2d 1210, 1222-23 (8th Cir.1976) (declining to allow an eighty percent deduction for profits attributable to strong consumer association with the mark of a well-known infringer that had copied the distinctive design of a competitor); cf. W.E. Bassett, 435 F.2d at 664 (ordering a full accounting of all profits where Revlon deliberately made use of the mark of a smaller competitor because such a remedy was “the only way the courts can fashion a strong enough deterrent”). As with ISCYRA’s argument on damages, we cannot determine whether this case presents such a situation without further fact-finding by the district court as to the degree of bad faith, if any, displayed by Hilfiger. We therefore leave the issue for the district court to address on remand.

146 F.3d at 72.

For its part, Louis Vuitton relies on most of the same cases cited above. Its take on those cases is that (1) assuming the defendant gets a reduction for sales not attributable to confusion, it is the defendant’s burden to prove the lack of connection; and (2) in some cases a court would be within its discretion to award 100 percent of the defendant’s profits from the infringing product, even if the defendant could prove that some of those sales were not connected to the infringement.

Judge Scheindlin summarized the pertinent law on recovery of profits in an opinion entered in this case on April 24, 2007. Louis Vuitton Malletier v. Dooney & Bourke, Inc., 500 F.Supp.2d 276, 279 (S.D.N.Y.2007) (footnotes and citations omitted):

This Circuit’s law relating to profit awards was set forth in George Basch Co. v. Blue Coral, Inc., which held that “a finding of defendant’s willful deceptiveness is a prerequisite for awarding profits” in federal trademark infringement suits. In its analysis, the Second Circuit noted that although section 1117(a) does not contain an explicit scienter requirement, and rather grants courts “equitable latitude” to authorize profit awards, this latitude must nevertheless “operate within legally defined parameters.” To establish these parameters, the Second Circuit engaged in a thoughtful analysis of the historical origin of an award of profits and the policy rationales that support such awards. The court ultimately concluded that “in order to justify an award of profits, a plaintiff must establish that the defendant engaged in willful deception.”
Although “[a] finding of bad faith, or willful deceptiveness, is necessary to warrant an accounting [of profits],” it “may not be sufficient.” To determine whether “on the whole, the equities weigh in favor of an accounting” of defendant’s profits, additional considerations include “(1) the degree of certainty that the defendant benefited [sic] from the unlawful conduct, (2) availability and adequacy of other remedies, (3) the role of a particular defendant in effectuating the infringement, (4) plaintiff’s laches; and (5) plaintiff’s unclean hands.”

Given the applicable law as summarized by Judge Scheindlin, the validity and fit of Anson’s opinion, that 100 percent of net lost profits on the multicolored handbags is the measure of recovery to Louis Vuitton, is subject to two conditions: (1) Dooney & Bourke must be found in bad faith; and (2) the court must then determine that the equities weigh in favor of an award of 100 percent of Dooney & Bourke’s net profits.

Assuming those conditions are met, is Anson’s conclusion on 100 percent recovery of infringing profits then admissible? The answer, under the circumstances of this case, is no. If those two conditions are found, Anson’s opinion that a 100 percent recovery of net profits provides the proper remedy would at that point no longer be of any assistance. The jury will only need to determine what the net profits are, not how they are to be allocated. The judge will take the jury’s finding as to the amount and, exercising powers in equity, will then determine the proper allocation of profits to award to the plaintiff.

We have already recommended that Anson be permitted to testify to his opinion on what the net profits are, subject to Dooney & Bourke meeting its burden of proving more reductions than Anson has already credited. But at no point should Anson be permitted to testify that “100 percent of the profits, regardless of whether or not they were achieved as a result of the infringement, are properly recoverable as damages.” Under the circumstances of this case, that opinion amounts to an opinion on the law: an instruction on how the court should exercise its equitable powers. And Judge Scheindlin, who will be making the decision on allocating profits if it comes to that, does not need to be instructed by an expert on how to exercise those powers.

b. Opinion on Dilution

As discussed above, Anson’s disquisition on the law of dilution, as well as his broad statements extolling the Louis Vuitton mark and disparaging Dooney & Bourke’s products and conduct as causative of dilution, must be excluded as these statements are unreliable and not the proper subject of expert testimony. This section considers whether the remainder of Anson’s testimony concerning dilution is admissible under Rules 702, 703 and 403. Specifically, should Anson be permitted to testify that his findings indicate (1) that Louis Vuitton suffered lost sales on its Multicolore Monogram handbags in the United States, and (2) that this loss of profit was attributable to sales of Dooney & Bourke’s multicolor handbags?

We find that Anson should not be permitted to testify on any aspect of dilution. There are a number of potential grounds for exclusion:

1. Anson’s opinion on dilution does not fit the facts of the case, as shown by Louis Vuitton’s own argument — and even if there is not an absolute lack of fit, there is nonetheless a substantial risk of jury confusion and prejudicial effect from Anson’s testimony on dilution, which warrants its exclusion under Rule 403.
2. Anson’s opinion on dilution does not fit the legal standards that determine when dilution exists.
3. Anson’s complete reliance on another expert for the critical assertion in his opinion — that there is a statistically significant difference between sales of the Multicolore Monogram handbags in the United States and the rest of the world during the subject period — renders his opinion unreliable under Rule 702 and any discussion of that statistical analysis inadmissible under Rule 703.
4. The failure to conduct a multiple regression on the Louis Vuitton sales data renders the opinion on statistical significance unreliable in any event.

We take these flaws in order.

i. Lack of “Fit”/ Problem of Prejudice and Jury Confusion

The basis of Anson’s conclusion on dilution is that Louis Vuitton lost sales in the United States during the subject period. This conclusion is in substantial tension with Louis Vuitton’s own assertions in the case. Louis Vuitton has stated on any number of occasions that it does not claim lost profits. See, e.g., Objections and Response to Interrogatory No. 17, Further Amended Response to Dooney & Bourke’s Third Set of Interrogatories (“Louis Vuitton does not seek its lost profits.”); Reply Memorandum in Support of Motion to Exclude Proposed Expert Testimony of Bradford Cornell at 6 (“Dr. Cornell incorrectly assumed that Louis Vuitton is seeking its own lost profits rather than Dooney’s illicit profits.”); Id. at 7 (“Dooney’s sales likely cannot be linked to lost sales by Louis Vuitton because ... Louis Vuitton limits its production and sold the infringed products in other markets ... ”).

And yet the basic premise of Anson’s opinion is that Louis Vuitton suffered a loss of sales in the United States as a result of the marketing of the Dooney & Bourke Multicolor Monogram handbags. We recognize that the two positions are possibly explainable as not absolutely inconsistent. Because Louis Vuitton markets a scarce product it can argue, as it has, that United States sales decreased but there was no loss of profit because Louis Vuitton could accommodate waiting lists elsewhere in the world. But even if that argument were tenable, that is not the way that Anson’s report is pitched at all. Anson’s report (and any testimony that matches it) would indicate to the jury that Louis Vuitton has in fact lost profits because of the drop in United States sales. There is nothing in his report about the possibility that these lost sales were made up elsewhere. And if those sales were made up elsewhere, then what are the “damages” to which Anson refers? Thus, Anson’s unqualified assertion that Louis Vuitton was damaged by lost sales in the United States is difficult to explain given Louis Vuitton’s position in this litigation.

A good example of Louis Vuitton’s mixed signals with respect to the opinion of its expert on “damages” is its argument in the motion to exclude the testimony of Dooney & Bourke’s damages expert, Bradford Cornell, discussed below. Cornell spends a good deal of time critiquing the “Damages analysis of Louis Vuitton’s expert.” Louis Vuitton’s central response to Cornell’s report is that it “myopically focuses on Louis Vuitton’s lost profits, which Louis Vuitton has expressly (and repeatedly) stated it does not seek in this litigation.” That may be, but Cornell’s conclusions on lost profits are largely in response to Anson’s report on lost sales. It seems remarkable that Louis Vuitton can offer an expert to prove a point, and then criticize Dooney & Bourke’s responsive expert on the ground that the point is not being contested.

The tension between Louis Vuitton’s litigation position and Anson’s dilution opinion raises the probability of a lack of “fit” between Anson’s testimony on lost profits and the disputed issues in the case. See, e.g., Bogosian v. Mercedes Benz of N.A., 104 F.3d 472, 479 (1st Cir.1996) (expert testimony on a theory that contradicted the plaintiff’s own assertions about her case was properly excluded: “The district court appropriately found it very odd that Bogosian would present an expert witness who would testify that [Bogosian’s] own unwavering testimony was incorrect.”). That discordance also raises a substantial Rule 403 issue. The probative value of Anson’s testimony on dilution is likely to be substantially outweighed by the risks of jury confusion and prejudice, because the jury may assume from Anson’s opinion that Louis Vuitton has lost profits when Louis Vuitton itself has decided not to present evidence of lost profits.

We note again, however, that Louis Vuitton’s position is susceptible to at least a colorable explanation. Louis Vuitton might be arguing that it has lost sales and profits but does not have to prove them other than as some basis for dilution. Or it might be saying that it has lost sales in the United States but made up for them by reducing waiting lists in the rest of the world, and that Anson did not have to elaborate on those foreign sales because he was focusing on the loss of sales in the United States. While Louis Vuitton is clearly walking a tightrope, reasonable minds can differ about whether it has fallen off and whether Anson’s report should be excluded solely on that ground. No matter, however, because whatever Louis Vuitton’s position is on lost sales and profits, Anson’s opinion does not fit the substantive law of dilution and is excludable on that ground at any rate. We turn now to that fundamental flaw in Anson’s testimony.

ii. Lack of “fit” with the substantive law of dilution

Anson’s testimony is simply not probative of dilution under the substantive law. Anson is only too happy to find dilution, but nowhere does he specify how his lost sales methodology actually proves either dilution by blurring or dilution by tarnishment. Instead of tracking and applying the applicable law, the Anson Report speaks vaguely of a diluting “effect”:

The unique selling position earned by the Louis Vuitton Monogram Multicolore Trademarks is diluted by the availability of “less expensive” similar products, thus reducing the attractiveness of the original Louis Vuitton Monogram Multicolore Trademarks to potential customers.... This effect does not only shift sales away from Louis Vuitton and towards Dooney, but also blurs the highly distinctive characteristics of the Monogram Multicolore Trademarks created by and associated with Louis Vuitton products.

The problem with this analysis is that it is at odds with the substantive law of dilution. Courts have considered a variety of factors to determine whether a defendant’s trademark has diluted or is likely to dilute a plaintiff’s trademark by blurring, see, e.g., Vuitton I, 340 F.Supp.2d at 437, and the law currently identifies six such factors. See 15 U.S.C. § 1125(c)(2)(B)(i)-(vi):

(i) The degree of similarity between the mark or trade name and the famous mark.
(ii) The degree of inherent or acquired distinctiveness of the famous mark.
(iii) The extent to which the owner of the famous mark is engaging in substantially exclusive use of the mark.
(iv) The degree of recognition of the famous mark.
(v) Whether the user of the mark or trade name intended to create an association with the famous mark.
(vi) Any actual association between the mark or trade name and the famous mark.

Not one of these factors refers to a decline in the plaintiff’s sales coincident with a defendant’s achieving “critical mass” in the marketplace, which is the linchpin of Anson’s dilution analysis. Nor do any of the blurring factors previously considered by courts in this circuit refer to a decline in the plaintiff’s sales. See Vuitton I, 340 F.Supp.2d at 437; Nabisco, Inc. v. PF Brands, Inc., 191 F.3d 208, 217-22 (2d Cir.1999) (setting forth ten factors to determine dilution by blurring). See also Moseley v. V. Secret Catalogue, Inc., 537 U.S. 418, 433, 123 S.Ct. 1115, 155 L.Ed.2d 1 (2003) (rejecting the Fourth Circuit’s view that “an actual loss of sales or profits” must be proved to show actual dilution). As for dilution by tarnishment, there is again no connection between the substantive law and Anson’s testimony. 15 U.S.C. § 1125(c)(2) defines dilution by tarnishment as an “association arising from the similarity between a mark or trade name and a famous mark that harms the reputation of the famous mark.” Anson does not purport to connect a loss of sales in the United States to a loss of reputation on the part of Louis Vuitton, and Louis Vuitton cites no case law to support the proposition that a plaintiff’s loss of sales coincident with a defendant’s achieving “critical mass” in the marketplace necessarily implies a loss of reputation. Therefore, to the extent it is intended to prove blurring or tarnishment, Anson’s testimony does not “fit” the law of the case and must be excluded. See, e.g., Leverette v. Louisville Ladder Co., 183 F.3d 339 (5th Cir.1999) (expert’s testimony on ladder defect was properly excluded for lack of fit, because under applicable law the ladder could only be defective if it deviated from the industry’s manufacturing specifications, and the expert did not consider those standards or evaluate the ladder against them). For these reasons, we recommend that Anson’s testimony on dilution be excluded for lack of fit and as testimony that will not assist the jury.

iii. Improper reliance on another expert

Let us assume for the sake of argument that during the period covered by Anson’s testimony, Louis Vuitton’s United States sales of its Multicolore Monogram Mark handbags were proportionately reduced as compared to sales of those handbags in the rest of the world. Let us even assume away the fit issues (both as to Louis Vuitton’s litigation position and the substantive law). Even these assumptions would not justify admitting Anson’s opinions on dilution, because the crux of his testimony is his conclusion that the proportionate downturn in Louis Vuitton United States sales was caused by the sales of Dooney & Bourke multicolor handbags. Anson bases this critical conclusion on a regression analysis conducted on the Louis Vuitton sales data. But Anson did not conduct that regression analysis. The analysis was conducted by CONSOR’s senior analyst, Fernando Torres. While Mr. Torres was probably qualified to conduct a regression analysis — Dooney & Bourke does not contend otherwise — Anson was not presented as and was demonstrably unqualified to be an expert on statistical analysis.

At his deposition, Anson testified that “in simplistic terms” he knew how to conduct a regression analysis. But that asserted ability was based on studying statistics in graduate school 30 years earlier, and no good faith argument can be made that 30 year-old course study is a sufficient qualification to testify as a statistician. See, e.g., Andrews v. Metro N. Commuter R.R., 882 F.2d 705 (2d Cir.1989) (expert not qualified by a few experiences relevant to the subject matter). Ultimately, Anson admitted that he essentially had nothing to do with the preparation of the regression analysis. Rather, his practice was to “turn this over to an economist.” Like Dr. Holub, Anson’s occasional use of statistics in his daily life simply does not qualify him as an expert on that complex subject.

Because Anson is not qualified to conduct or interpret statistical analyses, the regression analysis could only be admissible if Anson is permitted to give an opinion by relying completely on Torres’s opinion. It is true that experts are permitted to rely on opinions of other experts to the extent that they are of the type that would be reasonably relied upon by other experts in the field. Fed.R.Evid. 703. But in doing so, the expert witness must in the end be giving his own opinion. He cannot simply be a conduit for the opinion of an unproduced expert. This fundamental requirement was well-discussed in Dura Automotive Sys. v. CTS Corp., 285 F.3d 609 (7th Cir.2002), where the court encountered a problem similar to that presented by Anson’s testimony in this case — a hydrogeologist was relying almost exclusively on the opinions of expert groundwater-flow modelers to draw a conclusion about the flow of pollutants into a town’s water supply. The underlying experts were not produced to testify, just as Torres was not produced in this case. Judge Posner held that the testimony of the hydrogeologist was not admissible under Rule 702. He reasoned as follows:

An expert witness is permitted to use assistants in formulating his expert opinion, and normally they need not themselves testify. The opposing party can depose them in order to make sure they performed their tasks competently; and the expert witness can be asked at his deposition whether he supervised them carefully and whether his relying on their assistance was standard practice in his field. If the requisite assurances are forthcoming, the assistants’ work need not be introduced into evidence....
Analysis becomes more complicated if the assistants aren’t merely gofers or data gatherers but exercise professional judgment that is beyond the expert’s ken.... Now it is common in technical fields for an expert to base an opinion in part on what a different expert believes on the basis of expert knowledge not possessed by the first expert; and it is apparent from the wording of Rule 703 that there is no general requirement that the other expert testify as well. The Committee Notes to the 1972 Proposed Rule 703 give the example of a physician who, though not an expert in radiology, relies for a diagnosis on an x-ray. We ... do not believe that the leader of a clinical medical team must be qualified as an expert in every individual discipline encompassed by the team in order to testify as to the team’s conclusions. But suppose the soundness of the underlying expert judgment is in issue. Suppose a thoracic surgeon gave expert evidence in a medical malpractice case that the plaintiff’s decedent had died because the defendant, a radiologist, had negligently failed to diagnose the decedent’s lung cancer until it was too advanced for surgery. The surgeon would be competent to testify that the cancer was too advanced for surgery, but in offering the additional and critical judgment that the radiologist should have discovered the cancer sooner he would be, at best, just parroting the opinion of an expert in radiology competent to testify that the defendant had x-rayed the decedent carelessly. The case would be governed by our decision in In re James Wilson Associates, 965 F.2d 160, 172-73 (7th Cir.1992), where the issue was the state of repair of a building and “the expert who had evaluated that state — the consulting engineer — was the one who should have testified. The architect [the expert who did testify] could use what the engineer told him to offer an opinion within the architect’s domain of expertise, but he could not testify for the purpose of vouching for the truth of what the engineer had told him — of becoming in short the engineer’s spokesman.” It is the same here.

Id. at 613-14.

The Dura court located its rule of exclusion of “mouthpiece” experts under Daubert:

The Daubert test must be applied with due regard for the specialization of modern science. A scientist, however well credentialed he may be, is not permitted to be the mouthpiece of a scientist in a different specialty. That would not be responsible science. A theoretical economist, however able, would not be allowed to testify to the findings of an econometric study conducted by another economist if he lacked expertise in econometrics and the study raised questions that only an econometrician could answer. If it were apparent that the study was not cut and dried, the author would have to testify; he could not hide behind the theoretician.

Id.

The same problem exists with Anson’s testimony about the regression analysis conducted on Louis Vuitton’s sales data — which is the only basis Anson gives for concluding that Louis Vuitton suffered dilution at the hands of Dooney & Bourke. In the words of the Dura court, Torres exercised “independent judgment” that was “beyond [Anson’s] ken.” With respect to the regression analysis, Anson was not an expert but rather a “mouthpiece.” Louis Vuitton thus produced the wrong expert to prove the reliability of the regression analysis.

We note also that the Dura court grounded any permissible reliance on other experts on the guarantees that the opposing party can (1) depose the underlying experts in order to make sure they performed their tasks competently, and (2) ask the testifying expert whether he supervised them carefully and whether his relying on their assistance was standard practice in his field. But in this case, Torres was not listed as an expert and was not made available for deposition; and Anson in his own deposition made it plain that he exercised little if any supervision over Torres’s work.

For all these reasons, we conclude that testimony from Anson about the regression analysis must be excluded under Rule 702 because it is nothing but conduit testimony from an expert on a matter outside his field of expertise. As such it is unreliable and will not assist the jury.

But there is more. If Anson were allowed to relate the findings of the regression analysis at trial, his testimony would violate the hearsay rule. Anson would be relating the out-of-court statements of Torres, and those statements would be offered for the truth of Torres’s opinion. Fed.R.Evid. 801(c). Torres’s regression analysis is not admissible under any hearsay exception. It is not, for example, a business record, because it was prepared for purposes of litigation. Certain Underwriters at Lloyd’s, London v. Sinkovich, 232 F.3d 200 (4th Cir.2000) (expert report inadmissible as a business record because it was prepared in anticipation of litigation). It is true that under Rule 703, experts can rely on hearsay in reaching their own opinions. But a party cannot call an expert simply as a conduit for introducing hearsay under the guise that the testifying expert used the hearsay as the basis of his testimony. Under the 2000 Amendment to Rule 703, an expert is not precluded from relying on hearsay, but he is precluded from disclosing the hearsay to the jury unless its probative value in illustrating the basis of the expert’s opinion substantially outweighs the prejudicial effect of having the jury hear about the otherwise inadmissible hearsay. See Committee Note to 2000 Amendment to Rule 703 (“Rule 703 has been amended to emphasize that when an expert reasonably relies on inadmissible information to form an opinion or inference, the underlying information is not admissible simply because the opinion or inference is admitted.”). In this case, as Anson’s testimony about the regression analysis has no probative value independent of Torres’s report, it is clear that he could not disclose it to the jury as a basis for his own testimony under the stringent balancing test imposed by Rule 703.

iv. Unreliability of regression analysis

Even if Torres and not Anson were to testify at trial, we would find that Torres’s opinions could not be admitted under Rule 702 because he employed unreliable methodology. Torres conducted a regression on the basis of a single factor — the proportionate downturn in United States sales of Louis Vuitton Multicolore Monogram logo handbags during the subject period as compared to sales of those handbags in the rest of the world. But a reliable regression analysis requires the expert to consider more than a single factor. Standard and reliable methodology requires a “multiple” regression analysis, which determines the effect of two or more explanatory variables on a variable to be explained, called the dependent variable. The expert can then reliably determine the causal relationship, if any, between the explanatory variables and the dependent variable. See Munoz v. Orr, 200 F.3d 291 (5th Cir.2000) (statistical analysis excluded for failure to conduct a multiple regression analysis).
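
The difference between a single-factor and a multiple regression can be illustrated with a minimal sketch in Python. Every figure and variable name below is hypothetical, invented solely for illustration, and none of it is drawn from the record or from Torres’s analysis. The sketch assumes a made-up sales share that is driven entirely by a second factor (here labeled tourist purchases) which happens to fall at the same time a competitor enters the market. A regression on the competitor variable alone attributes the whole decline to the competitor; once the second variable is included, the competitor’s estimated effect is approximately zero.

```python
def ols(columns, y):
    """Ordinary least squares via the normal equations.

    Returns [intercept, b1, b2, ...], one slope per explanatory column.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data for eight periods (NOT from the record).
competitor_present = [0, 0, 0, 0, 1, 1, 1, 1]    # 1 once the competitor is on the market
tourist_purchases = [10, 9, 10, 9, 6, 5, 6, 5]   # invented index of a second possible cause
# By construction the share depends ONLY on tourist purchases.
us_share = [20 + 2 * t for t in tourist_purchases]

single = ols([competitor_present], us_share)
multiple = ols([competitor_present, tourist_purchases], us_share)

print("single-factor slope on competitor:", round(single[1], 6))
print("multiple-regression slope on competitor:", round(multiple[1], 6))
print("multiple-regression slope on tourist purchases:", round(multiple[2], 6))
```

In this constructed example the single-factor slope on the competitor variable is -8, while the multiple regression assigns the decline to the second variable. The sketch says nothing about what the actual sales data would show; it only illustrates why an analysis limited to one factor cannot distinguish among competing causes.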

Judge Scheindlin explained the necessity for a multiple regression analysis in Bonton v. City of New York, No. 03 Civ. 2833, 2004 WL 2453603, 2004 U.S. Dist. Lexis 22105 (S.D.N.Y. Nov. 3, 2004), a case in which the plaintiff’s expert conducted a regression analysis that considered discrimination as the only possible causative factor for a higher percentage of black children being remanded to foster care. Judge Scheindlin found that the expert’s report was inadmissible because it would not assist the trier of fact under Rule 702 and Daubert:

To determine whether there is causal link between race and the observed disparity in remand rates, it is necessary to conduct a multiple regression analysis to control for explanatory variables such as parents’ income level or employment status.... [I]t is impossible for the Court to agree with Zellner’s assumption that these factors are “unimportant” to an analysis of the outcomes of ACS investigations. As a result, for a jury to determine solely on the basis of Zellner’s report that a causal link exists between the observed disparity and an alleged policy of racial discrimination would require a logical leap that amounts to mere speculation.
Courts have repeatedly held that statistical analyses that fail to control for any nondiscriminatory explanations are inadmissible. ... I conclude, therefore, that Zellner’s proposed expert testimony would only confuse, rather than assist, the trier of fact.

In this case, Dooney & Bourke raises a number of legitimate alternative causes that should have been evaluated in a multiple regression analysis, including the possibilities of (1) disproportionate allocation of advertising; (2) relative increase in market interest in Japan given the fame of the designer, Takashi Murakami; (3) growth in new markets; (4) decrease in purchases by Japanese tourists in the United States; and (5) a greater incidence of counterfeiters in the United States. Most if not all of these factors are quantifiable and could therefore be analyzed in a regression analysis. Because Torres (and Anson as his conduit) did not take account of any of these plausible alternative causes, the regression analysis must be excluded under Rule 702. See also In re Wireless Tel. Servs. Antitrust Litig., 385 F.Supp.2d 403, 427 (S.D.N.Y.2005) (regression analysis excluded as irrelevant because it failed to “incorporate major independent variables”).

Louis Vuitton argues that Anson did take account of independent variables, and points to Anson’s deposition testimony where he stated that “we did ask those questions and we did look at those, and there is not a discernible major variable between the United States market having Dooney & Bourke and the rest of the world.” Among the variables Anson testified “we” considered were whether there was a “radical or major change in retail channels outside the United States or inside the United States”; “major changes in pricing policies”; and disproportionate increase in new store openings. The problems with Anson’s response are two: (1) other plausible causes raised by Dooney & Bourke and discussed above were not considered (e.g., disproportionate counterfeiting); and (2) Anson once again was the wrong witness to testify to what variables were considered and what were not, as by his own admission the regression analysis was conducted by Torres — and so Anson’s testimony on consideration of variables should be given little if any credit.

In response to the critique of Torres’s regression analysis, Louis Vuitton cites U.S. Info. Sys. v. Int’l Bhd. of Elec. Workers Local Union No. 3, AFL-CIO, 313 F.Supp.2d 213, 235, 238 (S.D.N.Y.2004) for the unremarkable proposition that the expert is not required to categorically exclude each and every possible alternative cause before singling out one possible causative factor. We do not reject the regression analysis in this case because Torres failed to rule out all possible causes other than Dooney & Bourke Multicolor Monogram handbags for the proportionate downturn in Louis Vuitton United States sales. We reject the regression analysis because Louis Vuitton has not set forth any credible testimony from a knowledgeable witness that the obvious alternatives were considered, analyzed, and ruled out.

3. Summary

Anson should be permitted to testify to the amount of net profits that Dooney & Bourke obtained from the allegedly infringing sales, subject to the condition that Dooney & Bourke cannot establish a connection between those sales and any of the general costs that Anson did not deduct. Anson’s testimony on all other matters should be excluded.

B. Dr. Bradford Cornell

1. Facts

Louis Vuitton moves to exclude the testimony and the expert report of Dr. Bradford Cornell. Dr. Cornell was retained by Dooney & Bourke to evaluate and determine whether Louis Vuitton “incurred any damages resulting from the alleged trademark infringement and/or ‘dilution’ by Dooney & Bourke.” Dr. Cornell was also retained to review and critique Anson’s expert report. Dr. Cornell bases his opinions on his review of hearing and deposition transcripts, material produced in discovery, spreadsheets prepared by Dooney & Bourke, websites of the parties, the 2006 Annual Report of Louis Vuitton, and a few articles about Dooney & Bourke and Louis Vuitton.

Dr. Cornell reaches the following conclusions: (1) Louis Vuitton “was not damaged as a result of the challenged activities of Dooney & Bourke” because it “did not reduce prices, miss sales forecasts, or cut production because of sales of the Dooney & Bourke products”; (2) Dooney & Bourke’s net profit from the multicolor handbags ranges “from $1.77 million to $14.39 million, depending upon the financial assumptions that are used”; and (3) Anson’s report contains “significant errors and his regression analysis on dilution is statistically unreliable.”

In coming to his conclusion that Louis Vuitton suffered no damages, Dr. Cornell cites “fundamental” economic theory, which, according to him, provides that the only way for a company to be economically damaged is to (1) miss sales forecasts, (2) reduce prices, (3) experience lower margins, or (4) cut planned production schedules. He reasons that none of these setbacks occurred to Louis Vuitton, because it did not reduce prices on the Multicolore Monogram handbag; it did not miss a sales forecast, given the fact that there were waiting lists for the handbags; because of that demand it did not cut production; and it did not experience lower margins. Dr. Cornell’s premise is that “all value comes from cash flow.” Therefore he did not consider any other way that Louis Vuitton might have been damaged other than the four possibilities described above.

Dr. Cornell corroborated his “fundamental conclusion” that Louis Vuitton suffered no damages by conducting “an analysis of Louis Vuitton sales trends in the United States as compared to the most relevant non-U.S. regions (where Dooney & Bourke does not have significant sales).” His analysis differed from that of Anson in the geographical scope of the comparison with Louis Vuitton’s United States sales. Anson’s comparison was with all sales in the world, while Dr. Cornell’s comparison was with “other mature countries in Europe and North America.” Dr. Cornell’s reasoning for limiting the comparison to those countries is that it “eliminates conclusions that may be caused by the growth in the Asian and developing markets.” When European sales were used as the explanatory variable, Dr. Cornell found no statistically significant difference in the proportion of United States sales of Multicolore Monogram handbags during the period in question.

Dr. Cornell then considered the Dooney & Bourke sales data in his regression analysis, on the ground that the “standard procedure would be to regress Louis Vuitton U.S. sales on Louis Vuitton international sales and Dooney & Bourke U.S. sales.” When he conducted that regression, he found that the results were “very robust.” He explains as follows:

Whether the rest of the world or European sales are used as the explanatory variable and whether the regression is estimated in arithmetic or logarithmic form, the coefficient for the Dooney & Bourke’s sales in the United States variable is positive and highly significant. This means that higher U.S. sales for Dooney & Bourke are associated with higher U.S. sales for Louis Vuitton, and vice versa. In other words, there is a positive relationship between Dooney & Bourke sales and Louis Vuitton sales in the United States. This result is fundamentally inconsistent with the claim that Dooney & Bourke sales eroded Louis Vuitton sales in the United States.

Dr. Cornell also provided an opinion on the amount of net profits that Dooney & Bourke obtained over the subject period. Dr. Cornell essentially used the “full absorption” method of deducting costs. Like Anson, he deducted the expenses associated with the design, production and sales of the multicolor handbags. But in addition Dr. Cornell deducted proportionate amounts of taxes, salaries and other overhead; this brought him to a figure of $14.393 million in net sales on the multicolor handbags.
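
The arithmetic difference between the two approaches to deducting costs can be shown in a short Python sketch. All figures are hypothetical and are not taken from either expert’s report. The sketch also assumes, purely for illustration, that “proportionate” means allocation by the product line’s share of company revenue; the source does not say what allocation base Dr. Cornell used.

```python
def net_profit(revenue, direct_costs, shared_costs=0.0, line_share=0.0):
    """Revenue less direct costs, less an optional pro rata share of shared costs."""
    return revenue - direct_costs - shared_costs * line_share


# All figures are hypothetical (NOT from the record), in arbitrary units.
line_revenue = 100.0      # sales of the accused product line
company_revenue = 400.0   # sales of the company as a whole
direct_costs = 60.0       # design, production, and sales costs of the line
shared_costs = 80.0       # company-wide taxes, salaries, and other overhead

# Deducting only the costs directly associated with the line.
direct_only = net_profit(line_revenue, direct_costs)

# "Full absorption": also deduct a share of company-wide costs.
# Allocation by revenue share is an assumption made for this sketch.
share = line_revenue / company_revenue
full_absorption = net_profit(line_revenue, direct_costs, shared_costs, share)

print("profit, direct costs only:", direct_only)
print("profit, full absorption:", full_absorption)
```

With these invented inputs the direct-cost figure is 40.0 and the full-absorption figure is 20.0, which shows only that absorbing a share of general costs lowers the profit figure by the amount allocated.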

Finally, Dr. Cornell’s report criticizes the Anson report on a number of grounds, including: (1) its analysis-free conclusion of “confusing similarity”; (2) its “fundamentally flawed” regression analysis, in which the regression model is “misspecified and misleadingly fails to take account of Dooney & Bourke’s actual sales”; (3) its failure to consider “the unique impact of Japanese sales during the relevant time period”; and (4) its failure to rule out other alternatives such as the impact of counterfeiters. Dr. Cornell notes that the model used by Anson “merely assumed that the only factor that might matter is sales by Dooney & Bourke.”

We do not further address Dr. Cornell’s critique of Anson’s regression analysis. Dr. Cornell’s criticisms are obviously in accordance with our recommendation to exclude Anson’s opinion on dilution because (for one thing) it is unreliable. If our recommendation as to Anson’s regression analysis is accepted, then there will be no need to admit Dr. Cornell’s critique of it. If for some reason Anson is permitted to testify to such matters, however, we believe that Dr. Cornell’s critique of Anson’s methodology is without question more than reliable enough to be admissible under Rule 702, for reasons stated in our own critique of Anson’s methodology.

2. Discussion

Louis Vuitton argues that Dr. Cornell’s testimony and report must be excluded on a number of grounds, including the following:

1. Dr. Cornell’s four-factor test for determining damages is an unsupported theory, suffering from “serious methodological deficiencies and flawed departures from accepted damages analysis under the Lanham Act,” and rendering his opinion not helpful to the trier of fact and unreliable under Rule 702.
2. Dr. Cornell’s statistical analysis of Louis Vuitton’s sales — offered in counterpoint to Anson’s analysis — must be excluded because it lacks a scientific basis.
3. Dr. Cornell’s report must be struck because it is constructed “more like a litigant’s Proposed Findings of Fact and Conclusions of Law” and thus “blatantly violates the limitations placed upon the role of an expert.”
4. In its reply memorandum, Louis Vuitton argues that Dr. Cornell’s calculation of Dooney & Bourke’s profits is unreliable because he used the “full absorption” method of assessing costs, and also improperly deducted taxes.

Our analysis of these arguments and others pertinent to Dr. Cornell’s testimony and report is necessarily colored by our previous recommendations with respect to Anson — and by our puzzlement at the mixed signals sent by Louis Vuitton about the need to provide expert testimony on the downturn in United States sales on one hand and yet to state that it is not seeking lost profits on the other. Because much of Dr. Cornell’s report is intended as a response to Anson, our previous recommendations will control some of the outcome with respect to Dr. Cornell.

a. Qualifications:

Dr. Cornell received a Master’s degree in Statistics and a doctorate in Financial Economics from Stanford. He is a Professor of Economics at the California Institute of Technology and Professor Emeritus at the Anderson Graduate School of Management at the University of California. He has edited a number of journals on matters of business and finance, and written more than 75 articles and two books on corporate finance and securities. He has received prizes and awards in his fields. He is unquestionably qualified to opine on such matters as sales data, valuation, statistical analysis and finance — which is to say, all of the matters covered in the analysis and conclusions in his expert report.

While not going so far as to argue that Dr. Cornell is unqualified under Rule 702, Louis Vuitton lodges several complaints that appear to be directed toward his qualifications. It stresses that Dr. Cornell was unfamiliar with “the basic principles of law relating to computation of damages in trademark cases.” But this misses the point of Dr. Cornell’s testimony. He does not purport to and does not in fact testify as an expert on trademark law, but rather as an expert on corporate valuation and statistical analysis. Louis Vuitton also complains that Dr. Cornell did not understand its contention that it was not seeking lost profits. But this is an argument about legal doctrine, not about Dr. Cornell’s qualification to testify as an expert on financial and statistical questions. Moreover, any misunderstanding of Louis Vuitton’s position on Dr. Cornell’s part is understandable given the fact that he was retained for the very purpose of rebutting Louis Vuitton’s expert on “damages”; one might well be confused when informed that Louis Vuitton was not seeking the damages it retained an expert to prove. Finally, Louis Vuitton’s argument that Dr. Cornell did not even understand the concept of goodwill is rejected for reasons discussed below.

b. The Challenge to Dr. Cornell’s Four-Factor Test for Assessing Damages

Louis Vuitton argues that the Cornell report is not helpful to the trier of fact “because it myopically focuses on Louis Vuitton’s lost profits, which Louis Vuitton has expressly (and repeatedly) stated it does not seek in this litigation.” Essentially Louis Vuitton is contending that Dr. Cornell’s testimony on lack of damages does not “fit” the case because Louis Vuitton is seeking the defendant’s profits and not its own damages.

Of course the same can be said — and was, see supra — about Louis Vuitton’s own professed “damages” expert, Anson. Louis Vuitton seems to want to have it both ways, i.e., to allow its own expert to testify to damages, but to exclude the defendant’s rebuttal expert because Louis Vuitton is not seeking damages.

It is difficult for us at this point, given Louis Vuitton’s position, to determine whether “damages” in the classic sense will be at issue in any respect if this case goes to trial. We believe the answer should be no, given Louis Vuitton’s many statements that it is not seeking its own damages but is rather seeking an accounting of Dooney & Bourke’s profits as a proxy for any damages it may have suffered but cannot or does not wish to prove. If those statements are to be taken at face value, then we believe that both Dr. Cornell’s testimony that Louis Vuitton suffered no damages and Anson’s testimony on Louis Vuitton’s lost profits should be excluded as irrelevant. Louis Vuitton’s concession that it is not claiming lost profits should be treated as taking the issue of lost profits out of the case.

However, if Louis Vuitton continues to raise arguments about the downturn of sales of its Multicolore Monogram handbag in the United States, then we believe that the only fair result is to reject, at trial, its argument that Dr. Cornell’s testimony on lack of damages is irrelevant. As Dooney & Bourke correctly points out, Louis Vuitton’s lack of economic harm is at the very least a relevant factor in determining whether and to what extent Louis Vuitton is entitled to an accounting of profits from the allegedly infringing handbags. See Bandag v. Al Bolser’s Tire Stores, Inc., 750 F.2d 903, 919 (Fed.Cir.1984) (noting that “an inability to show actual damages does not alone preclude a recovery under section 1117” but that lack of actual damages is relevant to the court in fashioning an equitable remedy). If Louis Vuitton is unwilling to concede lack of economic harm explicitly, then Dr. Cornell’s testimony on lack of damages will fit the facts in dispute. Put another way, Louis Vuitton should be found to “open the door” to testimony on lack of damages by presenting either evidence or argument of lost sales in the United States. If Louis Vuitton argues, for example, that it was harmed but that it does not have to prove it to recover Dooney & Bourke’s lost profits, then the existence and extent of economic damage remains a live issue and Dr. Cornell’s testimony fits the fact of the case.

We assume at this point for the sake of argument that Dr. Cornell’s damages testimony is relevant because Louis Vuitton leaves the issue of lost profits open either implicitly or explicitly. We turn now to whether Dr. Cornell used reliable methods in determining the impact of the alleged infringement on Louis Vuitton.

Louis Vuitton argues that Dr. Cornell’s method of determining damages is unreliable because it focuses solely on cash flow and “completely failed to analyze the impact of infringement on the goodwill of the Louis Vuitton Monogram Multicolore trademark.” But Dr. Cornell does in fact attribute monetary value to goodwill under his cash flow analysis. Louis Vuitton overstates the case in claiming that Dr. Cornell’s cash-flow theory constitutes an unreliable methodology. Cornell’s basic premise, that all corporate value is monetary, is quite unremarkable. Compare Frymire-Brinati v. KPMG Peat Marwick, 2 F.3d 183, 186 (7th Cir.1993), where the expert valued corporate assets by using a discounted cash flow analysis, under which he assessed property value solely on the basis of net, rather than potential, cash flow. Judge Easterbrook, writing for the court, found that the expert’s methodology was unreliable under Daubert because he failed to consider potential cash flow, and his methodology would lead to the conclusion that “raw land is worthless and that a large office building in the final stages of construction also has no value even though it is fully leased out and could be sold for a hundred million dollars.” Dr. Cornell’s method suffers no such flaw. Judge Easterbrook noted that “[t]o determine a market value using a discounted cash flow analysis, one must consider potential cash flows (for example, what the office building will produce after occupancy) and not simply historical cash flows.” Dr. Cornell’s deposition testimony, both in general and in the context of valuation of intellectual property and goodwill, indicates that his cash flow analysis does in fact sufficiently take account of future cash flows. Therefore his methodology is reliable under Daubert and Rule 702. See also F.D.I.C. v. Suna Associates, Inc., 80 F.3d 681 (2d Cir.1996) (expert using a valuation methodology that was a blend of two standard approaches to valuation was properly permitted to testify, given Daubert’s flexible and permissive approach; any internal contradictions in the expert’s testimony presented questions of weight and not admissibility).

Louis Vuitton’s complaint about Dr. Cornell’s valuation analysis is really a challenge to Cornell’s application of his cash-flow theory; this is a legitimate concern under Rule 702, which requires that a reliable methodology must be reliably applied. Dr. Cornell recognized that goodwill has cash value, but admits that he did not attempt to value goodwill in determining whether Louis Vuitton suffered any damages in this case. We agree that Dr. Cornell, in applying his cash flow methodology theory to Louis Vuitton, appears to have overlooked the possible long-term effect on cash flow that might occur through diminution of a mark and loss of goodwill. Dooney & Bourke argues that any assessment of loss of goodwill at the time of Dr. Cornell’s report would have been completely speculative given the disputes on whether there was even infringement and if so whether it resulted in any harm to Louis Vuitton. We have sympathy with that argument but note that damages testimony will always have some degree of speculation, and that Dr. Cornell does not appear even to have thought about the possibility of valuing goodwill by the time of his deposition.

Yet this possible oversight does not justify exclusion of Dr. Cornell’s testimony at trial. Dr. Cornell can correct this oversight before trial, by assessing the lost value from the diminution of the mark, assuming hypothetically that the mark was actually damaged by infringing conduct on the part of Dooney & Bourke. At that time, his assessment can be challenged as Daubert envisions, by cross-examination and argument to the fact-finder (again assuming that Louis Vuitton’s damages are even an issue in the case). See Ellis v. Gallatin Steel, 390 F.3d 461 (6th Cir.2004) (where expert misapplied his methodology in reaching his damage calculations before trial, this did not necessitate exclusion under Daubert, where the expert was using reliable methodology and corrected his error before trial).

c. The Challenge to Cornell’s Statistical Analysis of Louis Vuitton’s United States Sales

Louis Vuitton argues that Dr. Cornell’s statistical analysis of Louis Vuitton sales must be excluded because it “lacks scientific basis.” In analyzing this argument, we assume that Dr. Cornell’s report is going to be relevant at trial in the first place, which will only be the case if Louis Vuitton fails or refuses to make a clear concession that it has not suffered any lost profits from the sales of the allegedly infringing Dooney & Bourke handbags. We also assume that Anson’s testimony on lost profits will be excluded, for the many reasons we set forth in our review of his report and projected testimony, supra. It is clear to us that if Rules 403, 702, 703 and the hearsay rule are all permissive enough to allow Anson to testify on the basis of Torres’s single-factor regression analysis, then Cornell’s opinion is a fortiori admissible and we would need to go no further.

The question remaining is whether Dr. Cornell’s statistical analysis is admissible under Rules 702 and 403 if and when Anson’s testimony on the subject is excluded. Louis Vuitton contends that Dr. Cornell did not use reliable methods to justify his decision to compare United States sales with sales in Europe, as opposed to the rest of the world. Dr. Cornell’s explanation for this comparison was that “the European countries have demographics and other characteristics similar to the U.S., and have mature markets for Louis Vuitton products, whereas parts of Asia do not.” He cites no support for these assumptions; he relies on no articles or studies to justify either the empirical assumptions or the premise that a statistical analysis is more reliable (or reliable at all) when the comparison base is substantially narrowed. Moreover, as Louis Vuitton points out, the “European countries” include some countries in which the demographics are not in fact similar to the United States, especially when it comes to purchasing a luxury item like a Louis Vuitton handbag. Many of the countries excluded by Dr. Cornell (e.g., Singapore and Australia) are probably more similar in terms of the relevant demographic than are some of the countries he included. We conclude, therefore, that Dr. Cornell’s methodology was flawed to the extent that he picked European sales as the comparison to United States sales without using a reliable methodology to do so. See, e.g., Raskin v. Wyatt, 125 F.3d 55 (2d Cir.1997) (statistical analysis properly excluded where expert did not establish the validity of using a comparison group as a variable); Anderson v. Westinghouse Savannah R. Co., 406 F.3d 248 (4th Cir.2005) (statistician’s testimony properly excluded where his comparison group was not sufficiently similar for the statistical comparison to be reliable). Dr. Cornell’s change of the variable to the European market does not mean that his results are any more reliable than Anson’s.

But there is an additional aspect of Dr. Cornell’s study that merits attention, one that goes beyond merely reworking Anson’s analysis. Dr. Cornell at least attempted to assess the impact of Dooney & Bourke sales on Louis Vuitton, while Anson simply assumed Dooney & Bourke sales to be a causative factor. Dr. Cornell regressed Louis Vuitton United States sales on Louis Vuitton international sales and Dooney & Bourke United States sales. His findings indicated to him a positive correlation between Dooney & Bourke sales and Louis Vuitton sales in the United States; and this was so whether the explanatory variable is European sales or worldwide sales. If Dr. Cornell’s regressions are reliable, then the previously-discussed reliance on the European market as a variable is no longer a concern.

Unfortunately, Dr. Cornell has not given a good explanation of why he chose to conduct the regression the way he did. He simply states that it is “standard procedure” to do two regressions rather than one. We are not convinced that the flaws in a single regression are remedied by conducting two separate single regressions. Dr. Cornell’s analysis is not a multiple regression in the sense that it factors in a number of independent explanatory variables (in this case, for example, the possibility of proportionately greater counterfeiting of Louis Vuitton bags in the United States, or the possibility of a skewed market in Japan). Rather, he conducted two separate single regressions, perhaps under the theory that two is better than one.

Under Rule 702, the proponent of the expert must prove by a preponderance of the evidence that the expert used reliable methods and applied those methods reliably. Nothing in Dooney & Bourke’s submission proves that Dr. Cornell’s serial regression analysis is reliable; his conclusory statement that it is standard procedure is not sufficient to justify its admission; that is simply the ipse dixit of the expert.

We recognize that Dr. Cornell is not trying to prove that Dooney & Bourke sales caused an increase in Louis Vuitton sales. Dr. Cornell notes that a positive correlation does not itself justify opinion on causation. Rather, Dr. Cornell’s study is offered to prove that because there is a positive correlation, the increase in Dooney & Bourke sales could not have caused a decrease in Louis Vuitton sales. Yet even this opinion cannot be reliably drawn from a single-factor correlation. Without considering and regressing confounding factors, there is a very plausible argument that some independent market factor was positively affecting the sales of both companies.
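The confounding concern described above can be illustrated with a minimal sketch on synthetic data. Everything in it is a hypothetical assumption chosen for illustration: the series names (`market`, `rival`, `plaintiff`), the coefficients, the noise levels and the random seed. None of it is drawn from the record or from either expert's report. In the sketch, a common market factor lifts both sellers' sales while the rival's sales are built to reduce the plaintiff's sales. The raw correlation between the two sales series nonetheless comes out positive, and only a regression that also includes the market factor recovers the negative relationship.

```python
import random

def mean(values):
    return sum(values) / len(values)

def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_two_predictors(y, x1, x2):
    """Slopes (b1, b2) of y = b0 + b1*x1 + b2*x2, solved from the
    normal equations on mean-centered data."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Hypothetical data: all numbers below are invented for illustration.
rng = random.Random(0)
# Unobserved common factor, e.g. overall demand for a style of handbag.
market = [rng.gauss(100, 20) for _ in range(200)]
# Rival sales rise with the market.
rival = [0.5 * m + rng.gauss(0, 5) for m in market]
# Plaintiff sales rise with the market but are reduced by rival sales.
plaintiff = [2.0 * m - 0.8 * r + rng.gauss(0, 5)
             for m, r in zip(market, rival)]

# Single-factor view: correlation of the two sales series alone.
raw_corr = pearson(rival, plaintiff)

# Two-factor view: regress plaintiff sales on rival sales and the market.
b_rival, b_market = ols_two_predictors(plaintiff, rival, market)

print("raw correlation (rival vs. plaintiff):", round(raw_corr, 3))
print("rival coefficient, controlling for market:", round(b_rival, 3))
print("market coefficient:", round(b_market, 3))
```

The sketch does not suggest that any such factor was actually at work here. It shows only why a positive single-factor correlation, standing alone, cannot rule out the possibility that one seller's sales depressed the other's.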

Under the circumstances, we believe that the best result is to exclude Dr. Cornell’s findings of a correlation between Dooney & Bourke and Louis Vuitton sales, for the same reason that Anson’s findings should be excluded: the use of a regression with a single variable is not reliable. We reiterate that if Anson’s findings on causation are somehow found admissible, Dr. Cornell’s findings of a positive correlation should be admissible as well.

d. The Challenge to Cornell’s Report as Exceeding the Proper Scope of Expert Testimony

Louis Vuitton argues that pages 3-7 of Dr. Cornell’s report should be struck because they include “purported factual ‘background’ including unfounded opinions” such as an assertion about the strength of the Dooney & Bourke brand. Louis Vuitton further complains that “the entire body of the report is intertwined with over fifty footnotes carefully marshalling selected references to deposition and hearing transcripts, news articles and other materials.”

We note that Louis Vuitton’s own expert, Anson, goes well beyond Dr. Cornell’s asserted violations in his own report. As discussed above, about half of Anson’s report is nothing but unproved assertions about the strength of the Louis Vuitton mark, the lack of quality of Dooney & Bourke merchandise, exposition on trademark law, and opinions on the merits. The examples given by Louis Vuitton in Dr. Cornell’s report pale in comparison to the broad and unsubstantiated statements made throughout Anson’s report. Dr. Cornell’s elicitation of background facts looks much more like “context” for his expert opinion than do Anson’s broad assertions.

We certainly see the value of providing in an expert report a short introductory statement of the underlying facts of the case. But at trial, those facts should of course be proved by admissible evidence and not expert assertion. Thus if Dr. Cornell testifies, he should not be permitted to summarize the background facts, as that will not assist the jury. And if Dr. Cornell’s report is admitted at trial, the section on “Background” from pages 3-7 should be struck if offered to prove any of the facts related. See Highland Capital Mgmt., L.P. v. Schneider, 379 F.Supp.2d 461, 468-69 (S.D.N.Y.2005), vacated in part and remanded on other grounds, 485 F.3d 690 (2d Cir.) (when the expert “is simply rehashing otherwise admissible evidence about which he has no personal knowledge, such evidence — taken on its own — is inadmissible” because “an expert cannot be presented to the jury solely for the purpose of constructing a factual narrative based upon record evidence.”). See also In re Rezulin Prods. Liab. Litig., 309 F.Supp.2d 531, 551 (S.D.N.Y.2004) (“Dr. Gale’s ‘history of Rezulin’ is merely a narrative of the case which a juror is equally capable of constructing.... Such material, to the extent it is admissible, is properly presented through percipient witnesses and documentary evidence.” (citations omitted)).

e. The Challenge to Cornell’s Use of the “Full Absorption” Method

Louis Vuitton contends that Dr. Cornell miscalculated Dooney & Bourke’s profits, essentially by using the “full absorption” method of deducting costs. We conclude, as we did when reviewing the same issue involving Anson’s testimony, that the dispute over the appropriate method of accounting for costs against profits is a question of law and not a dispute about the reliability of expert testimony. The applicable law provides that Dr. Cornell’s method is appropriate so long as Dooney & Bourke proves at trial a “nexus between each expense claimed and the sales” of the multicolor handbags. Manhattan Industries, Inc. v. Sweater Bee by Banff, Ltd., 885 F.2d 1, 8 (2d Cir.1989). Contrary to Louis Vuitton’s suggestion, Dooney & Bourke does not need to prove the connection at the motion in limine stage, but rather must do so at trial. Dr. Cornell is not prohibited from testifying to his opinion as to the amount of net profits, subject to Dooney & Bourke proving the necessary connection.

Louis Vuitton argues that Dr. Cornell “apparently” deducted from Dooney & Bourke’s profits the attorney fees expended in this litigation. Such a deduction would be improper because it is not connected to the sale of the handbags. See New Line Cinema Corp. v. Russ Berrie & Co., 161 F.Supp.2d 293, 304 (S.D.N.Y.2001). But we see no such deduction mentioned in Cornell’s report, nor in his deposition, nor in Exhibit 7 to his report, which lists the expenses deducted. If Cornell has deducted such expenses, this is an error that can be corrected before he testifies and it can be made the subject of cross-examination at trial. Cummings v. Standard Register Co., 265 F.3d 56 (1st Cir.2001). So there is no reason to exclude Dr. Cornell’s testimony on profits on this ground.

Finally, Louis Vuitton contends that it was improper for Dr. Cornell to deduct a proportionate share of income taxes from Dooney & Bourke’s profits. This again is a dispute over the applicable law and does not raise a concern about the reliability of Dr. Cornell’s methodology itself. Louis Vuitton states that income taxes “generally are not a valid deduction from infringer’s profits because the damages award, if any, will be deductible to the defendant and taxable to the plaintiff.” We read the applicable case law to leave the deductibility of taxes from profits to the district court, exercising its discretion in equity. See, e.g., New Line Cinema Corp. v. Russ Berrie & Co., 161 F.Supp.2d 293, 304 (S.D.N.Y.2001) (exercising equitable discretion and allowing deduction of taxes from profits of the infringer). See also L.P. Larson, Jr., Co. v. William Wrigley, Jr., Co., 277 U.S. 97, 99-100, 48 S.Ct. 449, 72 L.Ed. 800 (1928) (rejecting deduction of taxes for a willful infringer, but holding that deductibility depends on the circumstances); W.E. Bassett Co. v. Revlon, Inc., 435 F.2d 656, 665 (2d Cir.1970) (allowing defendants to deduct income tax payments despite willful infringement); In Design v. K-Mart Apparel Corp., 13 F.3d 559, 566-567 (2d Cir.1994) (holding non-willful infringer was entitled to deduct taxes paid on its “innocently-acquired unlawful profits”). Under this case law, deductibility of the infringer’s taxes from its profits on the infringing goods depends at least in part on whether the infringement was intentional. Dooney & Bourke’s intent to infringe (assuming there is infringement at all) is obviously a fact that will be determined by the jury. Once again the dispute over deductibility of taxes is one that must be resolved at or after trial. Cornell should be permitted to testify to the amount of taxes that should be deducted from the profits, conditioned on proof of sufficient nexus, and subject to a finding of willfulness.

3. Summary

Dr. Cornell should be permitted to testify to Louis Vuitton’s lack of damages on the basis of his cash flow analysis, unless Louis Vuitton takes that issue out of the case. Dr. Cornell should also be permitted to testify to his opinion on the amount of Dooney & Bourke’s net profits from the sale of the allegedly infringing handbags, the relevance of which is subject to proof by Dooney & Bourke of a nexus between the profits and the overhead costs counted by Dr. Cornell (and, with respect to taxes, subject also to the district court’s equitable discretion on whether the deduction is justified). Finally, Dr. Cornell should not be permitted to testify on the basis of his statistical analysis of Louis Vuitton and Dooney & Bourke sales.

V. Conclusion

We realize that it may seem drastic to recommend exclusion of so much of the expert testimony before us: that is, all of the testimony and reports of the three survey experts, all of the testimony and the report of Dr. Holub, almost all of the testimony and the report of Anson and a good part of the testimony and report of Dr. Cornell. We know that some of the flaws we describe, if looked at individually, are properly considered as questions of weight and not admissibility. But we also know that the gatekeeper function imposed by Daubert and Rule 702 requires the proponent to prove that its expert has used reliable methods in a reliable manner; that experts are not allowed to testify on matters that are left for the jury; that an expert’s testimony must fit the substantive law and the facts of the case; that experts qualified in one area are not permitted to testify on other subject matter for which they are not qualified; and that questions of weight, when sufficiently accumulated, become so serious as to require exclusion.

We took each expert on his own merits; we believe that we applied the Daubert analysis and Rule 403 fairly to both sides. Most of our decisions presented easy cases. But even where that decision was easy, we tried to make sure that we gave each submission a fair reading with an evenhanded application of the law.

June 15, 2007.
      
      . Kumho Tire Co. v. Carmichael, 526 U.S. 137, 137, 119 S.Ct. 1167, 143 L.Ed.2d 238 (1999).
     
      
      . See Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993).
     
      
      . See Jinro Am. Inc. v. Secure Invs., Inc., 266 F.3d 993, 1004 (9th Cir.2001).
     
      
      . Daubert, 509 U.S. at 589, 113 S.Ct. 2786. The Supreme Court in Kumho extended this gatekeeping function to expert testimony based on "technical” and "other specialized” knowledge, in addition to testimony based on scientific knowledge. See Kumho, 526 U.S. at 141, 119 S.Ct. 1167.
     
      
      . Kumho, 526 U.S. at 142, 119 S.Ct. 1167 (citing General Electric Co. v. Joiner, 522 U.S. 136, 143, 118 S.Ct. 512, 139 L.Ed.2d 508 (1997)).
     
      
      . See Daubert, 509 U.S. at 588, 113 S.Ct. 2786.
     
      
      . United States v. 14.38 Acres of Land Situated in Leflore County, Mississippi, 80 F.3d 1074, 1078 (5th Cir.1996).
     
      
      . Manual for Complex Litigation (Fourth) § 23.24 (2004).
     
      
      . See Fed.R.Evid. 702 Advisory Committee Note.
     
      
      . Daubert, 509 U.S. at 596, 113 S.Ct. 2786. See also Manual for Complex Litigation (Fourth) § 23.24 (2004) (“Cross-examination and presentation of contrary evidence by the opposing party, as suggested in Daubert, would identify for the jury the shakiness of the foundation on which the conclusion is based.”).
     
      
      . Shari Seidman Diamond, Reference Guide on Survey Research, in Reference Manual on Scientific Evidence (Second) 229, 235 (2000).
     
      
      . See Schering Corp. v. Pfizer, Inc., 189 F.3d 218, 225 (2d Cir.1999).
     
      
      . See Kargo Global, Inc. v. Advance Magazine Publishers, Inc., No. 06 Civ. 550, 2007 WL 2258688, at *11 (S.D.N.Y. Aug. 6, 2007).
     
      
      . See Schering, 189 F.3d at 228.
     
      
      . Id. (citations omitted).
     
      
      . AHP Subsidiary Holding Co. v. Stuart Hale Co., 1 F.3d 611, 618 (7th Cir.1993).
     
      
      . Trouble v. Wet Seal, Inc., 179 F.Supp.2d 291, 307 (S.D.N.Y.2001) (citing Schering, 189 F.3d at 228). Accord Mastercard Int’l Inc. v. First Nat’l Bank of Omaha, Inc., Nos. 02 Civ. 3691, 03 Civ. 707, 2004 WL 326708, at *7 (S.D.N.Y. Feb.23, 2004).
     
      
      . 6/15/07 Report and Recommendation of the Special Masters ("R & R”).
     
      
      . Id. at 679. See also id. at 565-66 ("While courts in the Second Circuit rely mainly on Rule 403 to exclude unreliable surveys, we note that Rule 702 is clearly applicable as well, because the result of a survey is essentially expert testimony, and Rule 702 requires that such testimony must be reliable. The bottom line is that if the survey suffers from substantial flaws, it will be excluded under both Rule 403 and Rule 702.”).
     
      
      . Id. at 679.
     
      
      . See, e.g., Starter Corp. v. Converse, Inc., 170 F.3d 286, 296-98 (2d Cir.1999) (affirming district court’s exclusion of survey as irrelevant in trademark infringement case where survey did not test for or demonstrate likelihood of confusion and any probative value was outweighed by prejudicial effect); Universal City Studios, Inc. v. Nintendo Co., Ltd., 746 F.2d 112, 118 (2d Cir.1984) (finding survey to be "badly flawed” where it utilized an improper universe and posed an unfair and leading question to respondents); Kargo Global, 2007 WL 2258688, at *6-12 (excluding consumer confusion survey where it used improper stimuli and an impermissibly leading format that failed to approximate real-world conditions); Trouble, 179 F.Supp.2d at 307-08 (excluding survey under Rule 403 "[g]iven the lack of a proper universe and sample, the poor choice of location, the lack of proper stimuli, and questions that have little or no relevance to issues in the case”); Mastercard, 2004 WL 326708, at *9-10 (excluding survey under Rules 403 and 702 where survey’s flaws included a low number of respondents, the failure to address whether respondents were representative of people "whose potential confusion is relevant” and where the survey bore little similarity to the relevant real-world decision-making process). Other courts in the Circuit, however, have exercised their discretion to admit survey evidence despite methodological flaws. See, e.g., Friesland Brands, B.V. v. Vietnam Nat’l Milk Co., 221 F.Supp.2d 457, 461 (S.D.N.Y.2002) (admitting survey evidence where flaws, such as failure to offer standard translations of questions for non-English speaking respondents, "are not so obvious and egregious” that probative value is outweighed by prejudicial effect); Cache, Inc. v. M.Z. Berger & Co., No. 99 Civ. 12320, 2001 WL 38283, at *10-11 (S.D.N.Y. Jan. 16, 2001).
     
      
      
      . See, e.g., Citizens Fin. Group, Inc. v. Citizens Nat’l Bank of Evans City, 383 F.3d 110, 118-21 (3d Cir.2004) (affirming district court's exclusion of survey evidence under Rules 702 and 403 in trademark infringement case where survey relied on an improper universe and its questions were vague and imprecise). See id. at 121 (stating that the court properly fulfilled its gatekeeping duty to exclude the survey where these flaws were deemed to be fatal rather than merely technical).
     
      
      . Familiarity with the underlying facts of this litigation is assumed. For a thorough discussion of the factual background, see Louis Vuitton Malletier v. Dooney & Bourke, Inc., 340 F.Supp.2d 415, 419-28 (S.D.N.Y.2004), vacated in part, 454 F.3d 108 (2d Cir. 2006).
     
      
      . See Order Appointing and Directing Special Masters (the "May 18 Order”).
     
      
      . See id. ¶ 2.
     
      
      . See Objections of Plaintiff Louis Vuitton Malletier to the Report & Recommendation of the Special Masters Dated June 15, 2007.
     
      
      . See Memorandum in Support of Dooney & Bourke’s Motion to Adopt the R & R of the Special Masters (And Conditional Objections) (“Def. Mem.”) at 17-25. Dooney & Bourke states that their objections are “only made should the Court grant an objection or motion for modification from [Louis Vuitton]; if the Court should deny all of [Louis Vuitton]'s objections, [Dooney & Bourke] withdraws its conditional objection[s].” Id. at 2.
     
      
      . Fed.R.Civ.P. 53(g)(1).
     
      
      . See May 18 Order at ¶ 8(A). See also Fed.R.Civ.P. 53(g)(4).
     
      
      . See May 18 Order at ¶ 8(B). See also Fed.R.Civ.P. 53(g)(3).
     
      
      . See May 18 Order at ¶ 8(C). See also Fed.R.Civ.P. 53(g)(5).
     
      
      . Bourjaily v. United States, 483 U.S. 171, 175-76, 107 S.Ct. 2775, 97 L.Ed.2d 144 (1987). See also Velez v. Sony Discos, No. 05 Civ. 0615, 2007 WL 120686, at *4 (S.D.N.Y. Jan. 16, 2007) ("[T]he proponents of the [expert] Report[ ] bear the initial burden of demonstrating the admissibility of the testimony.”).
     
      
      . Fed.R.Evid. 702.
     
      
      . 509 U.S. at 597, 113 S.Ct. 2786.
     
      
      . Bickerstaff v. Vassar Coll., 196 F.3d 435, 449 (2d Cir.1999) (quotation omitted).
     
      
      . See Daubert, 509 U.S. at 595, 113 S.Ct. 2786.
     
      
      . See United States v. Lumpkin, 192 F.3d 280, 289 (2d Cir.1999).
     
      
      . United States v. Bilzerian, 926 F.2d 1285, 1294 (2d Cir.1991). Accord Hygh v. Jacobs, 961 F.2d 359, 363 (2d Cir.1992) ("This circuit is in accord with other circuits in requiring exclusion of expert testimony that expresses a legal conclusion.”).
     
      
      . Andrews v. Metro N. Commuter R.R. Co., 882 F.2d 705, 708 (2d Cir.1989) (citing cases).
     
      
      . Fed.R.Evid. 403.
     
      
      . Daubert, 509 U.S. at 595, 113 S.Ct. 2786 (quotation omitted).
     
      
      . 7/6/07 Letter from Theodore C. Max, LV’s counsel, to the Court at 3.
     
      
      . Id. at 1.
     
      
      . See id. at 1-3.
     
      
      . Id. at 2.
     
      
      . Id.
      
     
      
      . See 7/10/07 Letter from Douglas D. Broadwater, Dooney & Bourke's counsel, to the Court at 1, 3.
     
      
      . Id. at 2.
     
      
      . See id. at 3. Dooney & Bourke also notes that it could have objected to Special Master Beebe’s appointment on the ground that he had thanked Louis Vuitton’s survey expert, Dr. Jacob Jacoby, in a law review article published in 2006, and had once referred to him as a leading expert in his field. According to Dooney & Bourke, this constitutes a far more legitimate claim of bias, yet it never "asserted such a claim because it would not have passed the good faith test.”
     
      
      . See 7/11/07 Letter from Special Master Beebe to the Court (“Special Master Beebe Letter”) at 1-2. Sheff also emailed Special Master Beebe in June 2007 regarding the legal market for professors, but had not received a response as of the date of Special Master Beebe’s letter to the Court. See id. at 2.
     
      
      . Id. at 2.
     
      
      . See id.
      
     
      
      . Fed.R.Civ.P. 53(a)(2).
     
      
      . 28 U.S.C. § 455(a).
     
      
      . DeLuca v. Long Island Lighting Co., Inc., 862 F.2d 427, 428 (2d Cir.1988).
     
      
      . Id. at 428-29 (quoting Pepsico, Inc. v. McMillen, 764 F.2d 458, 460 (7th Cir.1985)).
     
      
      . 28 U.S.C. § 455(b)(1).
     
      
      . See 8/16/06-8/20/06 Email Correspondence Between Special Master Beebe and Jeremy Sheff Esq., attached to Special Master Beebe Letter.
     
      
      . DeLuca, 862 F.2d at 428.
     
      
      . 28 U.S.C. § 455(b)(1).
     
      
      . See, e.g., Faulkner v. National Geographic Soc., et al., 296 F.Supp.2d 488 (S.D.N.Y.2003) (denying plaintiffs request for recusal where judge once represented a subsidiary of a minor defendant in an unrelated matter, and where judge's former law firm colleague served as board member for main defendant during their colleagueship), aff'd, 409 F.3d 26 (2d Cir.2005); Local 338, RWDSU v. Trade Fair Supermarkets, 455 F.Supp.2d 143 (E.D.N.Y.2006) (denying defendant's request for recusal under section 455(a) where plaintiff’s counsel’s father was an acquaintance and fellow partner at the judge’s former law firm, and her mother briefly worked with the judge in the past in connection with a discrete matter). Cf. McMillen, 764 F.2d at 460-61 (granting petition for writ of mandamus directing recusal of judge where appearance of impropriety was created when judge’s agent mistakenly contacted law firms, representing parties in pending action, regarding judge’s possible future employment at those firms).
     
      
      . Notably, in advancing its argument for disqualification, plaintiff cites Special Master Beebe's “anti-trademark protection views” and “the pro-Dooney nature of the R & R” as further grounds for disqualification. 7/16/07 Letter from Michael A. Grow, Esq., plaintiff's counsel, to the Court at 2. The former criticism is more properly raised prior to the appointment of a special master, while the latter is merely sour grapes; in any dispute, one side loses and one side wins. This is not a ground for disqualification.
     
      
      . R & R at 568.
     
      
      . See id. at 591-98.
     
      
      . Id. at 598.
     
      
      . See id. at 599-600.
     
      
      . See id. at 600. See also Louis Vuitton, 340 F.Supp.2d at 442-45, 449-51.
     
      
      . R & R at 602. See also Louis Vuitton, 340 F.Supp.2d at 444.
     
      
      . R & R at 603.
     
      
      
      . See id. at 603-04.
     
      
      . See id. at 604 (“Indeed, the question [of whether defendant required permission or a license from plaintiff] asked the respondents the very question that this litigation seeks to answer.”).
     
      
      . Id.
      
     
      
      . See id. at 609-11.
     
      
      . Id. at 612.
     
      
      . Louis Vuitton, 340 F.Supp.2d at 451 n. 196. See also R & R at 612-13.
     
      
      . R & R at 639.
     
      
      . Id. at 641.
     
      
      . See id. at 643-44. See also Price v. Fox Entm’t Group, Inc., et al., 499 F.Supp.2d 382, 389 (S.D.N.Y.2007) (holding expert testimony on probative similarities between works is unnecessary where the works are not highly technical and the "jury is capable of recognizing and understanding the similarities between the works without the help of an expert”).
     
      
      . R & R at 643.
     
      
      . See id. at 680 n. 355.
     
      
      . Id. at 646 (citing Louis Vuitton Malletier v. Dooney & Bourke, Inc., 500 F.Supp.2d 276, 282, 283 (S.D.N.Y.2007)).
     
      
      . See Louis Vuitton, 500 F.Supp.2d at 279 n. 8 (citing George Basch Co. v. Blue Coral Inc., 968 F.2d 1532, 1537 (2d Cir.1992)). See also R & R at 647.
     
      
      . R & R at 646.
     
      
      . Id. at 648.
     
      
      . Id. at 649.
     
      
      . See id. at 648-49 (“The risk that Dr. Holub's opinions will usurp the jury’s decision-making on the question of likelihood of confusion substantially outweighs the attenuated probative value of similar color choice when offered only to prove intent to copy Louis Vuitton’s mark.”).
     
      
      
      . See id. at 648.
     
      
      . See id.
      
     
      
      . See id. at 646-47.
     
      
      . Id. at 646-47.
     
      
      . Id. at 649.
     
      
      . See id.
      
     
      
      . Id.
      
     
      
      . Hnot v. Willis Group Holdings Ltd., No. 01 Civ. 6558, 2007 WL 1599154, at *3 (S.D.N.Y. June 1, 2007) (citing United States v. Snype, 441 F.3d 119, 130 (2d Cir.2006)).
     
      
      . Snype, 441 F.3d at 130 (citing Zafiro v. United States, 506 U.S. 534, 540-41, 113 S.Ct. 933, 122 L.Ed.2d 317 (1993)).
     
      
      . R & R at 652.
     
      
      . Id. at 669.
     
      
      . See id. at 654.
     
      
      . See id. at 656.
     
      
      . See id. at 656.
     
      
      . See id. at 656-57. The Special Masters further noted that Dooney & Bourke's burden of connecting general expenses to the sales of the allegedly infringing items must be met at trial before the jury. See id. at 663.
     
      
      . Id. at 663.
     
      
      . Id. at 663-64.
     
      
      . See id. at 661-62 (citing Objections and Responses to Interrogatory No. 17, Further Amended Response to Dooney & Bourke’s Third Set of Interrogatories).
     
      
      . See id. at 664-66.
     
      
      . Id. at 666.
     
      
      . See Def. Mem. at 16-17.
     
      
      . See R & R at 612.
     
      
      . See Louis Vuitton, 340 F.Supp.2d at 445-46, 439 n. 124.
     
      
      . See R & R at 612.
     
      
      . Def. Mem. at 19.
     
      
      . See id.
      
     
      
      . Id.
      
     
      
      . See id. at 21.
     
      
      . R & R at 632.
     
      
      . Id.
      
     
      
      . See id. at 621 (“Instead, all respondents at a given mall location were either exposed to bags bearing the name sign or to bags not bearing the name sign” or to the control bag).
     
      
      . See id. at 633-34.
     
      
      . See id. at 633 ("Thus a survey that purports to approach market conditions pertinent to the substantive standard of likelihood of confusion should try to take account of the possibility of sequential viewing.”). See also Louis Vuitton, 454 F.3d at 117 (similarity of the marks to be assessed "when viewed sequentially in the context of the marketplace”).
     
      
      . See R & R at 632 ("Reitter's sampling method, insofar as it precludes within-location comparison, diminishes the reliability and probative value of the 2006 [c]onfusion [s]urvey — though it is not the kind of fundamental error that would mandate exclusion on its own”). See also id. at 632-33 ("The low number of respondents is one more factor that diminishes the reliability and probative value of the 2006 Reitter [c]onfusion [s]urvey”); id. at 633 ("The use of the ‘Eveready’ method of presentation — at least its exclusive use — diminishes the reliability and probative value of the survey and correspondingly raises the risk of jury confusion and prejudice.”).
     
      
      . R & R at 615.
     
      
      . Id. at 636 (quoting Louis Vuitton, 340 F.Supp.2d at 451).
     
      
      . Id. at 637.
     
      
      . Id.
      
     
      
      . Id. at 638.
     
      
      . See id. ("[T]he flawed presentation adds to the case for inadmissibility — so even if the misguided nature of the enterprise were not enough to exclude the [d]ilution [s]urvey, the flaw in presentation quells any doubts about its exclusion.”).
     
      
      . See id. at 639 (stating that although the failure to ask follow-up questions is a methodological flaw that does not, on its own, warrant exclusion, "it does add strength to the case for exclusion ....”).
     
      
      . Id. at 639.
     
      
      . Def. Mem. at 24.
     
      
      . Id.
      
     
      
      . R & R at 637.
     
      
      . See also MANUAL FOR COMPLEX LITIGATION, FOURTH § 11.493 (Federal Judicial Center 2004) (setting out seven criteria). See generally REFERENCE MANUAL ON SCIENTIFIC EVIDENCE at 236-72 (Federal Judicial Center, 2d ed.2000) (discussing criteria to be considered to determine the admissibility of and weight to be accorded to survey evidence).
     
      
      . See NERA Economic Consulting, Survey of Designer Handbag Purchasers (January 12, 2007) ("Ericksen Report”).
     
      
      . In total, 316 interviews were conducted. Seven of these were excluded because they did not validate in the post-interview validation process and one was excluded because the interviewer did not follow proper interview procedure. Ericksen Report, Appendix B at 5-6.
     
      
      . Ericksen Report at 5-6.
     
      
      . Id. at 8-9.
     
      
      . Id. at 12.
     
      
      . Id., Appendix F.
     
      
      . See 2/21/07 Deposition of Eugene P. Ericksen (“Ericksen Dep.”) at 8:23-9:5.
     
      
      . Id. at 63:18-22.
     
      
      . Id. at 9:6-11.
     
      
      . Ericksen Report at 2.
     
      
      . Ericksen Dep. at 42:4-13.
     
      
      . Ericksen Report at 10.
     
      
      . Ericksen Report, Appendix B at 1. We were unable to find anywhere in the record a definition of "upscale” or "higher than average income" as Dr. Ericksen used those terms.
     
      
      . Ericksen Report, Exhibit B. at 44:11-15.
     
      
      . Ericksen Dep. at 52:4-9.
     
      
      . Ericksen Report, Appendix B at 2.
     
      
      . Id. at 4-5.
     
      
      . See, e.g., Ericksen Report, Appendix K (Spreadsheet of Verbatim Reponses) (“Verbatim Responses”).
     
      
      . See DVDs labeled Handbag Video # 1 (“Video #1”), Handbag Video #2 (“Video # 2”), and Handbag Video # 3 ("Video # 3”) accompanying the Ericksen Report.
     
      
      . Ericksen Report at 4.
     
      
      . See Ericksen Report, Appendix C.
     
      
      . See 3/16/07 Declaration of R. Corey Worcester in Support of Dooney & Bourke, Inc.'s Motion to Exclude the Reports, Testimony and Opinions of Dr. Jacob Jacoby and Dr. Eugene Ericksen ("3/16/07 Worcester Decl.”), Exhibit J at 9.
     
      
      . See Ericksen Report, Appx. C.
     
      
      . See 3/16/07 Worcester Decl., Exhibit K at 1.
     
      
      . See Memorandum in Support of Dooney & Bourke, Inc.'s Motion in Limine to Exclude the Reports, Testimony and Opinions of Dr. Jacob Jacoby and Dr. Eugene Ericksen at 9-10.
     
      
      . In one of the patches, the "C" may have been printed in two colors, though it is difficult to tell from Video # 3 or from the exhibits submitted by the parties.
     
      
      . See 3/16/07 Worcester Decl. Exhibits D and E; 4/6/07 Declaration of R. Corey Worcester in Support of Dooney & Bourke, Inc.’s Motion in Limine to Exclude the Reports, Testimony and Opinions of Dr. Jacob Jacoby and Dr. Eugene Ericksen ("4/6/07 Worcester Decl.”), Exhibit 6.
     
      
      . See 3/16/07 Worcester Decl., Exhibit E at 3-5; 4/6/07 Worcester Decl., Exhibit 6.
     
      
      . See 3/16/07 Worcester Decl., Exhibit E; Declaration of John Lund ("Lund Decl.”), Exhibit 1 at 2, and ¶ 3.
     
      
      . Lund Decl., ¶ 3.
     
      
      . Id. at ¶ 2.
     
      
      . Verbatim Responses, Resp. No. 230 ("[T]he colors were not clear.”); Resp. No. 248 ("I couldn't see the initial on the bag.”); Resp. No. 278 ("I didn’t see any letters or anything just patterns that I didn’t recognize”); Resp. No. 300 ("It looked like Louis Vuitton but you can’t really see the initials.”); Resp. No. 431 ("the design, it was hard to decipher though.”). The Ericksen Survey’s record of the respondents’ verbatim responses was replete with spelling and grammatical errors. For ease of comprehension, we have corrected many of them in our quotations from the Verbatim Responses.
     
      
      . Id., Resp. No. 219 (shown Video # 2, "because the logo on the handbag says D and B”); Resp. No. 236 (shown Video # 1, "because I saw the D B logo on the purse”); Resp. No. 247 (shown Video # 1, “the D and B signs and the heart chain that hangs in the back”); Resp. No. 260 (shown Video # 1, "the design of the little D’s on the bag”); Resp. No. 271 (shown Video # 1, "the letters D and B and the coloring”); Resp. No. 281 (shown Video # 1, "because I can see the D B”); Resp. No. 286 (shown Video # 2, "the signature on the bag was DB”).
     
      
      . Ericksen Dep. at 192:10-16 ("[W]e don’t know whether they saw the design then they identified the letter because they knew they would have to be there as part of the design. Or whether they read the letters first then knew what it was. We don’t really know which is the horse and which is the cart in that situation.”).
     
      
      . See DVD labeled “Hand Bags” accompanying the Ericksen Report.
     
      
      . Ericksen Dep. at 67:7-11.
     
      
      . See, e.g., Verbatim Responses, Resp. No. 126 ("the C on it”); Resp. No. 132 ("the C's and the designs”); Resp. No. 172 ("because there were C’s all over it”); Resp. No. 193 ("the patchwork. The C’s, the logos”); Resp. No. 212 ("the signature C’s”) (spelling corrected).
     
      
      . See, e.g., Verbatim Responses, Resp. No. 363 ("I’ve seen the bag in the store, in the Coach store”); Resp. No. 390 ("a Coach patchwork bag that is currently being sold in the stores”); Resp. No. 482 ("I have seen it in the stores a bunch and I have the catalog at home”) (spelling corrected).
     
      
      . See Verbatim Responses, Resp. No. 478 ("I just saw it in the Coach store 20 minutes ago”).
     
      
      . See id., Resp. No. 248 ("A lady in a white coat with fur with a Louis Vuitton or a Gucci bag. I couldn’t see the initials on the bag.”).
     
      
      . See id., Resp. No. 300 (“It looked like Louis Vuitton but you can’t really see the initials”); Resp. No. 455 (“I don't know, maybe Louis Vuitton”).
     
      
      . See id., Resp. No. 406 ("Maybe Louis Vuitton”).
     
      
      . See id., Resp. No. 248 (“I've looked at their bags and I liked them. Actually I’ve bought a Louis Vuitton earlier this year. (P) I think on the bag there was a gold tag on the bag that shows it is not a knock-off. I also thought I saw the lettering but I'm not sure.”); Resp. No. 268 ("the writings on the bag looks almost like the Louis Vuitton writings on his products (P) n/e” ("(P)" indicates that the respondent was prompted to elaborate)); Resp. No. 300 ("(P) the colors (?) the initials (P) the design looks a lot like a Louis Vuitton one I own”); Resp. No. 304 ("the initials on the bag. It was the Louis logo.”); Resp. No. 316 (“The little symbols reminded me of Louis Vuitton. (P) All the different colors reminded me of a Louis Vuitton pattern.”); Resp. No. 409 ("because it is the VL. It kind of looked like it from the distance.”); Resp. No. 422 ("It has the print.”); Resp. No. 455 ("They have that white purse with their letters all over it in colors.”); Resp. No. 465 ("The letters and the shape of the bag.”).
     
      
      . Id., Resp. No. 228 ("because of the way it looks”); Resp. No. 249 (“I seen that [sic] a bag like that with colors”); Resp. No. 311 (“the color and the pattern”); Resp. No. 318 ("the design of its looked like something I saw Paris Hilton carrying”); Resp. No. 332 ("because of the little labels and the design of the bag made it look like that was the designer”); Resp. No. 351 ("because the bag looks similar to one of their bags”); Resp. No. 371 ("the straps that are on the bag”); Resp. No. 429 ("because that is one of their signature designs”).
     
      
      . See id., Resp. No. 224 ("because it looks like a designer handbag and the symbols on the bag made it look like an expensive hand bag”); Resp. No. 401 ("It looks like a Louis Vuitton bag.”); Resp. No. 406 ("well really LVMH, that what [sic] they belong to. They had it first and the bag became very popular and then you saw that Dooney and Bourke had it after.”); Resp. No. 486 ("It’s a Louis Vuitton Papillon, anyone can tell that.”).
     
      
      . Id., Resp. No. 182 ("Possibly Louis Vuitton, but it didn’t have their logo. I'm not sure.”).
     
      
      . See id., Resp. No. 182 ("because they are known for their logo with the initials and the brown and gold logo”); Resp. No. 213 ("because of the print”); Resp. No. 308 ("Patterns-Louis”); Resp. No. 312 (“(P) the lettering. (P) A satchel over the shoulder tote. (P) It was a multicolor pattern all over the bag.”); Resp. No. 434 ("The little Louis Vuitton patterns on the bag look like the trademark for Louis Vuitton”); Resp. No. 472 ("I have one and know what it looks like. The pattern is has LV all over it.”); Resp. No. 469 (“I've seen the white version of that bag before. (P) The Louis Vuitton patterns are the same on this bag except that the bag is black.”).
     
      
      . See id., Resp. No. 128 ("because I have seen the design before”); Resp. No. 175 (“because of the design”); Resp. No. 267 ("That’s what it looked like. (P) The handles and the tag hanging and the flat base bottom.”); Resp. No. 272 ("the style of the bag, (P) the color of the straps”); Resp. No. 277 ("That’s their design. (P) I think it’s theirs. (P) That’s what they put on their stuff.”); Resp. No. 340 ("The pattern, how it’s colored. (P) That’s all.”); Resp. No. 428 ("Because of the colors and structure of it I guess. (P) It is colorful and that is how Louis Vuitton makes theirs.”); Resp. No. 247 ("They had the multicolored design first.”).
     
      
      . See id., Resp. No. 428 ("Because of the colors and structure of it I guess. (P) It is colorful and that is how Louis Vuitton makes theirs.”).
     
      
      . See id., Resp. No. 150 ("because I have one”); Resp. No. 206 (“from a first glance that's what it looked like”); Resp. No. 346 ("nothing”); Resp. No. 399 ("It looks like a Louis Vuitton bag.”); Resp. No. 417 ("I’m familiar with Louis Vuitton.”); Resp. No. 419 ("The material, it looked like Louis Vuitton and it looked like I have seen that bag before.”); Resp. No. 405 ("because some people will think they are buying those bags”); Resp. No. 400 ("The[y] make a line of stuff. He makes something like that").
     
      
      . See id., Resp. No. 370. See also Ericksen Report at 9.
     
      
      . See id., Resp. No. 427 ("I don't know if it is Coach maybe.”).
     
      
      . See id., Resp. No. 134 (“I do not know.”); Resp. No. 143 ("I have no idea.”); Resp. No. 168 ("I'm going to say Gucci. But I don't really know.”); Resp. No. 183 ("I don't know. I would say (patchwork). I[t] didn’t look like it was leather, it can't be Dooney, can’t be Coach. It might be Kors, Michael Kors. It has a lot of patchwork.”); Resp. No. 201 ("no idea”); Resp. No. 215 ("Maybe Guess, I don't know who makes it.”); Resp. No. 266 ("I don't know”); Resp. No. 269 ("I really don't know.”); Resp. No. 273 (“I have no idea.”); Resp. No. 276 ("I don't know.”); Resp. No. 301 ("maybe Gucci”); Resp. No. 319 ("It wasn't Oscar de la Renta. I don't know.”); Resp. No. 323 ("could possibly be a Dooney and Bourke”); Resp. No. 341 ("I don’t know.”); Resp. No. 423 (“no clue ... I don’t know.”).
     
      
      . Ericksen Report at 8.
     
      
      . We were unable to determine from Dr. Ericksen’s report or testimony what level of statistical significance he was using.
     
      
      . Ericksen Report at 10.
     
      
      . See 3/16/07 Worcester Decl. at 8 (Table 3 entitled "Responses Identifying Handbag as Louis Vuitton and Considered Evidence of Dilution Only”).
     
      
      . Verbatim Responses, Resp. No. 431.
     
      
      . Ericksen Report at 10.
     
      
      . See Verbatim Responses, Resp. No. 138 (“They have the same patterns on their bags and their symbols are similar.”); Resp. No. 334 ("because they have a white leather bag that has the different color LVs on it”).
     
      
      . See id., Resp. No. 152 (“The colors are similar.”); Resp. No. 220 (“because they make the same style”); Resp. No. 251 (“They have the similar style by [sic] the Louis Vuitton have a cleaner white.”); Resp. No. 278 ("By the style of it. (P) The shape of it. The way it looked.”); Resp. No. 295 ("I’ve seen something similar that they make. It looks very similar.”); Resp. No. 347 ("because they make one similar to that”); Resp. No. 439 ("Because of the shape of the bag.”); Resp. No. 480 ("The colors and the color of the bag are very much alike.”).
     
      
      . Id., Resp. No. 192 ("because I just know it.”); Resp. No. 203 (“because that is the one that I know for sure”); Resp. No. 403 ("only flashy brand that I could think of off hand.”).
     
      
      . Id., Resp. No. 366 ("I like that was Louis Vuitton, but it looked fake.”).
     
      
      . Id., Resp. No. 438 ("I saw a knock off of a purse.” (responding to Question 2)).
     
      
      . See id., Resp. No. 133 ("because of the letters being on the bag”); Resp. No. 322 ("A lot of them have leather purses and both of them have the 2 letter symbols.”).
     
      
      . See id., Resp. No. 254 (“It looked like a Louis Vuitton bag.”); Resp. No. 367 (“They look a lot alike, but Dooney and Bourke have the little heart.”); Resp. No. 372 (“design and price”); Resp. No. 443 ("It has the same style.”); Resp. No. 451 (“because of the style of the handbag”).
     
      
      . See Ericksen Dep. at 278:11-279:5, in Worcester April 6 Decl., Exhibit 5.
     
      
      . Ericksen Report at 11.
     
      
      
      . Id. at 11.
     
      
      . See, e.g., Transcript of the February 12, 2007 Hearing Before the Honorable Shira A. Scheindlin at 55:20-21 ("You can’t protect the look. We’ve gone over and over this.”).
     
      
      . Memorandum of Louis Vuitton Malletier in Opposition to Dooney & Bourke, Inc.'s Motion in Limine to Exclude the Report, Testimony and Opinions of Dr. Eugene Ericksen and Dr. Jacob Jacoby at 14.
     
      
      . Verbatim Responses, Resp. No. 403 (“only flashy brand that I could think of off hand.'').
     
      
      . While we find that the Ericksen survey should be excluded for its many methodological flaws and fundamental lack of focus, we do reject out of hand Dooney & Bourke’s assertion that the survey was flawed by filming the model with a so-called expensive fur coat, thus raising an inference that she was carrying a Louis Vuitton luxury bag. We have viewed the film and can state confidently that the coat does not appear luxurious at all. It has a faux fur trim and is unremarkable in every respect. Moreover, the model walks back and forth in front of a cinderblock wall — this hardly exudes an aura of luxury.
     
      
      . Dr. Jacob Jacoby is the Merchant’s Council Professor of Consumer Behavior and Retail Management at New York University's Leonard N. Stern Graduate School of Business. He is responsible for numerous, highly significant publications in the area of consumer behavior. The parties do not dispute his qualifications to conduct any of the surveys discussed here. We note that we consider Dr. Jacoby's stellar qualifications as circumstantial evidence of the reliability of his methods. See, e.g., United States v. Downing, 753 F.2d 1224, 1239 (3d Cir.1985) (Becker, J.) ("The qualifications and professional stature of expert witnesses ... may also constitute circumstantial evidence of the reliability of the technique.”). As will be seen, however, the fundamental flaws in Dr. Jacoby's methodology and execution in this case more than counter any circumstantial evidence provided by his qualifications.
     
      
      . See Expert Report of Jacob Jacoby, Ph.D., on Likelihood of Confusion, April 2004 ("Jacoby Confusion Survey Report”).
     
      
      . See Expert Report of Jacob Jacoby, Ph.D., on Dilution, April 2004 ("Jacoby Dilution Survey Report”).
     
      
      . See Jacoby Confusion Survey Report at 4.
     
      
      . See id. at 13-18.
     
      
      . Id. at 24-25.
     
      
      . Jacoby Dilution Survey Report at 6.
     
      
      . Id. at 5-6.
     
      
      . Id. at 4-5.
     
      
      . Id. at 8-9.
     
      
      . Id. at 24.
     
      
      . For example, Appendix J of the report consists of a tabulation, by respondent number, of how the 109 respondents were classified. See Jacoby Confusion Survey Report, Appendix J. The table reports that respondents were confused upon being exposed to Bags 33, 34, 35 and 36, yet the report itself never mentions Bags 35 and 36.
     
      
      . In making this recommendation, we emphasize again that we take due note of the fact that Dr. Jacoby is a highly qualified expert in the field. See Ambrosini v. Labarraque, 101 F.3d 129, 140 (D.C.Cir.1996) (witness's strong credential provided "circumstantial evidence” that he employed a scientifically valid methodology). But the jumbled nature of the Jacoby Confusion Survey, and the disconnect between the report of the survey and its disjointed implementation, far outweigh any inference derived from Dr. Jacoby's qualifications.
     
      
      . Jacoby Confusion Survey Report at 4.
     
      
      . Id. at 5.
     
      
      . Jacoby Confusion Survey Report, Appendix I, Resp. 1112.
     
      
      . Id., Resp. 2119.
     
      
      . Id., Resp. 3124.
     
      
      . Jacoby Dilution Survey Report at 9-10.
     
      
      . Jacoby Confusion Survey Report at 9.
     
      
      . Jacoby Dilution Survey Report at 10.
     
      
      . Id. at 11.
     
      
      . Id. at 18.
     
      
      . As noted in our review of Dr. Ericksen, the instruction not to guess establishes a protection that cuts in favor of the reliability of the survey.
     
      
      . Id. at 20.
     
      
      . Id. at 24.
     
      
      . Jacoby Confusion Survey Report, Appendix I, Resp. No. 1108.
     
      
      . Id., Resp. No. 1110.
     
      
      . Id., Resp. No. 1111.
     
      
      . Id., Resp. No. 2103.
     
      
      . Id., Resp. No. 3126.
     
      
      . Id., Resp. No. 2110.
     
      
      . Jacoby Dilution Survey Report at 23.
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . Id. at 26.
     
      
      . Id.
      
     
      
      . Id. at 25.
     
      
      . See 7/2/04 Affidavit of Robert N. Reitter ("7/2/04 Reitter Affid.”), Exhibit A, Guideline Associates, Study of the Likelihood of Source Confusion Between Dooney & Bourke and Louis Vuitton Created by Dooney & Bourke's Use of Color on Its Monogram Patterned Handbags (July 2004) ("2004 Reitter Confusion Survey Report”).
     
      
      . See 7/2/04 Reitter Affid., Exhibit B, Guideline Associates, Study of the Extent of Consumers' Recognition of Louis Vuitton as the Source of Its Toile Monogram Pattern, When Presented in Multicolor and When Presented in Black and White (July, 2004) ("2004 Reitter Recognition Survey Report”).
     
      
      . See 3/16/07 Declaration of Alison Arden Besunder ("3/16/07 Besunder Decl."), Exhibit A, Guideline, Study of the Likelihood of Source Confusion Between Dooney & Bourke and Louis Vuitton Created by Dooney & Bourke's Use of Color on Its Monogram Patterned Handbags (Jan.2007) (“2006 Reitter Confusion Survey Report”).
     
      
      . See 3/16/07 Besunder Decl., Exhibit B, Guideline, Study to Determine Whether or Not Dilution Has Occurred With Respect to Louis Vuitton’s Multicolore Monogram Pattern (Jan.2007) ("2006 Reitter Dilution Survey Report”).
     
      
      . 2004 Reitter Confusion Survey Report at 5.
     
      
      . Id. at 13.
     
      
      . Id. at 26.
     
      
      . 2006 Reitter Confusion Survey Report at 2 (stating that the purpose of the survey was "to repeat the [2004 Reitter Confusion Survey] — but in a manner that addressed the [Court’s] criticisms of that Study”). See also 2/28/07 Reitter Deposition at 89:9-14 ("[T]he mandate that I had from my client was not let's measure a confusion [sic], likelihood of confusion today. It was let's conduct studies to determine whether the criticisms that the court had were valid and to what extent they were valid”).
     
      
      
      . See Memorandum in Opposition to Louis Vuitton’s Motion to Exclude the Expert Opinions, Testimony & Surveys of Robert N. Reitter at 3 ("These changes [in the marketplace] do indeed, as [Louis Vuitton] argues, result in the 'inability to test’ in 2006 the level of confusion in 2004, and make a late 2006 study by [Louis Vuitton] or [Dooney & Bourke] ‘irrelevant and unreliable’ as direct evidence of the likelihood of confusion in 2003, 2004, 2005, or 2006.” (quoting Expert Report of Dr. Itamar Simonson at ¶ 36, attached as Exhibit A to the 3/30/07 Declaration of R. Corey Worcester)).
     
      
      . For convenience we will refer from time to time to the two Reitter surveys addressed to the Dooney & Bourke nametag as the "2006 Name Sign Surveys.”
     
      
      . Id. at 3.
     
      
      . 2006 Reitter Dilution Report at 5-6.
     
      
      . Id., Appendix B at 4.
     
      
      . Id. at 7.
     
      
      . Id.
      
     
      
      . We note that at the 2004 preliminary injunction hearing, Dooney & Bourke offered a survey and testimony of Dr. Yoram Wind to prove lack of dilution. Judge Scheindlin held that the Wind survey lacked probative value, because the method he used (placing Louis Vuitton bags in an array of other multicolor bags) “has never been accepted or endorsed by any court in the context of trademark dilution.” Louis Vuitton I, 340 F.Supp.2d at 451. Dooney & Bourke apparently is not intending to use Dr. Wind’s survey or testimony for the trial and no motion in limine has been brought to exclude it. We therefore find no reason to review the Wind Report for admissibility.
     
      
      . See Reitter 2004 Confusion Survey Report, Appendix A.
     
      
      . Louis Vuitton’s Memorandum in Support of its Motion to Exclude Defendant Dooney & Bourke's Proposed Expert Opinions, Testimony & Surveys of Robert N. Reitter ("Memo in Support (Reitter)”) at 7.
     
      
      . 2004 Reitter Confusion Survey Report at 6.
     
      
      . See id. at 9.
     
      
      . Id. at 16.
     
      
      . Id. at 5. Photographs of the bags are provided in Appendix F of the 2004 Reitter Confusion Survey Report.
     
      
      . Id.
      
     
      
      . See id., Appendix F at 5.
     
      
      . 2004 Reitter Confusion Survey Report at 9.
     
      
      . Id.
      
     
      
      . See id., Appendix F at 7.
     
      
      . Sur-Reply Response in Support of Louis Vuitton's Motion to Exclude Defendant Dooney & Bourke’s Proposed Expert Opinions, Testimony, and Surveys of Robert N. Reitter ("Sur-Reply (Reitter)”) at 2-3.
     
      
      . See Transcript of May 28, 2004 Hearing Before the Honorable Andrew J. Peck ("Peck Transcript”) at 11:6-10, attached as Exhibit B to the Declaration of Alison Arden Besunder, April 13, 2007.
     
      
      . See Peck Transcript at 11:11-15.
     
      
      . See Sur-Reply (Reitter) at 2-3.
     
      
      . 2004 Reitter Confusion Survey Report at 12.
     
      
      . See, e.g., id., Appendix B at 4.
     
      
      . See, e.g., id., Appendix B at 17 ("When respondent indicates she is finished examining the bags, place the heart shaped zipper pull on each bag so that the name Dooney & Bourke faces against the bags, and the plain shiny side faces out, and place the bags about 5 feet away from the respondent.”).
     
      
      . See, e.g., id., Appendix B at 4-7.
     
      
      . See id.
      
     
      
      . Because Reitter included "style” in this classification scheme, it appears that he may have coded as confused respondents who named Louis Vuitton for reasons, at least in part if not in whole, unrelated to the Dooney & Bourke Multicolor Monogram Mark. The 2004 Reitter Confusion Survey Report refers to various tables in one of its appendices, but these tables do not provide the verbatim responses of each particular respondent by respondent number. Instead, the tables tabulate generally how many respondents gave a given reason for their answer. One respondent could thus be counted under several different reasons. It is therefore not apparent from these tables how many respondents named Louis Vuitton for reasons entirely unrelated to the Dooney & Bourke Multicolor Monogram Mark. But any miscategorization could only operate to the benefit of Louis Vuitton.
     
      
      . See id., Appendix E, Table 4.
     
      
      . See id., Appendix E, Table 11.
     
      
      . See id., Appendix E, Table 18.
     
      
      . Id. at 21.
     
      
      . See id., Appendix E, Table 4.
     
      
      . See id., Appendix E, Table 11.
     
      
      . See id., Appendix E, Table 18.
     
      
      . Id. at 21.
     
      
      . Id. at 25.
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . Id. at 26.
     
      
      . Id. at 7.
     
      
      . Id. at 9.
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . 2/28/07 Reitter Deposition at 240:6-15 (“I didn’t want the same interviewing service to be doing a study where they were carefully instructed to do one thing with those hang-tags and then do another study where the instruction was changed because I was afraid that they would have gotten used to doing it in the first way, for example, and be still doing that when they were supposed to be doing the revision phase [i.e., the 2006 No Name Sign Survey]. So it seemed cleaner to me to separate the locations.").
     
      
      . Id. at 240:16-24 ("And then the reason that the brown and blue control bags were tested in yet another set of eight markets was that originally this entire study had a very short time line attached to it. The deadline was lifted at some point, but originally we couldn’t have finished the study in time if we had done the [2006 Control Survey] in the same markets where the [2006 Name Sign Survey] or the [2006 No Name Sign Survey] was being done.”).
     
      
      . Id.
      
     
      
      . See 2006 Reitter Confusion Survey Report at 8-9.
     
      
      . Robert N. Reitter Deposition Transcript at 248:8-15, 254:19-263:25, attached as Exhibit C to the Declaration of Alison Arden Besunder, March 16, 2007. We note that one of us (for reasons unnecessary to develop) has personally visited a number of the malls on Reitter’s list, and can attest to the fact that several of them are not “upscale” within any reasonable definition of that term.
     
      
      . 2006 Reitter Confusion Survey Report at 14.
     
      
      . Id. at 6. See also id., Appendix F.
     
      
      . Compare 2004 Reitter Confusion Survey Report, Appendix F, to 2006 Reitter Confusion Survey Report, Appendix F.
     
      
      . See 2006 Reitter Confusion Survey Report, Appendix F at 10.
     
      
      . See Declaration of R. Corey Worcester in Opposition to Louis Vuitton’s Motion to Exclude the Expert Opinions, Testimony & Surveys of Robert N. Reitter, Exhibits G & H.
     
      
      . See 2006 Reitter Confusion Survey Report at 11-12.
     
      
      . See id.
      
     
      
      . See id. at 16-20.
     
      
      . Again, the tables given in the appendix to the 2006 Reitter Confusion Survey are no help, as they do not give the respondents’ verbatim responses by respondent number. See id., Appendix E.
     
      
      . Id. at 16-18.
     
      
      . Id. at 18.
     
      
      . Id. at 16-18.
     
      
      . Id. at 18.
     
      
      . Id. at 3. Reitter does not define what level of statistical significance he is using.
     
      
      . Id. at 15.
     
      
      
      . Id. at 3.
     
      
      . Id. at 19-20.
     
      
      . Id.
      
     
      
      . Id. at 20-21.
     
      
      . Id. at 3.
     
      
      . Id.
      
     
      
      . Id. at 21. Reitter does not provide information on when a percentage would become of “material significance.”
     
      
      . 2006 Reitter Dilution Survey Report at 9.
     
      
      . Id. at 10.
     
      
      . Id. at 11.
     
      
      . Compare the list of malls given in the 2006 Reitter Confusion Survey Report at 9-10, to the list of malls given in the 2006 Reitter Dilution Survey Report at 11-12.
     
      
      . 2006 Reitter Dilution Survey Report at 17.
     
      
      . Id. at 6.
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . Id. at 14.
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . Id. at 3.
     
      
      . We have not been asked to rule on the admissibility of Simonson’s expert report, as Dooney & Bourke has made no motion to exclude it. Simonson's report is essentially an extensive critique of Reitter’s studies; he did not himself conduct a survey. Simonson is eminently qualified to analyze survey techniques and we will rely from time to time on his observations.
     
      
      . 2006 Reitter Dilution Survey Report at 3.
     
      
      . Id.
      
     
      
      . Id. at 4.
     
      
      . Memorandum in Opposition to Louis Vuitton's Motion to Exclude the Expert Opinions, Testimony & Surveys of Robert N. Reitter at 5.
     
      
      . 2006 Reitter Dilution Survey Report at 5.
     
      
      . Id. at 4.
     
      
      . Id. at 5.
     
      
      . Id.
      
     
      
      . Id. at 18.
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      . Louis Vuitton's Memorandum in Support of Its Motion to Exclude Defendant Dooney & Bourke's Proposed Expert Opinions, Testimony & Surveys of Robert N. Reitter at 7.
     
      
      . In its reply brief, Louis Vuitton also introduces the argument that Reitter’s "surveys were heavily weighted towards the Northeast," but offers no explanation for how this would skew the results of the 2006 Reitter Confusion Survey. Reply (Reitter) at 3, 5. The 2006 Reitter Confusion Survey's screening quotas mirrored the population distribution as measured by the U.S. Census. See 2/28/07 Reitter Deposition at 364:25-365:6. It may be true that the actual participants in the survey did not mirror this distribution, but the distribution of purchasers of luxury handbags in general may also fail to mirror the U.S. Census distribution. See 2/2Í/07 Reitter Deposition at 356:24-357:7 ("What I am trying to say is that the effort to collect respondents was identical in all four regions. The same number of people were asked 'do you buy handbags worth this much money’ and we had more productive screening in the northeast than we did in the south. That’s why we have more respondents in the northeast.”).
     
      
      . See Reply (Reitter) at 5.
     
      
      . We also note that the 2006 Reitter Dilution Survey set a substantially higher bar for its universe (past or future purchasers of "designer handbags” costing more than $350) than did the confusion survey.
     
      
      . With respect to the universe used in the 2006 Reitter Confusion Survey, Louis Vuitton also argues that the survey was flawed because it failed to "measure the responses of both Louis Vuitton and Dooney customers.” See Memo in Support (Reitter) at 11. This is not true. Reitter screened for respondents who were likely to buy a handbag costing in excess of $100, regardless of who manufactured that handbag. Reitter’s universe included, therefore, potential Louis Vuitton customers.
     
      
      . Judge Scheindlin cites National Football League Props., Inc. v. ProStyle, Inc., 57 F.Supp.2d 665, 666-67 (E.D.Wisc.1999); and Novo Nordisk of N. Am., Inc. v. Eli Lilly & Co., 1996 U.S. Dist. LEXIS 12807, No. 96 Civ. 5787, 1996 WL 497018, at *6 nn. 24, 26 (S.D.N.Y. Aug.30, 1996). She also cites, as a “but see” authority, Schieffelin & Co. v. Jack Co. of Boca, Inc., 850 F.Supp. 232, 247 (S.D.N.Y.1994).
     
      
      . Memo in Support (Reitter) at 15.
     
      
      . 2/28/07 Reitter Deposition at 89:11-14. Louis Vuitton's broader argument that the 2006 Survey was invalid because it was result-oriented is frivolous. All surveys prepared for purposes of litigation by paid experts are to some extent result-oriented. But if that were enough to exclude surveys, then surveys would never be admissible in infringement cases.
     
      
      . 2004 Reitter Recognition Survey Report at 5.
     
      
      . See 2006 Reitter Dilution Survey Report at 19.
     
      
      . Expert Report on the Use of Color in Louis Vuitton and Dooney & Bourke’s Multicolored Monogram Design Patterns, ("Holub Report”) at 1.
     
      
      . Id.
      
     
      
      . Id. at 12.
     
      
      . Id.
      
     
      
      . Id. at 13-14.
     
      
      . Id. at 14.
     
      
      . Id. at 16.
     
      
      
      . Id. at 7. Dr. Holub states that CIELAB has "several desirable properties for representing colors, including: 1) It is approximately uniform, perceptually, meaning that a unit color difference (or distance between two colors) in one region of the space (e.g., 'yellow') is similar, perceptually, to a unit color difference in another region (e.g., ‘blue’), and b) It separates the chromatic and non-chromatic variables of a color stimulus, such as that the L* variable encodes the relative lightness/darkness of a color while the a* and b* variables encode the chromatic properties.” Id.
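The "distance between two colors" that Dr. Holub describes can be illustrated with the simplest CIELAB difference measure, the CIE76 formula, which is the straight-line distance between two (L*, a*, b*) points. The sketch below is illustrative only: the record quoted here does not say which difference formula Dr. Holub used, and the function name and sample color values are hypothetical.

```python
import math


def delta_e_cie76(lab1, lab2):
    """Euclidean distance between two (L*, a*, b*) triples (CIE76 color difference)."""
    return math.sqrt(sum((c1 - c2) ** 2 for c1, c2 in zip(lab1, lab2)))


# Hypothetical sample values, not measurements from either party's handbags.
sample_a = (90.0, -5.0, 80.0)
sample_b = (90.0, -2.0, 84.0)

# Same lightness (L*), so the whole difference comes from the chromatic
# a* and b* coordinates: sqrt(3**2 + 4**2).
print(delta_e_cie76(sample_a, sample_b))
```

Because the space is only approximately uniform, equal distances computed this way correspond only roughly to equal perceived differences.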
      
     
      
      . Id. at 19.
     
      
      . Id. at 22.
     
      
      . Id. at 21.
     
      
      . Id. at 25.
     
      
      . Id. at 26.
     
      
      . Id. at 29.
     
      
      . Id. at 29.
     
      
      . Id.
      
     
      
      . Memorandum in Support of Dooney & Bourke, Inc.'s Motion in Limine to Exclude the Reports, Testimony, and Opinions of Richard A. Holub, at 1.
     
      
      . Id. at 2.
     
      
      . Id.
      
     
      
      . Memorandum of Louis Vuitton Malletier in Opposition to Dooney & Bourke, Inc.’s Motion in Limine to Exclude the Report, Testimony and Opinions of Richard A. Holub (“Memo in Opposition (Holub)”) at 16 n. 11.
     
      
      . 2/23/07 Holub Deposition at 34.
     
      
      . Holub Report at 26.
     
      
      . Holub Report at 13.
     
      
      . We defer a discussion on the reliability of Dr. Holub’s methods for determining color usage and related questions, as it is best discussed in the section on proof of intent.
     
      
      . The court cited its previous opinion in Shaw v. Lindheim, 919 F.2d 1353, 1356 (9th Cir.1990), a copyright infringement case, in which it found that expert testimony was not appropriate for determining whether a reasonable person would find similarity between artistic works.
     
      
      . Memo in Opposition (Holub) at 12.
     
      
      . Some of the decisions cited can be dismissed out of hand. Louis Vuitton cites a random quote from Master Distribs., Inc. v. Pako Corp., 986 F.2d 219, 224 (8th Cir.1993), to the effect that "expert witnesses are available to testify regarding the similarity of the colors at issue.” In Master Distributors the trial judge dismissed the infringement action as a matter of law on the ground that it was not possible to trademark a color. The court of appeals rejected the trial court’s legal ruling. No expert was presented in the case, and the court’s statement that experts might be permitted to testify to the similarity of a color is (1) dictum and (2) distinguishable from a situation in which the color itself is not the mark. Louis Vuitton lifts a quote from In re Zeidler, 682 F.2d 961, 966 (C.C.P.A.1982), to the effect that an expert's evaluation of color is entitled to more weight than that of a layman. But Louis Vuitton does not mention that Zeidler involved a patent claim, and the court’s holding was that the Patent and Trademark Court of Appeals could not substitute its own judgment for that of an expert. Zeidler has no applicability whatsoever to a jury trial on confusing similarity. Louis Vuitton also cites Malaco Leaf, AB v. Promotion in Motion, Inc., 287 F.Supp.2d 355, 371-72 (S.D.N.Y.2003), but in that case the court in fact rejected the plaintiff's linguistics expert because his testimony on linguistic similarity of the marks had no bearing on how consumers would "actually hear and view” the marks.
     
      
      . Sherrell was a summary judgment decision and did not involve a jury trial in any event.
     
      
      . The other preliminary injunction cases relied upon by Louis Vuitton are Primcot Fabrics, Dep't of Prismatic Fabrics v. Kleinfab Corp., 368 F.Supp. 482 (S.D.N.Y.1974); Am. Assoc. For the Advancement of Science v. Hearst Corp., 498 F.Supp. 244 (D.D.C.1980); and Nikon, Inc., v. Ikon Corp., 803 F.Supp. 910, 916-17 (S.D.N.Y. 1992).
     
      
      . As stated above, we find that Dr. Holub is qualified to provide an opinion limited to the use of colors.
     
      
      . Holub Report at 14.
     
      
      . We note that Louis Vuitton argues in its brief that Dooney & Bourke received Dr. Holub’s raw data and did not calculate a rate of error, and so Dr. Holub's testimony cannot be challenged on this ground. Memo in Opposition (Holub) at 22. This argument misplaces the burden. It is Louis Vuitton that must prove the reliability of Dr. Holub’s methods by a preponderance of the evidence, and under Daubert, assessment of the rate of error is a factor that is pertinent to (though not dispositive of) the reliability of an expert’s methods.
     
      
      . Dooney & Bourke argues that Dr. Holub’s testimony must be excluded because he is only testifying to the use of colors and has no opinion about other aspects of the mark. That argument is rejected. Nothing in Rule 702 requires an expert to testify about every aspect of a case. Dooney & Bourke’s objection on this ground goes to sufficiency and not reliability.
     
      
      . Trademark Infringement and Trademark Dilution Damages, Expert Report, CONSOR Intellectual Asset Management, Feb. 9, 2007 ("Anson Report") at 1.
     
      
      . Id.
      
     
      
      . Id., Appendix C.
     
      
      . Id. at 2.
     
      
      
      . Id.
      
     
      
      . Id. at 4.
     
      
      . Id. at 5.
     
      
      . Id. at 6.
     
      
      . Id. at 8.
     
      
      . Id.
      
     
      
      . Id. at 8-9.
     
      
      . Id. at 10.
     
      
      . Id.
      
     
      
      . Id. at 10 and Appendix D.
     
      
      . Id. at 12. He makes this conclusion in part upon a fact that is not disputed in the case, i.e., that Dooney & Bourke does virtually all of its business in the United States.
     
      
      . See 3/5/07 Anson Deposition at 120-121.
     
      
      . Anson Report at 7.
     
      
      . Id. at 9.
     
      
      . Id.
      
     
      
      . Id.
      
     
      
      
      . Id. at 9-10.
     
      
      . Memorandum of Louis Vuitton Malletier in Opposition to Dooney & Bourke, Inc.’s Motion in Limine to Exclude the Report, Testimony and Opinions of Mr. Weston Anson ("Memo in Opposition (Anson)”) at 11.
     
      
      . See Letter from Theodore Max, Esq. to Hon. Shira A. Scheindlin dated February 27, 2007 (referring to Anson as "Louis Vuitton’s damages expert”); Letter from Charles LeGrand, Esq. to Darin McAtee, Esq., dated February 9, 2007 ("Enclosed please find the report of Louis Vuitton Malletier’s damages expert, Consor.”).
     
      
      . Specifically we recommend striking the following: page 2, runover paragraph and first full paragraph; page 3, paragraphs with headings A and B, and bottom paragraph running over to page 4; page 4 in its entirety; page 5 in its entirety; page 6, first half; page 7 in its entirety; page 9, bottom half; and page 10 runover paragraph.
     
      
      . The difference between the two methods is explained by Judge Knapp in Warner Bros. v. Gay Toys, Inc., 598 F.Supp. 424, 428 n. 2 (S.D.N.Y.1984):
      For example, if defendant had bought new machinery to produce the infringing toys, this would be deductible under either approach. However, if it had used the same machinery to produce the infringing items that had been used to produce noninfringing items, the costs of operating and maintaining such machinery would be deductible under the full absorption approach (to the extent that the machinery was used for the infringing items) [but] under the incremental approach, no deduction at all would be allowed for such costs.
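The arithmetic behind Judge Knapp's example can be sketched as follows. This is a minimal illustration under stated assumptions, not a method drawn from the record: the function names, the two-category split of costs, and every dollar figure are hypothetical.

```python
def deductible_costs(dedicated_costs, shared_costs, infringing_usage_share, method):
    """Costs deductible from infringing revenue under each approach.

    dedicated_costs: costs incurred solely for the infringing items
        (e.g. new machinery bought to make them); deductible either way.
    shared_costs: operating and maintenance costs of machinery also used
        for noninfringing items.
    infringing_usage_share: fraction (0..1) of the shared machinery's use
        attributable to the infringing items.
    """
    if not 0.0 <= infringing_usage_share <= 1.0:
        raise ValueError("infringing_usage_share must be between 0 and 1")
    if method == "incremental":
        # No deduction at all for the shared costs.
        return dedicated_costs
    if method == "full_absorption":
        # Shared costs are deductible to the extent used for infringing items.
        return dedicated_costs + shared_costs * infringing_usage_share
    raise ValueError("method must be 'incremental' or 'full_absorption'")


def net_profit(revenue, dedicated_costs, shared_costs, infringing_usage_share, method):
    """Infringing revenue less the costs deductible under the chosen approach."""
    return revenue - deductible_costs(
        dedicated_costs, shared_costs, infringing_usage_share, method
    )


# Hypothetical figures: the same sales yield a larger profit figure under the
# incremental approach because the shared costs are not deducted.
print(net_profit(1000.0, 200.0, 400.0, 0.25, "incremental"))
print(net_profit(1000.0, 200.0, 400.0, 0.25, "full_absorption"))
```

The choice of method therefore matters only when the defendant has costs shared between infringing and noninfringing production; with no shared costs the two approaches give the same figure.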
     
      
      . 3/5/07 Anson Deposition at 238.
     
      
      . As to the permissibility of reducing the defendant’s profit by a proportionate amount of taxes, the New Line court found a split of authority. The cases cited are L.P. Larson, Jr., Co. v. William Wrigley, Jr., Co., 277 U.S. 97, 99-100, 48 S.Ct. 449, 72 L.Ed. 800 (1928) (rejecting deduction of taxes for a willful infringer but holding that deductibility depends on the circumstances); W.E. Bassett Co. v. Revlon, Inc., 435 F.2d 656, 665 (2d Cir.1970) (allowing defendants to deduct income tax payments despite willful infringement); In Design v. K-Mart Apparel Corp., 13 F.3d 559, 566-567 (2d Cir.1994) (holding non-willful infringer was entitled to deduct taxes paid on its "innocently-acquired unlawful profits”). Under this case law, deductibility of the infringer’s taxes from its profits on the infringing good depends at least in part on whether the infringement was intentional. Dooney & Bourke's intent to infringe (assuming there is infringement at all) is obviously a fact that will be determined by the jury.
     
      
      . The only one of the claimed deductions that the defendant was not allowed was an expense incurred for overlabelling. The W.E. Bassett court found that the defendant "should have to bear the cost of correcting its own wrongdoing.” 435 F.2d at 665.
     
      
      . This stated distinction is not followed with any explanation and we see no basis in the case law for such a distinction when it comes to determining the net profits on infringing sales.
     
      
      . We agree with Louis Vuitton's contention that the fees expended by Dooney & Bourke in this action cannot be deducted from its profits. New Line Cinema, 161 F.Supp.2d at 304.
     
      
      . Of course it is possible that the parties might agree that Anson has correctly applied his incremental method and come to the correct figure if that method in fact applies, in which event Anson’s testimony would no longer assist the jury. But there is no such agreement at this time.
     
      
      . Memo in Opposition (Anson) at 6.
     
      
      . Id. at 9.
     
      
      . We note that on page 11 of its memorandum, Louis Vuitton relies heavily on W.E. Bassett, without acknowledging that the court’s order of disgorgement of lost profits was clearly influenced by the fact that the defendant dominated the market. That point was later emphasized by the court in Tommy Hilfiger, when it stated that "where infringement is especially malicious or egregious, allowing a defendant, especially a dominant competitor who has made use of the mark of a weaker entity, to deduct profits due to its own market dominance in some circumstances inadequately serves the goal of deterrence.” 146 F.3d at 72 (emphasis added). That condition is of course not present in this case.
     
      
      . Establishing loss of profits on the plaintiff’s part does not appear to be an absolute condition to recovering profits from the defendant. See W.E. Bassett Co. v. Revlon, Inc., 435 F.2d 656, 664 (2d Cir.1970) (“Bassett could not have sustained monetary damages from the sale of the 'Cuti-Trim' articles since it did not sell a cuticle trimmer at the time; and it did not sustain damages in good will since Revlon's product was of high quality. Nevertheless, Revlon was found to have deliberately and fraudulently infringed Bassett’s mark .... Accordingly, a full accounting is proper as a deterrent.”).
     
      
      . An accounting under the Lanham Act is an equitable remedy and the award is determined by the court. George Basch Co. v. Blue Coral, Inc., 968 F.2d 1532, 1537 (2d Cir.1992) ("Clearly, the statute’s invocation of equitable principles as guideposts in the assessment of monetary relief vests the district court with some degree of discretion in shaping that relief.”); Id. at 1540 (“The district court’s discretion lies in assessing the relative importance of [listed] factors and determining whether, on the whole, the equities weigh in favor of an accounting.”).
     
      
      . Anson Deposition at 244, 246-247.
     
      
      . Expert Report of Bradford Cornell ("Cornell Report") at 18-23.
     
      
      . Memorandum in Support of Dooney & Bourke, Inc.'s Motion in Limine to Exclude the Report, Testimony and Opinions of Mr. Weston Anson ("Memo in Support (Anson)”) at 3.
     
      
      . Anson Report at 10.
     
      
      . Similarly, if Anson’s report were to be found admissible in any other respect, his opinions on dilution should be struck.
     
      
      . 3/5/07 Anson Deposition at 316.
     
      
      
      . Id. That decision was probably prudent. When questioned further about his possible qualifications as a statistician, he incorrectly identified a term on the chart prepared by Mr. Torres. Id.
      
     
      
      . 3/5/07 Anson Deposition at 316. Nor do we find in Anson’s deposition any indication that it is common for experts on valuation of intellectual property to rely on the opinions of those who are qualified to conduct regression analyses.
     
      
      . Dooney & Bourke takes issue with a number of miscalculations made by Torres in conducting the regression analysis, including an error in converting sales in Euros in a way that undervalued the United States sales. See Memo in Support (Anson) at 14. We conclude that these miscalculations would in the ordinary case go to weight and not admissibility. But we note that the traditional means of addressing such errors, i.e., cross-examining the expert at trial, would not work in this case because, for reasons stated above, Louis Vuitton has produced the wrong expert to testify on the regression analysis. Indeed, this is another reason why Torres and not Anson should have been produced; otherwise errors in computation that traditionally go to weight should probably be treated as going to admissibility. Given the many other reasons for excluding Anson's testimony as to the regression analysis, we find it unnecessary to decide whether the errors in computation, under these circumstances, are enough on their own to render the regression analysis inadmissible.
     
      
      . See Daniel L. Rubinfeld, Reference Guide on Multiple Regression, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 179, 181 (Federal Judicial Center 2000). Professor Rubinfeld notes that the failure to include “a major explanatory variable that is correlated with the variable of interest may cause an included variable to be credited with an effect that actually is caused by the excluded variable” and that this flaw in methodology may lead to "inferences made from regression analyses that do not assist the trier of fact.” Id. at 188.
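The flaw Professor Rubinfeld describes, an included variable being credited with the effect of an excluded one, can be seen in a small numerical sketch. Everything below is an assumption made for illustration: the data are invented, the variable names are hypothetical, and the "partialling out" step (regressing residuals on residuals, which by the Frisch-Waugh-Lovell result gives the same coefficient as a regression that includes both variables) is a standard textbook device rather than anything used by the experts in this case.

```python
def mean(values):
    return sum(values) / len(values)


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(x, y):
    """What is left of y after removing its linear relationship with x."""
    slope, intercept = ols_slope_intercept(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Invented data: the outcome depends ONLY on the excluded variable, but the
# included variable happens to move together with the excluded one.
excluded = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
included = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
outcome = [3.0 * v for v in excluded]

# Regression that omits the true driver: the included variable appears to
# have a sizable positive effect.
naive_slope, _ = ols_slope_intercept(included, outcome)

# Controlling for the excluded variable: the apparent effect disappears.
controlled_slope, _ = ols_slope_intercept(
    residuals(excluded, included), residuals(excluded, outcome)
)

print(naive_slope, controlled_slope)
```

In this construction the single-variable regression reports a clearly positive slope for a variable that, by design, has no effect on the outcome at all.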
     
      
      . Judge Scheindlin cited Bickerstaff v. Vassar College, 196 F.3d 435, 450 (2d Cir.1999) (holding that assumption that race bias tainted professor's course evaluation scores is untenable without attempting to control for other causes for low score); Smith v. Xerox Corp., 196 F.3d 358, 370-71 (2d Cir.1999) (holding that plaintiff's statistical analysis failed on its own to support an inference of discriminatory treatment sufficient to withstand a summary judgment motion because the analysis did not account for any other causes for the fact that older workers were more likely to be terminated); Hollander v. American Cyanamid Co., 172 F.3d 192, 203 (2d Cir.1999) (holding that expert report is inadmissible because its "inference of [age] discrimination solely on the basis of the raw numbers is impermissible in the absence of any attempt to account for other causes of the ... anomaly”); and Raskin v. Wyatt Co., 125 F.3d 55, 67-68 (2d Cir.1997) (holding that expert report was inadmissible in part because it "assumed any anomalies in the ... data must be caused by age discrimination, and [made] no attempt to account for other possible causes”).
     
      
      . See Daniel L. Rubinfeld, Reference Guide on Multiple Regression, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 179, 188 (Federal Judicial Center 2000) (explanatory variables that are quantifiable should be included in a regression analysis).
     
      
      . 3/5/07 Anson Deposition at 294-95.
     
      
      . Id.
      
     
      
      . Cornell Report at 1.
     
      
      . Id. at 2.
     
      
      . Id. at 3.
     
      
      . Id.
      
     
      
      . Id. at 8.
     
      
      . 2/2/07 Cornell Deposition at 69.
     
      
      . Cornell Report at 12.
     
      
      . Id. Cornell’s inclusion of other countries in North America tended to wash out, as he substituted Mexico and Canada for Hawaii, which resulted in “pretty much an overlap” with United States sales. 2/2/07 Cornell Deposition at 176-177. Henceforth we will refer, as do the parties, to Europe and the United States as the comparison markets used by Cornell.
     
      
      . Cornell Report at 22.
     
      
      . Id., Exhibit 13.
     
      
      . Id. at 22.
     
      
      . Id. at 23. Dr. Cornell does not claim that higher Dooney & Bourke sales caused higher Louis Vuitton sales. He recognizes that there are possible alternative explanations “such as a fashion trend” that drove the sales of both companies. Dr. Cornell did not purport to do a regression analysis to determine actual causation, as the point of the study was to determine whether Dooney & Bourke sales decreased Louis Vuitton sales in the United States. Cornell Report at 23.
     
      
      . He did so by computing expenses as a percentage of total revenue for all products, and then applied those percentages to revenue received on the handbags during the subject period. Cornell Report, Exhibit 7. Cornell states that his method “is consistent” with Anson's report in this respect. Cornell Report at 15.
     
      
      . Cornell Report, Exhibit 9. Dr. Cornell reached different figures based on different dates postulated by Dooney & Bourke, e.g., measured from the time that Louis Vuitton sent a cease and desist letter, etc.
     
      
      . Cornell Report at 19-21. Cornell also cites mathematical errors, most importantly the error in converting euros. See supra, note 309.
     
      
      . Cornell Report at 21.
     
      
      . Louis Vuitton’s Memorandum in Support of Its Motion to Exclude Defendant Dooney & Bourke’s Proposed Expert Testimony of Bradford Cornell ("Memo in Support (Cornell)”) at 1.
     
      
      . Id. at 17.
     
      
      . Louis Vuitton's Reply Memorandum in Support of Its Motion to Exclude Defendant Dooney & Bourke's Proposed Expert Testimony of Bradford Cornell ("Reply (Cornell)”) at 6-8. We do not choose to speculate on why Louis Vuitton raised this argument only in its reply memorandum.
     
      
      . See Daniel L. Rubinfeld, Reference Guide on Multiple Regression, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 179, 200 (Federal Judicial Center 2000) (“A doctoral degree in a discipline that teaches theoretical or applied statistics, such as economics, history and psychology, usually signifies to other scientists that the proposed expert meets this preliminary test of the qualification process.”).
     
      
      . We note specifically that unlike Anson, Dr. Cornell has sufficient education, training and experience to conduct and interpret a regression analysis on sales data. Moreover, unlike Anson, Cornell conducted his own regression analysis — he was not acting as a "mouthpiece” for another expert.
     
      
      . Reply (Cornell) at 1.
     
      
      . Id. at 6.
     
      
      . Memo in Support (Cornell) at 6 (emphasizing that "a plaintiff can use the defendant’s profits as a proxy for its damages precisely because of its inability to prove them”).
     
      
      . Concretely this would mean that the jury would not be instructed to reach any finding on lost profits, and the court in exercising its equitable authority to order an accounting under the Lanham Act would assume that Louis Vuitton suffered no loss of profits.
     
      
      . We here emphasize our previous recommendation to exclude Anson’s testimony regarding United States sales not only (or even primarily) for lack of fit but also because it fails the standards of Rules 702 and 703. Thus, even if Vuitton does explicitly or implicitly raise the issue of lost sales in the United States, it should not be able to do so by way of Anson.
     
      
      . Memo in Support (Cornell) at 8. We note that Louis Vuitton spends a good deal of briefing time and space in arguing that Cornell’s testimony is unreliable, given its position that his testimony is irrelevant in the first place because Louis Vuitton is not seeking lost profits. Perhaps the extensive treatment on reliability is intended to shore up Louis Vuitton’s own "damages” expert, Anson. But again, this seems to indicate that Louis Vuitton is of two minds about damages in this case.
     
      
      . See 3/2/07 Cornell Deposition at 64 (acknowledging that goodwill has a monetary value).
     
      
      . See Lucian Arye Bebchuk & Marcel Kahan, Fairness Opinions: How Fair Are They and What Can Be Done About It?, 1989 Duke L.J. 27, 35-37, for the requirements of discounted cash flow analysis.
     
      
      . 3/2/07 Cornell Deposition at 81.
     
      
      . Memo in Support (Cornell) at 13.
     
      
      . 3/2/07 Cornell Deposition at 174.
     
      
      . Cornell Report, Exhibits 14 A and B.
     
      
      . Cornell Report at 22.
     
      
      . Nothing that we could find in the Federal Judicial Center's REFERENCE MANUAL ON SCIENTIFIC EVIDENCE (2d ed.2000) or any other treatise on statistics (that we were able to understand) appears to support the theory that single variable regression becomes more reliable when a second regression is conducted with a separate single variable.
     
      
      . See David H. Kaye & David A. Freedman, Reference Guide on Statistics, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, 2d at 138 (Federal Judicial Center 2000) (noting that correlation is not equivalent to causation: “For an easy example, among schoolchildren, there is an association between shoe size and vocabulary. However, learning more words does not cause feet to get bigger, and swollen feet do not make children more articulate.”).
     
      
      . We note that despite Cornell’s technical language and regressions, it is possible that his evaluation of Dooney & Bourke and Louis Vuitton sales data is not expert testimony at all. At bottom, all he appears to do is track the sales units of Dooney & Bourke and Louis Vuitton over the same time period. His use of Louis Vuitton's European or world sales as the variable seems unnecessary under the circumstances because his conclusion is simply that the United States sales of both companies went up during the subject time period. Dooney & Bourke does not need an expert for such an enterprise.
     
      
      . Memo in Support (Cornell) at 18.
     
      
      . Reply (Cornell) at 8.
     
      
      . Louis Vuitton cites Caffey v. Cook, 409 F.Supp.2d 484, 506 (S.D.N.Y.2006) for the proposition that income taxes "generally are not a valid deduction from infringer’s profits” but the court’s statement of the law was actually that "no willful infringer may deduct such costs.” The court in Caffey in fact allowed a deduction of taxes from the defendant’s profits, after finding that the infringement was not willful.
     
      
      . The closest questions presented were 1) Dr. Jacoby’s dilution survey; 2) Dr. Holub’s testimony on intent; and 3) Dr. Cornell’s statistical analysis of Louis Vuitton and Dooney & Bourke sales.
     
      
      . Id.
      
     