
    The PROCTER & GAMBLE COMPANY, Plaintiff-Appellee, Cross-Appellant, v. CHESEBROUGH-POND’S INC., Defendant-Appellant, Cross-Appellee. CHESEBROUGH-POND’S INC., Plaintiff-Appellant, Cross-Appellee, v. The PROCTER & GAMBLE COMPANY and Benton & Bowles, Inc., Defendants-Appellees, Cross-Appellants.
    Nos. 195, 240, Dockets 84-7549, 84-7569.
    United States Court of Appeals, Second Circuit.
    Argued Sept. 10, 1984.
    Decided Oct. 30, 1984.
    
      James R. Phelps, Washington, D.C. (Thomas J. Donegan, Jr., Hyman, Phelps & McNamara, P.C., Washington, D.C., Proskauer, Rose, Goetz & Mendelsohn, New York City, Arnold I. Friede, Greenwich, Conn., of counsel), for Chesebrough-Pond’s Inc.
    Harold P. Weinberger, New York City (Geoffrey M. Kalmus, Greg A. Danilow, Steven E. Greenbaum, Kramer, Levin, Nessen, Kamin & Frankel, New York City, Thomas R. Hillhouse, Cincinnati, Ohio, of counsel), for Procter & Gamble Co. and Benton & Bowles, Inc.
    Before MANSFIELD, MESKILL and KEARSE, Circuit Judges.
   MANSFIELD, Circuit Judge:

Comparative television advertising on a national scale by manufacturers claiming superiority for their products over competing brands has magnified the risk of competitive harm from false advertising and has led to the proliferation of suits by competitors alleging violations of § 43(a) of the Lanham Act, 15 U.S.C. § 1125(a). See, e.g., Coca-Cola Co. v. Tropicana Products, Inc., 690 F.2d 312 (2d Cir.1982); Vidal Sassoon, Inc. v. Bristol-Myers Co., 661 F.2d 272 (2d Cir.1981); American Home Products Corp. v. Johnson & Johnson, 577 F.2d 160 (2d Cir.1978). These consolidated appeals represent another example of efforts to use the courts as a means of policing this relatively new practice, raising issues as to the scope of the Lanham Act and the court’s role in evaluating tests relied upon as support for claims of product superiority.

Each of two giants in the soap and toilet goods market, Chesebrough-Pond’s Inc. (“Chesebrough”) and The Procter & Gamble Company (“P & G”), appeals from an order of the Southern District of New York, 588 F.Supp. 1082, Gerard L. Goettel, Judge, denying preliminary relief in its action to enjoin the other from using alleged false claims of superiority or equality for its product. We affirm the denial of preliminary injunctive relief in both actions.

The products forming the basis of the two actions are widely advertised and distributed hand and body lotions, P & G’s “New Wondra” and versions thereof and Chesebrough’s “Vaseline Intensive Care Lotion” (“VICL”) and variations of it. Beginning in November 1983, P & G began advertising New Wondra on TV and in print as superior to other leading lotions in the therapeutic treatment of dry skin. The parties do not dispute that both New Wondra and VICL are “leading lotions.” Indeed, judged by volume of domestic sales, VICL was the nation’s most popular brand of hand and body lotion in 1983. P & G advertised, for instance, (1) that “New Wondra beats the leading Lotions”; (2) that it is “[b]etter than any top lotion”; (3) that it “relieves dry skin better than any leading lotion”; (4) that “[d]ermatologists proved it in clinical tests. New Wondra improves the condition of rough dry skin better”; and (5) that New Wondra “works better than any other leading lotion at turning rough dry hands soft and smooth.” At about the same time, Chesebrough began to run television commercials and advertisements making a somewhat less extravagant claim than P & G’s assertion of superiority. Chesebrough claimed parity for VICL, i.e., that it was equal in effectiveness to any other leading brand. According to the ads “[w]hen it comes to relieving dry skin, no leading lotion beats Vaseline Intensive Care Lotion” and “you can’t buy better lotions to heal winter dry skin.”

In support of its advertising claims, P & G relied on two large-scale clinical tests (SC-207 and SC-215). Both were “double-blind” tests in which each of several groups of individuals used a version of P & G’s New Wondra or one of the leading skin lotions over a period of several weeks, with a dermatologist periodically grading the efficacy of the product on the subjects’ dry skin. The products tested in the SC-207 study included Chesebrough’s VICL and a slightly earlier version of P & G’s New Wondra. Test SC-215 compared New Wondra with Chesebrough’s “Extra Strength” VICL and “Vaseline Dermatology Formula Lotion,” (“VDL”) among other products. P & G used an “ad libitum” procedure in these tests, that is, the subjects were told to apply the test lotions to the skin as they customarily used skin lotions, in whatever quantity and locations of the body they preferred. P & G maintains that its tests led it to conclude that its New Wondra was superior to the competitors' products in improving skin condition. SC-207 revealed the difference in effectiveness between the earlier formula of New Wondra and VICL to be statistically significant. SC-215 documented a statistically significant difference between New Wondra and the Chesebrough lotions VICL Extra Strength and VDL, but this difference was evident only when the analysis was confined to data taken from a subgroup of those persons tested, i.e., only those subjects who had initially had rough skin. To reflect the results of this second test, P & G limited its claims of product superiority to those with “rough, dry skin.”

When Chesebrough became aware of P & G’s claims for the newest New Wondra formula, it did not abort its parity claims, but initiated its own tests to compare VICL with the reformulated New Wondra. Its two small-scale tests (involving 28 and 11 subjects, respectively) and one larger-scale clinical test (73 subjects) revealed no significant difference between the two lotions.

With the fat thus in the advertising world’s fire, P & G brought its present action against Chesebrough in January 1984, claiming the latter’s ads violated the Lanham Act and seeking preliminary injunctive relief against the Chesebrough advertisements. On the next day, Chesebrough countered by suing for similar injunctive relief against P & G’s advertisements and joining P & G’s advertising agency, Benton & Bowles, Inc., as a co-defendant.

There followed extensive hearings before Judge Goettel, which were devoted principally to evidence and expert testimony about the tests conducted by each manufacturer. Chesebrough attacked P & G’s tests on the ground that, unlike Chesebrough’s tests, they did not compare the currently advertised products at issue, New Wondra and VICL, but compared in one case an earlier version of New Wondra with VICL, and in the other instance, the current version of New Wondra with Chesebrough formulas claimed to be significantly different (“Extra Strength” VICL and VDL). P & G replied with evidence that the variations were not significant for purposes of the claims at issue. Chesebrough also attacked P & G’s tests and results as unreliable and the product of highly questionable data manipulated to reach a pre-designed conclusion. P & G, on the other hand, challenged Chesebrough’s tests as either too small to produce meaningful results or as “poorly designed, sloppily executed and improperly analyzed,” pointing to such deficiencies as failure to use an articulated grading scale and employment of an erroneous and inappropriate statistical analysis.

The hearings were taken up with a “battle of the experts” on the appropriateness of the testing methodologies used by the parties to support their claims. Expert testimony, much of it conflicting, was adduced regarding the reliability and acceptance of the “ad libitum” procedure, the “parametric” system of statistical analysis, the use of a “one-tail” test (which seeks only to determine superiority) rather than a “two-tail” test (designed to discover whether one product is equal to or better than another), the separation out and reanalysis of a “subset” from the total body of statistical data derived from a study, and reliance upon subjective rather than purely objective scientific judgments.

Judge Goettel, after reviewing in detail the tests made by P & G and Chesebrough, concluded that, although P & G’s tests were not worthless, they “were far from perfect and are subject to various infirmities.” Chesebrough’s tests, he found, were “more questionable than P & G’s,” but he noted that they had been used only to support a lesser claim of parity rather than a claim of superiority such as that asserted by P & G for New Wondra. Although some of the weaknesses in each party’s tests and analytical methods were noted, none was found sufficiently significant to call for outright rejection of the test results.

Relying on our decision in Vidal Sassoon, Inc. v. Bristol-Myers Co., supra, the district court held that false statements in advertising regarding consumer test results or methodology are actionable when they would deceive a person as to a product’s inherent quality or characteristics. However, after some asides on the subjects of good faith and the qualifications of the judiciary (as distinguished from the Federal Trade Commission) to evaluate such tests, Judge Goettel concluded that he could not determine whether each company’s claim was true or false and that “neither party has shown a likelihood of success on the merits of its claim.”

Discussion

Upon an appeal from denial of injunctive relief sought under § 43(a) of the Lanham Act the district court’s decision may be reversed only upon a showing that it abused its discretion, which may occur when a court bases its decision on clearly erroneous findings of fact or on errors as to applicable law. See Coca-Cola v. Tropicana Products, Inc., supra, 690 F.2d at 315 (2d Cir.1982). The burden is upon the party seeking preliminary relief from the district court to show not only that it is likely to suffer irreparable injury if relief is denied but also that there is either (1) a likelihood of success on the merits or (2) sufficiently serious questions going to the merits to make them a fair ground for litigation, with a balance of hardships tipping decidedly in the plaintiff’s favor. Id. at 314-15.

No issue is raised in the present case as to the existence of irreparable injury. Both parties, however, argue that the district court misapprehended the applicable legal standards governing a false advertising claim under the Lanham Act by implying that a plaintiff must prove that the challenged advertisement contains an obvious and intentional falsehood. We disagree.

Section 43(a) of the Lanham Act provides that any person who, in connection with the sale of goods or services, uses a “false description of origin, or any false description or representation, including words or other symbols tending falsely to describe or represent the same,” shall be liable to any person damaged thereby. 15 U.S.C. § 1125(a). After initial uncertainty as to the statute’s reach, with some believing it to be little more than a codification of the common law action for deceitful advertising, see e.g., Samson Crane Co. v. Union National Sales, Inc., 87 F.Supp. 218, 222 (D.Mass.1949), aff'd per curiam, 180 F.2d 896 (1st Cir.1950); Chamberlain v. Columbia Pictures Corp., 186 F.2d 923, 925 (9th Cir.1951), it is now settled that it creates a new statutory tort of broader scope, which requires neither proof of literal or obvious falsehood, American Home Products Corp. v. Johnson & Johnson, 577 F.2d 160, 165 (2d Cir.1978), nor of intent to deceive. Johnson & Johnson v. Carter-Wallace, Inc., 631 F.2d 186, 189 (2d Cir.1980). As we stated in Vidal Sassoon, supra, “§ 43(a) of the Lanham Act encompasses more than blatant falsehoods. It embraces ‘innuendo, indirect intimations, and ambiguous suggestions’ evidenced by the consuming public’s misapprehension of the hard facts underlying an advertisement.” 661 F.2d at 277 (quoting American Home Prods. Corp. v. Johnson & Johnson, 577 F.2d at 165).

In the present case, the district court’s opinion does refer to the two actions as attacks on “advertisements that are not obviously false but that rest upon tests whose efficacy is questioned” (Op. 23) (emphasis supplied) and notes that the tests “were conducted in apparent good faith but with somewhat differing results” (Op. 24) (emphasis supplied). The opinion also goes on to state that neither party had proved that the tests had been chosen or conducted “in such a manner as to mislead the public” (Op. 24). However, when these statements are read in context it is apparent that the court was not suggesting that each plaintiff must prove an obvious or bad faith falsity. Indeed the court cites decisions holding to the contrary, e.g., Vidal Sassoon, Inc., supra; American Home Products Corp. v. Johnson & Johnson, supra. At most, the district court implied that proof of bad faith in conducting the tests might indicate that the tests were unreliable or invalid, which would support an inference that advertising claims based on such tests were false; the court did not conclude that proof of good faith would establish the validity of product tests or that conducting tests in good faith would provide a defense immunizing a manufacturer from liability for false advertising claims. We find no error in this analysis.

Although a plaintiff need not prove obviousness or bad faith and although proof of good faith does not immunize a defendant, each plaintiff bears the burden of showing that the challenged advertisement is false and misleading, see Johnson & Johnson v. Carter-Wallace, Inc., supra, 631 F.2d at 192; Coca-Cola Co., supra, 690 F.2d at 314-15, 317-18, not merely that it is unsubstantiated by acceptable tests or other proof. See Toro Co. v. Textron, Inc., 499 F.Supp. 241, 253 (D.Del. 1980). The mere fact that one party’s evidence in support of the truth of its advertisements was unpersuasive would not ipso facto entitle the other party to relief. In the present case, for instance, regardless of the weaknesses of the tests made and relied on by Chesebrough to support its parity claims, P & G would be entitled to relief against Chesebrough only upon adducing evidence that the Chesebrough ads were false. Conversely, even if P & G did not prevail on its claim, Chesebrough could obtain an injunction against P & G only by establishing that the latter’s advertising claim of test-proven superiority was false. To prove such falsity Chesebrough assumed the burden of showing that the tests referred to by P & G were not sufficiently reliable to permit one to conclude with reasonable certainty that they established the proposition for which they were cited. The fact-finder’s judgment should consider all relevant circumstances, including the state of the testing art, the existence and feasibility of superior procedures, the objectivity and skill of the persons conducting the tests, the accuracy of their reports, and the results of other pertinent tests.

The issuance of preliminary injunctive relief in the present two cases therefore turns mainly on issues of fact rather than of law, i.e., the weight to be given by the district court to the evidence about the comparison tests conducted by the parties. If a party fails to show a likelihood of success by adducing through tests or other evidence that the other’s claim of superiority or parity is probably false, the district judge’s exercise of his broad discretion to deny relief must be upheld. Chesebrough contends that Judge Goettel erred in failing to reject the tests relied upon by P & G because they did not directly compare the advertised formulations (New Wondra and VICL) but rather used an earlier Wondra formula in the first test (SC-207) and different Chesebrough formulas, “Extra Strength” VICL and VDL, in the second test (SC-215). We reject this contention.

After hearing extensive testimony on whether the differences in the formulations were significant, Judge Goettel found that “neither side has actually demonstrated the importance or lack of importance” (Op. 12) of the differences between the New Wondra formula tested in SC-207 and the formula marketed in November 1983, and he therefore observed that Chesebrough had failed to sustain its burden on the issue (n. 22). Conversely, the district judge rejected P & G’s contention that its SC-215 test, which showed New Wondra to be superior to VICL Extra Strength and VDL in treatment of rough dry skin, demonstrated that New Wondra must a fortiori be superior to VICL. He found that the increased viscosity and greasiness of Extra Strength and VDL might have decreased the amount of them used.

Nor can we agree with the parties’ contention that the district court held that it was without authority to evaluate the testing standards used as the basis for their advertising claims. Although the task of evaluating scientific product tests may be challenging and distasteful because of the technical and theoretical nature of the procedures involved and the intricate statistical analysis needed to derive qualitative inferences and conclusions from the data, the court is under just as much of a duty to consider and weigh such evidence as it is to analyze economic or scientific evidence in a complicated patent or antitrust case. See, e.g., Bio-Rad Laboratories, Inc. v. Nicolet Instrument Corp., 739 F.2d 604, 608-12 (Fed.Cir.1984) (analyzing the validity of a patent for an interferometric optical phase discrimination apparatus); Brown Shoe Co. v. United States, 370 U.S. 294, 339-54, 82 S.Ct. 1502, 1531-41, 8 L.Ed.2d 510 (1962) (analyzing the effect of a merger). That a district court, sitting as trier of the facts, must consider, analyze and weigh expert testimony regarding clinical testing standards, procedures and results, has long been recognized. See, e.g., Johnson & Johnson v. Carter-Wallace, Inc., supra, 631 F.2d at 192 & n. 5; American Home Products Corp. v. Johnson & Johnson, supra, 577 F.2d at 169 & n. 19, 171 & n. 24.

It is true that in evaluating the tests and standards used in the present case, the district court indulged in unnecessary skepticism regarding its role in this regard, likening it to judicial “policy-making” and “activism” that should be exercised by the Federal Trade Commission rather than the courts. Despite these comments the record reveals that the court did not shirk its duty but carefully analyzed and evaluated the parties’ tests. It was only after that thorough review as fact-finder that the district court found that neither party had established a likelihood of successfully proving that the other party’s advertising claims were false. We cannot label its finding clearly erroneous.

The order of the district court is affirmed. 
      
      . Each company sought relief under § 43(a) of the Lanham Act and under New York General Business Law §§ 349 and 350. The New York law claims are not presently before this court.
     
      
      . In 1977 P & G first marketed “Wondra” lotion. Since then, the company has repeatedly tested and reformulated its product and marketed the improvements as New Wondra. Starting in 1981, after clinical studies had revealed the current formula to be no more effective in relieving dry skin than other leading lotions, P & G began developing a new formula containing more glycerin than other lotions and its earlier version. The new formula proved to be more effective than previous formulations and was marketed starting in the middle of 1983.
     
      
      . Chesebrough sells not only VICL but also VICL Extra Strength and Vaseline Dermatology Formula Lotion (“VDL”).
     
      
      . Upon the return date of the parties’ motions for preliminary injunctive relief, Chesebrough applied for a temporary restraining order against advertising use by P & G of an article written by it and edited by its dermatologist, Dr. Frank E. Dunlap (who had graded the skin conditions of the persons involved in P & G’s clinical tests SC-207 and SC-215), to the effect that the test results had been supported by various instrumental measurements of the subjects’ skin condition. Upon P & G’s statement that it would not disseminate the instrument study and results until a decision had been reached on the preliminary injunction, Judge Goettel denied Chesebrough’s application for a TRO.
     
      
      . Some weaknesses in P & G’s testing methods reviewed by the court were: (1) its failure to compare New Wondra directly with VICL, (2) dependence on subjective visual observations by dermatologists or trained graders, which may differ between graders, (3) use of numerical designations that do not necessarily correspond to verbal descriptions on the numerical scale used, (4) variations in the amount of each product applied by the subjects in “ad libitum” testing, and (5) the selection of a subset of subjects with rough skin as indicated by a 1.5 figure on the scale rather than the 2.5 figure shown on the scale as the first indication of skin roughness.
      P & G introduced evidence intended to explain some of these weaknesses, which included testimony, for instance, that despite changes in the New Wondra formulation there was no change in the effective ingredients of New Wondra, only in the inactive ingredients. However, as Judge Goettel noted, the changes could have affected the amount of the product applied by a subject in an ad libitum procedure, as compared with the amount of other lotions applied in the test, which in turn could affect the claims regarding the efficacy of the products. The district court concluded, however, that the "ad libitum" procedure, notwithstanding some of its weaknesses, represented a valid method of testing the overall effectiveness of a product in use.
      The district court also found that Chesebrough’s testing procedures suffered from weaknesses, including (1) the inadequacy of the number of subjects used in the first two tests; (2) the risk that the unsupervised subjects in the third test might become confused by the complicated instructions calling for them to apply New Wondra to one hand and VICL to the other, and might therefore accidentally switch hands or might contaminate the test application by using a hand treated with one lotion to apply the other lotion; (3) the fact that the subjects had such severe skin conditions that they were not representative users; (4) defects in the grading scale and inconsistent grading by the tester, Dr. McIntyre; (5) the brevity of the test period; and (6) the use of measured tests rather than the P & G ad libitum tests with which they were compared.
     