
    LIBAS, LTD., Plaintiff-Appellant, v. UNITED STATES, Defendant-Appellee.
    No. 97-1145.
    United States Court of Appeals, Federal Circuit.
    Oct. 7, 1999.
    
      Elon A. Pollack, Law Offices of Elon A. Pollack, of Los Angeles, California, argued for plaintiff-appellant. With him on the brief was Heather C. Litman.
    Bruce N. Stratvert, Attorney, Civil Division, Commercial Litigation Branch, International Trade Field Office, of New York, New York, argued for defendant-appellee. With him on the brief were David M. Cohen, Director, Commercial Litigation Branch, Department of Justice, of Washington, DC; and Joseph I. Liebman, Attorney in Charge, International Trade Field Office. Of counsel was Edward N. Maurer, Attorney, Office of Assistant Chief Counsel, International Trade Litigation, U.S. Customs Service, of New York, New York.
    Before MICHEL, GAJARSA and CUDAHY, Circuit Judges.
    
      
       The Honorable Richard D. Cudahy, United States Court of Appeals for the Seventh Circuit, sitting by designation.
    
   CUDAHY, Circuit Judge.

This case is centrally about the responsibilities of a trial court to ensure that its determinations based on expert testimony are founded upon reliable, scientifically trustworthy procedures. The issue arises here in the context of our review of a trial court’s determination concerning the weight rather than the admissibility of evidence. Libas, Ltd. (Libas), a fabric importer, challenges the determination of a trial court that the United States Customs Service (Customs) properly classified certain fabric imported by Libas as power-loomed rather than as hand-loomed and therefore subject to a higher duty rate and an entry quota. Libas argues that the trial court erred in holding, first, that Customs has the legal authority to make the determination by testing the fabric and second, that, on the basis of certain tests performed by Customs, the fabric was power-loomed. We agree that Customs had the authority to classify the fabric, but we hold that the trial court’s ruling that the fabric was power-loomed was not supported by evidence in the record of the reliability of the tests, and hence was clearly erroneous.

I.

Libas imports fabric from India through the Port of Los Angeles. In August 1994, it imported 32 bales of rolled cotton fabric which had been certified by the Indian Government as hand-loomed, and entered this fabric as “certified hand-loomed” under Subheading 5208.42.1000, Harmonized Tariff Schedules of the United States (HTSUS). In September 1994, Customs demanded redelivery of the fabric and subjected it to a new test developed by the Los Angeles Customs Laboratory, namely, the “Methodology for the [A]nalysis of Woven Fabric to Determine whether Fabric had been Power-loomed or Hand-loomed” (the Customs test). In November 1994, on the basis of this test, Customs reclassified the fabric as power-loomed, Subheading 5208.42.4000, HTSUS.

Under the Customs test, fabrics are classified as hand-loomed or power-loomed based on characteristics which are supposed to result from different means of manufacture. Woven fabric of any kind is made by running horizontal “weft” or “woof” yarns through a set of vertical “warp” yarns with a shuttle; patterns in the fabric are created by lifting or lowering selected warp yarns at each pass or “pick” of the shuttle. Use of a special kind of shuttle called a “fly-shuttle” can increase the speed at which a pick is “thrown” or completed. Fly-shuttles are in common use in India and can be either hand-thrown or machine-powered. The same kinds of yarns can be used with both processes.

The Customs test is premised on the idea that, because weavers cannot regulate their movements with the precision of a machine, hand-loomed fabrics exhibit less uniformity, evenness and consistency than machine-loomed fabrics. The fabric at issue exhibited two particular characteristics upon testing which were central in Customs’ classification of them as power-loomed. First, Customs found a weft tension defect of only one-and-one half inches, associated, according to Customs, with the small variations found in machine weaving as against the greater weft tension defects in hand weaving, where weft tension is unregulated. Second, Customs found an area where the thread ran out, which Customs concluded was characteristic of machine weaving because it assumed that hand weavers, being close to their work, would notice that the thread had ended and would therefore reweave the fabric to remove the defect.

Classifying the fabric as power-loomed was a matter of some import to Libas. Power-loomed fabrics are subject under HTSUS to nearly double the duty rate for hand-loomed fabrics and, unlike hand-loomed fabrics, are subject to a quota and require a visa for entry. Libas protested in a timely way, but in December 1994, Customs denied the protest on the ground that the laboratory report of the test showed that the fabric “exhibit[ed] characteristics of machine-made fabric.” Fourteen additional shipments of similar fabric by Libas were held up by this determination at the time of the filing of this lawsuit in January 1996. At a bench trial held in May 1996, the Court of International Trade heard evidence regarding the production process of the fabric both from experts who testified about various aspects of the Customs test and from a witness who claimed personal knowledge of the process of manufacture. The trial court found against Libas in an opinion issued in October 1996. Libas then filed this appeal.

II.

Libas denies that Customs has the statutory authority to unilaterally reclassify the fabric based on the test because under the governing legal scheme, Libas says, Customs must defer to the Indian Government’s certification or, in case of a dispute, consult with that government. (HTSUS is indeed a statute but is not published physically in the United States Code. See 19 U.S.C. § 1202.) The key statutory language at issue is that of Additional Note 4 to Chapter 52 (Cotton), HTSUS, defining the term “certified hand-loomed fabrics,” as “fabrics made on a hand loom (i.e., a nonpower-driven loom) by a cottage industry and which prior to exportation have been certified by an official of a government agency of the country where the fabrics were produced to have been so made.” The Court of International Trade read this as plainly granting Customs the authority to determine whether the fabrics at issue were actually “made on a hand loom.” The trial court treated the requirement that the fabrics actually be made on a hand loom as separate and independent from the requirement of prior certification by the exporting country’s government. The trial court therefore rejected Libas’ claim that satisfaction of the requirement of Indian Government certification was dispositive.

Libas, however, argues that this determination was erroneous because statutes must be interpreted as consistent with subsequent international agreements. At the least, Libas argues, later international agreements control over prior statutes if there is a direct conflict. HTSUS became effective January 1, 1989. See Pub.L. 100-418. Libas maintains that Customs has no authority to reclassify fabric which has been certified as hand-loomed by the Indian Government in view of the Agreement Relating to Trade in Textiles and Textile Products, Feb. 6, 1987, U.S.-India (the Bilateral Agreement), and the later Amendment to that Agreement, Dec. 21, 1989, 1989 WL 407622. The Bilateral Agreement states that either government “has the right to request consultation with the other ... on any matter” pertaining to the Agreement itself, Paragraph 21, and the Amendment says that “hand-loomed fabrics ... shall remain exempt” from quota and visa requirements “if properly certified in accordance with ... the Agreement.” Paragraph 1(C). Therefore, Libas says, Customs lacks the authority it claims.

To win this argument, Libas would have to show that Note 4, read together with the Bilateral Agreement and its Amendment, clearly and unambiguously mandates that if Customs disagrees with the Indian Government’s certification of some fabric, the only recourse Customs has is consultation with the Indian authorities under Paragraph 21 of the Bilateral Agreement. But the most that Libas in fact argues is that the language will bear this reading. Libas says that Paragraph 21 of the Bilateral Agreement “can be interpreted” as a blueprint for Customs to follow. Customs, Libas urges, “could have” sought consultation with the Indian Government instead of unilaterally reclassifying the fabric as power-loomed. The language of Paragraph 21 of the Bilateral Agreement is concededly permissive. It does not, even if read together with Note 4, clearly and unambiguously mandate that the United States must consult with the Indian authorities if it disagrees with their certification of imported fabric instead of unilaterally reclassifying the fabric. There is no requirement of consultation, only a right to consult. Because the Bilateral Agreement, as amended, can be read consistently with Note 4 to grant Customs the authority it claims, we have no occasion to consider whether the subsequently amended Agreement trumps the statute. The language of the statute and of the Agreement clearly states that Customs has the required authority. The trial court, therefore, did not err in determining that Customs has the required authority.

III.

We now address the Court of International Trade’s determination that the fabric was power-loomed rather than hand-loomed. Legal determinations made by that court are reviewed de novo, while its factual findings are reviewed for clear error. See United States v. Hitachi America, Ltd., 172 F.3d 1319, 1326 (Fed.Cir.1999); Ct. Int’l Trade R. 52(a). By statute, Customs’ classification decision “is presumed to be correct.” Universal Elec. Inc. v. United States, 112 F.3d 488, 491 (Fed.Cir.1997) (citing 28 U.S.C. § 2639(a)(1)). The presumption may be rebutted if the importer demonstrates by a preponderance of the evidence that Customs’ classification is incorrect. See id. at 492. No deference attaches to Customs’ classification under this presumption. See Rollerblade, Inc. v. United States, 112 F.3d 481, 484 (Fed.Cir.1997). The court must consider for itself “whether the government’s classification is correct, both independently and in comparison with the importer’s alternative.” Marubeni America Corp. v. United States, 35 F.3d 530, 536 (Fed.Cir.1994) (internal citations omitted).

Libas argues, first, that the trial court wrongly ignored the direct testimony from personal knowledge of one of Libas’ main witnesses, Mr. S. Ponnuswamy of J.L.C. International, the Indian exporter of the fabric. He testified at trial that the fabric came from Kovur, a village in India about 200 miles from Madras, which he visits once a month and where he deals with three master weavers. He stated that there are no power looms in or near Kovur. He also provided videotapes of the Kovur operation. The United States attempted to impeach his testimony by cross-examination, but did not present rebuttal witnesses.

Libas is aware that we accord the credibility determinations of a trial court “great deference,” Refac Int’l, Ltd. v. Lotus Dev. Corp., 81 F.3d 1576, 1582 (Fed.Cir.1996) (citing Fed.R.Civ.P. 52(a); Anderson v. Bessemer City, N.C., 470 U.S. 564, 575, 105 S.Ct. 1504, 84 L.Ed.2d 518 (1985)), but, as Libas notes, the trial court’s reasoning focused entirely upon the Customs tests and did not mention Mr. Ponnuswamy. Libas argues that where the trial court made no determination about credibility, there was no finding to which we must defer. Libas then contends that Mr. Ponnuswamy’s unrebutted testimony was sufficient to rebut Customs’ presumption of correctness because that presumption is only a procedural rule about allocation of burdens of production and persuasion. See Universal Elec., 112 F.3d at 493. The presumption, Libas concludes, has no independent evidentiary weight. In view of our ultimate disposition of the case, nothing rests upon these issues and so we need not decide them.

Libas also argues that the trial court clearly erred in relying on expert testimony about the Customs test while ignoring the only direct testimony from personal knowledge presented. Libas cites Trans-Orient Marine Corp. v. Star Trading and Marine, Inc., 925 F.2d 566, 571 (2d Cir.1991) (clear error if the trial court’s finding was “directly contrary to the only testimony presented”). But testimony from personal knowledge would not ipso facto trump expert testimony. The results of Customs’ test and the expert testimony as to its cogency were in principle just as competent as evidence of the source of the fabric as any direct testimony. We now turn to whether the results of the test were in practice enough to sustain the trial court’s determinations.

IV.

Libas argues that the trial court erred in holding that the fabric was power-loomed because the methodology involved in the Customs test was not shown to be reliable or properly validated by scientific or other appropriate technical means. And, very importantly, the results of this test provided the sole basis for the trial court’s conclusion about the source of the fabric. The Supreme Court has held that reliability is the touchstone for expert testimony on “scientific, technical, or other specialized knowledge.” Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 589, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993) (quoting Fed.R.Evid. 702). A trial judge, acting as “gatekeeper,” must “ensure that any and all scientific testimony or evidence admitted is not only relevant, but reliable.” Id. “Proposed testimony must be supported by appropriate validation-i.e., ‘good grounds,’ based on what is known. In short, the requirement that an expert’s testimony pertain to ‘scientific knowledge’ establishes a standard of evidentiary reliability.” Id. at 590, 113 S.Ct. 2786. The Court has extended this gatekeeping requirement from scientific to all expert testimony. See Kumho Tire Co., Ltd. v. Carmichael, 526 U.S. 137, 119 S.Ct. 1167, 143 L.Ed.2d 238 (1999).

Daubert and Kumho were decided in the context of determining standards for the admissibility of expert testimony under the Federal Rules of Evidence, which are not at issue here. We agree with Libas, however, that the proposition for which they stand, that expert testimony must be reliable, goes to the weight that evidence is to be accorded as well as to its admissibility. Neither the plain language of the relevant Supreme Court opinions nor the underlying principles requiring reliability for expert testimony are narrowly confined in application to questions of admissibility. The difference between weight and admissibility, moreover, is in many instances a close question. See Heidi Li Feldman, Science and Uncertainty in Mass Exposure Litigation, 74 Texas L.Rev. 1, 3 n. 10 (1995) [hereinafter Feldman, Science and Uncertainty ] (criticizing a focus on admissibility in treatments of Daubert).

In any event, if a trial court relies upon expert testimony, it should determine that the expert testimony is reliable. It would make little sense to say that a trial court in its fact-finding role should accord much if any weight to expert testimony, the reliability of which is not established. See Perreira v. Dep’t of Health and Human Serv., 33 F.3d 1375, 1377 n. 6 (Fed.Cir.1994) (“An expert opinion is no better than the soundness of the reasons supporting it.”). By the same token, even less weight should be accorded to expert testimony the reliability of which is controverted by the evidence in the record.

When the issue of reliability is raised, it is a key consideration in determining the weight to be given to expert testimony. See United States v. Velasquez, 64 F.3d 844, 848 (3rd Cir.1995) (“[T]he reliability of [DNA] evidence goes ‘more to ... weight than to ... admissibility .... ’ ”) (quoting United States v. Jakobetz, 955 F.2d 786, 800 (2d Cir.1992) (In assessing weight of evidence “[t]he district court should focus on whether accepted protocol [for DNA testing] was adequately followed in a specific case.... ”)). In this respect, a trial court acting as fact finder should ordinarily take into account, among other considerations which may bear on the reliability of expert testimony, factors which have been authoritatively identified as important. The Supreme Court, in Daubert, cited four such factors: (1) the testability of the hypothesis; (2) whether the theory or technique has been subject to peer review and publication; (3) the known or potential rate of error; and (4) whether the technique is generally accepted. 509 U.S. at 593-94, 113 S.Ct. 2786.

As Professor Feldman has argued, attention to these considerations brings the law into line with “scientific standards for respectable science.” Feldman, Science and Uncertainty, 74 Texas L.Rev. at 1, 9-18. These factors do not exhaust the list of considerations bearing on reliability. But they are representative of matters to be taken into account in assessing a technical procedure like the Customs test in the case before us. They are therefore appropriate factors for a trial court in its fact-finding role to consider in assigning weight to testimony when concerns about the reliability of such testimony are raised.

There is no iron law that the Daubert factors be applied in Customs classification cases. The Court of International Trade obviously need not use them in every case, or even in most such cases. These factors are primarily applicable when the question involves a technical process where the reliability of a scientific or technical methodology has been raised as an issue. There may, of course, be other factors which may in a given case be more relevant than the Daubert considerations. The trial court has broad discretion in these matters, but, before turning from Daubert altogether, the court should assure itself that it has effectively addressed the important issue of reliability when it has been raised in an appropriate case.

In this case, the Court of International Trade did not ascertain whether, or explain why, the Customs test was reliable according to appropriate standards when the plaintiff made a timely challenge to its reliability. Although Libas did not raise a Daubert attack as such against the Customs test, since admissibility was not at issue, it did clearly present the question of reliability to the trial court. Libas introduced expert testimony that skilled hand weavers can produce results similar to machine weaving and that washing reduces any discrepancies between the results of the two production processes, raising questions about whether the Customs test would accurately distinguish between them. As noted below, Libas also raised grave questions about the basis of Customs’ confidence in the test. Here, where Libas effectively challenged the reliability of the Customs procedure, the trial court should have examined the Customs test either with a Daubert-style analysis or in some other equally searching way.

The trial court, however, reasoned that because two witnesses had testified that the tests Customs used were “widely accepted in the textile industry and ha[d] been used for years,” therefore, “Customs’ methodology appear[ed] to be reasonably calculated to determine whether the fabric is loomed by machine or by hand.” The trial court cited two witnesses to show that the test was “generally accepted.” Professor Emeritus Mary Jane Leland of California State University, Long Beach, stated that the physical description of the fabric reported in the Customs analysis was generally accurate, but did not say or imply that the Customs test was widely accepted. Mary Carillo, Textile Analyst at the Customs Laboratory, one of the people who devised the test and applied it to the fabric here, did say that several elements of the test are “standard textile tests,” but she did not clearly state that they were standard ways of distinguishing whether fabric is power-loomed as opposed to determining some other feature of the fabric. For example, a test might show whether fabric was cotton rather than some other material, since the composition of the fabric in dispute was one of the things that Carillo found by testing. To assess a test’s reliability, it is necessary to know what it tests. Further, even if the test was a test for the source of the fabric rather than for something else, a test or methodology might consist of standard elements without itself being standard.

In describing the test, moreover, Customs itself writes that “[w]e are not aware of any published methods that specifically address th[e] issue” of whether fabric has been power-loomed or hand-loomed. Publication is only one indicator of reliability, but whether a test can be “standard” without publication is at least open to question and, without stronger evidence of general acceptance than exists on this record, there is need of an explanation not provided by any witness here. The problem in this instance was not the trial court’s failure to discuss Daubert, which was not cited at trial, but its acceptance of the Customs test as “reasonably calculated to determine whether the fabric is loomed by machine or by hand” on the sole ground that it was “widely accepted.” The test was accepted despite the fact that it had not been published, that the claim that it was standard was thin and inconclusive and that Libas launched a powerful attack on Customs’ grounds for treating it as reliable.

Granting for the sake of argument, however, that there was some testimony that the Customs test was standard in a relevant sense, the fact that a test is widely accepted is not merely for that reason “reasonably calculated” to determine definitively whether fabric was hand-loomed or power-loomed. While “[w]idespread acceptance can be an important factor” in an assessment of reliability, Daubert, 509 U.S. at 594, 113 S.Ct. 2786, after Daubert and Kumho, the inquiry does not necessarily end there. The lesson of the Supreme Court’s rejection of “general acceptance” as the sole standard for expert testimony, in favor of the Daubert-Kumho reliability standard is that “widespread use” or “general acceptance” is an imperfect proxy for reliability. The prevailing scientific wisdom is not always right, and, moreover, a requirement of “general” or “widespread” acceptance, by itself, may exclude reliable but novel or controversial methodologies. See Brian S. Leiter, The Epistemology of Admissibility, 1997 B.Y.U. L.Rev. 803, 818-19. Of course, more solidly based evidence of widespread acceptance than the government produced in this case would be a better indicator of reliability. If a test, methodology or procedure is clearly shown to be generally accepted and to test what is at issue in the case, a court is entitled to have confidence in its results unless some particular reason for doubt arises, such as failure under the other Daubert factors.

Here, particular reason for doubt does arise. The only evidence in the record directly relating to the validation of the Customs test was Carillo’s testimony on cross examination that the reliability of the test had not been established by the obvious and natural method of double-blind testing. That would involve running the Customs test on fabric, the source of which was known in some other way, perhaps by direct observation, and determining whether testers who themselves had no knowledge of whether test samples were hand-loomed or power-loomed could reliably distinguish power-loomed from hand-loomed fabric within a respectable rate of error. Testing a methodology in this manner would satisfy two of the Daubert factors, verification and known error rate, and for this reason would enhance confidence in the reliability of the test.

Double-blind testing is not required for reliability although it is indicated, especially where it is possible, not terribly burdensome, and reasonably inexpensive. Still, there may be other indicia of reliability, and the Court of International Trade may rely on methodologies which have not been double-blind tested for making Customs tariff classifications as long as there is some other good reason to have confidence in their results or if double-blind testing is impracticable. Customs’ confidence in the test here, Carillo testified, was based on: (1) some non-double-blind testing against reference samples where the “right” result was known to the tester, (2) the textile literature from which elements of the test were derived and (3) the personal experience of five analysts, one of whom is a weaver. As Libas insists, non-double-blind testing is worth little because knowing the “right” result beforehand biases the results. The textile literature, by Customs’ own admission, contains no published methodologies for distinguishing hand-loomed from power-loomed fabrics. And the personal experience and judgment of five analysts, one of whom is a weaver, is no basis for a technically supportable conclusion.

The Daubert test is certainly not a rigid formula. But it is significant that the Customs test fails to satisfy any of its factors except, possibly, general acceptance, and it offers no other assurances of reliability based on factors not mentioned in Daubert to make up for those defects. Satisfaction of a Daubert analysis is not, as we have indicated, demanded in every classification case. Here, however, the Customs test was the only basis for the trial court’s determination. This test was the sort of evidence to which a Daubert-type analysis would apply. And the importer raised a serious challenge to the reliability of the test at trial. Without a more persuasive showing of reliability, therefore, it was clearly erroneous for the trial court to credit the testimony of Customs’ expert witnesses that the test could distinguish between power-loomed and hand-loomed fabric.

We are in no position to determine the proper classification of the fabric at issue here. Whether the fabric was power-loomed or hand-loomed is a factual determination for the trial court. However, the determination must be made on the basis of reliable expert testimony, if expert testimony of a scientific or technical sort is the appropriate way to decide the matter. Although the Customs test may very well in fact be reliable, its reliability has not been established and the record is insufficient as it stands to allow a court to make a determination of its reliability with any confidence. While we might not normally allow Customs a second evidentiary opportunity, it is appropriate to vacate and remand for further findings because the trial court was not on notice that in circumstances like those presented here, it was obliged to make an assessment of reliability based on the sort of analysis we have described. Further evidentiary hearings are probably called for. Therefore, “[t]he original classification having been tested and found wanting on our review, the case can now be remanded to the Court of International Trade to find a correct answer, whether previously claimed or not claimed by the importer. If need be, a further remand to the Customs Service for consideration by it is also permissible.... ” Simod America Corp. v. United States, 872 F.2d 1572, 1579 (Fed.Cir.1989).

Thus, we AFFIRM-IN-PART, VACATE-IN-PART, and REMAND.

COSTS

Each party shall bear its own costs. 
      
      . Contrary to what Libas suggests, the Indian Government still has a right to consultation about this matter. The record does not show whether it has availed itself of this right.
     
      
      
        . Libas did not challenge the admissibility of the Customs test, but the reliability issue was not waived. Under the Tariff Act of 1930, an importer objects to a Customs tariff classification by filing a protest with Customs, see 19 U.S.C. § 1514; if this is denied, jurisdiction lies with the Court of International Trade, see 28 U.S.C. § 1581. According to the procedure of that court, Customs "shall file ... as part of the official record, any document, paper, information or data relating to the entry of merchandise and the administrative determination that is the subject of the protest or petition.” Id. § 2635(a) (emphasis added). The test related to Customs' determination at issue here and so was filed as part of the record. Libas could not, therefore, attack its admissibility and its only recourse was to argue against the weight accorded to the test. Libas’ argument at trial against the reliability of the test was sufficient to rebut the statutory presumption of correctness accorded to Customs classifications.
     
      
      . The same defects vitiate the testimony of Désirée Koslin, an expert witness for the government whose testimony was not cited by the trial court, but who testified that her own methodology and the Customs test "overlap[ped]” and were "standard.”
     