
    PUBLIC CITIZEN and Center for Auto Safety, Petitioners, v. Diane STEED, Deputy Administrator, National Highway Traffic Safety Administration and National Highway Traffic Safety Administration, Respondents.
    No. 83-1327.
    United States Court of Appeals, District of Columbia Circuit.
    Argued Nov. 2, 1983.
    Decided April 24, 1984.
    
      William B. Schultz, Washington, D.C., with whom, Alan B. Morrison, Washington, D.C., was on the brief, for petitioners. John Cary Sims, Washington, D.C., also entered an appearance for petitioners.
    Enid Rubenstein, Atty., National Highway Traffic Safety Admin., Washington, D.C., with whom J. Paul McGrath, Asst. Atty. Gen., Michael F. Hertz, Atty., Dept. of Justice, Frank Berndt, Chief Counsel, Stephen P. Wood, David W. Allen, Asst. Chief Counsel, Bruce C. Buckheit and Roger C. Fairchild, Attys., National Highway Traffic Safety Admin., Washington, D.C., were on the brief, for respondents.
    Before TAMM and MIKVA, Circuit Judges, and BAZELON, Senior Circuit Judge.
    
      
       Senior Judge Bazelon took no part in the disposition of this case.
    
   Opinion for the Court filed by Circuit Judge MIKVA.

MIKVA, Circuit Judge:

Section 203 of the National Traffic and Motor Vehicle Safety Act (the Act), 15 U.S.C. § 1423 (1982), was passed in 1966. It required the development of uniform tire quality grading standards for motor vehicle tires by September, 1968. The primary purpose of section 203 was to provide consumers useful information in selecting tires. Notwithstanding this clear statutory directive, for nine years the National Highway Traffic Safety Administration and its agency predecessors (hereinafter referred to as NHTSA or the agency) were most reluctant regulators, until a consumer lawsuit forced NHTSA to promulgate the regulations mandated by section 203. Implementation of those regulations was further delayed pending resolution of two lawsuits filed by several tire manufacturers. The regulations finally became operative in 1979 and remained in effect until 1983 when, following a rulemaking proceeding, NHTSA suspended indefinitely what it conceded was “the most meaningful characteristic [of the tire grading program] from a consumer standpoint” — the treadwear grading requirements. Petitioners Public Citizen and the Center for Auto Safety (hereinafter referred to collectively as Public Citizen) claim that the suspension violated section 203 of the Act and the Administrative Procedure Act, 5 U.S.C. § 551 et seq. (1982). We grant Public Citizen’s petition for review and hold that NHTSA’s decision to suspend the treadwear grading program was arbitrary and capricious.

Background

In 1966, Congress enacted section 203 which provides in part:

In order to assist the consumer to make an informed choice in the purchase of motor vehicle tires, within two years after September 9, 1966, the [agency] shall, through standards established under subchapter I of this chapter, prescribe by order, and publish in the Federal Register, a uniform quality grading system for motor vehicle tires.

15 U.S.C. § 1423 (emphasis added). The language of the statute and its legislative history indicate that section 203 was viewed primarily, if not solely, as a consumer provision. See B.F. Goodrich Co. v. Department of Transportation, 541 F.2d 1178, 1184 (6th Cir.1976), cert. denied, 430 U.S. 930, 97 S.Ct. 1549, 51 L.Ed.2d 773 (1977) (Goodrich I).

An advance notice of proposed rulemaking under section 203 was issued in 1968, but NHTSA did not promulgate final tire grading regulations until it was forced to do so as a result of a lawsuit brought by a consumer group. Nash v. Brinegar, Civil Action No. 177-73 (D.D.C. May 2, 1974). We review that early history only briefly here, summarizing what another court of appeals has described as a “strange record of delay and nonfeasance on the part of administrators charged with enforcing a regularly adopted statute of the United States.” Goodrich I, 541 F.2d at 1180.

In its initial notice in 1968, the agency sought comments on numerous tire characteristics that could be included in a tire grading program, including “treadwear and carcass durability.” 33 Fed.Reg. 7,261 (1968). In 1971, the agency proposed a rule which would have graded tires in four areas of performance, postponing for later consideration the areas of traction and treadwear. 36 Fed.Reg. 18,751 (1971). This proposal was withdrawn in 1972 after “considerable negative industry response.” Goodrich I, 541 F.2d at 1184. See 37 Fed.Reg. 7,903 (1972). A year later, the agency issued a revised proposal which more closely resembled the regulations finally promulgated by NHTSA. The 1973 proposal focused on three characteristics of tires — treadwear, traction and high speed performance — which agency commenters had indicated were of the greatest interest to consumers. 38 Fed.Reg. 6,194 (1973). In 1974, NHTSA issued and subsequently revoked final tire grading regulations. See 39 Fed.Reg. 1,037 (1974); 39 Fed.Reg. 16,469 (1974). Only after the consent decree in Nash v. Brinegar did NHTSA issue a notice of a new proposal which was amended, revised, and finally promulgated in May of 1975. See 40 Fed.Reg. 23,073 (1975).

Under the 1975 regulations, treadwear grades were to be based on the projected mileage of a tire as tested on a course located in San Angelo, Texas. The manufacturers were responsible for testing their own tires, but NHTSA also used the course for compliance testing. Under the regulations, each tire completed a total of 6,400 miles on the course and was tested for treadwear every 800 miles, after an initial “break-in” period. To minimize variations in treadwear caused by factors other than the quality of the tires themselves, the test cars traveled in convoys with one car equipped with “course monitoring tires” (CMTs) which were used to measure changes in the road and in weather conditions. In addition, cars had to meet certain weight requirements, be aligned according to the manufacturers’ specifications, and travel in convoys at a constant speed with regular changes in drivers and in the positions of the cars and tires.

The regulations did not require tire manufacturers to use the actual grade derived from the federal road test in advertising their tires to the public. Rather, the federal test grade reflected a minimum level of performance that the manufacturers would guarantee to their customers. The manufacturers were free to assign lower grades than those determined by the federal test, but they were “expected to ensure that substantially all the tires marked with a particular grade are capable of achieving it.” 40 Fed.Reg. 23,075 (1975).

The 1975 tire grading regulations were stayed pending judicial review after several tire manufacturers filed lawsuits challenging the regulations. In Goodrich I, the court upheld the essential elements of the tire grading system, but remanded to the agency for consideration of several minor issues. In answer to the tire manufacturers’ claim that the information provided consumers under the program would be “affirmatively misleading,” 541 F.2d at 1184, the court concluded that “no test procedures designed to grade millions of tires are going to approach perfection,” id. at 1188, and that section 203 requires “reasonably fair and reasonably reliable grading procedures, not theoretical perfection.” Id. at 1189.

Pursuant to the court’s order, NHTSA revised its regulations and re-issued them in 1978, excluding radial tires from the treadwear requirements of the rule because new data suggested a problem in the method that had been chosen to measure the wear rate of radials. 43 Fed.Reg. 30,542 (1978). Following another challenge by the tire manufacturers, the court again upheld the rule, this time in all respects. B.F. Goodrich Company v. Department of Transportation, 592 F.2d 322 (6th Cir. 1979) (Goodrich II).

The dreary history of tire testing reflects a thirteen year gap between the policy decision by Congress and the beginning of implementation by NHTSA. The uniform tire quality standards finally approved by the court went into effect for bias tires on April 1, 1979, for bias belted tires on October 1, 1979, and for radial tires on April 1, 1980. See 48 Fed.Reg. 5,690 (1983). They established grading standards for three tire characteristics — treadwear, traction, and heat resistance. The only portion of those regulations suspended by NHTSA in 1983 relates to treadwear; the traction and heat resistance requirements are not at issue in this case.

In February, 1981 NHTSA issued an advance notice of proposed rulemaking (the grade assignment proposal) to standardize the grade assignment practices used by manufacturers under the tire grading program. The notice was designed “to improve the uniformity of the grading system by eliminating the differences which now exist in the methods by which manufacturers assign tire grades.” 46 Fed.Reg. 10,429 (1981). Specifically, NHTSA proposed a standardized statistical procedure for grade assignment. Id. The purpose of the proposal was to minimize the extent to which “different manufacturers, faced with similar test results, assign[ ] different grades to their tires ... [reducing] the value of the UTQG [Uniform Tire Quality Grading] information to consumers.” Id. To our knowledge, NHTSA has taken no further action on the grade assignment proposal.

In a separate action later that spring, NHTSA announced that it was considering modifications of numerous regulations, including the tire grading system, to help the U.S. auto industry. 46 Fed.Reg. 21,203 (1981). In a fact-sheet accompanying its notice of intention to amend the tire grading standards, NHTSA indicated that it proposed to “retain treadwear requirements but delete and reserve for future possible rulemaking grading based upon traction and heat resistance.” The agency explained that “[t]he most meaningful characteristic from a consumer standpoint would appear to be treadwear.”

Although the agency took no action on the grade assignment proposal, in 1982 it proposed suspension of the treadwear grading standards (the suspension proposal). Relying on an on-site review of the San Angelo test grounds as the “principal basis” for its action, the agency proposed to suspend “on an interim basis” the treadwear grading requirements. 47 Fed.Reg. 30,084, 30,086 (1982). NHTSA stated that the primary purpose of the suspension was “to avoid dissemination of potentially misleading tire grading information to consumers,” but also emphasized its desire “to minimize the imposition of unwarranted compliance costs on industry and consumers.” 47 Fed.Reg. 30,084 (1982). The agency listed several tentative sources of “data variability” that allegedly undermined the reliability of the test results obtained at San Angelo. See 47 Fed.Reg. 30,086-87 (1982). NHTSA indicated that it intended to conduct further testing to determine whether various factors causing test result variability could be controlled, but concluded that the treadwear grading requirements must be suspended until such tests were completed:

Until such testing is completed, the agency tentatively concludes that continuation of costly testing and grading under a system the results of which are at best questionable and which are in all likelihood misleading to consumers cannot be justified.

The agency believes that the potential for injury to consumers through reliance on misleading information, together with the unjustified testing costs imposed on industry, create a situation in which specific, readily identifiable damage to the public interest would occur in the absence of a suspension of the treadwear grading requirements.

47 Fed.Reg. 30,088 (1982).

Following a comment period, NHTSA issued a final rule suspending the treadwear grading requirements (the suspension). 48 Fed.Reg. 5,690 (1983). The agency gave two reasons for the suspension. First, NHTSA’s primary reason was the same it had given in the suspension proposal — the existence of “variability in treadwear test results unrelated to actual differences in measured or projected performance.” 48 Fed.Reg. 5,692 (1983). The agency identified several factors that led to “unreasonable” variations in test results, including problems with scales, tread depth probes, wheel alignment, vehicle maintenance and use, and variations between drivers and in the weather conditions at San Angelo. See 48 Fed.Reg. 5,694-96 (1983). Second, NHTSA justified the suspension on the ground that the grades actually assigned by tire manufacturers varied considerably and did not reflect accurately the federal test results. 48 Fed.Reg. 5,692 (1983).

NHTSA also addressed several suggested changes in the tire grading system proposed by Uniroyal Tire Co. (Uniroyal) in a petition filed with the agency in January, 1983. The agency stated that it could not “conclude that Uniroyal [sic] proposal would reduce test variability to acceptable levels” and that suspension was required until the agency completed further research and testing. 48 Fed.Reg. 5,696 (1983).

In a decision that is crucial to our disposition of this case, NHTSA explicitly considered and rejected the suggestion that it continue the treadwear grading system until improvements in the grading process could be developed. 48 Fed.Reg. 5,697 (1983). Relying on the language of the court in Goodrich I, the agency concluded that the procedures being used to test for treadwear were “neither reasonably fair ... nor reasonably reliable ....” Id. According to NHTSA, the procedures were “not reasonably reliable because of the excessive magnitude of the overall variability” and because the grades produced under the procedures “appear[ed] to be affirmatively misleading.” Id. The agency concluded:

Fully cognizant of the view expressed by [one of the commenters] that some information, or a less than perfect-functioning system, is better than no information or no system at all, the agency cannot agree. The agency concludes that the government has a superior duty not to participate in such an effort to the probable detriment of consumers, who have every reason to demand, and must necessarily be expected to assume, that such participation implies and connotes, a higher level of certainty that [sic] the agency can now find in this well-intentioned effort.

48 Fed.Reg. 5,698 (1983).

Following issuance of the rule suspending the treadwear regulations, Public Citizen and Uniroyal petitioned for reconsideration. NHTSA denied those petitions, but granted a separate petition by Uniroyal to engage in rulemaking to improve the reliability of the treadwear tests. 48 Fed.Reg. 32,588 (1983). Public Citizen subsequently petitioned this court for review of the final rule suspending the treadwear grading requirements, claiming that the suspension violated section 203 of the Act and was contrary to the Administrative Procedure Act because it was “arbitrary and capricious, [was] not supported by the evidence in the record, and was issued despite the availability of alternative courses of action which would have been consistent with the requirements” of both the APA and the National Traffic and Motor Vehicle Safety Act.

Discussion

A. Standard of Review

In Motor Vehicle Manufacturers Association v. State Farm Mutual Automobile Insurance Company, — U.S. -, 103 S.Ct. 2856, 77 L.Ed.2d 443 (1983), the Supreme Court held that rescission of an agency rule is subject to the same standard of review as promulgation of a rule. The Court explained that “the direction in which an agency chooses to move does not alter the standard of judicial review established by law.” At -, 103 S.Ct. at 2866. In explaining what constitutes a revocation of a rule, the Court stated:

Revocation constitutes a reversal of the agency’s former views as to the proper course. A “settled course of behavior embodies the agency’s informed judgment that, by pursuing that course, it will carry out the policies committed to it by Congress. There is, then, at least a presumption that those policies will be carried out best if the settled rule is adhered to.”

Id. (quoting Atchison, T. & S. F.R. Co. v. Wichita Bd. of Trade, 412 U.S. 800, 807-08, 93 S.Ct. 2367, 2374-2375, 37 L.Ed.2d 350 (1973)).

NHTSA concedes that its elimination of the treadwear grading requirements is subject to judicial review under the APA, 5 U.S.C. § 706(2)(A), and cannot be sustained if it was arbitrary or capricious. The agency contends, however, that its action is subject to a less stringent standard of review than that which the Supreme Court applied in State Farm because it has temporarily suspended rather than permanently revoked the treadwear standards. The parameters of this less stringent review in which the agency would have us engage are not clear. NHTSA claims that, under this standard, “a less complete discussion and consideration of alternative courses of agency action, a lesser degree of factual certainty, and a less precise explanation of the bases for the decision” are needed to uphold its decision. We disagree and conclude that NHTSA’s action should be treated as a revocation, subject to the standard of review set forth in State Farm.

In the context of a thirteen year gap twixt law and enforcement, it is hard to view the suspension as of short moment. In any event, an “indefinite suspension” does not differ from a revocation simply because the agency chooses to label it a suspension. Although the agency’s characterization may provide some guidance in determining the nature of the challenged action, “it is the substance of what the [agency] has purported to do and has done which is decisive.” Environmental Defense Fund, Inc. v. Gorsuch, 713 F.2d 802, 816 (D.C.Cir.1983) (quoting Columbia Broadcasting System, Inc. v. United States, 316 U.S. 407, 416, 62 S.Ct. 1194, 1199, 86 L.Ed. 1563 (1942)). See also Natural Resources Defense Council v. U.S.E.P.A., 683 F.2d 752, 763 n. 23 (3rd Cir. 1982) (“an indefinite postponement which is never terminated is tantamount to a revocation”).

NHTSA’s suspension of the treadwear grading requirements is a paradigm of a revocation. It will remain in effect indefinitely unless and until the agency completes a full notice and comment rulemaking proceeding to reinstate a treadwear grading program. In a 180 degree reversal of its “former views as to the proper course,” State Farm, — U.S. at -, 103 S.Ct. at 2866, NHTSA has taken almost the identical position that the tire manufacturers had taken, and that NHTSA had opposed, in the Goodrich I litigation. There, the manufacturers argued that “treadwear ... tests devised by the agency do not correlate with tire performance on the road and do not produce uniform and reliable results,” 541 F.2d at 1183, that “the information provided ... ‘will in many instances be affirmatively misleading,’ ” id. at 1184, and that the “variations in the testing devices ... are sufficient to invalidate ... the ... treadwear tests.” Id. at 1186. Those are the same arguments now put forward by NHTSA. Recognizing the significance of the court’s approval of the treadwear requirements in Goodrich I, NHTSA asserted in the suspension proposal that “several significant sources of data variability ... may undermine ... the reliability of the conclusions and representations made by the agency” in that litigation. 47 Fed.Reg. 30,086 (1982). It repeated that assertion in the final suspension decision. See 48 Fed.Reg. 5,692 (1983) (the statements made by NHTSA in the course of the Goodrich litigation “have been further undermined by information now available to the agency”). However permanent or impermanent the suspension, the reasoning asserted reflects a complete reversal of NHTSA’s prior position.

The inquiry required by the APA is a familiar one. The “agency must cogently explain why it has exercised its discretion in a given manner.” State Farm, at -, 103 S.Ct. at 2869. In State Farm, the Court summarized the scope of review under the arbitrary and capricious test:

Normally, an agency rule would be arbitrary and capricious if the agency has relied on factors which Congress has not intended it to consider, entirely failed to consider an important aspect of the problem, offered an explanation for its decision that runs counter to the evidence before the agency, or is so implausible that it could not be ascribed to a difference in view or the product of agency expertise.

Id. at -, 103 S.Ct. at 2867. We have stated that, when the action involves a change in a settled course of agency behavior, “the court should be satisfied both that the agency was aware it was changing its views and has articulated permissible reasons for that change, and also that the new position is consistent with the law.” NAACP v. FCC, 682 F.2d 993, 998 (D.C.Cir. 1982). Moreover, “we will demand that the [agency] consider reasonably obvious alternative[s] ... and explain its reasons for rejecting alternatives in sufficient detail to permit judicial review.” Natural Resources Defense Council v. SEC, 606 F.2d 1031, 1053 (D.C.Cir.1979) (footnote omitted).

We need not decide whether the tire quality grading standards are “safety” standards subject to review under section 103(a) of the Act, 15 U.S.C. § 1392(a) (1982), or consumer information standards subject to the less stringent requirements of section 112(d) of the Act, 15 U.S.C. § 1401(d) (1982), because our conclusion would be the same under either section. See Goodrich I, 541 F.2d at 1183 (where the court assumed, without deciding, that section 103 applies to the tire quality grading standards).

B. Analysis

In reviewing NHTSA’s decision to suspend indefinitely the treadwear grading requirements, we must begin with the broader statutory and procedural context from which that decision arose. Section 203 required the promulgation of tire grading regulations by September, 1968. 15 U.S.C. § 1423. Yet the agency procrastinated for nine years before a lawsuit brought by a consumer group forced it to promulgate those regulations. Moreover, NHTSA itself has recognized the importance of treadwear ratings in a consumer information program. In its initial notice of intention to amend the tire grading standards in 1981, NHTSA stated that “the most meaningful characteristic from a consumer standpoint would seem to be treadwear.” Similarly, the Final Regulatory Evaluation upon which the agency relied in issuing the suspension, stated that “treadwear grading is potentially the most useful part of the grading system.” Final Regulatory Evaluation: Revisions to the Uniform Tire Quality Grading Standard (January, 1983) at IV-2 (hereinafter cited as Final Regulatory Evaluation). Although eighteen years have elapsed since section 203 was enacted, the centerpiece of NHTSA’s consumer information program in the tire field, according to NHTSA’s own characterization, was in place for less than four of those years.

NHTSA justified the instant suspension on the ground that the treadwear grading system was affirmatively misleading consumers about actual treadwear of tires, thereby frustrating the purpose of section 203. 48 Fed.Reg. 5,690 (1983). NHTSA claimed that the unreliability of treadwear information was caused by two factors: variability in test results caused by the test procedures themselves and variability in grade assignment practices by the tire manufacturers. Id. Rather than attempt to correct these deficiencies in the treadwear grading program, NHTSA decided to suspend the program altogether while it further studied the pervasiveness of the variability problem and identified specific sources of variability.

We find that decision to be arbitrary and capricious for two reasons. First, the record does not support NHTSA’s finding that the magnitude of the variability problem justified suspending the treadwear grading requirements, rather than retaining them while improvements in the test procedures and in the manufacturers’ grade assignment practices could be developed. Second, NHTSA failed to explain why alternatives, which the rulemaking record indicates were available to the agency, could not correct many of the variability problems that NHTSA had identified.

1. The Basis for the Suspension: the Magnitude of the Variability Problem

To justify the indefinite suspension of “the most meaningful component” of the consumer information program, NHTSA had to overcome the “presumption ... against changes in current policy that are not justified by the rulemaking record.” State Farm, — U.S. at -, 103 S.Ct. at 2866 (emphasis in original). NHTSA failed to demonstrate that the testing and grading procedures once embraced by the agency and approved by a reviewing court were no longer providing “reasonably fair and reasonably reliable” information. Goodrich I, 541 F.2d at 1189. Although the rulemaking record certainly would support a decision to improve the treadwear program, it does not support the agency’s conclusion that the information was so inaccurate that consumers would be better informed without any treadwear information than they would be with the treadwear ratings provided under the program.

The data relied upon by NHTSA provided an insufficient basis for eliminating the treadwear grading requirements. From the time the treadwear regulations were first proposed in 1974, the tire manufacturers consistently have argued that variability in the treadwear test procedures would lead to inaccurate and misleading test results. The manufacturers submitted data to NHTSA and to the court of appeals during the Goodrich litigation that indicated significant variations in test results when similar tires were tested under the federal procedures. See 47 Fed.Reg. 30,085 (1982). NHTSA asserted at the time that the data provided by the manufacturers did not undermine the reliability of the test procedures. Following implementation of the regulation, the manufacturers continued to submit information to NHTSA in support of their argument that, when similar tires were tested under the federal procedures, the results varied considerably. Id. Partly in response to those submissions, NHTSA decided to conduct its own on-site review of the treadwear test procedures used in San Angelo, Texas. That review provided the “principal basis” for NHTSA’s proposed suspension of the treadwear grading requirements in 1982. 47 Fed.Reg. 30,086 (1982).

NHTSA’s on-site review at San Angelo did not provide any estimate of the overall magnitude of the variability problem, but did identify several sources of data variability in the testing procedures. NHTSA discussed those sources of variability in the suspension proposal, explaining that they related to “the instrumentation and practices used in measurement, in the calibration and use of vehicles, and in the performance of fleet drivers, and to the effect of weather conditions.” Id. Specific instrumentation problems identified by the agency included problems with scales, treadwear depth probes and wheel alignment. In addition to the problems with the testing procedures, the agency expressed concern over the “substantial burden” which the treadwear grading program placed on the tire industry — a burden NHTSA estimated to be around $10 million a year. 47 Fed.Reg. 30,088 (1982). The proposal asserted “that suspension of the treadwear rating requirements is necessary primarily to avoid dissemination of potentially misleading tire grading information to consumers, but also to minimize the imposition of unwarranted compliance costs on industry and consumers.” 47 Fed.Reg. 30,084 (1982).

In adopting the final rule suspending the treadwear grading program, NHTSA relied principally on variability caused by the test procedures, including those sources identified in the proposal. See 48 Fed.Reg. 5,692, 5,694-96 (1983). The agency, however, also placed considerable reliance on a separate factor not raised in the suspension proposal: variability caused by the manufacturers’ grade assignment practices. Under the existing regulations, manufacturers were free to undergrade their tires. To the extent that tires with higher treadwear performance received lower grades than tires with inferior treadwear performance, NHTSA concluded that consumers were being misled.

In concluding that treadwear grades were “affirmatively misleading [consumers] in their selection of new tires,” 48 Fed.Reg. 5,690 (1983), NHTSA thus depended on two factors: the extent of the mathematical variability between federal test results and actual treadwear performance; and the extent of the mathematical variability between federal test results and the grades actually assigned by the manufacturers. NHTSA has tended to mesh those two factors together in justifying the suspension. As we explain below, however, to the extent that NHTSA relied on the variability arising from the manufacturers’ grade assignment practices, its action was arbitrary and capricious. As for the evidence indicating flaws in the test procedures themselves, NHTSA’s statement explaining the suspension reflects considerable uncertainty concerning the extent to which those flaws lead to significant inaccuracies in the test results. In any event, NHTSA has not shown that these flaws were different in kind or quantity from those that have been pressed on the agency for the past ten years. They were old and known problems that had been found insufficient to preclude use of the procedures in the past, and the evidence in the record does not warrant NHTSA’s dramatic change of position. In light of the mandate set forth in section 203, NHTSA’s concern over the “potential cumulative effect” of “sources of potential variability” provides an insufficient basis for a total suspension of the treadwear standards. See 48 Fed.Reg. 5,691 (1983).

a. Grade Assignment Practices

Although NHTSA claimed that the variability caused by the test procedures themselves formed the “principal basis” for its suspension decision, the agency’s statement justifying the decision focused primarily on data relating to the treadwear grades actually assigned by the tire manufacturers and made available to the public. This reliance on data that reflected the differences in manufacturers’ grade assignment practices — differences that the treadwear regulations clearly permitted — constitutes the most serious flaw in NHTSA’s reasoning.

The manufacturers’ practices were consistent with the purpose of the program which was to establish a minimum level of treadwear performance that the tire manufacturers would guarantee to their customers. In promulgating the treadwear grading regulations in 1975, the agency explained: “In the NHTSA’s judgment, the most valuable single grade for the consumer is one corresponding to a level of performance which he can be reasonably certain is exceeded by the universe population for that tire brand and line.” 40 Fed.Reg. 23,075 (1975). The minimum performance standard was approved by the court in Goodrich I.

The evidence relied upon by NHTSA to justify the suspension indicates that the original goal of the treadwear program had been met. See 48 Fed.Reg. 5,693 (1983); Final Regulatory Evaluation at III-2. Although the data indicated that manufacturers faced with comparable test results assigned very different grades to their tires, it also demonstrated that the manufacturers were consistently undergrading their tires and that most of the discrepancies which existed were found in the amount of undergrading that occurred. See Final Regulatory Evaluation at III-2.

NHTSA does not now claim that the program failed to guarantee consumers a minimum treadwear performance level. Instead the agency has redefined the purpose of the program, asserting now that any treadwear grading system must ensure a close correlation between federal test results and the actual treadwear grades assigned by manufacturers. See 48 Fed.Reg. 5,697 (1983). NHTSA explained this change in its brief:

While it is true that early agency rulemaking notices on tire quality grading specified minimum grades which every tire would be expected to exceed, the agency rejected this concept in its notice of proposed rulemaking on the establishment of a uniform grade assignment procedure.

Brief of Respondent at 41-42. We remind the agency that the regulations approved by the court in the Goodrich cases were premised on assuring consumers a minimum treadwear performance level for all tires. The agency cannot alter that objective through a notice of proposed rulemaking — the 1981 grade assignment proposal — upon which it has never acted. If the purpose of the program was changed, it was changed in the course of the instant rulemaking proceeding and subsequent litigation.

In essence, NHTSA has developed two reasonable alternative approaches to implementing a consumer information program under section 203. The original policy was to guarantee a minimum level of treadwear performance for each tire; the policy now favored by NHTSA is to enable consumers to compare the relative treadwear performance of different tires. We express no views on the relative merits of those two policies. Yet, once having established a program to serve the first policy, NHTSA cannot justify dismantling that program solely because it failed to serve the second policy. Although NHTSA’s current view of the purpose of treadwear standards may be preferable as a policy matter, it has not shown that the original purpose was unreasonable or was not being served by the treadwear program. Unless NHTSA can make that showing, its total elimination of any treadwear program based on its now preferred policy is arbitrary and capricious. NHTSA did not explain why the existing program could not have continued while the agency sought to implement its new policy of moving toward relative comparability. Without showing that the old policy is unreasonable, for NHTSA to say that no policy is better than the old policy solely because a new policy might be put into place in the indefinite future is as silly as it sounds.

b. The Test Procedures

NHTSA also justified the suspension on the ground that the test procedures themselves were unreliable. In estimating the magnitude of the variability caused by the test procedures, NHTSA relied upon data provided by the tire manufacturers and data from its own research. The tire manufacturers’ data indicated that retests of similar tires “produce[d] differences in test results of up to 80 points.” 48 Fed.Reg. 5,693 (1983). (The test results generally ranged between 100 to 300 points.) Yet the tire manufacturers had presented similar data to NHTSA when they opposed the regulations during the course of the Goodrich litigation. See 47 Fed.Reg. 30,085 (1982). That data had indicated variability in test results as high as 110 points. Id. Nowhere in the suspension proposal or in the final rule did NHTSA indicate why the data currently supplied by the tire manufacturers is more persuasive than similar data which NHTSA and a reviewing court had rejected in the Goodrich litigation. Yet the manufacturers’ data appeared to provide a significant basis for the suspension decision.

NHTSA’s own data concerning the magnitude of variability caused by the test procedures was thin. NHTSA’s only discussion of that data was a single statement that its “own compliance test data include examples of significant test result variability.” 48 Fed.Reg. 5,693 (1983) (emphasis added) (footnote omitted). In the statement accompanying the suspension, NHTSA did not quantify this “significant” variability problem, nor did it indicate that the “examples” demonstrated pervasive and consistent inaccuracies arising from the test procedures. Furthermore, instead of providing statistical evidence of the magnitude of the variability attributable to the test procedures, NHTSA described, in considerable detail, how specific variables in the test procedures could lead to variability in test results. Yet the fact that several variables in the test procedures might cause some variations in test results does not indicate that the magnitude of the resulting variations is so great as to justify suspending the treadwear program.

NHTSA now seeks to provide statistical support for its decision by relying upon a study which was completed in May of 1983, three months after the suspension went into effect. Although the test data was available at the time NHTSA reviewed petitioner’s and Uniroyal’s reconsideration petitions and was emphasized by NHTSA in its brief and in oral argument before this court, the data was not available when the final rule challenged here went into effect. Leaving aside the question of the validity of such a post hoc rationalization, see, e.g., American Textile Manufacturers Inst. v. Donovan, 452 U.S. 490, 539, 101 S.Ct. 2478, 2505, 69 L.Ed.2d 185 (1981), the study cited by NHTSA concluded only that the average grade of nominally identical radial tires “should not shift more than 24 percent” in a retest involving a different test convoy under the federal procedures. Thus the 24 percent variability figure cited repeatedly by NHTSA at oral argument represents the maximum level of variations in test results that could occur under the test procedures. There is no evidence that such variability was found to occur regularly or consistently; the average level of variability, in fact, may have been significantly lower than 24 percent.

NHTSA failed to present sufficient data to support its conclusion that the federal test procedures were not “reasonably fair and reasonably reliable.” Most of the data discussed in the suspension decision involved analysis of grades actually assigned by the tire manufacturers and thus could not be used to measure the extent of variability caused by the test procedures themselves. NHTSA’s own review of test procedure variability formed the “principal basis” for the suspension proposal, but — at the time of the final decision — indicated only that there were “examples of significant test result variability.” 48 Fed.Reg. 5,693 (1983) (emphasis added). Finally, NHTSA appeared to rely on data provided by the tire manufacturers without explaining how, or even if, that data differed from that which the agency had found to be unpersuasive during the Goodrich litigation. In light of the mandate of section 203 and the court of appeals’ approval of the treadwear test procedures in the Goodrich cases, this evidence cannot form the basis for an indefinite suspension of the most meaningful component of the consumer information program.

2. Failure to Give Adequate Consideration to Alternative Solutions to the Variability Problems

NHTSA’s action was also arbitrary and capricious because the agency failed to pursue available alternatives that might have corrected the deficiencies in the program which the agency relied upon to justify the suspension. At the very least, NHTSA was required to explain why those alternatives would not correct the variability problems it had identified.

The deficiencies cited by NHTSA fall into two main categories: variability in test procedures and variability in the manufacturers’ grade assignment practices. The agency itself has indicated that both variability problems were correctable. In the area of test procedures, for example, the agency was able to “quantify the effect of only some” of the sources of variability. 48 Fed.Reg. 5,692 (1983). Yet when NHTSA denied Uniroyal’s petition for reconsideration of the final rule, it conceded that the three primary sources of variability which it was able to quantify — tread depth probes, scales and wheel alignment — “appear to be readily correctible [sic].” 48 Fed.Reg. 32,591 (1983). Those three problems of instrumentation provided a central basis for the agency’s initial suspension proposal and for its final decision.

Another source of variability in the test procedures cited by NHTSA was the loading procedures utilized by the testing companies at San Angelo: “Some testing companies allow the weight to be placed forward of the front wheels, rearward of the rear wheels or even on the vehicle exterior. In addition, some but not all companies place heavy deer guards on the front of their test cars.” 48 Fed.Reg. 5,695 (1983) (footnote omitted). NHTSA has failed to explain why this problem could not be corrected through regulations that standardized weight loading of test cars and that mandated a uniform policy on deer guards.

NHTSA attempted to justify its failure to correct any of the identified deficiencies on the ground that, until it completes further testing and research, it cannot know the “relative significance of the various sources of variability.” 48 Fed.Reg. 5,691 (1983). The agency explained that “[o]ther sources are believed to exist and continue to be discovered.” 48 Fed.Reg. 5,692 (1983). NHTSA’s approach to the variability problem suggests that it was more concerned with uncovering imperfections in the tire grading system than with improving that system. NHTSA apparently would withhold any improvements in the treadwear program until its research enables it to quantify precisely how much each “potential” source of variability actually causes variability in test results. The agency’s approach cannot be squared with the court of appeals’ recognition in Goodrich I that “no test procedures designed to grade millions of tires are going to approach perfection.” 541 F.2d at 1188.

Similarly, assuming that NHTSA could have relied on the manufacturers’ grade assignment practices to justify the suspension, NHTSA made no effort to remedy the variability caused by those practices. The fact that NHTSA’s regulations permitted companies to intentionally undermine the reliability of the treadwear rating program by undergrading their tires does not support NHTSA’s contention that it is infeasible for those companies to assign grades that do correlate closely to federal test results.

NHTSA’s treatment of its 1981 proposal to develop a standardized statistical procedure for the assignment of grades is particularly noteworthy. In the suspension decision, NHTSA acknowledged that the 1981 grade assignment proposal deserved “special mention.” 48 Fed.Reg. 5,698 (1983). NHTSA explained its failure to act on that proposal as follows: “[C]ommenters on that proposal pointed out a variety of shortcomings, particularly with respect to its failure to properly account for undergrading. No commenter in the present [suspension] rulemaking proceeding has suggested that the procedure as proposed in February 1981 be adopted at this time.” 48 Fed.Reg. 5,698 (1983) (emphasis added). NHTSA does not claim that commenters opposed the adoption of any uniform grade assignment procedure, only the adoption of the specific proposal outlined in February of 1981. Furthermore, the suspension proposal issued in 1982 focused on the problems with test procedures, not with the manufacturers’ grade assignment practices. Thus the comments may not have been addressed to the assignment procedures. Finally, and most importantly, despite the assertion of “shortcomings” in the grade assignment proposal, the Final Regulatory Evaluation used to justify the suspension stated that NHTSA had “drafted a revised procedure” which would solve the primary flaw in the 1981 grade assignment proposal and that “a notice may be issued which would request comments on this revised procedure.” Final Regulatory Evaluation at III — 34. Instead of issuing such a notice, however, NHTSA proceeded to suspend the treadwear grading program indefinitely. NHTSA cannot rely on the manufacturers’ grading practices to justify the suspension, and then decline to act on its own “revised procedure” which apparently would correct the problems arising from those practices.

Despite NHTSA’s suggestion to the contrary, this is not a case where the challenger asks the agency to “include every alternative device and thought conceivable by the mind of man ... regardless of how uncommon or unknown that alternative may have been....” Vermont Yankee Nuclear Power Corp. v. Natural Resources Defense Council, 435 U.S. 519, 551, 98 S.Ct. 1197, 1215-1216, 55 L.Ed.2d 460 (1978). The agency concedes that several of the test procedure deficiencies appear to be “readily correctible [sic].” 48 Fed.Reg. 32,591 (1983). Furthermore, the agency never explained adequately the reasons for leaving the uniform grade assignment proposal dormant for two years while it instituted a separate rulemaking proceeding to suspend indefinitely the treadwear component of the tire grading system. Thus an important alternative to suspension, which NHTSA failed to consider adequately, was neither “uncommon” nor “unknown,” Vermont Yankee, 435 U.S. at 551, 98 S.Ct. at 1215, but had been proposed by NHTSA itself in a separate proceeding.

Conclusion

It is hard to imagine a more sorry performance of a congressional mandate than that carried out by NHTSA and its predecessors under section 203 of the Act. Between inaction, foot-dragging, and field reversal, the track record of agency performance is very muddy indeed.

In light of the express statutory command that a tire grading program be established by 1968, NHTSA’s “indefinite suspension” of the most meaningful component of that program was arbitrary and capricious. The agency did not “cogently explain” why suspension was necessary when the old system could have been retained while improvements were developed. State Farm, — U.S. at —, 103 S.Ct. at 2869. NHTSA failed to give serious consideration to specific measures that could correct the variability problems which it relied upon to justify the suspension. Its explanation for the suspension “runs counter to the evidence before the agency” which indicated that many of the identified deficiencies could be corrected. Id. at —, 103 S.Ct. at 2867.

NHTSA’s rationale for suspending the treadwear grading requirements read like a “how to” manual for the compulsive perfectionist. No grading procedure could meet the standards now embraced by the agency commanded by Congress to provide consumers with useful information on the performance of tires. NHTSA’s approach to fulfilling an undisputed statutory mandate is to withhold any regulation until every i is dotted and t is crossed. That is not what Congress commanded the agency to do, nor is it reasonable behavior by an agency established to execute policy, rather than achieve quantitative perfection in its execution. The agency itself, as well as a reviewing court, have given an altogether different reading than the one now advanced in defense of the agency's do-nothing administration of section 203 of the Act. We cannot imagine a more complete flouting of the statutory scheme.

We grant Public Citizen’s petition for review and hold that NHTSA’s indefinite suspension of the tire grading program was arbitrary and capricious.

It is so ordered.  