
    Catherine Natsu LANNING; Altovise Love; Belinda Kelly Dodson; Denise Dougherty; Lynne Zirilli, v. SOUTHEASTERN PENNSYLVANIA TRANSPORTATION AUTHORITY (SEPTA) (D.C. Civil No. 97-cv-00593). United States of America, v. Southeastern Pennsylvania Transportation Authority (D.C. Civil No. 97-cv-01161). Catherine Natsu Lanning, Altovise Love, Belinda Kelly Dodson, Denise Dougherty and Lynne Zirilli, Appellants in No. 98-1644. United States of America, Appellant in No. 98-1755.
    Nos. 98-1644, 98-1755.
    United States Court of Appeals, Third Circuit.
    Argued April 28, 1999.
    Filed June 29, 1999.
    
      Lisa M. Rau (Argued), Jules Epstein, Kairys, Rudovsky, Epstein, Messing & Rau, Philadelphia, PA, Michael Churchill, Public Interest Law Center of Philadelphia, Philadelphia, PA, for Appellants: Catherine Natsu Lanning; Altovise Love; Belinda Kelly Dodson; Denise Dougherty; Lynne Zirilli in No. 98-1644.
    Bill Lann Lee, Acting Assistant Attorney General, Dennis J. Dimsey, Esquire, Leslie A. Simon, Robert S. Libman, (Argued), United States Department of Justice, Civil Rights Division, Washington, DC, for Appellant in No. 98-1644.
    Saul H. Krenzel, (Argued), Saul H. Krenzel & Associates, Philadelphia, PA, for Appellee — SEPTA.
    Before: MANSMANN, WEIS and JOHN R. GIBSON, Circuit Judges.
    
      
       Honorable John R. Gibson, of the United States Court of Appeals for the Eighth Circuit, sitting by designation.
    
   OPINION OF THE COURT

MANSMANN, Circuit Judge.

In this appeal, we must determine the appropriate legal standard to apply when evaluating an employer’s business justification in an action challenging an employer’s cutoff score on an employment screening exam as discriminatory under a disparate impact theory of liability. We hold today that under the Civil Rights Act of 1991, a discriminatory cutoff score on an entry level employment examination must be shown to measure the minimum qualifications necessary for successful performance of the job in question in order to survive a disparate impact challenge. Because we find that the District Court did not apply this standard in evaluating the employer’s business justification for its discriminatory cutoff score in this case, we will reverse the District Court’s judgment and remand for reconsideration under this standard. In light of our decision to remand on this basis, we need not reach the parties’ other assertions of error.

I.

This appeal comes to us from a judgment entered by the District Court in favor of the Southeastern Pennsylvania Transportation Authority (“SEPTA”) after a twelve day bench trial in January of 1998. Although the parties generally do not dispute the facts relevant to this appeal, to the extent there are favorable inferences to be drawn, we must draw them in favor of SEPTA as the prevailing party. In addition, because we must not disturb the factual findings of the District Court unless clearly erroneous, much of the following background is adopted from the facts as found by the District Court in its extensive memorandum opinion. See Lanning v. Southeastern Pennsylvania Transp. Auth., 1998 WL 341605, at *1-*52 (E.D.Pa. June 25, 1998).

A.

SEPTA is a regional mass transit authority that operates principally in Philadelphia, Pennsylvania. In 1989, in response to a perceived need to upgrade the quality of its transit police force, SEPTA initiated an extensive program designed to improve the department. As part of this program, SEPTA dedicated its transit officers primarily to patrolling the subways and limited their responsibilities to serve as guards at other SEPTA property. In addition, SEPTA increased the number of its officers from 96 to 200 and introduced a “zone concept” for the areas they patrol. SEPTA also began to consider methods by which it might upgrade the physical fitness level of its police officers.

In 1991, SEPTA hired Dr. Paul Davis to develop an appropriate physical fitness test for its police officers. Dr. Davis initially met with SEPTA officials in order to ascertain SEPTA’s objectives. Dr. Davis determined that SEPTA was interested in enhancing the level of fitness, physical vigor and general productivity of its police force. Once Dr. Davis had determined SEPTA’s objectives, he went on a ride-along with SEPTA transit police and, over the course of two days and approximately twenty hours, rode the SEPTA trains in order to obtain a perspective on the expectations of SEPTA transit officers.

Dr. Davis next conducted a study with twenty experienced SEPTA officers, designated “subject matter experts” (SMEs), in an effort to determine what physical abilities are required to perform the job of SEPTA transit officer. From the responses Dr. Davis received in this study, he determined that running, jogging, and walking were important SEPTA transit officer tasks and that SEPTA officers were expected to jog almost on a daily basis.

Dr. Davis then asked the SMEs to determine what level of physical exertion was necessary to perform these tasks. The SMEs estimated that it was reasonable to expect them to run one mile in full gear in 11.78 minutes. Dr. Davis rejected this estimate as too low based upon his determination that any individual could meet this requirement. Ultimately, Dr. Davis recommended a 1.5 mile run within 12 minutes. Dr. Davis explained that completion of this run would require that an officer possess an aerobic capacity of 42.5 mL/kg/min, the aerobic capacity that Dr. Davis determined would be necessary to perform the job of SEPTA transit officer.
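As an editorial illustration only (it is not part of the opinion), the gap between the SMEs’ estimate and the standard Dr. Davis recommended can be restated as average pace and speed. The sketch below uses only the two figures given above; the function names are hypothetical, and the comparison is rough because the SMEs’ estimate assumed full gear while the opinion does not say what applicants wore for the test.

```python
# Illustrative arithmetic only; function names are hypothetical and the
# inputs are the two figures stated in the opinion.

def pace_min_per_mile(distance_miles: float, time_minutes: float) -> float:
    """Average pace in minutes per mile."""
    return time_minutes / distance_miles


def speed_mph(distance_miles: float, time_minutes: float) -> float:
    """Average speed in miles per hour."""
    return distance_miles / (time_minutes / 60.0)


# SMEs' estimate: one mile in 11.78 minutes (in full gear).
sme_pace = pace_min_per_mile(1.0, 11.78)
sme_speed = speed_mph(1.0, 11.78)

# Recommended screening test: 1.5 miles within 12 minutes.
test_pace = pace_min_per_mile(1.5, 12.0)
test_speed = speed_mph(1.5, 12.0)

print(f"SME estimate: {sme_pace:.2f} min/mile ({sme_speed:.2f} mph)")
print(f"Adopted test: {test_pace:.2f} min/mile ({test_speed:.2f} mph)")
```

On these figures the adopted test demands an average pace of 8 minutes per mile sustained over a longer distance, against the SMEs’ estimate of 11.78 minutes for a single mile.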

Dr. Davis recommended that SEPTA use the 1.5 mile run as an applicant screening test. Dr. Davis understood that SEPTA officers would not be required to run 1.5 miles within 12 minutes in the course of their duties, but he nevertheless recommended this test as an accurate measure of the aerobic capacity necessary to perform the job of SEPTA transit police officer. Based upon Dr. Davis’ recommendation, SEPTA adopted a physical fitness screening test for its applicants which included a 1.5 mile run within 12 minutes. Beginning in 1991, the 1.5 mile run was administered as the first component of the physical fitness test; if an applicant failed to run 1.5 miles in 12 minutes, the applicant would be disqualified from employment as a SEPTA transit officer.

It is undisputed that for the years 1991, 1993, and 1996, an average of only 12% of women applicants passed SEPTA’s 1.5 mile run in comparison to the almost 60% of male applicants who passed. For the years 1993 and 1996, the time period in question in this litigation, the pass rate for women was 6.7% compared to a 55.6% pass rate for men. In addition, research studies confirm that a cutoff of 12 minutes on a 1.5 mile run will have a disparately adverse impact on women. SEPTA concedes that its 1.5 mile run has a disparate impact on women.
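As an editorial illustration only (it is not part of the opinion, and the Court relied on SEPTA’s concession rather than on any such calculation), the size of the disparity can be expressed as the ratio of the women’s pass rate to the men’s. The function name below is hypothetical, and the “almost 60%” figure is approximated as 60 for the sake of the arithmetic.

```python
# Illustrative arithmetic only; the function name is hypothetical.

def selection_ratio(group_rate: float, comparison_rate: float) -> float:
    """Ratio of one group's pass rate to another's (same units for both)."""
    if comparison_rate <= 0:
        raise ValueError("comparison_rate must be positive")
    return group_rate / comparison_rate


# 1993 and 1996, the period at issue: 6.7% of women vs. 55.6% of men.
ratio_1993_1996 = selection_ratio(6.7, 55.6)

# 1991, 1993 and 1996 combined: 12% of women vs. "almost 60%" of men.
# ASSUMPTION: 60.0 stands in for "almost 60%".
ratio_all_years = selection_ratio(12.0, 60.0)

print(f"1993 and 1996: women passed at {ratio_1993_1996:.1%} of the men's rate")
print(f"All three years: women passed at roughly {ratio_all_years:.0%} of the men's rate")
```

On the figures for the period at issue, women passed at roughly one eighth of the rate at which men passed.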

In conjunction with the implementation of its physical fitness screening test, SEPTA also began testing incumbent officers for aerobic capacity in 1991. SEPTA policy requires any officer who fails any portion of the incumbent fitness test to retest on the failed element within three months. For each portion of the physical fitness test that an incumbent officer fails, an interim goal is set for that officer.

SEPTA initially disciplined those incumbent officers who failed the fitness test. Due to protests by the incumbent officers’ union, however, SEPTA discontinued its discipline policy and instead implemented an incentive program that rewarded incumbent officers for passing their interim fitness goals.

According to SEPTA’s internal documents, significant percentages of incumbent officers of all ranks have failed SEPTA’s physical fitness test. By 1996, however, 86% of incumbent officers reached SEPTA’s physical fitness standards. SEPTA has never taken any steps to determine whether incumbent officers who have failed the physical fitness test have adversely affected SEPTA’s ability to carry out its mission.

SEPTA has promoted incumbent officers who have failed some or all of the components of the physical fitness test. SEPTA has also given special recognition, commendations, and satisfactory performance evaluations to incumbent officers who have failed the physical fitness test. SEPTA has never disciplined, terminated, removed, reassigned, suspended or demoted any transit officer for failing to perform the physical requirements of the job.

In addition, due to a clerical error, SEPTA hired a female officer in 1991 who failed the 1.5 mile run. This officer has subsequently been “decorated” by SEPTA and has been nominated repeatedly for awards such as Officer of the Year and Officer of the Quarter. SEPTA has commended her for her outstanding performance as a police officer and has chosen her to serve as one of SEPTA’s two defensive tactics instructors.

SEPTA employs an extremely low number of women in its transit police force. The District Court found that, as of July 1997, SEPTA employed only 16 women in its 234 member police force. Only two of these women hold ranks higher than that of patrol officer. See Lanning, 1998 WL 341605 at *27.

B.

On January 28, 1997, after satisfying all administrative prerequisites, five women who failed SEPTA’s 1.5 mile run brought a Title VII class action against SEPTA on behalf of all 1993 female applicants, 1996 female applicants and future female applicants for employment as SEPTA police officers who have been or will be denied employment by reason of their inability to meet the physical entrance requirement of running 1.5 miles in 12 minutes or less. On February 18, 1997, the Department of Justice, after conducting the appropriate investigation of SEPTA’s employment practices and meeting all conditions precedent under Title VII, also filed suit on behalf of the United States challenging SEPTA’s entire physical fitness test, including the 1.5 mile run. The District Court properly exercised jurisdiction over these Title VII actions challenging SEPTA’s hiring practices pursuant to 28 U.S.C. § 1331. On April 21, 1997, the District Court consolidated the two actions for all purposes up to and including trial.

After litigation commenced, SEPTA hired expert statisticians to submit reports examining the statistical relationship between the aerobic capacity of SEPTA’s officers and their number of arrests, “arrest rates” and number of commendations. In these reports, the statisticians concluded that there was a statistically significant correlation between high aerobic capacity and arrests, arrest rates and commendations. In addition, one expert prepared a report that estimated that 51.9% of the persons arrested for serious crimes between 1991 and 1996 had an aerobic capacity of 48 mL/kg/min and 27% of those arrested had an aerobic capacity of less than 42 mL/kg/min. Based upon these reports, the District Court held that SEPTA established that its aerobic capacity requirement is job related and consistent with business necessity. See Lanning, 1998 WL 341605 at *35.

The District Court also found support for this conclusion in an expert report submitted on behalf of SEPTA by Dr. Robert Moffatt. Dr. Moffatt simulated a training course and concluded that officers with aerobic capacities of 45 mL/kg/min or better had a 7-8% decrement in their ability to perform physical activities after a run of approximately three minutes; officers with an aerobic capacity of less than 45 mL/kg/min exhibited a 30% decrement in physical ability after the same run. The District Court found that Dr. Moffatt’s study demonstrates “the manifest relationship of aerobic capacity to the critical and important duties of a SEPTA transit police officer.... ” Id. at *68.

The District Court entered judgment in favor of SEPTA on all claims. Both the individual plaintiffs and the United States have taken appeals from the District Court’s final judgment, over which we have jurisdiction pursuant to 28 U.S.C. § 1291. On appeal, the individual plaintiffs assert that the District Court applied incorrect legal standards in evaluating SEPTA’s business necessity defense and that the District Court made erroneous findings of fact in determining that SEPTA’s 1.5 mile run does not violate Title VII. Although the United States initially challenged SEPTA’s implementation of its entire physical fitness test, on appeal the United States joins the individual plaintiffs in asserting error solely with respect to the District Court’s determination that SEPTA’s 1.5 mile run is not violative of Title VII. Because the issue of whether the District Court applied the correct legal standard is one of law, our review is plenary.

II.

Under Title VII’s disparate impact theory of liability, plaintiffs establish a prima facie case of disparate impact by demonstrating that application of a facially neutral standard has resulted in a significantly discriminatory hiring pattern. See Dothard v. Rawlinson, 433 U.S. 321, 329, 97 S.Ct. 2720, 53 L.Ed.2d 786 (1977). Once the plaintiffs have established a prima facie case, the burden shifts to the employer to show that the employment practice is “job related for the position in question and consistent with business necessity....” 42 U.S.C. § 2000e-2(k). Should the employer meet this burden, the plaintiffs may still prevail if they can show that an alternative employment practice has a less disparate impact and would also serve the employer’s legitimate business interest. See Albemarle Paper Co. v. Moody, 422 U.S. 405, 425, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975).

Because SEPTA concedes that its 1.5 mile run has a disparate impact on women, the first prong of the disparate impact analysis is not at issue in this appeal. Rather, this appeal focuses our attention on the proper standard for evaluating whether SEPTA’s 1.5 mile run is “job related for the position in question and consistent with business necessity” under the Civil Rights Act of 1991. Because the Act instructs that this standard incorporates only selected segments of prior Supreme Court jurisprudence on the business necessity doctrine, we examine the history of this doctrine in order to resolve this threshold issue.

A.

The disparate impact theory of discrimination under Title VII was judicially created in the seminal case of Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971). In embracing disparate impact, the Court recognized that Title VII was meant not only to proscribe overt discrimination, but also to prohibit “practices that are fair in form, but discriminatory in operation.” Griggs, 401 U.S. at 431, 91 S.Ct. 849. The Court made clear that what is required by Title VII is “the removal of artificial, arbitrary, and unnecessary barriers to employment when the barriers operate invidiously to discriminate on the basis of racial or other impermissible classification.” Id. Accordingly, the Court announced that in evaluating practices fair in form but discriminatory in operation, “[t]he touchstone is business necessity.” Id.

The Court, however, was unclear in articulating what an employer must show to demonstrate business necessity. The Court couched the employer’s burden in terms of showing that its practice is “related to job performance”; “bear[s] a demonstrable relationship to successful performance of the jobs for which it was used”; has “a manifest relationship to the employment in question”; and is “demonstrably a reasonable measure of job performance.” Id. at 431, 432, 436, 91 S.Ct. 849. In applying this standard, however, the Court rejected the employer’s justification in Griggs that its standardized intelligence tests and diploma requirements generally would improve the overall quality of the work force in its power plant. The Court held that, although these requirements may be useful, they could not be used to exclude disproportionately a protected group when the employer failed to show that they do not test an applicant’s ability to perform the job in question. Id. at 431-33, 91 S.Ct. 849.

The Court next spoke to the issue of business necessity in Albemarle Paper Co. v. Moody, 422 U.S. 405, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975). In Albemarle, an employer sought to justify the use of verbal exam and high school diploma requirements in determining whether to promote employees to more skilled positions in its paper mill. Albemarle, 422 U.S. at 408-11, 95 S.Ct. 2362. In preparation for trial, the employer hired an industrial psychologist to complete validation studies showing that the tests were job related because they had a statistically significant correlation with supervisorial ratings in several groups of the jobs in question. Id. at 429-30, 95 S.Ct. 2362. The Court, nevertheless, rejected the employer’s contention that its requirements were job related.

The Court held that “discriminatory tests are impermissible unless shown, by professionally acceptable methods, to be ‘predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated.’” Id. at 431, 95 S.Ct. 2362 (quoting 29 CFR § 1607.4(c)). In so holding, the Court noted that the Equal Employment Opportunity Commission (EEOC) Guidelines for professional standards of test validation are entitled to great deference in determining whether an employer has demonstrated that its requirements are job related. Id. at 430-31, 95 S.Ct. 2362. The Court rejected the employer’s validation studies as inadequate in several respects under the EEOC Guidelines. For example, the Court rejected the studies because they focused on the most qualified employees near the top of the line of progression, stating:

The fact that the best of those employees working near the top of a line of progression score well on a test does not necessarily mean that that test, or some particular cutoff score on the test, is a permissible measure of the minimal qualifications of new workers entering lower level jobs.

Id. at 434, 95 S.Ct. 2362. The Court accordingly held that consideration must be given to the possible use of testing as a promotion device rather than as a screen for entry into lower level jobs. Id. Due to several inadequacies of the employer’s validation studies, the Court held that the employer had failed to show that its requirements were job related to the position in question. Id. at 435-36, 95 S.Ct. 2362.

The next Title VII case to raise the business necessity issue for the Court’s consideration was Dothard v. Rawlinson, 433 U.S. 321, 97 S.Ct. 2720, 53 L.Ed.2d 786 (1977). In Dothard, female applicants challenged a prison’s minimum height and weight requirements for its prison guard positions as violative of Title VII. On the issue of business necessity, the Court made clear that “a discriminatory employment practice must be shown to be necessary to safe and efficient job performance to survive a Title VII challenge.” Dothard, 433 U.S. at 332 n. 14, 97 S.Ct. 2720. The Court rejected the prison’s assertion that height and weight requirements have a relationship to the unspecified amount of strength essential to effective job performance, holding that if strength is a bona fide job related quality, the prison could test for it directly by adopting and validating a fairly administered strength test. Id. at 331-32, 97 S.Ct. 2720.

The Court’s next definitive statement on the business necessity doctrine is found in Wards Cove Packing Co., Inc. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989), where a majority of the Court deviated from its previous business necessity jurisprudence in adopting a more liberal test for business necessity. According to the Court:

[T]he dispositive issue is whether a challenged practice serves, in a significant way, the legitimate employment goals of the employer. The touchstone of this inquiry is a reasoned review of the employer’s justification for his use of the challenged practice. A mere insubstantial justification in this regard will not suffice, because such a low standard of review would permit discrimination to be practiced through the use of spurious, seemingly neutral employment practices. At the same time, though, there is no requirement that the challenged practice be “essential” or “indispensable” to the employer’s business for it to pass muster....

Wards Cove, 490 U.S. at 659, 109 S.Ct. 2115 (citations omitted). In addition, the Court made clear that at the business necessity stage of Title VII litigation, the employer bears only the burden of production; the burden of persuasion remains on the disparate impact plaintiff at all times. Id. As we have previously recognized, the Wards Cove standard may reasonably be viewed as a departure from the more stringent business necessity standard under Griggs and its progeny. See Newark Branch, N.A.A.C.P. v. Town of Harrison, New Jersey, 940 F.2d 792, 803 (3d Cir.1991)(noting that Wards Cove “arguably diluted the business necessity burden” under Griggs).

B.

In response to Wards Cove, Congress enacted the Civil Rights Act of 1991. One of the primary purposes of the Act was “to codify the concepts of ‘business necessity’ and ‘job related’ enunciated by the Supreme Court in Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), and in the other Supreme Court decisions prior to Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).” Civil Rights Act of 1991, Pub L. No. 102-166, § 3, 105 Stat. 1071, 1071 (1992). As part of this codification of Griggs, the Act made clear that both the burden of production and the burden of persuasion in establishing business necessity rest with the employer. See 42 U.S.C. § 2000e-2(k).

In addition, the Act codified the business necessity doctrine by using the following language:

An unlawful employment practice based on disparate impact is established under this subchapter only if—
(i) a complaining party demonstrates that a respondent uses a particular employment practice that causes a disparate impact on the basis of race, color, religion, sex, or national origin and the respondent fails to demonstrate that the challenged practice is job related for the position in question and consistent with business necessity; or
(ii) the complaining party makes the demonstration described in subparagraph (C) with respect to an alternative employment practice and the respondent refuses to adopt such alternative employment practice.

42 U.S.C. § 2000e-2(k)(1)(A) (emphasis added). The Act further instructs that in interpreting its business necessity language, “[n]o statements other than the interpretive memorandum ... shall be considered legislative history of, or relied upon in any way as legislative history....” Civil Rights Act of 1991, Pub L. No. 102-166, § 105(b), 105 Stat. 1071, 1075 (1992). The interpretive memorandum referenced in this portion of the Act states in relevant part:

The terms “business necessity” and “job related” are intended to reflect the concepts enunciated by the Supreme Court in Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), and in the other Supreme Court decisions prior to Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).

137 Cong. Rec. 28,680 (1991). After the passage of the Act, proponents of both a strict test for business necessity and a more liberal requirement claimed victory in the standard adopted by the Act.

III.

The Supreme Court has yet to interpret the “job related for the position in question and consistent with business necessity” standard adopted by the Act. In addition, our sister courts of appeals that have applied the Act’s standard to a Title VII challenge have done so with little analysis. See, e.g., Fitzpatrick v. City of Atlanta, 2 F.3d 1112, 1117-18 (11th Cir.1993)(noting that Civil Rights Act of 1991 statutorily reversed Wards Cove but ruling in favor of employer because practice was demonstrably necessary to meet an “important business goal”); Bradley v. Pizzaco of Nebraska, Inc., 7 F.3d 795, 797-98 (8th Cir.1993)(noting that Griggs standard was reinstated by the Act and holding that employer failed to meet Griggs standard).

Because the Act proscribes resort to legislative history with the exception of one short interpretive memorandum endorsing selective caselaw, our starting point in interpreting the Act’s business necessity language must be that interpretive memorandum. The memorandum makes clear that Congress intended to endorse the business necessity standard enunciated in Griggs and not the Wards Cove interpretation of that standard. By Congress’ distinguishing between Griggs and Wards Cove, we must conclude that Congress viewed Wards Cove as a significant departure from Griggs. Accordingly, because the Act clearly chooses Griggs over Wards Cove, the Court’s interpretation of the business necessity standard in Wards Cove does not survive the Act.

We turn now to articulate the standard for business necessity, one most consistent with Griggs and its pre-Wards Cove progeny. The laudable mission begun by the Court in Griggs was the eradication of discrimination through the application of practices fair in form but discriminatory in practice by eliminating unnecessary barriers to employment opportunities. In the context of a hiring exam with a cutoff score shown to have a discriminatory effect, the standard that best effectuates this mission is implicit in the Court’s application of the business necessity doctrine to the employer in Griggs, i.e., that a discriminatory cutoff score is impermissible unless shown to measure the minimum qualifications necessary for successful performance of the job in question. Only this standard can effectuate the mission begun by the Court in Griggs; only by requiring employers to demonstrate that their discriminatory cutoff score measures the minimum qualifications necessary for successful performance of the job in question can we be certain to eliminate the use of excessive cutoff scores that have a disparate impact on minorities as a method of imposing unnecessary barriers to employment opportunities.

The evolution of the Court’s articulation of the business necessity doctrine in both Albemarle and Dothard reinforces the conclusion that this standard is both implicit in Griggs and central to its mission. In Albemarle, the Court explained that discriminatory tests must be validated to show that they are “predictive of ... important elements of work behavior which comprise ... the job ... for which candidates are being evaluated” and that the scores of the higher level employees do not necessarily validate a cutoff score for the minimum qualifications to perform the job at an entry level. Albemarle, 422 U.S. at 431, 434, 95 S.Ct. 2362. This is simply another way of saying that discriminatory cutoff scores must be validated to show they measure the minimum qualifications necessary for successful performance of the job. Similarly, in Dothard, the Court made clear that “a discriminatory employment practice,” such as a discriminatory cutoff score on an entry level exam, “must be shown to be necessary to safe and efficient job performance to survive a Title VII challenge.” Dothard, 433 U.S. at 332 n. 14, 97 S.Ct. 2720.

Taken together, Griggs, Albemarle and Dothard teach that in order to show the business necessity of a discriminatory cutoff score an employer must demonstrate that its cutoff measures the minimum qualifications necessary for successful performance of the job in question. Furthermore, because the Act instructs us to interpret its business necessity language in conformance with Griggs and its pre-Wards Cove progeny, we must conclude that the Act’s business necessity language incorporates this standard.

Our conclusion that the Act incorporates this standard is further supported by the business necessity language adopted by the Act. Congress chose the terms “job related for the position in question” and “consistent with business necessity.” Judicial application of a standard focusing solely on whether the qualities measured by an entry level exam bear some relationship to the job in question would impermissibly write out the business necessity prong of the Act’s chosen standard. With respect to a discriminatory cutoff score, the business necessity prong must be read to demand an inquiry into whether the score reflects the minimum qualifications necessary to perform successfully the job in question. See also EEOC Guidelines, 29 C.F.R. § 1607.5(H) (noting that cutoff scores should “be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force.”).

In addition, Congress’ decision to emphasize the importance of the policies underlying the disparate impact theory of discrimination through its codification supports application of this standard to discriminatory cutoff scores. The disparate impact theory of discrimination combats not intentional, obvious discriminatory policies, but a type of covert discrimination in which facially neutral practices are employed to exclude, unnecessarily and disparately, protected groups from employment opportunities. Inherent in the adoption of this theory of discrimination is the recognition that an employer’s job requirements may incorporate societal standards based not upon necessity but rather upon historical, discriminatory biases. A business necessity standard that wholly defers to an employer’s judgment as to what is desirable in an employee therefore is completely inadequate in combating covert discrimination based upon societal prejudices.

Only a business necessity doctrine that examines discriminatory cutoff scores in light of the minimum qualifications that are necessary to perform the job in question successfully can address adequately this subtle form of discrimination.

Accordingly, we hold that the business necessity standard adopted by the Act must be interpreted in accordance with the standards articulated by the Supreme Court in Griggs and its pre-Wards Cove progeny which demand that a discriminatory cutoff score be shown to measure the minimum qualifications necessary for the successful performance of the job in question in order to survive a disparate impact challenge.

IV.

Although the District Court purported to apply the Act’s “job related to the position in question and consistent with business necessity” standard to SEPTA’s cutoff score on its 1.5 mile run, it is clear from the District Court’s memorandum opinion that it did not apply the standard we have found to be implicit in Griggs and incorporated by the Act. The District Court rejected the formulation of the Griggs standard found in Dothard, characterizing it as dicta, and relied instead upon language found in New York City Transit Auth. v. Beazer, 440 U.S. 568, 99 S.Ct. 1355, 59 L.Ed.2d 587 (1979). As our prior discussion makes clear, the Beazer language is dicta and the Dothard standard is binding under the Act. Moreover, the Beazer dicta upon which the District Court relied mirrors the standard adopted by Wards Cove. Compare Lanning, 1998 WL 341605 at *54 (noting that in Beazer, the Court “implicitly approves employment practices that significantly serve, but are neither required by nor necessary to, the employer’s legitimate business interests”) with Wards Cove, 490 U.S. at 659, 109 S.Ct. 2115 (stating that standard is “whether a challenged practice serves, in a significant way, the legitimate employment goals of the employer” and noting that there is no requirement that the practice be essential). As we previously stated, the Wards Cove standard does not survive the Act.

The District Court’s application of its understanding of business necessity to SEPTA’s business justification further illustrates that the District Court did not apply the correct legal standard. As an initial matter, the District Court seemed to conclude that Dr. Davis’ expertise alone is sufficient to justify the 42.5 mL/kg/min aerobic capacity cutoff measured by the 1.5 mile run. This conclusion disregards the teachings of Griggs, Albemarle and Dothard in which the Court made clear that judgment alone is insufficient to validate an employer’s discriminatory practices. More fundamentally, however, nowhere in its extensive opinion did the District Court consider whether Dr. Davis’ 42.5 mL/kg/min cutoff reflects the minimum aerobic capacity necessary to perform successfully the job of SEPTA transit police officer.

Instead, the District Court upheld this cutoff because it was “readily justifiable.” Lanning, 1998 WL 341605 at *57. The validation studies of SEPTA’s experts upon which the District Court relied to support this conclusion demonstrate the extent to which this standard is insufficient under the Act. The general import of these studies is that the higher an officer’s aerobic capacity, the better the officer is able to perform the job. Setting aside the validity of these studies, this conclusion alone does not validate Dr. Davis’ 42.5 mL/kg/min cutoff under the Act’s business necessity standard. At best, these studies show that aerobic capacity is related to the job of SEPTA transit officer. A study showing that “more is better,” however, has no bearing on the appropriate cutoff to reflect the minimal qualifications necessary to perform successfully the job in question.

Dr. Siskin’s testimony is particularly instructive on this point. Dr. Siskin testified that in view of the linear relationship between aerobic capacity and the arrest parameters, any cutoff score can be justified since higher aerobic capacity levels will get you more field performance (i.e., “more is better”). See Lanning, 1998 WL 341605 at *41. Under the District Court’s understanding of business necessity, which requires only that a cutoff score be “readily justifiable,” SEPTA, as well as any other employer whose jobs entail any level of physical capability, could employ an unnecessarily high cutoff score on its physical abilities entrance exam in an effort to exclude virtually all women by justifying this facially neutral yet discriminatory practice on the theory that more is better. This result contravenes Griggs and demonstrates why, under Griggs, a discriminatory cutoff score must be shown to measure the minimum qualifications necessary to perform successfully the job in question.

V.

For the foregoing reasons, it is clear to us that the District Court did not employ the business necessity standard implicit in Griggs and incorporated by the Act which requires that a discriminatory cutoff score be shown to measure the minimum qualifications necessary for successful performance of the job in question in order to survive a disparate impact challenge. We will therefore vacate the judgment of the District Court and remand this appeal for the District Court to determine whether SEPTA has carried its burden of establishing that its 1.5 mile run measures the minimum aerobic capacity necessary to perform successfully the job of SEPTA transit police officer. Because this is the first occasion we have had to clarify the Act’s business necessity standard, on remand the District Court may wish to exercise its discretion to allow the parties to develop further the record in keeping with the standard announced here.

WEIS, Circuit Judge, dissenting:

The “minimum qualifications” criterion of business justification does not apply to all types of employment. When public safety is at stake, a lighter burden is placed on employers to justify their hiring requirements. Because I believe that the latter standard applies in this case, I would affirm.

I.

Concerned about its inability to control crime on its property, SEPTA instituted a three-pronged attack on the problem. It added a substantial number of officers, implemented a zone method of patrol, and adopted standards to improve the generally poor physical condition of its officers. Unlike many metropolitan police departments, SEPTA officers are deployed alone and on foot, engaging in physical activities more frequently than other law enforcement agencies.

The patrol zones present significant variations in conditions that affect the physical exertion of officers in the performance of their duties. Zone One, for example, has a climb of 30 to 50 steps from street level. Zone Three, a mixture of above and below-ground locations, borders a large shopping mall, featuring retail theft and pursuits that lead into the SEPTA transit system. Zone Five, which includes sports complexes, is characterized by long distances between stations. Zone Six includes the Temple University area, a scene of frequent crimes against students.

SEPTA officers must occasionally ask for assistance from their comrades in other zones. These calls are divided into two categories, “officer assists” and “officer backups.” An “assist” requires officers to respond immediately. Often the only method available to get to the scene quickly is a run of five to eight city blocks. An officer responding to an “assist” must preserve enough energy to deal effectively with a situation once arriving on the scene. SEPTA averages about 380 running assists per year. “Backups” are not as critical as “assists,” so officers generally use a “paced jog.” SEPTA averages about 1,920 “backups” annually.

For help in attaining its fitness goals, SEPTA turned to Dr. Paul Davis, an acknowledged expert in the field who had recommended corrective measures for numerous law enforcement and government agencies. At the time Dr. Davis began his research for SEPTA, an officer’s equipment load was 12 pounds; it is now nearly 26 pounds. Dr. Davis found that officers need “sound, intact, disease-free cardiovascular system[s]” to effectively perform their jobs. These requirements implicate aerobic capacity, i.e., the ability of the body to utilize oxygen during sustained physical activities such as running, swimming, and cycling. Aerobic capacity is commonly measured in units of milliliters of oxygen per kilogram of body weight per minute — “mL/kg/min,” or “mL.”

SEPTA officers typically run or jog on a daily basis from three to eight city blocks for periods of three to ten minutes. They also engage in stair climbing, which requires a capacity of 54 mL. In light of this and other evidence, Dr. Davis concluded that SEPTA transit officers need an aerobic capacity of 50 mL. After determining that such a level would have a “draconian” effect on female applicants, however, Dr. Davis lowered his recommendation to 42.5 mL. That capacity could be demonstrated by running 1.5 miles in 12 minutes, a test that was adopted for applicants.

Dr. Davis had done a similar study for a fire department in St. Paul, Minnesota, which — in setting a standard of 45 mL— required applicants to run 1.5 miles in 11 minutes and 40 seconds. Eighty percent of male applicants and 76% of female applicants passed this test.

In addition to Dr. Davis’ testimony, SEPTA also presented evidence from other experts to demonstrate a statistically significant correlation between aerobic capacity and the number of arrests made by individual SEPTA officers. Furthermore, of 207 commendations, 96% went to officers with an average capacity of 46 mL. Of these awards, 198 involved arrests, and 116 involved a foot pursuit, use of force or other physical exertion. Another study indicated that 51.9% of offense perpetrators had a capacity of 48 mL or higher, with only 27% having lower than a 42 mL rating.

The record demonstrates that a smaller percentage of female applicants passed the running test than males, but that nearly all women who trained for it were able to pass. The named plaintiffs and some of the class members who failed demonstrated, for the most part, a “cavalier” attitude towards the running test. Videotapes showed some of these applicants walking at the halfway point, either because they were indifferent or unable to run for even that short a period of time. Thus, although there was a significant disparity between the pass-fail rates of male and female applicants, the extent of the difference appears to have been exaggerated to some extent by the approach taken by some of the applicants.

A physiologist, Dr. Lynda Ransdell, testified that 40% of all women starting at an aerobic capacity of 35 to 37 mL can train to pass the running test in eight weeks, and that 10% of all women between 20 and 29 years of age can do so without any training. She concluded that the average sedentary woman can achieve SEPTA’s performance standard with only moderate training. SEPTA sent applicants a letter outlining recommended training techniques that Dr. Ransdell testified were adequate.

Plaintiffs introduced the testimony of Dr. William McArdle, who suggested the use of a “relative fitness” test in which all applicants would be required to meet the 50th percentile of aerobic capacity for their gender — approximately 42 mL for males, and 36 mL for females. However, Dr. Robert Moffatt, a defense expert who conducted tests of the aerobic capacity necessary to perform a SEPTA officer’s duties, disagreed. He stated that female officers with a capacity of 36 mL would not be able to capably perform their duties after running to an “assist” or a “backup.” Dr. Bernard Siskin, another defense expert, found that the arrest rate for females with a 36 mL capacity was significantly lower than that of males with a 42 mL capacity.

The District Court rejected Dr. McArdle’s proposal because it would not serve SEPTA’s business goal of providing a police force capable of performing the physical requirements of the job nearly as well as the existing test. Instead, the court found that “Dr. Davis’ study, standing alone, met the professional standards for construct validation and satisfies defendant’s burden of demonstrating job relatedness and business necessity.” Moreover, his study had sufficient empirical support for an aerobic capacity requirement of 42.5 mL.

II.

The dispute in this case centers on the applicable standard of business justification under the Civil Rights Act of 1991. See Pub.L. No. 102-166, Title I, § 105(a), 105 Stat. 1074-75 (adding 42 U.S.C. § 2000e-2(k)). The pertinent section provides: “An unlawful employment practice based on disparate impact is established ... only if — [the] complaining party demonstrates that a respondent uses a particular employment practice that causes a disparate impact on the basis of ... sex ... and the respondent fails to demonstrate that the challenged practice is job related for the position in question and consistent with business necessity[.]” 42 U.S.C. § 2000e-2(k)(1)(A).

This addition to Title VII was passed in response to the Supreme Court’s decision in Wards Cove Packing Co., Inc. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989). In that case, the Court held that after a plaintiff makes a prima facie showing of disparate impact, the defendant bears the burden to produce evidence of business justification. See id. at 659, 109 S.Ct. 2115. The burden of persuasion, however, remains at all times with the plaintiff. See id. As to what showing would satisfy business justification, the Court held that “the dispositive issue is whether a challenged practice serves, in a significant way, the legitimate employment goals of the employer.” Id. However, “there is no requirement that the challenged practice be ‘essential’ or ‘indispensable’ to the employer’s business for it to pass muster.” Id.

Some members of Congress were displeased with the result in Wards Cove and argued for a stricter standard of business justification based on their reading of pre-Wards Cove cases. After two years of legislative struggle, Congress and the President agreed upon a compromise bill. Whether the ambiguous language of the statute accomplished that purpose has been the subject of lively debate.

The 1990 bill, which had been vetoed by the President, had used the phrase “required by business necessity,” rather than “consistent with business necessity,” as used in the 1991 Act. The substitution of the word “consistent” was considered to indicate a standard less stringent than would “required.” In that light, a fair reading of the 1991 Act is “the challenged practice is job related for the position in question and in harmony with business necessity.”

It may fairly be said that the language ultimately adopted in the 1991 Act reflects an “agreement to disagree” and a return of the dispute to the courts for resolution. In short, unable to muster a veto-proof majority for either view, Congress “punted.” This conclusion is underscored by Congress’ highly unusual admonition that the courts consider only a designated “interpretive memorandum” as legislative history, rather than the more elaborate committee reports and other materials that customarily reveal the extent of the controversy between various views. See Pub.L. No. 102-166, Title I, § 105(b), 105 Stat. 1075. The interpretive memorandum states that: “The terms ‘business necessity’ and ‘job related’ are intended to reflect the concepts enunciated by the Supreme Court in Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), and in the other Supreme Court decisions prior to Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).” 137 Cong. Rec. S15276 (daily ed. Oct. 25, 1991).

Congress’ reference to the Griggs line of Supreme Court decisions, however, does little to clear the air because the language in those opinions has caused confusion. The problem can ultimately be traced back to Griggs itself. In that case, which involved power-plant jobs, the Court held that a high school completion requirement and general intelligence tests that disproportionately disqualified black applicants were not significantly job related. The Court said: “The touchstone is business necessity.” Griggs, 401 U.S. at 431, 91 S.Ct. 849. However, the very next sentence reads, “[i]f an employment practice ... cannot be shown to be related to job performance, the practice is prohibited.” Id. Thus, the Court speaks of both “necessity” and “job-relatedness” in the same breath.

In the following paragraph, we read that neither employment requirement is “shown to bear a demonstrable relationship to successful performance of the jobs for which it was used. Both were adopted ... without meaningful study of their relationship to job-performance ability.” Id. The Court also refers to “testing mechanisms [that are] unrelated to measuring job capability,” “job-related tests,” and states that “any given requirement must have a manifest relationship to the employment in question.” Id. at 432-34, 436, 91 S.Ct. 849. Not once does the opinion repeat or expound upon “business necessity.” Unquestionably, “job-relatedness” is Griggs’ dominant thread.

The Court also cited with approval former EEOC Guideline 29 C.F.R. § 1607.4(c), which required employers to produce data “demonstrating that the test is predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated.” Id. at 433 n. 9, 91 S.Ct. 849.

The Court next visited the concept of business justification in Albemarle Paper Co. v. Moody, 422 U.S. 405, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975), where a paper mill was using screening tests that had a disparate impact on black applicants. The issue, according to the Court, was whether the employer had shown the tests to be “job related.” Id. at 408, 95 S.Ct. 2362. The Court concluded that the employer’s validation study was defective because it “involved no analysis of the attributes of, or the particular skills needed in, the studied job groups.” Id. at 432, 95 S.Ct. 2362. The Court was also critical of hiring decisions based on the subjective opinions of supervisors. See id. at 432-33, 95 S.Ct. 2362.

The portion of Albemarle most relevant to the case at hand focused on whether tests that take into account capability for promotion may be utilized if such long-range requirements fulfill a “genuine business need.” Id. at 434, 95 S.Ct. 2362. The employer’s validation study focused on the scores achieved by job groups near the top of the various lines of progression. The Court observed that those results did “not necessarily mean that the test, or some particular cutoff score on the test, is a permissible measure of the minimal qualifications of new workers entering lower level jobs.” Id. at 434, 95 S.Ct. 2362. Thus, the validation study was faulty because there had been “no clear showing that differential validation was not feasible for lower level jobs.” Id. at 435, 95 S.Ct. 2362.

The Court next considered appropriate criteria in Washington v. Davis, 426 U.S. 229, 96 S.Ct. 2040, 48 L.Ed.2d 597 (1976), which involved written tests that allegedly had a discriminatory impact on black applicants for police officer positions. Although the suit was not brought under Title VII, the Court discussed Griggs and Albemarle. The district judge had concluded “that a positive relationship between the test and training-course performance was sufficient to validate the [test], wholly aside from its possible relationship to actual performance as a police officer.” Id. at 250, 96 S.Ct. 2040. Significantly, the Supreme Court remarked that such a conclusion was not foreclosed by either Griggs or Albemarle and “it seems to us the much more sensible construction of the job-relatedness requirement.” Id. at 250-51, 96 S.Ct. 2040. Dismissing challenges to the test, the Court remarked that “some minimum verbal and communicative skill would be very useful, if not essential, to satisfactory progress in the training regimen.” Id. at 250, 96 S.Ct. 2040.

In another case, Dothard v. Rawlinson, 433 U.S. 321, 97 S.Ct. 2720, 53 L.Ed.2d 786 (1977), the Court held that height and weight requirements for prison guards could not stand. The ruling was based on the employer’s failure to produce any evidence to correlate those standards with “the requisite amount of strength thought essential to good job performance.” Id. at 331, 97 S.Ct. 2720. In a footnote, Dothard repeated Griggs’ statement that “[t]he touchstone is business necessity,” and further stated that “a discriminatory employment practice must be shown to be necessary to safe and efficient job performance to survive a Title VII challenge.” Id. at 332 n. 14, 97 S.Ct. 2720. Earlier in the body of the opinion, the Court explained that the employer must show that a requirement has “ ‘a manifest relationship to the employment in question.’ ” Id. at 329, 97 S.Ct. 2720 (quoting Griggs, 401 U.S. at 432, 91 S.Ct. 849).

In yet another context, the Court upheld an employer’s prohibition of employment to users of methadone, despite claims of disparate impact on blacks and Hispanics. See New York City Transit Authority v. Beazer, 440 U.S. 568, 587, 99 S.Ct. 1355, 59 L.Ed.2d 587 (1979). To the Court, the employer’s narcotics rule, even in its application to methadone users, was “job related.” Id.

Beazer quoted the District Court’s observation that “those goals [i.e., safety and efficiency] are significantly served by — even if they do not require — [the employer’s] rule as it applies to all methadone users including those who are seeking employment in non-safety-sensitive positions.” Id. at 587 n. 31, 99 S.Ct. 1355. The Supreme Court concluded that “[t]he record thus demonstrates that [the employer’s] rule bears a ‘manifest relationship to the employment in question.’ ” Id. (quoting Griggs, 401 U.S. at 432, 91 S.Ct. 849).

The Beazer Court observed that most of the affected job positions were “attended by unusual hazards and must be performed by ‘persons of maximum alertness and competence.’ ” Id. at 571, 99 S.Ct. 1355. Other positions were “critical” or “safety sensitive,” and many involved “danger to [the employees] or to the public.” Id.

III.

As the preceding sketch of pre-Wards Cove opinions demonstrates, the Supreme Court’s articulations of the appropriate standards are far from clear. Phrases such as “business necessity,” “demonstrable relationship to successful performance of the job,” “manifest relationship to the employment in question,” “genuine business needs,” and “essential to good job performance,” have been used interchangeably. These varying formulations bring to mind Justice Holmes’ observation, “A word is not a crystal, transparent and unchanged, it is the skin of a living thought and may vary greatly in color and content according to the circumstances and the time in which it is used.” Towne v. Eisner, 245 U.S. 418, 425, 38 S.Ct. 158, 62 L.Ed. 372 (1918).

My study of the standard for business justification as set forth by the Civil Rights Act of 1991 convinces me that it remains essentially the same as it was in the pre-Wards Cove era. However, other than its holding on burden of proof, it does not seem that Wards Cove was a revolutionary pronouncement. Until the Supreme Court reexamines the subject, however, courts will continue to struggle with the often inconsistent phraseology employed in Griggs and its progeny. The definition and application of the appropriate standard for business justification will depend on the context in which it is raised.

There are significant factual differences in the cases that explain, to some extent, the differing formulations. Albemarle and Griggs applied greater scrutiny when the disparate impact affected entry to lower-level jobs, where it is fair to assume that no special qualifications would be generally expected.

In contrast, Beazer and Washington raised an additional important consideration — public safety. Beazer concerned jobs involving serious dangers to employees as well as to transit passengers. In Washington, a written test demonstrating an applicant’s ability to complete police officer training was job-related, even apart from its relationship to actual performance as a police officer. The impact of public safety concerns on employee qualifications is inescapable, and serves to differentiate those positions from lower-level, nonsafety-sensitive ones.

The Courts of Appeals have explicitly recognized the relevance of safety considerations in a series of decisions beginning with Spurlock v. United Airlines, Inc., 475 F.2d 216 (10th Cir.1972). In that case, an airline required that applicants for flight officer positions have a college degree and a minimum of 500 flight hours. The Court, citing Griggs, held that where “the job clearly requires a high degree of skill and the economic and human risks involved in hiring an unqualified applicant are great, the employer bears a correspondingly lighter burden to show his employment criteria are job related.” Id. at 219. Because, in the case of pilots, “[t]he risks involved in hiring an unqualified applicant are staggering .... [t]he courts ... should proceed with great caution before requiring an employer to lower his pre-employment standards for such a job.” Id.

Another leading case, Davis v. City of Dallas, 777 F.2d 205 (5th Cir.1985), applied the Spurlock doctrine to criteria for hiring police officers. The City required a specific amount of college education, no history of recent marijuana usage, and a negative history of traffic violations. Despite findings of disparate impact, the Court upheld the requirements. Having reviewed the many cases following Spurlock, the Court had “no difficulty ... equating the position of police officer in a major metropolitan area such as Dallas with other jobs that courts have found to involve the important public interest in safety.” Id. at 215 (internal quotation marks omitted). The degree of public risk and responsibility alone “would warrant examination of the job relatedness of the ... education requirement under the lighter standard imposed under Spurlock and its progeny.” Id. at 215.

Observing the nature of the positions at issue in Griggs and Albemarle, Davis noted that in neither case did the Supreme Court suggest that those jobs “were noteworthy for their dangerousness or importance to the public welfare.” Id. at 210. In contrast, the employment under consideration in Davis directly implicated public safety concerns. See id. at 211. It is interesting that Justice Blackmun, in Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 108 S.Ct. 2777, 101 L.Ed.2d 827 (1988) (plurality op.), objecting to what he considered to be a tendency to weaken the employer’s burden, cited Davis favorably, stating that “[t]he proper means of establishing business necessity will vary with the type and size of the business in question, as well as the particular job” in question. Id. at 1007, 108 S.Ct. 2777 (Blackmun, J., concurring in part and concurring in the judgment).

In a post-Wards Cove case involving firefighters, the Court of Appeals for the Eleventh Circuit noted that such “safety claims would afford the City an affirmative defense, for protecting employees from workplace hazards is a goal that, as a matter of law, has been found to qualify as an important business goal for Title VII purposes.” Fitzpatrick v. City of Atlanta, 2 F.3d 1112, 1119 (11th Cir.1993) (citing Beazer, 440 U.S. at 587 & n. 31, 99 S.Ct. 1355; Dothard, 433 U.S. at 331 n. 14, 97 S.Ct. 2720). Thus, “[m]easures demonstrably necessary to meeting the goal of ensuring worker safety are therefore deemed to be ‘required by business necessity’ under Title VII.” Id.

In a similar case, the Court of Appeals for the Eighth Circuit wrote that “the law does not require the city to put the lives of [plaintiff] and his fellow firefighters at risk by taking the chance that he is fit for duty when solid scientific studies indicate that persons with test results similar to his are not.” Smith v. City of Des Moines, 99 F.3d 1466, 1473 (8th Cir.1996). Other Courts of Appeals have reached similar conclusions in cases involving safety-sensitive positions such as truck drivers, bus drivers, firefighters, and police officers.

IV.

The issues that separate the parties are straightforward. Plaintiffs do not seriously contest the fact that aerobic capacity is a valid predictor of efficient job performance as a transit police officer. They do not challenge the finding that running for 1.5 miles is an effective way to measure aerobic capacity. Nor apparently do they suggest that 42.5 mL is an inappropriate cut-off for male applicants: they implicitly accept this standard by advancing Dr. McArdle’s alternative test, which would use that score for males and a lower one for females.

Even the government plaintiff concedes that an employer may improve its workforce. U.S. Br. at 35 (citing Griggs, 401 U.S. at 431, 91 S.Ct. 849). Griggs, in turn, stressed that tests “must measure the person for the job and not the person in the abstract.” Griggs, 401 U.S. at 436, 91 S.Ct. 849. SEPTA’s running test attempts to do just that, i.e., improve the caliber of its police force by selecting new hires to fit appropriately heightened performance standards.

A fair appraisal of the plaintiffs’ objection is that the running test’s cut-off requires female applicants to run faster than a majority of women can run without training. However, nearly all of the women who did train were able to pass the test. Also, not all males were able to pass, although their failure percentages were substantially lower.

Plaintiffs complain that SEPTA cannot point to any instances where a perpetrator of a crime got away, or an offense was committed because of an officer’s lack of aerobic capacity. But as noted by Fitzpatrick, “[t]he mere absence of unfortunate incidents is not sufficient” to preclude a particular safety requirement because otherwise, such “measures could be instituted only once accidents had occurred rather than in order to avert accidents.” Fitzpatrick, 2 F.3d at 1120-21.

Here, where applicants have it within their power to prepare for the running test, they may properly be expected to do so. In view of the important public safety concerns at issue, it is not unreasonable to expect all applicants — female or male — to take the necessary steps in order to qualify for the positions.

The District Court’s conclusions must be appraised against this background. The trial was lengthy and the evidence extensive. The findings of fact on job needs with respect to aerobic capacity are not clearly erroneous. This conclusion is mandated by the standard that clear error exists only when, on the entire evidence, a court is left with the definite, firm conviction that a mistake has been committed. See Anderson v. City of Bessemer City, 470 U.S. 564, 573, 105 S.Ct. 1504, 84 L.Ed.2d 518 (1985). If the account of the District Court is “plausible in light of the record viewed in its entirety,” we may not reverse even if we are convinced that had we “been sitting as the trier of fact, [we] would have weighed the evidence differently.” Id. at 574, 105 S.Ct. 1504.

Moreover, “[w]here there are two permissible views of the evidence, the factfinder’s choice between them cannot be clearly erroneous.” Id. “This is so even when the district court’s findings ... are based instead on physical or documentary evidence or inferences from other facts.” Id. Where findings are based on credibility determinations, appellate review accords even greater deference to the findings of the District Court. See id. at 575, 105 S.Ct. 1504. Courts routinely hold that business justification is reviewed for clear error. See, e.g., Davis, 777 F.2d at 208 & n. 1; Spurlock, 475 F.2d at 219-20. I accept, therefore, that 42.5 mL is an appropriate level for the position of a SEPTA officer, that it is reasonable, and that it is attainable by otherwise physically fit female applicants with moderate training.

The question, then, is whether SEPTA’s standard is permissible under the terms of the Civil Rights Act of 1991 and the relevant precedents. The District Court rejected the plaintiffs’ contention that “business necessity” under the statute is governed by a footnote in Dothard that states: “[A] discriminatory employment practice must be shown to be necessary to safe and efficient job performance....” Dothard, 433 U.S. at 332 n. 14, 97 S.Ct. 2720. Rather, looking to Griggs and Beazer, the District Court stated that SEPTA need only show that its tests “significantly serve, but are neither required by nor necessary to, the employer’s legitimate business interests” — in other words, that it “bears a manifest relationship” to the employment in question.

In disagreeing with the criteria used by the District Court, the majority holds that “a discriminatory cutoff score is impermissible unless shown to measure the minimum qualifications necessary for successful performance of the job in question.” The difficulties presented by this standard are illustrated by the testimony of Dr. McArdle, the plaintiffs’ expert. In essence, he proposed that female applicants be expected to meet 50% of their aerobic capacity, translating to 36 mL, but that males continue at the 50% level of 42.5 mL. That standard would, of course, have less adverse impact on women, but according to the findings of the District Court, would also have a detrimental impact on the effectiveness of the SEPTA transit police.

With this in mind, I cannot agree that the majority’s standard is the correct one for this case. Reducing standards towards the lowest common denominator is particularly inappropriate for a police force. Undoubtedly, candidates who fail the running test — female or male — may have other qualities of particular value to SEPTA, but they must possess the requisite aerobic capacity as well. No matter how laudable it is to reduce job discrimination, to achieve this goal by lowering important public safety standards presents an unacceptable risk.

Aerobic capacity is an objective, measurable factor which gauges the ability of a human being to perform physical activity. The aerobic demands on the human system are affected by absolutes such as the distance traveled, the speed, the number of steps to be climbed, and similar factors. Governmental agency pronouncements will not shorten distances, reduce the number of steps, or decrease the aerobic capacity of perpetrators to match the reduced standards of officers, male or female. Some males and more females cannot meet the necessary requirements. Based on the facts established at trial, those individuals simply cannot perform the job efficiently. To the extent that they cannot, their hire adversely affects public safety.

The current Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. § 1607 (“EEOC Guidelines”), are not as strict as the standard suggested by the majority. In discussing cut-off scores, the Guidelines explicitly state that “they should normally be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force.” 29 C.F.R. § 1607.5(H) (1998). Further, the EEOC Guidelines standard — “predictive of or significantly correlated with important elements” — has been cited by the Supreme Court with approval on several occasions. See Albemarle, 422 U.S. at 431, 95 S.Ct. 2362 (quoting former 29 C.F.R. § 1607.4(c)); Griggs, 401 U.S. at 433 n. 9, 91 S.Ct. 849 (quoting same); see also 29 C.F.R. § 1607.5(B) (1998).

Further, Albemarle’s reference to “minimal qualifications” was directed only to the inappropriateness of using a test geared towards higher-level jobs as a screen for entry-level positions. See Albemarle, 422 U.S. at 434, 95 S.Ct. 2362. This holding, which is minimally relevant to the matter at hand, is doubly inapplicable when the job affects public safety. See Davis, 777 F.2d at 211 n. 5.

I see no need to remand this case to the District Court. Whatever standard is used, the findings of fact require an affirmance. Although the District Court rejected the plaintiffs’ argument that the Dothard footnote, rather than Beazer, supplied the proper standard, the factual findings make it clear that under either formulation, the District Court reached the correct result.

The Dothard footnote states that the challenged practice must be “necessary to safe and efficient job performance.” Dothard, 433 U.S. at 331 n. 14, 97 S.Ct. 2720. The District Court, also in a footnote, wrote “physical fitness is only one trait or ability required of SEPTA officers, [but] it is a trait or ability that is necessary for and critical to the successful performance of the job, and thus SEPTA should be able to test for such a trait.” This finding more than complies with Dothard’s footnote by concluding that not only is physical fitness “necessary” to safe and efficient job performance as SEPTA officers, but that it is “critical” to successful performance of these jobs. Moreover, the finding clearly meets even the criterion that cut-off scores “measure the minimum qualifications necessary for successful performance of the job.” (emphasis added).

Nor can there be any doubt that the factual findings here satisfy Griggs’ requirement of “business necessity.” Unquestionably, SEPTA’s test is job-related and there can be no doubt that physical fitness, and particularly aerobic capacity, is necessary for adequate performance of the job of a SEPTA transit officer. The findings are convincing that 42.5 mL is a reasonable cut-off point for determining the physical ability necessary for successful performance of the job. Consequently, even under the plaintiffs’ reading of the 1991 Act, which relies so much on Dothard, the judgment in favor of the defendant should be affirmed.

To my mind, the correct standard for this case is that of Spurlock-Davis, one that places greater emphasis on the safety of the public and fellow officers. I have no doubt that this line of cases survives the Civil Rights Act of 1991, because those opinions — as noted in Congress’ “interpretive memorandum” — “reflect the concepts enunciated” in Supreme Court decisions prior to Wards Cove. See Watson, 487 U.S. at 998, 108 S.Ct. 2777; Beazer, 440 U.S. at 587 n. 31, 99 S.Ct. 1355; Washington, 426 U.S. at 250, 96 S.Ct. 2040; Smith, 99 F.3d at 1473; Fitzpatrick, 2 F.3d at 1119. Safety concerns are clearly “concepts” considered by the Supreme Court and applied in various factual circumstances by the Courts of Appeals, both in pre- and post-Wards Cove cases. Nothing in the legislative history casts any doubt on the continued viability of these opinions.

Although it did not cite Spurlock-Davis, the District Court stated in its conclusions of law that “employers such as SEPTA should be encouraged to improve the efficiency of its workforce, especially where public safety is implicated by the particular job as it is with SEPTA.” More emphatically, it stated that “[t]he Court simply will not condone dilution of readily obtainable physical abilities standards that serve to protect the public safety in order to allow unfit candidates, whether they are male or female, to become SEPTA transit police officers.”

Although the District Court only inferentially applied Spurlock-Davis, I would do so explicitly and affirm the judgment on that basis. Here, the record supplies ample evidence about safety concerns related to the performance of SEPTA officers. In cases such as these, courts should decline to lower standards in an effort to reduce disparate impact when that goal comes at the expense of public safety. Due deference should be afforded to the experience of specialized employers in setting appropriate requirements for safety-sensitive positions.

V.

The Lanning appellants propose a number of alternative practices that they suggest would have a lesser disparate impact while still serving SEPTA’s goals. First, they suggest that SEPTA select medically fit applicants who pass fitness requirements at the end of their training at the Philadelphia Police Academy. Second, as noted earlier, they argue in favor of a relative fitness test (i.e., one with a lower cut-off point for females). Third, they prompt SEPTA to propose an alternative.

For plaintiffs to establish a satisfactory alternative, they must “make[ ] the demonstration described in [42 U.S.C. § 2000e-2(k)(1)(C) ] with respect to an alternative employment practice and [establish that] the [employer] refuses to adopt such alternative employment practice.” 42 U.S.C. § 2000e-2(k)(1)(A)(ii). To meet this burden, the plaintiffs’ proposed alternatives must have less disparate impact and “also serve the employer’s legitimate interest in ‘efficient and trustworthy workmanship.’ ” Albemarle, 422 U.S. at 425, 95 S.Ct. 2362; see also NAACP v. Medical Ctr., Inc., 657 F.2d 1322, 1336 n. 17 (3d Cir.1981) (en banc). As stated in Watson, the alternative test must “be equally as effective as the challenged practice in serving the employer’s legitimate business goals.” Watson, 487 U.S. at 998, 108 S.Ct. 2777.

The District Court found that none of the plaintiffs’ proposals served SEPTA’s legitimate interest in having a more physically fit work force. If SEPTA may require an aerobic capacity of 42.5 mL after training at the police academy, as plaintiffs propose, it is unclear how that practice would be any less discriminatory than requiring it before hire. In short, that plan would simply require that training be on “company time” rather than on that of the applicants.

As to the relative fitness test proposed by the plaintiffs’ expert, the factual findings demonstrate that officers with a capacity of 36 mL do not serve SEPTA’s needs as well as the required standard of 42.5 mL. Finally, the proposal that SEPTA come forward with an alternative is not an alternative at all. Thus, plaintiffs have failed to meet their burden to establish an alternative employment practice.

I would affirm the judgment of the District Court.
      
      . Under the zone concept, SEPTA designated eight separate zones covering the subway system. In a typical zone, one Lieutenant is assigned to command the zone. Two Sergeants are also assigned to the zone. Three shifts of officers per day tour the zone. Beats within the zones are assigned to the individual officers. Beats are reassigned periodically to familiarize the officers with the entire zone. Officers patrol their beats alone and on foot.
     
      
      . Dr. Davis is an expert exercise physiologist who has extensive experience in designing physical fitness employment tests for various law enforcement agencies.
     
      
      .Dr. Davis initially decided that an aerobic capacity of 50 mL/kg/min was necessary to perform the job of SEPTA transit police officer. After determining that institution of such a high standard would have a draconian effect on women applicants, however, Dr. Davis decided that the goals of SEPTA could be satisfied by using a 42.5 mL/kg/min standard.
     
      
      . SEPTA contends that it did not seek applicants in 1992. Credited testimony was offered, however, that each of the six or seven women who took the 1.5 mile test in 1992 failed. Relying on this testimony, the District Court found that the disparate impact on women was slightly more pronounced than the 1991, 1993, and 1996 figures reflect. See Lanning, 1998 WL 341605 at *28.
     
      
      . For example, one proffered study showed that approximately 47% of men between the ages of 20 to 29 can perform a 1.5 mile run in 12 minutes where only 12% of women in the same age category can achieve this time. As noted by the District Court, testimony was offered that this study may not be entirely reliable because the women who participated in the study were predominately white women of higher socioeconomic status. Other research studies, however, were offered which show that men generally have a higher aerobic rate than women due to physiological differences between the sexes.
     
      
      . The District Court pointed to one document, for example, indicating that between July 1, 1994 and August 22, 1995, the percentage of uniformed personnel who failed the fitness test was as follows: a) Age group 20-30: 10% of all officers; b) Age group 30-40: 30% of all officers and 12% of all supervisors; c) Age group 40-50: 45% of all officers and 52% of all supervisors; d) Age group 50-60: 55% of all officers and 40% of all supervisors. See Lanning, 1998 WL 341605 at *31.
     
      
      . “Arrest rates” were tabulated by expressing the number of arrests made by an officer as a percentage of the number of incident reports involving that officer. See App. at 3040-41 (Siskin Expert Report).
     
      
      . The category of “serious crimes” includes homicide, rape, robbery, aggravated assault, burglary, theft, and auto theft. This category of arrests accounts for approximately ten percent of all reported incidents and seven percent of all reported arrests. See App. at 3040 (Siskin Expert Report).
     
      
      . On appeal, SEPTA offered evidence to establish that the individual female applicants who failed SEPTA’s 1.5 mile run demonstrated a cavalier attitude in preparing for and taking the test. As aptly noted by plaintiffs’ counsel at oral argument, this evidence has no bearing upon our analysis in this appeal because SEPTA has conceded that its test has a severe disparate impact on women.
     
      
      . Prior to Dothard, the Court included some language related to the business necessity doctrine in Washington v. Davis, 426 U.S. 229, 96 S.Ct. 2040, 48 L.Ed.2d 597 (1976), an equal protection case. Because Washington is not a Title VII case, however, we cannot treat the language in Washington as reflective of the pre-Wards Cove business necessity doctrine applicable to Title VII cases.
     
      
      . Two cases prior to Wards Cove forecast some of the changes to come. In New York City Transit Auth. v. Beazer, 440 U.S. 568, 99 S.Ct. 1355, 59 L.Ed.2d 587 (1979), the Court disposed of a Title VII case by holding that the plaintiffs failed to establish a prima facie case of disparate impact. The Court, however, commented on the business necessity doctrine in dicta. In a footnote, the Court stated that even if a prima facie case had been established, the employer would have shown business necessity by establishing that its practice significantly serves its legitimate business goals of safety and efficiency. Beazer, 440 U.S. at 587 n. 31, 99 S.Ct. 1355. Similarly, a plurality opinion in Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 108 S.Ct. 2777, 101 L.Ed.2d 827 (1988), suggested that employers could meet their burden of establishing business necessity simply by advancing a legitimate business reason for the practice in question. Watson, 487 U.S. at 998, 108 S.Ct. 2777. While the language in these cases clearly foreshadowed the Court’s holding in Wards Cove, this language had never been embraced by a majority of the Court as the binding standard for business necessity prior to Wards Cove.
      
     
      
      . See Andrew C. Spiropoulos, Defining the Business Necessity Defense to the Disparate Impact Cause of Action: Finding the Golden Mean, 74 N.C. L.Rev. 1479, 1516-20 (1996)(outlining the respective positions of both sides to the debate); compare also Michael Carvin, Disparate Impact Claims Under the New Title VII, 68 Notre Dame L.Rev. 1153 (1993)(arguing that Wards Cove is still good law after Civil Rights Act of 1991); with Susan S. Grover, The Business Necessity Defense in Disparate Impact Discrimination Cases, 30 Ga. L.Rev. 387 (1996)(arguing for a strict business necessity standard under the Act); Note, The Civil Rights Act of 1991: The Business Necessity Standard, 106 Harv. L.Rev. 896 (1993)(asserting that Wards Cove does not survive the Act).
     
      
      . We are cognizant that a contrary argument has been advanced in which it is asserted that Wards Cove remains the controlling standard. See Carvin, supra note 12, at 1157-64. Pursuant to the argument, the business necessity standard announced in Wards Cove simply clarified Griggs and therefore is not inconsistent with the Act’s command to apply the standard enunciated in Griggs. In addition, it is asserted that due to the legislative history of the Act, it would be improper to apply a strict business necessity standard. This argument, however, ignores two important aspects of the Act which constrain our interpretation of the standard adopted. First, the interpretive memorandum’s distinction between Griggs and Wards Cove casts significant doubt on the assertion that Congress read Wards Cove as simply a clarification of Griggs. Second, the Act precludes us from considering the legislative history upon which this argument relies for support. Accordingly, we find this argument to be devoid of merit.
     
      
      . For an interesting discussion on male-oriented biases in the labor market see Maxine N. Eichner, Getting Women Work That Isn't Women's Work: Challenging Gender Biases in the Workplace Under Title VII, 97 Yale L.J. 1397 (1988). See also, Hurley v. The Atlantic City Police Dept., 174 F.3d 95, 104 n. 5 (3d Cir.1999)(noting egregious sexual harassment to which a female police officer was subjected by her male colleagues); Mazus v. Department of Transp., Com. of Pa., 629 F.2d 870, 876 (3d Cir.1980)(Sloviter, J., dissenting)(noting allegations demonstrating prevalent male attitude that construction work is not the “type of work” women should perform).
     
      
      . We need not be concerned that implementation of this standard will result in forcing employers to adopt quotas, a result that would be inconsistent with the mandates of Title VII. If an employer can demonstrate that its discriminatory cutoff score reflects the minimum qualifications necessary for successful job performance, it will be able to continue to use it. If not, the employer must abandon that cutoff score, but is free to develop either a non-discriminatory practice which furthers its goals, or an equally discriminatory practice that can meet this standard. Nothing in the Griggs business necessity standard requires employers to hire employees in numbers to reflect the ethnic, racial or gender make-up of the community.
      The following example based upon the facts of this case illustrates this point. Assuming that SEPTA’s 1.5 mile run has a disparate impact on women and that SEPTA can not show that the 12 minute cutoff measures the minimum aerobic capacity necessary to be a successful transit officer, it does not follow that SEPTA would then be required to hire women in equal proportion to men. Several options would be available to SEPTA. For example, SEPTA could: 1) abandon the test as a hiring requirement but maintain an incentive program to encourage an increase in the officers' aerobic capacities; 2) validate a cutoff score for aerobic capacity that measures the minimum capacity necessary to successfully perform the job and maintain incentive programs to achieve even higher aerobic levels; or 3) institute a non-discriminatory test for excessive levels of aerobic capacity such as a test that would exclude 80% of men as well as 80% of women through separate aerobic capacity cutoffs for the different sexes. Each of these options would help SEPTA achieve its stated goal of increasing aerobic capacity without running afoul of Title VII and none of these options require hiring by quota.
     
      
      . Relying upon Spurlock v. United Airlines, Inc., 475 F.2d 216 (10th Cir.1972), and like cases from our sister courts of appeals, the dissent asserts that this standard should not apply to SEPTA because the job of SEPTA transit officer implicates issues of public safety. Under the Act, however, our interpretation of the business necessity language is limited to "the concepts enunciated by the Supreme Court in Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), and in the other Supreme Court decisions prior to Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).” See 137 Cong. Rec. 28,680 (1991)(emphasis added). Because the Supreme Court never adopted the holding of Spurlock prior to Wards Cove, it is clear that, under the Act, we are not to consider Spurlock as authoritative. Furthermore, if Congress had intended to endorse the holding of Spurlock, it could have done so affirmatively. Accordingly, because the Act limits our interpretation to Supreme Court jurisprudence and does not otherwise endorse Spurlock, we are not at liberty to adopt the holding of Spurlock at this juncture. Moreover, to the extent that Spurlock and other cases from our sister courts of appeals can be read to suggest that minimum qualifications do not apply to certain types of employment, these cases are inconsistent with the teachings of Griggs and are accordingly uninformative under the Act.
      Furthermore, to the limited extent that the Supreme Court’s pre-Wards Cove jurisprudence instructs that public safety is a legitimate consideration, application of the business necessity standard to SEPTA is consistent with that jurisprudence because the standard itself takes public safety into consideration. If, for example, SEPTA can show on remand that the inability of a SEPTA transit officer to meet a certain aerobic level would significantly jeopardize public safety, this showing would be relevant to determine if that level is necessary for the successful performance of the job. Clearly a SEPTA officer who poses a significant risk to public safety could not be considered to be performing his job successfully. We are accordingly confident that application of the business necessity standard to SEPTA is fully consistent with the Supreme Court's pre-Wards Cove jurisprudence as required by the Act.
     
      
      . See supra note 11.
     
      
      . While relying predominately upon Dr. Davis’ expertise, the District Court does point to a study which Dr. Davis completed for Anne Arundel County, Maryland in which he concluded that a 42.5 mL/kg/min aerobic capacity predicted success as an Anne Arundel County police officer. Absent a finding that the work of an Anne Arundel County police officer is comparable to SEPTA transit officer work, a finding the District Court did not make, reliance on this validation study is misplaced. See 29 C.F.R. § 1607.7(B)(2); see also 29 C.F.R. § 1607.7(B)(3)(explaining that validation studies created for other employers must also include a study of "test fairness”). Furthermore, it is unclear from Dr. Davis’ report whether the Anne Arundel study's 42.5 mL/kg/min cutoff actually measures for qualities significant to SEPTA transit police performance. Compare App. at 3134 (Davis Report) (noting that 42.5 mL/kg/min level for Anne Arundel study is significant for carrying an unspecified amount of weight and generally effecting arrests) with App. at 3132 (Davis Report) (stating "[t]ransit police officers are more likely to have incidents come to them, as opposed to responding to the scene of an event. By mission, the presence of the officer is that of a deterrent, maintaining maximum visibility. Occasionally, officers will come upon criminal activities such as assaults or robberies, but for the most part, the officer will attempt to control a situation such as disorderly conduct or force compliance (paying fares) without having to make an arrest.”); see also App. at 3139 (Davis Report)(quoting experienced officer as stating "[t]he most important factors in my opinion of being a good officer is to be able to think clearly at all times an [sic] verbalize and or articulate when dealing with all people.... Running quickly is physically demanding, although in the transit system, most dealings are close, physical altercations.”).
In addition, it is unclear from the record whether the Anne Arundel study itself was properly validated.
     
      
      . The danger of allowing an employer to carry its burden by relying simply upon an expert’s unvalidated judgment as to an appropriate cutoff score in a testing device is illustrated by this case. In determining an appropriate cutoff for aerobic capacity, Dr. Davis rejected the SMEs' estimate of the minimal qualifications necessary to perform the job even though these SMEs were experienced transit officers. Dr. Davis then determined that "a SEPTA transit officer needs an aerobic capacity of 50 ml/kg/min to successfully perform a number of tasks.” Lanning, 1998 WL 341605 at *16 (emphasis added). Dr. Davis, however, revised this requirement, finding that "the goals of SEPTA could be satisfied by using a 42.5 mL/kg/min standard” after determining that the higher limit would have a "draconian” effect on women. Id. There is no indication in the District Court’s opinion as to how Dr. Davis determined that the lower standard would be sufficient. Where, as here, the cutoff score chosen has a discriminatory disparate impact, Griggs prohibits the establishment of exactly this type of arbitrary barrier to employment opportunities.
     
      
      . The District Court seems to have derived this standard from the Principles for the Validation and Use of Personnel Selection Procedures ("SIOP Principles"), principles published by the Society for Industrial and Organizational Psychology as a professional guideline for conducting validation research and personnel selection. To the extent that the SIOP Principles are inconsistent with the mission of Griggs and the business necessity standard adopted by the Act, they are not instructive.
     
      
      . The Court has cautioned that studies done in anticipation of litigation to validate discriminatory employment tests that have already been given must be examined with great care due to the danger of lack of objectivity. Albemarle, 422 U.S. at 433 n. 32, 95 S.Ct. 2362. We also have warned in a disparate impact context that "the story statistics tell depends, not unlike beauty, upon the eye and ear of the beholder” and that "we must apply a critical and cautious ear to one dimensional statistical presentation.” Bryant v. International Sch. Servs., Inc., 675 F.2d 562, 573 (3d Cir.1982). A critical evaluation of the statistical studies relied upon by the District Court in this case, reveals several aspects of these studies that we find to be, at a minimum, disconcerting.
      The following concerns are only a representative sample of possible deficiencies in these studies: 1) While the ability to make an arrest may be an important aspect of the job, the absolute number of arrests or "arrest rates” do not necessarily correlate with successful job performance. See App. at 3132 (noting that SEPTA officer should generally attempt to control a situation without having to make an arrest); 2) The study on arrests and arrest rates examined a disproportionately large number of officers with an aerobic capacity over 42 mL/kg/min compared to the number of officers with an aerobic capacity under that level which likely skewed the results. See, e.g., App. at 3053 (comparing arrests of 231 officers with aerobic capacities under the 42 mL/kg/min with arrests of 813 officers with aerobic capacities over the 42 mL/kg/min); see also, 29 C.F.R. § 1607.14(B)(6)(noting that “[r]eliance upon a selection procedure which is significantly related to a criterion measure, but which is based upon a study involving a large number of subjects and has a low correlation coefficient will be subject to close review if it has a large adverse impact.”); 3) The comparison of aerobic capacity with commendations is not helpful absent finding as to the subjective considerations involved in awarding commendations. See Albemarle, 422 U.S. at 432-33, 95 S.Ct. 2362; 4) The studies' emphasis on arrests for "serious crimes” is suspect; these arrests account for only 7% of all arrests and therefore represent only a small aspect of job. See generally 29 C.F.R. § 1607.14(B)(6)(noting that reliance on single selection instrument which is related to only one of many job duties will be subject to close review); 5) SEPTA’s table on the field performance of its officers belies the contention that there is a strict linear relationship of arrests to aerobic capacity; officers at less than 37 mL/kg/min had an average arrests of 13.6 compared to officers with at least a 48 mL/kg/min level who had average arrests of 13.9. See App. at 3065 (Defendant's Exhibit 52D); 6) The study on the average aerobic capacity of perpetrators has little meaning unless SEPTA can show that arrests of these perpetrators are typically aerobic contests; because SEPTA police are armed, such a showing is unlikely.
      Because we are remanding for the District Court to reconsider this evidence in light of the Griggs standard, we need not rule on whether any of the District Court's prior findings as to these studies were clearly erroneous. We comment here on the validity of these studies only to draw the District Court's attention to these concerns and to encourage the District Court to take a critical look at these studies, if necessary, on remand.
     
      
      . Such a result has the potential to have a significant detrimental impact on the amount and type of employment opportunities available to women. Obviously, under a "more is better” theory, employers such as police departments, fire departments and correctional facilities could develop physical tests with unnecessarily high cutoffs that would effectively exclude women from their ranks. Perhaps less obvious, however, is the impact that this result could have on industries where strength is even minimally related to the job in question. For example, all companies engaged in delivery, construction or any other type of physical labor would be permitted to develop unnecessary strength requirements on the theory that "more is better” or "the stronger the worker, the faster the job gets done.” This result is clearly unacceptable given the policies underlying both Title VII and the disparate impact theory of discrimination.
     
      
      . This is not to say that studies that actually prove that "more is better” are always irrelevant to validation of an employer’s discriminatory practice. For example, a content validated exam, such as a typing exam for the position of typist, which demonstrates that the applicants who score higher on the exam will exhibit better job performance may justify a rank-ordering hiring practice that is discriminatory. In such a case, a validation study proving that "more is better” may suffice to validate the rank-order hiring. This is true, however, in only the rarest of cases where the exam tests for qualities that fairly represent the totality of a job’s responsibilities. It is unlikely that such a study could validate rank-hiring with a discriminatory impact based upon physical attributes in complex jobs such as that of police officer in which qualities such as intelligence, judgment, and experience surely play a critical role. This is especially true in SEPTA’s case, where the record indicates that SEPTA patrol officers encounter “running assists,” the most strenuous task upon which SEPTA’s aerobic capacity testing predominately was justified, at an average rate of only twice per year. Compare Lanning, 1998 WL 341605 at *5 (finding that SEPTA has approximately 380 running assists per year) with id. at *27 (noting that SEPTA has 190 patrol officers).
     
      
      . In addition to the law review commentaries cited by the majority, see also Rosemary Alito, Disparate Impact Discrimination Under the 1991 Civil Rights Act, 45 Rutgers L.Rev. 1011, 1033 (1993) ("Only ... cases requiring proof of job-relatedness and a reasonable need for the challenged practice accord[ ] with both the statutory language of the 1991 Act and the applicable Supreme Court precedent.”); Kingsley R. Browne, The Civil Rights Act Of 1991: A "Quota Bill," A Codification Of Griggs, A Partial Return To Wards Cove, Or All Of The Above?, 43 Case W. Res. L.Rev. 287, 349 (1993) ("business necessity” has the same meaning as the Wards Cove phrase "serves, in a significant way”); Linda Lye, Comment, Title VII’s Tangled Tale: The Erosion and Confusion of Disparate Impact and the Business Necessity Defense, 19 Berkeley J. Employment & Lab. L. 315, 358 (1998) (a challenged practice must be a "reasonable predictor of effective performance of job duties,” defined in light of "important business goals”).
     
      
      . The fear of quota hiring was behind the President's refusal to sign earlier versions of the bill. See Statement of President George Bush Upon Signing S. 1745, reprinted in 1991 U.S.C.C.A.N. 768 (stating that the Act promotes the goals of ridding discrimination, allowing employers to hire, on the “basis of merit and ability without the fear of unwarranted litigation,” without leading to quotas or incentives for needless litigation). For a discussion of the drafting of the Civil Rights Act of 1991, see, 2 Lex K. Larson, Employment Discrimination § 23.04[1] (2d ed.1999). For analysis of the rejected 1990 bill, see Cynthia L. Alexander, The Defeat of the Civil Rights Act of 1990: Wading Through the Rhetoric In Search of Compromise, 44 Vand. L.Rev. 595 (1991).
     
      
      . The District Court rejected as irrelevant the plaintiffs’ evidence that incumbent officers had failed the physical fitness test yet successfully performed the job and that other police forces function well without an aerobic capacity admission test. See Lanning, 1998 WL 341605 at *68-*70. Under the standard implicit in Griggs and incorporated into the Act, this evidence tends to show that SEPTA’s cutoff score for aerobic capacity does not correlate with the minimum qualifications necessary to perform successfully the job of SEPTA transit officer. Accordingly, this evidence is relevant and should be considered by the District Court on remand.
     
      
      . See Peter Brandon Bayer, Mutable Characteristics and the Definition of Discrimination Under Title VII, 20 U.C. Davis L.Rev. 769, 822 & n. 213 (1987) ("Both the Supreme Court and lower court rulings offer a confusing patchwork of seemingly conflicting standards.”).
     
      
      . See Andrew C. Spiropoulos, Defining the Business Necessity Defense to the Disparate Impact Cause of Action: Finding the Golden Mean, 74 N.C. L.Rev. 1479 (1996).
     
      
      
        . In the analogous context of the defense of bona fide occupational qualification, the Supreme Court has stated: " 'The greater the safety factor, measured by the likelihood of harm and the probable severity of that harm in case of an accident, the more stringent may be the job qualifications... .’ ” Western Air Lines, Inc. v. Criswell, 472 U.S. 400, 413, 105 S.Ct. 2743, 86 L.Ed.2d 321 (1985) (quoting with approval Usery v. Tamiami Trail Tours, Inc., 531 F.2d 224, 236 (5th Cir.1976)).
     
      
      . See, e.g., York v. American Telephone & Telegraph Co., 95 F.3d 948, 952, 959 (10th Cir.1996) (powerhouse operating engineers); Zamlen v. City of Cleveland, 906 F.2d 209, 217 (6th Cir.1990) (firefighters); Hamer v. City of Atlanta, 872 F.2d 1521, 1535 (11th Cir.1989) (firefighters); Levin v. Delta Air Lines, Inc., 730 F.2d 994, 997-98 (5th Cir.1984) (flight attendants); Chrisner v. Complete Auto Transit, Inc., 645 F.2d 1251, 1261-63 (6th Cir.1981) (truck yard employees); Harriss v. Pan American World Airways, Inc., 649 F.2d 670, 676 (9th Cir.1980) (flight attendants); McCosh v. City of Grand Forks, 628 F.2d 1058, 1063 (8th Cir.1980) (police); Boyd v. Ozark Air Lines, Inc., 568 F.2d 50, 54 (8th Cir.1977) (airline pilots); see also Alito, supra, at 1033-35 & n. 100.
     
      
      . It is interesting that in the legislative history of the original text of Title VII, congressional advocates argued that “title VII would not require, and no court could read title VII as requiring, an employer to lower or change the occupational qualifications he sets for his employees....” 110 Cong. Rec. 7246-47 (April 8, 1964) (interpretive memorandum of Sen. Case). Senators Clark and Case stated that the “employer may set his qualifications as high as he likes....” Id. at 7213 (April 8, 1964) (interpretive memorandum of Sens. Clark and Case). Senator Humphrey stated that “[t]he employer, not the Government, will establish the standards.” Id. at 13088 (June 9, 1964). Thus, the legislative history of Title VII “clearly reveals that Congress was concerned about preserving employer freedom, and that it acted to mandate employer color-blindness with as little intrusion into the free enterprise system as possible.” Contreras v. City of Los Angeles, 656 F.2d 1267, 1278 (9th Cir.1981).
     
      
      . Although the government is a plaintiff in this dispute, I would note that some agencies take a somewhat different tack on the issue of aerobic fitness. The U.S. Forest Service, for instance, requires firefighters to have an aerobic capacity of 45 to 48 mL, and recommends one of up to 50. See United States Department of Agriculture, Forest Service, Technology & Development Program, Fitness and Work Capacity 51 (2d ed.1997). Notably, that agency currently uses a 1.5 mile run test. See id. at 50-51.
      Also, the Presidential Physical Fitness Award is available to children who meet the 85th percentile of fitness by meeting target levels in events such as a one-mile run. See Qualifying Standards (updated Oct. 15, 1998) <http://www.indiana.edu/~preschal/qualifying.html>.
      The Centers for Disease Control and Prevention lament that more than 60% of U.S. adults do not engage in the recommended amount of activity, and 25% are not active at all. See Physical Activity and Health, Adults (viewed May 7, 1999) <http://www.cdc.gov/nccdphp/sgr/adults.htm>.
     
      
      . The plaintiffs also suggest that SEPTA’s validation studies were insufficient. However, strict compliance with the EEOC Guidelines is not necessary in all cases. See Beazer, 440 U.S. at 587 n. 31, 99 S.Ct. 1355; Washington, 426 U.S. at 250-51, 96 S.Ct. 2040. In cases involving public safety, courts have held that empirical validation is not required. See Boyd, 568 F.2d at 54.
     
      
      . An order of the District Court may be affirmed on alternative grounds where the judgment is supported by the record below. See Guthrie v. Lady Jane Collieries, Inc., 722 F.2d 1141, 1144-45 & n. 1 (3d Cir.1983) (citing Helvering v. Gowran, 302 U.S. 238, 245, 58 S.Ct. 154, 82 L.Ed. 224 (1937)).
     
      
      . The Civil Rights Act of 1991 presents another potential barrier to the relative fitness test. Subsection 2000e-2(l) prohibits “in connection with the selection or referral of applicants or candidates for employment ... to ... use different cutoff scores for ... employment related tests on the basis of ... sex[.]” By its plain language, 42 U.S.C. § 2000e-2(l) arguably prohibits a relative fitness test. The District Court concluded that this provision did not apply. I have some doubt on that ruling, but need not reach that issue because I would affirm on other bases.
     