
    Ann BRUNET, et al., Plaintiffs, v. CITY OF COLUMBUS, et al., Defendants.
    No. C-2-84-1973.
    United States District Court, S.D. Ohio, E.D.
    May 13, 1986.
    Supplemental opinion and order May 30, 1986.
    On motion to stay July 14, 1986.
    
      Alexander Spater and Kathaleen Schulte, Spater, Gittes & Terzian, Columbus, Ohio, for plaintiffs.
    Eileen A. Groves, Asst. City Atty., Columbus, Ohio, for defendants.
   OPINION AND ORDER

KINNEARY, District Judge.

In this action, the named plaintiffs and the class of similarly situated women that they represent challenge certain parts of the tests used by the City of Columbus to select entry-level firefighters since 1979. Plaintiffs Ann Brunet, Lynn Shearrow, Rebecca Schumacher and Edwina Hornung took the tests administered in 1980 or 1984. None of the plaintiffs was selected as a firefighter. Plaintiffs contend in this litigation that they were subjected to discriminatory tests in 1980 and 1984. The defendants are the City of Columbus; the Columbus Civil Service Commission; Dana Rinehart, Mayor of Columbus; and Alphonso Montgomery, Safety Director. For convenience, the defendants are often referred to as “the City”. This action was originally brought under Title VII of the Civil Rights Act of 1964, 42 U.S.C. §§ 2000e et seq.; later, the complaint was amended to include a claim under 42 U.S.C. § 1983. Plaintiffs seek injunctive and backpay relief on behalf of themselves and the class of women they represent.

Plaintiffs Shearrow, Schumacher and Hornung applied for and took the firefighter selection tests in 1980. Based upon their scores on the exam, plaintiffs were placed upon a rank-ordered list of white applicants, to be selected for further consideration in order from that list. Pursuant to this Court’s Decree in Dozier v. Chupka, 395 F.Supp. 836 (S.D.Oh.1975) (Kinneary, J.), the City has maintained dual hiring lists for black and white applicants for firefighter and one-for-one hiring from those lists to remedy past racial discrimination. Of a total of 626 applicants ranked on the 1980 list, Shearrow ranked 193, Hornung ranked 319, and Schumacher ranked 571. Jt. Ex. 1. Plaintiffs Shearrow and Schumacher timely filed charges of discrimination with the Ohio Civil Rights Commission, Jt. Ex. 30-31, and received right-to-sue letters from the Equal Employment Opportunity Commission. Tr. 214.

Plaintiff Ann Brunet took the entry-level firefighter test held in 1984. She was ranked 464 on the list of non-black applicants. Jt. Ex. 5. Like the other plaintiffs, she was not selected as a firefighter. She timely filed a charge of discrimination and received a right-to-sue letter. Jt. Ex. 32, 26.

In both 1980 and 1984, the firefighter examination consisted of a written examination and a physical test. In 1980, the written test consisted of four sub-tests: a reading comprehension test, a mechanical reasoning test, and two psychological profiles. Stip. # 11. The reading comprehension test was pass/fail; the remaining three tests were scored, and weighted equally to make up 70% of an applicant’s total score. Stip. # 12. The physical test consisted of seven events, six of which were scored. Timed scores were used to compute a physical exam score which constituted 30% of an applicant’s total score. Stip. # 15. In 1984, a few changes were made, but the general approach remained the same. The written test consisted of a reading comprehension test and mechanical reasoning test, both of which were scored, and weighted equally to constitute 70% of an applicant’s total score. Stip. # 31. The physical test was composed of the same events as in 1980 with the exception of one event, which was dropped. As in 1980, the score on the physical test constituted 30% of an applicant’s total score. Stip. #35.
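The stipulated scoring scheme amounts to a weighted sum: the scored written subtests share 70% of the total equally, and the physical test carries the remaining 30%. The following is a minimal sketch of that arithmetic, assuming (as the opinion does not state) that every component has already been converted to a common 0 to 100 scale; the function name `total_score` and the weight constants are hypothetical.

```python
from statistics import mean
from typing import Sequence

# Hypothetical constants reflecting the stipulated 70/30 split.
WRITTEN_WEIGHT = 0.70
PHYSICAL_WEIGHT = 0.30


def total_score(scored_written: Sequence[float], physical: float) -> float:
    """Return an applicant's total score as a weighted sum.

    The scored written subtests are weighted equally (their mean), and
    together they carry 70% of the total; the physical test carries 30%.
    Assumes all inputs share a common 0-100 scale, which the opinion
    does not specify.
    """
    if not scored_written:
        raise ValueError("at least one scored written subtest is required")
    written_component = mean(scored_written)
    return WRITTEN_WEIGHT * written_component + PHYSICAL_WEIGHT * physical


# 1980: three scored written subtests (mechanical reasoning, two profiles).
print(total_score([80, 90, 100], 60))
# 1984: two scored written subtests (reading comprehension, mechanical reasoning).
print(total_score([70, 90], 50))
```

Because the written subtests are averaged before weighting, adding or dropping a scored subtest (as happened between 1980 and 1984) leaves the 70/30 balance between the written and physical components unchanged.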

In both years, applicants were ranked in order of their total score on separate eligibility lists for white and black applicants. Stip. #23, 36. From time to time, applicants were taken from the lists in order of their rank to be certified to the Columbus Director of Public Safety for consideration for appointment as firefighters. Before being so certified, however, in both 1980 and 1984, applicants were required to pass a ladder test, which involved climbing a ladder to a height of five stories and descending, and a bicycle ergometer test, which measured heart rate in response to physical stress. Stip. # 24-26, 37. In addition, applicants were required to pass a medical examination and a background check, and to undergo an interview with a board comprised of members of the Division of Fire. Stip. #27, 37. Applicants who met these requirements were then appointed as firefighters, as necessary, in the order of the ranking upon the dual lists. Stip. # 28, 38. During the life of the 1980 lists, a total of 109 applicants were appointed as firefighters, four of whom were female. Stip. # 29. One hundred and twenty-six appointments, including two females, were made from the 1984 list. Stip. # 39.
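The appointment procedure described above (two rank-ordered lists, post-ranking screens, and one-for-one hiring under the Dozier decree) can be sketched as an alternating draw. This is a simplified illustration, not the City's actual procedure: the function name, the `qualified` predicate standing in for the ladder, ergometer, medical, background and interview screens, the choice of which list supplies the first appointee, and the handling of an exhausted list are all hypothetical assumptions.

```python
from collections import deque
from typing import Callable, List, Optional


def one_for_one_appointments(
    first_list: List[str],
    second_list: List[str],
    openings: int,
    qualified: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Fill `openings` by alternating between two rank-ordered eligibility lists.

    Each list must already be sorted best-first by total score. `qualified`
    is a hypothetical stand-in for the post-ranking screens (ladder test,
    ergometer test, medical examination, background check, interview);
    by default every applicant is treated as passing them.
    """
    passes = qualified if qualified is not None else (lambda _applicant: True)
    # Remove applicants who fail the screens while preserving rank order.
    queues = [
        deque(a for a in first_list if passes(a)),
        deque(a for a in second_list if passes(a)),
    ]
    appointed: List[str] = []
    turn = 0  # Arbitrary assumption: the first list supplies the first appointee.
    while len(appointed) < openings and any(queues):
        if queues[turn]:
            appointed.append(queues[turn].popleft())
        # Alternate lists each step. If one list is exhausted, the loop simply
        # keeps drawing from the other (an assumption, not stated in the opinion).
        turn = 1 - turn
    return appointed


print(one_for_one_appointments(["B1", "B2"], ["W1", "W2", "W3"], openings=4))
```

The sketch makes visible why rank order matters: within each list, an applicant's position alone determines whether she is reached before the openings run out.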

Plaintiffs challenge two components of the firefighter examination, the physical test and the mechanical reasoning test, as discriminatory against female applicants. Plaintiffs contend that the lower scores earned by female applicants on these two components contributed substantially to lower total scores, with the result that fewer female applicants were ultimately selected. Further, they contend, these test components have not been shown by the City to reflect accurately the actual requirements of the job of firefighter.

In their amended complaint, plaintiffs set forth two legal theories. First, they contend that the tests employed by the City have an adverse impact upon female applicants and are not job related. First Amended Complaint, ¶ 6. This is a theory of prohibited disparate impact under Title VII. Second, plaintiffs contend that the discriminatory acts of the defendants are intentional and violate § 1983. Plaintiffs did not seriously pursue the claim of intentional discrimination at trial or in their post-trial memorandum. In Part I of this Opinion, the Court briefly states its reasons for concluding that plaintiffs have failed to produce sufficient evidence to justify a finding that the defendants engaged in intentional discrimination against women in connection with recruitment of firefighters.

This leaves plaintiffs’ adverse impact theory for consideration. In Albemarle Paper Co. v. Moody, 422 U.S. 405, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975), the Supreme Court described the burdens of the parties in such a disparate impact case as follows:

In Griggs v. Duke Power Co., 401 U.S. 424 [91 S.Ct. 849, 28 L.Ed.2d 158] (1971), this Court unanimously held that Title VII forbids the use of employment tests that are discriminatory in effect unless the employer meets “the burden of showing that any given requirement [has] ... a manifest relationship to the employment in question.” Id., at 432 [91 S.Ct. at 854]. This burden arises, of course, only after the complaining party or class has made out a prima facie case of discrimination, i.e., has shown that the tests in question select applicants for hire or promotion in a racial pattern significantly different from that of the pool of applicants. See McDonnell Douglas Corp. v. Green, 411 U.S. 792, 802 [93 S.Ct. 1817, 1824, 36 L.Ed.2d 668] (1973). If an employer does then meet the burden of proving that its tests are “job related,” it remains open to the complaining party to show that other tests or selection devices, without a similarly undesirable racial effect, would also serve the employer’s legitimate interest in “efficient and trustworthy workmanship.” Id., at 801 [93 S.Ct. at 1823].

Id., at 425, 95 S.Ct. at 2375; accord, Harless v. Duck, 619 F.2d 611, 616 n. 6 (6th Cir.), cert. denied, 449 U.S. 872, 101 S.Ct. 212, 66 L.Ed.2d 92 (1980). The burdens are identical in a case involving alleged discrimination on the basis of sex. Dothard v. Rawlinson, 433 U.S. 321, 329, 97 S.Ct. 2720, 2726, 53 L.Ed.2d 786 (1977).

The defendants have argued that the plaintiff class has failed to meet its initial burden of showing adverse impact from either the 1980 or 1984 examinations. Upon consideration of the evidence and the arguments of the parties, the Court concludes, in Part II of this Opinion, that the defendants’ arguments are partially meritorious. With respect to the 1980 examination, female applicants who had completed the testing process were selected at essentially the same rate as were similarly situated male applicants. In the judgment of the Court, this fact is fatal to any claim that the 1980 testing and selection process had an adverse impact upon female applicants. However, the Court further concludes that plaintiffs have carried their initial burden of showing that the 1984 testing and examination process had an adverse impact upon female applicants. As a result of these determinations, only plaintiffs’ Title VII claim regarding the 1984 examination remains for consideration.

As a result of plaintiffs’ demonstration of adverse impact in the 1984 firefighter examination, it becomes defendants’ burden to show that the tests reflect the actual requirements of the job. This burden is often expressed by saying that the defendants must demonstrate that the test is job-related or, equivalently, valid. Having considered carefully the testimony at trial, including the testimony of the parties’ respective expert witnesses, and having reviewed the documents submitted as exhibits, the Court concludes, in Part III of this Opinion, that the defendants have failed to demonstrate the job-relatedness of the 1984 physical examination. On the other hand, the Court further concludes that defendants have adequately justified the mechanical reasoning test, which has also been challenged by the plaintiffs.

In the Court’s Opinion, there are two difficulties with the 1984 physical examination. One problem stems from the fact that defendants employ the test scores to rank candidates for selection as firefighters. “Ranking is a valid, job-related selection technique only where the test scores vary directly with job performance.” Williams v. Vukovich, 720 F.2d 909, 924 (6th Cir.1983), citing Guardian’s Association of New York v. Civil Service Commission, 630 F.2d 79, 100 (2d Cir.1980). Many more persons apply for the position of firefighter than there are available places. In these circumstances, relatively small differences in scores can determine whether an individual is selected as a firefighter. If these relatively small differences in test scores reflect likely differences in job performance, then the test is valid, and there is no violation of Title VII. On the other hand, as the Court concludes is the case here, where these differences in scores have not been shown to reflect differences in likely job performance, selection of applicants in accord with such a test is impermissible under Title VII.

In 1975, a report prepared for the City by Battelle concerning hiring criteria for firefighters concluded that physical strength, endurance, agility and health were necessary to perform effectively as a firefighter. Jt. Ex. 24, at 13. The test administered by the City in 1984 is a reasonable test of physical strength in a number of respects that have been shown to reflect the actual physical demands of the job. It appears also to be a reasonable test of health; at least, no one has raised an issue concerning this aspect of the examination process. However, it is a poor test of endurance, and there is no attempt to test agility. The inevitable result of this narrowed focus upon strength is that relatively small differences in strength will tend to determine whether an individual is selected as a firefighter. There is no guarantee, however, that in selecting stronger individuals, individuals with greater endurance and agility are also being selected. Where a test is used to rank individuals for purposes of hiring, it is important that that test cover the range of abilities that are involved in performance of the job. The test administered in 1984 has failed on this count, and is, therefore, invalid when used to rank-order applicants for selection as firefighters.

Having concluded that the defendants have failed to show that the 1984 physical test is job-related, the Court then considers, in Part IV of this Opinion, the remedy to which the plaintiffs are entitled. In light of the absence of substantial evidence of intentional discrimination, the Court concludes that the remedy should be precisely tailored to eliminate the discrimination and restore any individuals to the position they would have occupied but for the discrimination. Accordingly, the Court will order the City to prepare a new physical examination for entry-level firefighters, and to demonstrate its job-relatedness. The City must make the initial decision whether to continue to use a scored physical exam for purposes of ranking, or whether to adopt a pass/fail approach. Whichever approach is adopted, the examination must be approved by the Court before it is administered. Further, before administration of the new examination, the Court will require the City to provide notice, in a form approved by the Court, of this new examination and the results of this decision to all females who had applied to take the 1984 firefighter examinations. After the new examination has been administered and the results of the examination are before it, the Court will consider retroactive relief and back pay. To the extent that women perform better on the new examination, the Court will presume that they would have so performed on the 1984 examination but for defendants’ discrimination. In this circumstance, the Court will fashion a remedy requiring defendants to set aside an appropriate number of places for female applicants in future firefighter classes, and determine the back pay to be awarded to these applicants. On the other hand, if women as a group perform only as well as, or more poorly than, they did on the 1984 examination, then no retroactive relief would be appropriate.

It is no part of this remedy that the City be required to select women as firefighters in any particular numbers or ratio. Indeed, under Title VII, the gender of an applicant should be irrelevant. As the Supreme Court has explained:

Nothing in the Act [i.e. Title VII] precludes the use of testing or measuring procedures; obviously they are useful. What Congress has forbidden is giving these devices and mechanisms controlling force unless they are demonstrably a reasonable measure of job performance. Congress has not commanded that the less qualified be preferred over the better qualified simply because of minority origins. Far from disparaging job qualifications as such, Congress has made such qualifications the controlling factor, so that race, religion, nationality, and sex become irrelevant.

Griggs v. Duke Power Co., 401 U.S. 424, 436, 91 S.Ct. 849, 856, 28 L.Ed.2d 158 (1971). The issue before this Court is not whether women should be firefighters, or how many women should be firefighters. Rather, the issue is whether the test used by the defendants to select firefighters complies with Title VII. When the defendants administer a valid, job-related examination, that examination will determine how many women are to become firefighters.

I.

Plaintiffs have alleged in their amended complaint that the defendants engaged in intentional discrimination by employing the physical and mechanical reasoning tests to select firefighters and have also addressed this matter in a perfunctory manner in their post-trial memorandum. Plaintiffs contend that intent to discriminate can be inferred from the following evidence. First, prior to 1975, job announcements for the position of firefighter were restricted to males. Tr. 25. Second, only five of 832 firefighters are women. Tr. 203. Third, plaintiffs have presented evidence about bias against women on the part of the Director of the Training Academy. Tr. 198-202; 819-822. It appears that this led to his removal as head of the Training Academy. Tr. 821. Finally, plaintiffs argue that the defendants, at various times, were aware of less discriminatory testing methods than those they were employing, but refused to adopt them.

However, there is substantial evidence in the record showing that the City made efforts to encourage women to apply as firefighters and to complete the selection process. Marie Hardin, Equal Employment Opportunity Administrator for the City, testified at length about her efforts to recruit females to participate in both the 1980 and 1984 selection processes. Tr. 810-819. These efforts included maintaining contact with female applicants after their appointment. Tr. 818. Further, although the Court heard testimony from two incumbent female firefighters, Francisca Figueroa and Yolanda Stewart, no evidence of discriminatory treatment was offered by these witnesses. Tr. 156-184; 770-805. In addition, there appears to be no discrimination against women in the administration of the physical examination, as plaintiff Shearrow admitted in her testimony. Tr. 192. Plaintiff Brunet testified that she was permitted to practice the physical examination before taking it and received hints and assistance from firefighters during those practice sessions. Tr. 224-225.

In light of the evidence before it, the Court cannot draw the inference of intentional discrimination suggested by the plaintiffs. Plaintiffs’ evidence of intent to discriminate is at best impressionistic. Further, there is substantial evidence suggesting the absence of discrimination. Accordingly, judgment must be rendered for the defendants on plaintiffs’ claim of intentional discrimination under § 1983.

II.

In this section of this Opinion, the Court considers whether plaintiffs have met their initial burden of showing that the examinations administered in 1980 and 1984 had an adverse impact upon the class of women they represent. Having considered the evidence before it and the arguments of the parties, the Court concludes that plaintiffs have failed to show adverse impact in the case of the 1980 examination, but have shown adverse impact in the case of the 1984 examination. Because they present separate questions, each examination will be discussed separately.

Prior to trial, defendants filed a motion for partial summary judgment, arguing that plaintiffs had failed to carry their initial burden of showing that the 1980 firefighter’s examination had an adverse impact upon women. This motion was not ruled upon prior to trial. At trial, defendants renewed their contention at the close of plaintiffs’ evidence, seeking dismissal of plaintiffs’ claims arising from the 1980 examination. The Court reserved ruling upon defendants’ motion and now renders its Opinion.

The facts pertinent to defendants’ motion are not in dispute; indeed, they have been stipulated by the parties. In 1980, the Columbus Municipal Civil Service Commission received applications from a total of 1,577 individuals, of whom 83 were females and 1,494 were males. Stip. #9. The Civil Service Commission required all applicants to meet certain minimal requirements, e.g., having completed tenth grade in school. These requirements eliminated eight male applicants and no female applicants. Stip. # 9. Accordingly, 83 female applicants and 1,486 male applicants were invited to the first stage of the 1980 testing process, the written test. Thirty-five female applicants and 387 male applicants failed to appear for the written test. Stip. # 10.

In 1980, the written test consisted of four subtests: a reading comprehension test, a mechanical aptitude test, and two psychological tests. The reading comprehension test was graded pass/fail, and applicants who failed were eliminated from further consideration. Three females and seventy-four males failed this test. Stip. #11. The remaining three tests were scored. All applicants who took the written test, including those who failed the reading comprehension subtest, were invited to the next stage, the physical capabilities test. Of the 48 females invited, 20 failed to appear; 303 of the 1,099 invited males failed to appear. Twenty-eight females completed the physical capabilities test; of these, twenty-five were placed on the 1980 eligibility list. Seven hundred ninety-six males completed the test, and 722 were placed on the eligibility list. Stip. # 13. A total of 109 applicants were appointed as firefighters from the 1980 eligibility lists: four were females and 105 were males. Stip. # 29. These appointments were made from dual lists for black and white applicants according to a process of one-for-one hiring mandated by this Court’s order in Dozier v. Chupka, 395 F.Supp. 836 (S.D.Oh.1975). Stip. # 28. All four female applicants were appointed from the black list.

Defendants argue that, taken as a whole, the 1980 testing process did not have an adverse impact upon women. Of the twenty-eight females who completed the testing process, four, or 14%, were ultimately hired. Of the 804 males who similarly completed the process, 105, or 13%, were hired. Thus, defendants assert, when the process is evaluated from the point of view of its ultimate result, there is no detrimental impact upon women. Defendants’ reliance upon hiring ratios among actual applicants appears reasonably grounded in the relevant case law. Berkman v. City of New York, 536 F.Supp. 177, 206 n. 19 (E.D.N.Y.1982), aff'd, 705 F.2d 584 (2nd Cir.1983).
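The “bottom line” comparison on which defendants rely reduces to two selection rates among applicants who completed the process. A minimal sketch of that arithmetic follows, using the counts as stated in the preceding paragraph (4 of 28 women, 105 of 804 men); the helper name `selection_rate` and the ratio printed at the end are illustrative only and are not a test adopted by the Court.

```python
from fractions import Fraction


def selection_rate(hired: int, completed: int) -> Fraction:
    """Share of applicants who completed the process and were hired."""
    if completed <= 0:
        raise ValueError("completed must be positive")
    return Fraction(hired, completed)


# Figures as stated in the opinion for the 1980 process.
female_rate = selection_rate(hired=4, completed=28)
male_rate = selection_rate(hired=105, completed=804)

# Ratio of the female rate to the male rate; a value above 1 means
# women were hired at a higher rate than men.
impact_ratio = female_rate / male_rate

print(f"female: {float(female_rate):.1%}")
print(f"male:   {float(male_rate):.1%}")
print(f"ratio:  {float(impact_ratio):.3f}")
```

Exact fractions avoid rounding until the end; the two rates round to the 14% and 13% reported above. The stipulated figures in Stip. # 13 give 796 male completers rather than 804; with 796 the male rate is about 13.2%, and the comparison comes out the same way.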

In response, plaintiffs argue that the Court should focus upon the components of the testing process, specifically the physical test, and evaluate the discriminatory impact, if any, of these components. Plaintiffs contend that this approach is compelled by the decision of the Supreme Court in Connecticut v. Teal, 457 U.S. 440, 102 S.Ct. 2525, 73 L.Ed.2d 130 (1982). In addition, plaintiffs offer statistics to show differences in the average scores of men and women on the 1980 firefighter examination. Plaintiffs argue that these differences in mean scores show adverse impact.

With respect to the issue of whether the 1980 firefighter’s test as a whole or its components is the appropriate unit of analysis, it is apparent to the Court that the central issue between the parties is the interpretation of Connecticut v. Teal, supra. In Teal, a state agency required that employees achieve a passing score on a written examination in order to be promoted to supervisor. The passing rate on the examination for black candidates was approximately 68% of that for white candidates. It was undisputed that the examination, by itself, had an adverse impact upon blacks. Id., at 442, n. 4, 102 S.Ct. at 2528, n. 4. However, the score upon the written examination was not the sole criterion for promotion. Rather, it was used to generate a list of eligible candidates. Selections from the list were made by considering past work performance, recommendations of candidates’ supervisors and seniority. The result of this selection process was that approximately 23% of the black candidates on the eligible list were promoted to supervisor, while only 13.5% of the white candidates were promoted. Id., at 444, 102 S.Ct. at 2529. Thus, the state argued (and this was the sole issue before the Supreme Court) that this “bottom line” result should be considered a complete defense to a race discrimination suit. Even though the state had argued that the bottom line result was a defense, the Court construed the issue as whether plaintiffs had made a prima facie case. Id., at n. 7, and p. 451, 102 S.Ct. n. 7, and p. 2532.

The Supreme Court rejected the “bottom line” approach urged by the state. The Court focused upon § 703(a)(2) of Title VII, which provides:

It shall be an unlawful employment practice for an employer to limit, segregate, or classify his employees or applicants for employment in any way which would deprive or tend to deprive any individual of employment opportunities or otherwise adversely affect his status as an employee, because of such individual’s race, color, religion, sex, or national origin.

42 U.S.C. § 2000e-2(a)(2). The Court reasoned that the statute speaks, not in terms of jobs and promotions, but rather “in terms of limitations and classifications that would deprive any individual of employment opportunities.” Id., at 448, 102 S.Ct. at 2531, emphasis in original. Thus, the Court concluded:

When an employer uses a non-job-related barrier in order to deny a minority or woman applicant employment or promotion, and that barrier has a significant adverse effect on minorities or women, then the applicant has been deprived of an employment opportunity “because of ... race, color, religion, sex, or national origin.” ... Relying on § 703(a)(2), Griggs explicitly focused on employment “practices, procedures, or tests,” 401 U.S. at 430 [91 S.Ct. at 853] that deny equal employment “opportunity,” id. at 431 [91 S.Ct. at 853] ... The examination given to respondents in this case surely constituted such a practice and created such a barrier.

Id., at 448-449, 102 S.Ct. at 2531.

Teal differs from the instant case in the respect that the challenged component of the selection process, the written examination, was graded pass/fail. Here, however, the challenged portions of the testing process were given a numerical score, which was used, along with other similar scores, to rank candidates on eligibility lists. Thus, the written examination in Teal constituted a “barrier” in the sense that it precluded candidates from further consideration. The challenged components of the testing process here, even though lower scores on these components may lessen a candidate’s overall chance of acceptance, do not preclude further consideration of that candidate. The question that the Court must decide is whether this difference amounts to a distinction.

For the following reasons, the Court concludes that Teal is distinguishable from the instant case and, therefore, rejects plaintiffs’ contention that the bottom line result does not negate adverse impact. In Teal, the actual holding of the Court is:

[R]espondents’ claim of disparate impact from the examination, a pass-fail barrier to employment opportunity, states a prima facie case of employment discrimination under § 703(a)(2), despite their employer’s nondiscriminatory “bottom line,” and that “bottom line” is no defense to this prima facie case under § 703(h).

Id., at 452, 102 S.Ct. at 2533. Thus, the holding is limited by its terms to a pass/fail barrier. Concededly, there is language in the opinion that sweeps more broadly. It does not appear that this language is essential to the reasoning of the majority opinion, however. The critical premise in the majority’s reasoning is that the pass/fail subtest eliminated individuals from further consideration.

In addressing the precise issue before the Court, Schlei and Grossman, in their widely cited text on employment discrimination, comment:

It seems probable that Teal’s rejection of the bottom line approach with respect to components that constitute a “pass/fail barrier” to further consideration in the selection process will not be applied to multicomponent selection processes where all candidates complete all components of the process before the selection is made. Although the majority did not specifically address this issue, the Second Circuit decision below, which was affirmed, specifically so held, and the four Justices in dissent so interpreted the majority opinion.

B. Schlei & P. Grossman, Employment Discrimination Law (2nd ed. 1983), at 1377-1378. In Teal, the Second Circuit had written:

Where all of the candidates participate in the entire selection process, and the overall results reveal no significant disparity of impact, scrutinizing individual questions or individual sub-tests would, indeed, “conflict[ ] with the dictates of common sense.”

Teal v. State of Connecticut, 645 F.2d 133, 138 (2nd Cir.1981), aff'd, 457 U.S. 440, 102 S.Ct. 2525, 73 L.Ed.2d 130 (1982), quoting Kirkland v. New York State Dept. of Correctional Services, 374 F.Supp. 1361, 1370 (S.D.N.Y.1974).

In Smith v. Troyan, 520 F.2d 492 (6th Cir.1975), cert. denied, 426 U.S. 934, 96 S.Ct. 2646, 49 L.Ed.2d 385 (1976), the Sixth Circuit held that, where the overall examination process had no disparate racial impact, it was error to require a defendant to prove that a component of the overall process was job-related, even though blacks fared less well on that sub-test. As in the instant case, the score on the challenged sub-test was added to scores on other subtests and used to rank eligible candidates. See the district court’s opinion, 363 F.Supp. 1131, 1134-1135, 1144-1145 (N.D.Oh.1973). In these circumstances, the Sixth Circuit concluded, the plaintiff had failed to demonstrate prima facie that the test was unlawfully discriminatory. 520 F.2d at 497. The Sixth Circuit observed:

Though general ability, or intelligence, tests have often been invalidated for their racially disproportionate impacts ... (cites omitted) ..., the disproportionate impacts have been in the hiring, rather than in the test results in and of themselves.

Id., at 497-498. Teal does not squarely overrule this result, which must, therefore, be considered to be controlling law in this Circuit.

The Uniform Guidelines on Employee Selection Procedures (“Guidelines”), 29 C.F.R. §§ 1607.1 et seq., also support the view that individual components of a testing procedure need not be justified by an employer where the entire testing procedure does not have an adverse impact. Where the total selection process does not have an adverse impact,

[t]he Federal enforcement agencies ... will not expect a user to evaluate the individual components for adverse impact, or to validate such individual components, and will not take enforcement action based upon adverse impact of any component of that process, including the separate parts of a multipart selection procedure ...

29 C.F.R. § 1607.4(C). Although not binding upon this Court, the Uniform Guidelines are entitled to substantial deference as the interpretation of the Act by the enforcing agency. Albemarle Paper Co. v. Moody, 422 U.S. 405, 431, 95 S.Ct. 2362, 2378, 45 L.Ed.2d 280 (1975).

Furthermore, there is some question whether Teal should be applied in a case, like the instant case, that is brought as a class action. In Coser v. Moore, 587 F.Supp. 572 (E.D.N.Y.1983), aff'd, 739 F.2d 746 (2nd Cir.1984), the district court construed Teal to be inapplicable in the case of a class action by women alleging system-wide discrimination on the basis of sex. The court interpreted Teal as involving a claim by individuals who failed a written examination with a proven adverse impact.

The error of the district court in Teal was to foreclose proven and unrebutted individual claims of discrimination by looking to an employer’s treatment of a group.

Id. 587 F.Supp. at 588, emphasis in original. The case before the Coser court involved an attempt by a class of women to prove sex discrimination in hiring and promotions on a university-wide basis. To prove their case, the plaintiff class presented evidence of under-utilization of women in specific departments and divisions of the university. In response, the university presented evidence of lack of discrimination in university-wide hiring. The plaintiffs argued that this evidence was no defense under Teal. The issue was thus analogous to the issue presented by the instant case.

The court rejected this reliance on Teal, reasoning as follows:

Unlike the individual plaintiffs in Teal, plaintiffs here are a class of women seeking to prove by disparate impact analysis that Stony Brook has a pattern and practice of discrimination against women. If successful, that finding would then enable individual plaintiffs to rely on an inference of discrimination when they seek to prove their individual claims ...
[T]he issue is whether Stony Brook’s neutral criteria have an adverse impact upon a group, and upholding Stony Brook’s defense against plaintiffs’ class action claims would not foreclose valid individual claims of discrimination, as the “bottom line” defense did in Teal.

Id. at 588, emphasis in original. In the instant case as in Coser, plaintiffs are asserting a group claim. It follows that their proof of adverse impact necessarily depends upon the fortunes of the group. Thus, Teal’s focus upon the individual appears misplaced in the context of the instant case.

Alternatively, plaintiffs propose to demonstrate adverse impact of the entire examination and selection process by focusing upon the differences in average scores of men and women on the 1980 firefighter exam. According to calculations made by plaintiffs’ expert Dr. Joseph Cranny, the average (or mean) total score of females on the 1980 examination was 80.13, while the average score for males was 85.21. Jt.Ex. 6. On the physical agility test alone, females averaged 36.00 while males averaged 49.98. Cranny also calculated a correlation of .36 between the score on the physical test and the overall test score. This showed, in his words, that there is a “slight tendency” for people who do well on the physical test to do well on the total test. Cranny Depo. of Dec. 21, 1984, at 27; Tr. 260. Finally, Cranny calculated that the statistical likelihood of these differences in scores arising by chance was extremely small.

Plaintiffs argue that the differences in average scores mean that women have less chance of being selected as firefighters than men. This lessened opportunity arises because candidates are selected in order of their scores upon the tests. Furthermore, a significant part of these differences in scores arises from the physical tests challenged in this litigation. Accordingly, plaintiffs conclude that women have been denied an equal opportunity to be considered for the position of firefighter.

Upon consideration, the Court declines to draw the inference of denial of equal opportunity from the differences in average scores. It is surely relevant to note that the actual result of the 1980 selection process was that women were hired at a slightly higher rate than men. It is difficult to ascribe any meaning to the notion of denial of equal opportunity when it is considered in light of this fact. Title VII does not require employers to equalize the probabilities of hiring of the average members of two groups. Rather, it requires that actual individuals enjoy opportunities for employment free from discriminatory barriers.

The reliance upon differences in mean scores is misplaced for an additional reason. There are far more applicants than there are available jobs in the Columbus Fire Department. Consequently, only the applicants earning the highest scores have any realistic chance of being hired. Thus, it is the impact of the examination upon the highest scorers, not the average impact, that is significant. See United States v. City of Chicago, 549 F.2d 415, 429 (7th Cir.), cert. denied, 434 U.S. 875, 98 S.Ct. 225, 54 L.Ed.2d 155 (1977). Plaintiffs’ statistical expert admitted at trial that there could be significant differences in average scores for men and women on a test and yet essentially the same selection ratios, because all selections would occur from only a small region of the distributions. Tr. 376-377.

Plaintiffs contend that the use of mean-difference analysis to show adverse impact was approved by Judge Duncan in Police Officers for Equal Rights v. City of Columbus, 644 F.Supp. 393 (S.D.Ohio 1985). One issue in that case was whether the sergeants promotional examinations administered by the Columbus Police Department had an adverse impact upon black police officers. Dr. Joseph Cranny appeared as an expert witness for the plaintiffs. He sought to show adverse impact by three methods: examination of selection ratios under the four-fifths rule of the Guidelines, mean-difference analysis, and analysis of pass/fail ratios. The Court concluded that plaintiffs had proven adverse impact under the four-fifths rule. Id. at 88. The Court also noted that plaintiffs had shown a difference in mean scores. Id. at 89. Thus, the case cannot properly be relied upon to support the contention that mean-difference analysis alone can prove adverse impact. The same is true of Walls v. Mississippi State Dept. of Public Welfare, 542 F.Supp. 281, 293 (N.D.Miss.1982), aff'd in relevant part, 730 F.2d 306 (5th Cir.1984) and Thomas v. City of Evanston, 610 F.Supp. 422, 427 (N.D.Ill.1985), both of which are also cited by plaintiffs.

One case that does support the plaintiffs’ reliance on differences in average scores is Burney v. City of Pawtucket, 559 F.Supp. 1089 (D.R.I.1983). One issue in the case was whether physical agility requirements of a police academy had an adverse impact upon women. In order to graduate from the police academy, a recruit was required to score at least a “C” in each course, including a physical test. The score in the physical test was based equally upon performance upon certain physical tests and the subjective estimate by instructors of the recruit’s achievement and attitude. Id., at 1095-1096. Women earned lower scores on the test than did men. The defendants argued, however, that, notwithstanding their lower scores on the physical tests, all of the women who had entered the academy had graduated. Further, their scores on the physical test did not prevent women from graduating at or near the top of their classes. Id., at 1099. The Court rejected these arguments, citing Teal for the proposition that such “bottom-line” arguments were no defense.

This Court is unpersuaded by the reasoning of the Burney court. The plaintiff in Burney had been dismissed from the police academy for accumulating excessive demerits in the physical training program. Id., at 1100. Thus, as to her, adverse impact was established by the fact of her dismissal. The average scores of women on the physical tests are irrelevant to this. More generally, the Burney court, perhaps because it was faced with a case involving an individual claim, appears to have confused the theories of disparate impact and disparate treatment. In any event, it appears that Burney is out of line with the great weight of authority.

Plaintiffs also assert that hiring ratios are unreliable in the instant case due to the existence of dual hiring lists for black and white firefighters. The four women hired in 1980 were all selected from the black list. The list of black candidates was substantially shorter than the white list, and thus the process of one-for-one hiring led to hiring from further down the black list. Had there been only one list in 1980, plaintiffs contend, no women would have been hired. Thus, but for the dual lists, no women would have been hired from the 1980 lists.

This argument is beside the point, even though it may well be factually correct. It is beside the point because the narrow issue presently before the Court is whether plaintiffs, as representatives of a class, have proven adverse impact by the 1980 firefighter examination. This is plaintiffs’ initial burden, and it must be carried before defendants are required to justify the examination by showing that it is job-related. Whether plaintiffs would more easily have been able to carry their burden had things been different in 1980 is irrelevant. If the class of female applicants in 1980 was not adversely affected by the firefighter examination, then defendants are not liable for their acts connected with the 1980 exam and plaintiffs are not entitled to a remedy with respect to that exam. The Court must decide a case such as the instant one upon the facts before it, not upon theoretical possibilities. See Schlei and Grossman, supra, at 102 n. 94.

In his testimony at trial, plaintiffs’ expert witness suggested that the equivalence of the hiring ratios in 1980 for male and female applicants was a “complete statistical artifact.” Tr. 282. This artifact arose because selection ratios for both males and females derive from large numbers of applicants and small numbers of appointments. Tr. 281-282. Even if this is correct, it is of no consequence for this case. It is the plaintiffs’ burden to prove adverse impact, not the burden of the defendants to prove absence of adverse impact.

The Court also declines to assign any significance to the fact that 20 of 48 — or 42% — female applicants failed to appear for the physical exam, while only 303 of 1099 males — or about 28% — failed to appear. It is true that courts must be mindful of the possibility of deterrence of applicants before relying upon data regarding actual applicants. Dothard v. Rawlinson, 433 U.S. 321, 330, 97 S.Ct. 2720, 2727, 53 L.Ed.2d 786 (1977). While these numbers might suggest that some female applicants were deterred from appearing for the physical examination, compare Tr. 286-287 and Jt.Ex. 8, this suggestion is not supported by the evidence produced at trial. At trial, Dr. Gerald Barrett testified that he had made an informal survey of fire testing dropouts in the City of Akron. He found that women and blacks tend to drop out of the testing process at a higher rate than white males. He attributed this to a variety of factors, including change of career orientation and increased knowledge about the job of firefighter. Tr. 663-664. This testimony was corroborated by the testimony of Marie Hardin based upon her experiences in Columbus, Tr. 817, as well as the statement of named plaintiff Hornung that she is no longer interested in becoming a firefighter. Tr. 188.

For these reasons, the Court concludes that plaintiffs have failed to prove adverse impact from the 1980 firefighter’s examination. Accordingly, defendants’ motion to dismiss plaintiffs’ Title VII claims regarding the 1980 examination must be GRANTED. Fed.R.Civ.P. 41(b).

As in the case of the 1980 examination, the facts relevant to adverse impact in the 1984 examination have largely been stipulated. In 1984, a total of 2,886 males and 354 females appeared for the written test. Stip. # 32. Four hundred and fifteen males and fifty-two females failed the written test. Consequently, 2,471 males and 302 females were invited to take the physical test. Stip. #33. Of those invited, 1,343 males and 83 females appeared and completed the physical test. Stip. #35. Two females and 124 males have been selected from the 1984 eligibility lists; no further selections from the 1984 list are anticipated. Stip. #39.

In 1984, the selection ratio for women was two out of 83, or 2%; for men, it was 124 out of 1,343, or 9%. The Guidelines have suggested as a rule of thumb that if the selection ratio of the protected group is less than 80% of the selection ratio of the non-protected group, there is likely to be adverse impact in the selection process. 29 C.F.R. § 1607.4(D). Here, the selection rate for female applicants is only about 22% that for male applicants. Further, Dr. Cranny testified at trial that he had performed a chi-square analysis upon these selection ratios, to determine the probability that these observed differences in selection ratios arose by chance. He testified that, using a one-tailed test, the observed difference was significant at the .05 level, that is, that there is less than one chance in twenty that it was the mere result of chance. Tr. 285. Dr. Cranny admitted that the chi-square test was not significant if a two-tailed test was employed. Id. Although defendants question this use of a one-tailed test, the Court concludes that it is appropriate where, as here, the raw numbers indicate that women are selected at a lesser rate than men. In these circumstances, the question being asked is whether this apparent difference is real or a statistical artifact. This question is appropriately answered by a one-tailed test. There is no indication in this record that in reality women are being selected at a higher rate than men in 1984.
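The adverse-impact arithmetic in the preceding paragraph can be reproduced from the stipulated 1984 counts. The following Python sketch is illustrative only: the helper name `chi_square_2x2` is hypothetical, and the use of the Yates continuity correction is an assumption, since the record does not say which form of the chi-square test Dr. Cranny used. Computed from the unrounded rates (about 2.4% and 9.2%), the ratio of selection rates is about 26%; a figure of about 22% follows from dividing the rounded 2% by the rounded 9%. Either way, it falls well below the four-fifths threshold.

```python
import math

# Stipulated 1984 figures: applicants who completed the physical test
# (Stip. #35) and applicants ultimately selected (Stip. #39).
women_selected, women_total = 2, 83
men_selected, men_total = 124, 1343

women_rate = women_selected / women_total  # about 0.024
men_rate = men_selected / men_total        # about 0.092
impact_ratio = women_rate / men_rate       # about 0.26

# Four-fifths rule of thumb, 29 C.F.R. § 1607.4(D).
fails_four_fifths = impact_ratio < 0.8


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square (1 degree of freedom) for the table [[a, b], [c, d]].

    Hypothetical helper; the Yates correction is an assumption, not a
    detail taken from the trial record.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: women, men. Columns: selected, not selected.
chi2 = chi_square_2x2(women_selected, women_total - women_selected,
                      men_selected, men_total - men_selected)

# With 1 degree of freedom, chi-square equals z squared,
# so P(chi-square > x) = erfc(sqrt(x / 2)).
p_two_tailed = math.erfc(math.sqrt(chi2 / 2))
# Halving is appropriate only because the observed difference
# lies in the direction being tested (women selected less often).
p_one_tailed = p_two_tailed / 2

print(f"impact ratio {impact_ratio:.2f}, chi-square {chi2:.2f}")
print(f"one-tailed p {p_one_tailed:.3f}, two-tailed p {p_two_tailed:.3f}")
```

Under the stated assumption, the sketch yields a chi-square of about 3.71, with a one-tailed p of about .027 and a two-tailed p of about .054. That pattern matches the testimony described above: significant one-tailed, not significant two-tailed. Without the correction the statistic is about 4.52, which would be significant even two-tailed, so the choice of variant matters at counts this small.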

The Court concludes from this showing of violation of the 80% rule and the chi-square analysis that there was adverse impact upon women in the 1984 firefighter examination taken as a whole. This conclusion is corroborated by plaintiffs’ evidence regarding differences in mean scores of men and women upon the exam. Tr. 266-271; Jt.Ex. 7. As explained above, differences in mean scores may properly be relied upon to corroborate a showing of adverse impact by the 80% rule or chi-square analysis. Police Officers for Equal Rights v. City of Columbus, supra, at 88-89. Thus, plaintiffs have met their initial burden with respect to the 1984 firefighter examination.

III.

Because plaintiffs have shown adverse impact upon women in the 1984 examination, it becomes the defendants’ burden to show that the test has a “manifest relationship to the employment in question.” Griggs v. Duke Power Co., 401 U.S. 424, 432, 91 S.Ct. 849, 854, 28 L.Ed.2d 158 (1971). In making this showing, “[t]he touchstone is business necessity.” Id., at 431, 91 S.Ct. at 853. The standard of proof of job-relatedness has been stated by the Supreme Court as follows:

[D]iscriminatory tests are impermissible unless shown, by professionally acceptable methods, to be “predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated.” 29 CFR § 1607.4(c).

Albemarle Paper Co. v. Moody, 422 U.S. 405, 431, 95 S.Ct. 2362, 2378, 45 L.Ed.2d 280 (1975). In this section of this Opinion, the Court makes its findings of fact and conclusions of law in support of its determination that defendants have failed to carry their burden.

The 1984 Examination

As has been noted, the 1984 firefighter examination consisted of a written test and a physical test. The written test had two components, a reading comprehension test and a mechanical reasoning test. Plaintiffs’ expert, Dr. Cranny, performed a statistical analysis of the scores of men and women on the 1984 examination and its various components. This analysis is not challenged by the defendants. On the total examination, men, as a group, achieved an average score of 78.5, while the average score of women, as a group, was 65.0. Jt.Ex. 7, at 5. Women’s scores ranged from about 46 to about 82.4, while men’s scores ranged from 0 to about 93. Jt.Ex. 7, at 5; Tr. 267-269. These total scores were the result of a series of statistical manipulations to standardize the raw scores on the various component tests. Stip. # 36. No issue has been raised regarding the propriety of these statistical manipulations.

The highest score earned by a woman on the 1984 examination was 82.4. Three hundred and fifty-five white males and twenty-one black males earned higher scores. A total of 126 individuals were ultimately hired as firefighters from the 1984 lists. Consequently, had there not been dual hiring lists mandated by Court order, no females would have been hired as firefighters. In fact, two females were hired, both from the black eligibility list.

The differences in male and female total scores resulted primarily from lower female scores on two components of the total test: the physical test and the mechanical reasoning test. Plaintiffs’ challenge is directed to these two components. There was no significant difference between the sexes on the reading comprehension test. Jt.Ex. 7, at 5; Tr. 270. On the mechanical reasoning test, men earned an average score of 19.6, while women’s scores averaged 15.1. Jt.Ex. 7, at 5. The greatest disparity occurred on the physical test, where men averaged 76.1, while women averaged 44.5. A statistical analysis of test scores by defendants’ expert, Dr. Frank Landy, confirmed what is apparent from the raw numbers: the differences in male and female total scores are due primarily to the differences in scores upon the mechanical and physical tests. Jt.Ex. 11; Tr. 275-278.

The 1984 firefighter physical capability test consisted of six events. All six events had pass levels, and a candidate had to pass in order to be considered for hiring. However, the pass levels were set very low, and it appears that very few persons failed the physical exam. Five of the six events were scored. Jt.Ex. 52, 53. The events were administered to groups of applicants, at approximately ten-minute intervals.

1) Beam Walk: Applicants were required to walk the length of a twenty foot beam that was four inches wide while carrying a roll of hose. The event was pass/fail only, and was not timed. Three tries were allowed. In 1984, one male failed the beam walk. Stip. # 36. Because virtually everyone passed this event, it is of little consequence in this litigation.

2) Manual Dexterity: Applicants were required to screw three metal plugs into three threaded intakes on a piece of fire equipment, a multiversal, and then unscrew them. The event was timed, and a higher score was earned by completing the event more quickly. At a minimum, the event had to be completed in two minutes. In 1984, women scored about the same on this test event as did men. The average time of women was 26.8 seconds, while that of men was 25.5 seconds. Jt.Ex. 7, at 5; Tr. 272. This difference was not statistically significant. Tr. 272.

This event had been recommended for inclusion on the test by the Fire Training Academy staff, who had experienced problems with recruits who lacked manual dexterity. The suggestion of plaintiffs’ expert, Dr. John Magel, an exercise physiologist, that women might be disadvantaged on this event due to less experience with tools than men, Tr. 467-468, is contradicted by the essential equality of average scores. See also Tr. 735-736. This event is a direct simulation of a common firefighting task, as firefighter Yolanda Stewart testified. Tr. 779-780. Francisca Figueroa suggested that there could be a problem with failing to line up the threads properly if one tried to work too fast on the job. Tr. 164-165. This does not appear to be a serious problem, however.

3) Sandbag Drag or Carry: Applicants were required to carry or drag a sand dummy through a designated serpentine course defined by a line on the floor running around a number of poles. The dummy was the approximate size of a small duffle bag, with straps to grip, and weighed 125 pounds. The event was timed. Further, if the applicant chose to drag the dummy, or dropped it on any part of the course, the time was doubled. Also, if a pole was knocked over, a two-second penalty was imposed.

Men performed substantially better than women on this test event. The mean time for men was 19.5 seconds, while that of women was 38.2 seconds. Because the event was timed, the lower score is better. This difference is statistically significant. Jt.Ex. 7, at 5; Tr. 272-273.

The event was designed to test an applicant’s ability to drag or carry adults or children. Jt.Ex. 23. The event is an imperfect simulation. It appears from both expert and firefighter testimony that the weight of the bag is reasonable. Tr. 718-723; 784. However, the shape of the bag makes it awkward to carry, depriving individuals of the opportunity to use lifting techniques and leverage. Tr. 468-471; 786-787. An articulated dummy could readily have been used. Tr. 324. There is little sense to be made of doubling the score if the bag was dragged; the testimony at trial was that victims are typically dragged from a building, due to the presence of smoke. Tr. 69; 169; 827-828. No rationale appears for placing a premium on extreme speed; the testimony at trial indicates that the speed necessary depends on the circumstances. Tr. 108-169; 827. This event measures primarily upper body strength and anaerobic capacity.

4) Pike Pole Pull: The applicant pulls on a handle attached to a rope which runs through pulleys and is attached to a 75 pound weight. A repetition consists of pulling down the handle until it strikes a stand, and then returning the handle back to its original position; this, of course, involves lifting and lowering the 75 pound weight the distance of travel of the handle. The entire event lasts one minute. To pass, five repetitions must be completed in that time. The event is also scored: the more repetitions completed, the higher the score.

The scores of males were substantially better than those of females. On the average, men accomplished 58.2 repetitions, while women performed 38.9 repetitions. Jt.Ex. 7, at 5. Unlike the other timed events, here the higher raw score is better. Tr. 273-274.

The pike pole pull is a rough simulation of the actual use of pike poles, rods with a hook on the end that are used to rip out walls and ceilings to search for fires. From the testimony at trial, the Court concludes that the seventy-five pound weight reasonably reflects the physical demands of the job. Jt.Ex. 21; Tr. 708-717. However, the simulation of the job is questionable, in a number of respects. Actual use of a pike pole involves both push and pull phases; the event tests only the pull phase. No rationale appears for the requirement of hitting the stand with the pole; this appears to be merely a device for score-keeping without any analogue in the actual use of the pike pole. It also appears that shorter persons — women tend to be shorter than men — were slightly disadvantaged by the event, because they could use their entire body to less advantage. Tr. 475; 715-716. This bias could have been eliminated by making the apparatus adjustable. And, the test appears to over-emphasize speed as compared with actual practice. The experts agree that the event measures upper body strength and the anaerobic capacity of the upper body. Jt.Ex. 20, at 3-4; Tr. 474.

5) Equipment Hoist: The applicant pulls a rope that runs over a roller to lift a sixty-five pound weight to a third-story window. The event is timed, and the more quickly the weight is raised, the better the score. After it is raised, the weight must be gently lowered to the ground. The lowering is not timed; however, if the weight is dropped, a penalty is assessed. In this timed event, the mean score for men was 10.7 seconds; the mean score for women was 26.9 seconds. Jt.Ex. 7, at 5.

The event was designed to simulate raising ladders and hose by means of a hose roller, an actual piece of fire equipment. Jt.Ex. 28. It does not appear, however, that hoisting is done very often in actual firefighting, mainly because the roller takes too long to set up. Tr. 160-161, 175-176, 778. There is no indication why the weight of sixty-five pounds was chosen. Taller persons appear to enjoy a slight advantage in the event. Tr. 729. The experts agree that the test measures primarily muscular strength and anaerobic capacity of the arms. Jt.Ex. 20, at 4; Tr. 476-477.

6) Stairway Climb: The applicant was required to climb six flights of stairs and descend as rapidly as possible while wearing fire gear and carrying equipment, a roll of hose. The fire gear and equipment weighed about forty-seven pounds. The event was timed, and the score depended upon how quickly the event could be completed.

The mean score for men was 65.7 seconds, while that of women was 102.2 seconds. Jt.Ex. 7, at 5. The standard deviation of the men’s scores was 12.6; this means that approximately two-thirds of the male applicants in 1984 completed the event in a time ranging between 53.1 seconds and 78.3 seconds. The standard deviation of the women’s scores was 22.6, so that the comparable range was 79.6 to 124.8 seconds.
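The one-standard-deviation ranges in the preceding paragraph can be checked with a few lines of Python. This is a sketch only; the helper name `one_sd_band` is hypothetical. The statement that about two-thirds of applicants fall within one standard deviation of the mean assumes that the stairway-climb times are roughly normally distributed, in which case about 68% of values lie in that range; the record does not show the shape of the distributions.

```python
import math

# Stairway climb, 1984 (Jt.Ex. 7, at 5): (mean, standard deviation) in seconds.
stair_stats = {"men": (65.7, 12.6), "women": (102.2, 22.6)}


def one_sd_band(mean, sd):
    """Hypothetical helper: the interval from one SD below to one SD above the mean."""
    return (round(mean - sd, 1), round(mean + sd, 1))


bands = {group: one_sd_band(*stats) for group, stats in stair_stats.items()}

# Share of a normal distribution lying within one SD of its mean: erf(1/sqrt(2)).
# The "approximately two-thirds" reading holds only under this normality assumption.
normal_share = math.erf(1 / math.sqrt(2))

print(bands)
print(f"{normal_share:.3f}")
```

Note that the two ranges barely touch (78.3 and 79.6 seconds), which is another way of seeing how far apart the two groups' typical times were on this event.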

Firefighters must frequently climb stairs, though it appears that in tall buildings they use elevators when possible. Sometimes firefighters must climb six or more stories. Tr. 763; 788-789. When equipment must be carried up many stories, it is shuttled up two or three flights of stairs at a time in a relay; this operation, called staging, is more efficient. Tr. 887. Firefighters infrequently run up stairs, both for safety reasons and to marshal their energy to perform when they arrive at the fire. Tr. 162-164; 789; 843-844. This test measures anaerobic power to sprint; performance does not depend primarily upon cardiovascular endurance or aerobic capacity. Tr. 505-510, 512; 547; 766-767.

Three events on the test — the sandbag drag or carry, the pike pole pull, and the equipment hoist — measure primarily upper body strength. Further, the test tends to measure anaerobic capacity of the various muscles used. No event measures primarily aerobic capacity. The ten minute resting period between events contributed to the overall anaerobic character of the test. This observation that the various test events tend to measure similar physical abilities is confirmed by the statistical analysis performed by defendants’ expert, Dr. Landy. This shows quite substantial statistical correlations among scores on the various events. Jt.Ex. 11.

Two of these events — the beam walk and manual dexterity test — had no significant impact upon the relative scores of men and women. Thus, the issues in this case turn upon the job-relatedness of four test events: the sandbag drag or carry, pike pole pull, equipment hoist, and stairway climb. These events were timed, with the exception of the pike pole pull, where the number of repetitions determined the score. In all cases, speed was of the essence. On the three timed events, the average time of women was roughly twice that of men. In the pike pole pull, men completed approximately 50% more repetitions, on the average. These differences determined the differences in total score between men and women upon the physical portion of the examination. And, to a substantial extent, they determined the differences between men and women in total score upon the written and physical examinations. Accordingly, the fairness of the 1984 physical test stands or falls upon the validity of these four test events.
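The comparison in the preceding paragraph can be made concrete with the mean results reported for each contested event. In this Python sketch the variable names are illustrative. The ratios suggest that "roughly twice" is an overall characterization: the ratio of women's to men's mean times ranged from about 1.6 (stairway climb) to about 2.5 (equipment hoist), while men completed about 50% more pike pole repetitions.

```python
# Mean 1984 results (Jt.Ex. 7, at 5) for the contested events, as (men, women).
timed_events = {  # seconds; a lower time is better
    "sandbag drag or carry": (19.5, 38.2),
    "equipment hoist": (10.7, 26.9),
    "stairway climb": (65.7, 102.2),
}
pike_pole_reps = (58.2, 38.9)  # repetitions in one minute; more is better

# Women's mean time expressed as a multiple of men's mean time.
time_ratios = {event: round(women / men, 2)
               for event, (men, women) in timed_events.items()}

# Percentage by which men's mean repetitions exceeded women's.
men_pct_more_reps = round((pike_pole_reps[0] / pike_pole_reps[1] - 1) * 100)

for event, ratio in time_ratios.items():
    print(f"{event}: women's mean time is {ratio}x men's")
print(f"pike pole pull: men completed {men_pct_more_reps}% more repetitions")
```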

There is a very scanty record regarding the other component of the 1984 firefighter examination under challenge here, the mechanical reasoning test. In 1984, it consisted of thirty written questions, and was scored by adding the number of questions answered correctly. Stip. # 31. It constituted 35% of an applicant’s total score. Id. There is no indication in the record why this 35% weighting was selected. Nor is there any indication of the content of the mechanical reasoning test, other than that conveyed by its name.

As noted above, women scored less well than men on the mechanical test, averaging 15.1 as against 19.6, respectively. Jt.Ex. 7, at 5. The range of men’s scores from 9 to 29 was somewhat higher than women’s score range, which was from 8 to 23. Scores upon the mechanical reasoning test were highly correlated with total test scores; the correlation coefficient was .85. Jt.Ex. 11.

Test Development

An event that looms large over the test development process is this Court’s judgment and decree in Dozier v. Chupka, 395 F.Supp. 836 (S.D.Oh.1975). In Dozier, this Court concluded that the division of fire had employed standards and criteria for the selection of firefighters that had a racially discriminatory impact upon members of the plaintiff class, black male applicants for the position of firefighter. In 1973, the fire department had administered a written aptitude test to applicants. The Court considered the validation attempts undertaken by the defendants and concluded that the written examination had not been validated. 395 F.Supp. at 854. On April 16, 1975, the Court entered a remedial decree enjoining the defendants from further discrimination on the basis of race. Further, the Court ordered the defendants to develop criteria for selection of firefighters and to validate these criteria in compliance with the Equal Employment Opportunity Guidelines on Testing as set forth in 29 C.F.R. §§ 1607.1 et seq. Id. at 859-860.

In 1973, prior to the entry of the Dozier decree, a two-step selection process had been used by the fire department. First, all applicants took a written examination; to be considered further, an applicant had to pass that examination. The next step was a physical agility test, which also was initially graded pass/fail. Applicants who passed both tests were then ranked on an eligibility list, their relative position being determined by adding their two test scores and certain bonus points, if any, for military service. Then, a background investigation was conducted. 395 F.Supp. at 840-841. Candidates who were not removed from the eligibility list on the basis of the background investigation were appointed on the basis of total scores. Id. at 844; Jt.Ex. 16, at 1-4. With some modifications, this same general approach to firefighter selection, i.e., ranking applicants according to scores on written and physical exams, was used in the 1980 and 1984 firefighter selection process.

At the time the remedial decree was entered in Dozier, the City had hired Battelle to conduct an analysis of the job of firefighter. The City had informed the Court at the time of the Dozier decree of its intention of doing so, and the Court noted this fact. 395 F.Supp. at 859. Battelle submitted a document entitled “Final Report of Hiring Selection Criteria for the Entry-Level Firefighter” to the Civil Service Commission on June 20, 1975. Jt.Ex. 24. The report proposed hiring selection criteria pertaining to physical abilities, sensory abilities, communication skills, reasoning and judgment skills, and personal and interpersonal characteristics. Id., at 15-22. The report included an assessment of the physical demands of the job. This involved weighing equipment, determining hose recoil pressures, and measuring the size of windows through which firefighters must sometimes crawl. The Battelle study did not propose tests for selecting firefighters; rather, it set forth criteria that tests should be developed to measure. The study concluded that strength, endurance and agility were the most important physical characteristics of firefighters. Id., at 9.

Testing of applicants for firefighter was conducted by the Civil Service Commission in 1975 and 1978. Stip. # 1. These were the first tests that were open to female applicants. Prior to 1975 the job announcement for firefighters restricted applicants to males. Tr. 25. In both 1975 and 1978, the Civil Service Commission used a written reading test, a physical agility test, and a battery of tests selected by Dr. Gerald Barrett to determine mechanical comprehension, math ability and certain personality characteristics. Jt.Ex. 16, at 1-7. The physical agility examinations in 1975 and 1978 were graded pass/fail. Tr. 26; Jt.Ex. 48. Candidates were chosen from the dual hiring lists ordered in Dozier v. Chupka in order of their written test scores. Stip. 4-5. In 1978 two females were appointed as firefighters, the first females to be so appointed. Stip. # 7.

At trial, the Court received into evidence certain documents pertaining to the 1975 firefighter examination. The 1975 test consisted of the following events: bent-knee sit-ups; ladder climb; driver capability (to determine if person is of a size to drive a fire truck); weight lift and twist; stairway climb; ladder raise; push-ups; beam walk with hose; and dummy dodge run. Jt.Ex. 48. It does not appear that the job analysis performed by Battelle played any role in developing these test events. The Battelle study had recommended that the physical test should duplicate actual physical activities performed by firefighters, Jt.Ex. 24, at 23, which is not apparent in this test. Further, the test scoring procedures are dated June 1975 and appear to derive largely from recommendations made by the Bureau of Training of the Division of Fire in March of 1975. Jt.Ex. 47. The Battelle report was dated June 20, 1975. There is no evidence in the present record about the content of the 1978 examination.

In late 1978, the City began to develop a new physical test for firefighter. Tr. 28. The impetus for this development was the decision of Judge Duncan in Brandt v. City of Columbus, Case No. C-2-75-425 (S.D.Oh. Oct. 5, 1978), a class action alleging sex discrimination in the Columbus Police Department. In that decision, Judge Duncan concluded that the physical agility test used by the police department for selection of recruits failed to meet the validity standards set forth in the Uniform Guidelines, and, therefore, was unlawful under Title VII. Because of the similarities between the testing procedures struck down in Brandt and those used by the fire department, the City reexamined the test procedures. Tr. 29.

In November 1978, Dr. S. David Kriska, who is in charge of personnel testing for the Civil Service Commission, drafted a memorandum reviewing the adequacy of the current firefighter physical examination in light of Brandt. Jt.Ex. 22. Kriska examined the various test events in light of the Battelle job analysis; this appears, from the record, to be the first time that this was done. He recommended that three test events (the sandbag lift and carry, the push-up, and the sit-up) be eliminated for lack of job-relatedness. Also, Kriska proposed new events to test upper body strength and endurance, both of which were found relevant by the Battelle study. Plaintiffs’ Ex. 3, at 2; Tr. 40. Kriska also noted that setting passing scores was likely to be a problem, since there was likely to be an adverse impact upon women. Kriska proposed that a modified test should be given to a random sample of firefighters of various ages as well as a group of women likely to be representative of the probable applicant population to determine pass points. Id., at 4; Tr. 34-35. However, neither the 1980 nor the 1984 physical test has been administered to incumbent firefighters. Tr. 43-45. Indeed, it is unclear whether any of the physical tests from 1975 forward has ever been administered to incumbent firefighters.

In May 1979, Julia Ingram, an employee of the Civil Service Commission, and David Kriska issued a report proposing a physical test for firefighter. Jt.Ex. 49. In preparing the report, they had consulted with Dr. Edward Fox, an exercise physiologist and expert witness in Brandt. Id., at 2; Tr. 28. The proposed test included seven events: a beam walk with hose; ladder climb; ladder draw and carry; hose drag; blind hose follow (crawling in fire gear wearing an opaque face mask, following a hose through a predetermined course); stairway climb; and bicycling. Jt.Ex. 49, at 1-4. Each test event was to be graded pass/fail; failing any event would eliminate the candidate from further consideration. This 1979 proposal represents the culmination of efforts to rethink physical testing of firefighters in light of the Brandt decision.

However, before these 1979 suggestions were accepted, a critical change in thinking and approach occurred. In 1980, a job analysis was performed by Ingram for the Civil Service Commission. Jt.Ex. 18. Based upon this job analysis, Ingram and Kriska concluded that the work of firefighting was largely physical, and that better firefighters were distinguished by the ability to excel while performing physical tasks. Jt.Ex. 50, at 23. Consequently, they recommended to the Civil Service Commission that the physical capability test be made part of the ranking of job candidates. This recommendation was adopted by the Commission in May 1980. There is no indication in the record before the Court that possible greater adverse impact upon women from a scored physical exam was considered.

At approximately the same time, a new firefighter physical examination was proposed for administration in 1980. Development of the test and the Ingram job analysis occurred simultaneously; the job analysis did not precede test development. Tr. 57-59. With one change, the elimination of a furniture push event, the 1984 examination was identical to that administered in 1980. Three of the six events on the 1984 physical test, the beam walk, stairway climb, and sandbag carry, had previously been used in 1975 through 1978 on a pass/fail basis. The stairway climb and sandbag carry were now to be timed and scored. One event, the manual dexterity test, had been suggested by the Fire Training Academy. Jt.Ex. 50, at 1. The remaining two events, the pike pole pull and the equipment hoist, were new in 1980. They appear to have been developed as simulations of firefighting tasks; both are tests primarily of upper body strength.

In summary, the 1984 firefighter physical examination evolved from previous physical examinations with the addition of several events to test upper body strength. In three respects, test development departed from reasonable professional standards and practices. First, despite the emphasis upon endurance and agility in the Battelle study, events were not developed to test specifically for these abilities. Second, it is both striking and surprising that the various physical tests since 1975 were never administered to incumbent firefighters in any systematic way. This meant that, to a large extent, test development proceeded in a vacuum. Third, a major change in approach from a pass/fail to a scored physical examination occurred in 1980 without any apparent consideration being given to possible greater adverse impact upon women. This change was purportedly justified by the Ingram job analysis, which will be examined in detail in the next section of this Opinion. In broader perspective, the City had readopted the approach used in 1973 prior to Dozier and Brandt after having experimented briefly with alternative approaches in 1975 and 1978. It appears that Dozier and Brandt had caused little change of approach.

The record contains relatively little detail regarding the development of the mechanical reasoning test used in 1984. Dr. Gerald Barrett, an industrial psychologist, testified that he developed a written test for entry-level firefighters for the City of Akron in 1974 in connection with employment discrimination litigation. The test has been used since then under court supervision. Tr. 652-653. The City of Columbus adopted Barrett’s test for use in 1980. Jt.Ex. 16, at 1-7.

Job Analysis

The Uniform Guidelines require that any validity study, i.e., any demonstration of the job-relatedness of a selection procedure, should be based upon a job analysis, that is, “a review of information about the job for which the selection procedure is to be used.” 29 C.F.R. § 1607.14(A). The job analysis need not be conducted by any particular method, provided that it yields the information required for the specific validation strategy used. Id. In the instant case, defendants have, of necessity, relied primarily upon content validity studies. Regarding such studies, the Guidelines state:

There should be a job analysis which includes an analysis of the important work behavior(s) required for successful performance and their relative importance ... Any job analysis should focus on the work behavior(s) and the tasks associated with them. ... The work behavior(s) selected for measurement should be critical work behavior(s) and/or important work behavior(s) constituting most of the job.

29 C.F.R. § 1607.14(C)(2).

In addition to the Battelle study discussed supra, the City has conducted two analyses of the job of firefighter: the 1980 Ingram and Kriska job analysis (“Ingram report”), Jt.Ex. 18; and a report prepared specially for purposes of this litigation by Landy, Jacobs and Associates (“Landy study”), Jt.Ex. 17. There is no issue in this case regarding the adequacy of the job analyses. Plaintiffs’ expert Dr. Cranny testified that he saw no major problems with these two analyses. Tr. 289, 314-315. Based upon this testimony and its own examination of the relevant exhibits, the Court concludes that the City has complied with the requirements of the Guidelines. Accordingly, these job analyses will be summarized in this section of the Opinion only to the extent necessary to evaluate the validity studies performed by the City, which are the subject of the next section of this Opinion.

The research underlying the Ingram report was conducted in five stages. In the first stage, a comprehensive list of job tasks was compiled through questionnaires given to incumbent firefighters. At stage two, this list of job tasks was shown to a sample of firefighters who were asked to score the tasks according to frequency of occurrence, consequence of error and probability of error. Based upon these scorings by firefighters, a ranking of job tasks by “task value” was derived. The highest “value” tasks, which were ranked “5”, were those that occurred frequently, where error was likely, and, if an error occurred, it would likely have serious consequences. The lowest “value” tasks were scored one.

The Court has examined this aspect of the Ingram report in detail. It appears that firefighters rank as most important, and consequently scored “5”, job tasks involving judgment and safety procedures. Of the fourteen tasks ranked “4”, only a few, namely manipulating and working from ladders and immediate response, appear to be predominantly physical in character. None of these physical tasks was directly simulated in the scored part of the physical test in 1984. Most of the tasks ranked “4” appear to involve primarily judgment and safety.

In stage three of the study, Ingram presented firefighters with the task list she had formulated, with each task tentatively matched with knowledges, skills and abilities (“traits”) needed to perform the task. The firefighters were asked to add or delete traits necessary to perform the job. Eleven traits were added. The product of this stage of the research was a matching of tasks and traits. In his testimony at trial, Dr. Cranny questioned this attempt to determine the abilities required by various tasks. Tr. 290-291. Having examined the Ingram report, the Court is satisfied with the trustworthiness of these inferences. The abilities are described concretely, and in terms that appear comprehensible to ordinary persons. The Court is satisfied that incumbent firefighters can make reasonable judgments about the knowledges, skills and abilities they must use daily.

In stage four of the research, firefighters were asked to rank these “traits” on three bases: the extent to which the trait must be possessed by a firefighter to perform on a “barely acceptable” level; the extent to which the trait may distinguish a superior from an average firefighter; and the extent to which applicants may be expected to possess the trait. In order to gain some idea of the beliefs of incumbent firefighters about what traits are indications of superior firefighting ability, the Court compiled a list of all traits that received a score of 2.2 or higher on the three-point scale employed in the study. The cut-off of 2.2 was chosen arbitrarily to include a reasonable number of traits; 17 of 142 traits were scored 2.2 or above. Most of these traits are knowledges of various sorts. Other traits include abilities to function under adverse environmental conditions or to deal with stress. None of these traits appears to involve, in any direct fashion, physical abilities. The sole apparent exception is remaining oriented and functioning without sight.

However, when attention is turned to those traits that firefighters regard as necessary to a barely adequate job performance, a different picture emerges. These results are reported on a scale of “0” to “1”, with “1” representing “yes” and “0” representing “no.” Nineteen traits received scores of “1.000.” These abilities are predominantly, although not exclusively, physical in nature. Thus, the Ingram report tends to support the conclusion that, while physical abilities are highly important as minimal qualifications, they are not particularly good indicators of superior firefighting ability.

In the fifth and final stage of the research, ratings were developed to indicate the relative importance of each trait. These ratings were derived by summing the values (as determined at stage two of this project) of the tasks that were associated with the particular trait. To gain an understanding of the judgments of firefighters, the Court examined all traits that were awarded task value scores greater than or equal to twenty. Again, knowledges of various kinds, e.g., of proper lifting techniques, tend to predominate on this list. The physical ability to use fire department equipment, such as pike poles, also ranks prominently on this list. On the whole, however, this section of the study reinforces the general conclusion that knowledge distinguishes the better firefighter.
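The stage-five summing rule, and the kind of cut-off the Court applied to its results, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the task names, task values, and task-trait links below are hypothetical and are not taken from Jt.Ex. 18; only the rule (a trait's rating is the sum of the values of its associated tasks) comes from the report as described above. Because the rule sums, a trait linked to many tasks can outscore a trait linked to fewer tasks of higher value.

```python
# Hypothetical stage-two task values on the report's 1-to-5 scale.
TASK_VALUES = {
    "raise ground ladder": 4,
    "carry victim down stairs": 5,
    "ventilate roof": 3,
    "clean equipment": 1,
}

# Hypothetical stage-three links between each trait and the tasks requiring it.
TRAIT_TASKS = {
    "physical ability to work from ladders": ["raise ground ladder", "ventilate roof"],
    "knowledge of proper lifting techniques": [
        "raise ground ladder",
        "carry victim down stairs",
        "clean equipment",
    ],
}


def trait_importance(trait_tasks, task_values):
    """Stage five: a trait's rating is the sum of the values of its linked tasks."""
    return {
        trait: sum(task_values[task] for task in tasks)
        for trait, tasks in trait_tasks.items()
    }


def traits_at_or_above(scores, cutoff):
    """List traits whose summed task value meets a cut-off (the Court used 20)."""
    return sorted(trait for trait, score in scores.items() if score >= cutoff)


scores = trait_importance(TRAIT_TASKS, TASK_VALUES)
print(scores)
print(traits_at_or_above(scores, 8))  # small cut-off chosen to suit the toy data
```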

Ingram also sought to determine which traits should be considered for testing purposes. A trait was inappropriate for testing purposes if it could only be learned on the job or if all applicants already possessed the trait. Of particular interest are those traits that were labelled “degree” traits. These were those traits that not only met the minimum criteria for inclusion in a test, but also tended to indicate superior workers. Ingram recommended that any test should measure for the amount of the degree trait that each applicant possessed. Jt.Ex. 18, at 7-8. In all, there were thirty-three such degree traits. Physical traits were prominent, comprising eighteen of the total. However, with several exceptions, the task values associated with these physical degree traits tended to be relatively low. The exceptions are: physical ability to use firefighting equipment, such as pike poles and axes; ability to crawl on hands and knees; and physical ability to work from ladders.

The report draws a number of conclusions regarding testing for physical abilities.

In many instances, a superior worker would possess more of a given physical ability than other workers. That is, the ability to lift more, work longer or climb faster was generally the mark of a better fighter.

Id., at p. 10. The report also recommended ranking applicants with respect to physical abilities.

[S]ince in many instances the possession of a higher degree of a physical ability is better, there is justification for awarding points for the performance of certain physical activities and including the physical capability scores in the ranking process.

Id. The report recommends that candidates be ranked in the 1980 examination process on the basis of their combined scores on the written and physical test.

It appears that these conclusions are, at least in part, unsupported by the report. It is an overstatement to assert that superior physical ability was generally a mark of a superior firefighter. Rather, as noted above, firefighters themselves rated knowledge and judgment much more highly in evaluating superiority of a firefighter. Further, the conclusion that ranking by physical test scores is appropriate neglects the value of the tasks associated with various physical abilities. The associated task values vary widely; in many cases, they tend to be quite low. Tr. 295-303; Plaintiffs’ Ex. 2.

Thus, the Court concludes that the Ingram report only weakly supports one of its central conclusions: that superior physical ability distinguishes superior from average firefighters. Despite this failing, the Ingram report contains substantial, detailed information about the job of firefighter in Columbus, and a wealth of information about firefighters’ understanding of their job.

The second job analysis, the Landy report, was designed specifically to be used as a basis for a validation study, which will be discussed in the following section. It contains less detailed information and a higher degree of aggregation of data because of this linkage to a specific validation strategy. In this section of the Opinion, it will be discussed briefly as background to the Landy validation study, and to resolve certain factual issues that have arisen.

Landy began his job analysis with a list of tasks that was aggregated into twenty-eight functional categories. Examples are: firefighting — operates and advances hose lines and fire extinguishers; forcible entry — pries open or breaks down doors or windows using appropriate tools while wearing full firefighting gear; and extrication — extricates victims from buildings or cars using appropriate tools. Jt.Ex. 17, App. B. This list of grouped tasks was then shown to incumbent Columbus firefighters, who were asked to rate its verisimilitude on a one (for poor) to five (for very good) scale. The incumbents awarded an average score of 4.06. Id., p. 6. Landy concluded that the task list was a good representation of the job. Id., p. 6. No attempt was made, however, to correct the list in light of the responses of incumbent firefighters.

Next, Landy asked incumbent firefighters to rate the importance of the various task groups. They were asked to distribute one hundred points among the various groups to reflect their importance in preserving life and property. Jt.Ex. 17, App. E. This yielded an average importance score for each of the twenty-eight functional task groups. The most highly rated tasks, and their associated scores were: firefighting (8.1), rescue (7.3), search (5.9), emergency medical treatment (5.8), driving (5.0), engine operation (4.4), apparatus operation (4.3), and extrication (4.1). Firefighters were also asked to rate the various task groups according to how frequently they were performed. Id., App. G, H. The most important tasks, in terms of saving life and property, tend to occur infrequently, whereas less important tasks, e.g., equipment maintenance, occur daily. Landy elected, in light of this inverse relation between importance and frequency, to ignore frequency for the remainder of the study. This judgment is questionable, for the two measures could have been combined. Ingram had done so in her job analysis. However, this does not appear to the Court to be a fundamental problem with the Landy job analysis.

The next step in the Landy job analysis is more controversial. Landy sought to determine the physical abilities that were necessary to perform the various job tasks. To do this, he used a taxonomy of human abilities developed by Dr. Edwin Fleishman, an industrial psychologist. Fleishman sought to devise a list of abilities that underlay all human performance; the list was to be comprehensive and its elements were to be independent of one another. The list of abilities was created by reliance on a statistical technique, factor analysis. As this suggests, the Fleishman abilities are abstract concepts that are linked to a theory of human performance.

For instance, Fleishman distinguished three kinds of strength. Static strength refers to the amount of force that a person can exert against an immovable or very heavy object. This is similar to the everyday concept of strength. Explosive strength, on the other hand, refers to the ability to use energy in one or a series of explosive muscular acts. An example would be the strength used in jumping over a barrier. Dynamic strength is the ability to use one’s arms and trunk repeatedly to move one’s body weight over a distance, e.g., climbing a rope. The Fleishman terms are not everyday ones and it was necessary to train firefighters in their meaning. Further, the distinctions drawn in the Fleishman classification are not common-sense ones, as is illustrated by the three kinds of strength.

Landy asked incumbent firefighters to rate the extent to which the various Fleishman abilities were involved in performance of the tasks involved in the various task groups. For this purpose, the twenty-eight task groupings previously defined were aggregated into sixteen groups. Id., App. J. Prior to making the ratings, the firefighters were instructed in the Fleishman classification, and discussions were held. Then, the firefighters completed the task ratings by distributing one hundred points across the various Fleishman abilities to reflect their relative role in performance of a particular group of tasks. The scores assigned to each ability were then averaged across the various task areas. Finally, values were recalculated to reflect the relative importance of each of the task areas, as previously determined.
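The aggregation described above can be illustrated with a short Python sketch. It assumes that the final recalculation was an importance-weighted average across task groups; the report's exact formula is not set out in this Opinion. The two importance figures (8.1 for firefighting and 7.3 for rescue) are those reported above; the two abilities shown and every rater allocation are hypothetical.

```python
from statistics import mean

# Hypothetical allocations: for each task group, each firefighter distributes
# 100 points across abilities. Names and numbers are illustrative only.
ALLOCATIONS = {
    "firefighting": [
        {"static strength": 60, "stamina": 40},
        {"static strength": 40, "stamina": 60},
    ],
    "rescue": [
        {"static strength": 70, "stamina": 30},
        {"static strength": 50, "stamina": 50},
    ],
}

# Task-group importance ratings reported for the Landy job analysis.
IMPORTANCE = {"firefighting": 8.1, "rescue": 7.3}


def group_profile(raters):
    """Average the raters' 100-point allocations for one task group."""
    for allocation in raters:
        if sum(allocation.values()) != 100:
            raise ValueError("each rater must distribute exactly 100 points")
    abilities = raters[0].keys()
    return {a: mean(allocation[a] for allocation in raters) for a in abilities}


def job_profile(allocations, importance):
    """Assumed step: weight each group's averaged profile by its importance."""
    profiles = {group: group_profile(raters) for group, raters in allocations.items()}
    total = sum(importance.values())
    abilities = next(iter(profiles.values())).keys()
    return {
        a: sum(profiles[g][a] * importance[g] for g in profiles) / total
        for a in abilities
    }


profile = job_profile(ALLOCATIONS, IMPORTANCE)
print(profile)
```

Because every allocation sums to 100 and the weights are normalized, the resulting job profile also sums to 100, which is what permits statements such as “physical abilities account for one half of the job.”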

At trial, plaintiffs’ expert Dr. Cranny criticized the reliance upon the Fleishman abilities in the Landy study. He testified that the inferences about the abilities involved in a task were inherently unreliable, even when made by job incumbents. Tr. 321-323; Jt.Ex. 10. Concerns about whether firefighters understood the abstract categories of Fleishman’s taxonomy were also expressed by the court in Berkman v. City of New York, 536 F.Supp. 177, 189-190 (E.D.N.Y.1982), aff'd, 705 F.2d 584 (2nd Cir.1983), where the Fleishman taxonomy had also been used in the job analysis. In the instant case, although the Court feels some skepticism about reliance on terminology so distant from ordinary experience, the use of the Fleishman abilities does not appear to cause major problems. Landy calculated intra-class correlations for firefighters using the Fleishman categories. This statistic measures the amount of agreement of the various individuals about the extent to which the various task groups involve a particular ability. In the case of firefighters, the intra-class correlation was quite high, equalling .95. Jt.Ex. 13; Tr. 982-983. In addition, Dr. Landy also presented at trial certain exhibits summarizing the judgments of firefighters about particular tasks and abilities. These exhibits showed a good deal of variability and discrimination, suggesting that the firefighters were making reasonably accurate judgments. Tr. 984-987.
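The Opinion does not state which form of intra-class correlation Landy computed. As an assumption, the sketch below uses one common form, the two-way consistency coefficient for the mean of k raters (ICC(3,k) in Shrout and Fleiss's notation), with task groups as the rated targets and firefighters as the raters. A value near 1 indicates that the raters order the task groups alike, even if some raters use systematically higher numbers than others. The function name and the sample matrices are illustrative.

```python
def icc_consistency_average(matrix):
    """ICC(3,k): two-way, consistency, reliability of the mean of k raters.

    matrix[i][j] is rater j's rating of target i (for example, how much one
    task group involves a given ability). Requires at least 2 targets and 2 raters.
    """
    n, k = len(matrix), len(matrix[0])
    grand = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n for j in range(k)]

    # Partition the total sum of squares into targets, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("targets do not differ; the ICC is undefined")
    return (ms_rows - ms_error) / ms_rows


# Rater 2 always scores one point higher: perfect consistency.
print(icc_consistency_average([[1, 2], [3, 4], [5, 6]]))
# Raters disagree on the order of the first two targets.
print(icc_consistency_average([[1, 2], [2, 1], [3, 3]]))
```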

The results of the abilities analysis of incumbent firefighters are presented in Appendix M of Jt.Ex. 17. One conclusion reached in the report is that physical abilities account for one half of the job of firefighter. That is, of ratings assigned to all thirty-five Fleishman physical and cognitive abilities, the ratings assigned to physical abilities amount to about 50 of a possible 100 points. The seven physical abilities rated highest by firefighters with their accompanying ratings are: stamina (8.17); static strength (8.11); explosive strength (4.86); dynamic strength (4.81); multi-limb coordination (2.74); manual dexterity (2.67); and gross-body coordination (2.63). Together, these abilities account for 63% of the total physical ability composition of the job of firefighter. This information constitutes the basis for Landy’s attempt to validate the 1984 physical examination.

As noted in the introduction to this section of the opinion, there was no real dispute regarding the job analyses that had been performed by the City. It is plain that adequate descriptions of the job of firefighter in Columbus have been formulated. Several general conclusions can be reached here. First, neither the Ingram report nor the Landy report justifies the conclusion that possession of more of a particular ability is, in all circumstances, better. For all these reports show, it may well be true that a firefighter requires enough of a particular ability to do the job well, and that any more of that ability is merely redundant. Further, neither report contains any data on how quickly firefighters perform particular tasks. There is much conflicting testimony in the record about the speed at which firefighters work; neither of these reports addresses this issue.

Test Validation Studies

In this section, the Court summarizes the two test validation studies undertaken by the City. The first was authored by Dr. Kriska and Constance Hines and was intended to fulfill one of the requirements of this Court’s Order in Dozier v. Chupka, supra. Jt.Ex. 16 (“Kriska/Hines report”). Although it is based upon data from the 1980 firefighter examination, it is relevant to the 1984 examination by virtue of the substantial overlap between the two examinations. The second study was that undertaken by Landy, Jacobs and Associates specially for purposes of this litigation. It involved solely an analysis of the 1984 physical test. Jt.Ex. 17 (“Landy report”).

Validation refers to the process of gathering evidence to show the job-relatedness of a test or selection device. Validity may be demonstrated by different kinds of studies: criterion related studies, content validity studies, or construct validity studies. 29 C.F.R. § 1607.5(A); Harless v. Duck, 619 F.2d 611, 616 n. 5 (6th Cir.), cert. denied, 449 U.S. 872, 101 S.Ct. 212, 66 L.Ed.2d 92 (1980). Only the first two approaches are relevant to this litigation. The Kriska/Hines report is primarily a criterion-related validity study. The Landy report, on the other hand, is an example of a content validity study.

In a criterion-related validity study, an attempt is made to collect data to show that the test predicts important aspects of actual job performance. In such a study, thus, evidence is sought to show the association of test scores and measures of actual performance on the job, the criteria. There are two kinds of such studies. In a predictive validity study, an applicant’s test scores and subsequent performance on the job as an employee are compared. In a concurrent validity study, on the other hand, the test scores of present employees are compared with their present job performance. Both approaches were used in the Kriska/Hines study. Where it is possible, a criterion-related study is preferable, because it is the most direct approach to showing job-relatedness. However, due to problems with measuring job performance, a criterion approach is not always feasible. Tr. 241-243.

In a content validation study, evidence is gathered to show that the content of the test, i.e., the questions or tasks comprising the test, is representative of the content of the job, i.e., the important or critical tasks comprising the job. An attempt is made to determine the degree to which test items are representative of the job. 29 C.F.R. § 1607.14(C)(4); Jt.Ex. 44, at 11. Although a content validation approach is less direct than a criterion-related approach, it is nonetheless a permissible method for demonstrating validity. Firefighters Institute for Racial Equality v. City of St. Louis, 549 F.2d 506, 511 (8th Cir.1977). It should also be pointed out that criterion-related approaches and content approaches are not mutually exclusive in any respect; they are simply different strategies for collecting evidence regarding job-relatedness. Jt.Ex. 44, at 9-11; Tr. 243.

As noted above, the Kriska/Hines report sought to demonstrate criterion-related validity by both predictive and concurrent studies. Kriska/Hines used two categories of variables as measures of on-the-job performance. Two measures were derived from ratings of firefighters by their supervisors based upon observation of the firefighters over a period of time. These two scales were measures of performance at the fire scene, and overall performance. Jt.Ex. 16, at 4-2 to 4-14. The other category of measures of job performance derived from testing programs that are used in the Fire Division to evaluate training success. One was the Training Academy Final Average, a composite score consisting of instructor ratings and scores on written exams during initial firefighter training. The other training measures were written examinations used in post-Academy training; these are called the Firefighter I, Firefighter II and Journeyman examinations. Id., at 4-14 to 4-18.

In the predictive study, Kriska/Hines sought to find significant and substantial correlations between scores upon the physical capability test and these criterion measures. The results were disappointing. The correlation of the physical test scores with supervisors’ ratings of performance at the fire scene was .00, that is, there was no association at all. The correlation of the physical test with supervisors’ overall ratings was -.03, that is, there was a very slight negative association. Jt.Ex. 16, at 5-19. The only statistically significant correlation found with the training measures was with the Training Academy Final Average; this correlation is .32. Neither the Firefighter I nor the Firefighter II examination was significantly correlated with the physical test. Id.

Kriska/Hines noted a number of statistical problems that might be causing their reported correlations to underestimate the actual correlation. Jt.Ex. 16, at 5-27. Dr. Landy corrected statistically for these problems, and recalculated the correlations between physical test scores and the various criterion measures. Landy’s calculations have the effect of doubling the correlation between the physical test and the Training Academy Final Average; it is variously reported as being between .60 and .72. Defendant’s Ex. E, Tables 1, 4, 6. Otherwise, nothing else changes, that is, all other attempted correlations with criterion measures remain nonsignificant. The correlations with supervisors’ ratings remain essentially zero, as before. Id., Tables 1, 3, 5.
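The Opinion does not say which corrections Dr. Landy applied. As an assumption, the sketch below shows two textbook corrections often used in validation research: the Thorndike Case II correction for range restriction (those hired are a high-scoring subset of applicants) and the correction for an unreliable criterion. The function names and inputs are hypothetical and are not Landy's figures. The sketch illustrates why the supervisor-rating correlations remained essentially zero: both corrections map an observed correlation of zero to zero, and leave a very small correlation small, however large the adjustment factors.

```python
import math


def correct_range_restriction(r, sd_ratio):
    """Thorndike Case II correction for direct range restriction on the test.

    r: correlation observed among those actually hired.
    sd_ratio: test-score SD in the full applicant pool divided by the SD among
        those hired (an assumed input; values above 1 raise the estimate).
    """
    u = sd_ratio
    return r * u / math.sqrt(1 + r * r * (u * u - 1))


def correct_criterion_unreliability(r, criterion_reliability):
    """Correct for attenuation caused by an unreliable criterion measure only."""
    return r / math.sqrt(criterion_reliability)


# Hypothetical adjustment factors, applied to the observed values reported above.
for observed in (0.00, -0.03, 0.32):
    adjusted = correct_range_restriction(observed, sd_ratio=1.5)
    adjusted = correct_criterion_unreliability(adjusted, criterion_reliability=0.8)
    print(f"observed {observed:+.2f} -> corrected {adjusted:+.2f}")
```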

Kriska/Hines also report results of a predictive study of Barrett’s mechanical test, which is also under challenge in this litigation. They found significant correlations of mechanical test scores with Training Academy Final Average (.54) and the Firefighter I examination (.38). Jt.Ex. 16, at 5-19. Correlations with the performance measures were not significant. As before, Landy’s recalculations tended to increase these reported correlations somewhat.

Kriska/Hines did not examine the physical test in their concurrent validity study. They expressed the view that it was possible that training and performance of firefighting tasks made applicants and incumbent firefighters different from one another. Also, administering the test to incumbents would increase the cost of the study. Jt.Ex. 16, at 6-4. Thus, the City was no more willing to administer its physical test to incumbent firefighters to validate its test than it was in the process of developing the test.

The Kriska/Hines study did report the results of administering the Barrett mechanical test to incumbent firefighters. The scores of incumbents were then correlated with certain of the criterion measures previously discussed. There were significant correlations as follows: with Training Academy Final Average: .53; with Firefighter I examination: .31; with supervisors’ ratings of performance at the fire scene: .30; and with supervisors’ ratings of overall performance: .32. Jt.Ex. 16, at 6-17. This section of the report concluded that the mechanical aptitude tests were significantly correlated with training success and on-company performance and, therefore, should be retained as part of the test for selecting firefighters. Id., at 6-29.

The other validation study was the Landy study, which pertained solely to the physical examination. This study was an attempt to demonstrate job-relatedness through a content validation strategy. In the previous section, Landy’s job analysis was discussed. As will be recalled, the culmination of that analysis was a rating, by incumbent firefighters, of the relative importance of the Fleishman physical abilities in the job as a whole. This rating had been derived by averaging across the various task groups formulated in the job analysis, and weighting for importance of the tasks. Landy’s validation strategy was direct, yet elegant. He asked a group of industrial and organizational psychologists to make an evaluation of the 1984 firefighter test similar to that made by the firefighters of the job. Like the firefighters, the psychologists were given the Fleishman abilities with explanatory and illustrative material, and were presented with information, including a videotape, about the 1984 firefighter test. For each event, they were asked to distribute one hundred points across the various abilities to reflect the extent to which the ability was tested by the particular event. These results were then averaged across the various events to yield an overall measure of the extent to which a given ability was important in the 1984 examination. Values were recalculated to omit the beam walk event; this is reasonable because only one person failed that event.

The seven highest rated abilities accounted for approximately 80% of the total points awarded by the psychologists to all eighteen abilities. These highest rated abilities were:

Ability                   Psychologists’ Score   In Firefighters’ Top 7?   Firefighters’ Score
Speed of Limb Movement    17.9                   no                        2.4
Dynamic Flexibility       14.5                   no                        3.8
Static Strength           13.4                   yes                       16.2
Explosive Strength        10.0                   yes                       9.8
Stamina                   9.8                    yes                       16.4
Manual Dexterity          7.7                    yes                       5.4
Wrist-Finger Speed        6.1                    no                        1.8
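The arithmetic behind the observations that follow can be checked against the table. The sketch below, offered only as an illustration, totals the psychologists’ points for the seven listed abilities (79.4 of 100, the “approximately 80%” stated above) and lists each ability’s test score minus its firefighters’ score. It assumes that both columns are on a comparable one-hundred-point scale; the sign of the difference is only a rough indicator of over- or underweighting, and the names `top_seven_share` and `weighting_gaps` are hypothetical.

```python
# Rows from the table: (ability, psychologists' test score, firefighters' job score)
TABLE = [
    ("Speed of Limb Movement", 17.9, 2.4),
    ("Dynamic Flexibility", 14.5, 3.8),
    ("Static Strength", 13.4, 16.2),
    ("Explosive Strength", 10.0, 9.8),
    ("Stamina", 9.8, 16.4),
    ("Manual Dexterity", 7.7, 5.4),
    ("Wrist-Finger Speed", 6.1, 1.8),
]


def top_seven_share(rows):
    """Points the psychologists awarded to the listed abilities, out of 100."""
    return sum(test for _, test, _ in rows)


def weighting_gaps(rows):
    """(ability, test score minus job score), largest positive gap first."""
    gaps = [(name, round(test - job, 1)) for name, test, job in rows]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(f"Share of points for the seven abilities: {top_seven_share(TABLE):.1f}")
    for name, gap in weighting_gaps(TABLE):
        print(f"{name:<24}{gap:+.1f}")
```

On these figures the largest positive gaps are speed of limb movement and dynamic flexibility, and the largest negative gap is stamina, which matches the qualitative appraisal in the next two paragraphs.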

From the Landy study, it is possible to offer a qualitative appraisal of the job-relatedness of the 1984 firefighter test from the point of view of the underlying physical abilities purportedly measured. First, the test appears to overemphasize certain abilities. This is quite extreme in the case of speed of limb movement and dynamic flexibility. (Dynamic flexibility is defined as the ability to make repeated arm or leg flexing movements with some speed, e.g., pulling in a rope, hand over hand. Jt.Ex. 17, App. I.) There is also substantial over-weighting of wrist-finger speed in the test. This overweighting appears to result from the timed nature of the test. The emphasis on extreme speed that, of necessity, characterizes such a timed test does not appear to be reflected in firefighters’ appraisals of their jobs.

The test also underweights certain abilities that were thought to be important by firefighters. This is most striking in the case of stamina, the physical ability most highly rated by firefighters. Dynamic strength also appears to be underweighted, having been rated 5.8 by the psychologists, but a higher (corrected) 9.6 by the firefighters. Finally, the 1984 test appears to have achieved a reasonable fit with the static strength, explosive strength and manual dexterity required by the job; the relative ratings appear sufficiently comparable to justify this conclusion.

Dr. Landy testified that there was a “good match” between the test and the job. Tr. 961. He based this conclusion upon the observation that the abilities most highly rated by the firefighters: endurance, static strength, explosive strength, and dynamic strength, were also important in the test. Tr. 1010; Jt.Ex. 17, at 21. Dr. Cranny disagreed with this conclusion. He calculated a correlation coefficient to measure the extent to which the relative abilities for the job and the test were rated in the same order by the respective judges. The correlation was .45, a “rather low degree of correspondence.” Tr. 338; Plaintiffs’ Ex. 1. Although this calculation was questioned by Dr. Landy, Tr. 1009, it appears reasonable. Tr. 336-339. It appears that the experts are choosing to characterize the fit between test and job in different ways, rather than contradicting one another. The test does reflect certain abilities that are important to the job; this is especially true in the case of the various kinds of strength. On the other hand, there are other abilities that are not reflected in the test proportionally to their apparent importance in the job, and yet other abilities that are overemphasized in the test. The controlling question is whether the degree of fit achieved by the test is sufficient; this question will be addressed in the following section.
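The record states only that Dr. Cranny computed a coefficient measuring how far the two sets of ratings placed the abilities in the same order. A rank-order (Spearman) coefficient is the usual measure of that kind of agreement, and the sketch below assumes that is the statistic he used; the function names are hypothetical. Applied to the seven abilities in the table above, it yields a coefficient of zero. That figure is not comparable to Dr. Cranny’s .45: the set of abilities he used is not stated, and these seven were selected because they ranked highest on the test, so the restricted comparison is illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Rank from highest (1) to lowest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("all values tied; coefficient undefined")
    return cov / (sx * sy)


# Table order: speed of limb movement, dynamic flexibility, static strength,
# explosive strength, stamina, manual dexterity, wrist-finger speed
TEST_SCORES = [17.9, 14.5, 13.4, 10.0, 9.8, 7.7, 6.1]
JOB_SCORES = [2.4, 3.8, 16.2, 9.8, 16.4, 5.4, 1.8]

if __name__ == "__main__":
    print(f"rho over the seven tabulated abilities: {spearman_rho(TEST_SCORES, JOB_SCORES):.2f}")
```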

Legal Discussion

The Court having concluded that plaintiffs have demonstrated adverse impact from the 1984 firefighter examination, it becomes the defendants’ burden to show that the test “bears a manifest relationship to successful and efficient job performance.” Harless v. Duck, 619 F.2d 611, 616 (6th Cir.), cert. denied, 449 U.S. 872, 101 S.Ct. 212, 66 L.Ed.2d 92 (1980); Griggs v. Duke Power Co., 401 U.S. 424, 432, 91 S.Ct. 849, 854, 28 L.Ed.2d 158 (1971). The test of manifest relationship looks to whether the discriminatory employment practice is “necessary to safe and efficient job performance.” Chrisner v. Complete Auto Transit, Inc., 645 F.2d 1251, 1252 (6th Cir.1981). “Necessary” here does not mean indispensable, but rather “substantially promote[s] the proficient operation of the business.” Id. Nonetheless, manifest relationship is intended to set a “high standard.” E.E.O.C. v. Ball Corp., 661 F.2d 531, 541 (6th Cir.1981). If, but only if, the employer meets the burden of establishing manifest relationship, the burden shifts back to the plaintiff to show that there is an alternative selection device with less disparate impact that would also serve the employer’s legitimate interests. Chrisner v. Complete Auto Transit, Inc., supra, at 1263. Because the Court concludes that the defendants have not met their burden, the issue of alternative selection devices is not reached here.

In the instant case, the Court concludes that the defendants must demonstrate manifest relationship by showing that the 1984 test was validated in conformity with the standards set forth in the Uniform Guidelines, 29 C.F.R. §§ 1607.1 et seq. Although the Guidelines are not substantive regulations promulgated as law, they are entitled to “great deference.” Albemarle Paper Co. v. Moody, 422 U.S. 405, 431, 95 S.Ct. 2362, 2378, 45 L.Ed.2d 280 (1975). The Guidelines have been followed by those courts that have decided cases involving discrimination in testing in fire departments. Berkman v. City of New York, 536 F.Supp. 177 (E.D.N.Y.1982), aff'd, 705 F.2d 584 (2d Cir.1983); Firefighters Institute for Racial Equality v. City of St. Louis, 549 F.2d 506, 510-511 (8th Cir.), cert. denied, 434 U.S. 819, 98 S.Ct. 60, 54 L.Ed.2d 76 (1977); Vulcan Society v. Civil Service Commission, 360 F.Supp. 1265, 1273 n. 23 (S.D.N.Y.), mod., 490 F.2d 387 (2d Cir.1973). An additional reason for applying the Guidelines is that in Dozier v. Chupka, 395 F.Supp. 836 (S.D.Oh.1975), this Court ordered the City to validate its hiring criteria for firefighters in compliance with the Guidelines. Id., at 859-860. In so concluding, the Court is mindful that the Guidelines are meant to be consistent with professional standards for testing, and that these are not unchanging. 29 C.F.R. § 1607.5(C), (A). Thus, the Court considers it appropriate to consider also the standards set forth in Standards for Educational and Psychological Testing, published by the American Psychological Association in 1985 (“Division 14 Standards”). Jt.Ex. 44.

Relying upon Spurlock v. United Airlines, Inc., 475 F.2d 216 (10th Cir.1972), the City argues that it should be held to a lower quantum of proof of job-relatedness because the job of firefighter implicates public safety. In Spurlock, the Court held:

when the job clearly requires a high degree of skill and the economic and human risks involved in hiring an unqualified applicant are great, the employer bears a correspondingly lighter burden to show that his employment criteria are job-related.

Id., at 219. The Sixth Circuit adopted this doctrine in Chrisner v. Complete Auto Transit, Inc., supra. Subsequently, the Court of Appeals explained that the doctrine was restricted to the “narrow category of jobs which greatly implicate human safety, e.g., airline piloting and over-the-road trucking.” E.E.O.C. v. Ball Corp., supra, at 541 n. 20.

For the following reasons, the Court concludes that the Spurlock doctrine does not alter the defendants’ burden of showing compliance with the Guidelines. First, the Spurlock Court relied upon an E.E.O.C. regulation, then-existing 29 C.F.R. § 1607.5(c)(2)(iii), as the basis for its holding. However, when the Guidelines were revised in 1978, this provision was not included. The natural assumption is that this provision has been incorporated into or superseded by the standards presently set forth in the Guidelines. In addition, the Spurlock doctrine has been applied mainly in cases involving education or experience requirements or other non-scored objective criteria. See B. Schlei and P. Grossman, Employment Discrimination Law (2d ed. 1976), at 167-173. It has not frequently been applied in cases involving scored tests, where distinct standards have been developed by the courts.

Turning now to the merits of defendants’ case, the Court first concludes that defendants have met their burden with respect to the mechanical reasoning test. There is no evidence before the Court about the content of this test as administered in 1984. The plaintiffs have done little more than raise the issue by showing adverse impact; they virtually abandoned the claim at trial and in their brief. Nevertheless, the defendants have produced evidence of validation of this test, which, under the circumstances, the Court can only conclude is sufficient to meet their burden.

Defendants’ evidence is of two sorts. First, in the Kriska/Hines concurrent validation study, significant correlations with both training performance and supervisors’ ratings of on-the-job performance were shown. Jt.Ex. 16, at 6-17. Plaintiffs’ contention that the correlations in this study are too low to validate the test is unpersuasive. See, e.g., B. Schlei & P. Grossman, supra, at 129; also 1983-1984 Supp., at 18. The Guidelines set no minimum standards for correlation coefficients in criterion-related studies. 29 C.F.R. § 1607.14(B)(6). The Court considers it appropriate to rely upon both training performance and on-the-job performance as validating criteria. Mechanical reasoning ability, it would appear, is necessary both to successful completion of training and performance on the job.

In addition, defendants presented testimony from Dr. Gerald Barrett, the developer of the mechanical reasoning test. He testified that the test had been developed for use in selecting firefighters in Akron, and that a test validation study had been performed on the tests as used there. Tr. 652-657; Jt.Ex. 19. He further testified that, based on his review of data about the job of firefighter in Columbus, his knowledge of the job of firefighter in Akron, and the general literature on firefighting, the job of firefighter was similar in both cities. Tr. 658, 660. This testimony is uncontradicted. In fact, plaintiffs’ expert, Dr. Magel, testified that evidence regarding firefighting in one city was applicable to another city; in his words, “firefighting is firefighting.” Tr. 461. The Court concludes that the requirements of the Guidelines for reliance upon validity studies conducted by other users have been met. 29 C.F.R. § 1607.7.

The Court further concludes that defendants have failed to show that the 1984 firefighter physical examination is valid by means of the Kriska/Hines predictive criterion-related validity study. That study found a significant and substantial correlation between physical test scores and the Training Academy Final Average, but no other meaningful correlation with the other training measures or, more important, with on-the-job measures. In Dozier v. Chupka, supra, this Court rejected a contention that a correlation of test scores and training academy scores was sufficient to validate a test. Id., at 853. The Court sees no reason to abandon this proposition here. It is true that the Supreme Court has held that a positive correlation of a test with training course performance may be enough to validate a test apart from a possible relationship to on-the-job performance. Washington v. Davis, 426 U.S. 229, 250, 96 S.Ct. 2040, 2052, 48 L.Ed.2d 597 (1976). However, courts of appeals have interpreted this holding to apply only in the case of minimal standards necessary to successful completion of a training program. Guardian’s Association v. Civil Service Commission, 633 F.2d 232 (2d Cir.1980), aff'd, 463 U.S. 582, 103 S.Ct. 3221, 77 L.Ed.2d 866 (1983); Ensley Branch of NAACP v. Seibels, 616 F.2d 812, 819-822 (5th Cir.1980); Craig v. County of Los Angeles, 626 F.2d 659, 662-663 (9th Cir.1980). Physical ability, of course, is not something that is merely needed to train as a firefighter; it is necessary on the job. Thus, the absence of any non-zero correlations with on-the-job measures of performance is fatal to any claim of criterion-related validity.

The more important issue, to which the parties have devoted the most attention, is whether the defendants have shown that the 1984 physical examination is content-valid. More particularly, the controlling question is whether the Landy study, Jt.Ex. 17, constitutes such a demonstration. Plaintiffs raise a number of objections to the design and execution of the Landy study: the study improperly relied upon abstract physical abilities; the study failed to provide operational definitions; and the ratings of relative importance of various physical abilities by firefighters and psychologists were unreliable. Plaintiffs also contend that, even ignoring these alleged problems of the study, it does not demonstrate content validity for two reasons: the test events did not accurately reflect the complexity of actual job tasks, and the results of the Landy study do not show sufficient proportionality between test and job to permit rank-ordering of applicants on the basis of test scores. The Court concludes that, except for the last, these contentions are without merit. However, the last point, by itself, compels the conclusion that the 1984 physical exam is not content-valid and, therefore, that its use in the 1984 firefighter selection process constituted impermissible discrimination.

Plaintiffs’ objections to the design and execution of the Landy study itself are readily disposed of. The Guidelines expressly permit selection procedures that measure knowledges, skills or abilities to be justified by content validity. 29 C.F.R. § 1607.14(C)(1). However, they require that the knowledge, skill or ability be operationally defined. 29 C.F.R. § 1607.14(C)(4). The Court concludes that this requirement was met by the Landy study. Jt.Ex. 17, App. I. Finally, for reasons discussed supra, the Court concludes that judgments of firefighters and psychologists are not so unreliable as to undermine the study.

Plaintiffs also object that the test events fail to approximate actual job tasks. The Guidelines provide:

[T]o be content valid, a selection procedure measuring a skill or ability should closely approximate an observable work behavior.... If a test purports to sample a work behavior ..., the manner and setting of the selection procedure and its level and complexity should closely approximate the work situation.

29 C.F.R. § 1607.14(C)(4). On the whole, the 1984 firefighter examination was a reasonable approximation of the actual tasks.

The more telling objection to the 1984 physical examination is not that the events comprising it fail to approximate actual job tasks; it is that, taken as a whole, the test fails to reflect accurately the content of the job. The Guidelines provide:

A selection procedure can be supported by a content validity strategy to the extent that it is a representative sample of the content of the job.

29 C.F.R. § 1607.14(C)(1). The Division 14 Standards also speak of representativeness. Jt.Ex. 44, at 10-11. This has been interpreted to require that a test, to be content valid, must reflect all or nearly all the important aspects of the job. Firefighters Institute for Racial Equality v. City of St. Louis, 549 F.2d 506, 511-512 (8th Cir.), cert. denied, 434 U.S. 819, 98 S.Ct. 60, 54 L.Ed.2d 76 (1977); accord, Guardian’s Association v. Civil Service Commission, 630 F.2d 79, 98-100 (2nd Cir.1980), cert. denied, 452 U.S. 940, 101 S.Ct. 3083, 69 L.Ed.2d 954 (1981); Berkman v. City of New York, supra, at 195; Burney v. City of Pawtucket, 559 F.Supp. 1089, 1101-1103 (D.R.I.1983); see generally, B. Schlei and P. Grossman, supra, at 130 n. 135-137. Based upon the Landy study, the Court concludes that the 1984 physical examination does not meet this standard of representativeness. As discussed in the preceding section, the test overemphasizes speed of limb movement and dynamic flexibility, while it underemphasizes endurance. The fact that the test appears to reflect, more or less accurately, the strength necessary for the job does not, by itself, validate the test. Although there was testimony at trial that stronger firefighters would have more endurance, Tr. 970-972, such generalized testimony cannot be accepted as a substitute for concrete evidence based upon a job analysis.

In addition, if a test is to be used to rank-order applicants, it must be more than merely content valid. The Guidelines provide:

If a user can show, by a job analysis or otherwise, that a higher score on a content valid selection procedure is likely to result in better job performance, the results may be used to rank persons who score above minimum levels. Where a selection procedure supported solely or primarily by content validity is used to rank job candidates, the selection procedure should measure those aspects of performance which differentiate among levels of job performance.

29 C.F.R. § 1607.14(C)(9). The Guidelines recognize that a test that may be valid as a pass/fail test may not be valid as a ranking test, because ranking is likely to produce greater adverse impact. 29 C.F.R. § 1607.5(G).

The courts have followed these special requirements for ranking tests. In Williams v. Vukovich, 720 F.2d 909 (6th Cir.1983), the Court of Appeals stated:

Ranking is a valid, job-related selection technique only where the test scores vary directly with job performance.

Id., at 924, citing Guardian’s Association of New York v. Civil Service Commission, 630 F.2d 79, 100 (2nd Cir.1980), cert. denied, 452 U.S. 940, 101 S.Ct. 3083, 69 L.Ed.2d 954 (1981). In Guardian’s Association, the Second Circuit had held:

Permissible use of rank-ordering requires a demonstration of such substantial test validity that it is reasonable to expect one- or two-point differences in scores to reflect differences in job performance.

Id., at 100-101. So far as the Court’s research discloses, this appears to be the unanimous view of the courts. See generally, B. Schlei & P. Grossman, supra, at 155 n. 17; 1983-1984 Supp. at 18 n. 42.

In Berkman v. City of New York, 536 F.Supp. 177 (E.D.N.Y.1982), aff'd, 705 F.2d 584 (2nd Cir.1983), the Court struck down a physical test for firefighters on a number of grounds, among them that it was insufficiently precise to justify ranking of candidates. Id., at 210-212. The Court objected especially to the premium placed on maximum speed and all-out effort on the test, which, like the instant test, was timed. The Court concluded that such a test failed to reflect the actual demands of firefighting which, in many circumstances, requires endurance and pacing. Id., at 212. The Court is aware of no case, and the defendants cite none, in which a ranking test has been upheld as a selection device for firefighters. See also Firefighters Institute for Racial Equality v. City of St. Louis, 616 F.2d 350, 357-360 (8th Cir.1980).

The Landy report briefly addressed the issue of the justifiability of ranking candidates by their scores on the physical test. To determine this, firefighters were asked to estimate the level of each ability necessary to do the job of firefighter at three performance levels: minimum competence, average competence, and outstanding performance. For this purpose, the seven most important physical abilities were selected. Groups of firefighters were asked a series of questions, of which the following is representative: Does a firefighter need to be very low, below average, average, above average or well above average in this ability to perform at an outstanding level? Jt.Ex. 17, App. S. The results are predictable, given the question format. It is hardly surprising that firefighters who are asked how much — below average, average, or above average — of an abstractly characterized ability is necessary to perform at, say, a minimally competent level, will tend to answer: below average. This exercise well illustrates the reasons why the law has developed rules against leading questions; it has no apparent bearing on any of the actual issues in this case, however.

At trial, the defendants presented testimony that firefighters frequently work as quickly as possible, going all-out to attack a fire aggressively. In fact, there was a great deal of testimony at trial about firefighters working at an all-out pace versus firefighters pacing themselves. The clear import of the testimony, taken as a whole, is that sometimes firefighters work all-out, and sometimes they pace themselves; it depends on the task at hand. Anecdotal evidence regarding the speed at which firefighters must work is not sufficient to justify a timed, competitive examination. There must be systematic evidence based upon a job analysis. The Battelle researchers weighed actual pieces of firefighting equipment to determine the strength necessary to perform the job. Jt.Ex. 24, at 6-8. It is hard evidence such as this that is necessary to justify an examination with adverse impact.

The defendants also presented evidence that women, on the whole, lack the upper body strength of men, and have lower levels of aerobic capacity. This is undisputed; indeed plaintiffs’ expert Dr. Magel stated at trial: “[W]e know for a fact that women perform less well in most fitness measures other than tests of flexibility or balance.” Tr. 457. Firefighting is physically demanding work, defendants argue, and men are better equipped to perform this work than women. From the evidence at trial, there appears to be some truth to this. However, this argument is based upon a misconception of the role of the Court in a Title VII case. It is not the province of the Court to determine whether women should be firefighters, or how many women should be firefighters. Rather, it is the Court’s duty to evaluate a test in light of the standards set forth in Title VII. How many women should be firefighters can be decided only by the administration of a validated examination.

Accordingly, the Court concludes that the defendants have failed to show that the 1984 firefighter physical test is content valid. This conclusion is based upon two reasons: that the test, taken as a whole, does not represent the physical demands of the job, and that there is no evidence that higher scores on the test vary directly with job performance to justify ranking. Consequently, the defendants engaged in discrimination on the grounds of sex when they used the 1984 examination to select firefighters. Thus, the plaintiffs are entitled to judgment on their Title VII claims regarding the 1984 physical exam.

IV.

The Court having concluded that defendants have failed to show that the 1984 firefighter physical examination was job-related, plaintiffs are entitled to relief. This relief has two aspects: prospective relief to assure future compliance with Title VII, and retrospective relief to remedy the effects of past discrimination.

The Court understands that no further hiring from the 1984 eligibility lists will occur. However, the City will at some point have to administer new examinations and generate new eligibility lists from which future training classes will be selected. Thus, the critical aspect of prospective relief is to ensure that the next examination and selection process complies with the requirements of Title VII. Accordingly, the City will be ordered, prior to administration of any future firefighter physical tests, to modify the test so as to eliminate the problems found with the 1984 test by this Court.

Those problems are twofold: lack of representativeness, and the use of rank-ordering. To eliminate the former, the City must redesign the test so that it reasonably reflects the physical abilities actually used on the job. For this purpose, the Court may rely upon the approach used in the Landy study, and the results of the Landy analysis of the job of firefighting, as reported in Appendix M of Jt.Ex. 17. To eliminate the problems stemming from rank-ordering, the City must make a choice. If defendants wish to continue to rank-order candidates, they must be prepared to show that rank-ordering complies with the Uniform Guidelines. Specifically, they must produce evidence to show that a higher score on the examination is likely to result in better job performance. 29 C.F.R. § 1607.14(C)(9). That evidence should be sufficient to justify any additional adverse impact that rank-ordering may have over a pass/fail test; for this purpose, defendants must determine the likely adverse impact from a pass/fail examination, the passing points of which are validated according to the standards of the Uniform Guidelines. Alternatively, defendants may choose to design and administer a pass/fail examination. In this event, the problem of representativeness of the test must be resolved, and pass points must be justified consistently with the Guidelines.
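The comparison required here, between the adverse impact of ranking and that of a validated pass/fail test, can be framed with the “four-fifths” rule of thumb in the Uniform Guidelines, 29 C.F.R. § 1607.4(D), under which a selection rate for one sex that is less than four-fifths of the rate for the sex with the highest rate will generally be regarded by the federal enforcement agencies as evidence of adverse impact. The sketch below computes those impact ratios. The function names and all applicant and selection counts are hypothetical and are not drawn from the 1984 or 1986 examinations.

```python
FOUR_FIFTHS = 0.8  # threshold stated in 29 C.F.R. § 1607.4(D)


def impact_ratios(selected, applicants):
    """Each group's selection rate divided by the highest group's selection rate."""
    rates = {group: selected[group] / applicants[group] for group in applicants}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no one was selected; ratios are undefined")
    return {group: rate / highest for group, rate in rates.items()}


def shows_adverse_impact(selected, applicants):
    """True if any group's impact ratio falls below four-fifths."""
    return any(ratio < FOUR_FIFTHS for ratio in impact_ratios(selected, applicants).values())


# Hypothetical counts, for illustration only
APPLICANTS = {"female": 50, "male": 500}
PASS_FAIL_PASSERS = {"female": 20, "male": 400}  # pass rates under a pass/fail test
RANKED_SELECTIONS = {"female": 2, "male": 98}    # top 100 scorers under ranking

if __name__ == "__main__":
    for label, counts in (("pass/fail", PASS_FAIL_PASSERS), ("ranked", RANKED_SELECTIONS)):
        print(label, impact_ratios(counts, APPLICANTS), shows_adverse_impact(counts, APPLICANTS))
```

In this hypothetical, both procedures fall below four-fifths, but ranking lowers the women’s ratio from .50 to about .20; it is that additional impact which evidence under § 1607.14(C)(9) would have to justify.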

Turning to the matter of retrospective relief, the fact that a non-job-related physical examination was administered by the defendants in 1984 raises the inference that there are females who would have been hired but for the discriminatory examination. However, it is impossible to determine how many of the female applicants are qualified to be firefighters, or how many, if any, would have been hired in a nondiscriminatory examination. The Court will resolve this problem by requiring the defendants to administer a nondiscriminatory examination, and to provide notice of that examination, in a form approved by the Court, to all female applicants for the 1984 examination. If women succeed upon the new examination in greater numbers than upon the 1984 examination, the Court will order the defendants to set aside a sufficient number of places in future firefighter classes to rectify any past discrimination thus revealed. On the other hand, if female applicants do not succeed in greater numbers than before, then no set aside would be appropriate.

The parties are DIRECTED to file with the Court no later than May 23, 1986 their suggestions regarding the decree by which this remedy will be implemented. The Court will then issue its decree.

WHEREUPON, having considered the evidence and the arguments of the parties, the Court renders its decision on liability as follows: Plaintiffs have failed to prove their claim of intentional discrimination under § 1983, and their claim under Title VII with respect to the 1980 firefighter selection process; however, plaintiffs have prevailed upon their claim with respect to the 1984 firefighter selection process. The Clerk shall enter JUDGMENT on liability only in favor of the defendants on the § 1983 claim and the Title VII claim pertaining to the 1980 examination, and JUDGMENT in favor of the plaintiffs on the Title VII claim pertaining to the 1984 examination.

IT IS SO ORDERED.

SUPPLEMENTAL OPINION AND ORDER

This matter comes before the Court to consider the remedial decree to be entered in this case. In an Opinion and Order entered May 13, 1986, the Court concluded that the defendants discriminated against the plaintiffs — a class of past, present and future female applicants for firefighter— when they used the 1984 firefighter examination to select firefighters. In light of this conclusion, the Court directed the parties to file their suggestions regarding a remedial decree. Having considered these suggestions, the Court now renders its decision regarding a remedy.

Under Title VII, hiring by use of an examination with adverse impact on women constitutes impermissible discrimination unless that examination has been proven to be job-related. The Court concluded that defendants had failed in two respects to show that the 1984 firefighters’ test was job-related. First, the physical abilities measured by the physical test, taken as a whole, did not reflect the physical abilities actually used on the job. Second, defendants used test scores to rank-order applicants who were to be hired in order from eligibility lists. The practice of ranking is permissible only if there is evidence that scores on the test vary directly with job performance. The Court concluded that the defendants had failed to present such evidence. The Court stated that to eliminate the problem of lack of representativeness of the physical test, defendants must redesign the test. The Court further stated that, to eliminate the problems associated with ranking, the defendants must produce the requisite evidence or, alternatively, eliminate ranking.

While this case was pending, the City administered a new firefighters’ entrance examination. In December 1985, a new written examination was administered. And, in March 1986, after this case had been tried, but prior to the submission of briefs by the parties, a new physical test was administered. The new physical test included two new events; in addition, some of the events on the 1984 physical test were modified in various ways before they were administered to applicants in 1986. The defendants state that the grading and scoring of the 1986 firefighters’ examination has been suspended pending this Court’s determination of the job-relatedness of the test and acceptance of a scoring method. They also state that, for the first time, the firefighters’ examination was administered to incumbent firefighters. The defendants refused any discovery to plaintiffs regarding the nature or administration of the 1986 physical test. Thus, the Court has before it only the assertions of defendants’ counsel regarding the 1986 test.

In their submission to the Court, plaintiffs have made a number of suggestions regarding particular events that were included in the 1984 physical test. For the moment, these suggestions are moot, because defendants have already redesigned the physical test. Some of the changes proposed by plaintiffs have already been introduced. Plaintiffs also proposed certain changes in the scoring of the test. In particular, they propose that all timed test events and all ranking based on physical test scores be eliminated. They also propose that defendants be required to administer any new test to incumbent firefighters, a demand that has already been met by the defendants. Further, plaintiffs demand timely access to information regarding the new test. And, they request an interim award of attorney’s fees.

In their submission to the Court, defendants set forth a remedial plan in some detail. First, to address the Court’s concerns regarding representativeness of the test, defendants propose to employ the same strategy used in the Landy, Jacobs study, Jt.Ex. 17, to evaluate the content validity of the 1984 physical test. In the Landy report, incumbent firefighters were asked to evaluate the physical abilities used on the job. A panel of industrial psychologists assessed the physical abilities measured by the test. The judgments of the firefighters were then compared with those of the industrial psychologists to determine the representativeness of the test. The Court relied upon the results of this analysis to conclude that the 1984 test did not accurately reflect the requirements of the job. Defendants propose to present information regarding the 1986 physical test to a panel of industrial psychologists. They believe that the modifications introduced in 1986 have produced a valid, job-related test. Defendants also suggest that the scores of particular test events can be reweighted, if necessary to ensure representativeness. Defendants estimate that such a study of the 1986 physical examination could be completed by August 1986.

To address the Court’s concern regarding use of test scores to rank-order applicants, defendants state that they have been discussing with their expert, Dr. Landy, a criterion-related validity study to address issues concerning the scoring of the examination. Because the 1986 physical examination was administered to incumbent firefighters, defendants propose a concurrent validity study to compare the job performance of these incumbents with their test performance. Also, defendants point out, test scores of incumbent firefighters can be used to calculate cut scores for administration of a pass/fail physical examination, if necessary. In addition, defendants express interest in conducting a predictive validity study. If they are permitted to hire on the basis of the 1986 examination, defendants could follow the development of applicants who are hired, comparing test scores with training success and job performance. A predictive criterion-related validity study provides the best and most direct evidence of job-relatedness.

In light of the proposals of the parties and the present circumstances, the Court concludes that the following remedial plan would most directly eliminate the discrimination and protect the interests of the defendants in a safe and efficient fire division. Defendants will be permitted to hire on the basis of the 1986 examination only when the content validity of the modified examination has been proven. The Court considers the proposal to duplicate the Landy study of the 1984 examination to be a reasonable way of making this showing. If the 1986 examination is shown to be content valid — or, if it is not, some further modified examination has been shown to be content valid — defendants must notify all female applicants for firefighter in 1984 of the new examination and administer it to all such applicants who are still interested in being considered for the position of firefighter.

When the content valid test has been formulated, and administered to 1984 female applicants, defendants may hire on the basis of the 1986 examination. At this point, the 1986 examination may be used only on a pass/fail basis, where the cut-points defining passing scores are determined by the performance of incumbents on the examination in accord with the standards of the Uniform Guidelines, 29 C.F.R. §§ 1607.1 et seq. Persons achieving pass scores on the test as a whole will be available for hire; if there are more passing applicants than positions available, candidates shall be considered for hiring so that the percentages of females and males considered for hire reflect the relative proportions of male and female applicants achieving passing scores. Defendants may determine the particular method by which this result is achieved. Further, before hiring from the 1986 examination, defendants must determine the number of females who would have been hired in 1984 had the 1984 examination included a content-valid physical test and had the test as a whole been scored on a pass/fail basis. To the extent that there are such females, a set-aside of places in the firefighter classes hired on the basis of the 1986 examination must be created. Female applicants in 1984 who have taken the new firefighter examination will then be considered to fill these set-aside positions in the order of their total test scores. When hiring from the 1986 list occurs, defendants will then perform their proposed predictive criterion-related validity study. If defendants can show, on the basis of the predictive and concurrent criterion-related validity studies they propose, that the test as a whole is sufficiently precise to be used for ranking, they may then apply to the Court for an Order permitting ranking.

The Court is satisfied that this remedy achieves a reasonable accommodation of plaintiffs’ interest in freedom from discrimination and relief from any effects of past discrimination, and the defendants’ strong interest in safe and efficient staffing of the division of fire. Through this remedial process, the Court believes that the central goal of Title VII — to make job qualifications the controlling factor and factors such as sex irrelevant — can best be realized. For these reasons, the Court makes the following order:

1) The defendants are enjoined from any hiring of entry-level firefighters on the basis of the 1986 firefighter examination until there has been compliance with this Order.
2) As soon as practicable, defendants shall submit to the Court a report detailing the results of an analysis of the 1986 physical test comparable to that underlying Appendix R of the Landy, Jacobs report, Jt.Ex. 17. No members of the Landy, Jacobs firm or any employee of the City of Columbus may serve as a member of the panel of industrial psychologists that is to evaluate the test. Prior to the analysis of the 1986 test by the expert panel, all relevant materials and the design of the study shall be made available to an expert to be chosen by the plaintiffs, who will review these materials and file a written report to accompany the City’s proposed report. The fees of this expert will be paid by the defendants. Further, the defendants shall provide discovery regarding the content and administration of this study; plaintiffs shall file their objections, if any, contemporaneously with defendants’ report. The Court shall then determine whether the 1986 physical examination is content valid. If it is not, the defendants must redesign the test and again demonstrate its content validity. If the test must be remodified, it must again be administered to incumbent firefighters.
3) If the 1986 physical test is found to be content valid by the Court, the defendants shall then submit promptly to the Court a report on the results of the administration of the 1986 physical examination to incumbent firefighters. If the 1986 physical test already administered is found not to be valid, the defendants shall then remodify the test, and, when the test has been determined to be content valid, shall administer the test to incumbent firefighters and report on the results. The report shall describe fully the details of administration of the test to incumbents and provide data regarding the incumbents. The report shall include a proposal regarding cutoff scores to be used to grade the test as a whole pass/fail. These cutoff scores shall "be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force." 29 C.F.R. § 1607.5(H). Defendants shall provide discovery to plaintiffs regarding the test of incumbents and all relevant information regarding characteristics of incumbents. The parties shall present to the Court their proposals regarding appropriate cutoff scores. The City shall show the number of males and females that would be hired under each proposal. The Court will then determine appropriate scoring procedures.
4) When defendants have formulated a content valid physical test and pass/fail scoring procedures have been determined by the Court, defendants shall notify all female applicants for firefighter in 1984 that they may reapply to take the firefighter examination as a result of this Court’s decision. The form of notice must be approved by the Court. Only female applicants on the 1984 eligibility list who were considered for hire and rejected for reasons other than the physical test are excepted from receiving notice. Defendants shall then administer the new physical test to all applicants who appear in response to the notice. Defendants shall devise a training program for applicants, and shall notify all 1984 applicants of the availability of this program. Plaintiffs shall make their suggestions to defendants regarding the content of this training program in writing in a timely manner. The test results will then be scored in the manner previously approved by the Court; defendants may use 1984 written test scores to determine the total score of female applicants who retake the physical test. Defendants shall determine the number of female applicants who would have been hired in 1984 had a content valid test been administered and had the test as a whole been graded pass/fail. For this purpose, the defendants shall assume that, if a greater number of male and female applicants achieve passing scores than were in fact hired from the 1984 eligibility lists, male and female applicants would have been considered for hiring in proportion to the relative proportions of male and female applicants achieving passing scores.
6) When defendants have completed these steps, they may hire on the basis of the 1986 examination. Defendants shall set aside the number of places determined as detailed in ¶ 5 of this Order to represent female applicants who would have been hired from the 1984 eligibility lists, and shall fill these positions, if any, before any other hiring. In addition, defendants shall hire males and females in proportion to the relative proportions of males and females achieving passing scores.

7) Plaintiffs have prevailed in part upon an issue determining the rights of the parties, and, thus, an interim award of attorney's fees and costs is appropriate. Plaintiffs shall make application for interim fees, and, after considering defendants' response, the Court will render its decision.

8) Defendants shall disregard any local ordinance, law of Ohio, charter provision or Ohio constitutional provision to the extent that it conflicts with the implementation of this Order.

IT IS SO ORDERED.

ON MOTION TO STAY

On July 11, 1986, the defendants moved to stay enforcement of the judgment and order entered on May 30, 1986 pending the appeal in the above-captioned case. Specifically, the defendants move the Court to suspend or modify that portion of its earlier order which enjoins the City of Columbus from hiring new firefighters until the 1986 firefighter test can be validated and administered to 1984 female firefighter applicants.

At the request of the Court, the parties submitted proposals for the interim hiring of firefighters. The defendants propose to give the 1986 firefighter exam to all eligible 1984 female applicants and to select the top 100 male and female individuals for supplemental testing and a potential place in the interim firefighter classes. The plaintiffs believe the 1986 exam is discriminatory and, therefore, oppose any modification of the remedial order which would require 1984 female applicants to take the 1986 exam. Instead, the plaintiffs propose that a limited number of places be set aside in the interim class for the top female applicants who took the 1984 examination.

There is no question that this Court has the authority to modify the current injunction notwithstanding the pending appeal. Rule 62(c) of the Federal Rules of Civil Procedure provides:

When an appeal is taken from an interlocutory or final judgment granting ... an injunction, the court in its discretion may suspend [or] modify ... an injunction during the pendency of the appeal upon such terms ... as it considers proper for the security of the rights of the adverse party.

Moreover, the Sixth Circuit has made it clear that where a district court has a continuing duty to maintain the status quo, such as in the present case, it may modify its supervisory order upon the emergence of new facts which threaten the status quo. Jago v. United States District Court, N. Dist. of Ohio, E. Div. at Cleveland, 570 F.2d 618, 622-23 (6th Cir.1978).

In order to make out a case for modification of the Court’s injunction prohibiting hiring of firefighters, the defendants have the burden of showing the following:

1) The likelihood that they will prevail upon the merits of the appeal;
2) irreparable injury unless the relief requested is granted;
3) no substantial harm to the plaintiffs if the relief is granted; and
4) no harm to the public interest resulting from the relief requested.

Reed v. Rhodes, 549 F.2d 1046, 1048 (6th Cir.1976).

In considering the defendants’ motion, the Court observes that due to normal attrition and the terms of the collective bargaining agreement between the City of Columbus and the International Association of Fire Fighters, Local # 67, the City of Columbus will require at least 60 new firefighters by early 1987. Failure to fill these positions could jeopardize the health, safety and welfare of the lives and property of the citizens of Columbus. Rinehart and Werner Affidavits. Thus, continuing the injunction creates a significant risk of irreparable injury to the City of Columbus and its citizens. The plaintiffs do not take issue with these assertions or otherwise contest the need to hire new firefighters. Therefore, the Court concludes that the defendants have met their burden of establishing irreparable injury.

The defendants also must establish that the interim hiring of firefighters will not harm the plaintiffs' interest. In this light, the defendants submitted their proposal to allow the 1984 female applicants to take the 1986 test and to select the firefighter class according to rank. While it is true that the defendants' proposal would give the plaintiffs an opportunity to take an arguably valid test, the defendants' proposal fails to take into consideration the objectionable nature of rank-order hiring, which the Court found violative of Title VII in its Opinion and Order of May 14, 1986. Opinion and Order pp. 70-73. The Court believes that any interim hiring based upon the rank-ordering of applicants would be detrimental to the interests of the plaintiffs.

In contrast, the plaintiffs’ proposal would ensure that the 1984 female applicants’ interests are protected in the interim. However, the Court cannot accept plaintiffs’ proposal for two reasons. First, plaintiffs’ proposal is based on the assumption that the 1986 physical capability test is not valid. In support of their argument, the plaintiffs point to the statistically significant difference in pass rates for male and female applicants taking the 1986 physical capability test. Cranny Affidavit ¶¶ 3-4. While the Court has no reason to doubt the accuracy of Dr. Cranny’s statistical analysis, the Court points out that a statistically significant difference does not mean that the 1986 physical capability test is not valid. As the Court noted in its earlier Opinion and Order, it is a known “fact that women perform less well in most fitness measures other than tests of flexibility or balance.” Opinion and Order p. 73 citing to Trial Testimony of plaintiffs’ expert Dr. Magel. Thus, it would not be surprising that women might do less well on a physical capacity test which validly measured their capacity to perform the physically demanding tasks of a firefighter. Opinion and Order p. 73. Therefore, the Court does not believe that it would be appropriate to draw an inference from the statistical analysis of the 1986 physical capacity test that the test is invalid. To do so at this point would be putting the proverbial cart-before-the-horse. Rather, the Court believes that it is far more prudent to follow the procedures outlined in its Order of May 30, 1986 for determining whether the 1986 physical capacity exam is valid.

Secondly, the Court believes that it would be inappropriate to require the City to hire a certain number of women in a new class. As the Court stated earlier:

It is not the province of the Court to determine whether women should be firefighters. Rather, it is the Court’s duty to evaluate a test in light of the standards set forth in Title VII. How many women should be firefighters can be decided only by the administration of a validated examination.

Opinion and Order p. 73.

Although the Court rejects the proposals submitted by the parties, this does not end the matter. Rather, the Court believes that an interim order that is consistent with the spirit of its remedial order, yet allows the City of Columbus to hire an interim firefighters class, would accommodate both parties' interests as well as the general interest of the public. To this end, the Court MODIFIES its injunction to allow the City of Columbus to hire an interim firefighters class but only on condition that the 1986 test be administered to the 1984 female applicants, that the test be graded on a pass-fail basis, that spaces be set aside for those 1984 female applicants who would have passed the 1984 test but for its discriminatory nature, and that the City hire men and women in proportion to the number of men and women who pass the 1986 test.

Finally, the defendants must establish a likelihood of success on their appeal as part of their burden in moving for relief under Rule 62(c). District courts have generally interpreted this factor as requiring a finding that the case before them involves unusual facts or novel issues of law. N.L.R.B. v. General Motors Corp., 510 F.Supp. 341, 342 (S.D.Ohio 1980); 7 Moore, Moore's Federal Practice ¶ 62.05, n. 16 at 62-28 (1985). The strength of the showing which the moving party must make varies inversely with the degree of injury it will suffer absent a stay. Dayton Christian Schools v. Ohio Civil Rights Commission, 604 F.Supp. 101, 104 (S.D.Ohio 1984); Metropolitan Detroit Plumbing and Mechanical Contractors Assn. v. H.E.W., 418 F.Supp. 585, 586 (E.D.Mich.1976).

In the present case, the Court finds this branch of the test to be relatively immaterial to its decision. Notwithstanding the fact that the Court has every confidence that its decision in this case will withstand the rigors of appellate review, the Court believes that the urgency of hiring new firefighter recruits and the lengthy time required to validate the 1986 test necessitate the modification of its injunction to allow the hiring of an interim class of firefighters.

For the above stated reasons, the Court finds defendants’ motion for relief from the remedial order of May 30, 1986 to be meritorious and it is, therefore, GRANTED. In order to prevent irreparable harm to the defendants, to protect the health and safety of the public and to preserve and protect the interests of the plaintiffs, the Court hereby makes the following order.

1) The defendants may hire one or two firefighter recruit classes comprised of no more than a total of seventy-two (72) individuals.

2) The City will administer both the written and physical capacity components of the 1986 firefighter test to 1984 female firefighter applicants. The City will provide an appropriate Court-approved notice to all the 1984 female applicants except those who were considered for hire and rejected for reasons other than their scores on the 1984 physical capacity and mechanical reasoning tests. The 1984 female applicants shall receive pre-test training equivalent to that given the 1986 applicants. The schedule outlined by the defendants in paragraph 3 of their proposal is acceptable to the Court.

3) Consistent with the remedial order of May 30, 1986, the City will hire men and women for the 1986 interim classes in proportion to their pass rate. In addition, spaces must be reserved for the 1984 female applicants who would have passed the 1984 test and who would have been hired but for the discriminatory nature of the 1984 test.

4) A cut-off score must be established in order to determine the number of male and female applicants passing the written and physical portions of the 1986 test. This cut-off score shall "be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force." 29 C.F.R. § 1607.5(H). The Court directs the parties to submit proposals on the appropriate cut-off scores to be used in conjunction with the written and physical capacity components of the 1986 test within two weeks after the 1986 test is administered to the 1984 female applicants. The City shall project the number of males and females that would be hired under each proposal consistent with the terms of this order. The cut-off scores for the physical capacity test should be based upon the scores of incumbent firefighters.

5) Because of some apparent irregularities during the administration of the 1986 physical test to incumbents, the City will re-administer the physical test under conditions similar to those under which the test is administered to applicants. The plaintiffs may have an observer or observers present during the administration of the physical test to incumbents.

6) In order to determine the number of women who would have been hired in 1984 but for the test, the City must first determine the number of 1984 male and female applicants who would have passed a nondiscriminatory test in order to establish a hypothetical “candidate” population. The City will assume that males and females would have been hired in 1984 in proportion to their representation in the candidate population. To construct the 1984 candidate population, the City will assume the following:

i) The percentage of 1984 female applicants passing the test would have been the same as the percentage of 1986 female applicants passing the 1986 test.
ii) The percentage of 1984 male applicants passing the test would have been the same as the percentage of 1986 male applicants passing the 1986 test.
iii) Multiplying these percentages by the number of male and female applicants in 1984 yields the number of men and women in the 1984 candidate population. However, if the number of 1984 female applicants who take and pass the 1986 test is greater than the number of female applicants computed by the above method, the City will use the number of female applicants actually passing the 1986 test as the number in the candidate population.
iv) The City will assume that the number of women hired in 1984 would have been proportional to their number in the 1984 candidate population. If fewer were actually hired, an appropriate number of spaces will be reserved for 1984 female applicants in the 1986 recruit class[es]. However, these spaces may be filled only by those 1984 female applicants who actually pass the 1986 test. If there are no such applicants, or too few to fill these spaces, the unfilled spaces will be allocated to the 1986 female applicants who passed the 1986 test.

7) After the 1984 female applicant spaces are allocated, the City will select males and females for the remaining slots in the class[es] in proportion to the number of males and females passing the 1986 test.

8) The City may select men and women from within each category either by rank order or on a random basis. However, the Court notes that the latter would be preferable for purposes of conducting a predictive criterion-related validity study.

9) Since the City has reached and exceeded the Court’s minority population goal for firefighters in the Fire Division, for the purposes of this interim hiring order, the one-to-one hiring requirement of Dozier v. Chupka will be suspended by Order of the Court.

10) The parties will continue to comply with the provisions of the remedial order entered May 30, 1986 which are not suspended or modified by this Order.

IT IS SO ORDERED. 
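The arithmetic prescribed by paragraphs 6 and 7 of the interim order above can be illustrated with a short sketch. The Python function below is offered only as an illustration under stated assumptions, not as the computation the City is required to use: the function and parameter names are hypothetical; the Order does not say how fractional counts are rounded, so the sketch assumes rounding half up; and it assumes the 1986 pass counts refer to the 1986 applicant pool and that the set-aside cannot exceed the class size.

```python
import math
from dataclasses import dataclass


def round_half_up(x: float) -> int:
    # Assumption: the Order does not say how fractional counts are rounded.
    return math.floor(x + 0.5)


@dataclass
class InterimAllocation:
    set_aside: int            # spaces reserved for 1984 female applicants (para. 6(iv))
    set_aside_1984: int       # of those, filled by 1984 women who pass the 1986 test
    set_aside_spillover: int  # unfilled spaces passed to 1986 female passers
    remaining_women: int      # remaining slots allocated to women (para. 7)
    remaining_men: int        # remaining slots allocated to men (para. 7)


def allocate_interim_class(
    class_size: int,
    f_applicants_1986: int, f_passed_1986: int,
    m_applicants_1986: int, m_passed_1986: int,
    f_applicants_1984: int, m_applicants_1984: int,
    f_1984_passing_1986_test: int,
    hires_1984: int, women_hired_1984: int,
) -> InterimAllocation:
    # 6(i)-(ii): assume 1984 pass rates equal the 1986 pass rates by sex.
    f_rate = f_passed_1986 / f_applicants_1986
    m_rate = m_passed_1986 / m_applicants_1986

    # 6(iii): hypothetical 1984 candidate population; the female figure is
    # raised to the number of 1984 women who actually pass the 1986 test.
    f_candidates = max(f_rate * f_applicants_1984, f_1984_passing_1986_test)
    m_candidates = m_rate * m_applicants_1984

    # 6(iv): women who would have been hired in 1984, in proportion to their
    # share of the candidate population, less the women actually hired.
    expected_women = round_half_up(
        hires_1984 * f_candidates / (f_candidates + m_candidates))
    # Assumption: the set-aside is capped at the size of the interim class.
    set_aside = min(max(0, expected_women - women_hired_1984), class_size)

    # Set-aside spaces go first to 1984 women who pass the 1986 test; any
    # left over go to 1986 female applicants who passed.
    set_aside_1984 = min(set_aside, f_1984_passing_1986_test)
    spillover = set_aside - set_aside_1984

    # 7: remaining slots in proportion to men and women passing the 1986 test.
    remaining = class_size - set_aside
    women = round_half_up(
        remaining * f_passed_1986 / (f_passed_1986 + m_passed_1986))
    return InterimAllocation(set_aside, set_aside_1984, spillover,
                             women, remaining - women)
```

For example, with purely hypothetical inputs not drawn from the record (a 60-person class; 20 of 100 women and 450 of 900 men passing in 1986; 50 female and 800 male 1984 applicants, 8 of whom pass the 1986 test; 40 hires in 1984, none of them women), the sketch reserves one space for a 1984 female applicant and divides the remaining 59 slots into 3 for women and 56 for men.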
      
      . The evidence in this case comprises the testimony taken in open Court, cited to the transcript as "Tr. __"; stipulations of the parties filed in open Court, cited as "Stip. # __"; exhibits presented jointly by the parties, cited as "Jt.Ex. __"; and exhibits received into evidence on behalf of only one party, cited as, e.g., "Plaintiffs' Ex. __." In addition, the defendants made available a number of pieces of firefighting equipment for examination by the Court.
     
      
      . Defendants also assert that females passed the 1980 selection process at a rate of 89% (25 of 28), while males passed at the rate of 90% (772 of 796). The sole component of the overall testing process that was graded pass/fail was the reading comprehension test, which is not challenged in the litigation. All other components of the test were used to rank order candidates. Thus passing ratios are irrelevant.
     
      
      . Jt.Ex. 5 shows a female total score of 87.1 for Lawrence Livingston. This appears to be an error in coding for sex of applicants. The same error is repeated in Dr. Cranny’s analysis, Jt.Ex. 7, at 5, and his testimony at trial. Tr. 269.
     
      
      . Seven tasks were ranked "5". They were: surveying structure for possible hot spots after fire has been knocked down; using appropriate safety procedures; observing smoke and fire conditions and locating source of fire; sizing up fire and identifying appropriate extinguishing and ventilation techniques; driving apparatus according to state and local regulations; selecting shortest route to emergency scene; and maneuvering apparatus at scene to occupy best position and avoid interfering with other companies.
      Fourteen tasks were rated “4”. These were: locating hidden fires by seeing, feeling or smelling fire or opening walls; manipulating ladders; climbing and working from ladders with equipment or carrying people; obtaining and donning proper protective equipment; applying knowledge of heat and fluid mechanics to anticipate fire behavior; identifying and saturating potential exposures; identifying and removing flammable or hazardous materials; locating hydrant or water source with best access to fire; computing necessary line pressure; pumping water to supply hoses or sprinkler systems; responds immediately to emergency to save lives; interacts with distraught persons to obtain information; checking vital signs of victim; and preplanning fires in industrial and commercial buildings to locate fire prevention and fighting equipment.
     
      
      . The seventeen traits and the scores they were awarded by firefighters are: knowledge of hydrant locations (2.440); knowledge of occupancy, use, and structural composition of buildings (2.379); knowledge of firefighting tactics (2.370); knowledge of methods of fire extinguishment (2.328); knowledge of safe treatment of hazardous substances (2.310); ability to receive, comprehend and follow orders (2.308); knowledge of ventilation techniques (2.283); ability to learn and improve performance (2.259); skill at remaining oriented at emergency scenes, e.g., dense smoke (2.250); ability to function without sight (2.250); knowledge of streets and addresses in district (2.241); knowledge of CFD regulations regarding positioning of apparatus at scene (2.241); knowledge of size-up procedures (2.241); ability to deal with emotional supervision (2.241); knowledge of CFD hose evolutions (2.204); ability to put knowledge of proper use of rescue equipment into practice (2.200).
     
      
      . These traits are: physical ability to use ladders; physical ability to use hydrant wrench; physical ability to drag empty hose lines; physical ability to advance charged hose lines; physical ability to lift and operate fire extinguishers; physical ability to use CFD equipment, e.g., pike poles; physical ability to climb and work from ladders; ability to crawl on hands and knees; ability to drag or carry adults or children; ability to detect higher temperatures by feel; ability to detect smoke or fire by smell; manual dexterity; ability to remain alert; ability to hear or read and follow instructions; physical ability to assist in loading of hose bed; ability to learn; ability to write legibly; ability to comprehend and follow orders; ability to work from heights without fear.
     
      
      . These were, in order of their scores: ability to wear mask which covers entire face (88); ability to withstand high temperatures (84); ability to function without sight (81); skill at moving around in structures weakened by fire (36); knowledge of proper lifting techniques (35); knowledge of proper use of tools (35); physical ability to use CFD equipment, e.g. pike poles (33); ability to crawl on hands and knees (29); knowledge of operation of CFD apparatus (26); communication skills-hearing and understanding speech in person (25); knowledge of search patterns used in CFD (24); knowledge of smell of materials while burning (23); communication skill — speaking (22); knowledge of proper use of CFD rescue equipment (21); ability to put knowledge of proper use of rescue equipment into practice (21); and skill at remaining oriented at emergency scenes (20). Jt.Ex. 18, Appendix 5.
     
      
      . The physical degree traits and their associated task values are: physical ability to use ladders (9); physical ability to carry out duties of "hydrant man" (3); physical ability to use hydrant wrench (3); physical ability to drag empty hose lines (2); physical ability to advance charged hose line (4); physical ability to mount and operate master stream device (11); physical ability to lift and operate fire extinguishers (1); physical ability to use CFD equipment, e.g. pike poles (33); physical ability to use equipment to shore up unsound structures (3); physical ability to use tools and equipment in removing water from floors (4); physical ability to work from ladders (19); ability to crawl on hands and knees (29); ability to drag or carry adults or children (13); ability to push or lift heavy objects (14); manual dexterity (3); physical ability to perform first aid and cardiopulmonary resuscitation (15); physical ability to assist in loading of hose bed (2); physical ability to participate in physical training (2).
     
      
      . In meeting with firefighters in this context, Landy identified himself as someone hired by the City to help defend this lawsuit, Tr. 1001, which was criticized at trial by Dr. Cranny. Tr. 359-361. This approach strikes the Court as ill-advised and unnecessary.
     
      
      . These problems are from three potential sources of bias in the correlations: restriction of range, measurement errors in the criteria and the use of dual eligible lists for hiring. These are discussed in Jt.Ex. 16, at 5-4 to 5-16. Kriska/Hines resolved these problems in a highly conservative manner.
     
      
       Source: Jt. Ex. 17, App.R.
     
      
       Source: Jt. Ex. 17, App.M. Because physical abilities were assigned only 49.9 points out of a possible total of 100 points, the scores reported in Appendix M are doubled for purposes of comparison with the results reported in Appendix R. The remaining points in Appendix M were assigned to cognitive abilities.
     
      
       The seven highest rated physical abilities in Appendix M with their associated (corrected) scores were: Stamina (16.4), Static Strength (16.2), Explosive Strength (9.8), Dynamic Strength (9.6), Multi-Limb Coordination (5.4), Manual Dexterity (5.4), and Gross-Body Coordination (5.2). These seven abilities account for 63% of the total physical points awarded by firefighters.
     
      
      . At trial, Dr. Landy testified regarding possible rescoring of the 1984 examination on a pass/fail basis. Passing levels were set at the mean score for all female applicants on the 1984 physical examination. Virtually no females scored above the mean, thus determined, on all five test events; on the other hand, a very large number of males scored above the means. Hence, Landy testified, a pass/fail examination would have even greater adverse impact than a scored examination. Tr. 1029-1032. The problem with this is that the pass points are simply arbitrary; there is no attempt to base them on a job analysis.
     