
    Brenda BERKMAN, on Behalf of herself and a class consisting of all similarly-situated women, Plaintiff-Appellee-Cross-Appellant, v. The CITY OF NEW YORK; Edward I. Koch, individually and as Mayor of the City of New York; New York City Fire Department, Augustus Beekman, individually and as Fire Commissioner of the City of New York; New York City Department of Personnel; Michael Nadel, individually and as Director of Personnel of the City of New York; Thomas Roche, individually and as former Director of Personnel of the City of New York; Civil Service Commission of the City of New York, Defendants-Appellants-Cross-Appellees, and Uniformed Firefighters Association, Local 94, Firefighters Eligibles Association, List No. 1162, Inc., Defendants-Intervenors-Appellants-Cross-Appellees, and James T. Ahrens, Defendant-Intervenor.
    Nos. 1307, 1308, 1309, 1310, Dockets 86-7157, 86-7159, 86-7167 and 86-7201.
    United States Court of Appeals, Second Circuit.
    Argued May 28, 1986.
    Decided Feb. 17, 1987.
    
      Norma Kerlin, New York City (Frederick A.O. Schwarz, Jr., Corp. Counsel, Francis F. Caputo, Elizabeth Dale Kendrick, Robin M. Levine, New York City, on the brief), for municipal defendants-appellants-cross-appellees.
    John F. Mills, Mineola, N.Y. (Colleran O’Hara & Mills, Mineola, N.Y., on the brief), for defendant-intervenor-appellant-cross-appellee Firefighter Eligibles Ass’n, List No. 1162, Inc.
    Michael N. Block, New York City (H. Adam Prussin, Cheryl Eisberg Moin, Lipsig, Sullivan & Liapakis, New York City, on the brief), for defendant-intervenor-appellant-cross-appellee Uniformed Firefighters Ass’n.
    Laura Sager, Washington Square Legal Services, Inc., New York City (Robert L. King, Jonathan E. Richman, Debevoise & Plimpton, New York City, on the brief), for plaintiff-appellee-cross-appellant.
    Before FEINBERG, Chief Judge, NEWMAN and KEARSE, Circuit Judges.
   JON O. NEWMAN, Circuit Judge:

This is an appeal and cross-appeal from orders of the District Court for the Eastern District of New York (Charles P. Sifton, Judge) providing supplemental relief in connection with a Title VII lawsuit alleging gender discrimination in entry-level hiring of New York City firefighters. In an earlier stage of this litigation, the District Court invalidated an entry-level examination and ordered various forms of relief, including the development of a new non-discriminatory entry-level test. The current round of litigation concerns challenges to the validity of the new test and to the District Court’s orders requiring adjustments in the scoring of the new test and in the use of the eligibility list assembled as a result of the new test. For reasons that follow, we affirm in part, reverse in part, and remand for entry of a revised order.

Background

Much of the background is set forth in our prior decision, which rejected challenges to certain aspects of the relief the District Court had ordered after invalidating the physical portion of the original test. See Berkman v. City of New York, 705 F.2d 584 (2d Cir.1983). The plaintiff, Brenda Berkman, filed the suit in 1979, alleging gender discrimination by the New York City Fire Department and other municipal defendants in violation of Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq. The plaintiff challenged the physical test of Exam 3040, the 1978 Fire Department entrance examination, on the ground that this test had a disparate impact on women and was not job-related. On March 4, 1982, the District Court invalidated the physical portion of Exam 3040 and ordered several forms of relief, including “preparation of new and valid selection procedures.” Berkman v. City of New York, 536 F.Supp. 177, 216 (E.D.N.Y.1982). This aspect of relief, which is customary in Title VII litigation, was not challenged on the prior appeal. The March 4, 1982, decision also ordered as interim relief the hiring of up to 45 women members of the plaintiff class who passed a “qualifying test” of physical abilities. Such a test was developed by the defendants in cooperation with the plaintiff and approved by the District Court in August 1982. The qualifying test consisted of two parts, a simulation of engine company tasks and a simulation of ladder company tasks, separated by a rest interval. See 705 F.2d at 592 n. 10. The test was scored on a pass/fail basis, with completion in four minutes, nine seconds, considered a passing score. This test was administered in September 1982. Thirty-eight of the women who passed were hired as firefighters.

On September 11, 1982, the defendants administered the written portion of a new entry-level firefighter test, Exam 1162. The written portion was administered to 31,421 candidates, of whom 566 identified themselves as females.

In October 1982 the defendants sought the District Court’s approval of the physical test of Exam 1162. The physical test was similar to the “qualifying test” used for interim hiring, with some changes. Two additional tasks were added — a hose pull and a wall vault. The rest interval between the engine company tasks and the ladder company tasks was reduced to two minutes. Finally, the scoring was altered from pass/fail to a rank-ordered system based on speed of completion. Completion in less than four minutes was scored 100, completion in each of the six 30-second intervals between four and seven minutes was scored downward from 95 to 70 in five-point steps, and completion in more than seven minutes was considered failing. This produced seven passing grades or “bands.” An applicant’s overall score on Exam 1162 was to be determined by averaging the scores on the written and physical tests.
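
The seven-band scoring rule described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the treatment of times that fall exactly on a band boundary (for example, exactly seven minutes) is an assumption, since the opinion does not specify it.

```python
from typing import Optional

def physical_score(seconds: float) -> Optional[int]:
    """Map a completion time in seconds to a seven-band score.

    Returns None for a failing time. Boundary handling is assumed:
    exactly 4:00 falls in the 95 band, exactly 7:00 still passes.
    """
    if seconds < 240:          # under four minutes
        return 100
    if seconds > 420:          # more than seven minutes fails
        return None
    # Six 30-second intervals between 4:00 and 7:00 score 95, 90, ..., 70.
    band = min(int((seconds - 240) // 30), 5)
    return 95 - 5 * band

def overall_score(written: float, physical: float) -> float:
    """Equal-weight average of the written and physical scores."""
    return (written + physical) / 2
```

Under these assumptions a candidate who finishes in four minutes 40 seconds receives a physical score of 90, and a written score of 98 would then produce an overall score of 94.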

In January and February 1983 the District Court heard testimony on seven days concerning the validity of the physical test of Exam 1162. That hearing was adjourned on February 18, 1983, without a specific date for resumption. Two months later the defendants informed the Court that they were reluctantly going to administer Exam 1162, despite the lack of an advance ruling on its validity, because of the need to promulgate a new eligibility list and the unlikelihood that the hearing would be resumed and an advance ruling issued. Receiving no contrary indication from the District Court, the defendants administered the physical test of Exam 1162, at a cost of $750,000, to more than 20,000 applicants who had passed the written test. Prior to administering the physical test, the defendants obtained foundation funding for a special training program for women to prepare them for the test. Most of the women who participated actively in the training program passed the physical test, with 40 percent scoring at least 85.

The physical test was administered during the period from July 1983 until the spring of 1984. During that period an episode occurred that would prove significant to one aspect of the remedy challenged on this appeal. In September 1983 the named plaintiff, Brenda Berkman, and another class member, both of whom had been hired pursuant to the District Court’s interim hiring remedy, were terminated at the conclusion of their probationary period, ostensibly for poor performance. This action precipitated a motion for reinstatement, plaintiff contending that the terminations had been the result of intentional discrimination. The District Court agreed and ordered reinstatement. Berkman v. City of New York, 580 F.Supp. 226 (E.D.N.Y.1983).

The defendants disclosed the results of Exam 1162 in May 1984. Of the 31,421 applicants who took the written test, 29,113 achieved a passing score of at least 70. The passing rates were: men, 98.65 percent; women, 97.8 percent. The scores were bunched at the high end: 83.21 percent of the applicants scored 90 or higher, 66.68 percent scored 94 or higher, and 9,788 applicants scored 98 or higher. Of the 28,559 men who passed the written test, 22,255 (77.93%) took the physical test; of the 554 women who passed the written test, 165 (29.78%) took the physical test. The passing rates on the physical test were: men, 95.42 percent; women, 46.67 percent. The distribution of scores on the physical test was as follows:

Score Males Females
100 600 0
95 6180 0
90 8529 7
85 3982 19
80 1325 18
75 451 25
70 169 8
below 70 1019 88
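
The passing rates quoted above follow directly from this table, as the short check below shows. The variable and function names are illustrative; the counts are those reported in the table.

```python
# score -> (males, females), taken from the table above
PASSING = {
    100: (600, 0),
    95: (6180, 0),
    90: (8529, 7),
    85: (3982, 19),
    80: (1325, 18),
    75: (451, 25),
    70: (169, 8),
}
FAILING = (1019, 88)  # scored below 70

def pass_rate(column: int) -> float:
    """Percentage of test-takers in a column (0 = males, 1 = females) who passed."""
    passed = sum(counts[column] for counts in PASSING.values())
    total = passed + FAILING[column]
    return round(100 * passed / total, 2)

def count_at_or_above(score: int) -> int:
    """Number of applicants of either sex scoring at least `score`."""
    return sum(m + f for s, (m, f) in PASSING.items() if s >= score)

print(pass_rate(0))           # men
print(pass_rate(1))           # women
print(count_at_or_above(90))  # applicants scoring 90 or higher
```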

In June 1984 defendants disclosed a proposed eligibility list based on an equal weighting of the scores on the written and physical tests. The District Court estimated that approximately 2,800 applicants will be hired as firefighters during the four-year life of the eligibility list, see N.Y.Civ.Serv.L. § 56 (McKinney 1983), and that approximately 6,500 applicants must be offered positions to obtain the needed 2,800. The 6,500 highest ranking applicants received a combined score of 94.5 or better. Only two women are in this group, which includes all applicants with any prospect of being hired as firefighters from this list.

In response to a motion by the plaintiff, the District Court issued an order on June 29, 1984, enjoining use of the proposed eligibility list “until this Court has made a determination that Exam 1162 is valid and does not discriminate against women.” An exception was made for interim hiring. The plaintiff reports that, as of March 1986, 850 firefighters had been hired from the proposed list of eligibles, all of whom are male.

Between January and June 1985 the District Court conducted hearings on the validity of Exam 1162. The defendants sought to demonstrate the validity of the physical test of Exam 1162 on the basis of both content validity and criterion-related validity. Content validity concerns the measurement of knowledge or abilities needed for successful job performance. Criterion-related validity concerns the identification of criteria that reflect successful job performance and a determination of the extent to which test scores correlate with the meeting of such criteria. See Uniform Guidelines on Employee Selection Procedures (1978) of the Equal Employment Opportunity Commission, 29 C.F.R. § 1607.5(B), .14 (1986). The criterion-related validation was based on a concurrent validation study, which compared test scores with job performances of a sample of 133 incumbent firefighters, 104 males and 29 females. Defendants’ expert testified that his analysis showed a high degree of correlation for both males and females between physical test scores and job performance criteria.

The plaintiff’s experts challenged the validity of Exam 1162 essentially on two grounds. First, they contended that the physical test measured a candidate’s anaerobic energy system and ignored the aerobic system. Anaerobic energy is expended in using strength and speed for short intervals of time, usually less than five minutes. Aerobic energy is expended during physical exertion over prolonged periods of time. Weight-lifters and sprinters use primarily anaerobic energy; long-distance runners use primarily aerobic energy. In the prior stage of this litigation, when the District Court invalidated Exam 3040, the physical portion of that exam had been criticized for testing only anaerobic energy, disregarding the fact that successful firefighting frequently requires paced exertion over several hours of activity. 536 F.Supp. at 207, 212. Plaintiff complained that Exam 1162 perpetuated this deficiency by requiring each of the two sets of physical tasks to be performed within 90 seconds in order to produce a score high enough to afford a candidate a realistic chance of being hired. Failure to test for stamina, it was urged, neglected a characteristic important for successful job performance and also slanted the scores adversely to women, who tend to compare more unfavorably with men in regard to anaerobic energy than aerobic energy. Plaintiff’s evidence indicated, for example, that men tend to run 40 percent farther than women in runs lasting two minutes, but only 14 percent farther in runs lasting ten minutes.

Second, plaintiff’s experts challenged the use of scores from the written portion of Exam 1162. Noting that these scores were bunched at the high end, they contended that, because the written test was too easy, it did not provide sufficient differentiation among applicants. As a result, ranking on the proposed eligibility list was determined primarily by the results of the physical test, despite the fact that Fire Department officials had rated mental and physical abilities of equal importance for successful job performance.

In the course of presenting testimony challenging Exam 1162, one of plaintiff’s experts suggested a way to rescore the results of the physical test by grouping the raw scores into three scoring bands, instead of the seven used by the defendants. The highest band included all who finished in less than four minutes 30 seconds; the second, those finishing between four minutes 31 seconds and six minutes; and the third, those finishing in more than six minutes one second but less than seven minutes. Rescoring of the physical test in three bands produced the following distribution:

Score Males Females
95 and 100 6,780 0
80, 85, and 90 13,836 44
70 and 75 620 33

For purposes of combining physical test scores with written test scores, the expert suggested assigning the top band a score of 100, the second band, 85, and the third band, 70. The expert urged that a three-band scoring would predict job performance as well as the seven-band scoring and would produce less adverse impact on women. Subsequently, the expert made it clear that his three-band proposal for the physical test scores would reduce adverse impact on women only if these scores were then combined with a computer-generated “normal distribution” of passing scores on the written test, randomly assigned to those who passed the written test.

On October 8, 1985, the District Court issued an opinion and order concerning the validity of Exam 1162. 626 F.Supp. 591 (E.D.N.Y.1985). Judge Sifton noted his prior criticism of Exam 3040 for its undue emphasis on anaerobic energy performance and concluded that in devising Exam 1162, “defendants failed lamentably to establish a basis for the emphasis placed on maximal strength and speed,” id. at 598, and “ignored not only this Court’s prior findings concerning the role of aerobic energy in performing firefighting functions but also the recommendations of the same expert whose Philadelphia evaluators appear to have been the principal source of both the qualifying exam and Exam 1162 that the test last 5 to 10 minutes without a recovery period,” id. at 599 (footnote omitted). Nevertheless, the Court concluded that Exam 1162 “comports in general with the requirements of this Court’s decision of March 1982 with the exception of its scoring which purports to distinguish between qualified candidates to a degree of exactness not consistent with the lack of precision inherent in the exam.” Id. at 593.

To remedy the scoring deficiencies, Judge Sifton directed three changes. First, he required that the physical test be scored in three bands. This system, he concluded, shows “greater validity” than the seven-band system and “appears required by defendants’ own criterion measures as well as by the failure of defendants in their job analysis and test preparation to give due consideration to the demands made on aerobic energy in performing firefighting tasks,” id. at 600 (footnote omitted). Second, Judge Sifton devised a remedy to deal with the fact that the percentage of those who passed the written test and went on to take the physical test was much less for women than for men. The District Judge attributed this fall-off in interest to continued discrimination within the Fire Department as evidenced by the well-publicized efforts of the Department to discharge the plaintiff and another female probationary firefighter in September 1983. On the assumption that in the absence of the deterrent effect of the attempted firings, the percentage of those taking the physical test after passing the written test would have been the same for women as for men, Judge Sifton estimated that 432 women would have taken the physical test, instead of the 165 who did so, or 2.62 times as many. Id. at 600-01. On the further assumptions that, if 432 women had taken the physical test, they would have passed at the same rate as the 77 women who took and passed the test and that all women who would have passed would have achieved a distribution of scores similar to those of the 77 who passed the test, Judge Sifton ordered that each woman on the eligibility list should be afforded an increased opportunity to be hired ahead of an equally ranked male. Id. at 601. The increased opportunity was to be achieved by use of a “compensation ratio” of 2.62 to 1. 
Third, to lessen the undue differentiating power of the physical test scores because of the bunching of the written test scores, Judge Sifton directed the parties to explore the validity and impact on women of rescoring the written test on either a pass/fail basis or in three scoring bands.
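
The arithmetic behind the 2.62 figure can be reproduced from the counts reported earlier in the opinion. The sketch below is illustrative only; the variable names are assumptions, and the rounding steps are inferred from the figures the District Court reported rather than stated in its opinion.

```python
# Counts reported in the opinion for Exam 1162
men_passed_written = 28_559
men_took_physical = 22_255
women_passed_written = 554
women_took_physical = 165

# Share of men who went on to take the physical test after passing the written test
men_return_rate = men_took_physical / men_passed_written

# Women expected to take the physical test had they returned at the men's rate
expected_women = round(women_passed_written * men_return_rate)

# Ratio of expected to actual women test-takers
compensation_ratio = round(expected_women / women_took_physical, 2)

print(expected_women)      # estimated women test-takers
print(compensation_ratio)  # the "compensation ratio"
```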

Each side presented entirely different proposals for a final order in response to the October 8 ruling. The defendants presented evidence showing that use of a three-band scoring system for the physical test would have a more adverse effect on women than the seven-band system. Although only two women would be reached on the eligibility list for hiring with either scoring system, they would be reached later with use of the three-band system. Defendants therefore urged the District Court to permit use of the eligibility list as proposed. The plaintiff took the position that the Court’s findings concerning the deficiencies in Exam 1162 required an extensive rescoring remedy. She proposed random selection from among all applicants who achieved a passing score on both the written and physical tests, with a female applicant accorded an increased opportunity to be hired over a male applicant in the ratio of 2.62 to 1. Alternatively, she proposed that rank-ordering of candidates be permitted provided that male and female applicants were hired in the same proportion as would result from random selection of those who achieved passing scores. The plaintiff also offered a third and fourth alternative to be used in the event that the District Court was satisfied that the physical test had sufficient criterion-related validity to permit its use. The third alternative was to administer a new written test, presumably one with sufficient difficulty to produce a broader spread of test scores, and combine the scores on such a test with the scores from the physical test grouped in three bands. The fourth alternative was to generate by computer a “normal distribution” of passing scores for the written test, assign these scores randomly to all candidates who passed the written and physical tests, and then combine these assigned scores with the scores from the physical test grouped in three bands.

On February 14, 1986, the District Court issued a final order concerning Exam 1162. This order requires three changes in the scoring of Exam 1162 and the use of its results. First, the physical test is to be scored in three bands. Judge Sifton accepted the three bands as described in the testimony of one of plaintiff’s experts, placing scores of 100 and 95 in band A, scores of 90, 85, and 80 in band B, and scores of 75 and 70 in band C. However, the District Judge ordered that the scores to be combined with the written test scores would be 95.4 for candidates in band A, 87.6 for candidates in band B, and 73.6 for candidates in band C. Second, a “normal distribution” of test scores is to be computer generated for the written test and randomly assigned to all candidates who passed the written and physical tests. Third, the District Judge required use of the 2.62 compensation ratio outlined in the October opinion. The ratio is to be applied once a new rank-ordering of candidates has been compiled using the combined scores resulting from the random assignment of computer-generated written scores weighted equally with the three-band scoring of the physical test. From such a list the defendants are to hire women and men as if there were 2.62 times as many women as actually appear at each level of the combined score. The compensation ratio gives each woman at any given score an increased chance of being selected over male candidates at that same score, though it does not require that any woman be hired ahead of any man with a higher score.
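
One way to picture the effect of the compensation ratio at a single score level is the sketch below. It rests on an assumption the order does not state, namely that selection among equally scored candidates is random; the function name and parameters are hypothetical.

```python
def chance_next_hire_is_woman(women: int, men: int, ratio: float = 2.62) -> float:
    """Probability that the next candidate drawn at one score level is a woman,
    counting each woman as `ratio` candidates.

    Assumes random selection within the level, which is an illustrative
    reading of the order and not a procedure described in the opinion.
    """
    weighted_women = ratio * women
    return weighted_women / (weighted_women + men)

# With one woman and one man tied at the same combined score:
print(chance_next_hire_is_woman(1, 1, ratio=1.0))  # no adjustment
print(chance_next_hire_is_woman(1, 1))             # with the 2.62 ratio
```

Consistent with the order, the adjustment in this sketch operates only among candidates at the same score; it never moves a woman ahead of a man with a higher score.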

On the main appeal, the defendants challenge all three changes ordered by the District Court. On the cross-appeal, the plaintiff contends that the District Court should have required new written and physical tests, or at least should have required that the existing tests be used only for random selection from among all candidates who achieved passing scores.

Discussion

It will be useful to begin with consideration of plaintiff’s cross-appeal, since if she is correct that the physical and written tests of Exam 1162 may not be used at all, there would be no need to consider the specific adjustments to the scoring of these tests ordered by the District Court. With respect to the written test, plaintiff has made no showing that would justify our rejection of this component of the exam. The written test was not claimed to have an adverse impact on women. Nor was it claimed to test for knowledge insufficiently related to job performance. Plaintiff’s only complaint about the written test is that it is too easy. We will return to that claim in considering the defendants’ objections to the scoring adjustments ordered by the District Court. At this point it suffices to note that the content of the written test has not been shown to be vulnerable on any ground cognizable under Title VII.

Plaintiff’s challenge to the physical test is somewhat more substantial. She contends that the physical test has a demonstrably adverse impact on women and that it has not been adequately validated to justify its use. Her primary complaint is that the test measures an applicant’s anaerobic energy system, as applied to firefighters’ tasks, and substantially ignores the aerobic energy system. Assessment of the latter, she alleges, would afford female applicants an improved opportunity to achieve higher scores compared to those of male applicants. In considering this argument, we note first that, though the District Court expressed some criticism of the physical test for inadequate assessment of aerobic energy capacity, the Court nonetheless upheld the validity of the test. Having reviewed the record and the decision of the District Court made on the basis of that record, we are not persuaded that the decision upholding validity was erroneous. The test was carefully constructed after a detailed consideration of firefighters’ tasks. Substantial testimony before the District Judge supported validity on the basis of both content validation and criterion-related validation. Though plaintiff finds fault with the methods whereby the defendants demonstrated both content and criterion-related validity, there is an insufficient basis to disregard the District Court’s conclusion upholding the validation of the test.

We do not doubt the plaintiff’s basic point that stamina, a function of a person’s aerobic energy system, is important in the performance of a firefighter’s tasks. The evidence of senior officials of the Fire Department acknowledged that stamina was an important attribute for successful job performance. It does not follow, however, that a physical test of the ability to perform simulated job tasks of firefighters, without a specific measurement of stamina, lacks validity to a degree that renders it vulnerable to a Title VII challenge. Obviously, firefighters frequently face situations where their anaerobic abilities determine whether or not they will save the lives of fire victims. The firefighter arriving on the scene of a fire will frequently be obliged to use strength and speed in a short amount of time. Abundant evidence in the record supports this point, which in any event would be self-evident. It may well be that the effectiveness of a person with minimal stamina will decline if called upon to perform firefighting tasks over a considerable period of time. Perhaps a person with greater stamina would perform the tasks better after protracted activity than the firefighter who might excel in the first few minutes of activity. But the Fire Department is entitled to select those who are endowed with the physical abilities to act effectively in the first moments of arrival at a fire scene, where immediate speed and strength literally concern matters of life and death. If a person with limited stamina tires during the course of firefighting duties, that person can be replaced with a fresh firefighter. However, if the first firefighters on the scene are deficient in the speed and strength necessary to handle their tasks, those in need of immediate rescue will not be comforted by the fact that those first on the scene might be able to sustain their modest energy levels for a prolonged period of time. See Spurlock v. United Airlines, Inc., 475 F.2d 216, 219 (10th Cir.1972) (employer’s burden to justify employment criteria correspondingly lighter where “human risks involved”).

In an ideal world, a fire department might first select those applicants with a high degree of speed and strength and from that group make a second selection of those with relatively greater stamina. There is nothing in this record, however, to show that such a selection process would have a less adverse effect upon women. Indeed, since only seven women placed in the top 15,316 applicants on the physical test, which primarily measured speed and strength in the performance of firefighters’ tasks, a further selection from among these applicants, giving priority to those with relatively greater stamina, would at most have placed only these seven women applicants somewhat higher on the eligibility list, an outcome by no means certain.

In sum, the District Court’s conclusion that the written and physical tests of Exam 1162 are appropriate for use to select entry-level firefighters is entitled to be approved.

We turn then to the defendants’ challenges to the three changes in scoring ordered by the District Court. The first adjustment — replacement of the seven-band system for scoring the physical test with a three-band system — is fatally flawed. In the first place, the defendants have demonstrated, without contradiction by the plaintiff, that the three-band scoring system does not advance any objective of Title VII: it neither enhances the validity of the physical test nor reduces the adverse effect upon women applicants. In fact, it operates to the detriment of both test validity and women applicants. As we pointed out above, speed in the performance of a firefighter’s task is highly relevant to successful job performance; collapsing the seven bands of test completion times into three bands serves only to oblige the defendants to select some applicants ahead of others who have demonstrated the capacity to handle firefighter tasks more swiftly. From the standpoint of women applicants, the three-band system does not place even one additional woman applicant high enough on the eligibility list to have any prospect of being reached for selection and in fact postpones the time when the two women high enough to be selected will be appointed. Plainly, this remedy is unwarranted.

The second adjustment — rescoring the written test — arises from a trilemma faced by the District Court arising from two undisputed facts. The Fire Department officials evaluated both cognitive and physical skills as important for successful job performance. In addition, scores on the written test were bunched at the high end, according the physical test more differentiating power in the ultimate selection of candidates than the written test. Since the defendants were entitled to use the distribution of scores on the physical test in the selection of applicants, there were essentially three possibilities for use of the written test. The first was the plaintiff’s preference for a new written test of greater difficulty, which would produce a broader distribution of scores reflecting the range of cognitive abilities of those taking a more difficult test. The second was rescoring the written test to eliminate bunching of scores at the high end; of various techniques available, the District Court chose generating by computer a normal distribution of passing test scores and assigning such scores randomly to all who passed the written and physical tests. The third was the defendants’ preference to leave the written scores unadjusted.

Each approach has some deficiency. Use of a more difficult written exam would encounter the substantial argument that cognitive abilities have been differentiated to a degree greater than that required for successful job performance. See Vulcan Society v. Civil Service Commission, 360 F.Supp. 1265, 1276 (S.D.N.Y.), aff'd in part, remanded in part, 490 F.2d 387 (2d Cir.1973). Though cognitive ability is important for successful performance as a firefighter, it does not follow that extremely high degrees of cognitive ability that might be measured by a difficult written test will provide a basis for selecting more competent firefighters. Use of the District Court’s remedy encounters a different problem. Those who scored significantly better on the written test than other applicants and thereby demonstrated somewhat better cognitive ability within a range of abilities appropriate for hiring selection will be deprived of the competitive advantage they earned. Moreover, use of a normal distribution of scores on the written test may give this test more differentiating power than the physical test, since scores on the latter did not follow a normal distribution pattern. Use of the defendants’ solution also is not without a deficiency. Leaving the written scores unadjusted accords enhanced differentiating power to the scores of the physical test.

Though the facts created a trilemma, they did not warrant the District Court’s remedy of random assignment of written test scores based on a computer-generated normal distribution of scores. That remedy unfairly deprives many male and female applicants of the enhanced opportunity they achieved by scoring comparatively better on the written test than other applicants. Moreover, it burdens the Fire Department with the prospect of hiring some applicants who, though achieving passing scores on the written test, ranked below other applicants and thereby demonstrated a lesser degree of cognitive ability. Though the one-point differences at the high end of the score distribution probably lack significance, see Guardians Ass’n v. Civil Service Commission, 630 F.2d 79, 100-05 (2d Cir.1980), cert. denied, 452 U.S. 940, 101 S.Ct. 3083, 69 L.Ed.2d 954 (1981), the distribution of written scores used by the defendants spans scores throughout the range from 70 to 100. Even though scores were concentrated at the high end, whatever differentiating power the written test has could not be eliminated unless substantially justified to avoid noncompliance with Title VII. Such was not the case. The deficiency the District Court sought to avoid was according greater differentiating power to the physical test than to the written test. Though that outcome may have placed male applicants higher on the eligibility list than they would have been had a normal distribution of written test scores been used, this consequence did not impair the validity of the physical test nor that of Exam 1162 as a whole. As discussed above, the defendants have an entirely legitimate interest — a job-related interest — in according priority in hiring to those with the demonstrated ability to perform firefighting tasks speedily.
Indeed, the defendants would have been entitled, had they chosen, to score the written test solely on a pass/fail basis, using its results only as a threshold to identify the group from which applicants would then be selected on the basis of job-related physical abilities. The defendants did not violate Title VII by letting the results of the written test exert some differentiating power on the final eligibility list, though less than that of the physical test.

The third scoring adjustment — use of a 2.62 compensation ratio to enhance the hiring opportunity of women at the same combined score on the eligibility list as men — is also unwarranted. Affirmative relief that accords enhanced hiring opportunities to compensate for the effects of past discrimination is available only under limited circumstances. See Local 28, Sheet Metal Workers’ Int’l Ass’n v. E.E.O.C., — U.S. —, 106 S.Ct. 3019, 3050, 92 L.Ed.2d 344 (1986) (plurality opinion); Kirkland v. New York State Department of Correctional Services, 711 F.2d 1117, 1134 (2d Cir.1983), cert. denied, 465 U.S. 1005, 104 S.Ct. 997, 79 L.Ed.2d 230 (1984). In this case, Judge Sifton adopted an affirmative remedy because he concluded that publicity surrounding the Fire Department’s attempt to discharge the plaintiff and another female probationary firefighter had deterred women who had passed the written test from returning to take the physical test. This conclusion was an inference drawn from the facts that the terminations of the two probationary firefighters occurred shortly before the September 12-13, 1983, dates during which women applicants took the physical test, the terminations received considerable publicity, and the percentage of those passing the written test who showed up to take the physical test was much lower for women than for men.

In assessing whether this inference of deterrence may support a compensating hiring ratio, we note first the numerical context in which the ratio has been imposed. Defendants’ projections indicate that, even with all of the scoring adjustments ordered by the District Court, only six women will be ranked high enough on the eligibility list to be offered appointments during the life of the list. Since we have concluded that the three-band scoring of the physical test (which aided no women) and the random assignment of a normal distribution of scores on the written test are not warranted, the number of women who will likely be offered appointment remains at two, the original estimate when the list was first assembled. Since the compensation ratio applies only to increase the hiring priority of women at the same score on the list as men, rather than to give women a higher score, its effect is extremely limited. It will only enable the two women likely to be offered appointment to join the Fire Department sooner. Thus, this is the rare case where an affirmative remedy has an extremely minimal effect. That circumstance would appear to reduce both the objection to the remedy and the need to adopt it.

Several considerations persuade us that the use of the affirmative remedy of a compensation ratio is not warranted in this case. There is no testimony from any female who passed the written test that she was deterred from taking the physical test by discriminatory conduct of the Fire Department. In this respect, the circumstances are quite unlike those that we previously found sufficient to support interim preferential hiring after Exam 3040 was invalidated. Several women testified that the difficulty of that test, later invalidated, had deterred them from taking it. Moreover, prior to any publicity concerning the two terminations, the defendants provided a special training program to assist female applicants in scoring well on the physical test. The approximately 300 women who chose not to participate in this program could not have been deterred by any publicity concerning the two terminations, since the training program began two months before the publicity. In addition, the fact that the defendants made the training program available and secured foundation funding to support it militates against a conclusion that an affirmative remedy is warranted to overcome a demonstrated hostility to female candidates. Finally, it is significant that the defendants have offered to afford female candidates who passed the written test an additional opportunity to take the physical test and have further offered to provide a training program to enhance their chances of success. If, as Judge Sifton concluded, some women were deterred from taking the physical test because of the two terminations, the defendants’ offer is a more promising remedy than the 2.62 compensation ratio. Providing training for the approximately 300 women who did not take the physical test and giving them a second chance to take the test offers at least the prospect that some may score high enough to secure a combined score that will enable them to join the only two women now high enough on the list to be appointed. This is a more useful remedy than simply advancing the time of appointment for the two women now likely to be hired.

Conclusion

For all of these reasons, we affirm the October 8, 1985, and the February 14, 1986, orders of the District Court to the extent that they uphold the validity of Exam 1162; we reverse the orders to the extent that they require three-band scoring of the physical test, random assignment of computer-generated scores for the written test, and use of a compensation ratio; the defendants may promptly issue and make appointments from an eligibility list compiled from the combined scores on the written and physical tests, without adjustments; that list shall be supplemented with the combined scores of any women who have passed the written test and accept the defendants’ offer to participate in a training program and take the physical test again.

The orders of the District Court are affirmed in part, reversed in part, and remanded for entry of a revised order consistent with this opinion. 
      
      . A "normal distribution" is a pattern of the frequencies with which data occur at points along a continuum, the pattern resembling a bell-shaped curve and characterized by few occurrences at the low and high ends (in this case, scores near 70 and 100), the highest number of occurrences at the mean (in this case, 85), and nearly 70 percent of occurrences within one standard deviation of the mean (in this case, between 80 and 90). See R.P. Runyon & A. Haber, Fundamentals of Behavioral Statistics 112-13 (3d ed. 1977).
     
      
      . The Order details the use of the 2.62 compensation ratio as follows:
      Female applicants on the list shall be afforded an increased opportunity over equally ranked males on the list to be selected pursuant to a ratio of 2.62 to 1. This ratio shall be known as the compensation ratio. To apply this compensation ratio, the defendants shall divide the number of eligible men assigned a particular score on the list by the number of eligible women receiving the same score, times the compensation ratio of 2.62. The quotient of the calculation shall determine how many men receiving a particular score on the examination may be selected before the next woman is selected. This ratio shall be known as the selection ratio. Where men and women receive the same score, a woman shall be selected first, and the number of men determined by the selection ratio shall be selected thereafter. This process shall be repeated until the score is exhausted, and the next score shall be treated in the same manner, provided there [are] also women receiving that score.
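
      By way of illustration only, the arithmetic described in the quoted Order can be sketched as follows. The sketch is not part of the Order or of this opinion. It assumes one reading of the quoted text, namely that the compensation ratio multiplies the number of women in the divisor, so that the selection ratio is men / (women x 2.62). The Order does not say how a fractional selection ratio is to be handled; rounding down is an assumption made here. The function names are hypothetical.

```python
import math


def selection_ratio(men: int, women: int, compensation: float = 2.62) -> float:
    """Men selectable before the next woman at one score (assumed reading)."""
    if women <= 0:
        # The Order applies the process only where women also hold the score.
        raise ValueError("selection ratio is defined only when women hold the score")
    return men / (women * compensation)


def selection_order(men: int, women: int, compensation: float = 2.62) -> list[str]:
    """Order of selection at a single score: 'W' for a woman, 'M' for a man."""
    # Assumption: a fractional selection ratio is rounded down.
    men_per_woman = math.floor(selection_ratio(men, women, compensation))
    order: list[str] = []
    men_left, women_left = men, women
    while women_left > 0:
        order.append("W")  # a woman is selected first
        women_left -= 1
        take = min(men_per_woman, men_left)
        order.extend(["M"] * take)  # then the men allowed by the ratio
        men_left -= take
    order.extend(["M"] * men_left)  # remaining men once no women are left
    return order


if __name__ == "__main__":
    # Hypothetical figures: 30 men and 2 women tied at the same score.
    print(round(selection_ratio(30, 2), 2))
    print("".join(selection_order(30, 2)))
```

      On these hypothetical figures, each woman at the score is followed by five men, and the remaining men at that score are reached only after both women have been selected. The ratio changes the order of selection within a single score and leaves every applicant's score as it was.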
     