
    Felix WAISOME; Freddie McMillan; Richard B. Keith; Robert L. Bethea; Ellsworth Corum, Jr.; Hillary King; Roderick W. Upshur on behalf of themselves and all those similarly situated, Plaintiffs-Appellants, v. The PORT AUTHORITY OF NEW YORK AND NEW JERSEY; the Board of Commissioners; Stephen Berger; Henry I. Degeneste; the Port Authority Police Benevolent Association, Incorporated, Defendants-Appellees.
    No. 1673, Docket 91-7213.
    United States Court of Appeals, Second Circuit.
    Argued June 27, 1991.
    Decided Nov. 19, 1991.
    
      Eric Schnapper, New York City (Julius L. Chambers, Charles Stephen Ralston, NAACP Legal Defense and Educational Fund, Inc., of counsel), for plaintiffs-appellants.
    Carlene V. McIntyre, New York City (Arthur P. Berg, Philip A. Maurer, James Begley, Milton H. Pachter, of counsel), for defendants-appellees The Port Authority of New York and New Jersey.
    Samuel A. Marcosson, Washington, D.C. (Donald R. Livingston, Gwendolyn Young Reams, Vincent J. Blackwood, E.E.O.C., of counsel), for The E.E.O.C. as amicus curiae.
    Douglas S. McDowell, Washington, D.C. (Robert E. Williams, Edward E. Potter, Garen E. Dodge, McGuiness & Williams, of counsel), for The Equal Employment Advisory Council as amicus curiae.
    Before CARDAMONE, MINER and MAHONEY, Circuit Judges.
   CARDAMONE, Circuit Judge:

This appeal is from a dismissal of a Title VII action alleging disparate impact. To prove its claim, the plaintiff class relied on statistics. Although Holmes predicted that “the man of the future is the man of statistics,” O.W. Holmes, The Path of the Law, 10 Harv.L.Rev. 457, 469 (1897), his prophecy has proved overly optimistic. Lawyers and judges working with statistical evidence generally have only a partial understanding of the selection processes they seek to model, they often have incomplete or erroneous data, and are laboring in an alien and unfamiliar terrain. Yet, the statistical evidence in this record evidences a disparity significant enough to suggest a violation of Title VII.

We must determine on this appeal whether the procedures used by the Port Authority of New York and New Jersey (Port Authority) to promote police officers to the rank of sergeant had a disparate impact on black candidates. Plaintiff class — all 64 of the black candidates seeking such promotion and all of whom participated in the promotion process (plaintiffs or appellants) — appeals from a judgment of the United States District Court for the Southern District of New York (Duffy, J.), entered January 29, 1991 granting the Port Authority’s motion for summary judgment dismissing plaintiffs’ complaint in its entirety.

Plaintiffs urge that the district court erred when it concluded the class had failed to demonstrate that the Port Authority's promotion procedures had a disparate impact on black candidates compared to white candidates sufficient to prove the existence of discrimination. We think it did, hence we remand this case to the district court for further proceedings.

FACTS

The events leading up to commencement of this suit are more fully set forth in the district court’s thorough opinion, Waisome v. Port Auth. of New York and New Jersey, 758 F.Supp. 171 (S.D.N.Y.1991), with which we assume the reader’s familiarity. We recount only those facts relevant to resolution of the issues before us. On July 11, 1986 the Port Authority announced the beginning of an examination process to establish a vertical list of officers and detectives eligible for promotion to the rank of sergeant. The list was to expire three years after it was issued. Candidates for promotion were required to be employed as police officers as of the date of the first test, have two years in grade as a Port Authority police officer including Academy training, and undergo the examination process.

That process had three steps: first, a written test designed to gauge a candidate’s knowledge of the law, of police supervision, and of social and psychological problems at work; second, those who passed the written test took an oral test designed to measure judgment and personal qualifications; candidates who succeeded on the second step proceeded to the third and final step — a performance appraisal based on a supervisory performance rating and the candidate’s attendance record.

Candidates completing the examination process were placed on an “Eligible List” by rank according to the weighted composite score achieved on the written and oral tests and on the performance appraisal. The written test accounted for 55 percent of the composite score, the oral test for 35 percent, and the performance appraisal for the final 10 percent. After the list was issued, the Port Authority promoted candidates to the rank of sergeant, as need required, starting with those achieving the highest total score, and proceeding down the list in order of rank. It was made clear that no candidates below the 120th position on the list could expect promotion.
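The weighting just described can be illustrated with a short sketch. It assumes that each of the three components is scored on a common 0 to 100 scale, which the opinion does not state explicitly; the function name and the sample scores are hypothetical.

```python
# Weights taken from the opinion: written 55%, oral 35%, performance appraisal 10%.
WEIGHTS = {"written": 0.55, "oral": 0.35, "appraisal": 0.10}


def composite_score(written, oral, appraisal):
    """Weighted composite; assumes all three components share a 0-100 scale."""
    return (WEIGHTS["written"] * written
            + WEIGHTS["oral"] * oral
            + WEIGHTS["appraisal"] * appraisal)


# Hypothetical candidate, for illustration only (not a figure from the record).
example = composite_score(written=80, oral=75, appraisal=94)
print(round(example, 2))
```

Because the written test carries more than half the weight, a gap of a few points on that component moves a candidate's rank more than an equal gap on either of the other two.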

A total of 617 candidates took part in the examination process, of whom 508 were white, 64 were black and 45 were in other groups. The passing score for the written part of the test was 66. The number passing the written examination was 539 — of whom 455 were white and 50 were black. White candidates had therefore a pass rate of 89.57 percent and black candidates 78.13 percent. The rate at which black candidates passed the written examination was 87.23 percent that of the white test-takers. The statistical measure of the difference between the pass rates is 2.68 standard deviations. The mean score of blacks on the written examination was 72.03 percent and whites scored 79.17 percent, yielding a difference of 5.0 standard deviations. The term “standard deviation,” discussed more fully later in this opinion, refers to the probability that a result is a random one.
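The pass-rate figures in this paragraph can be checked with a few lines of arithmetic. The sketch below also computes a pooled two-proportion z statistic. That is one common way to express a difference in rates as a number of standard deviations, and it is an assumption here, since the opinion does not say which method produced the 2.68 figure.

```python
import math


def pooled_z(pass_a, n_a, pass_b, n_b):
    """Two-proportion z statistic using the pooled pass rate."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Counts reported in the opinion for the written test.
white_rate = 455 / 508
black_rate = 50 / 64
impact_ratio = black_rate / white_rate
z_written = pooled_z(455, 508, 50, 64)

print(f"white pass rate: {white_rate:.1%}")
print(f"black pass rate: {black_rate:.1%}")
print(f"black rate as a share of white rate: {impact_ratio:.1%}")
print(f"difference in standard deviations: {z_written:.2f}")
```

Under that assumption the computed statistic comes out close to the 2.68 standard deviations reported above.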

All but eight of the 539 applicants, that is, 531 of those who had passed the written test went on to take the oral test. Of these, 448 were white and 49 were black. The passing score on the oral examination, 69.9 percent, was achieved by 310 candidates, including 258 whites and 33 blacks. The pass rate for white applicants was 57.58 percent and that of black applicants was 67.35 percent. Thus, the pass rate of blacks was 116.97 percent of the white pass rate. All candidates who passed the oral test underwent a performance appraisal. Also undergoing the performance appraisal were six white officers who were “grandfathered” from a pre-existing list into the pool of applicants that successfully completed the written and oral examinations. No minimum score was required for a candidate to be placed on the Eligibility List. The mean score of whites on the performance appraisal was 94.37 and that of blacks was 94.17.

The 316 candidates who underwent the performance appraisal were placed in order of their composite scores from the three steps of the examination process on the Eligibility List on March 30, 1987. Promotions were made from the top of the list moving downward, and during its three year life the 85th candidate was reached. These 85 candidates included 78 whites, 5 blacks, and 2 members of other groups. Four of the white candidates declined the promotions, two retired before being offered promotions, and two were among those grandfathered onto the list, leaving a net of 70 white officers actually promoted through the examination process. Fourteen (14) percent of all the white candidates who took the written test (70 of 501) were actually promoted. The comparable figure for black candidates was 7.9 percent (5 of 63). The success rate for the promotion of black candidates was therefore 55.52 percent of the rate for white candidates. The difference in selection rates is computed at 1.34 standard deviations.

Plaintiffs point out that 76 was the minimum score a candidate could achieve on the written component and still be within the top 85 candidates. According to plaintiffs and amicus curiae, the U.S. Equal Employment Opportunity Commission (EEOC), 42.2 percent of black candidates scored at least 76 on the written test while 71.7 percent of white candidates attained at least that score. Hence, the rate at which black candidates achieved a score of 76 was 58.9 percent of the rate at which white candidates did so. There is some confusion as to how many standard deviations this disparity amounted to, though the most accurate figure appears to be 4.77 standard deviations.

The named plaintiffs brought this class action on behalf of all 64 of the black candidates for sergeant who participated in the promotion process. Plaintiffs’ original complaint alleged the promotion procedure had a discriminatory impact on black candidates in violation of Title VII of the Civil Rights Act of 1964, § 703, 42 U.S.C. § 2000e et seq (1988). On February 26, 1987 plaintiffs filed an amended complaint adding a claim that the procedure was adopted and administered with a discriminatory motive in violation of Title VII, the Civil Rights Act of 1866, 42 U.S.C. §§ 1981 and 1983, and the Fourteenth Amendment.

Named as defendants in plaintiffs’ amended complaint were the Port Authority, its Executive Director and Board of Commissioners, the Superintendent of Police, and the Port Authority Police Benevolent Association, Inc. With the parties’ consent the district court permitted 31 non-minority candidates to intervene. Plaintiffs sought injunctive relief enjoining the Port Authority from using the Eligibility List for promotions and requiring them to use non-discriminatory procedures in the future, and affirmative relief to redress the effects of the testing procedure. Because the district court believed plaintiffs had failed to demonstrate irreparable harm, it denied their motion for injunctive relief on October 14, 1988.

The parties entered into a stipulation, approved by the court on November 21, 1989, narrowing the scope of the issues to be litigated. Under the terms of the stipulation plaintiffs agreed to drop the intentional discrimination cause of action and limit their action to the disparate impact claim, and defendants agreed that if the court found a disparate impact they would not attempt to litigate the validity of the test. Plaintiffs moved for class certification and for partial summary judgment on the issue of the Port Authority’s liability. The defendant cross-moved for summary judgment dismissing the complaint.

In its January 29, 1991 decision the district court certified the plaintiff class, denied its motion for partial summary judgment on the issue of the defendant’s liability, and granted the Port Authority’s cross-motion for summary judgment dismissing plaintiffs’ complaint. Specifically, the court concluded that the written examination did not have a disparate impact on black candidates because they passed this portion of the examination process at 87.23 percent that of white candidates, which it believed did not amount to a substantial difference. The court also disagreed with plaintiffs’ contention that the proper score at which to gauge the impact of the written examination was not the passing score, but rather the higher minimum score at which a candidate could actually be promoted. Finally, the court found there was no disparate impact in the rates at which black and white candidates were actually promoted because the difference was not statistically significant.

DISCUSSION

Plaintiffs challenge the trial court’s rulings in three respects: they assert, first it incorrectly found they had not demonstrated a disparity between the rates at which black and white candidates passed the written examination sufficient to show a Title VII violation; second, it should have considered whether there was a sufficient disparity between the rates at which black and white candidates achieved the minimum score necessary on the written examination to have had a realistic opportunity for promotion, and declare that there was such a disparity; third, the court erroneously concluded there was not a sufficient disparity between the rates at which black and white candidates were actually promoted. We address each argument in turn.

A.

We start with the statute. Section 703 of Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e-2, provides in relevant part:

(a) It shall be an unlawful employment practice for an employer—
(1) to fail or refuse to hire or to discharge any individual, or otherwise to discriminate against any individual with respect to his compensation, terms, conditions, or privileges of employment, because of such individual’s race, color, religion, sex, or national origin; or
(2) to limit, segregate, or classify his employees or applicants for employment in any way which would deprive or tend to deprive any individual of employment opportunities or otherwise adversely affect his status as an employee, because of such individual’s race, color, religion, sex, or national origin.

Title VII prohibits overt and intentional discrimination as well as discrimination resulting from employment practices that are facially neutral, but which have a “disparate impact” because they fall more harshly on a protected group than on other groups and cannot otherwise be justified. See Connecticut v. Teal, 457 U.S. 440, 446-47, 102 S.Ct. 2525, 2530-31, 73 L.Ed.2d 130 (1982); International Bhd. of Teamsters v. United States, 431 U.S. 324, 335-36 n. 15, 97 S.Ct. 1843, 1854-55 n. 15, 52 L.Ed.2d 396 (1977); Griggs v. Duke Power Co., 401 U.S. 424, 431, 91 S.Ct. 849, 853, 28 L.Ed.2d 158 (1971); Bridgeport Guardians, Inc. v. City of Bridgeport, 933 F.2d 1140, 1146 (2d Cir.), cert. denied, — U.S. -, 112 S.Ct. 337, 116 L.Ed.2d 277 (1991).

1. Proof of Disparate Impact Generally

To prove disparate impact, a plaintiff must first identify the specific employment practice he is challenging, see Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 656-57, 109 S.Ct. 2115, 2124-25, 104 L.Ed.2d 733 (1989); Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 994, 108 S.Ct. 2777, 2788, 101 L.Ed.2d 827 (1988), and then show that the practice excluded him or her, as a member of a protected group, from a job or promotion opportunity. See Watson, 487 U.S. at 994, 108 S.Ct. at 2788. Statistical evidence may be probative where it reveals a disparity so great that it cannot be accounted for by chance, see Bridgeport Guardians, 933 F.2d at 1146, or, to state it in other words, the “statistical disparities must be sufficiently substantial that they raise ... an inference of causation.” Watson, 487 U.S. at 995, 108 S.Ct. at 2789.

Here plaintiffs claim they were deprived of an employment opportunity to advance to the rank of sergeant. As will appear, they offer statistical evidence substantial enough, in certain aspects, to raise an inference of causation. Normally, a plaintiff making such a showing has stated a prima facie disparate impact claim. To avoid a finding of discriminatory impact, the employer must demonstrate the subject employment practice is used for non-discriminatory reasons, for example, the practice serves the employer’s legitimate employment goals. See Wards Cove Packing Co., 490 U.S. at 659, 109 S.Ct. at 2125-26. Plaintiff may counter proof of an employer’s lawful goal by demonstrating that there are alternative employment practices that will reduce the disparate impact, ones that are equally as effective as the challenged practices in achieving that goal, that is to say, the employer’s reason was a pretext for discrimination. Id. at 660-61, 109 S.Ct. at 2126-27. In the matter before us, the parties have stipulated these shifting burdens of proof out of the case, and our review is limited therefore to whether plaintiffs have shown that the promotion procedures used by the Port Authority had a disparate impact on black candidates. This means that if plaintiffs meet the burden of proving a prima facie case, they will prevail in this action. The rule that emerges from prior cases is that a prima facie case is made out by showing either a gross statistical disparity, or a statistically significant adverse impact coupled with other evidence of discrimination. See, e.g., International Bhd. of Teamsters, 431 U.S. at 338-40 & n. 20, 97 S.Ct. at 1856-57 n. 20; Bridgeport Guardians, 933 F.2d at 1146-48.

The district judge ruled that the written test did not have a disparate impact because the pass rate of blacks was 87.2 percent of that of whites. See EEOC Guidelines, 29 C.F.R. § 1607.4D (1990). He was of the view that though the disparity may have been statistically significant— since it yielded a standard deviation of 2.68 indicating that there was a one in a 100 chance the outcome was random — it was not meaningful as a practical matter. Judge Duffy reasoned that the results would not have been statistically important had two more black candidates passed the written test. It was on this basis he decided that the results of the written examination failed to show a disparity sufficiently substantial to state a Title VII violation. See 758 F.Supp. at 177.

Appellants argue that this finding of no disparate impact was error. Specifically, they insist the trial court incorrectly interpreted the EEOC Guidelines for determining practical significance by using a comparison of the pass rates of black and white candidates, and then employing a hypothetical that changed the statistical meaning of the disparity between those pass rates. Appellants conclude that because the disparity between the pass rates of black and white candidates amounts to a difference of 2.68 standard deviations, a disparate impact should have been found.

2. Statistical Proof of Disparate Impact

We have in the past looked to the EEOC Uniform Guidelines on Employment Selection Procedures, 29 C.F.R. § 1607.4D (adverse impact may be inferred where, assuming not too small a sample, the members of a protected group are selected at a rate that is less than four-fifths of the rate at which a more successful group is selected) (EEOC Guidelines), for guidance in determining whether a disparity is sufficiently substantial to violate Title VII, see, e.g., Bushey v. New York State Civil Serv. Comm’n, 733 F.2d 220, 225-26 (2d Cir.1984), and also have relied on standard deviation analysis for this purpose. See, e.g., Guardians Ass’n of New York City Police Dep’t, Inc. v. Civil Serv. Comm’n of the City of New York, 630 F.2d 79, 86 & n. 4 (2d Cir.1980).

The EEOC Guidelines state in relevant part:

A selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact. Smaller differences in selection rate may nevertheless constitute adverse impact, where they are significant in both statistical and practical terms....

29 C.F.R. § 1607.4D (1990). The Guidelines provide no more than “a rule of thumb” to aid in determining whether an employment practice has a disparate impact. See Watson, 487 U.S. at 995-96 n. 3, 108 S.Ct. at 2789 n. 3. Standard deviation analysis measures the probability that a result is a random deviation from the predicted result — the more standard deviations the lower the probability the result is a random one. See Ottaviani v. State Univ. of New York at New Paltz, 875 F.2d 365, 371 (2d Cir.1989); D. Baldus and J. Cole, Statistical Proof of Discrimination, § 9.03 (1980) (defining standard deviation, explaining and applying it). Social scientists consider a finding of two standard deviations significant, meaning there is about one chance in 20 that the explanation for a deviation could be random and the deviation must be accounted for by some factor other than chance. Ottaviani, 875 F.2d at 371. A finding of two or three standard deviations (one in 384 chance the result is random) is generally highly probative of discriminatory treatment. Id. at 372.
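The relationship between a number of standard deviations and the probability of a chance result can be sketched as follows. The sketch assumes a normal distribution and a two-tailed test, neither of which the opinion specifies, so the odds quoted in the text are round approximations that need not match this formula exactly.

```python
import math


def two_tailed_probability(z):
    """Chance of a deviation at least |z| standard deviations from the
    expected value, assuming a normal distribution (two-tailed)."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (1.0, 2.0, 3.0):
    p = two_tailed_probability(z)
    print(f"{z:.1f} standard deviations -> probability {p:.4f}")
```

At two standard deviations the probability is a little under 0.05, which corresponds to the "about one chance in 20" figure used by social scientists.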

There is no minimum statistical threshold requiring a mandatory finding that a plaintiff has demonstrated a violation of Title VII. Courts should take a “case-by-case approach” in judging the significance or substantiality of disparities, one that considers not only statistics but also all the surrounding facts and circumstances. Id. 372-73; see also International Bhd. of Teamsters, 431 U.S. at 340, 97 S.Ct. at 1857 (statistics “come in infinite variety and ... their usefulness depends on all of the surrounding facts and circumstances”).

Applying these principles, we believe Judge Duffy correctly held there was not a sufficiently substantial disparity in the rates at which black and white candidates passed the written examination. Plainly, evidence that the pass rate of black candidates was more than four-fifths that of white candidates is highly persuasive proof that there was not a significant disparity. See EEOC Guidelines, 29 C.F.R. § 1607.4D (1990); cf. Bushey, 733 F.2d at 225-26 (applying 80 percent rule). Additionally, though the disparity was found to be statistically significant, it was of limited magnitude, see Bilingual Bicultural Coalition on Mass Media, Inc. v. Federal Communications Comm’n, 595 F.2d 621, 642 n. 57 (D.C.Cir.1978) (Robinson, J., dissenting in part) (statistical significance tells nothing of the importance, magnitude, or practical significance of a disparity) (citing H. Blalock, Social Statistics 163 (2d ed. 1972)), as the district court demonstrated by positing that if two additional black candidates passed the written examination the disparity would no longer be of statistical importance. See 44 Fed.Reg. 11996, 11999 (March 2, 1979) (approving of use of hypothetical alterations in results of challenged employment practice to determine whether disparity was too small to find an illegal disparate impact).

These factors, considered in light of the admonition that no minimum threshold of statistical significance mandates a finding of a Title VII violation, persuade us that the district court was justified in ruling there was an insufficient showing of a disparity between the rates at which black and white candidates passed the written examination.

B.

Appellants next assert even if there was not a substantial disparity between the pass rates, the written examination still had a disparate impact. They argue that the rate at which black candidates achieved the minimum score on the written test necessary for promotion was significantly less than the rate at which white candidates obtained the minimum score. This argument was dismissed by the district court, which did not consider therefore whether plaintiffs could show such substantial disparity in the rates at which the subject groups of candidates achieved the minimum score for promotion. 758 F.Supp. at 177. We think this was error.

The written examination served two purposes in the Port Authority’s promotion process. First, it served as a pass-fail mechanism that required each candidate to obtain a passing score before moving on to the next step of the examination. Second, candidates’ scores on the written test were factored into the composite scores that were then used to compute a candidate’s rank on the Eligibility List. The trial court correctly looked to see whether there was a sufficient disparity with respect to the passing scores, since a disparate impact from a pass-fail element of a multi-part employment procedure is sufficient to show a Title VII violation. See Teal, 457 U.S. at 452, 102 S.Ct. at 2533. But because the written examination scores were also factored into the candidate’s composite or total score, the trial judge should also have examined the evidence to see if there was a disparate impact in this aspect of the test’s use.

Plaintiffs’ theory is that a candidate needed to score at least 76 on the written examination to be within the top 85 people on the Eligibility List, and even with a 76 grade an applicant had to get 100 on the remaining parts of the test to be promoted. Because, plaintiffs continue, black candidates’ grades on the written examination were bunched at the lower end of the scale—even though these candidates may have passed the test in sufficient numbers—there was no real opportunity for promotion. This they assert, is adequate evidence on which to base a finding of disparate impact.

Where a test serves dual functions, as the written examination does in the present case, evidence that the scores of members of a protected group were clustered at the low end of the grading scale— though such group members may have passed the examination in sufficient numbers—provides support for a finding that the test had a disparate impact on that group, assuming the clustering could not have occurred by chance. We held as much recently in Bridgeport Guardians. In that case plaintiffs challenged the procedures by which defendant Bridgeport Civil Service Commission determined which Bridgeport police officers to promote to the rank of sergeant. We said there that “evidence that the bunching of White candidates’ test scores at the top of the scale and of minority candidates’ test scores at the bottom could not be expected to occur by chance,” and that such proof, coupled with evidence that defendants knew prior to administering the test that it would have an adverse impact on minorities, “provided more than adequate support for the district court’s ruling that plaintiffs had established a prima facie case.” Bridgeport Guardians, 933 F.2d at 1148.

As noted, the trial court did not consider this possibility. It disposed of plaintiffs’ argument for a focus on the 76 grade by observing “that when the number of Blacks who achieved passing scores is compared to the number of Whites who achieved passing scores, no statistical disparity in selection rates is revealed.” 758 F.Supp. at 177. It is unclear whether the trial court’s statement refers to the written test or the actual promotions. Clearly, it does not address the contention that at score 76, (1) there is a standard deviation of 4.77; and (2) the percentage of blacks who scored 76 or above, 42.2 percent, is substantially less than the 71.7 percent of whites who did so and is considerably less, at 58.9 percent, than four-fifths of the white rate for achieving the same score. See 29 C.F.R. § 1607.4D (1990).
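The contrast between the two measuring points can be made concrete by applying the four-fifths rule of thumb at each. The sketch below uses only rates stated in the opinion; the function names are hypothetical, and the 0.8 threshold is the Guidelines' rule of thumb rather than a legal test.

```python
FOUR_FIFTHS = 0.8  # rule of thumb from 29 C.F.R. § 1607.4D, as quoted in the opinion


def selection_ratio(group_rate, highest_rate):
    """Group's selection rate expressed as a share of the highest group's rate."""
    return group_rate / highest_rate


def below_four_fifths(group_rate, highest_rate):
    return selection_ratio(group_rate, highest_rate) < FOUR_FIFTHS


# Written test, measured at the passing score (50 of 64 versus 455 of 508).
at_passing_score = selection_ratio(50 / 64, 455 / 508)
# Written test, measured at a score of 76 or above (42.2 versus 71.7 percent).
at_score_76 = selection_ratio(0.422, 0.717)

print(f"at passing score: {at_passing_score:.1%}, "
      f"below four-fifths: {below_four_fifths(50 / 64, 455 / 508)}")
print(f"at score of 76:   {at_score_76:.1%}, "
      f"below four-fifths: {below_four_fifths(0.422, 0.717)}")
```

The same examination clears the four-fifths benchmark at the passing score and falls well short of it at a score of 76, which is why the choice of measuring point matters.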

In addition to the disproportionate bunching, there is other nonstatistical evidence that could indicate a discriminatory disparate impact may be present. In April 1988, the Port Authority Police Department’s Manager of Police Planning and Administration stated that “at the outside maximum [the Port Authority] will not reach any further than the 120th individual on the list before it expires.” This statement was too generous, since during the three-year life of the Eligibility List only the 85th candidate was reached. And further fact-finding, foregone up to this point, might reveal additional evidence bearing on the issue of disparate impact. In any event, all relevant evidence should be considered in addressing this issue.

Moreover, our prior case law lends support to the use of the 76 standard. Where a written test served, as here, both as a passing “gate” to further consideration for promotion, and as a major component of the ultimate score required for promotion, we indicated there was no disparate impact in the pass rate, but the disparity in actual promotions established that the written test had a prohibited disparate impact. See Kirkland v. New York State Dep’t of Correctional Servs., 711 F.2d 1117, 1122 & n. 3, 1131-32 & n. 17 (2d Cir.1983). In Kirkland, as in the present case, evidence demonstrated that, though there was no disparity in the rate at which minority candidates for promotion passed an examination, their representation on the eligibility list was disproportionately low at the top of the list and high at its bottom. Id. at 1122, 1131-32. Hence, remand is required for the district court to develop a full record against which to evaluate the evidence of bunching and to determine whether the written examination had a disparate impact when these statistics and all the surrounding facts and circumstances are considered. See Watson, 487 U.S. at 995-96 n. 3, 108 S.Ct. at 2789 n. 3.

The Port Authority’s argument that any disparity at this level was corrected by the success of black candidates on the other two components of the examination and does not show up therefore in the number of promotions actually made is unavailing. An employer may not rely on a bottom-line defense, by that we mean that though parts of the employer’s hiring or promotion procedures adversely affected members of a protected group, it may not successfully argue that a sufficient number of protected group members were nevertheless hired or promoted so as to refute proof of disparate impact derived from the way in which the examination process itself was designed. See Wards Cove Packing Co., 490 U.S. at 653 n. 8, 109 S.Ct. at 2129 n. 8; Teal, 457 U.S. at 453-56, 102 S.Ct. at 2534-35. Consequently, even were it true that there was not a sufficiently substantial disparity in the respective promotion rates of black and white candidates — a question decided in a moment — that fact would not serve to overcome a finding that an aspect of the examination’s procedures had a disparate impact.

C.

Appellants’ final argument is that the district court erred in finding no disparate impact in the promotion rates of black and white candidates. The promotion rate of black candidates participating in the entire promotion process (5 of 63, or 7.93 percent) was 55.52 percent of the comparable rate for white candidates (70 of 499, or 14.03 percent), considerably lower than the 80 percent figure recommended as a rule of thumb by the EEOC guidelines. Because only a small number of promotions were made, the district court believed it necessary to apply standard deviation analysis to determine whether there was a disparate impact and, since the difference in promotion rates was 1.34 standard deviations, concluded there was not a sufficiently substantial disparity in the rate at which black and white candidates were promoted. 758 F.Supp. at 178.

Its conclusion is based entirely on its determination that, due to the small number of promotions actually made from the Eligibility List, the disparity between the black and white promotion rates was not statistically significant. To the contrary, as discussed earlier, disparate impact was shown because the disparity in the scores of white and black applicants on the written examination caused disparate promotion rates and because the black promotion rate that was less than three-fifths of the white promotion rate indicates disparity significant in a practical sense.

The district court decided that the difference in promotion rates of less than two standard deviations was conclusive of a failure to reliably demonstrate a disparate impact. In fact, in this instance the statistical significance of the disparity in promotion rates was largely irrelevant. The purpose of analyzing data for statistical significance is to show the likelihood that an observed disparity is due simply to chance, rather than some other factor, such as race. See Ottaviani, 875 F.2d at 371; Sobel v. Yeshiva Univ., 839 F.2d 18, 36 (2d Cir.1988), cert. denied, 490 U.S. 1105, 109 S.Ct. 3154, 104 L.Ed.2d 1018 (1989); cf. Castaneda v. Partida, 430 U.S. 482, 496 n. 17, 97 S.Ct. 1272, 1281 n. 17, 51 L.Ed.2d 498 (1977). If the probability that a particular distribution could have occurred randomly is low enough, then it constitutes evidence that the distribution, and hence the disparity, was not a chance occurrence.

But the district court in effect ruled that the statistically insignificant disparity in promotion rates was probative evidence of the absence of a correlation between race and promotion. Where statistics are based on a relatively small number of occurrences, however, the presence or absence of statistical significance is not a reliable indicator of disparate impact. See International Bhd. of Teamsters, 431 U.S. at 339-40 n. 20, 97 S.Ct. at 1856-57 n. 20; Mayor of Philadelphia v. Educational Equality League, 415 U.S. 605, 620-21, 94 S.Ct. 1323, 1333, 39 L.Ed.2d 630 (1974); 44 Fed.Reg. 11996, 11999 (March 2, 1979). For smaller samples the value of the standard deviation rule drops off, see Baldus & Cole, supra, § 9.1, and gives emphasis to Mark Twain’s comment that there are “lies, damned lies and statistics.”
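The dependence of statistical significance on sample size can be illustrated with the promotion figures themselves. The sketch below assumes a pooled two-proportion z statistic, which the opinion does not identify as the method used, and compares the actual counts with a purely hypothetical pool ten times larger in which the promotion rates are identical.

```python
import math


def pooled_z(sel_a, n_a, sel_b, n_b):
    """Two-proportion z statistic using the pooled selection rate."""
    p_a, p_b = sel_a / n_a, sel_b / n_b
    pooled = (sel_a + sel_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Promotion counts from the opinion: 70 of 499 white and 5 of 63 black candidates.
z_actual = pooled_z(70, 499, 5, 63)

# Hypothetical only: the same rates with ten times as many candidates.
SCALE = 10
z_scaled = pooled_z(70 * SCALE, 499 * SCALE, 5 * SCALE, 63 * SCALE)

print(f"actual counts:      {z_actual:.2f} standard deviations")
print(f"hypothetical x{SCALE}:   {z_scaled:.2f} standard deviations")
```

The rates, and therefore the practical size of the disparity, are unchanged between the two cases; only the number of observations differs. Under the stated assumption the actual counts fall below two standard deviations while the enlarged hypothetical pool lies well above it.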

It is for this reason that, in cases involving small or marginal samples, other indicia raising an inference of discrimination must be examined. See, e.g., Segar v. Smith, 738 F.2d 1249, 1283-84 (D.C.Cir.1984) (where failure to show significance of disparity in one set of promotions is due to small sample size, evidence of other discrimination, including other promotion decisions, may suffice to show disparate impact), cert. denied, 471 U.S. 1115, 105 S.Ct. 2357, 86 L.Ed.2d 258 (1985); Boston Chapter, N.A.A.C.P., Inc. v. Beecher, 504 F.2d 1017, 1019-21 (1st Cir.1974) (other evidence of discrimination may supplement statistical evidence where small sample size precludes showing of significant disparity), cert. denied, 421 U.S. 910, 95 S.Ct. 1561, 43 L.Ed.2d 775 (1975).

Here, other evidence points unmistakably toward the conclusion that the observed disparities in promotion rates were related to the candidates’ race. The plaintiffs were able to point to a specific element of the promotion process, the written test, and show—using in that aspect of the case a sufficiently large sample size—that it resulted in a statistically significant disparity. There is no doubt that the result of that test, apart from its screening out some candidates, was to cause blacks to rank lower on the eligibility list than whites. This supports the inference that the lack of statistical significance in the ultimate promotion figures reflects only the small sample size. Further, the black promotion rate of 7.9 percent was such a small percentage of the white promotion rate of 14 percent that it falls far short of the four-fifths rate, which the Commission’s Guidelines state will generally be regarded as evidence of disparate impact. 29 C.F.R. § 1607.4D (1990).

Thus, we think it clear that the plaintiffs sufficiently demonstrated disparate impact to escape summary judgment dismissing their complaint. Measured at the proper point, the written examination showed a disparity well above the “two to three” standard deviation threshold and the “four-fifths” rule set forth in the Guidelines and sufficient to support an inference of a disparity due to race. Consequently, plaintiffs properly alleged disparate impact in the ultimate promotion decisions made by the Port Authority, and further that this difference was attributable to the written exam, not to random chance.

CONCLUSION

Accordingly, the judgment is reversed and remanded for further proceedings consistent with this opinion.  