
    Roger Anciel FUDGE, Plaintiff, Appellee, v. CITY OF PROVIDENCE FIRE DEPARTMENT, et al., Defendants, Appellants. Roger Anciel FUDGE, Plaintiff, Appellant, v. CITY OF PROVIDENCE FIRE DEPARTMENT, et al., Defendants, Appellees.
    Nos. 83-1624, 83-1650.
    United States Court of Appeals, First Circuit.
    Argued June 7, 1984.
    June 28, 1985.
    Breyer, J., concurred and filed opinion.
    
      Gerard McG. DeCelles, Providence, R.I., with whom Elizabeth M. Emma, Providence, R.I., was on brief, for City of Providence Fire Department, et al.
    Walter R. Stone, Providence, R.I., with whom Stone, Clifton & Clifton, Providence, R.I., was on brief, for Roger Anciel Fudge.
    Before BOWNES and BREYER, Circuit Judges, and DOYLE, Senior District Judge.
    
      
       Of the Western District of Wisconsin, sitting by designation.
    
   JAMES E. DOYLE, Senior District Judge.

Fudge, who is black, brought this action against the City of Providence Fire Department (Department), the Division of Training (Academy), and Chief Michael Moise (Chief), alleging the defendants had engaged in discriminatory testing in their hiring procedures in 1974, in violation of 42 U.S.C. §§ 1981, 1983, 1985, and 1988, and of Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq. The City of Providence was added as a defendant.

The case was tried simultaneously to a jury on all but the Title VII claim and to the court on the Title VII claim. Although the jury found the written examination administered by the Department had had a disparate and adverse impact on blacks, it also found none of the defendants had harbored a racially discriminatory purpose. The non-Title VII claims were dismissed on their merits. Neither party appeals from this disposition.

The district court held the defendant City had violated Title VII and it awarded Fudge back-pay in the amount of $8,666, and attorneys fees and costs totalling $12,274.50. Defendant City appeals the judgment against it for back-pay under Title VII and also the award of attorneys fees. Fudge appeals that portion of the judgment limiting back-pay to a period ending in 1978.

The district court found explicitly that a certain written examination used in the 1974 hiring procedures had a disparate and adverse impact on black applicants. It held plaintiff had thus established a prima facie case of employment discrimination. It held defendant had not met the burden, then falling to it, to show that the written examination had “a manifest relationship to the employment in question.” Griggs v. Duke Power Co., 401 U.S. 424, 434, 91 S.Ct. 849, 855, 28 L.Ed.2d 158 (1971). It concluded that defendant had violated Title VII.

FACTS AS TO DISPARATE AND ADVERSE IMPACT

(a) Facts as found by district court, pursuant to Fed.R.Civ.P. 52(a).

Plaintiff is a black resident of the city of Providence, Rhode Island. As of 1974, when plaintiff applied to be admitted to the city fire department’s fire fighter training academy, defendant was imposing a minimum requirement of a tenth grade education, and the selection procedure was based upon a composite score, out of a possible 60, in three categories: scholastic attainment, military service, and a written entrance examination.

In the scholastic attainment category, one point was awarded, up to a total of ten, for each grade completed beyond the tenth. In the military service category, a maximum of ten points was possible, depending upon criteria such as total amount of time served in the military, time served in combat areas, advancement in rank, and decorations. The aggregate of the points scored by an applicant in the two categories (a maximum of 20) was divided by 2 (with a resulting maximum of 10). A maximum of 50 points could be obtained from one’s score on a written examination. Thus, of a perfect composite score of 60 drawn from all three categories, 50 points (83%) were accounted for by the written examination.
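The scoring arithmetic described above can be sketched as follows. This is a minimal illustration, assuming the caps and the halving work exactly as the findings describe; the function name `composite_score` is hypothetical and does not come from the record. Plaintiff's 1974 figures (2 scholastic points, 10 military points, 16 on the written examination) are used as the example.

```python
from fractions import Fraction


def composite_score(scholastic_points: int, military_points: int, written_points: int) -> Fraction:
    """Hypothetical sketch of the 1974 composite score (maximum 60)."""
    scholastic = min(scholastic_points, 10)  # one point per grade beyond the tenth, capped at 10
    military = min(military_points, 10)      # military service points, capped at 10
    # The two categories (maximum 20) are added together and halved (maximum 10).
    non_written = Fraction(scholastic + military, 2)
    # The written examination contributes up to 50 points.
    return non_written + min(written_points, 50)


fudge = composite_score(2, 10, 16)
print(fudge)                     # 22
print(round(50 / 60 * 100))      # 83 (percent of the 60-point maximum from the written exam)
```

On these figures the non-written categories contribute 6 points and the written examination 16, for a composite of 22.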

Plaintiff was one of 248 applicants in 1974 for admission to the academy. He received 2 points in the scholastic attainment category and the maximum of 10 in the military service category, for an aggregate of 12 which, when divided by 2, resulted in a total of 6. At that point, he ranked 6th among the 248. On November 9, 1974, the written examination was given. Plaintiff scored 16 out of a possible 50. His ranking dropped to 195th and he was not admitted to the academy.

Of the 248 applicants in 1974, 24 were black and 224 white. Thirty were admitted to the academy, of whom one was black (4 percent of 24) and 29 were white (13 percent of 224).

In 1973, 199 applicants took a written examination for admission to the academy, of whom about 20 were black and 179 white. Forty-one were admitted to the academy, of whom one was black (5% of 20) and 40 were white (22% of 179).

In 1972, 86 applicants took a written examination, of whom about 9 were black and 75 white (apparently two may have been members of non-black minorities). Twenty were admitted to the academy, of whom none was black (0% of 9) and 20 were white (27% of 75).
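The acceptance rates quoted in the three preceding paragraphs can be recomputed year by year from the counts. A minimal sketch: the 1972 and 1973 black applicant counts are stated in the findings only as approximate ("about" 9 and 20), so those rows inherit that imprecision, and the dictionary layout is an illustrative choice. Each year is computed separately; nothing here pools the years.

```python
# (admitted, applicants) per group, as stated in the findings.
# The 1972 and 1973 black applicant counts are approximate ("about").
ADMISSIONS = {
    1972: {"black": (0, 9), "white": (20, 75)},
    1973: {"black": (1, 20), "white": (40, 179)},
    1974: {"black": (1, 24), "white": (29, 224)},
}


def acceptance_rate(admitted: int, applicants: int) -> float:
    """Share of a group's applicants who were admitted to the academy."""
    return admitted / applicants


for year, groups in sorted(ADMISSIONS.items()):
    black = acceptance_rate(*groups["black"])
    white = acceptance_rate(*groups["white"])
    print(f"{year}: black {black:.0%}, white {white:.0%}")
```

Rounded to whole percentages, these reproduce the figures in the findings (0% and 27% for 1972, 5% and 22% for 1973, 4% and 13% for 1974).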

The black applicants in 1974 had a “higher test failure rate” than white applicants on the written examination. The examination posed more of a hurdle for black applicants than for white. It had a disparate and adverse impact on black applicants.

(b) Facts not found by district court, but undisputed in the record

The written examination administered in 1972 contained 53 questions for an aggregate possible total of 50 points. Seventeen questions with an aggregate of 25¼ possible points tested knowledge of addition, subtraction, division, multiplication, percentages, fractions, decimals, square roots, and computations of areas and volumes. Eight questions with an aggregate of 6 possible points tested knowledge of definitions of words (multiple choice questions on the meaning of “posterior,” “accentuate,” “impertinence,” “cumbersome,” and “atlas”; other forms of questions on the meaning of “statute,” “decade,” and “autobiography”). Seven questions with an aggregate of 4¼ possible points tested knowledge of simple and practical physics or chemistry. One question with a weight of ¾ point tested basic knowledge of geometrical figures. Twenty questions with an aggregate of 13¾ possible points tested general knowledge (e.g., in 1972 was Communist China a member of the UN; did Eisenhower succeed Franklin Roosevelt as president; in 1972 who was vice-president of the United States, who was governor of Rhode Island, who was commander-in-chief of United States Armed Forces; what cities and towns border on the city of Providence; how many seats in the United States Senate does Rhode Island have; name of the navigable waterway connecting Atlantic and Pacific oceans; how many stars in the flag of the United States; names of five states bordering the Gulf of Mexico).

The written examination administered in 1973 consisted of 63 questions, for an aggregate possible total of 50 points. Twenty-one questions with an aggregate of 20¾ possible points tested knowledge of addition, subtraction, division, multiplication, percentages, fractions and decimals. Fourteen questions with an aggregate of 7¼ points tested knowledge of the definitions of words (multiple choice questions on the meaning of “posterior,” “incandescence,” “clientele,” “becloud,” “clique,” “equine,” and “cursory”; other forms of questions on the meaning of “alien,” “ancestor,” “abdicate,” “biography,” “inertia,” “fiction,” and “bibliography”). Thirteen questions with an aggregate of 9 possible points tested knowledge of simple and practical physics and chemistry. One question with a weight of ¾ point tested basic knowledge of geometrical figures. Fourteen questions with an aggregate of 12¼ possible points tested general knowledge (e.g., in what city the 1972 Democratic National Convention was held; in 1973 who was Adjutant General of Rhode Island; who was the last vice-president to assume the presidency while in office; by whom was the city of Providence founded; names of the five counties of the state of Rhode Island; what is the usual shape of a yield sign at an intersection; and what four countries comprised what was once known as Indo-China).

The written examination administered in 1974 to plaintiff Fudge and the other 247 applicants contained 43 questions for an aggregate possible total of 50 points. Twenty-one questions with an aggregate of 25½ possible points tested knowledge of addition, subtraction, division, multiplication, fractions, decimals, and computations of areas and volumes (described on the examination as “arithmetic skills”). Two questions with an aggregate of 2 possible points tested, by multiple choice questions, knowledge of definitions of two words: “extricated” and “radiant” (described on the examination as “word comprehension”). Eighteen questions with an aggregate of 21 possible points tested knowledge of simple and practical physics and chemistry (described on the examination as “general science” and “mechanical knowledge”). Two questions with an aggregate of 1½ possible points asked whether most exit doors in public buildings swing out or in and what is the shape of traffic stop signs (described on the examination as “observational aptitude”). No questions tested general knowledge of the sort tested by 20 questions on the 1972 examination (with an aggregate of 13¾ possible points) or by 14 questions on the 1973 examination (with an aggregate of 12¼ points).
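The point weights given for the three examinations can be checked with exact fractions. In this sketch the weights are read as mixed fractions (for example, 25¼ and 13¾); that reading is consistent with each year's categories summing to exactly 50 points. The category labels are shorthand chosen for illustration (the 1974 examination itself used labels such as "word comprehension" and "general science").

```python
from fractions import Fraction as F

# Points per category of question, as stated in the findings (mixed fractions).
WEIGHTS = {
    1972: {"arithmetic": 25 + F(1, 4), "definitions": F(6), "physics/chemistry": 4 + F(1, 4),
           "geometry": F(3, 4), "general knowledge": 13 + F(3, 4)},
    1973: {"arithmetic": 20 + F(3, 4), "definitions": 7 + F(1, 4), "physics/chemistry": F(9),
           "geometry": F(3, 4), "general knowledge": 12 + F(1, 4)},
    1974: {"arithmetic": 25 + F(1, 2), "definitions": F(2), "physics/chemistry": F(21),
           "observational": 1 + F(1, 2), "general knowledge": F(0)},
}

for year, categories in sorted(WEIGHTS.items()):
    total = sum(categories.values())  # exact arithmetic, no rounding
    general_share = categories["general knowledge"] / total
    print(f"{year}: total {total}, general knowledge {float(general_share):.1%}")
```

Each year totals 50; the general knowledge share falls from 27.5% (1972) and 24.5% (1973) to zero in 1974.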

Between 1973 and 1974, the number of points awarded for the non-written-examination component of the composite score was reduced from 20 to 10. In 1972 and 1973 this 20-point component was not confined to scholastic attainment and military service, as it was in 1974, but reached the additional factors of age, height, weight, visual acuity, “license category,” and employment experience.

The selection procedure used by defendant in 1972, 1973, and 1974, in which the written examinations were the dominant factor, had never been subjected to the validation tests embodied in the Uniform Guidelines on Employee Selection Procedures. 29 C.F.R. §§ 1607.1 et seq.

After 1974, the next occasion on which a test was administered was 1978. Consultants were retained to prepare it. The written examination, on the one hand, and, on the other, “physical agility and other areas” were weighted 50-50. Of a total of 505 applicants who took the examination, 479 were white, 18 were black, and 8 were members of other minorities. Of the 107 admitted to the training academy, 105 were white and 2 were black. Thus, of the whites taking the examination, 22% were accepted, and of the blacks taking the examination, 11% were accepted. (The record contains no further information about the content of the written examination in 1978 or about the “physical agility and other areas.”)

As of 1974-1975, of 479 employees of the City fire department, 10 or 11 were members of minorities. As of 1970 and as of 1973, blacks represented 8.9% of the total population of the City of Providence.

OPINION

I.

Plaintiff clearly made a prima facie showing that he is a member of a black minority, that he applied for admission to the training academy and thus for employment as a fire fighter, and that he was denied such admission and employment. Defendant suggests plaintiff failed to show he was qualified for admission and employment. Plaintiff showed he met the age requirement and he was under no significant physical or mental disability. It is unnecessary to linger over whether he had some obligation initially to show he would have met some reasonable set of requirements other than the set actually imposed on him by defendant, or whether, in articulating its assertedly nondiscriminatory reason for denying him admission and employment, it was defendant’s burden to present the set of requirements and then to contend that plaintiff failed to meet them. In this case defendant did articulate as its reason for rejection that in 1974 it required applicants to rank among the top 30 on the 60-point composite scoring and that plaintiff did not. However the matter is viewed, the litigative stage was clearly reached at which the burden was on the plaintiff to prove by a preponderance of the evidence that in 1974 a disparate and adverse impact was visited on blacks by defendant’s use of the 60-point composite scoring system. Albemarle Paper Co. v. Moody, 422 U.S. 405, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975). We hold that plaintiff failed to meet this burden.

II.

We have summarized above the facts found by the court, and the other facts of record, which bear on the presence or absence of disparate and adverse impact. We note now the absence of certain kinds of findings and the absence from the record of certain kinds of facts which might have supported such findings.

Plaintiff’s expert witness, Felix Lopez, a psychologist specializing in constructing employment, selection, and promotion tests, testified at some length to the following effect: Tests used in screening applicants for employment should be constructed by making initial inquiry into the functions performed in the course of the particular employment in question: in this case, by examining the functions actually performed by fire fighters employed by defendant City. Inquiry should be directed to defining a body of knowledge and skills well-related to the job functions to be performed. Some determination should be made as to the portion of that well-related body of knowledge and skills which trainees should possess as of the time they complete their training. Testing to screen applicants for admission to the training program should be directed, not to the body of knowledge and skills they possess when they apply, but to their aptitude to acquire that body of knowledge and skills in the course of training and subsequently on the job. Aptitude, not achievement, should be the focus of testing for admission to the training program.

Lopez testified that he had examined the 1974 examination but that without certain additional information which he had requested but had been told did not exist, he was unable to form an opinion as to its “validity,” defined in terms of the directness of the test’s relationship to some aspect of job performance and to the teachability of various aspects of the knowledge or skills required in that job performance. However, he was willing to express the opinions that the 1974 test was an achievement test measuring already acquired knowledge, that it was not reasonably related to the work of a fire fighter, that it overemphasized arithmetic, and that the weighting of points did not reflect the relative difficulty of the questions.

Lopez testified also that: while he “could not say there was any bias in the test per se, the outcome of the test [I] do say it had an adverse impact on the blacks”; and blacks “had less opportunity to pass the test than the whites.” He based this opinion wholly on the number and racial composition of the group of applicants who took the 1974 examination and who passed it, namely: a total of 248 applicants; 224 whites taking the exam and 30 of them being accepted; and 18 blacks taking the exam and 1 of them being accepted. Considering Lopez’ educational background, work experience, and specialty, he may have been qualified to explain the basis of his opinion about the statistical significance of these figures, but he was not requested to do so and did not.

There is no evidence from the witness Lopez or from any source bearing on whether either the form or the content of particular questions or sets of questions in the 1974 examination, or in the 1972, 1973, or 1978 examinations, might be more or less difficult for blacks generally as compared with whites generally. There is no evidence that pen and pencil examinations of this sort are more or less difficult for blacks than for whites. There is no evidence that the heavy emphasis (50 points in 1974) on the written examination as contrasted with the scholastic attainment and the military service (10 points) was more or less favorable to blacks as compared with whites. There is no evidence that the shift between 1973 and 1974 away from awarding points for age, height, weight, visual acuity, license category, and employment experience favored or disfavored blacks as compared with whites. Except for plaintiff Fudge’s 1974 examination paper, showing his answers, there is no evidence of the answers given by any of the applicants, white or black, on any of the 1972, 1973, 1974, or 1978 examinations.

The great bulk of the trial was devoted to whether the 1974 written examination was sufficiently job-related and whether its emphasis upon achievement versus aptitude was fair. There was heavy insinuation that an examination is racially discriminatory if it is not job related or if it measures achievement rather than aptitude. This proposition may be valid in some circumstances. But logic alone affords it no support. Only evidence, none of which was presented to the district court, can sustain it.

III.

Plaintiff’s proof of a disparate and adverse impact on blacks flowing from the 1974 scoring procedure consists, therefore, solely of evidence of a lower acceptance rate among black applicants than among whites. Plaintiff contends that his proof of this kind includes the 1972 and 1973 results and, for a different reason, the 1978 results, as well as the 1974 results.

The district court refrained from basing its finding upon the 1978 results. We agree. The record affords minimal information about the changes in 1978 in the scoring system and none about the content of the 1978 written examination. It is true that the acceptance rate for black applicants in 1978 (11%) was higher than in 1974, 1973, or 1972, but there is no basis for attributing this improvement to scoring changes, the nature of which is undisclosed in this record.

The district court found that the results in 1972 and 1973 lent strong support to its finding that a disparate impact, adverse to blacks, flowed from the 1974 examination. This is a finding of critical importance. If it is proper to form an amalgam of the numbers of white and black applicants in 1972, 1973 and 1974 and an amalgam of the acceptance rates in those three years, a finding for plaintiff as to his 1974 experience is facilitated. The district court made no explanation of its implicit determination that the 1972, 1973 and 1974 testing episodes were sufficiently similar to permit such lumping. Our review of the evidence on this point, summarized above, persuades us that the dissimilarities prevent lumping: (1) In 1972 and 1973, a total of 20 points was allocated to eight non-written-examination factors; in 1974, only two of these eight factors were operative and an aggregate of only 10 points was allocated to them. (2) Of the possible 50 points allocated to the written examinations in each of the three years, 13¾ were allocated to the so-called general knowledge questions in 1972, 12¼ in 1973, and 0 in 1974; and 6 points were allocated to definitions of words in 1972, 7¼ in 1973, and 2 in 1974.

The question is not whether the 1974 examination was better or worse, by some standard, than the 1972 and 1973 examinations; it is whether the 1974 test was sufficiently different to require that its impact be assessed independently. In our view, it was so different as to require such independent assessment. We hold that it was clearly erroneous for the district court to find that the 1972 and 1973 results lent support to a finding that a disparate impact was shown by the 1974 results.

IV.

Plaintiff’s proof consists solely of the different rates of acceptance for the 24 black applicants in 1974 (4%) and for the 224 white applicants that year (13%). The question is whether this evidence is sufficient to support a finding that disparate and adverse impact upon blacks flowed from the 1974 test.

The difficulty in determining the legal significance of sets of figures in the context of employment discrimination litigation is no stranger to this court. See Boston Chapter N.A.A.C.P., Inc. v. Beecher, 504 F.2d 1017 (1st Cir.1974); Castro v. Beecher, 459 F.2d 725 (1st Cir.1972). In our view, in a case involving a claim that a screening test for admission to employment imposes a disparate and adverse impact on blacks, the initial inquiry must be whether there is a discrepancy in the rate of passage as between whites and blacks. If so, an intuitive judicial judgment must next be made whether the discrepancy is substantial. In the present case, in 1974 there was a discrepancy in the rate of passage for blacks (4%, as compared to 13%), and the district court was justified in responding intuitively that the discrepancy was substantial (the 4% rate is but 31% of the 13% rate).
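The ratio described in the preceding paragraph can be written out. With the unrounded 1974 rates, the black acceptance rate is about 32 percent of the white rate; the 31 percent figure comes from dividing the rounded percentages.

```latex
\frac{1/24}{29/224} = \frac{224}{24 \times 29} = \frac{224}{696} \approx 0.322,
\qquad
\frac{4\%}{13\%} \approx 0.308
```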

However, at least in the context of a sample as small as the 1974 sample in the present case, such an intuitive response to substantiality is an insufficient basis for a finding of disparate impact. Where the use of employment tests results in differential pass rates for blacks and whites, even an apparently substantial differential, the discrepancy may be due to chance. Hameed v. Intern. Ass’n of Bridge, Etc., 637 F.2d 506 (8th Cir.1980), citing D. Baldus & J. Cole, Statistical Proof of Discrimination 288-92 (1980). Statistical significance and, in the case of so small a sample as the 1974 sample, we believe judicial significance, can be attributed to an observed discrepancy only where there is a low probability that the differential in pass rates would be expected to occur simply by chance.

The focus in Title VII cases is upon the discriminatory impact a test would have on all blacks and all whites in the relevant population. Where only sample data is available, the disparate impact observed in a single sample of individuals drawn from the relevant population and administered the exam may not justify the conclusion that the test has a discriminatory impact upon the population as a whole. For one sample given the test, the passage rate for blacks may be much lower than that of whites, while for a second sample, drawn from the same population and given the same test, the opposite result may occur. Thus, the issue is: what is the probability that the disparity in passage rates that appeared in the sample would occur by chance if in fact there would be no difference in the passage rates of blacks and whites in the relevant population?
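The sampling point made above can be illustrated with a simulation. This is a hedged sketch, assuming that a test with no disparate impact is equivalent to choosing the 30 admitted applicants at random from the 248 in the 1974 pool (24 black, 224 white); it ignores the non-written components of the composite score. The function name and the seed are arbitrary illustrative choices, not anything in the record. It estimates how often such a race-neutral process would admit one or no black applicants.

```python
import random


def share_with_at_most_one_black(trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate under a hypothetical race-neutral selection.

    Applicants 0..23 stand for the 24 black applicants and 24..247 for the
    224 white applicants; each trial admits 30 of the 248 at random.
    """
    rng = random.Random(seed)
    pool = range(248)
    hits = 0
    for _ in range(trials):
        admitted = rng.sample(pool, 30)
        black_admitted = sum(1 for applicant in admitted if applicant < 24)
        if black_admitted <= 1:
            hits += 1
    return hits / trials


print(share_with_at_most_one_black())  # roughly 0.18
```

Under this model, a result as lopsided as the 1974 outcome turns up in roughly one trial in five or six, even though no trial involves any bias.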

Widely accepted statistical techniques have been developed to determine the likelihood an observed disparity resulted from mere chance. Where a plaintiff relies exclusively on a narrow base of data, as here, it is crucial for the court to consider the possibility that chance could account for the observed disparity.

We think that in cases involving a narrow data base, the better approach is for the courts to require a showing that the disparity is statistically significant, or unlikely to have occurred by chance, applying basic statistical tests as the method of proof. Pegues v. Mississippi State Employment Service, 699 F.2d 760, 768 n. 9 (5th Cir.1983); E.E.O.C. v. Am. Nat. Bank, 652 F.2d 1176 (4th Cir.1981), reh’g denied, 680 F.2d 965 (4th Cir.), cert. denied, 459 U.S. 923, 103 S.Ct. 235, 74 L.Ed.2d 186 (1982); Hameed v. Intern. Ass’n of Bridge, Etc., 637 F.2d 506 (8th Cir.1980). When statistical tests sufficiently diminish chance as a likely explanation, it can then be presumed that an apparently substantial difference in pass rates is attributable to discriminatory bias, thus shifting the burden to defendants to show job relatedness. If the probability is sufficiently high that the disparity resulted from chance, the plaintiff must present additional evidence of disproportionate impact in order to establish a prima facie case. This test for chance would determine the competency and validity of exclusively statistical proof in a far more reliable manner than a wholly intuitive response.
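The opinion does not say which basic statistical test it has in mind. One common choice for comparing two pass rates is the two-sample test for a difference in proportions; the sketch below applies it to the 1974 counts. The function name is hypothetical, and the 1.96 cutoff (a two-sided 5 percent level) is a conventional threshold assumed here, not one stated by the court.

```python
import math


def two_proportion_z(passed_a: int, n_a: int, passed_b: int, n_b: int) -> float:
    """Z statistic for the difference between two pass rates, using a pooled rate."""
    rate_a, rate_b = passed_a / n_a, passed_b / n_b
    pooled = (passed_a + passed_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_b - rate_a) / std_err


z = two_proportion_z(1, 24, 29, 224)            # black vs. white, 1974
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2f}, significant at 5%: {abs(z) > 1.96}")
```

With these counts the statistic falls well short of 1.96. The expected number of black admissions under a race-neutral process (30 × 24/248, about 2.9) is also below the usual rule-of-thumb minimum of 5 for the normal approximation, so an exact calculation is the safer check.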

In the present case, plaintiff, who bore the burden of persuasion on the point, presented no expert opinion, nor did he request the taking of judicial notice of written sources from which accurate and ready answers could be obtained, Fed.R.Evid. 201, on the role of chance in the results from the 1974 written examination. Defendant provided none. The district court appears to have sought out none.

Because the district court relied upon data from 1972 and 1973, as well as from 1974, in making its finding of disparate and adverse impact in 1974, we do not know what finding it would have made on the basis of 1974 data alone. However, considerations of economy in judicial time and effort persuade us we should refrain from remand to permit such a determination by the district court. We are persuaded that the application of any of the more simple statistical techniques would reveal clearly that the role of chance as an explanation is far too high to permit a judicial finding that it was the content of the 1974 examination which caused the disparate rates of passage as between black and white applicants. A new finding by the district court, based solely on the 1974 data, that the content of the examination caused the disparate rates of passage would be clearly erroneous.

Our disposition of this appeal makes it unnecessary to decide defendant’s appeal from the amount of the district court’s award of attorneys fees to plaintiff or to decide plaintiff’s appeal from the district court’s refusal to award back-pay after 1978.

For the reasons stated, we reverse the judgment of the district court and direct entry of judgment dismissing this action on its merits.

BREYER, Circuit Judge

(concurring).

While I join the court’s opinion, I add the following comments. First, this case strikes me as highly unusual in that, as I read the briefs and the record below, the plaintiff 1) has challenged the 1974 test as having a “disparate impact,” see Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), but 2) must rely entirely upon the 1974 black/white accept/reject ratios in order to show that impact. This court has held that, where samples are small, other evidence of “disparate impact” may be highly relevant. Boston Chapter, N.A.A.C.P., Inc. v. Beecher, 504 F.2d 1017, 1021 (1st Cir.1974) (“what conclusively tips the scale in plaintiffs’ favor is the uncontroverted testimony, from experts ... that black and Spanish surnamed candidates typically perform more poorly on paper-and-pencil tests of this type”), cert. denied sub nom. Director of Civil Service of Massachusetts v. Boston Chapter, N.A.A.C.P., Inc., 421 U.S. 910, 95 S.Ct. 1561, 43 L.Ed.2d 775 (1975). Here, however, no such evidence was presented. Plaintiff’s expert, for example, might have testified that one could expect differences in educational opportunity to make the sorts of questions found on the 1974-type test more difficult for relevant minority job applicants; but he did not so testify. Cf. Beecher, 504 F.2d at 1021. Nor is there any evidence about whether or not this type of test or the earlier 1972 and 1973 tests may have disproportionately discouraged minority applicants from applying. Cf. Beecher, 504 F.2d at 1021 n. 6. Nor is there any evidence that blacks taking the 1974 test had a lower mean score (as opposed to a lower pass-rate) than whites. Evidence about the earlier 1972 and 1973 tests is sparse, to say the least; and there is no testimony or even argument about whether the earlier tests were similar in any relevant respect to the 1974 test. Since a reading of the tests themselves shows important differences, and since 1974 seems to have represented a transitional year between the use of tests that apparently did discriminate and the use of tests that apparently did not, one needs at least some evidence, not total silence, to infer that there are relevant (discriminatory) similarities between the two sets of tests.

Second, the 1974 test results simply will not bear the near total weight that plaintiff, plaintiff’s expert, and the district court put upon them. As Judge Doyle’s opinion points out, a perfectly fair test given to a pool of blacks and whites will not always produce results that precisely mirror the racial percentages in the pool. Indeed, virtually all the time the perfectly fair test would lead to some deviations favoring either whites or blacks. A perfectly fair 1974 test (that is, a test having no systematic disparate impact on blacks) would, if repeatedly applied to racially similar pools, yield one (or no) successful black applicant(s) out of thirty, 18 percent of the time (82 percent of the time it will yield two or more). See Shoben, “Differential Pass-Fail Rates In Employment Testing: Statistical Proof Under Title VII,” 91 Harv.L.Rev. 793, 812 (1978) (setting out probability formula). A perfectly fair coin, if flipped twice, will come up two heads 25 percent of the time. Flipped three times, the fair coin will yield three heads 12.5 percent of the time.
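The 18 percent figure can be checked with the hypergeometric distribution, which gives the exact probability of each outcome when 30 of the 248 applicants are chosen without regard to race. This is a sketch under that assumption (random selection standing in for a perfectly fair test); the function name is illustrative, and this is not necessarily the formula used in the cited article. It also reproduces the coin-flip figures and the roughly 3 percent figure for two independent administrations.

```python
from math import comb


def prob_at_most_k_black(k: int, pool: int = 248, black: int = 24, admitted: int = 30) -> float:
    """Hypergeometric P(at most k black applicants among those admitted at random)."""
    white = pool - black
    # Count the ways of admitting i black and (admitted - i) white applicants, i = 0..k.
    ways = sum(comb(black, i) * comb(white, admitted - i) for i in range(k + 1))
    return ways / comb(pool, admitted)


p_one_or_none = prob_at_most_k_black(1)
print(f"one or no black admittee: {p_one_or_none:.1%}")            # about 18%
print(f"same result twice, independently: {p_one_or_none ** 2:.1%}")
print(f"two heads: {0.5 ** 2:.1%}, three heads: {0.5 ** 3:.1%}")     # 25.0%, 12.5%
```

Squaring the single-test probability assumes the two administrations are independent, which is the assumption behind the 18 percent × 18 percent figure.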

Under these circumstances, I agree that the numbers alone in so small a sample do not show bias. Had the 1974 test been repeated, of course, one would have better evidence. A perfectly fair test would produce similar results in two instances only about 3.2 percent (18 percent × 18 percent) of the time, well below the number that statisticians, for scientific purposes, consider “significant.” But where the likelihood of pure “chance” bulks as large as it does here, I agree with the court that the plaintiff must present some reason to believe that the explanation is not “fairness plus pure chance.” In Beecher, we spelled out a few, apparently easy, ways in which this might be done. But, I also agree that virtually no such “other” evidence or reason was presented here.
      
      . The district court judgment is imprecise as to the defendant or defendants against whom it is entered. However, with only the Title VII claim at issue, the employer City and possibly the chief of the fire department, as its agent, appear to be the only defendants against whom judgment lies. We will refer to "defendant” in the singular, meaning the City as the employer.
     
      
      . It made no explicit finding that the defendant City had a discriminatory purpose to deny plaintiff employment because he is black or that it had no such racially discriminatory purpose. Despite the absence of an explicit finding on discriminatory purpose, the court’s implication is clear that plaintiff had failed to prove the presence of discriminatory purpose. Perhaps the district court intended consciously to embrace, as advisory in the Title VII case, the finding of absence of discriminatory purpose embodied in the jury's verdict on the constitutional claims. We proceed on the basis that the district court intended to reject a discriminatory treatment claim, but to accept as valid a discriminatory impact claim. Because of the slight uncertainty whether the district court consciously rejected a discriminatory treatment claim, we note that our own examination of the record persuades us that had the district court found the existence of racially discriminatory purpose, the finding would have been clearly erroneous. Fed.R.Civ.P. 52(a).
     
      
      . Although not explicit in the district court's findings, it is clear from the record and implicit in those findings that the department decided upon the absolute number of new fire fighters (30, for example) it would employ at a given time (about once a year); applicants for employment as fire fighters were initially screened for admission to a departmental training academy; the desired number (30, for example) was then selected in order of composite score in the three categories; not everyone admitted to the academy was ultimately hired as a fire fighter; those denied admission to the academy were denied employment, although it was a possibility that if one of those initially admitted to the academy dropped out or was dropped, the next person in scoring rank would be admitted. In this latter sense, neither the composite of the three categories nor the written examination itself was pass-fail, but for virtually all applicants the reality was that the composite score in the three categories was pass-fail.
     
      
      . Several of these 18 questions correspond to some questions on the 1972 and 1973 tests which we have characterized as arithmetical and one of the 18 is virtually the same as a question appearing on both the 1972 and 1973 tests which we have described as testing basic knowledge of geometrical figures.
     
      
      . Because our characterization of the categories of questions on the tests involves some subjectivity, we make explicit the following: On the 1972 test, we have characterized questions numbered 1 through 17 as arithmetical; 19, 20, 21, 22, 24, 32, 35 and 37 as definitions of words; 28, 40, 42, 45, 50, 52 and 53 as physics and chemistry; 36 as geometrical; and 18, 23, 25, 26, 27, 29, 30, 31, 33, 34, 38, 39, 41, 43, 44, 46, 47, 48, 49 and 51 as general. On the 1973 test we have characterized questions numbered 1 through 21 as arithmetical; 30, 31, 32, 33, 34, 35, 39, 40, 41, 42, 43, 44, 45 and 49 as definitions; 22, 23, 24, 47, 48, 50, 53, 54, 55, 56, 57, 58 and 59 as physics and chemistry; 52 as geometrical; and 25, 26, 27, 28, 29, 36, 37, 38, 46, 51, 60, 61, 62, and 63 as general. Those who prepared the 1974 test employed their own characterization of the categories; their characterization is apt.
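The question assignments in this footnote can be checked mechanically. The following sketch transcribes the question numbers exactly as listed above. It then counts each category and confirms that every question from 1 through the highest number appears in exactly one category. The dictionary names and the `tally` helper are illustrative, not part of the record. The totals are derived from the lists themselves, which account for questions numbered 1 through 53 (1972) and 1 through 63 (1973).

```python
TEST_1972 = {
    "arithmetical": list(range(1, 18)),
    "definitions": [19, 20, 21, 22, 24, 32, 35, 37],
    "physics and chemistry": [28, 40, 42, 45, 50, 52, 53],
    "geometrical": [36],
    "general": [18, 23, 25, 26, 27, 29, 30, 31, 33, 34, 38, 39, 41,
                43, 44, 46, 47, 48, 49, 51],
}

TEST_1973 = {
    "arithmetical": list(range(1, 22)),
    "definitions": [30, 31, 32, 33, 34, 35, 39, 40, 41, 42, 43, 44, 45, 49],
    "physics and chemistry": [22, 23, 24, 47, 48, 50, 53, 54, 55, 56, 57, 58, 59],
    "geometrical": [52],
    "general": [25, 26, 27, 28, 29, 36, 37, 38, 46, 51, 60, 61, 62, 63],
}


def tally(categories):
    """Return per-category counts, the number of questions assigned, and
    whether questions 1..N are each assigned to exactly one category."""
    counts = {name: len(qs) for name, qs in categories.items()}
    assigned = sorted(q for qs in categories.values() for q in qs)
    complete = assigned == list(range(1, len(assigned) + 1))
    return counts, len(assigned), complete
```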
     
      
      
        . The evidence is, and the district court found, that 24 blacks (not 18) took the test in 1974, and that 29 whites (not 30) were accepted. There was some confusion between the number 24 and the number 18 in the course of counsel’s questions and comments at trial. The number 18 for blacks taking the test and the number 30 for whites being accepted were used in the course of eliciting the opinion of the witness Lopez.
     
      
      . We appreciate that an employer who hires only comparatively few employees from time to time and who uses a test in the screening may construct a test with the conscious purpose to discriminate racially, and make frequent changes in the questions. By this device, such an employer might consciously prevent an accumulation of statistical observations of identical or highly similar incidents sufficient to permit meaningful statistical analysis of disparity in impact. We appreciate too that the same difficulty may arise in cases of small employers who construct tests innocently and make frequent changes innocently. But such difficulties cannot justify imposing liability on any particular employer for a disparity in impact when there is a failure of proof of the disparity.
     
      
      . For many non-judicial purposes, results are considered significant if the probability that the observed effect occurred by chance is lower than a certain percentage level, usually 5%. D. Baldus & J. Cole, Statistical Proof of Discrimination 291 (1980).
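To illustrate the 5% convention, here is a minimal sketch of one common small-sample test, a one-sided Fisher exact test. It computes the probability, if selection were blind to group, that group A would receive as few of the available slots as it actually did. The opinion does not say which test, if any, it would apply. Fisher's test is a substitute chosen here for illustration, and the function names are hypothetical.

```python
from math import comb


def fisher_one_sided_p(a_selected, a_total, b_selected, b_total):
    """One-sided Fisher exact p-value: probability, under random selection
    of the same number of people from the pooled applicants, that group A
    would have `a_selected` or fewer selections (hypergeometric lower tail)."""
    n = a_total + b_total
    k = a_selected + b_selected
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible outcomes contribute nothing to the sum.
    tail = sum(comb(a_total, i) * comb(b_total, k - i) for i in range(a_selected + 1))
    return tail / comb(n, k)


def significant(p, alpha=0.05):
    """Apply the conventional 5% significance level."""
    return p < alpha
```

For example, with purely hypothetical counts of 2 of 20 applicants selected from one group and 10 of 20 from the other, `significant(fisher_one_sided_p(2, 20, 10, 20))` reports whether that disparity clears the 5% level. With very small groups, even a large difference in rates may fail to do so.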
     
      
      . The Supreme Court has relied upon tests of statistical significance in discrimination cases. Castaneda v. Partida, 430 U.S. 482, 97 S.Ct. 1272, 51 L.Ed.2d 498 (1977); Hazelwood School District v. United States, 433 U.S. 299, 97 S.Ct. 2736, 53 L.Ed.2d 768 (1977).
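In Castaneda the Court measured disparity in binomial standard deviations. It treated a difference of more than two or three standard deviations between the expected and observed numbers as making the hypothesis of random selection suspect. The following minimal sketch shows that calculation. The function name is illustrative, and the sample figures in the check are hypothetical. The normal approximation that underlies the "two or three standard deviations" rule becomes unreliable when the expected count is small, which is the small-sample concern these footnotes raise.

```python
from math import sqrt


def sd_below_expected(observed, n, p):
    """Number of binomial standard deviations by which `observed` falls below
    the count expected if each of `n` selections independently has
    probability `p` of coming from the group in question."""
    expected = n * p
    sd = sqrt(n * p * (1 - p))
    return (expected - observed) / sd
```

With hypothetical figures of 100 selections, a 50% group share, and no group members chosen, the shortfall is 10 standard deviations.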
     
      
      . We are mindful that a black acceptance rate of 4% as compared to 13% for whites is well below the “four-fifths” rate established by the EEOC rule of thumb for determining whether an employer’s practices have an adverse impact on the employment opportunities of any race. See Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. § 1607.4(D), which provides: “A selection rate for any race which is less than four fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal Enforcement agencies as evidence of adverse impact.” Where the size of the sample is small, however, the “four-fifths rule” is not an accurate test of discriminatory impact. See Shoben, Differential Pass-Fail Rates in Employment Testing: Statistical Proof Under Title VII, 91 Harv.L.Rev. 793 (1978).
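The four-fifths comparison in 29 C.F.R. § 1607.4(D) reduces to a single ratio. The following sketch applies it to the 4% and 13% acceptance rates stated in this footnote. The ratio is roughly 0.31, well under 0.8. The function name is illustrative. The sketch deliberately ignores sample size, which is exactly why, as the footnote notes, the rule can mislead when only a handful of applicants are involved.

```python
FOUR_FIFTHS = 0.8


def adverse_impact(rates):
    """For each group, return (ratio of its selection rate to the highest
    group's rate, True if that ratio is below four-fifths)."""
    highest = max(rates.values())
    return {group: (rate / highest, rate / highest < FOUR_FIFTHS)
            for group, rate in rates.items()}


# Acceptance rates taken from this footnote.
result = adverse_impact({"black": 0.04, "white": 0.13})
```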
     