
    Agee ALLEN and Ronald Battle, individually and on behalf of all similarly situated individuals, Plaintiffs-Appellees, v. L. William SEIDMAN, Chairman, Federal Deposit Insurance Corporation, Defendant-Appellant.
    Nos. 88-1811, 88-2893.
    United States Court of Appeals, Seventh Circuit.
    Argued June 16, 1989.
    Decided July 27, 1989.
    
      Stephen G. Seliger, Paddy Harris McNamara, Thomas P. Sullivan, Jenner & Block, Chicago, Ill., for plaintiffs-appellees.
    Ann L. Wallace, Daniel C. Murray and Nancy K. Needles, Asst. U.S. Attys., for defendant-appellant.
    Before POSNER, RIPPLE, and MANION, Circuit Judges.
   POSNER, Circuit Judge.

This appeal, together with the appeal in Evans v. City of Evanston, 881 F.2d 382 (7th Cir.1989), argued the same day and also decided today, are the first disparate-impact appeals heard and decided by this court in the wake of the Supreme Court’s decision in Wards Cove Packing Co. v. Atonio, — U.S. -, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989), which modified the ground rules that most lower courts had followed in disparate-impact cases. Before Wards Cove it was generally believed that if the plaintiff in a Title VII case showed by a reasonable statistical test that a criterion or practice used by an employer to screen candidates for hiring or promotion was disproportionately excluding members of a group protected by the statute, such as blacks or women, the burden shifted to the employer to persuade the judge (the trier of fact in Title VII cases) that the criterion or practice was necessary to the effective operation of the employer’s business. See, e.g., Washington v. Electrical Joint Apprenticeship & Training Committee, 845 F.2d 710, 712 (7th Cir.1988); Regner v. City of Chicago, 789 F.2d 534, 537 (7th Cir.1986). Wards Cove returns the burden of persuasion to the plaintiff, while leaving the burden of production on the employer, and also dilutes the “necessity” in the “business necessity” defense in a manner anticipated by the plurality opinion in Watson v. Fort Worth Bank & Trust Co., — U.S. -, 108 S.Ct. 2777, 101 L.Ed.2d 827 (1988), and by our decision in Aguilera v. Cook County Police & Corrections Merit Board, 760 F.2d 844, 846-48 (7th Cir.1985). The Court explains in Wards Cove that “there is no requirement that the challenged practice be ‘essential’ or ‘indispensable’ to the employer’s business for it to pass muster,” the question being merely “whether a challenged practice serves, in a significant way, the legitimate employment goals of the employer.” 109 S.Ct. at 2125-26. 
If the plaintiff can show that a less exclusionary practice would serve the employer’s legitimate interests just as well, the employer’s refusal to adopt it when urged to do so by the plaintiff will defeat the “business necessity” defense (now a misnomer, since the “defense” does not require a showing of necessity and is no longer an affirmative defense). However, the alternative proposed by the plaintiff must be “equally effective as [the defendant’s] chosen hiring procedures in achieving [the defendant’s] legitimate employment goals,” and “the judiciary should proceed with care before mandating that an employer must adopt a plaintiff’s alternate selection or hiring practice in response to a Title VII suit.” 109 S.Ct. at 2127. The Court also repeated an old point about disparate-impact evidence: a shoddy showing of disparate impact will not require the defendant even to produce evidence in justification of the challenged practice. See 109 S.Ct. at 2121-24.

The plaintiffs in this case are the representatives of a class of black bank examiners employed by the Federal Deposit Insurance Corporation, who failed the “Program Evaluation” test that the Corporation formerly used as an aid in determining whether to promote bank examiners at pay level GS-9 to the rank of “commissioned bank examiner” (GS-11). The district court concluded after a bench trial that the test had a disparate impact and had not been shown to be a business necessity, and therefore entered judgment for the class. The Corporation appeals, represented by the Justice Department, which urges us to reverse outright and dismiss the suit. The plaintiffs’ counsel urges us to affirm the judgment on the ground that even under the standard of Wards Cove the Program Evaluation test is unreasonably exclusionary.

The first question is whether the district judge committed a clear error in finding that the plaintiffs had demonstrated a disparate impact. As only 39 percent of the black candidates who took the Program Evaluation test passed, compared to 84 percent of the white candidates, and as the large number of candidates made the difference highly significant statistically, it may seem beyond question that the plaintiffs showed a disparate impact, thereby shifting to the defendant the burden of producing evidence of justification. But this the defendant contests, noting first that passing the Program Evaluation test is not a sine qua non for promotion. The regional directors for whom the bank examiners work can promote an examiner who fails it and can refuse to promote an examiner who passes it; the test results are merely advisory. But it is rare for regional directors to ignore the advice, as is suggested although not proved by the fact that of the black candidates who took the test 56 percent were promoted to commissioned bank examiner within one year, compared to 92 percent of the whites. These statistics suggest that while a few “fails” were nevertheless promoted (as can be seen by the difference between 56 percent, the number of blacks promoted, and 39 percent, the number who passed the test, and by the difference between 92 percent and 84 percent — the corresponding figures for whites), this was true for both races and did not reduce the disparate impact of the test. On the contrary, it increased it, for the black fails fared worse than the white ones. Only 27 percent of the black fails were promoted within one year, compared to 53 percent of the white ones.

These statistics are potentially misleading because some examiners who passed the test may not have been promoted, so conceivably it is pure happenstance that blacks both did badly on the test and were less likely to be promoted than whites. But apparently it was extraordinarily rare for a regional director to refuse to promote an examiner who passed the Program Evaluation test; the record contains only two instances of such a refusal.
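
The percentages recited above can be checked against one another. The sketch below is an illustration only and is not part of the record. It assumes that every examiner who passed the test was promoted within the year, which is a simplification of the observation that refusals to promote a passer were extraordinarily rare. The names `RATES` and `implied_promotion_rate` are hypothetical. Under that assumption, the overall promotion rate for each group should roughly equal the pass rate plus the share of fails who were promoted anyway.

```python
# Fractions of test takers, by race, as recited in the opinion.
RATES = {
    "black": {"pass": 0.39, "fail_promoted": 0.27, "promoted_reported": 0.56},
    "white": {"pass": 0.84, "fail_promoted": 0.53, "promoted_reported": 0.92},
}


def implied_promotion_rate(pass_rate, fail_promoted_rate, pass_promoted_rate=1.0):
    """Overall promotion rate implied by the pass rate and the two conditional rates.

    pass_promoted_rate=1.0 is an ASSUMPTION (every passer promoted within one year);
    the opinion says only that refusals to promote a passer were extraordinarily rare.
    """
    promoted_passers = pass_rate * pass_promoted_rate
    promoted_fails = (1.0 - pass_rate) * fail_promoted_rate
    return promoted_passers + promoted_fails


# Compute the implied overall rate for each group and compare with the reported one.
implied = {
    group: implied_promotion_rate(r["pass"], r["fail_promoted"])
    for group, r in RATES.items()
}

for group, r in RATES.items():
    print(f"{group}: implied {implied[group]:.4f}, reported {r['promoted_reported']:.2f}")
```

If the assumption holds, the implied rates land within about half a percentage point of the reported 56 and 92 percent. That is consistent with the reading that the gap in promotions tracks the gap in test results rather than arising independently of it.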

A stronger attack made by the Corporation on the plaintiffs’ statistical test is that the test is simplistic because it has only one independent variable: race. Other variables, such as education, can of course affect performance on a test, and there is a well-known statistical technique, multiple-regression analysis, for estimating the partial effect of one of several independent variables on the dependent variable (here, success on the Program Evaluation test). See 1 Gastwirth, Statistical Reasoning in Law and Public Policy 400-23 (1988). It is possible that if success on the Program Evaluation test had been regressed on other variables as well as race, race would have been found to have no effect, or a statistically insignificant effect, on success; and then there would have been no proof of disparate impact. A statistical analysis must cross a threshold of reliability before it can establish even a prima facie case of disparate impact. See, e.g., Morgan v. Harris Trust & Savings Bank, 867 F.2d 1023, 1028 (7th Cir.1989) (per curiam). But we agree with the plaintiffs, for reasons about to be explained, that they were not required to perform a multiple-regression analysis in this case. Paradoxically, our conclusion is strengthened by Wards Cove, because after that decision the prima facie case means less than it did before, so there is less reason to be fussy about it. Under the regime of Wards Cove it just makes the defendant produce some evidence in justification of its test, after which the plaintiff must prove the test unreasonable. In addition, the defendant can always present evidence to show that there was no disparate impact — that it is merely an artifact of the plaintiffs’ statistical study. See, e.g., Tagatz v. Marquette University, 861 F.2d 1040, 1044 (7th Cir.1988); Washington v. Electrical Joint Apprenticeship & Training Committee, supra, 845 F.2d at 714.

When, as in this case and those just cited, there has been a full trial, the issue of prima facie case drops out, and the question becomes whether the judge is persuaded that the test or other challenged practice is discriminatory because it has a disparate impact unjustified by the defendant’s legitimate business needs. To speak precisely, the existence of a “prima facie case” in the specialized Title VII sense of a case strong enough to shift the burden of production to the defendant becomes moot once the lawsuit is tried. Yet in its older sense of evidence sufficient to defeat a defendant’s motion for directed verdict, the existence of a prima facie case remains an issue — or would if there were jury trials in Title VII cases. Since there are not, it is simpler and clearer just to ask whether at the conclusion of the trial the evidence pro and con liability supports a finding of violation.

If the two groups compared in these plaintiffs’ simple statistics — black and white bank examiners who took the Program Evaluation test — were obviously and substantially different in some relevant respect, like the skilled and unskilled workers whom the Court in Wards Cove thought insufficiently alike to justify the use of a simple comparison to show disparate impact, the plaintiffs’ case would fail at the threshold, without the judge’s having to reach the issue of business necessity. Besides Wards Cove, see, e.g., Beard v. Whitley County REMC, 840 F.2d 405, 409 (7th Cir.1988); Coser v. Moore, 739 F.2d 746, 751-54 (2d Cir.1984); EEOC v. Federal Reserve Bank, 698 F.2d 633, 658-60 (4th Cir.1983), rev’d on other grounds under the name Cooper v. Federal Reserve Bank, 467 U.S. 867, 104 S.Ct. 2794, 81 L.Ed.2d 718 (1984). But the problem of comparability is much less acute here than in Wards Cove or the other cases we have cited. In order to be eligible to take the Program Evaluation test, you must have been a GS-9 bank examiner for at least a year. Examiners are hired years before reaching GS-9 and several grades lower (GS-4 or GS-5), with the result that the test takers will have been working as bank examiners for the FDIC for anywhere between five and fifteen years and will have demonstrated sufficient competence to earn several promotions. Any, or at least many, educational deficiencies that individual examiners had when hired are likely to have been washed out by on-the-job training and experience. Furthermore, to be eligible to take the Program Evaluation test, you need, in addition to at least a year as a GS-9 bank examiner, a recommendation from your regional director. 
So there is reason to believe that the pool taking the test will be reasonably homogeneous despite possible differences (not, by the way, proved in this case) in original entry qualifications, and this makes the very large disparity between blacks and whites in performance on the test suggestive of racial bias.

The disparity is suggestive but not conclusive. First, because the Corporation has an affirmative-action program for blacks, the blacks eligible for the Program Evaluation test may have had inferior entry qualifications to the whites, in which event they could be expected to perform somewhat worse on the test. But this is speculation. The Corporation put in no evidence that its affirmative-action program went beyond vigorous recruitment of blacks to a lowering of standards, and the judge was not required to assume that such evidence exists. Second, and consistent with the existence of an affirmative-action program, the average black examiner may not have worked for the Corporation as long as the average white examiner. In that event the whites taking the test would have been more experienced examiners, and perhaps, therefore, more likely to pass the test — recall that persons taking the test varied greatly in the amount of their previous experience as an examiner. But again no evidence was introduced. And it is possible that many of those who waited a long time to take the test were likely to do badly on it, since the wait may have reflected either delays in promotion to GS-9 attributable to substandard performance, or a well-founded insecurity about the prospects for passing the test.

The Corporation’s attack on the plaintiffs’ statistical case amounts to a contention that unless a plaintiff eliminates all alternative hypotheses he must lose. That would raise the threshold of proof too high. All the plaintiff is required to present is enough evidence to warrant a finding that, more likely than not, the challenged test, criterion, or practice had a disparate impact. In a case like this, where the pool taking the challenged test is reasonably homogeneous in terms of qualifications and the racial disparity in results is very large, a simple statistical comparison will support a finding that the test had a disparate impact. See Aguilera v. Cook County Police & Corrections Merit Board, supra, 760 F.2d at 846, and cases cited there. This is especially true in the present case since the defendant, while taking pot shots — none fatal — at the plaintiffs’ statistical comparison, did not bother to conduct its own regression analysis, which for all we know would have confirmed and strengthened the plaintiffs’ simpler study. The district judge was entitled to conclude that a disparate impact had been proved and that the only issue therefore was the existence of a “business necessity” for the Program Evaluation test.

On this front, too, the plaintiffs mounted a powerful attack, arguing that the Program Evaluation test was poorly designed and administered and pointing out that it had been abandoned shortly after this suit was filed (the record is silent on what replaced it). It was a three-day oral and written test, conducted by panels of three commissioned bank examiners — one from the Corporation’s training division and two chosen at random from among the commissioned bank examiners in the regional offices. The evidence supports the district court’s findings that there were no set questions, no set right or wrong answers, no fixed passing grade, no instructions for weighting performance on the various parts of the exam, no fixed time limits for the individual sessions, and no evaluation of the panel members. Panel members were not required to attend each session, and if they missed one they graded it anyway, guided by the advice of the member or members who had attended. The test emphasized the problems of small banks and as a result disfavored examiners who worked in the large metropolitan areas, where blacks tend to be concentrated (although there was no specific evidence that this is true of the black bank examiners employed by the Corporation). The emphasis on small banks may be rational— they are more prone to failure than large ones and probably keep worse records, although they may be on balance simpler to audit and the cost of a failure to the FDIC is lower for a small than for a large bank— but the Corporation does not argue that it is, so on this record the emphasis in the Program Evaluation test on the problems of small banks is arbitrary. Furthermore, the test failed to test many of the tasks that bank examiners are called on to perform. 
Nor was the concept of adequate performance defined, for the testers’ notes reveal that they would sometimes pass a candidate after rating him “barely adequate” or “marginally acceptable,” and sometimes fail one after noting seemingly minor, readily correctable deficiencies. The Corporation discarded all the test papers of candidates who passed and many test papers of those who failed, and continued doing so — in violation of EEOC regulations — even after the suit was brought. See 29 C.F.R. §§ 1602.14(a), 1607.4. As a result it was impossible at trial to determine consistency among the different rating panels, but it is unlikely in view of the evidence reviewed above that there was much, and the Corporation was not entitled to the benefit of the doubt when the doubt resulted from its own destruction of documents.

The Corporation argues that all this doesn’t matter, for even if the Program Evaluation test was a bad test the plaintiffs failed to pinpoint particular aspects of it that were unfavorable to blacks. (The bias in favor of examiners from non-metropolitan and small metropolitan areas, a potential such aspect, was not pursued.) However, nothing in the structure of a disparate-impact case requires such pinpointing; whether the reason for the test’s disparate impact can be identified is merely another issue bearing on the correct interpretation of the plaintiffs’ statistics. At all events it is not difficult to imagine how the Program Evaluation test may have harmed blacks. In a test notably devoid of objective standards, where far from using blind grading the testers based an unknown part of the grade on the results of an unstructured personal interview, the danger is acute that racial bias of which the testers may well be unconscious will influence the grade. Although the precise racial composition of the testers is not in the record, it appears that the vast majority were white, and they may subliminally have expected blacks to perform worse than whites. The subjectivity of the Program Evaluation test deprived the testers of better information and may have inclined them to fall back on race and on vocationally irrelevant cultural factors correlated with race; if so the test was discriminatory in an uncontroversial sense. No doubt a sensible test for promotion to senior bank examiner status could not consist solely of multiple-choice or true-false questions that a computer could score. So subjectivity can’t be entirely banished. But it is hard to believe that the FDIC can’t do better than the Program Evaluation test, which its own consultants had criticized repeatedly — and indeed for all we know it has done better, on the successor test adopted when the Program Evaluation test was dropped.

The plaintiffs argue with great force that Judge Hart’s finding of liability is independent of the question of burden of persuasion, or the precise definition of the “business necessity” defense (which in the wake of Wards Cove should perhaps be renamed the “issue of legitimate employer purpose”) — the two pertinent respects in which the Supreme Court has changed the ground rules for disparate-impact litigation as they were understood by most lower courts. Judge Hart, however, said that “it has not been shown that the Program Evaluation was a reliable selection device.” This is the language of burden of persuasion not production, and the burden of persuasion was placed on the wrong party, the employer. We therefore remand the case for reconsideration under the correct legal standard. Circuit Rule 36 shall not apply on remand, and although the purpose of the remand is to enable Judge Hart to reconsider his decision in light of Wards Cove as interpreted in this opinion, he is free to take additional evidence if he decides it would be helpful in determining whether the employer had a legitimate purpose in using the challenged test — the only issue open on remand, for as explained earlier we agree with Judge Hart that the plaintiffs have proved disparate impact.

A final point. We are disturbed that the Department of Justice, in submitting Judge Hart’s opinion as an appendix to its brief, as it was required to do by Fed.R.App.P. 30(a), submitted a copy on which the Department’s lawyer had scribbled critical marginalia, such as the word “WRONG” beside several findings of Judge Hart with which she took particular issue. This is indecorous and unprofessional conduct, which we have noticed in other cases and remark publicly today in the hope it will not recur.

Vacated and Remanded.  