
    Juliette FICKLING, Lisa Goodwine, Patricia Fowler, Sherley Kenyon, Dhyalma Vasquez, Lena Coleman-Minor, Sonya E. Graves, and Stephanie King, Plaintiffs, v. The NEW YORK STATE DEPARTMENT OF CIVIL SERVICE and County of Westchester, Defendants.
    No. 92 CV 3350 (BDP).
    United States District Court, S.D. New York.
    Dec. 22, 1995.
    
      Michael Sussman, Goshen, NY, for Plaintiffs.
    Antoinette M. McCarthy, Marilyn J. Slatten, Westchester County Attorney, White Plains, NY, for Defendants.
   MEMORANDUM ORDER AND DECISION

PARKER, District Judge.

BACKGROUND

Plaintiffs Juliette Fickling, Lisa Goodwine, Patricia Fowler, Sherley Kenyon, Dhyalma Vasquez, Lena Coleman-Minor, Sonya E. Graves and Stephanie King (collectively “plaintiffs”) brought this action for violations of Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq. and the New York State Executive Law § 296, claiming that their termination as Welfare Eligibility Examiners as a result of their failing scores on competitive examinations was unlawful. Defendants are the New York State Department of Civil Service (“New York State DCS”) and County of Westchester Department of Social Services (“Westchester County DSS”) (collectively “defendants”). This Memorandum Opinion constitutes the findings of fact and conclusions of law of the Court after a bench trial held on September 21, 1995 comprising expert testimony and largely stipulated facts.

Specifically, plaintiffs, seven African-American and one Hispanic former employees of defendant Westchester County DSS, allege that the use of an entry-level examination for the position of Eligibility Examiner with the Westchester County DSS had a racially disparate impact and failed to serve defendants’ employment goal of fair competition among candidates for civil service positions. For the reasons set forth, the Court finds that the examination had a disparate impact and that it did not serve that goal because it lacked content validity. Cf. Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 659, 109 S.Ct. 2115, 2126, 104 L.Ed.2d 733 (1989).

FACTS

As of March 1, 1991, plaintiffs were employed as provisional or temporary Social Welfare Eligibility Examiners by Westchester County DSS. In 1989 and 1990, each plaintiff took and failed, on more than one occasion, the New York State DCS examination for the position of Eligibility Examiner with the Westchester County DSS. The Westchester County Department of Personnel administered the Eligibility Examiner examinations to plaintiffs. The New York State DCS prepared, approved and graded the examinations.

On February 4, 1991, Westchester County DSS informed each plaintiff that her employment was to be terminated effective March 15, 1991, and each plaintiff was terminated on March 15, 1991. Each plaintiff was terminated because her failing test score precluded her placement on the “eligible list” for the position of Eligibility Examiner. Each plaintiff, except Lena Coleman-Minor, had received satisfactory to excellent performance evaluations from at least one of her supervisors prior to her termination.

Initially, access to the position of Eligibility Examiner is controlled by competitive examination. Applicants must attain a score of 70 on the examination to be placed on an Eligibility Examiner “eligible list.” The “eligible list” is established on the basis of ratings received by the candidates in the competitive portions of the examination. N.Y.C.S.L. § 50(6). Appointment to the position of Eligibility Examiner is made by selecting one of the top three candidates on the “eligible list” who are willing to accept such appointment. N.Y.C.S.L § 61(1). Selection from the “eligible list” is then made on the basis of a candidate’s qualifications, score on the written test, an oral interview and, following appointment, performance during a probationary period.

Plaintiffs were employed as provisional or temporary Eligibility Examiners because Westchester County did not have an “eligible list” at the time. Provisional or temporary Eligibility Examiners may become permanent, however, only by passing the examination and being placed among the top three candidates willing to accept appointment on the “eligible list.” N.Y.C.S.L. § 65(1).

The examinations had a disparate impact on African-Americans and Hispanics in Westchester County and statewide. In Westchester County, the impact ratios (% minority passing/% white passing) at the cutoff score on the 1989 examination ranged from 52.8% to 66.2% for African-Americans and between 43.1% and 56.6% for Hispanics. For the 1990 examination, the pass rate for African-Americans was between 40.4% and 50.8% of the white pass rate while Hispanics passed at between 25.5% and 34.9% of the white rate. Statewide, the impact ratio at the cutoff score on the 1989 examination ranged from 41.8% to 53.7% for African-Americans and between 36.6% and 44.6% for Hispanics and on the 1990 examination was between 38.2% and 43.1% for African-Americans and between 25.0% and 40.3% for Hispanics.

The New York State DCS knew of the disparate impact of its Eligibility Examiner examinations since at least July of 1987. The New York State DCS did not notify the Westchester County DSS because it believed that the disparate impact was readily apparent from the results of the examinations.

The examinations comprised 30 questions testing an understanding of concepts and practices of interviewing, and 30 questions testing the application of welfare-eligibility rules and regulations to hypothetical clients. The questions testing interviewing concepts and practices were presented in the form of vignettes describing typical work situations. Prior to taking the examination, plaintiffs were provided sample questions, instructions and other information concerning the procedural and substantive requirements of the examination.

The New York State DCS created the examinations based upon a job analysis conducted in 1975. The job analysis identified the knowledge, skills and abilities (“KSAs”) required for successful performance in the Eligibility Examiner position. Although the underlying documents from the job analysis were apparently discarded sometime after 1975, a summary of the results of the job analysis is contained in a report entitled “Social Welfare Examiners.”

Defendants concede that the examinations at issue tested only 13 of the 49 KSAs (approximately 27%) identified by the summary report of the job analysis. Nevertheless, they argue that the examinations were reliable because test questions were continuously reviewed to ensure they were written at the appropriate level of difficulty for the position being tested. Post-test analyses were conducted after each administration of the examination to determine how well test questions were performing. Based on the analysis and on appeals received from test-takers, test questions were changed or deleted.

The New York State DCS has never done a predictive validity study on the examinations at issue.

From 1989 to 1991, the New York State DCS conducted another job analysis in order to update its knowledge of the duties of Eligibility Examiners and to reevaluate the examinations for those positions. Field visits were conducted in 1989, the data from which was used to generate a list of 28 KSAs required of an Eligibility Examiner. Questionnaires were sent to hundreds of incumbent Examiners asking them to rate the 28 KSAs by the level of competence required and when it was needed. New Eligibility Examiner examinations were developed in 1991 based upon the job analysis update.

DISCUSSION

Title VII proscribes employment discrimination with respect to hiring or the terms and conditions of employment on the basis of race, color, religion, sex, or national origin. 42 U.S.C. § 2000e-2(a). Designed “ ‘to achieve equality of employment opportunities and remove barriers that have operated in the past to favor an identifiable group of white employees over other employees,’ ” Albemarle Paper Co. v. Moody, 422 U.S. 405, 417, 95 S.Ct. 2362, 2371, 45 L.Ed.2d 280 (1975) (quoting Griggs v. Duke Power Co., 401 U.S. 424, 429-30, 91 S.Ct. 849, 853, 28 L.Ed.2d 158 (1971)), Title VII prohibits not only overt and intentional discrimination, but also discrimination resulting from practices that are facially neutral but have “disparate impact,” i.e., significant adverse effects on protected groups. See, e.g., International Brotherhood of Teamsters v. United States, 431 U.S. 324, 335-36 n. 15, 97 S.Ct. 1843, 1854-55 n. 15, 52 L.Ed.2d 396 (1977).

Title VII also provides that
[n]otwithstanding any other provision of this subchapter, it shall not be an unlawful employment practice ... for an employer to give and to act upon the results of any professionally developed ability test provided that such test, its administration or action upon the results is not designed, intended or used to discriminate because of race, color, religion, sex or national origin.

42 U.S.C. § 2000e-2(h).

A plaintiff may establish a prima facie case of disparate impact by showing that use of a test causes selection of applicants for hiring in a racial pattern that significantly differs from that of the pool of applicants. See, e.g., Albemarle Paper Co. v. Moody, 422 U.S. at 425, 95 S.Ct. at 2375; Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 994-95, 108 S.Ct. 2777, 2788-89, 101 L.Ed.2d 827 (1988). This showing may be made through statistical evidence revealing a disparity so great that it cannot reasonably be attributable to chance. See, e.g., Hazelwood School District v. United States, 433 U.S. 299, 307-08, 97 S.Ct. 2736, 2741-42, 53 L.Ed.2d 768 (1977).

A widely accepted benchmark for assessing disparate impact is the “four-fifths rule” of the 1978 Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. Part 1607 (“Guidelines”). See, e.g., Guardians Ass’n of the New York City Police Dep’t v. Civil Service Comm’n, 630 F.2d 79, 88 (2d Cir.1980). The Guidelines explain that “[a] selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty-percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact....” 29 C.F.R. § 1607.4(D). Here, the disparate racial impact of the 1989 and 1990 examinations was far below the 80% standard. Accordingly, plaintiffs have established a prima facie case of discrimination.
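The four-fifths comparison described above can be illustrated with a short sketch. It is offered only as an illustration under stated assumptions, not as the parties' or the experts' method: the function and variable names are hypothetical, and the only data used are the impact-ratio ranges recited in the Facts section of this opinion.

```python
# Illustrative sketch of the four-fifths (80%) benchmark in 29 C.F.R. § 1607.4(D).
# All names here are hypothetical; the figures are the impact-ratio ranges
# stated in the Facts section of the opinion.
FOUR_FIFTHS = 0.80


def impact_ratio(group_pass_rate, highest_pass_rate):
    """Ratio of a group's pass rate to the pass rate of the highest-passing group."""
    return group_pass_rate / highest_pass_rate


def below_four_fifths(ratio):
    """True when the ratio falls under the 80% benchmark."""
    return ratio < FOUR_FIFTHS


# (low, high) impact ratios at the cutoff score, keyed by (area, year, group).
reported_ranges = {
    ("Westchester", 1989, "African-American"): (0.528, 0.662),
    ("Westchester", 1989, "Hispanic"): (0.431, 0.566),
    ("Westchester", 1990, "African-American"): (0.404, 0.508),
    ("Westchester", 1990, "Hispanic"): (0.255, 0.349),
    ("Statewide", 1989, "African-American"): (0.418, 0.537),
    ("Statewide", 1989, "Hispanic"): (0.366, 0.446),
    ("Statewide", 1990, "African-American"): (0.382, 0.431),
    ("Statewide", 1990, "Hispanic"): (0.250, 0.403),
}

# Even the upper end of every reported range is tested against the benchmark.
all_below = all(below_four_fifths(high) for (_low, high) in reported_ranges.values())
highest_reported = max(high for (_low, high) in reported_ranges.values())

print(f"every reported ratio below 80%: {all_below}")
print(f"highest reported ratio: {highest_reported:.1%}")
```

On these figures, the upper end of every reported range is itself under the 80% benchmark, which is consistent with the finding above.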

Once a plaintiff has succeeded in establishing a prima facie case of disparate impact with respect to a challenged employment practice, courts proceed to analyze “whether a challenged practice serves, in a significant way, the legitimate employment goals of the employer.” Wards Cove, 490 U.S. at 659, 109 S.Ct. at 2126. “A mere insubstantial justification in this regard will not suffice, because such a low standard of review would permit discrimination to be practiced through the use of spurious, seemingly neutral employment practices. At the same time, though, there is no requirement that the challenged practice be ‘essential’ or ‘indispensable’ to the employer’s business for it to pass muster.” Wards Cove, 490 U.S. at 659, 109 S.Ct. at 2126.

The “employer carries the burden of producing evidence of a business justification for his employment practice. The burden of persuasion, however, remains with the disparate-impact plaintiff.” Wards Cove, 490 U.S. at 659, 109 S.Ct. at 2126.

Defendants have offered three business goals that they claim the examinations at issue serve: (1) compliance with obligations imposed by the state constitution to appoint civil servants according to merit and fitness ascertained by a competitive examination, see New York State Constitution, Article V, section 6; (2) equal opportunity for and fair appraisal of all applicants; and (3) the employment of minimally qualified civil servants.

Defendants concede that the concept of a fair competition requires that an examination test for “KSAs” that are critical to job performance, and that are likely to be present among the pool of people who meet the minimum qualifications. In addition, the New York State Constitution provides that appointments and promotions in the civil service must be made “according to merit and fitness to be ascertained, as far as practicable by examinations which, as far as practicable, shall be competitive.” N.Y.S. Constitution, Art. V, § 6. The New York State Civil Service Law provides that examinations for civil service positions “shall be practical in their character and shall relate to those matters which will fairly test the relative capacity and fitness of the persons examined to discharge the duties of that service into which they seek to be appointed.” N.Y.C.S.L § 50(6).

Indeed, courts have held that “the crucial question under Title VII is job relatedness— whether or not the abilities being tested for are those that can be determined by direct, verifiable observation to be required or desirable for the job.” Guardians, 630 F.2d 79, 93. In other words, the employer must show that the test has “ ‘a manifest relationship to the employment in question.’ ” Albemarle Paper Co. v. Moody, 422 U.S. at 425, 95 S.Ct. at 2375 (quoting Griggs v. Duke Power Co., 401 U.S. at 432, 91 S.Ct. at 854).

There are three strategies for assessing the relationship of a written examination to job performance: content validation, construct validation, and criterion-related validation. Criterion-related validation determines the validity of a selection procedure by measuring the results of the procedure against ratings of actual job performance. Construct validation determines the validity of an examination by measuring the degree to which it tests job applicants for identifiable characteristics that have been determined to be important in successful job performance. Content validity is demonstrated by showing an exam measures knowledge, skills, or abilities that are used on the job. See Washington v. Davis, 426 U.S. 229, 247 n. 13, 96 S.Ct. 2040, 48 L.Ed.2d 597 (1976); Guardians, 630 F.2d at 91-92. The EEOC Guidelines define content validation as a study consisting of “data showing that the content of the selection procedure is representative of important aspects of performance on the job for which the candidates are to be evaluated.” 29 C.F.R. § 1607.5B.

There are five requirements for “an exam with sufficient content validity to be used notwithstanding its disparate racial impact”: (1) the test-makers must have conducted a suitable job analysis; (2) they must have used reasonable competence in constructing the test itself; (3) the content of the test must be related to the content of the job; (4) the content of the test must be representative of the content of the job; and (5) the test must be used with a scoring system that usefully selects from among the applicants those who can better perform the job. Guardians, 630 F.2d at 95.

I consider each requirement in turn and conclude that defendants have failed to meet their burden of production of evidence of a business justification for their use of the examinations.

An acceptable job analysis must involve “an analysis of the important work behavior(s) required for successful performance and their relative importance.” Guardians, 630 F.2d at 95 (quoting 29 C.F.R. § 1607.14(C)(2)). In 1975, the New York State DCS conducted a job analysis of the important work behaviors required for successful performance of an Eligibility Examiner. Although Dr. Elizabeth Kaido, Principal Personnel Examiner with the Testing Services Division of the New York State DCS and an expert in the areas of job analysis, selection plan design, test question development and statistical analysis of test results, testified that the KSAs were rated by importance, she was not involved in the study, and the summary of the results, entitled “Social Welfare Examiners,” does not indicate that the KSAs were rated.

The second requirement is that the test have been constructed with “reasonable competence.” Dr. Kaido testified that, in connection with the 1975 job analysis, subject matter experts (at the supervisory and administrative levels) in the welfare eligibility field and hundreds of incumbent Eligibility Examiners were asked to list the KSAs needed for successful performance of the Eligibility Examiner position. According to Dr. Kaido, the experts and incumbents were further asked to rate the KSAs by their relative importance for successful job performance. Testing specialists in the New York State DCS then combined the lists and developed the tests.

As mentioned above, however, Dr. Kaido was not involved in the 1975 job analysis. She testified: “I worked for the woman who did this study and during the time she was doing it, we would discuss the process from time to time because I was doing a similar study for entry level case workers.” She did not testify, nor is there any other evidence, about the identity or competence of the experts, incumbents or testing specialists involved in the job analysis or test development. There is no evidence as to whether the interviews and questionnaires were structured to ensure consistency. There is no evidence of the sample size of incumbents or of the sample selection procedure. There is no evidence as to whether the importance of a KSA was determined by general consensus among all respondents or by some arithmetical formula. There is no evidence as to how the 49 KSAs on the final list were selected. Nor is there any evidence on how exam questions were developed to test the KSAs nor on how the number of questions testing a particular KSA was determined. In fact, Dr. Kaido testified that she believed all documents generated during the job analysis study, except the summary of results entitled “Social Welfare Examiners,” were thrown out in the normal course of business. The absence of credible evidence on the 1975 job analysis and the development of the tests at issue precludes this court from finding that the tests were constructed with reasonable competence.

Third, the crux of Title VII’s requirement is that “the content of the test must be related to the content of the job.” Guardians, 630 F.2d at 95. According to the summary of results from the 1975 job analysis, the examinations at issue tested the following abilities: to translate requirements into layman’s language, to recognize a person’s needs from oral discussion, to ask pertinent questions, to deal with a variety of emotional behavior, to detect clues that indicate a need for service, to read, to relate written material to agency requirements, to follow written instructions in different situations, to record information accurately, to read schedules and manuals accurately, to read and record figures accurately, to do arithmetic, and to plan and organize work.

The examinations did not test the 15 other entry-level abilities identified as necessary to the job: to express oneself orally in a clear and accurate manner, to communicate by telephone, to listen, to give oral instructions, to give written instructions, to recognize conflicting facts, to recognize the culture of poverty, to relate information on an application to the eligibility requirements, to write clear and concise letters, to meet deadlines, to work under pressure, to assimilate facts quickly, to spot missing information, to summarize relevant facts in writing.

It is clear from these lists that the examinations tested mainly reading comprehension. In addition, 38% of the questions required arithmetic. The ability to do arithmetic, however, was found to be unimportant to successful job performance in the 1989-91 job analysis. The examinations did not test written expression or oral expression, except the ability to describe eligibility requirements in a comprehensible manner, despite the fact that these abilities were found to be very important to successful job performance in the 1989-91 job analysis.

Moreover, the examinations at issue were developed from the data generated by a questionable job analysis that was 15 years old. Although Dr. Kaido testified that, based on the 1989-91 job analysis, she did not conclude that the old tests were out-of-date, the fact that the 1989-91 job analysis produced a different and shorter list of KSAs for the Eligibility Examiner job and new examinations were created based upon the new information belies her claim and suggests that the content of the old examinations was no longer wholly related to the content of the job in 1989 and 1990.

The Court also notes that, although Defendants were not required to do so, they never conducted a predictive validity study during the 15 years that these examinations were used, despite their knowledge for over two years of the examinations’ disparate impact. In her affidavit, Dr. Kaido stated that predictive studies were not done because “adequate criterion is not available and there was not an adequate supply of test takers from Westchester County. The turnover of Eligibility Examiners in Westchester County was high, which reduced the number of people who had been trained and were performing the job at the journey level. This group would have formed the predictive validity sample.”

At trial, however, Dr. Kaido testified that 100-200 test-takers statewide would be needed for a reasonable sample size and acknowledged that there were several hundred Eligibility Examiners in Westchester County alone. Dr. Kaido further testified that she did not know the turnover rate among Eligibility Examiners. She conceded that the real reason that the New York State DCS had never done a predictive validity study was because it perceived resistance to such a project at the county and municipal levels. She could not recall, however, which year, if any, the New York State DCS had asked Westchester County, or any county, whether it was willing to participate in a predictive validity study.

In addition, Dr. Kaido conceded that the New York State DCS had enough information from field studies and knowledge of the position to develop criterion from which to measure job performance. She testified that when she wrote in her affidavit that such criterion were not available, she meant that New York State DCS had simply not developed a criterion measure, administered it, nor determined its reliability. Because defendants have failed to produce credible evidence that the content of the examinations developed in 1975 was related to the content of the job of Eligibility Examiner in 1989 and 1990, I conclude that defendants have failed to carry their burden of production on this factor.

The “representativeness” requirement has two different meanings. “The first is that the content of the test must be representative of the content of the job; the second is that the procedure, or methodology, of the test must be similar to the procedures required by the job itself.” Guardians, 630 F.2d at 98; 29 C.F.R. § 1607.14(C)(4). Neither requirement is to be interpreted so rigorously as to foreclose any possibility of constructing a valid test. “The task of identifying every capacity and determining its appropriate proportion is a practical impossibility. It is similarly impossible for the procedures of the test to be truly representative of the actual job procedures.” Guardians, 630 F.2d at 98. Rather, the test should “measure important aspects of the job, at least those for which appropriate measurement is feasible,” but need not “measure all aspects, regardless of significance, in their exact proportions.” Guardians, 630 F.2d at 99.

Only 13 of the 49 KSAs identified by the 1975 job analysis as necessary to the successful performance of the job were tested by the examinations at issue here. Dr. Kaido testified that the other 36 KSAs could not be tested by a written instrument. She testified that 24 of the untested KSAs were full performance KSAs, that is, they could only be learned on the job. The EEOC Guidelines make it clear that testing for material that can only be learned on the job is inappropriate. 29 CFR § 1607.5(f). She testified that the other 12 untested KSAs were not subject to testing by a written instrument. Thus, oral expression, which was identified as one of the most important KSAs, was not tested. Dr. Kaido offered no credible explanation as to why such KSAs as the ability to give written instructions, recognize conflicting facts, recognize resource possibilities, relate information on an application to eligibility requirements, work under pressure, and summarize relevant facts in writing could not be tested by a written examination. She conceded, however, that the ability to write clear and concise letters, which also was not tested, could have been tested by a written examination.

On the basis of this testimony, the Court concludes that the examinations were needlessly unrepresentative. Excluding those KSAs that could only be learned on the job, the examinations at issue tested only 50% of the remaining KSAs identified in the 1975 job analysis study. The test seized upon relatively minor aspects of the Eligibility Examiner job, such as reading comprehension and arithmetic, and ignored others.
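The coverage figures in this part of the opinion follow from simple arithmetic on the KSA counts in the record. The sketch below is illustrative only; the variable names are hypothetical, and the counts (49 total, 13 tested, 24 untested full-performance, 12 otherwise untested) are taken from the testimony described above.

```python
# Illustrative arithmetic on the KSA counts recited in the opinion.
# Variable names are hypothetical; the counts come from the record as described.
TOTAL_KSAS = 49                 # KSAs identified by the 1975 job analysis
TESTED = 13                     # KSAs the examinations tested
UNTESTED_FULL_PERFORMANCE = 24  # untested KSAs learnable only on the job
UNTESTED_OTHER = 12             # untested KSAs said not to suit a written test

# The three categories account for all 49 KSAs.
accounted_for = TESTED + UNTESTED_FULL_PERFORMANCE + UNTESTED_OTHER

# Share of all KSAs tested (the "approximately 27%" figure).
share_of_all = TESTED / TOTAL_KSAS

# Share tested once the on-the-job-only KSAs are excluded (roughly one half).
entry_level_pool = TOTAL_KSAS - UNTESTED_FULL_PERFORMANCE
share_of_entry_level = TESTED / entry_level_pool

print(f"KSAs accounted for: {accounted_for} of {TOTAL_KSAS}")
print(f"share of all KSAs tested: {share_of_all:.1%}")
print(f"entry-level pool: {entry_level_pool} KSAs")
print(f"share of entry-level KSAs tested: {share_of_entry_level:.1%}")
```

Computed this way, 13 of the 25 KSAs that remain after excluding the full-performance KSAs were tested, or about 52%, which is in line with the 50% figure stated above.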

Under the Guidelines, “where cutoff scores are used, they should normally be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force.” 29 C.F.R. § 1607.5(H); Guardians, 630 F.2d at 105. “[A] criterion-related study is not necessarily required” in order to establish a basis for the cutoff score; the employer may rely on “a professional estimate of the requisite ability levels, or, at the very least, by analyzing the test results to locate a logical ‘break-point’ in the distribution of scores.” Guardians, 630 F.2d 79, 105.

Defendants, however, have offered no such basis. They rely merely upon the New York State Civil Service Commission President’s Regulations which set 70% of the total possible score as the passpoint, and set the passpoint two standard errors of measurement below 70% to guard against false negatives. They have not offered any evidence, however, that the passpoint was either a logical “break-point” in the distribution of scores or that it corresponded to the ability level required by the job. In her “Response to Dr. Backman’s Report,” dated January 20, 1994, Dr. Kaido admitted that “[w]e do not know in fact whether the job performance of those who scored between any passpoint and two SEMs below it would be satisfactory” (emphasis added).

CONCLUSION

Because I find that the examinations had a significant disparate impact and defendants have failed to offer credible evidence that the examinations served the legitimate business goal of fair competition in civil service employment, I find for the plaintiffs. The parties shall within 80 days of the date of this Order submit briefs and affidavits on the issues of relief, attorney fees and any other matters pertinent to a final judgment.

SO ORDERED. 
      
      . The Civil Rights Act of 1991 ("the 1991 Act"), 42 U.S.C. §§ 2000e, 2000e-2(k), legislatively overruled the burden of proof that had been established in Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989), returning the law in disparate impact cases to its pre-Wards Cove state. The 1991 Act became effective on November 21, 1991. Although this action was filed after the passage of the amendments to the 1991 Act, the conduct at issue in this case occurred before the Act was passed. In post-trial briefs, all parties agreed that the 1991 Act does not affect the claim of disparate impact. The Court, therefore, will not consider the 1991 Act.
     
      
      . Employers are not required, even when defending standardized or objective tests, to introduce formal validation studies showing that particular criteria predict actual on-the-job performance, Watson v. Fort Worth Bank and Trust, 487 U.S. 977, 998, 108 S.Ct. 2777, 2790-91, 101 L.Ed.2d 827 (1988).
     
      
      . The New York State Civil Service Commission President's Regulations set 70% of the total possible score as the passpoint, but permit the passpoint to be lowered depending upon the needs of the service or the difficulty of the examination or other substantial factors. Here, the passpoint of 70 was set two standard errors of measurement ("SEM") below 70% of the total possible score; it was set at 58.6% of the total possible score (i.e. 35 out of a total possible 60), thereby almost eliminating the possibility of a person failing the examination due to unreliability in the examination itself.
      In addition, the Regulations require the final score of a candidate on a written examination to be reported on a scale to 100, where the score of 100 is to represent the best performance possible, and where the score of 70 is to represent a performance meeting the minimum needs of the position. 4 N.Y.C.R.R. § 67.1(a). Here, a passing raw score of 35 to 60 on the examinations was converted to a final passing score of 70 to 100 by a simple arithmetical formula.
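      The conversion described in this note can be sketched as follows. The opinion does not state the formula itself, so the straight-line mapping used here (raw 35 to final 70, raw 60 to final 100) is an assumption made for illustration, and the function name is hypothetical.

```python
# Illustrative sketch only. ASSUMPTION: the "simple arithmetical formula" is a
# straight-line mapping from raw passing scores (35-60) to final scores (70-100).
# The opinion gives the endpoints but not the formula; to_final_score is a
# hypothetical name.
RAW_PASS, RAW_MAX = 35, 60        # raw passpoint and maximum raw score
FINAL_PASS, FINAL_MAX = 70, 100   # reported passpoint and maximum final score


def to_final_score(raw):
    """Linearly rescale a passing raw score onto the 70-100 reporting scale."""
    if not RAW_PASS <= raw <= RAW_MAX:
        raise ValueError("sketch covers passing raw scores only (35 to 60)")
    slope = (FINAL_MAX - FINAL_PASS) / (RAW_MAX - RAW_PASS)  # 30 / 25 = 1.2
    return FINAL_PASS + (raw - RAW_PASS) * slope


for raw in (35, 45, 60):
    print(raw, "->", to_final_score(raw))
```

      Under that assumption each additional raw point above 35 adds 1.2 points to the final score; any other formula with the same endpoints would fit the description in this note equally well.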
     