
    John B. SMITH, Harold Wado, John S. Bernhard, Philip D. Cufari, Salvatore Catalano, Robert H. Gusciora, Patricia Rake, Pedro Santiago, Judith Caruana, Edward Lalik, Jr., Eugene Hosenfeld and George Hamann, Plaintiffs-Appellants, v. XEROX CORPORATION, Defendant-Appellee.
    Nos. 98-7178, 98-7182, 98-7184, 98-7186, 98-7188, 98-7196, 98-7198, 98-7202, 98-7204, 98-7206, 98-7208 and 98-7212.
    United States Court of Appeals, Second Circuit.
    Argued: Nov. 30, 1998.
    Decided: Nov. 5, 1999.
    
      Theodore S. Kantor, Bilgore, Reich, Levine, Kroll & Kantor, Rochester, N.Y. for Plaintiffs-Appellants Smith, Wado, Lalik, Bernhard, Caruana, Hosenfeld, Hamann, Gusciora, Rake and Santiago.
    Donna Marianetti, Rochester, N.Y. for Plaintiffs-Appellants Cufari and Catalano.
    Margaret A. Clemens, Nixon, Hargrave, Devans & Doyle LLP, Rochester, N.Y. for Defendant-Appellee.
    Before: NEWMAN, LEVAL, PARKER, Circuit Judges.
   PARKER, Circuit Judge:

Plaintiffs-appellants appeal from a final judgment of the United States District Court for the Western District of New York (David G. Larimer, Chief Judge) entered January 16, 1998, granting summary judgment for Xerox Corporation (“Xerox”) on the plaintiffs’ employment discrimination claims based on both disparate treatment and disparate impact theories.

I. BACKGROUND

A. Facts

The facts of this case are more fully set forth by the district court in its decision, see Wado v. Xerox Corp., 991 F.Supp. 174 (W.D.N.Y.1998); therefore, we will repeat only those facts particularly pertinent to the issues which we address in detail.

In late Fall 1993, Xerox announced plans for a world-wide involuntary reduction in force (“IRIF”) which would reduce its 97,500 member workforce by about 10,000 persons over the next two to three years. Each decentralized organization within Xerox was responsible for determining whether and by how much its workforce would be reduced. The organizations that chose to eliminate positions utilized the same decision-making process to determine which employees to retain.

In each work-unit an immediate supervisor ranked each employee in Work Quality, Work Speed, Work Orientation, and Work Skills, entering the scores on a Contribution Assessment Form (“CAF”). The Work Quality category purported to measure reliability and accuracy, as well as use of methods, tools, and processes. The Work Speed category was intended to measure the employee’s ability to plan, prioritize, execute a plan, and meet due dates. Work Orientation included action orientation, business orientation, team orientation, and customer orientation. Work Skills were assessed as to adequacy, self-development, and continuous learning. The employee was given a score of 0-5 in each of the four areas, for a total of 0-20 points. A group of senior managers then reviewed the CAFs from each work-unit for fairness and consistency and made any adjustments deemed warranted.

Subsequent to receiving a final score of 0-20, the employees were stack-ranked on a matrix against other employees from their respective work-units. The vertical axis of the matrix represented the employee’s total CAF score and the horizontal axis represented years of service at Xerox, either less than 20 years or greater than or equal to 20 years. Selections for termination were then made in a pattern of assessment score/tenure combinations that favored workers with greater years, with the exception of certain employees with special skills. For example, out of two employees each receiving a CAF score of 12, the employee with less than twenty years with the company was chosen for termination before the employee with more than twenty years at Xerox. A certain percentage of the lowest ranking persons from each unit was selected for termination which became effective January 18, 1994.

B. Proceedings Below

Fifteen Xerox employees selected for termination as part of the 1994 wave of the IRIF each filed suit against Xerox in federal district court pursuant to a Right to Sue letter issued to each complainant by the Equal Employment Opportunity Commission (“EEOC”). In their respective complaints the plaintiffs asserted various theories of employment discrimination under the following: (1) the Age Discrimination in Employment Act, 29 U.S.C. § 621 et seq. (“ADEA”), (2) Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq. (“Title VII”), (3) the Americans with Disabilities Act, 42 U.S.C. § 12101 et seq. (“ADA”), and (4) the New York State Human Rights Law, N.Y. Exec. Law § 296 (“NYSHRL”).

Xerox moved for summary judgment against all plaintiffs on April 21, 1997. The court consolidated the actions pursuant to Fed.R.Civ.P. 42(a) since they involved common questions of law and fact. The court heard oral argument on December 5, 1997 and granted the defendant’s motion for summary judgment on January 16, 1998. In its ruling on the defendant’s summary judgment motion, the court first addressed the disparate impact claims brought by thirteen of the plaintiffs under the ADEA and brought by two female plaintiffs and two male plaintiffs alleging sex discrimination in violation of Title VII. Xerox moved to exclude the reports of plaintiffs’ statistician expert, Dr. Philip A. Smethurst, arguing that he had inappropriately grouped work-units together and that he had neglected to conduct multiple regression analyses on the data. The district court denied the motion. However, the court ultimately decided that Smethurst’s conclusions were of little probative value for the reasons stated by the defendant in its motion to exclude and thus held that the plaintiffs failed to establish a prima facie case of disparate impact based on either age or gender. See Wado, 991 F.Supp. at 183-86. The court also held that the statistics did not support any plaintiff’s disparate treatment claim. See id. at 214.

The court next addressed the non-statistical evidence presented by each plaintiff to prove the respective disparate treatment claims. The court assumed that each plaintiff had made out a prima facie case of discrimination and focused on whether each plaintiff raised a genuine issue of material fact as to whether Xerox’s legitimate nondiscriminatory reason for the termination, namely, the need for a reduction-in-force, was merely a pretext for discrimination. The court decided that no plaintiff presented facts that, even when viewed in their most favorable light, could prove that Xerox had used the IRIF as a pretext to discriminate against any employee on the basis of age or sex. See Wado, 991 F.Supp. at 214.

Plaintiff Pedro Santiago also raised a retaliation claim under Title VII, contending that Xerox had terminated him because he had previously complained that he was being discriminated against because he is Hispanic. Noting that the plaintiff’s most recent complaint was made four years before he was terminated, the court held that the plaintiff had not established any causal connection between his protected activity under Title VII and his termination. Id. at 202.

Three plaintiffs, Philip Cufari, Eugene Hosenfeld, and Patricia Rake, also asserted below that Xerox had discriminated against them because they were disabled. The court found as to all three of them that they failed to connect their disabilities to their terminations in any manner, as required by the ADA. See id. at 196, 200, 209-210. The court further held as to Cufari that he did not make out a prima facie case because he failed to show that he was disabled within the meaning of the ADA. See id. at 209-210.

Twelve of the plaintiffs timely filed a notice of appeal.

II. DISCUSSION

On appeal, the plaintiffs contend that the district court erred in finding that their statistical evidence was not probative of either disparate impact or disparate treatment. In addition, each plaintiff argues that he or she presented sufficient non-statistical evidence to support a jury finding that Xerox used the IRIF as a pretext to discriminate on the basis of age or gender. This Court reviews a district court’s grant of summary judgment de novo. See Young v. County of Fulton, 160 F.3d 899, 902 (2d Cir.1998). Having carefully reviewed the record, we affirm the district court’s holding that no plaintiff presented sufficient evidence to raise a triable issue of fact as to whether Xerox’s proffered reason for the termination of employment was merely a pretext for discrimination under the ADEA, the ADA or Title VII, for substantially the same reasons as stated by the district court in its thorough opinion. See Wado, 991 F.Supp. at 187-202, 204-14. We write only to address the appropriate use of statistics in this case, particularly with respect to the disparate impact claim.

A. Disparate Impact Claims in General

Ten of the twelve appellants claim that the IRIF disparately impacted certain groups of workers, specifically, employees 40 years of age or older under the ADEA, and either men or women under Title VII, depending on which plaintiff made the claim. A plaintiff need not prove discriminatory intent to make out a claim of disparate impact. See Griggs v. Duke Power Co., 401 U.S. 424, 432, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971). Instead, disparate impact theory targets “practices that are fair in form, but discriminatory in operation.” Id. at 431, 91 S.Ct. 849. A plaintiff establishes a prima facie case of disparate impact by identifying a specific employment practice which, although facially neutral, has had an adverse impact on her as a member of a protected class. See Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 994, 108 S.Ct. 2777, 101 L.Ed.2d 827 (1988). Statistical data may be admitted to show a disparity in outcome between groups, but to make out a prima facie case the statistical disparity must be sufficiently substantial to raise an inference of causation. Id. at 994-95, 108 S.Ct. 2777; NAACP v. Town of East Haven, 70 F.3d 219, 225 (2d Cir.1995).

Once the plaintiff establishes a prima facie case, the employer must show the business necessity of the challenged employment practice. See Griggs v. Duke Power Co., 401 U.S. 424, 432, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971); Maresco v. Evans Chemetics, Div. of W.R. Grace & Co., 964 F.2d 106, 115 (2d Cir.1992). Even if the employer successfully defends the business necessity of the practice, the plaintiff may still prevail if she can show that the employer’s proffered explanation was merely a pretext for discrimination. See District Council 37, AFSCME v. New York City Dep’t of Parks and Recreation, 113 F.3d 347, 352 (2d Cir.1997). For example, if the plaintiff can show that another practice would achieve the same result at comparable cost without causing a disparate impact on the protected group, she has proven pretext. See Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 660-61, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).

B. Statistics in Disparate Impact Cases

Although no bright line rules exist to guide courts in deciding whether plaintiffs’ statistics raise an inference of discrimination, several overarching principles inform the issue. Among these is Congress’s intent that employers not be required to treat any individual or group preferentially because of a protected characteristic or to establish a numerical quota system. See 42 U.S.C. § 2000e-2(j). Accordingly, the Supreme Court has established safeguards to prevent these results. First, plaintiffs are required to identify a specific employment practice, rather than rely on bottom line numbers in an employer’s workforce. See Watson, 487 U.S. at 994, 108 S.Ct. 2777. Plaintiffs must then present statistical evidence “of a kind and degree sufficient to show that the practice in question has caused the exclusion of applicants for jobs or promotions because of their membership in a protected group.” Id.

In evaluating disparate impact claims under Title VII, this Court has primarily relied on two methods of measuring disparities between groups. First, we have considered persuasive the EEOC Guideline that states that:

A selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact. Smaller differences in selection rate may nevertheless constitute adverse impact, where they are significant in both statistical and practical terms....

29 C.F.R. § 1607.4D (1998). See, e.g., Waisome v. Port Authority of New York & New Jersey, 948 F.2d 1370, 1376 (2d Cir.1991) (race-based Title VII disparate impact claim); Bushey v. New York State Civil Serv. Comm’n, 733 F.2d 220, 225-26 (2d Cir.1984)(same).
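By way of illustration only, the four-fifths comparison described in the Guideline can be sketched in Python as follows. The function names and the sample counts are hypothetical and do not come from the record; only the 0.8 threshold is taken from the Guideline quoted above.

```python
def selection_rate(selected: int, total: int) -> float:
    """Fraction of a group that was selected (e.g. hired, promoted, retained)."""
    if total <= 0:
        raise ValueError("group must contain at least one person")
    return selected / total


def impact_ratios(rates: dict[str, float]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group had any selections")
    return {group: rate / highest for group, rate in rates.items()}


def below_four_fifths(rates: dict[str, float], threshold: float = 0.8) -> dict[str, bool]:
    """True for each group whose ratio falls under the four-fifths benchmark."""
    return {group: ratio < threshold for group, ratio in impact_ratios(rates).items()}


# Hypothetical counts, chosen only to exercise the arithmetic.
rates = {
    "group_a": selection_rate(30, 100),  # 30% selected
    "group_b": selection_rate(45, 100),  # 45% selected
}
ratios = impact_ratios(rates)
flags = below_four_fifths(rates)
```

On these invented figures, group_a is selected at two-thirds the rate of group_b and therefore falls below the four-fifths benchmark. The Guideline itself notes that smaller differences may still matter, so the flag is a rule of thumb and not a legal conclusion.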

As an alternative measure of differences between groups, we have also looked to whether the plaintiff can show a statistically significant disparity of two standard deviations. A standard deviation is a measure of variance from the mean (or average) value in a given sample. Basically, looking at standard deviations indicates how far an obtained result varies from an expected result. See Waisome, 948 F.2d at 1376. For example, absent discrimination, we would expect that in a group of 100 workers, half of whom were 40 years of age or older and half of whom were younger than 40, about 25 workers in each group would be selected for retention if the group were required to reduce its headcount by one half. The number that actually is retained will vary from that expected number by some small amount due to chance. If the obtained result varies too greatly from the expected result we are willing to infer that one group has been discriminated against. This may be as a result of intentional discrimination or it may stem from some characteristic of the selection process that favors the other group. If an obtained result varies from the expected result by two standard deviations, there is only about a 5% probability that the variance is due to chance. Id. Courts generally consider this level of significance sufficient to warrant an inference of discrimination. See Ottaviani v. State Univ. of New York, 875 F.2d 365, 371-72 (2d Cir.1989).
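The arithmetic behind the 100-worker example can be sketched as follows. The sketch assumes, purely for illustration and not as a method endorsed in the cases cited, that retention is modeled as a random draw without replacement (a hypergeometric model); the function names are hypothetical.

```python
import math


def expected_and_sd(population: int, protected: int, retained: int) -> tuple[float, float]:
    """Expected number of protected workers retained, and its standard deviation,
    if `retained` workers are drawn at random without replacement."""
    share = protected / population
    expected = retained * share
    # Finite-population correction, because workers are drawn without replacement.
    correction = (population - retained) / (population - 1)
    sd = math.sqrt(retained * share * (1 - share) * correction)
    return expected, sd


def standard_deviations(observed: int, population: int, protected: int, retained: int) -> float:
    """How many standard deviations the observed count lies from the expected count."""
    expected, sd = expected_and_sd(population, protected, retained)
    return (observed - expected) / sd


# The example from the text: 100 workers, 50 of them aged 40 or over, 50 retained.
expected, sd = expected_and_sd(population=100, protected=50, retained=50)
z_if_20_retained = standard_deviations(20, 100, 50, 50)
z_if_19_retained = standard_deviations(19, 100, 50, 50)
```

Under that assumption the expected number of older workers retained is 25 and the standard deviation is roughly 2.5, so retaining 20 older workers falls just inside two standard deviations while retaining 19 falls outside.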

Although courts have considered both the four-fifths rule and standard deviation calculations in deciding whether a disparity is sufficiently substantial to establish a prima facie case of disparate impact, there is no one test that always answers the question. Instead, the substantiality of a disparity is judged on a case-by-case basis. See Watson, 487 U.S. at 996 n. 3, 108 S.Ct. 2777 (approving the case-by-case approach because “statistics ‘come in infinite variety and ... their usefulness depends on all of the surrounding facts and circumstances.’ ” (quoting Teamsters v. United States, 431 U.S. 324, 340, 97 S.Ct. 1843, 52 L.Ed.2d 396 (1977))).

C. Plaintiffs’ Statistics

The plaintiffs’ statistician, Dr. Philip Smethurst, ran statistical tests for each of the plaintiffs’ disparate impact claims. Generally, he attempted to group each plaintiff with the coworkers to whom that particular plaintiff was compared for selection purposes. However, when the number of persons in a particular unit was too small to yield a statistically valid result he pooled units that he thought were reasonably homogeneous. For example, plaintiff John Smith worked in the Integrated Supply Chain (“ISC”) which consisted of 111 employees before the IRIF. Smethurst determined that this number was insufficiently large to yield valid statistical results and thus decided to combine ISC (also known as M1) with M2, both of which were Manufacturing Support groups within Corporate Strategic Services. The combination increased the group size to 2480 pre-IRIF employees.

On each plaintiff’s work-group, many of which were reconstituted as described above, Smethurst performed hypothesis testing using a “t-test.” This methodology posits a null hypothesis, in this case that there was no disparity between the two groups compared, i.e., persons under 40 years of age compared to persons 40 or over, or women compared to men, as to rate of selection for retention. The data are analyzed using the t-test to determine whether the null hypothesis can be rejected, given a selected level of statistical significance, which in this case was p=.05, or 95% certainty. That is, if the null hypothesis can be rejected, then we can be 95% certain that chance does not account for the favored group of employees having a higher probability of being selected for retention. See Ramona L. Paetzold & Steven L. Willborn, The Statistics of Discrimination § 2.04 at 9-14 (1998). The results of a t-test can only tell us that it is very unlikely that chance is responsible for a disparity; this method cannot pinpoint what the causative factor is. See id. § 2.04 at 10 n. 12. Another way of saying that a t-test is statistically significant at the p=.05 level is to say that the obtained result, i.e., an observed difference between the two groups compared, varied from the expected result, i.e., no difference between the two groups compared, by two standard deviations. Cf. Waisome, 948 F.2d at 1376.

Dr. Smethurst discovered statistically significant results for each of the t-tests he executed, indicating that the IRIF adversely affected persons 40 years of age or older in every plaintiff’s work-group and that there was also a disparate impact by gender in the work groups of two male plaintiffs, Harold Wado and Eugene Hosenfeld. In its motion for summary judgment, Xerox disputed these results, presenting the statistical analyses performed by its own expert, Dr. David Bloom. Bloom argued that Smethurst impermissibly pooled units, making his findings of statistical significance invalid. For the work-groups used by Bloom, t-tests showed no statistical significance at the p=.05 level.

D. Plaintiffs’ Claims

1. Disparate Impact

This Court generally assesses claims brought under the ADEA identically to those brought pursuant to Title VII, including disparate impact claims. See Geller v. Markham, 635 F.2d 1027, 1032 (2d Cir.1980)(holding that disparate impact is a substantive theory warranting the same treatment under the ADEA as under Title VII); AFSCME, 113 F.3d at 351. For this reason and because the defects in the plaintiffs’ theory pertain to the disparate impact claims brought under both statutes we assess the age-based claims and the gender-based claims together.

All plaintiffs to the current action identify the overall decision-making process utilized in the 1994 wave of the IRIF as the specific employment practice which allegedly had an adverse impact on older (or male) workers. Xerox argues that a decision-making process cannot constitute a specific employment practice. Defendant is correct that a plaintiff generally cannot rely on the overall decision-making process of the employer as a specific employment practice. See Wards Cove, 490 U.S. at 656-58, 109 S.Ct. 2115 (requiring plaintiffs to demonstrate as part of a prima facie case under Title VII that specific elements of the hiring process caused a significant disparate impact on racial minority applicants); see also Lowe v. Commack Union Free Sch. Dist., 886 F.2d 1364, 1371 (2d Cir.1989) (holding that plaintiffs could not establish a prima facie case of disparate impact under the ADEA by “broadly attacking as discriminatory the hiring process as a whole.”). As part of the amendments to Title VII passed in the Civil Rights Act of 1991, Congress softened the holding of Wards Cove to allow a plaintiff to focus on an employer’s overall decision-making process as the cause of a disparate impact if the plaintiff can show that the elements of the employer’s decision-making process are not capable of separation for analysis. 42 U.S.C. § 2000e-2(k)(1)(B)(i). We need not decide whether or not the decision-making process utilized in the IRIF is capable of separation for analysis, however, because even assuming that the plaintiffs may identify the overall decision-making process as a specific employment practice, they fail to demonstrate a disparate impact based on age or sex.

After specifying the employment practice allegedly responsible for excluding members of their protected class from a benefit, plaintiffs must identify the correct population for analysis. In the typical disparate impact case the proper population for analysis is the applicant pool or the eligible labor pool. The composition of this population is compared to the composition of the employer’s workforce in a relevant manner, depending on the nature of the benefit sought. See, e.g., Hazelwood Sch. Dist. v. United States, 433 U.S. 299, 308, 97 S.Ct. 2736, 53 L.Ed.2d 768 (1977)(appropriate comparison was between the racial composition of the school’s teaching staff and the racial composition of the qualified public school teachers in the relevant labor market); Waisome, 948 F.2d at 1372 (population was all black candidates who sought promotion compared to the composition of those candidates actually promoted); Lowe, 886 F.2d at 1371 (proper population for analysis was the applicant pool compared to the composition of those teachers hired); E.E.O.C. v. Steamship Clerks Union, Local 1066, 48 F.3d 594, 604 (1st Cir.1995)(in a case alleging that the union’s membership policy caused racial minority workers to be excluded, proper comparison was between the composition of the available labor force and the composition of workers admitted to the union); see generally Paetzold & Willborn, § 1.09 at 28 (discussing proper statistical methodology for disparate impact cases).

The corresponding population in a reduction-in-force situation consists of workers subject to termination. As in a promotion scenario like that of Waisome, the relevant population is divided into protected and non-protected groups and the selection rates of the two groups are compared. See, e.g., AFSCME, 113 F.3d at 348-49 (comparison of the age composition of the 5,180 employees subject to lay-off to the composition of the workforce after 1,585 employees were laid off). The questions to be answered are thus what is the composition of the population subject to the reduction-in-force, what was the retention rate of the protected group compared to the retention rate of other employees, and how much of a differential in selection rates will be considered to constitute a disparate impact. See Paetzold & Willborn, § 1.09 at 28-29.

In the present case, Xerox planned to reduce its workforce of approximately 97,500 by about 10,000. In order to determine whether the IRIF had a disparate impact on older (or male) workers we would first need to know how many of these 97,500 employees were subject to termination in the 1994 wave of lay-offs as part of which the plaintiffs were fired. From that population the rate of retention for older (or male) employees and the rate of retention for younger (or female) workers should be calculated. These rates could then be compared to ascertain whether there is a “gross statistical disparity” in the selection rates between groups. Waisome, 948 F.2d at 1375. Plaintiffs have offered no such evidence. Instead, plaintiffs’ expert either isolated units or pooled various units and ran separate analyses for each of these work-groups. This methodology cannot tell us if the overall decision-making process utilized in the IRIF produced a disparate impact on older (or male) workers. In any large population a subset can be chosen that will make it appear as though the complained of practice produced a disparate impact. Yet, when the entire group is analyzed any observed differential may disappear, indicating that the identified employment practice was not the cause of the disparity observed in the subset.
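The point that a disparity visible in a subset may vanish in the full population can be illustrated with a short sketch. Every count below is invented for the purpose of the illustration, and the class and function names are hypothetical; nothing here reflects the actual Xerox work-groups.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Headcounts for one work-unit before and after a reduction in force."""
    older_total: int
    older_retained: int
    younger_total: int
    younger_retained: int


def retention_ratio(u: UnitCounts) -> float:
    """Retention rate of older workers as a fraction of the younger workers' rate."""
    older_rate = u.older_retained / u.older_total
    younger_rate = u.younger_retained / u.younger_total
    return older_rate / younger_rate


def pooled(units: list[UnitCounts]) -> UnitCounts:
    """Add the counts of several units together so they can be analyzed as one population."""
    return UnitCounts(
        older_total=sum(u.older_total for u in units),
        older_retained=sum(u.older_retained for u in units),
        younger_total=sum(u.younger_total for u in units),
        younger_retained=sum(u.younger_retained for u in units),
    )


# Invented figures: unit_a disfavors older workers, unit_b disfavors younger ones.
units = {
    "unit_a": UnitCounts(older_total=100, older_retained=80, younger_total=100, younger_retained=92),
    "unit_b": UnitCounts(older_total=100, older_retained=92, younger_total=100, younger_retained=80),
}
per_unit = {name: retention_ratio(u) for name, u in units.items()}
overall = retention_ratio(pooled(list(units.values())))
```

Analyzed alone, unit_a shows older workers retained at about 87% of the younger workers' rate, yet across both units the two groups are retained at the same rate. An analysis confined to unit_a would therefore say nothing reliable about the effect of a process applied to the whole population.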

This idea can be illustrated by considering what would happen if the plaintiffs are correct in their claims that their respective supervisors purposely lowered the CAF scores of older (or male) employees. If that were the case, we would certainly expect to see a significant disparity in the selection rates between older and younger (or male and female) workers in those work-groups. However, the facially neutral selection process would not be the cause of the disparity; instead, the difference in retention rates would stem from the intentional discrimination of those supervisors. Under these circumstances, we would not know if the IRIF decision-making process itself caused a disparate impact. And indeed, given that only 8,444 employees were included in the plaintiffs’ analyses, it is very possible that when the total number of workers subject to termination in 1994 were considered, the differences caused by the intentional discrimination in these discrete work-groups would no longer be reflected in the statistics. The bottom line is we cannot reasonably infer from statistics based on a sub-set of work-groups that the IRIF caused the observed disparate impact, rather than some other factor relevant only to those work-groups.

The problematic nature of isolating work-groups in this manner is further highlighted by the results obtained in this case. In plaintiff Judith Caruana’s work-group, workers 40 years of age and older were retained at 88.79% the rate of workers who had not yet reached their fortieth birthday. In Harold Wado’s work-group, older workers were retained at 96.69% the rate of younger workers. The relative retention rates for the remaining eight plaintiffs are the same as or lie somewhere between these two figures: Smith = 89.29%; Lalik = 89.29%; Bernhard = 96.69%; Hamann = 90.99%; Gusciora = 93.77%; Rake = 90.99%; Hosenfeld = 90.36%; and Santiago = 90.86%. A similarly large range of selection rates was discovered in the work-groups chosen by Dr. Smethurst for analysis of the gender-based claims. In plaintiff Harold Wado’s group the retention rate of the protected group was 98.59% that of the favored group. The retention rate of men in plaintiff Eugene Hosenfeld’s work-group was 90.36% that of the retention rate for women.
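The spread in the age-based figures just recited can be tabulated with a few lines of Python. The percentages are those stated in the text; the variable names are hypothetical, and the comparison against 80% is offered only as a reference to the four-fifths Guideline discussed in Part II.B, not as a test applied by the parties.

```python
# Retention rate of workers aged 40 or over, as a percentage of the rate for
# younger workers, in the work-group analyzed for each plaintiff (from the text).
age_ratios = {
    "Caruana": 88.79,
    "Wado": 96.69,
    "Smith": 89.29,
    "Lalik": 89.29,
    "Bernhard": 96.69,
    "Hamann": 90.99,
    "Gusciora": 93.77,
    "Rake": 90.99,
    "Hosenfeld": 90.36,
    "Santiago": 90.86,
}

lowest = min(age_ratios.values())
highest = max(age_ratios.values())
spread = highest - lowest  # width of the range, in percentage points

# Work-groups, if any, whose ratio falls under the four-fifths (80%) benchmark.
under_four_fifths = [name for name, pct in age_ratios.items() if pct < 80.0]
```

The ten figures span about 7.9 percentage points, from 88.79% to 96.69%, and none falls below 80%.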

In some workforces, a disparate impact might well be actionable if older workers were retained at 88.79% of the rate for younger workers, but not if the comparison were 96.69%, especially considering the finding of statistical significance. Yet, it would be nonsensical for a court to decide that only some of these plaintiffs established a prima facie case of disparate impact when they all purport to specify the identical employment practice as causing a disparate impact. The decision-making process either caused a disparate impact or it did not.

For these reasons, we conclude that plaintiffs relied on the wrong population for their statistical analyses. It is only reasonable to infer a disparate impact from the IRIF decision-making process if all persons who were subject to the process are included in the analysis.

This is not to say that there is no disparate impact on a protected group as long as the bottom line numbers show no adverse effect. An employer may not defend against a disparate impact claim by arguing that its workforce is balanced overall. See Connecticut v. Teal, 457 U.S. 440, 450-51, 102 S.Ct. 2525, 73 L.Ed.2d 130 (1982)(under Title VII an employment practice that adversely affects black employees’ equal access to promotion opportunities is not cured by the fact that the bottom-line numbers of workers promoted reflected no disparate impact on black employees). Even if the overall decision-making process did not create an adverse impact on a protected group, that group still has a cause of action if it can show that some component of the decision-making process caused a disparate impact. See id.

In other words, if the plaintiffs in this case had demonstrated statistically that some portion of the IRIF decision-making process, such as the evaluation of work speed, produced a disparate impact on older (or male) workers, Xerox could not defend by showing that overall the same percentages of older and younger (or male and female) workers were selected for retention. However, the plaintiffs here alleged that the overall decision-making process itself, not some component thereof, resulted in an adverse effect on older (or male) workers. Having chosen the overall process, they must present statistics that support that contention. As discussed above, isolating a few work-groups and analyzing the effect of the IRIF on each work-group is misleading at best. Cf. Fisher v. Vassar College, 70 F.3d 1420, 1443 (2d Cir.1995)(a plaintiff may not “gerrymander” data to skew the results of statistical analyses in her favor).

Accordingly, we affirm the district court’s holding as to the lack of probative value of plaintiffs’ statistics for the disparate impact claim, albeit for somewhat different reasons than those on which the court below based its holding. The plaintiffs did not use incorrect statistical methodology; they applied appropriate analyses, but mismatched the population and the specific employment practice.

2. Disparate Treatment

A plaintiff may also present statistical findings as circumstantial evidence of intentional discrimination. See Hollander v. American Cyanamid Co., 172 F.3d 192, 202 (2d Cir.1999). In contrast to a disparate impact claim where the focus is on how a facially neutral employment practice affects a protected group, a disparate treatment claim looks at how an individual was treated compared to her similarly situated coworkers. Thus, statistical analyses that compare coworkers who competed directly against each other to receive a benefit, here selection for retention, are appropriate.

Xerox argues persuasively that plaintiffs’ statistics are inadequate to support the individual plaintiffs’ disparate treatment claims both because the work-units were pooled incorrectly and because Smethurst should have conducted multiple regression analyses to control for each plaintiff’s performance evaluation. First, Dr. Smethurst pooled some plaintiffs into work-groups that included workers to whom the plaintiff was not directly compared in the IRIF process and who were, in fact, rated by other decision-makers. Because intent is the critical issue, only a comparison between persons evaluated by the same decision-maker is probative of discrimination.

Moreover, plaintiffs’ statistical analyses fail to account for other possible causes for the fact that older (or male) workers were more likely to be terminated. See Hollander, 172 F.3d at 203 (plaintiff’s statistical findings insufficient to support an inference of intentional discrimination because they did not account for other potential causes of the age-related disparity between employees); Raskin v. Wyatt Co., 125 F.3d 55, 67-68 (2d Cir.1997). Multiple regression analyses, such as those performed by the defendant’s expert, would have been appropriate to eliminate other possible causes. See Ottaviani, 875 F.2d at 367 (noting that in disparate treatment cases plaintiffs usually use multiple regression analysis to “isolate the influences of [the protected trait] on employment decisions ... ”). Plaintiffs’ hypothesis testing only showed that chance was most likely not responsible for the perceived difference in treatment of the older (or male) workers; this methodology could not, by itself, support a conclusion that discrimination must have been the cause for the disparity. See Paetzold & Willborn, § 2.04 at 10 n. 12. Thus, since the district court properly held that no plaintiff presented sufficient non-statistical evidence from which an inference of intentional discrimination could be drawn, summary judgment was properly granted.

III. CONCLUSION

The district court correctly held that plaintiffs’ statistics were of little probative value in determining whether Xerox’s 1994 reduction-in-force caused a disparate impact on employees forty years of age or older or on male employees. The plaintiffs chose the overall decision-making process utilized in the IRIF as the specific employment practice which allegedly caused a disparity. Yet, plaintiffs’ statistical analyses, which isolate work-groups rather than assessing the effect of the decision-making process on the population of Xerox employees subject to termination, do not support a finding that the IRIF decision-making process resulted in a harsher effect on the protected groups. As to plaintiffs’ disparate treatment claims, the district court correctly held that the proffered evidence, both statistical and non-statistical, did not suffice to raise an inference of intentional discrimination. We have considered all of plaintiffs’ other claims raised on appeal and consider them to be without merit. Accordingly, the judgment of the district court is AFFIRMED.
      
      . Beyond mentioning that the plaintiffs alleged violations of the New York State Human Rights Law, the district court did not explicitly evaluate the plaintiffs’ claims under that law. However, since claims under the NYSHRL are analyzed identically to claims under the ADEA and Title VII, the outcome of an employment discrimination claim made pursuant to the NYSHRL is the same as it is under the ADEA and Title VII. See Leopold v. Baccarat, Inc., 174 F.3d 261, 264 n. 1 (2d Cir.1999). Accordingly, we will not address the NYSHRL claims separately.
     
      
      . Philip Cufari and Salvatore Catalano stipulated to dismiss their disparate impact claims.
     
      
      . Multiple regression analysis is a statistical test which identifies factors, called independent variables, that might influence the outcome of an observed phenomenon, called a dependent variable. In the employment discrimination context the dependent variable is the employment decision, such as hiring, promotion, termination. The statistician identifies legitimate factors that could have influenced the decision, e.g., education and experience, and determines via multiple regression analyses how well these legitimate factors account for the employment decision. In this manner the influence of a protected characteristic on the employment decision can be statistically isolated. See Ottaviani v. State Univ. of New York, 875 F.2d 365, 366-67 (2d Cir.1989)(explaining multiple regression analysis) (citations omitted).
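As a rough illustration of the technique this note describes, and not a model drawn from the record, a regression of this kind is often written in the following general form. Every symbol here is an assumption introduced for illustration: Y_i stands for the employment decision for employee i, X_{1i} through X_{ki} for the legitimate factors (such as education and experience), P_i for an indicator of the protected characteristic, and \varepsilon_i for unexplained variation.

```latex
Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \gamma P_i + \varepsilon_i
```

Under this sketch, \gamma measures the association between the protected characteristic and the decision once the legitimate factors are held fixed. An estimate of \gamma that is statistically indistinguishable from zero would suggest that the legitimate factors account for the observed disparity.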
     
      
      . A plaintiff in a disparate impact case usually complains that persons from the favored group were "selected” for some benefit, often related to hiring or promotion, at a greater rate than members of the protected class to which the plaintiff belongs. In the case of a reduction-in-force the use of the word "selection” is somewhat counter-intuitive, since plaintiffs are complaining that they were selected for termination, which is hardly a benefit. In order to utilize the word "selection” in a manner consistent with its usage in the majority of disparate impact cases, we will refer to being selected for retention.
     
      
      . We acknowledge that a different analysis may apply to claims brought under the ADEA under some circumstances, because age tends to be highly correlated to certain factors an employer is permitted to consider when making employment decisions, such as pension status. See Hazen Paper Co. v. Biggins, 507 U.S. 604, 611, 113 S.Ct. 1701, 123 L.Ed.2d 338 (1993)(noting that an employer's decision to terminate an employee based on a permissible factor highly correlated with age does not constitute intentional age discrimination). However, Xerox does not purport to have relied on any such factors in making its decisions regarding whose employment to terminate. Therefore, we need not decide the extent to which the holding in Hazen Paper controls claims brought under a disparate impact theory.
     
      
      . The viability of the disparate impact theory under the ADEA is far from settled among the circuits. Several circuits have rejected or called into question the availability of a disparate impact cause of action under the ADEA in light of Hazen Paper. See, e.g., Mullin v. Raytheon Co., 164 F.3d 696, 699-704 (1st Cir.1999)(holding that disparate impact claims are not cognizable under the ADEA); Ellis v. United Airlines, Inc., 73 F.3d 999, 1006-10 (10th Cir.1996)(same); E.E.O.C. v. Francis W. Parker Sch., 41 F.3d 1073, 1076-78 (7th Cir.1994)(plaintiff’s disparate impact claim not cognizable under the ADEA because age is highly correlated with work experience, which is a permissible factor for an employer to consider in making hiring decisions). Other circuits, including this one, have continued to recognize disparate impact ADEA claims. See, e.g., AFSCME, 113 F.3d at 351 (analyzing a disparate impact claim under the ADEA without discussion of Hazen Paper); Houghton v. SIPCO, Inc., 38 F.3d 953, 958-59 (8th Cir.1994)(same).
     
      
      . As for the gender-based claims brought by the two female plaintiffs, Judith Caruana and Patricia Rake, their own expert found that there was no statistically significant difference in retention rates between male and female workers in their work-groups. In fact, females were retained at a slightly higher rate than males, 89% versus 85% in Caruana’s work-group and 89% versus 87% in Rake’s work-group. Since there was no disparate impact on women even on plaintiffs’ terms, we need not consider the validity of their statistical analyses as to these two claims.
     
      
      . We would be presented with a different proposition if the groups used by plaintiffs’ expert purported to be randomly drawn samples of the total population of Xerox workers subject to the IRIF. A properly chosen random sample that evidenced a disparate impact would reflect a disparate impact on the entire population. However, the groups chosen by plaintiffs' expert were not randomly drawn from all such Xerox workers.
     
      
      . Because we hold that plaintiffs’ statistical methodology was flawed we need not decide whether these selection rates would serve to establish a prima facie case of disparate impact. However, it is interesting to note that even were we to accept the plaintiffs’ statistics as valid, no plaintiff demonstrated that the retention rate for the protected group was less than 80% of that of the group supposedly favored by the IRIF selection process.
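The 80% comparison mentioned in this note is simple arithmetic, which the following minimal sketch works through. The function names are hypothetical. The only figures taken from the opinion are the work-group retention rates the opinion reports for the two female plaintiffs (89% versus 85%, and 89% versus 87%). The last pair of rates is invented solely to show a case that would fall below the threshold.

```python
def retention_ratio(protected_rate, favored_rate):
    """Protected group's retention rate as a fraction of the favored group's."""
    if not 0 <= protected_rate <= 1 or not 0 < favored_rate <= 1:
        raise ValueError("rates must be proportions, and favored_rate must be positive")
    return protected_rate / favored_rate


def below_80_percent(protected_rate, favored_rate, threshold=0.80):
    """True when the protected group's rate is less than 80% of the favored group's."""
    return retention_ratio(protected_rate, favored_rate) < threshold


# Rates reported in the opinion for the female plaintiffs' work-groups
# (female retention rate first, male retention rate second).
caruana_ratio = retention_ratio(0.89, 0.85)
rake_ratio = retention_ratio(0.89, 0.87)

# Hypothetical rates, not from the record, chosen to fall below the threshold.
hypothetical_flag = below_80_percent(0.60, 0.90)

print(f"Caruana work-group ratio: {caruana_ratio:.3f}")
print(f"Rake work-group ratio: {rake_ratio:.3f}")
print(f"Hypothetical 60% vs 90% falls below 80%: {hypothetical_flag}")
```

A ratio at or above 0.80, as with both reported work-groups, does not meet the "less than 80%" benchmark the note refers to.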
     
      
      . Of course, this point assumes that Xerox did not direct supervisors to treat older (or male) workers more harshly, but that certain supervisors may have chosen on their own to discriminate. The assumption is warranted in this case because plaintiffs present no evidence that Xerox instructed all of its supervisors to give lower scores than were deserved to male workers and/or workers forty years of age or over. In fact, those plaintiffs who were themselves managers within the company at some point, testified at their depositions that Xerox had never directed them or requested them to consider an employee's age or sex in decision-making.
     
      
      . Plaintiffs argue that inclusion of their CAF scores as independent variables in multiple regression analyses would not have been appropriate since they all contend that they were purposely given scores lower than they deserved. They are correct that tainted variables do not further the causation inquiry. Cf. Ottaviani, 875 F.2d at 375 (rank variable not tainted by discrimination could be used in a multiple regression analysis as a legitimate factor to explain pay differentials). However, plaintiffs' expert could have conducted tests to determine whether the CAF scores were reliable measures of each plaintiff's performance by statistically assessing whether the plaintiff's past performance evaluations were strongly positively correlated with the CAF scores. See Diehl v. Xerox Corp., 933 F.Supp. 1157, 1168 (W.D.N.Y.1996). Dr. Smethurst did opine that plaintiffs' past evaluations generally looked better than their respective CAF scores, but he offered no quantitative study of the data. He thus lacked any foundation on which to base his opinion. Absent evidence tending to show that the CAF scores were tainted they should have been included in a multiple regression analysis in an effort to eliminate a relatively poor performance compared to coworkers as a cause of each plaintiff's termination. Certainly, performance is a factor Xerox was permitted to consider in deciding whom to retain.
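The kind of quantitative check this note says was missing could take many forms. One simple possibility, sketched here under stated assumptions, is a Pearson correlation between past performance evaluations and CAF scores. The function name and every data value below are hypothetical and do not come from the record, and nothing here suggests this is the test either expert would have used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one past-evaluation rating and one CAF score per employee.
past_evaluations = [3.0, 3.5, 4.0, 4.5, 5.0]
caf_scores = [60, 68, 75, 85, 90]

r = pearson_r(past_evaluations, caf_scores)
print(f"correlation between past evaluations and CAF scores: {r:.3f}")
```

A coefficient near +1 would be consistent with CAF scores tracking past evaluations. A weak or negative coefficient would be the sort of quantitative showing that could support a claim that the scores were unreliable.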
     