
    Kip YURT, Plaintiff-Appellant, v. Carolyn W. COLVIN, Acting Commissioner of Social Security, Defendant-Appellee.
    No. 13-2964.
    United States Court of Appeals, Seventh Circuit.
    Argued Feb. 24, 2014.
    Decided July 10, 2014.
    
      William L. Fouche, Jr., Law Office of William Fouche, Dallas, TX, for Plaintiff-Appellant.
    Jared Christian Jodrey, Edward J. Kristof, Attorney, Social Security Administration Office of the General Counsel, Region V, Chicago, IL, for Defendant-Appellee.
    Before FLAUM and ROVNER, Circuit Judges and KENDALL, District Judge.
    
    
      
       The Honorable Virginia M. Kendall, United States District Court for the Northern District of Illinois, sitting by designation.
    
   ROVNER, Circuit Judge.

Kip Yurt suffers from a psychotic disorder which causes him to experience, among other things, auditory hallucinations and bouts of uncontrollable rage. He also struggles with obsessive compulsive disorder, moderately severe chronic obstructive pulmonary disease (“COPD”), and chronic bifrontal tension headaches. As a result, he applied for Disability Insurance Benefits from the Social Security Administration, but an Administrative Law Judge (“ALJ”) denied his application. After the Appeals Council declined to review the ALJ’s decision, Yurt sought review in the district court pursuant to 42 U.S.C. § 405(g). A magistrate judge affirmed the decision of the ALJ, and Yurt appeals, arguing principally that the ALJ erred by failing to include many of his medical limitations in the hypothetical that she posed to the vocational expert (“VE”). Yurt contends that the flawed hypothetical led the VE and the ALJ to erroneously conclude that he could be gainfully employed. For the reasons discussed below, we reverse the judgment of the district court and remand to the agency for further proceedings.

I.

Yurt applied for disability in February 2011, alleging disability beginning on August 4, 2010. The Social Security Administration denied both Yurt’s claim and his request for reconsideration. On his application for a hearing with an ALJ, Yurt noted that he had worked in the past in various capacities as a cook and a janitor. His final job at the “substantial gainful activity” level effectively ended in May 2010, when he had some sort of break with reality. He was taken to the emergency room and subsequently placed on medical leave for several months. Shortly after he returned to work in August 2010, he threatened a coworker with a knife, which led, unsurprisingly, to his termination.

Between the episode at the hospital in May 2010 and the date of his hearing with the ALJ on April 3, 2012, Yurt saw a number of different physicians and therapists and attempted at least one other job. The May 2010 incident occurred at Parkview Noble Hospital, where Yurt had worked in the kitchen for several years when he was found “wandering the halls” without any memory of how he had gotten there. After being taken to the emergency room, he was referred to a neurologist, Dr. Madhav Bhat, who treated him on July 1, 2010. Dr. Bhat suggested weaning Yurt off an anti-seizure medicine he had been taking and doubling the dosage of Prozac Yurt was already taking for depression. Dr. Bhat recognized that Yurt suffered from “[r]ecurrent episodes of altered awareness of surroundings,” and diagnosed Yurt’s nearly daily recurring bifrontal pain in his head as a chronic tension headache. He concluded that Yurt should remain on medical leave from work for the time being.

Yurt returned to work that August, but reported that Parkview fired him shortly thereafter because “they were really afraid that he might hurt other people” and because he was accused of holding up a knife and threatening coworkers. On August 13, 2010, Yurt saw psychiatrist Dr. Frank Shao, who concluded that Yurt’s frequent self-described “black outs” were difficult to diagnose precisely. Dr. Shao recommended that Yurt obtain a second opinion and prescribed Lamictal in slowly increasing dosages to help Yurt’s “mood lability and violent behaviors.” He recognized that Yurt may have a “certain risk of violence to himself and others” because of his urges and history of aggression, but deemed the risk not to be “acute.” He assigned Yurt a Global Assessment of Functioning (“GAF”) score of 40 to 50. This GAF score correlates with “[s]erious symptoms ... or any serious impairment in social, occupational, or school functioning (e.g., no friends, unable to keep a job).” Am. Psychiatric Ass’n, Diagnostic & Statistical Manual of Mental Disorders 32 (4th ed. text revision 2000).

Yurt then attempted to work part-time as a cook at St. Francis School. Although the record is short on specifics, it appears that Yurt lost this job on account of again threatening a coworker. This likely corresponds to the beginning of December 2010, when Yurt called Dr. Shao’s office and reported grabbing a co-worker by the throat. He did not remember the details because he had blacked out.

Later that same month, he was admitted to the hospital for psychiatric evaluation. Dr. Shao reported that Yurt was hearing voices telling him to “kill people” and that he was afraid to go outside because the voice in his head (which he called “Alex”) was instructing him to “randomly hurt people.” Dr. Shao described Yurt as “disheveled” and assessed his GAF score to be between 25 to 30. This corresponds to behavior that is “considerably influenced by delusions or hallucinations or serious impairment in communication or judgment (e.g., sometimes incoherent, acts grossly inappropriately, suicidal preoccupation) or inability to function in almost all areas (e.g., stays in bed all day; no job, home, or friends).” Id. Dr. Shao recommended inpatient treatment for what he expected would be one to two weeks. He also increased Yurt’s dosage of Celexa (an antidepressant) and continued him on Lamictal (an anticonvulsant used to treat both epilepsy and bipolar disorder) as well as Seroquel (another medication for bipolar disorder). Despite Dr. Shao’s estimation that Yurt would need between one and two weeks of inpatient treatment, Yurt checked out of the hospital approximately two days later, denying auditory hallucinations, homicidal or suicidal ideations, delusions, or depression.

In January 2011, Yurt saw Dr. Kenneth Ogu for a psychiatric evaluation. Dr. Ogu noted that Yurt described having command hallucinations, sleep difficulty, racing thoughts and obsessive compulsive thoughts. He diagnosed Yurt with psychosis, not otherwise specified as well as “Rule out Bipolar I Disorder” and “Rule out Intermittent Explosive Disorder.” Yurt asked if his anti-psychotic medications (he was taking three) could be changed because they did not seem to be working for the voices. Dr. Ogu agreed and set out a plan for reducing some medications and adding several others.

Yurt was again admitted for psychiatric inpatient care on January 25, 2011. He continued to complain of auditory hallucinations — specifically the voice of “Alex” which Yurt described as “so strong” that he could no longer control it. This time Dr. Shao recommended hospitalizing Yurt to keep him from hurting others as a result of the auditory hallucinations. Dr. Shao again opined that Yurt had a GAF of 25 to 30. Here again, Yurt was released from the hospital two days later. At that time, Dr. Shao recorded a slightly higher GAF score of 35 to 40. This corresponds to “[s]ome impairment in reality testing or communication (e.g. speech is at times illogical, obscure, or irrelevant) or major impairment in several areas, such as work or school, family relations, judgment, thinking, or mood (e.g. depressed adult avoids friends, neglects family, and is unable to work[.])” Am. Psychiatric Ass’n, Diagnostic & Statistical Manual of Mental Disorders 32 (4th ed. text revision 2000). After his January 2011 stay in the hospital, Yurt was taking the following medications on a daily basis: 40 milligrams of Prozac for depression; 100 milligrams of Lamictal (used for treating bipolar disorder); 500 milligrams of Depakote for mood stabilization; 1 milligram of Klonopin (used for treating epilepsy and panic disorders) at bedtime; 10 milligrams of Ambien at bedtime; and an increased dosage of 2 milligrams of Risperdal for psychosis.

In April 2011, Yurt met with the psychologist selected by the Disability Determination Bureau, Revathi Bingi, Ed.D. After evaluating Yurt, she concluded that he appeared to “have great difficulty managing his symptoms” in spite of good family support. She observed that Yurt’s “hallucinations, paranoia and anger appear to be restricting his life” and that his quality of life “appears to be very poor.” She assigned him a GAF of 45, which, as described above, represents “[s]erious symptoms ... or any serious impairment in social, occupational, or school functioning[.]” Id. That same month, Yurt began meeting for therapy with Rachel DeFrancesco, M.A. She identified Yurt’s issues as “anxiety, depression, employment, interpersonal problems, psychosis, [and] sleep.” She characterized Yurt’s prognosis as “fair,” and described him as suffering from “severe” symptoms but possessing a “strong motivation to gain understanding.”

In May 2011, state agency psychologist Ken Lovko reviewed Yurt’s file for a mental residual functional capacity assessment (“RFC”). As relevant here, Dr. Lovko checked boxes indicating that Yurt was “moderately limited” in his ability to: (1) understand and remember detailed instructions; (2) carry out detailed instructions; (3) perform activities within a schedule and maintain regular attendance; (4) perform at a consistent pace and complete a normal workday and workweek; (5) interact appropriately with the general public; (6) get along with coworkers or peers; and (7) maintain socially appropriate behavior. Dr. Lovko then opined that although Yurt’s diagnosis was “serious and consistent with severe impairments,” his functioning did not suggest that he had lost the capacity for unskilled work. Dr. Lovko also noted that Yurt’s GAF score of 60 (given by Dr. Ogu in January 2011) indicated only “minimal impairments.” Dr. Lovko further allowed that Yurt’s symptoms may impede his ability to work around large numbers of people, but that Yurt could likely work in an environment with fewer people and low levels of stress. Dr. Lovko also thought that Yurt could relate “at least on a superficial basis ... with co-workers and supervisors.”

In April 2012, Yurt had a hearing before an ALJ. The ALJ heard testimony from Yurt and his wife Lori as well as a vocational expert. Yurt testified that his “rage” and inability to “be around people” prevented him from holding a full-time job. He also testified that he could not sit or stand still for more than a few minutes at a time, and that his left hand shakes and prevents him from using it. Finally, he testified that he repeated certain cleaning routines at home as many as ten times daily and that he did not think he could get a job because he had “a real problem around people.”

Yurt’s wife of eighteen years, Lori, testified that because of his memory problems she needs to make sure he takes his various medications both in the morning and again at night. As for his level of functioning, she stated that she did not see him “functioning that much” and that when she did see him he was often lethargic, sleeping all day, or watching television. She also explained that even slight changes to his medication make it difficult for him to function and cause him to stare into space or otherwise lose focus. Finally, she expressed her opinion that Yurt’s memory loss would prevent him from succeeding at even a job where he was able to work alone and avoid other people because he would be unable to do what he was told.

The ALJ then formulated a hypothetical for the VE to assess what jobs Yurt could perform. She described to the VE an individual that can “remember and carry out unskilled task[s] without special considerations ... relate on at least a superficial basis with coworkers and supervisors ... attend to tasks for sufficient periods of time to complete” and who “should not work around large numbers of people.” When asked if such an individual could perform any of Yurt’s past work, the VE opined that Yurt would be capable of performing his past work of dishwasher, janitor, and kitchen helper. She also thought that Yurt could carry out the duties of the light, unskilled job of “towel folder” or work as a cleaner/housekeeper. The VE also stated that in competitive employment workers were expected to be on task 80 to 85 percent of the time and could not miss more than one or two days per month and up to approximately ten per year. Yurt’s attorney then asked what jobs would be eliminated if Yurt needed to avoid exposure to pulmonary irritants such as dust and fumes (on account of his COPD). The VE opined that such a restriction would essentially eliminate any cleaning jobs. Finally, she allowed that the kitchen helper position would be eliminated if it was necessary to avoid any position that involved frequent exposure to hazards.

After analyzing the five steps in 20 C.F.R. § 404.1520, the ALJ concluded that Yurt was not disabled. At Step One, the ALJ determined that Yurt had not engaged in substantial gainful activity since the alleged onset date in August 2010. The ALJ noted that Yurt initially testified that he had not worked since the alleged onset date but that the evidence showed that he had worked as a part-time chef from October 2010 through March 2011. Yurt attributed the discrepancy to his alleged memory difficulties; the issue was ultimately irrelevant because the ALJ concluded that his earnings as a part-time chef did not represent disqualifying substantial gainful activity. At Step Two, the ALJ concluded that Yurt’s psychotic disorder was severe, but that his obsessive compulsive disorder, COPD, and hand tremors were not. At Step Three, the ALJ determined that Yurt did not have an impairment or combination of impairments that met or medically equaled the criteria of Listing 12.03 — Schizophrenic, paranoid and other psychotic disorders. Specifically, the ALJ concluded that Yurt’s mental impairment did not restrict his activities of daily living. And although the ALJ acknowledged that Yurt had “moderate difficulties” with social functioning and concentration, persistence, or pace, she concluded that the record did not support a finding of marked limitation in either domain as required to meet the criteria of Listing 12.03. She also determined that Yurt had not experienced any episodes of extended decompensation or repeated episodes of decompensation, which the regulations define as “exacerbations or temporary increases in symptoms or signs accompanied by a loss of adaptive functioning[.]” 20 C.F.R. Pt. 404, Subpt. P, App. 1, § 12.00(C)(4).

The ALJ next determined that Yurt possessed the residual functional capacity to perform a full range of work at all exertional levels so long as he had only brief and superficial interaction with others and was not around large numbers of individuals. She based this largely on Dr. Lovko’s assessment that Yurt retained capacity to perform unskilled tasks without special considerations as long as he was not in large groups and had to relate only on a superficial basis. She concluded that although Yurt’s medically determinable impairments could be expected to cause some of his stated inability to be around people and sit or stand still, those limitations were not fully credible to the extent they were inconsistent with her RFC assessment. She also noted that Yurt’s treatment records documented improvement in his condition between his initial psychiatric consultations in August 2010 and records from counseling sessions in 2011 and 2012. The ALJ also made much of Yurt’s ability to go shopping on “Black Friday” in December 2011 without incident. She generally rejected Dr. Bingi’s findings as an inaccurate representation of Yurt’s overall mental capacity. She concluded that both Dr. Ogu’s evaluation in January 2011 and a later evaluation in May 2011 reflected that Yurt was articulate and displayed a normal speech pattern, findings that called into question Dr. Bingi’s GAF score of only 45 and her assessment that Yurt had great difficulty managing symptoms on account of his hallucinations, paranoia, and anger.

At Step Four, the ALJ concluded that Yurt was capable of performing his past work of a dishwasher and kitchen helper. Alternatively, she found at Step Five that Yurt could also work as an industrial janitor, cleaner, or towel folder consistent with the VE’s testimony on that point; accordingly, she entered a finding that Yurt was “not disabled.” The Appeals Council denied review, rendering the ALJ’s decision the Commissioner’s final decision subject to judicial review. 20 C.F.R. §§ 416.1455, 416.1481. Yurt appealed to the district court, which affirmed after finding that the ALJ’s decision was supported by substantial evidence.

II.

We review the district court’s affirmance de novo and therefore review the ALJ’s decision directly. E.g., Thomas v. Colvin, 745 F.3d 802, 805 (7th Cir.2014). We review the ALJ’s decision deferentially only to determine if it is supported by “substantial evidence,” which we have described as “such relevant evidence as a reasonable mind might accept as adequate to support a conclusion.” Moore v. Colvin, 743 F.3d 1118, 1120-21 (7th Cir.2014) (internal quotations and citation omitted). We neither reweigh the evidence nor substitute our own judgment in place of the ALJ, but her decision must provide enough discussion for us to afford Yurt meaningful judicial review and assess the validity of the agency’s ultimate conclusion. Id.

On appeal, Yurt argues that several flaws in the ALJ’s decision undercut her conclusions at Steps Four and Five that he could perform his past work or other jobs in the national economy. He first claims that the ALJ’s hypothetical to the VE is flawed because it failed to fully account for his limitations. Relatedly, he attacks the ALJ’s failure to consider his tension headaches at all. He also claims the ALJ did not properly weigh the medical evidence from his treating physicians. Finally, he asserts that the ALJ failed to build a logical bridge between the medical evidence and her conclusion that Yurt had not experienced any episodes of extended decompensation.

We begin with the ALJ’s hypothetical question to the VE, which, as detailed above, simply described an individual who could perform unskilled tasks, relate superficially to small numbers of people, and attend to tasks long enough to complete them. Yurt notes that the hypothetical fails to mention his headaches, his COPD, his tendency to “black out,” the voices he hears, and significantly, the limitations outlined in state agency psychologist Dr. Lovko’s assessment that the ALJ expressly “adopted.”

Instead of directly defending the hypothetical, the Commissioner focuses on the ALJ’s related finding regarding Yurt’s residual functional capacity, which essentially mirrored her hypothetical to the VE. Their dispute centers on whether the ALJ was required to incorporate into her hypothetical and RFC the “moderate” limitations Dr. Lovko noted on the Mental Residual Functional Capacity Assessment (“MRFCA”) form that he completed. Specifically, Yurt contends that the ALJ ignored all six mental activity categories where Dr. Lovko found that he was “moderately limited.” As detailed above, these included several limitations in concentration, persistence, and pace, including moderate limitations in the ability to carry out detailed instructions, perform within a schedule, be punctual, perform at a consistent pace, and to complete a normal workday and workweek.

As a general rule, both the hypothetical posed to the VE and the ALJ’s RFC assessment must incorporate all of the claimant’s limitations supported by the medical record. See O’Connor-Spinner v. Astrue, 627 F.3d 614, 619 (7th Cir.2010) (“Our cases, taken together, suggest that the most effective way to ensure that the VE is apprised fully of the claimant’s limitations is to include all of them directly in the hypothetical.”); Indoranto v. Barnhart, 374 F.3d 470, 473-74 (7th Cir.2004) (“If the ALJ relies on testimony from a vocational expert, the hypothetical question he poses to the VE must incorporate all of the claimant’s limitations supported by medical evidence in the record.”); see also SSR 96-5p, 1996 WL 374183, at *5 (RFC assessment “is based upon consideration of all relevant evidence in the case record, including medical evidence and relevant nonmedical evidence”); 20 C.F.R. § 404.1545. This includes any deficiencies the claimant may have in concentration, persistence, or pace. O’Connor-Spinner, 627 F.3d at 619 (“Among the limitations the VE must consider are deficiencies of concentration, persistence and pace.”); Stewart v. Astrue, 561 F.3d 679, 684 (7th Cir.2009) (hypothetical question “must account for documented limitations of ‘concentration, persistence, or pace’ ”) (collecting cases). Although it is not necessary that the ALJ use this precise terminology (“concentration, persistence and pace”), we will not assume that the VE is apprised of such limitations unless she has independently reviewed the medical record. There is no evidence here that the VE reviewed Yurt’s medical history or heard testimony about the various medical limitations that he complains were omitted from the ALJ’s hypothetical. Thus, we would expect an adequate hypothetical to include the limitations identified by Dr. Lovko and Yurt’s treating physicians.

Relying on Johansen v. Barnhart, 314 F.3d 283 (7th Cir.2002), the Commissioner argues that we should be unconcerned here with the failure of the ALJ to mention the six areas where Dr. Lovko found moderate limitations because the narrative portion of the form adequately “translated” these limitations into a mental RFC that the ALJ could reasonably adopt. In Johansen, we concluded that substantial evidence supported the denial of disability benefits where the ALJ’s mental RFC assessment and hypothetical to the VE failed to explicitly note the three areas where one consultative physician had noted that the claimant was “moderately limited.” Id. at 288-89. We upheld the ALJ’s decision despite these omissions, after observing that in addition to the finding that the claimant was “moderately limited” in three areas, the consultative physician “went further” and “translated” his findings into a specific RFC assessment opining that the claimant was still able to perform low-stress, repetitive work. Id.

The first and most obvious problem with the Commissioner’s argument is that it focuses entirely on the ALJ’s mental RFC when it is in fact the hypothetical she posed to the VE that Yurt attacks. Even if we ignore this shortcoming, Johansen is not as applicable as the Commissioner suggests. The three alleged omissions from the hypothetical in Johansen were moderate limitations in the claimant’s ability to (1) perform activities within a schedule; (2) complete a normal workweek and perform at a consistent pace; and (3) accept instructions and respond appropriately to criticism. Id. at 286. Only one of the limitations found by Dr. Lovko — performing activities within a schedule — appears in Johansen. Given the additional limitations Dr. Lovko found and their bearing on Yurt’s limitations in concentration, persistence, and pace, we would be hard-pressed to conclude that Dr. Lovko’s narrative RFC “went further” in capturing those limitations.

Moreover, we allowed the hypothetical in Johansen to stand despite its omissions because its description of “repetitive, low-stress work” specifically excluded positions likely to trigger the panic disorder that formed the basis of the claimant’s limitations in concentration, persistence, and pace. See O’Connor-Spinner, 627 F.3d at 619 (collecting and distinguishing cases, including Johansen, where we have upheld hypothetical that omitted restrictions in “concentration, persistence, and pace”). Significantly, Yurt’s hypothetical did not limit him to low stress positions or otherwise capture his moderate difficulties understanding and remembering instructions or performing activities within a schedule. See Craft v. Astrue, 539 F.3d 668, 677 (7th Cir.2008) (“In Johansen, the RFC reflected some work requirements that were relevant to mental abilities (i.e., repetition and stress); here, the RFC was for ‘unskilled’ work, which by itself does not provide any information about Craft’s mental condition or abilities.”). This is true despite Dr. Lovko’s having specifically mentioned in his narrative RFC that Yurt could deal with an environment “where stress levels are limited.”

Indeed, the Commissioner seems to be suggesting that the hypothetical and the mental RFC adequately accounted for Yurt’s limitations in concentration, persistence, and pace by limiting Yurt to unskilled work. But we have repeatedly rejected the notion that a hypothetical like the one here confining the claimant to simple, routine tasks and limited interactions with others adequately captures temperamental deficiencies and limitations in concentration, persistence, and pace. See generally Stewart, 561 F.3d at 685 (collecting cases); see also Craft, 539 F.3d at 677-78 (restricting claimant to unskilled, simple work does not account for his difficulty with memory, concentration, and mood swings); Young v. Barnhart, 362 F.3d 995, 1004 (7th Cir.2004); see also SSR 85-15, 1985 WL 56857 at *6 (1985) (“[B]ecause response to the demands of work is highly individualized, the skill level of a position is not necessarily related to the difficulty an individual will have in meeting the demands of the job. A claimant’s [mental] condition may make performance of an unskilled job as difficult as an objectively more demanding job.”). The ALJ specifically found at Step 4 that Yurt had “moderate difficulties ... [w]ith regard to concentration, persistence, or pace.” These limitations were highlighted again in Dr. Lovko’s findings on the MRFCA form. Beyond stating that Yurt could perform “unskilled task[s] without special considerations,” the hypothetical does nothing to ensure that the VE eliminated from her responses those positions that would prove too difficult for someone with Yurt’s depression and psychotic disorder. Nor is this a case like Simila v. Astrue, 573 F.3d 503, 522 (7th Cir.2009), where the hypothetical describes the claimant’s underlying mental diagnoses (chronic pain syndrome and somatoform disorder) and the link between those conditions and the mental limitations is clear.

In short, although the ALJ’s hypothetical contained several limitations accounting for Yurt’s difficulties in social functioning, the blanket statement that he could perform “unskilled” work fails to accurately capture Yurt’s documented difficulties with concentration, persistence, and pace. This failure to build an “accurate and logical bridge” between the evidence of mental impairments and the hypothetical and the mental RFC requires us to remand for further proceedings. See O’Connor-Spinner, 627 F.3d at 620-21; Craft, 539 F.3d at 677-78.

There are other reasons the ALJ should not have adopted non-examining psychologist Dr. Lovko’s RFC finding. In concluding broadly that Yurt retained the capacity for unskilled work, Dr. Lovko commented that a “GAF of 60 suggests minimal impairments.” But this conclusion fails to note that the GAF of 60 assigned by Dr. Ogu in January 2011 was the highest GAF assessment Yurt ever received. Notably, just two weeks after Dr. Ogu’s assessment, Yurt was hospitalized after having a psychotic break. At intake, his GAF was assessed at 25 to 30, and upon his release two days later Dr. Shao recorded a GAF score of 35 to 40. The higher score of 35 to 40 corresponds to some impairment in reality or major impairment in several areas (i.e., avoids friends, neglects family, and is unable to work). This is a far cry from the sort of “minimal” impairment Dr. Lovko believed could be expected with Yurt’s high-water mark GAF of 60. Seizing upon the GAF of 60 to conclude that Yurt was not substantially impaired is precisely the type of cherry-picking of the medical record that we have repeatedly forbidden. See, e.g., Bates v. Colvin, 736 F.3d 1093, 1099 (7th Cir.2013) (“An ALJ cannot rely only on the evidence that supports her opinion.”). The Commissioner attempts to minimize Dr. Lovko’s reliance on Yurt’s best GAF score by pointing out that it is the ALJ and not Dr. Lovko who is forbidden from cherry-picking the medical evidence in support of her finding. But such a distinction is largely irrelevant here given the ALJ’s assertion that she credited and indeed adopted Dr. Lovko’s opinion. And although the Commissioner is correct that the ALJ was not required to give any weight to individual GAF scores, see Denton v. Astrue, 596 F.3d 419, 425 (7th Cir.2010), the problem here is not the failure to individually weigh the low GAF scores but a larger general tendency to ignore or discount evidence favorable to Yurt’s claim, which included GAF scores from multiple physicians suggesting a far lower level of functioning than that captured by the ALJ’s hypothetical and mental RFC. See Bates, 736 F.3d at 1100 (low GAF score alone is insufficient to overturn ALJ’s finding of no disability but GAF scores in context revealed ALJ’s deficient consideration of entirety of claimant’s evidence).

We are also troubled by the ALJ’s failure to mention Yurt’s bifrontal tension headaches, which the neurologist Dr. Bhat described as having a tendency “to recur almost every day.” The Commissioner attempts to excuse this omission because Yurt did not mention them in his function reports or testify about them at the hearing. Although we have recognized the claimant’s obligation to explain why certain conditions are disabling, Pepper v. Colvin, 712 F.3d 351, 367 (7th Cir.2013), it is the ALJ who carries the burden of developing the record, Terry v. Astrue, 580 F.3d 471, 477 (7th Cir.2009). The fact that the headaches standing alone were not disabling is not grounds for the ALJ to ignore them entirely — it is their impact in combination with Yurt’s other impairments that may be critical to his claim. See SSR 96-8P, 1996 WL 374184 at *5 (observing that when considered in combination with other impairments a non-severe impairment may become “critical” to the outcome of a claim); see also Indoranto, 374 F.3d at 474 (“Notably absent from the ALJ’s order is a discussion of how Indoranto’s headaches and blurred vision affect her ability to work.”). Although this omission standing alone probably would not have been grounds for remand, the ALJ may clarify on remand the effect of Yurt’s tension headaches on his claim. See O’Connor-Spinner, 627 F.3d at 621.

Because these shortcomings are enough to require remand to the Agency for further proceedings, we need not belabor Yurt’s remaining arguments regarding whether the ALJ properly weighed the evidence provided by treating physicians and whether substantial evidence supports her conclusion that he experienced no episodes of decompensation. Yurt complains that the ALJ failed to properly weigh the evidence provided by his treating physicians. He points specifically to the assessments by Dr. Shao and Dr. Ogu as well as Rachelle DeFrancesco, M.A., who worked as a therapist under Dr. Ogu’s supervision. As for treating physicians Dr. Shao and Dr. Ogu, we simply note that in addition to summarizing Yurt’s visits and describing their treatment notes, the ALJ should explicitly consider the details of the treatment relationship and provide reasons for the weight given to their opinions. See 20 C.F.R. § 404.1527(c)(2) (describing six factor weighing process ALJ must perform for “every” treating physician); see also Scott v. Astrue, 647 F.3d 734, 739 (7th Cir.2011) (citing § 404.1527 for principle that ALJ must offer “good reasons” for rejecting treating physicians opinion, which is accorded controlling weight so long as it is “well supported” and consistent with other evidence in the record) (internal quotations and citation omitted); see also Moss v. Astrue, 555 F.3d 556, 561 (7th Cir.2009). Likewise, on remand the ALJ should consider DeFrancesco’s observations about the side effects of Yurt’s medications and her assessment that Yurt’s hallucinations and psychotic symptoms left him in “acute” distress.

That leaves the ALJ’s perfunctory conclusion at Step 4 that Yurt had suffered no extended episodes of decompensation, as would be required for him to satisfy the “B criteria” for a finding of per se disability under Listing 12.03 for psychotic disorders. See 20 C.F.R. Pt. 404, Subpart P., App. 1, § 12.04; Larson v. Astrue, 615 F.3d 744, 748 (7th Cir.2010) (describing requirement that claimant suffering from an affective disorder must have both a severe impairment under the “A criteria” and at least two “B criteria”). Specifically, Listing 12.03 requires that a claimant experience either three or more decompensation episodes lasting at least two weeks, a lesser number of longer episodes, or a greater number of shorter episodes of equivalent severity. See 20 C.F.R. Pt. 404, Subpt. P, App. 1 § 12.03(C). Here the ALJ pointed only to Yurt’s brief hospitalizations and concluded without elaboration that because they were both short-lived he had not suffered from extended episodes of decompensation. Although we reach no conclusion as to whether Yurt has suffered from decompensation episodes of sufficient frequency and severity to satisfy the “B criteria,” we note that on remand the ALJ should consider that hospitalizations are not the only way a claimant can satisfy the decompensation requirement. See 20 C.F.R. Pt. 404, Subpt. P, App. 1 § 12.00(C)(4) (observing that ALJ “must use judgment” to determine if more frequent decompensation episodes of shorter duration or less frequent episodes of longer duration may be used to substitute for the listed finding).

III.

For the foregoing reasons, the judgment affirming the denial of benefits is REVERSED and the case is REMANDED with instructions that it be returned to the SSA for further proceedings consistent with this opinion. 
      
      . Yurt wrote in his application that his employment dates were “estimated” and in fact some of the dates are not entirely consistent, a problem attributed at argument to Yurt’s documented short-term memory difficulties.
     
      
      . The GAF score is a numeric scale of 0 through 100 used to assess severity of symptoms and functional level. Am. Psychiatric Ass’n, Diagnostic & Statistical Manual of Mental Disorders 32 (4th ed. text revision 2000). Although the American Psychiatric Association recently discontinued use of the GAF metric, it was still in use during the period Yurt's examinations occurred. See id. 16 (5th ed.2013).
     
      
      . This is true even when Dr. Bingi’s GAF estimate of 45 is excluded pursuant to the ALJ’s finding that it was not entirely credible or consistent with the record evidence as a whole.
     