Update: Reuters, Artificial Intelligence Is Now an A+ Law Student, Study Finds
Andrew Blair-Stanek (Maryland; Google Scholar), Donald G. Gifford (Maryland), Mark Graber (Maryland; Google Scholar), Guha Krishnamurthi (Maryland; Google Scholar), Jeff Sovern (Maryland; Google Scholar), Donald B. Tobin (Maryland; Google Scholar) & Michael P. Van Alstine (Maryland), AI Gets Its First Law School A+s:
We had o3, OpenAI's new reasoning model, take our Spring 2025 law school final exams with the "reasoning effort" parameter set to high. We graded o3's answers on the same curve as our students. Three semesters ago, we found the older model GPT-4-turbo's outputs would have received grades ranging between B+ and D. This semester, we find o3 got three A+s, one A, one A-, two B+s, and a B. We find clear, fixable explanations for two of o3's lowest grades.
Conclusion
AI models can now perform at an A+ level on some law school final exams. For two exams where o3 got a B or B+, we can identify likely reasons why o3 underperformed its potential. In Administrative Law, o3 had not seen the Loper Bright case in its training data, meaning o3 did not know that the key Chevron case had been overruled. And in Secured Transactions, we failed to pass a key portion of the exam instructions to o3. For a third exam, we ran a secondary experiment, seeing whether o3 performed better or worse when passed the professor’s full 70,000 words of class notes. Although only a single data point, o3 performed slightly worse with the class notes. This makes sense, given how AI models’ performance can degrade when passed more text.
This semester’s experiment indicates an experimental design for next semester, for determining how much of a threat AI cheating poses and for assessing AI’s abilities to mimic high human performance. We will have the state-of-the-art AI model take final exams, while making sure that it has access to any relevant new cases, as well as all exam instructions. To ensure that its answers remain focused on topics and cases covered in the course, we will give the model the syllabus or list of cases covered. And we will instruct it to make occasional spelling and grammar errors, as time-pressured exam takers often do. Then our registrar will paste the AI answers into our normal exam-taking software, so that we can grade it truly blindly and test whether we can pick out which answers were generated by AI.



