A Member of the Law Professor Blogs Network

On Using ChatGPT for Statutory Interpretation

Judge Kevin Newsom of the Eleventh Circuit Court of Appeals recently wrote a concurring opinion in an insurance case involving an issue of statutory interpretation.[i]  Specifically, the question was whether a landowner’s in-ground trampoline constituted “landscaping” under a policy that provided him coverage for negligence arising from “landscaping” work but provided no definition of “landscaping.”[ii]

After reviewing numerous dictionary definitions of “landscaping” and finding that all of them left “a little something to be desired” because none fully captured his own understanding of the term, Judge Newsom confessed to having consulted various generative AI tools (out of pure academic curiosity) for a definition.[iii]  While the case was ultimately resolved on a different question, Judge Newsom chose to use his concurring opinion as a platform to discuss the potential use of generative AI for statutory interpretation, specifically when the issue involves discerning the plain and ordinary meaning of a word.[iv]

He concluded that large language models (LLMs), such as ChatGPT, might be useful in the interpretation of legal texts.[v]  He followed his conclusion with a list of benefits and risks of doing so.

Judge Newsom identified the benefits as follows: 

(1) “LLMs train on ordinary-language inputs,” thereby reflecting the “common speech of common people”;[vi]

(2) “LLMs can ‘understand’ context,” which allows them to “discern the difference—and distinguish—between the flying-mammal ‘bat’ . . . and the wooden ‘bat’” used in baseball;[vii]

(3) “LLMs are accessible,” which can both “democratiz[e] the interpretive enterprise” and provide “an inexpensive research tool”;[viii]

(4) “LLM research is relatively transparent” because we know they are trained on “tons and tons of internet data” and because they provide the opportunity for judges to “show their work” by disclosing “both the queries put to the LLMs . . . and the models’ answer”;[ix] and

(5) “LLMs hold advantages over other empirical interpretive methods,” such as conducting broad surveys and corpus linguistics.[x]

Judge Newsom also recognized the following risks: 

(1) “LLMs can ‘hallucinate’”;[xi]

(2) “LLMs don’t capture offline speech, and thus might not fully account for underrepresented populations’ usages”;[xii]

(3) “Lawyers, judges, and would-be litigants might try to manipulate LLMs” by reverse-engineering a preferred answer;[xiii] and

(4) “Reliance on LLMs will lead us into dystopia” where “‘robo judges’ algorithmically resolv[e] human disputes.”[xiv]

Though Judge Newsom found each of the identified risks to be either non-fatal or easily mitigated, I’m not sure he fully appreciated the risk that LLMs might fail to account for word usage among underrepresented populations.  The inherent bias baked into generative AI is well documented.[xv]  One study in particular “revealed systematic gender and racial biases in [multiple] AI generators against women and African Americans. The study also uncovered more nuanced biases or prejudices in the portrayal of emotions and appearances.”[xvi]

If a benefit of using LLMs to discern ordinary meaning is their ability to “democratiz[e] the interpretive enterprise,” then we should also be giving more consideration to websites such as Urban Dictionary and Wikipedia.

But the primary concern with a judge using any of these sources to discern “ordinary meaning” is that, in doing so, the judge becomes an advocate by both proposing and relying on a new definition not previously advanced or supported by any party.  Admittedly, the same concern is true when judges consult dictionaries for definitions, but I’ve previously identified my concerns with that approach.

Despite the drawbacks of relying on LLMs and other unconventional sources, Judge Newsom makes some very good points about their potential utility.  Perhaps the best approach lies somewhere between complete reliance and absolute prohibition.  Perhaps we should create standardized rules regarding the appropriate usage (by courts and litigants alike) of readily accessible, crowd-sourced information, such as LLMs, Urban Dictionary, and Wikipedia.[xvii]

And we could throw in dictionaries as well for good measure.

[i] Snell v. United Spec. Ins. Co., No. 22-12581, slip op. at 1 (11th Cir. May 28, 2024) (Newsom, J., concurring), https://media.ca11.uscourts.gov/opinions/pub/files/202212581.pdf#page=25 (last accessed June 10, 2024).

[ii] Id. at 1-2.

[iii] Id. at 5-6, 8.

[iv] Id. at 4.

[v] Id. at 10.

[vi] Id. at 11.

[vii] Id. at 14-15.

[viii] Id. at 15.

[ix] Id. at 16, 18, 19.

[x] Id. at 19-20.

[xi] Id. at 21.

[xii] Id. at 22.

[xiii] Id. at 23.

[xiv] Id. at 24-25.

[xv] Nettrice Gaskins, The Boy on the Tricycle: Bias in Generative AI (May 1, 2024), https://nettricegaskins.medium.com/the-boy-on-the-tricycle-bias-in-generative-ai-d0fd050121ec (last accessed June 10, 2024).

[xvi] Id.

[xvii] See Leslie Kaufman, For the Word on the Street, Courts Call Up an Online Witness, New York Times (May 20, 2013), https://www.nytimes.com/2013/05/21/business/media/urban-dictionary-finds-a-place-in-the-courtroom.html (last accessed June 10, 2024).