New paper questions “LLM library learning” gains - accepted for publication at EACL
The paper “Is This LLM Library Learning? Evaluation Must Account For Compute and Behaviour” by Ian Berlot-Attwell, Tobias Sesterhenn, Frank Rudzicz, and Xujie Si has been accepted for publication at the Conference of the European Chapter of the Association for Computational Linguistics (EACL). The work critically examines recent in-context “library learning” systems, which claim improved performance by learning and reusing tools or lemmas without fine-tuning. Across three published systems (including an in-depth analysis of LEGO-Prover), the authors argue that many reported gains largely disappear once computational cost is properly controlled, and they find little evidence that learned libraries are actually reused as intended. The paper concludes with recommendations for stronger evaluation standards, including compute-matched baselines and behavioural analysis.
Abstract:
The in-context learning (ICL) coding, reasoning, and tool-using ability of LLMs has spurred interest in library learning (i.e., the creation and exploitation of reusable and composable functions, tools, or lemmas). Such systems often promise improved task performance and computational efficiency by caching reasoning (i.e., storing generated tools), all without fine-tuning. However, we find strong reasons to be skeptical. Specifically, we identify a serious evaluation flaw present in a large number of ICL library learning works: these works do not correct for the difference in computational cost between baseline and library learning systems. Studying three separately published ICL library learning systems, we find that all of them fail to consistently outperform the simple baseline of prompting the model: improvements in task accuracy often vanish or reverse once computational cost is accounted for. Furthermore, we perform an in-depth examination of one such system, LEGO-Prover, which purports to learn reusable lemmas for mathematical reasoning. We find no evidence of the direct reuse of learned lemmas, and find evidence against the soft reuse of learned lemmas (i.e., reuse by modifying relevant examples).
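To make the compute-matching argument concrete, here is a minimal sketch (not from the paper; all numbers, rates, and token costs are invented for illustration) of a compute-matched comparison. Instead of comparing methods at an equal number of attempts, each problem is given a fixed token budget, and a cheaper baseline is allowed as many attempts as fit within that budget:

```python
import random

def accuracy_at_budget(attempts, budget):
    """attempts: per-problem list of (solved, tokens_used) tuples, in sample order.
    A problem counts as solved if any attempt that still fits within the
    per-problem token budget succeeds (pass@compute rather than pass@k)."""
    solved = 0
    for problem in attempts:
        spent = 0
        for ok, tokens in problem:
            spent += tokens
            if spent > budget:
                break  # this attempt exceeds the compute budget; stop here
            if ok:
                solved += 1
                break
    return solved / len(attempts)

random.seed(0)

# Hypothetical data: 100 problems, up to 5 sampled attempts each.
# Plain prompting: cheap attempts (500 tokens) with a lower success rate.
baseline = [[(random.random() < 0.30, 500) for _ in range(5)]
            for _ in range(100)]

# Hypothetical library-learning system: each attempt costs ~3x more
# (retrieval, library maintenance) for a higher per-attempt success rate.
library = [[(random.random() < 0.45, 1500) for _ in range(5)]
           for _ in range(100)]

# At equal per-problem budgets, the baseline gets proportionally more
# attempts; an apparent per-attempt advantage can shrink or reverse.
for budget in (1500, 3000, 7500):
    print(budget,
          accuracy_at_budget(baseline, budget),
          accuracy_at_budget(library, budget))
```

The point of the sketch is only the evaluation shape: at a matched budget the cheaper method is credited with every retry the budget affords, which is the control the paper argues many ICL library learning evaluations omit.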