The International Conference on Learning Representations (ICLR, CORE Rank A*) is the premier gathering of professionals dedicated to advancing representation learning, the branch of artificial intelligence generally referred to as deep learning. ICLR 2025 will be held in Singapore, April 24–28, 2025.
Main Conference
1. The paper “Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization” by Sascha Marton, Tim Grams, Florian Vogt, Stefan Lüdtke, Christian Bartelt, and Heiner Stuckenschmidt has been accepted as a Spotlight (Top 5%) at ICLR 2025.
Abstract: Reinforcement learning (RL) has seen significant success across various domains, but its adoption is often limited by the black-box nature of neural network policies, making them difficult to interpret. In contrast, symbolic policies allow representing decision-making strategies in a compact and interpretable way. However, learning symbolic policies directly within on-policy methods remains challenging. In this paper, we introduce SYMPOL, a novel method for SYMbolic tree-based on-POLicy RL. SYMPOL employs a tree-based model integrated with a policy gradient method, enabling the agent to learn and adapt its actions while maintaining a high level of interpretability. We evaluate SYMPOL on a set of benchmark RL tasks, demonstrating its superiority over alternative tree-based RL approaches in terms of performance and interpretability. Unlike existing methods, it enables gradient-based, end-to-end learning of interpretable, axis-aligned decision trees within standard on-policy RL algorithms. Therefore, SYMPOL can become the foundation for a new class of interpretable RL based on decision trees.
The full paper can be read at https://arxiv.org/pdf/2408.08761.
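To make the notion of an interpretable, axis-aligned decision-tree policy concrete, here is a minimal hand-specified sketch in plain Python for a CartPole-style observation. It is an illustrative assumption, not SYMPOL itself: SYMPOL's contribution is learning the splits and leaf actions end-to-end with policy gradients, whereas the thresholds below are simply made up.

```python
# Hand-specified, axis-aligned decision-tree policy for a CartPole-style
# observation (position, velocity, angle, angular velocity). Illustrative
# only: SYMPOL would learn the splits and leaf actions via policy gradients.

def tree_policy(obs):
    _, _, angle, ang_vel = obs
    if angle <= 0.0:            # internal node: split on pole angle
        if ang_vel <= -0.5:     # internal node: split on angular velocity
            return 0            # leaf: push cart left
        return 1                # leaf: push cart right
    return 1                    # leaf: push cart right

print(tree_policy((0.0, 0.0, -0.1, -1.0)))  # 0 (push left)
```

Every decision the policy makes can be read off directly from the thresholds, which is the interpretability property the paper aims to preserve during learning.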
2. The paper “NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals” has been accepted at ICLR 2025. It was co-authored by Jannik Brinkmann, among many contributors, with Jaden Fiotto-Kaufman and Alexander Loftus as first authors.
Abstract: We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks. NNsight is an open-source system that extends PyTorch to introduce deferred remote execution. The National Deep Inference Fabric (NDIF) is a scalable inference service that executes NNsight requests, allowing users to share GPU resources and pretrained models. These technologies are enabled by the Intervention Graph, an architecture developed to decouple experimental design from model runtime. Together, this framework provides transparent and efficient access to the internals of deep neural networks such as very large language models (LLMs) without imposing the cost or complexity of hosting customized models individually. We conduct a quantitative survey of the machine learning literature that reveals a growing gap in the study of the internals of large-scale AI. We demonstrate the design and use of our framework to address this gap by enabling a range of research methods on huge models. Finally, we conduct benchmarks to compare performance with previous approaches.
The full paper can be read at https://openreview.net/forum?id=MxbEiFRf39.
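The deferred-execution idea behind the Intervention Graph can be illustrated with a toy graph recorder in plain Python. This is a conceptual sketch only, not NNsight's actual API: operations on a proxy object are first recorded into a graph and only executed later, which is what lets an experiment be specified locally and shipped to a remote runtime.

```python
# Conceptual sketch of deferred execution (the idea behind the Intervention
# Graph), not NNsight's API: arithmetic on a Proxy builds a graph; nothing
# runs until execute() is called, possibly elsewhere.

class Proxy:
    def __init__(self, graph, node_id):
        self.graph, self.node_id = graph, node_id
    def __add__(self, const):
        return self.graph.add_node("add", self.node_id, const)
    def __mul__(self, const):
        return self.graph.add_node("mul", self.node_id, const)

class Graph:
    def __init__(self):
        self.nodes = []
    def input(self):
        self.nodes.append(("input", None, None))
        return Proxy(self, len(self.nodes) - 1)
    def add_node(self, op, src, const):
        self.nodes.append((op, src, const))
        return Proxy(self, len(self.nodes) - 1)
    def execute(self, value):
        results = []
        for op, src, const in self.nodes:
            if op == "input":
                results.append(value)
            elif op == "add":
                results.append(results[src] + const)
            elif op == "mul":
                results.append(results[src] * const)
        return results[-1]

g = Graph()
y = (g.input() + 3) * 2   # nothing is computed yet; the graph just grows
print(g.execute(5))       # deferred execution: (5 + 3) * 2 = 16
```

Decoupling what to compute (the graph) from where and when it runs (the executor) is, at a high level, how the framework separates experimental design from model runtime.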
Workshops
3. The paper “Disentangling Exploration of Large Language Models by Optimal Exploitation” by Tim Grams, Patrick Betz, and Christian Bartelt has been accepted in the Workshop on Reasoning and Planning for Large Language Models at ICLR 2025.
Abstract: Exploration is a crucial skill for self-improvement and open-ended problem-solving. However, it remains unclear if large language models can effectively explore the state-space within an unknown environment. This work isolates exploration as the sole objective, tasking the agent with delivering information that enhances future returns. Within this framework, we argue that measuring agent returns is not sufficient for a fair evaluation and decompose missing rewards into exploration and exploitation components based on the optimal achievable return. Comprehensive experiments with various models reveal that most models struggle to sufficiently explore the state-space and that weak exploration is insufficient. We observe a positive correlation between parameter count and exploration performance, with larger models demonstrating superior capabilities. Furthermore, we show that our decomposition provides insights into differences in behaviors driven by prompt engineering, offering a valuable tool for refining performance in exploratory tasks.
The full paper can be read at https://arxiv.org/pdf/2501.08925.
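One simplified reading of such a decomposition can be shown numerically. The quantities below are illustrative assumptions, not the paper's exact definitions; in particular, `best_given_info` (the best return achievable with the information the agent actually gathered) is a hypothetical intermediate used to split the shortfall against the optimal return into an exploration part and an exploitation part.

```python
# Hypothetical decomposition of "missing reward" (illustration only, not the
# paper's formulas). best_given_info = best return achievable with the
# information the agent actually gathered.

def decompose(optimal_return, best_given_info, agent_return):
    exploration_gap = optimal_return - best_given_info   # info never gathered
    exploitation_gap = best_given_info - agent_return    # info gathered, unused
    return exploration_gap, exploitation_gap

# The agent earned 4; with the states it visited, 7 was achievable; the true
# optimum is 10. The total shortfall of 6 splits into 3 (exploration) + 3
# (exploitation).
print(decompose(10, 7, 4))  # (3, 3)
```

The point of such a split is that two agents with the same return can fail for different reasons: one never found the rewarding states, the other found them but did not act on them.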
4. The paper “Shedding Light on Task Decomposition in Program Synthesis: The Driving Force of the Synthesizer Model” by Janis Zenkner, Tobias Sesterhenn, and Christian Bartelt has been accepted at the Deep Learning for Code Workshop at ICLR 2025.
Abstract: Task decomposition is a fundamental mechanism in program synthesis, enabling complex problems to be broken down into manageable subtasks. ExeDec, a state-of-the-art program synthesis framework, employs this approach by combining a Subgoal Model for decomposition and a Synthesizer Model for program generation to facilitate compositional generalization. In this work, we develop REGISM, an adaptation of ExeDec that removes decomposition guidance and relies solely on iterative execution-driven synthesis. By comparing these two exemplary approaches—ExeDec, which leverages task decomposition, and REGISM, which does not—we investigate the interplay between task decomposition and program generation. Our findings indicate that ExeDec exhibits significant advantages in length generalization and concept composition tasks, likely due to its explicit decomposition strategies. At the same time, REGISM frequently matches or surpasses ExeDec’s performance across various scenarios, with its solutions often aligning more closely with ground truth decompositions. These observations highlight the importance of repeated execution-guided synthesis in driving task-solving performance, even within frameworks that incorporate explicit decomposition strategies. Our analysis suggests that task decomposition approaches like ExeDec hold significant potential for advancing program synthesis, though further work is needed to clarify when and why these strategies are most effective.
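The flavor of iterative, execution-driven synthesis can be sketched with a toy enumerative synthesizer over a made-up three-operation DSL. This is an illustration of execution-guided search in general, not REGISM or ExeDec: candidate programs are executed on the input/output examples, and the first consistent one is returned.

```python
from itertools import product

# Toy execution-guided synthesizer (illustration, not REGISM): enumerate
# short programs over a made-up DSL, run each candidate on the examples, and
# return the first program consistent with every input/output pair.

OPS = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "neg": lambda x: -x,
}

def run(program, x):
    for op in program:
        x = OPS[op](x)
    return x

def synthesize(examples, max_len=3):
    for length in range(1, max_len + 1):
        for program in product(OPS, repeat=length):
            if all(run(program, i) == o for i, o in examples):
                return program
    return None

# f(x) = 2 * (x + 1) is consistent with both examples:
print(synthesize([(1, 4), (3, 8)]))  # ('inc', 'double')
```

Here execution feedback is the only guidance; a decomposition-based approach like ExeDec would additionally predict intermediate subgoals to steer this search.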
5. The paper “Beyond Pixels: Enhancing LIME with Hierarchical Features and Segmentation Foundation Models” by Patrick Knab, Sascha Marton, and Christian Bartelt has been accepted at the Workshop on Foundation Models in the Wild at ICLR 2025.
Abstract: LIME (Local Interpretable Model-agnostic Explanations) is a popular XAI framework for unraveling decision-making processes in vision machine-learning models. The technique utilizes image segmentation methods to identify fixed regions for calculating feature importance scores as explanations. Therefore, poor segmentation can weaken the explanation and reduce the importance of segments, ultimately affecting the overall clarity of interpretation. To address these challenges, we introduce the DSEG-LIME (Data-Driven Segmentation LIME) framework, featuring: i) a data-driven segmentation for human-recognized feature generation by foundation model integration, and ii) a user-steered granularity in the hierarchical segmentation procedure through composition. Our findings demonstrate that DSEG outperforms on several XAI metrics on pre-trained ImageNet models and improves the alignment of explanations with human-recognized concepts.
The full paper can be read at https://arxiv.org/pdf/2403.07733.
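The attribution step that segment-based explanations rely on can be sketched with an occlusion-style simplification (real LIME fits a local linear surrogate over many random perturbations; DSEG-LIME's contribution is producing better segments before this step). The toy model and weights below are assumptions for illustration only.

```python
# Occlusion-style segment attribution, a simplification of LIME's
# perturbation idea (not DSEG-LIME): mask one segment at a time and take the
# drop in the model's score as that segment's importance.

def segment_importance(model, segments):
    baseline = model(segments)
    importances = []
    for i in range(len(segments)):
        masked = list(segments)
        masked[i] = 0.0                      # "gray out" segment i
        importances.append(baseline - model(masked))
    return importances

# Assumed toy model: the score is a weighted sum of segment intensities.
weights = [1.0, 4.0, 2.0]
model = lambda segs: sum(w * s for w, s in zip(weights, segs))

print(segment_importance(model, [1.0, 1.0, 1.0]))  # [1.0, 4.0, 2.0]
```

The sketch also makes the paper's motivation visible: if the segmentation splits a meaningful object across segments, its importance is diluted across them, which is exactly the failure better segmentation is meant to avoid.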
6. The paper “Unreflected Use of Tabular Data Repositories Can Undermine Research Quality” by Andrej Tschalzev, Lennart Purucker, Stefan Lüdtke, Frank Hutter, Christian Bartelt, and Heiner Stuckenschmidt has been accepted at the Workshop on the Future of Machine Learning Data Practices and Repositories at ICLR 2025.
Abstract: Data repositories have accumulated a large number of tabular datasets from various domains. Machine Learning researchers are actively using these datasets to evaluate novel approaches. Consequently, data repositories have an important standing in tabular data research. They not only host datasets but also provide information on how to use them in supervised learning tasks. In this paper, we argue that, despite great achievements in usability, the unreflected usage of datasets from data repositories may have led to reduced research quality and scientific rigor. We present examples from prominent recent studies that illustrate the problematic use of datasets from OpenML, a large data repository for tabular data. Our illustrations help users of data repositories avoid falling into the traps of (1) overfitting validation data during model selection, (2) overlooking strong baselines, and (3) inappropriate preprocessing. In response, we discuss possible solutions for how data repositories can prevent inappropriate use of datasets and become the cornerstones for improved overall quality of empirical research studies.
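Pitfall (1), overfitting validation data during model selection, can be demonstrated with a small deterministic example (not taken from the paper): given enough arbitrary "models", one of them fits any small evaluation split perfectly, so a score measured on the same split used for selection is inflated and uninformative.

```python
from itertools import product

# Deterministic toy example of pitfall (1), not from the paper: among enough
# skill-free predictors, one "solves" any small evaluation split, so
# selecting and reporting on the same split inflates the score.

val_labels = (0, 1, 1, 0)
test_labels = (1, 1, 0, 0)

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

models = list(product([0, 1], repeat=4))          # all 16 4-bit predictors
best = max(models, key=lambda m: accuracy(m, val_labels))

print(accuracy(best, val_labels))   # 1.0: the selection split is "solved"
print(accuracy(best, test_labels))  # 0.5: chance level on unseen labels
```

Reporting the selection-split score (1.0) rather than the held-out score (0.5) is the mistake; it grows more subtle, but no less real, when repositories make a fixed "test" split convenient to evaluate against repeatedly.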
7. The paper “Decision Trees That Remember: Gradient-Based Learning of Recurrent Decision Trees with Memory” by Sascha Marton, Moritz Schneider, Jannik Brinkmann, Stefan Lüdtke, Christian Bartelt, and Heiner Stuckenschmidt has been accepted at the Workshop on New Frontiers in Associative Memories at ICLR 2025.
Abstract: Neural architectures such as Recurrent Neural Networks (RNNs), Transformers, and State-Space Models have shown great success in handling sequential data by learning temporal dependencies. Decision Trees (DTs), on the other hand, remain a widely used class of models for structured tabular data but are typically not designed to capture sequential patterns directly. Instead, DT-based approaches for time-series data often rely on feature engineering, such as manually incorporating lag features, which can be suboptimal for capturing complex temporal dependencies. To address this limitation, we introduce ReMeDe Trees, a novel recurrent decision tree architecture that integrates an internal memory mechanism, similar to RNNs, to learn long-term dependencies in sequential data. Our model learns hard, axis-aligned decision rules for both output generation and state updates, optimizing them efficiently via gradient descent. We provide a proof-of-concept study on synthetic benchmarks to demonstrate the effectiveness of our approach.
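A hand-specified sketch can illustrate what a recurrent decision tree with memory computes (an illustrative toy, not the gradient-learned ReMeDe model): each step routes on the current input and an internal state using hard, axis-aligned splits, and each leaf emits both an output and a state update. The task below, outputting the previous input, is unsolvable for a stateless tree that sees only the current input.

```python
# Toy recurrent decision tree (hand-specified; ReMeDe learns such rules via
# gradient descent): hard, axis-aligned splits on (input, state), with each
# step producing an output and an updated memory state.

def remede_step(x, state):
    if state <= 0.5:            # split on memory: previous input was 0
        output = 0
    else:                       # previous input was 1
        output = 1
    new_state = 1 if x > 0.5 else 0   # state-update: remember current input
    return output, new_state

def run_sequence(xs, state=0):
    outputs = []
    for x in xs:
        y, state = remede_step(x, state)
        outputs.append(y)
    return outputs

print(run_sequence([1, 0, 1, 1, 0]))  # [0, 1, 0, 1, 1]
```

The internal state plays the role an RNN's hidden state plays, while every routing decision remains a readable threshold test, which is the combination the paper targets.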