Issue 4, May 2026

Interpretability of large models

Public abstract

This issue surveys interpretability methods, reviews recent work and includes materials from the May webinar.

Full issue

Subscriber access

The full issue offers readers a systematic map of interpretability methods — from single-neuron analysis to a mechanistic breakdown of reasoning chains in large language models. The authors compare approaches by required resources, reliability of conclusions and applicability to frontier models.

A dedicated section covers practice: how teams use interpretability to debug model behaviour, find undesirable strategies and prepare systems for a safety audit. The conclusion lists open questions and the Forum's direction for the year ahead.

Full material available to AI Forum Review subscribers
