ISSUE №4, May 2026

Interpretability of large models

Public abstract

This issue surveys interpretability methods, reviews recent work and includes materials from the May webinar.

Full issue

Subscriber access

The full issue offers readers a systematic map of interpretability methods, from single-neuron analysis to mechanistic analysis of reasoning chains in large language models. The authors compare approaches by required resources, reliability of conclusions and applicability to frontier models.

A dedicated section covers practice: how teams use interpretability to debug model behaviour, find undesirable strategies and prepare systems for a safety audit. The conclusion lists open questions and the Forum's direction for the year ahead.

Full material available to AI Forum Review subscribers

Get a free subscription to read full issues, event materials and restricted reports.

Other issues

From the archive

№3

April 2026

Value alignment: a survey of methods

A survey of value-alignment methods, from learning from feedback to constitutional approaches.

№2

March 2026

Energy efficiency of AI

How to reduce the energy and compute cost of modern models without losing quality.