Topic: Mechanistic Interpretability

Short answer

This page lists the most relevant public items for Mechanistic Interpretability, ranked by trend activity and review signal. Use the weekly view for fast-moving changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.



  1. Zoom In: An Introduction to Circuits

    Paper | Mar 10, 2020 | Distill | Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter

    Neural networks are generally regarded as opaque black boxes. However, if we zoom in and carefully examine the weights and activations of convolutional neural networks, we find highly interpretable...

  2. Multimodal Neurons in Artificial Neural Networks

    Paper | Mar 4, 2021 | Distill | Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah

    We investigate the internal representations of the CLIP model and discover the presence of 'multimodal neurons'. These neurons fire not only for specific visual features (like a spider) but also fo...

  3. Language models can explain neurons in language models

    Paper | May 9, 2023 | OpenAI | Steven Bills, Nick Cammarata, Dan Mossing, Henk Tillman, Leo Gao, Gabriel Goh, Ilya Sutskever, Jan Leike, Jeff Wu, William Saunders

    Understanding the internal mechanisms of massive language models is a critical bottleneck for AI safety and alignment. Given the billions of parameters in modern models, manual human inspection of ...

Related Topics

company:openai-research (3), cs.CV (2), CLIP (1), Alignment (1), Multimodal AI (1)