Topic: Mechanistic Interpretability

Short answer

This page lists the most relevant public items for Mechanistic Interpretability, ranked by trend activity and review signals. Use the weekly view for fast changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.

Zoom In: An Introduction to Circuits
Paper • Mar 10, 2020 • Distill • Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter
Neural networks are generally regarded as opaque black boxes. However, if we zoom in and carefully examine the weights and activations of convolutional neural networks, we find highly interpretable...

Multimodal Neurons in Artificial Neural Networks
Paper • Mar 4, 2021 • Distill • Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah
We investigate the internal representations of the CLIP model and discover the presence of 'multimodal neurons'. These neurons fire not only for specific visual features (like a spider) but also fo...

Language models can explain neurons in language models
Paper • May 9, 2023 • OpenAI • Steven Bills, Nick Cammarata, Dan Mossing, Henk Tillman, Leo Gao, Gabriel Goh, Ilya Sutskever, Jan Leike, Jeff Wu, William Saunders
Understanding the internal mechanisms of massive language models is a critical bottleneck for AI safety and alignment. Given the billions of parameters in modern models, manual human inspection of ...

FAQ

What does this Mechanistic Interpretability page rank?

It ranks public content for Mechanistic Interpretability using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the Mechanistic Interpretability topic page on Attendemia and is written to make sense without reading other sections of the page.

How should I use weekly vs monthly vs all-time?

Use the weekly view for fast-moving updates, the monthly view to confirm stable trends, and the all-time view for evergreen references.

How can I discover organizations active in Mechanistic Interpretability?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.
