transformer-lens-interpretability

Guides mechanistic interpretability research using TransformerLens to analyze transformer models and their internal mechanisms.

Security score: 91/100

The transformer-lens-interpretability skill was audited on Feb 28, 2026, and we found 9 security issues, all rated low, across 1 threat category. Review the findings below before installing.

Security Issues

Low, line 329

External URL reference

Source: SKILL.md
329: - [Main Demo Notebook](https://transformerlensorg.github.io/TransformerLens/generated/demos/Main_Demo.html)
Low, line 330

External URL reference

Source: SKILL.md
330: - [Activation Patching Demo](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Activation_Patching_in_TL_Demo.ipynb)
Low, line 331

External URL reference

Source: SKILL.md
331: - [ARENA Mech Interp Course](https://arena-foundation.github.io/ARENA/) - 200+ hours of tutorials
Low, line 334

External URL reference

Source: SKILL.md
334: - [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html)
Low, line 335

External URL reference

Source: SKILL.md
335: - [In-context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)
Low, line 336

External URL reference

Source: SKILL.md
336: - [Interpretability in the Wild (IOI)](https://arxiv.org/abs/2211.00593)
Low, line 339

External URL reference

Source: SKILL.md
339: - [Official Docs](https://transformerlensorg.github.io/TransformerLens/)
Low, line 340

External URL reference

Source: SKILL.md
340: - [Model Properties Table](https://transformerlensorg.github.io/TransformerLens/generated/model_properties_table.html)
Low, line 341

External URL reference

Source: SKILL.md
341: - [Neel Nanda's Glossary](https://www.neelnanda.io/mechanistic-interpretability/glossary)
Scanned on Feb 28, 2026