ai-research-04-mechanistic-interpretability-saelens
Guides training and analysis of Sparse Autoencoders for interpretable feature extraction in neural networks.
Security score: 93/100
The ai-research-04-mechanistic-interpretability-saelens skill was audited on Jun 8, 2026 and we found 7 security issues across 1 threat category. Review the findings below before installing.
Security Issues
Low, line 325: External URL reference
Source: SKILL.md
| 325 | Browse pre-trained SAE features at [neuronpedia.org](https://neuronpedia.org): |
Low, line 358: External URL reference
Source: SKILL.md
| 358 | - [ARENA SAE Curriculum](https://www.lesswrong.com/posts/LnHowHgmrMbWtpkxx/intro-to-superposition-and-sparse-autoencoders-colab) |
Low, line 361: External URL reference
Source: SKILL.md
| 361 | - [Towards Monosemanticity](https://transformer-circuits.pub/2023/monosemantic-features) - Anthropic (2023) |
Low, line 362: External URL reference
Source: SKILL.md
| 362 | - [Scaling Monosemanticity](https://transformer-circuits.pub/2024/scaling-monosemanticity/) - Anthropic (2024) |
Low, line 363: External URL reference
Source: SKILL.md
| 363 | - [Sparse Autoencoders Find Highly Interpretable Features](https://arxiv.org/abs/2309.08600) - Cunningham et al. (ICLR 2024) |
Low, line 366: External URL reference
Source: SKILL.md
| 366 | - [SAELens Docs](https://jbloomaus.github.io/SAELens/) |
Low, line 367: External URL reference
Source: SKILL.md
| 367 | - [Neuronpedia](https://neuronpedia.org) - Feature browser |
Scanned on Jun 8, 2026