Gallery

Research done involving TransformerLens:

  • Progress Measures for Grokking via Mechanistic Interpretability (ICLR Spotlight, 2023) by Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt

  • Finding Neurons in a Haystack: Case Studies with Sparse Probing by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas

  • Towards Automated Circuit Discovery for Mechanistic Interpretability by Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso

  • Actually, Othello-GPT Has A Linear Emergent World Representation by Neel Nanda

  • A circuit for Python docstrings in a 4-layer attention-only transformer by Stefan Heimersheim and Jett Janiak

  • A Toy Model of Universality (ICML, 2023) by Bilal Chughtai, Lawrence Chan, Neel Nanda

  • N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models (2023, ICLR Workshop RTML) by Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez

  • Eliciting Latent Predictions from Transformers with the Tuned Lens by Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

User-contributed examples of the library in action:

  • Induction Heads Phase Change Replication: A partial replication of In-Context Learning and Induction Heads, by Connor Kissane

  • Decision Transformer Interpretability: A set of scripts for training decision transformers that uses TransformerLens to view intermediate activations and to perform attribution and ablations (see the sketch after this list for the typical pattern). A write-up of the initial work can be found here.
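
For context, a minimal sketch of the usage pattern mentioned above (caching intermediate activations and ablating a component with a hook). It assumes the "gpt2" pretrained weights are available; it is illustrative only and is not code taken from the projects listed.

    from transformer_lens import HookedTransformer

    # Load a small pretrained model (assumes "gpt2" weights can be downloaded).
    model = HookedTransformer.from_pretrained("gpt2")

    # Run the model and cache every intermediate activation.
    prompt = "TransformerLens caches every intermediate activation"
    logits, cache = model.run_with_cache(prompt)

    # Inspect the layer-0 attention pattern: shape [batch, head, query_pos, key_pos].
    print(cache["pattern", 0].shape)

    # A simple ablation: zero out head 0's output in layer 0 via a forward hook.
    def zero_head_0(z, hook):
        # z has shape [batch, pos, head_index, d_head]
        z[:, :, 0, :] = 0.0
        return z

    ablated_logits = model.run_with_hooks(
        prompt,
        fwd_hooks=[("blocks.0.attn.hook_z", zero_head_0)],
    )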
