transformer_lens.model_bridge.supported_architectures.falcon module¶
Falcon architecture adapter.
Supports original Falcon models (7B, 40B, 180B) with:

- Parallel attention+MLP (both branches read the same residual input)
- Multi-query or grouped-query attention (fused QKV)
- RoPE or ALiBi position embeddings
- class transformer_lens.model_bridge.supported_architectures.falcon.FalconArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter

Architecture adapter for Falcon models (FalconForCausalLM).
- prepare_model(hf_model: Any) → None¶
Patch Falcon modules to avoid backward hook conflicts.
Two issues are patched:

1. FalconLinear computes input @ self.weight.T, where .T is a view of the parameter — the transpose is cloned to break the view chain.
2. FalconDecoderLayer computes mlp_output += attention_output in place, which mutates a tensor captured by mlp.hook_out's backward hook — patched to use non-inplace addition.
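The two patches can be illustrated with a minimal, self-contained sketch. TinyLinear and combine below are hypothetical stand-ins for FalconLinear and the decoder layer's residual addition, not the actual transformer_lens patch code:

```python
import torch
import torch.nn as nn


class TinyLinear(nn.Module):
    """Stand-in for FalconLinear: y = x @ W.T (no bias)."""

    def __init__(self, d: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d, d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original form was `x @ self.weight.T`, where .T is a view of the
        # parameter. Cloning the transpose breaks the view chain so that
        # backward hooks registered on downstream modules do not conflict
        # with the view's autograd bookkeeping.
        return x @ self.weight.T.clone()


def combine(mlp_output: torch.Tensor,
            attention_output: torch.Tensor,
            residual: torch.Tensor) -> torch.Tensor:
    # Patched residual combination. The in-place version
    # `mlp_output += attention_output` would mutate a tensor that a
    # backward hook may already have captured; the out-of-place addition
    # leaves the captured tensor untouched.
    mlp_output = mlp_output + attention_output
    return mlp_output + residual


x = torch.randn(2, 4, requires_grad=True)
layer = TinyLinear(4)
out = combine(layer(x), layer(x), x)
out.sum().backward()  # backward runs without view/in-place conflicts
```

The same pattern applies to the real modules: the adapter swaps in forward methods that clone the transposed weight and replace the in-place `+=` with an ordinary addition.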
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None¶
Set up rotary embedding references for component testing.