transformer_lens.model_bridge.supported_architectures.lfm2_moe module

LiquidAI LFM2 MoE architecture adapter.

class transformer_lens.model_bridge.supported_architectures.lfm2_moe.Lfm2MoeArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for LiquidAI LFM2 MoE models.

LFM2 MoE is a hybrid decoder that mixes short-convolution layers with full-attention layers. The adapter delegates each decoder layer to the Hugging Face (HF) implementation and exposes residual-stream hooks around the whole layer, rather than assuming that every layer has the same attention/MLP substructure.
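The whole-layer approach described above can be illustrated with a toy sketch. This is not the TransformerLens implementation: `WholeLayerWrapper`, the two toy layer functions, and the `blocks.{i}.…` cache keys are hypothetical names chosen for illustration, and plain Python lists stand in for tensors. It only shows the idea that layers with different internals can share the same two hook points, one before and one after the layer.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
Hook = Callable[[Vector], Optional[Vector]]


class WholeLayerWrapper:
    """Hypothetical stand-in for a whole-layer bridge (not a TransformerLens class)."""

    def __init__(self, layer: Callable[[Vector], Vector]) -> None:
        self.layer = layer              # opaque layer; its internals are never inspected
        self.hook_in: List[Hook] = []   # run on the residual stream before the layer
        self.hook_out: List[Hook] = []  # run on the residual stream after the layer

    @staticmethod
    def _run(hooks: List[Hook], x: Vector) -> Vector:
        for hook in hooks:
            result = hook(x)
            if result is not None:      # a hook may replace the value or only observe it
                x = result
        return x

    def __call__(self, x: Vector) -> Vector:
        x = self._run(self.hook_in, x)
        x = self.layer(x)
        return self._run(self.hook_out, x)


def short_conv_layer(x: Vector) -> Vector:
    """Toy residual layer: each position adds the previous position's value."""
    return [v + (x[i - 1] if i > 0 else 0.0) for i, v in enumerate(x)]


def attention_layer(x: Vector) -> Vector:
    """Toy residual layer: each position adds the mean over all positions."""
    mean = sum(x) / len(x)
    return [v + mean for v in x]


cache: Dict[str, Vector] = {}


def record(name: str) -> Hook:
    def hook(x: Vector) -> None:
        cache[name] = list(x)           # store a copy; returning None leaves x unchanged
    return hook


# Two layers with different internals share the same pair of hook points.
layers = [WholeLayerWrapper(short_conv_layer), WholeLayerWrapper(attention_layer)]
for i, layer in enumerate(layers):
    layer.hook_in.append(record(f"blocks.{i}.hook_resid_pre"))
    layer.hook_out.append(record(f"blocks.{i}.hook_resid_post"))

x: Vector = [1.0, 2.0, 3.0]
for layer in layers:
    x = layer(x)
```

Because the wrapper never looks inside the layer, nothing in the hook interface depends on whether a given layer is convolutional or attention-based.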

__init__(cfg: Any) → None: Initialize the LFM2 MoE architecture adapter.

applicable_phases: list[int] = [4]

component_mapping: ComponentMapping | None

prepare_loading(model_name: str, model_kwargs: dict) → None: Force eager attention at load time when the HF config exposes an attention-implementation setting.

prepare_model(hf_model: Any) → None: Force eager attention on the already-loaded HF model when it supports that setting.
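As a rough illustration of what "forcing eager attention" can look like, the sketch below assumes the setting in question is the `attn_implementation` loading keyword and the `_attn_implementation` config attribute used by Hugging Face Transformers. The helper names `force_eager_in_kwargs` and `force_eager_on_model` are hypothetical, and the adapter's actual checks may differ; `SimpleNamespace` objects stand in for a real model so that the example runs without downloading anything.

```python
from types import SimpleNamespace
from typing import Any, Dict


def force_eager_in_kwargs(model_kwargs: Dict[str, Any]) -> None:
    """Hypothetical helper: request eager attention at load time.

    Assumes the loader accepts an `attn_implementation` keyword.
    Mutates `model_kwargs` in place and returns nothing.
    """
    model_kwargs["attn_implementation"] = "eager"


def force_eager_on_model(hf_model: Any) -> None:
    """Hypothetical helper: switch an already-loaded model to eager attention.

    Does nothing when the model's config lacks the attribute, which mirrors
    the "when supported" wording in the method description.
    """
    config = getattr(hf_model, "config", None)
    if config is not None and hasattr(config, "_attn_implementation"):
        config._attn_implementation = "eager"


# Load-time path: the kwargs dict gains the eager setting.
kwargs: Dict[str, Any] = {}
force_eager_in_kwargs(kwargs)

# Post-load path: a stand-in model whose config exposes the attribute.
supported = SimpleNamespace(config=SimpleNamespace(_attn_implementation="sdpa"))
force_eager_on_model(supported)

# A stand-in model without the attribute is left untouched.
unsupported = SimpleNamespace(config=SimpleNamespace())
force_eager_on_model(unsupported)
```

Eager attention is typically chosen in interpretability tooling because fused attention kernels do not materialize intermediate values such as attention patterns.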

uses_split_attention: bool

weight_processing_conversions: Dict[str, ParamProcessingConversion | str] | None

class transformer_lens.model_bridge.supported_architectures.lfm2_moe.Lfm2MoeBlockBridge(name: str, config: Any | None = None, submodules: Dict[str, GeneralizedComponent] | None = None, hook_alias_overrides: Dict[str, str] | None = None)

Bases: BlockBridge

Whole-layer LFM2 bridge exposing only residual stream hooks.

LFM2 MoE interleaves short-convolution and full-attention operator layers. Wrapping the HF layer as a whole preserves correct execution while avoiding unresolved standard attention/MLP aliases on layers that do not have them.

hook_aliases: Dict[str, str | List[str]] = {'hook_resid_post': 'hook_out', 'hook_resid_pre': 'hook_in'}
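The alias table above maps the familiar residual-stream names onto the block's generic input and output hooks. A minimal sketch of how such a table could be resolved follows; `resolve_alias` is a hypothetical helper, not part of TransformerLens, and treating a list value as "first available candidate wins" is an assumption drawn only from the `str | List[str]` type annotation.

```python
from typing import Dict, List, Set, Union

AliasTable = Dict[str, Union[str, List[str]]]

# Copied from the class attribute documented above.
HOOK_ALIASES: AliasTable = {
    "hook_resid_post": "hook_out",
    "hook_resid_pre": "hook_in",
}


def resolve_alias(name: str, aliases: AliasTable, available: Set[str]) -> str:
    """Hypothetical resolver: map an alias to a hook name that actually exists.

    A string target is used directly. A list target is treated as ordered
    fallbacks (an assumption). Names that are not aliases resolve to themselves.
    """
    target = aliases.get(name, name)
    candidates = [target] if isinstance(target, str) else list(target)
    for candidate in candidates:
        if candidate in available:
            return candidate
    raise KeyError(f"no available hook for {name!r} (tried {candidates})")


available_hooks = {"hook_in", "hook_out"}
pre = resolve_alias("hook_resid_pre", HOOK_ALIASES, available_hooks)
post = resolve_alias("hook_resid_post", HOOK_ALIASES, available_hooks)
direct = resolve_alias("hook_out", HOOK_ALIASES, available_hooks)

# A list-valued alias falls through to the first candidate that exists.
fallback = resolve_alias(
    "hook_example", {"hook_example": ["hook_missing", "hook_out"]}, available_hooks
)
```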