transformer_lens.model_bridge.generalized_components.ssm_mixer module
Wrap-don’t-reimplement bridge for HF’s MambaMixer (Mamba-1).
- class transformer_lens.model_bridge.generalized_components.ssm_mixer.SSMMixerBridge(name: str | None, config: Any | None = None, submodules: Dict[str, GeneralizedComponent] | None = None, conversion_rule: BaseTensorConversion | None = None, hook_alias_overrides: Dict[str, str] | None = None, optional: bool = False)
Bases: GeneralizedComponent
Opaque wrapper around Mamba-1’s MambaMixer.
Submodules (in_proj, conv1d, x_proj, dt_proj, out_proj) are swapped into the HF mixer by
replace_remote_component, so their hooks fire when slow_forward accesses them. A_log and D reach the user via GeneralizedComponent.__getattr__ delegation.
Decode-step caveat: conv1d.hook_out fires only on prefill during stateful generation; see DepthwiseConv1DBridge for the reason.
- forward(*args: Any, **kwargs: Any) → Any
Hook the input, delegate to HF slow_forward, hook the output.
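The hook-delegate-hook pattern described above, together with attribute delegation for names such as A_log, can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: HookPoint, WrappedMixer, and ToyMixer are hypothetical stand-ins written for this example, and plain numbers replace tensors.

```python
from typing import Any, Callable, List


class HookPoint:
    """Hypothetical stand-in for a hook point: runs callbacks in order."""

    def __init__(self) -> None:
        self.fns: List[Callable[[Any], Any]] = []

    def __call__(self, value: Any) -> Any:
        for fn in self.fns:
            result = fn(value)
            if result is not None:  # a callback may replace the value
                value = result
        return value


class WrappedMixer:
    """Toy wrapper: hook the input, delegate to the original, hook the output."""

    def __init__(self, original: Any) -> None:
        self.original = original
        self.hook_in = HookPoint()
        self.hook_out = HookPoint()

    def forward(self, x: Any, *args: Any, **kwargs: Any) -> Any:
        x = self.hook_in(x)
        out = self.original.forward(x, *args, **kwargs)  # no reimplementation
        return self.hook_out(out)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails; forward to the wrapped object.
        if name == "original":  # guard against recursion before __init__ runs
            raise AttributeError(name)
        return getattr(self.original, name)


class ToyMixer:
    """Hypothetical wrapped module; A_log stands in for a real parameter."""

    A_log = -1.0

    def forward(self, x: float) -> float:
        return x * 2


mixer = WrappedMixer(ToyMixer())
seen: List[float] = []
mixer.hook_out.fns.append(seen.append)  # observe the output without changing it
result = mixer.forward(3)
```

In this sketch the wrapper never computes the mixer's output itself; it only observes or edits values at the boundary, and any attribute it does not define falls through to the wrapped object.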
- hook_aliases: Dict[str, str | List[str]] = {'hook_conv': 'conv1d.hook_out', 'hook_dt_proj': 'dt_proj.hook_out', 'hook_in_proj': 'in_proj.hook_out', 'hook_ssm_out': 'hook_out', 'hook_x_proj': 'x_proj.hook_out'}
- real_components: Dict[str, tuple]
- training: bool
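As an illustration of how the hook_aliases mapping above can be read, the following sketch resolves an alias to its target path(s) relative to the mixer. The dictionary is copied from the attribute's default value; resolve_alias is a hypothetical helper written for this example, and treating a list value as an ordered set of candidate targets is an assumption based only on the declared type.

```python
from typing import Dict, List, Union

# Default value of SSMMixerBridge.hook_aliases, copied from the reference above.
HOOK_ALIASES: Dict[str, Union[str, List[str]]] = {
    "hook_conv": "conv1d.hook_out",
    "hook_dt_proj": "dt_proj.hook_out",
    "hook_in_proj": "in_proj.hook_out",
    "hook_ssm_out": "hook_out",
    "hook_x_proj": "x_proj.hook_out",
}


def resolve_alias(
    name: str, aliases: Dict[str, Union[str, List[str]]] = HOOK_ALIASES
) -> List[str]:
    """Hypothetical helper: return candidate target paths for an alias.

    Names that are not aliases are returned unchanged as a one-element list.
    """
    target = aliases.get(name, name)
    return [target] if isinstance(target, str) else list(target)


conv_targets = resolve_alias("hook_conv")
plain_targets = resolve_alias("hook_out")
```

Here hook_conv maps to the output hook of the conv1d submodule, while hook_ssm_out is simply another name for the mixer's own hook_out.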