transformer_lens.model_bridge.generalized_components.ssm_mixer module

Wrap-don’t-reimplement bridge for HF’s MambaMixer (Mamba-1).

class transformer_lens.model_bridge.generalized_components.ssm_mixer.SSMMixerBridge(name: str | None, config: Any | None = None, submodules: Dict[str, GeneralizedComponent] | None = None, conversion_rule: BaseTensorConversion | None = None, hook_alias_overrides: Dict[str, str] | None = None, optional: bool = False)

Bases: GeneralizedComponent

Opaque wrapper around Mamba-1’s MambaMixer.

Submodules (in_proj, conv1d, x_proj, dt_proj, out_proj) are swapped into the HF mixer by replace_remote_component, so their hooks fire when slow_forward accesses them. The A_log and D parameters reach the user via GeneralizedComponent.__getattr__ delegation.
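The delegation mechanism can be illustrated with a minimal pure-Python sketch. The class names RemoteMixer and BridgeSketch are illustrative stand-ins, not the actual TransformerLens implementation:

```python
class RemoteMixer:
    """Stand-in for HF's MambaMixer: holds the raw SSM parameters."""

    def __init__(self):
        self.A_log = [[0.0] * 8 for _ in range(4)]  # toy stand-in for the A_log parameter
        self.D = [1.0] * 4                          # toy stand-in for the D parameter


class BridgeSketch:
    """Stand-in for the GeneralizedComponent delegation behavior."""

    def __init__(self, remote):
        self.remote = remote

    def __getattr__(self, name):
        # Python only calls __getattr__ when normal lookup on the bridge
        # itself fails, so A_log and D fall through to the wrapped mixer.
        return getattr(self.remote, name)


bridge = BridgeSketch(RemoteMixer())
print(bridge.D)  # resolved on the wrapped RemoteMixer
```

Because `__getattr__` is only consulted after normal attribute lookup fails, attributes the bridge defines itself (such as its own hooks) shadow the remote module's, while everything else passes through untouched.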

Decode-step caveat: conv1d.hook_out fires only on prefill during stateful generation; see DepthwiseConv1DBridge for the reason.

forward(*args: Any, **kwargs: Any) → Any

Hook the input, delegate to the HF mixer's slow_forward, then hook the output.
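The hook-in / delegate / hook-out pattern can be sketched schematically. This is not the actual implementation; plain callables stand in for TransformerLens hook points, and ForwardSketch is a hypothetical name:

```python
from typing import Any, Callable, List


class ForwardSketch:
    def __init__(self, slow_forward: Callable[..., Any]) -> None:
        self.slow_forward = slow_forward     # stands in for the wrapped HF slow_forward
        self.hook_in: List[Callable] = []    # stands in for the input hook point
        self.hook_out: List[Callable] = []   # stands in for the output hook point

    def forward(self, x: Any, **kwargs: Any) -> Any:
        for h in self.hook_in:               # hook the input
            x = h(x)
        out = self.slow_forward(x, **kwargs) # delegate to the wrapped module
        for h in self.hook_out:              # hook the output
            out = h(out)
        return out


mixer = ForwardSketch(slow_forward=lambda x: x * 2)
mixer.hook_in.append(lambda x: x + 1)
mixer.hook_out.append(lambda y: y - 3)
print(mixer.forward(5))  # (5 + 1) * 2 - 3 = 9
```

The bridge itself never recomputes the SSM; all numerics stay inside the wrapped slow_forward, and the hooks merely observe (or edit) the tensors at the boundary.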

hook_aliases: Dict[str, str | List[str]] = {'hook_conv': 'conv1d.hook_out', 'hook_dt_proj': 'dt_proj.hook_out', 'hook_in_proj': 'in_proj.hook_out', 'hook_ssm_out': 'hook_out', 'hook_x_proj': 'x_proj.hook_out'}
real_components: Dict[str, tuple]
training: bool
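A hedged sketch of how a dotted alias such as 'conv1d.hook_out' could be resolved against the submodule tree; the resolve_alias helper is hypothetical, and string markers stand in for real hook-point objects:

```python
from types import SimpleNamespace

# The alias table from this page, verbatim.
hook_aliases = {
    "hook_conv": "conv1d.hook_out",
    "hook_dt_proj": "dt_proj.hook_out",
    "hook_in_proj": "in_proj.hook_out",
    "hook_ssm_out": "hook_out",
    "hook_x_proj": "x_proj.hook_out",
}


def resolve_alias(root, alias):
    """Walk the dotted path stored for `alias`, starting at `root`."""
    target = root
    for part in hook_aliases[alias].split("."):
        target = getattr(target, part)
    return target


# Toy module tree with string markers in place of HookPoint objects.
mixer = SimpleNamespace(
    hook_out="mixer.hook_out",
    conv1d=SimpleNamespace(hook_out="conv1d.hook_out"),
)
print(resolve_alias(mixer, "hook_conv"))     # the conv1d submodule's output hook
print(resolve_alias(mixer, "hook_ssm_out"))  # the mixer's own output hook
```

Note that 'hook_ssm_out' has no dot: it aliases the bridge's own hook_out rather than a submodule's, which is why the resolver must handle single-segment paths as well.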