transformer_lens.model_bridge.generalized_components.depthwise_conv1d module¶
Bridge for Mamba-style depthwise causal Conv1d (distinct from GPT-2’s Conv1D linear).
- class transformer_lens.model_bridge.generalized_components.depthwise_conv1d.DepthwiseConv1DBridge(name: str | None, config: Any | None = None, submodules: Dict[str, GeneralizedComponent] | None = None, conversion_rule: BaseTensorConversion | None = None, hook_alias_overrides: Dict[str, str] | None = None, optional: bool = False)¶
Bases: GeneralizedComponent

Wraps an nn.Conv1d depthwise causal convolution with input/output hooks.

- Hook shapes (channel-first, as HF's MambaMixer transposes before the call):
  hook_in: [batch, channels, seq_len]
  hook_out: [batch, channels, seq_len + conv_kernel - 1] (pre causal trim)
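The hook shapes above can be reproduced with a plain nn.Conv1d configured the Mamba way (groups == channels, left padding of conv_kernel - 1). This is an illustrative sketch, not the bridge itself; the dimension values are arbitrary:

```python
import torch
import torch.nn as nn

batch, channels, seq_len, conv_kernel = 2, 4, 10, 4

# Depthwise causal conv: groups == channels gives one filter per channel,
# padding of conv_kernel - 1 keeps the convolution causal before trimming.
conv1d = nn.Conv1d(
    in_channels=channels,
    out_channels=channels,
    kernel_size=conv_kernel,
    groups=channels,
    padding=conv_kernel - 1,
)

x = torch.randn(batch, channels, seq_len)  # hook_in: [batch, channels, seq_len]
y = conv1d(x)                              # hook_out, pre causal trim
print(y.shape)         # torch.Size([2, 4, 13]) == seq_len + conv_kernel - 1
y_causal = y[..., :seq_len]                # the causal trim applied downstream
```

The extra conv_kernel - 1 positions on the right are the non-causal tail that HF's mixer slices off after the call, which is why hook_out is longer than hook_in.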
Decode-step limitation: during stateful generation, HF's Mamba/Mamba-2 mixers bypass self.conv1d(...) and read self.conv1d.weight directly, so the forward hook fires only on prefill, never on decode steps. For per-step conv output during decode, compute it manually from the cached conv_states and conv1d.original_component.weight, or run token-by-token via forward() instead of generate().

- forward(input: Tensor, *args: Any, **kwargs: Any) → Tensor¶
Generic forward pass for bridge components with input/output hooks.
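The manual per-step computation suggested for decode steps can be sketched as follows. This is a hypothetical standalone example: the conv_state buffer and decode_step_conv helper are assumptions mirroring HF's slow decode path, not part of this bridge's API:

```python
import torch
import torch.nn as nn

channels, conv_kernel = 4, 4
conv1d = nn.Conv1d(channels, channels, conv_kernel,
                   groups=channels, padding=conv_kernel - 1)

# Rolling buffer of the last conv_kernel inputs per channel, analogous to
# the conv_states that HF's Mamba mixers keep in their cache (shape assumed).
conv_state = torch.zeros(1, channels, conv_kernel)

def decode_step_conv(conv_state, x_t):
    """Per-token conv output during decode, computed by hand because the
    module's forward (and hence hook_out) is bypassed on decode steps."""
    conv_state = torch.roll(conv_state, shifts=-1, dims=-1)
    conv_state[:, :, -1] = x_t                      # append newest token
    # Depthwise weight has shape [channels, 1, conv_kernel].
    out = (conv_state * conv1d.weight[:, 0, :]).sum(dim=-1)
    if conv1d.bias is not None:
        out = out + conv1d.bias
    return conv_state, out                          # out: [batch, channels]

# Sanity check: token-by-token results match the full causal conv.
xs = torch.randn(1, channels, 3)
for t in range(3):
    conv_state, out = decode_step_conv(conv_state, xs[:, :, t])
ref = conv1d(xs)[..., :3][:, :, -1]  # causal-trimmed output at last position
print(torch.allclose(out, ref, atol=1e-5))
```

In practice the buffer would come from the generation cache rather than being maintained by hand, and the weight would be read from conv1d.original_component on the bridge.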