transformer_lens.model_bridge.generalized_components.depthwise_conv1d module¶
Bridge for Mamba-style depthwise causal Conv1d (distinct from GPT-2’s Conv1D linear).
- class transformer_lens.model_bridge.generalized_components.depthwise_conv1d.DepthwiseConv1DBridge(name: str | None, config: Any | None = None, submodules: Dict[str, GeneralizedComponent] | None = None, conversion_rule: BaseTensorConversion | None = None, hook_alias_overrides: Dict[str, str] | None = None, optional: bool = False)¶
Bases: GeneralizedComponent

Wraps an nn.Conv1d depthwise causal convolution with input/output hooks.

- Hook shapes (channel-first, as HF's MambaMixer transposes before the call):
  hook_in: [batch, channels, seq_len]
  hook_out: [batch, channels, seq_len + conv_kernel - 1] (pre causal trim)
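The hook shapes above can be reproduced with a plain nn.Conv1d configured the Mamba way (groups == channels, left padding of conv_kernel - 1). This is an illustrative sketch, not the bridge itself; the dimension values are arbitrary:

```python
import torch
import torch.nn as nn

batch, channels, seq_len, conv_kernel = 2, 4, 10, 4

# Depthwise causal conv: groups == channels gives one filter per channel,
# padding of conv_kernel - 1 keeps the convolution causal before trimming.
conv1d = nn.Conv1d(
    in_channels=channels,
    out_channels=channels,
    kernel_size=conv_kernel,
    groups=channels,
    padding=conv_kernel - 1,
)

x = torch.randn(batch, channels, seq_len)  # hook_in: [batch, channels, seq_len]
y = conv1d(x)                              # hook_out, pre causal trim
print(y.shape)         # torch.Size([2, 4, 13]) == seq_len + conv_kernel - 1
y_causal = y[..., :seq_len]                # the causal trim applied downstream
```

The extra conv_kernel - 1 positions on the right are the non-causal tail that HF's mixer slices off after the call, which is why hook_out is longer than hook_in.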
Decode-step limitation: during stateful generation, HF's Mamba/Mamba-2 mixers bypass self.conv1d(...) and read self.conv1d.weight directly, so the forward hook fires only on prefill, never on decode steps. For per-step conv output during decode, compute it manually from the cached conv_states and conv1d.original_component.weight, or run token-by-token via forward() instead of generate().

- forward(input: Tensor, *args: Any, **kwargs: Any) → Tensor¶
Generic forward pass for bridge components with input/output hooks.
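The manual per-step computation suggested for decode steps can be sketched as follows. This is a hypothetical standalone example: the conv_state buffer and decode_step_conv helper are assumptions mirroring HF's slow decode path, not part of this bridge's API:

```python
import torch
import torch.nn as nn

channels, conv_kernel = 4, 4
conv1d = nn.Conv1d(channels, channels, conv_kernel,
                   groups=channels, padding=conv_kernel - 1)

# Rolling buffer of the last conv_kernel inputs per channel, analogous to
# the conv_states that HF's Mamba mixers keep in their cache (shape assumed).
conv_state = torch.zeros(1, channels, conv_kernel)

def decode_step_conv(conv_state, x_t):
    """Per-token conv output during decode, computed by hand because the
    module's forward (and hence hook_out) is bypassed on decode steps."""
    conv_state = torch.roll(conv_state, shifts=-1, dims=-1)
    conv_state[:, :, -1] = x_t                      # append newest token
    # Depthwise weight has shape [channels, 1, conv_kernel].
    out = (conv_state * conv1d.weight[:, 0, :]).sum(dim=-1)
    if conv1d.bias is not None:
        out = out + conv1d.bias
    return conv_state, out                          # out: [batch, channels]

# Sanity check: token-by-token results match the full causal conv.
xs = torch.randn(1, channels, 3)
for t in range(3):
    conv_state, out = decode_step_conv(conv_state, xs[:, :, t])
ref = conv1d(xs)[..., :3][:, :, -1]  # causal-trimmed output at last position
print(torch.allclose(out, ref, atol=1e-5))
```

In practice the buffer would come from the generation cache rather than being maintained by hand, and the weight would be read from conv1d.original_component on the bridge.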