
Zero padding - possibly incorrect behavior? #76

@iceboundflame

Thank you for sharing the code and paper; they have been very helpful. I think I may have found a subtle issue with the padding scheme and would appreciate another opinion.

Conceptually, we'd like every input before the start of the sequence to be zero. But I noticed that the implementation pads the input of every Conv1d with zeros, not just the input of the first one. In my opinion, this is incorrect for every layer beyond the first.

Here is a diagram of the issue.

[Diagram: a stack of dilated causal convolutions; triangles mark the padded positions at each layer, with the first layer's zero-padded outputs shown as red triangles.]

The triangles represent padded inputs. The bottom row (the sequence input) is padded with 0, which is correct. However, the first layer's outputs are also padded with 0 (red triangles) before being fed to the next layer. I think we should instead pad with a constant vector: the layer's response to an all-zero receptive field, which works out to conv1's bias term.

Similarly, the next layer up should be padded with a constant vector whose value is that layer's response to a receptive field filled with the previous layer's constant padding, and so on up the stack, as sketched below.
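
To make this concrete, here is a minimal sketch (mine, not from the repo) of how the per-layer padding constants could be computed for a bare stack of causal Conv1d layers. The conv1/conv2 instances are illustrative, mirroring the shapes of the test network further down; the sketch also ignores weight norm, dropout, the ReLU nonlinearity, and the residual connections of the actual TemporalBlock, all of which would have to be applied as well when propagating the constant through the real network.

import torch
import torch.nn as nn

def constant_response(conv: nn.Conv1d, c: torch.Tensor) -> torch.Tensor:
    # Output of `conv` at a position whose entire receptive field holds the
    # constant vector c (shape: in_channels). Summing the kernel over its taps
    # reduces convolving a constant signal to a matrix-vector product plus bias.
    return conv.weight.sum(dim=2) @ c + conv.bias

conv1 = nn.Conv1d(1, 2, kernel_size=2)              # first dilated conv
conv2 = nn.Conv1d(2, 1, kernel_size=2, dilation=2)  # next layer up

pad0 = torch.zeros(1)                  # sequence input: pad with zeros (correct)
pad1 = constant_response(conv1, pad0)  # == conv1.bias, as noted above
pad2 = constant_response(conv2, pad1)  # padding constant for the layer above conv2

Each layer would then left-pad its own input with its constant, broadcast across the (kernel_size - 1) * dilation padded positions, instead of using zeros.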

Impact: a network with receptive field $r$ will produce incorrect results prior to the $r$-th input. "Incorrect" here means at least inconsistent with the network's steady-state behavior, far from the beginning of the input. This may matter most for long receptive fields, where sequences are similar in length to the receptive field, because then a substantial portion of each training example is computed from these wrong padding values.
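
For reference, assuming the standard architecture from the paper (two dilated convolutions per residual block, with dilation $2^i$ at level $i$), a network with kernel size $k$ and $L$ levels has

$$r = 1 + 2(k - 1)(2^L - 1),$$

so the two-level, $k = 2$ network in the test below has $r = 7$.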

Here's a simple test case that demonstrates that prepending a sequence of zeros to the input changes the output.

import torch
import torch.nn as nn

import tcn  # the repo's tcn.py


def test_tcn():
    torch.manual_seed(42)

    def init_weights(m):
        if isinstance(m, nn.Conv1d):
            if hasattr(m, 'weight_g'):
                # weight_norm was applied to this layer: initialize its
                # magnitude (weight_g) and direction (weight_v) tensors
                torch.nn.init.uniform_(m.weight_g)
                torch.nn.init.uniform_(m.weight_v)
                # XXX: not sure if this is the correct way to initialize
            else:
                torch.nn.init.uniform_(m.weight)
            torch.nn.init.uniform_(m.bias)

    with torch.no_grad():
        net = tcn.TemporalConvNet(num_inputs=1, num_channels=[2, 1], kernel_size=2, dropout=0)
        net.apply(init_weights)
        print("Receptive field", net.receptive_field_size)

        for i in range(8):
            print(f"Padding with {i} zeros:",
                  net(torch.Tensor([[ [0] * i + [1] ]])))

        print("Zero input response:", net(torch.Tensor([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])))
Receptive field 7
Padding with 0 zeros: tensor([[[2.1018]]])
Padding with 1 zeros: tensor([[[1.3458, 2.2364]]])
Padding with 2 zeros: tensor([[[1.3458, 1.4805, 2.4149]]])
Padding with 3 zeros: tensor([[[1.3458, 1.4805, 1.6590, 2.4309]]])
Padding with 4 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 2.4466]]])
Padding with 5 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 2.4550]]])
Padding with 6 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 2.4550]]])
Padding with 7 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 2.4550]]])

Zero input response: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 1.6991,
          1.6991, 1.6991, 1.6991, 1.6991]]])

Note how the last output converges to 2.4550 as zeros are prepended: with consistent padding, the response to the bare input [1] should already be that steady-state value, not 2.1018. That said, this TCN implementation clearly still achieves great results, so I am not yet sure of the practical impact. I'll experiment with changing it for my application.
