Thank you for sharing the code and paper, it has been very helpful. I think I may have found a subtle issue with the padding scheme and would appreciate another opinion.
Conceptually, we'd like every input before the start of the sequence to be treated as zero. But I noticed that the implementation pads every Conv1d input with zeros, not just the first layer's. In my opinion, this is incorrect for every layer beyond the first.
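As I understand it, the causal convolutions here use the usual pad-and-chomp trick: pad symmetrically by `(k - 1) * d` zeros and cut off the right overhang, which is equivalent to left-padding each Conv1d's own input with zeros. A minimal sketch of that equivalence (not the repository's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
k, d = 2, 1
pad = (k - 1) * d

# Symmetric zero padding followed by chopping the right overhang...
conv = nn.Conv1d(1, 1, k, padding=pad, dilation=d)
x = torch.randn(1, 1, 5)
y_chomp = conv(x)[:, :, :-pad]

# ...is the same as explicitly left-padding the input with zeros.
conv_left = nn.Conv1d(1, 1, k, dilation=d)
conv_left.weight, conv_left.bias = conv.weight, conv.bias
y_left = conv_left(F.pad(x, (pad, 0)))

assert torch.allclose(y_chomp, y_left)
```

Either way, each layer's input, including the hidden layers', is extended with zeros on the left.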
Here is a diagram of the issue.
The triangles represent padded inputs. The bottom row (the sequence input) is padded with 0, which is correct. However, the first layer's outputs are also padded with 0 (red triangles) before being fed to the next layer. I think we should instead pad with a constant vector: the result of convolving an all-zero receptive field, which works out to conv1's bias term.
Similarly, the next layer up should be padded with a constant vector whose value is the result of convolving a receptive field filled with the previous layer's padding constant.
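To make that concrete, here is a sketch that computes the constant each layer's left-padding should carry. `pad_constants` is my own hypothetical helper, and it treats each layer as a single causal Conv1d, ignoring the ReLU/residual structure of the real blocks:

```python
import torch
import torch.nn as nn

def pad_constants(convs):
    """For a stack of causal Conv1d layers, compute the constant vector
    each layer's left-padding should carry (hypothetical helper)."""
    # The raw sequence input is padded with zeros, which is correct.
    pad = torch.zeros(convs[0].in_channels)
    consts = [pad]
    for conv in convs:
        # A receptive field filled with the constant `pad` convolves to a
        # constant output; for pad = 0 that is just the layer's bias.
        L = (conv.kernel_size[0] - 1) * conv.dilation[0] + 1
        x = pad.view(1, -1, 1).repeat(1, 1, L)         # (1, C_in, L)
        pad = conv(x).squeeze(0).squeeze(-1).detach()  # (C_out,)
        consts.append(pad)
    return consts
```

With two layers, `consts[1]` is exactly the first conv's bias, and `consts[2]` is what the second layer's padding should be instead of zero.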
Impact: a network with receptive field R can produce incorrect values for the first R - 1 output positions of every sequence, since those positions depend on the padding.
Here's a simple test case that demonstrates that prepending a sequence of zeros to the input changes the output.
```python
import torch
import torch.nn as nn

import tcn  # the TCN implementation under discussion

def test_tcn():
    torch.manual_seed(42)

    def init_weights(m):
        if isinstance(m, nn.Conv1d):
            if hasattr(m, 'weight_g'):
                # weight_norm was applied to this layer
                torch.nn.init.uniform_(m.weight_g)
                torch.nn.init.uniform_(m.weight_v)
                # XXX: not sure if this is the correct way to initialize
            else:
                torch.nn.init.uniform_(m.weight)
            torch.nn.init.uniform_(m.bias)

    with torch.no_grad():
        net = tcn.TemporalConvNet(num_inputs=1, num_channels=[2, 1],
                                  kernel_size=2, dropout=0)
        net.apply(init_weights)
        print("Receptive field", net.receptive_field_size)
        for i in range(8):
            print(f"Padding with {i} zeros:",
                  net(torch.Tensor([[[0] * i + [1]]])))
        print("Zero input response:",
              net(torch.Tensor([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])))
```

Output:

```
Receptive field 7
Padding with 0 zeros: tensor([[[2.1018]]])
Padding with 1 zeros: tensor([[[1.3458, 2.2364]]])
Padding with 2 zeros: tensor([[[1.3458, 1.4805, 2.4149]]])
Padding with 3 zeros: tensor([[[1.3458, 1.4805, 1.6590, 2.4309]]])
Padding with 4 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 2.4466]]])
Padding with 5 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 2.4550]]])
Padding with 6 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 2.4550]]])
Padding with 7 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 2.4550]]])
Zero input response: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 1.6991,
                              1.6991, 1.6991, 1.6991, 1.6991]]])
```
Clearly this TCN implementation is still able to achieve great results, so I am not yet sure of the practical impact. I'll experiment with changing it for my application.
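For reference, the change I plan to try is replacing the zero left-padding at hidden layers with each layer's constant vector. A hypothetical sketch (`causal_pad` is my own name, not part of the repository):

```python
import torch

def causal_pad(x, pad_len, const):
    """Left-pad a (N, C, L) batch with a per-channel constant vector.

    `const` has shape (C,): all zeros at the input layer, and the
    layer's zero-response constant at the hidden layers.
    """
    left = const.view(1, -1, 1).expand(x.size(0), -1, pad_len)
    return torch.cat([left, x], dim=2)
```

`F.pad(x, (pad_len, 0), value=...)` only accepts a scalar fill value, hence the explicit concatenation for per-channel constants.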
