Asymmetry in read gate application #68

@jozef-mokry

This probably does not make much difference, but I noticed that the read gates r1 and r2 in the gru_cond_layer method are applied slightly differently:

Here (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L448) the hidden state is computed as:

h1 = tanh(xx_ + r1*(Ux*h)) where xx_ is Wx*state_below + bx [Notice that the read gate r1 is not applied to the bias bx]

However, the second hidden state h2 at (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L477) is computed as:

h2 = tanh(Wcx*ctx_ + r2*(Ux_nl*h1 + bx_nl)) [Notice that the read gate r2 is applied to the bias bx_nl]
If r2 "kills" some dimensions of the bias term bx_nl, then the corresponding pre-activations reduce to Wcx*ctx_ with no offset, so those decision hyperplanes of Wcx are forced to go through the origin.
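
To make the difference concrete, here is a minimal NumPy sketch of the two pre-activations as written above. It is not the tutorial's Theano code: the sizes, the random weights, and the helper names preact_h1 and preact_h2 are hypothetical, and it uses the row-vector convention (x @ W). With a fully closed gate, bx still shifts the first pre-activation, while bx_nl drops out of the second.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # hypothetical hidden size, chosen only for illustration

# Hypothetical random parameters standing in for the tutorial's weights.
Wx, Ux, Wcx, Ux_nl = (rng.standard_normal((dim, dim)) for _ in range(4))
bx, bx_nl = rng.standard_normal(dim), rng.standard_normal(dim)


def preact_h1(state_below, h, r1):
    # First step: the bias bx sits outside the read gate r1.
    xx_ = state_below @ Wx + bx
    return xx_ + r1 * (h @ Ux)


def preact_h2(ctx_, h1, r2):
    # Second step: the bias bx_nl sits inside the read gate r2.
    return ctx_ @ Wcx + r2 * (h1 @ Ux_nl + bx_nl)


state_below, h, ctx_ = (rng.standard_normal(dim) for _ in range(3))
closed = np.zeros(dim)  # a fully closed read gate in every dimension

p1 = preact_h1(state_below, h, closed)
h1 = np.tanh(p1)
p2 = preact_h2(ctx_, h1, closed)

# With the gate closed, bx still contributes to p1 ...
bias_survives_h1 = np.allclose(p1, state_below @ Wx + bx)
# ... but bx_nl has been gated away from p2.
bias_survives_h2 = np.allclose(p2, ctx_ @ Wcx + bx_nl)
# A zero context then gives a zero pre-activation: no offset is left.
p2_at_zero_ctx = preact_h2(np.zeros(dim), h1, closed)

print(bias_survives_h1, bias_survives_h2, p2_at_zero_ctx)
```

In practice r2 comes out of a sigmoid and is never exactly zero, so this only shows the limiting case in which the gate saturates.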

Is this asymmetry intended?