Asymmetry in read gate application #68

This probably does not make much difference, but I noticed that the read gates `r1` and `r2` in the `gru_cond_layer` method are applied slightly differently:

Here (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L448) the hidden state is computed as:

`h1 = tanh(xx_ + r1*(Ux*h))` where `xx_` is `Wx*state_below + bx` [Notice that the read gate `r1` is not applied to the bias `bx`]

However, the second hidden state `h2`, at (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L477), is computed as:

`h2 = tanh(Wcx*ctx_ + r2*(Ux_nl*h1 + bx_nl))` [Notice that the read gate `r2` is applied to the bias `bx_nl`]

If `r2` "kills" some dimensions of the bias term `bx_nl`, then in those dimensions the pre-activation reduces to `Wcx*ctx_` alone, so the corresponding decision hyperplanes of `Wcx` are forced to pass through the origin.
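
To make the difference concrete, here is a minimal NumPy sketch of the two formulas above. It is not the tutorial's Theano code: the dimensions, the random weights, and the helper names `first_candidate` and `second_candidate` are made up for illustration, and I am assuming a row-vector convention (`h @ Ux`). With both gates fully closed and zero inputs, the bias survives in the first step but vanishes in the second.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, ctx_dim = 4, 3  # hypothetical sizes, chosen only for the demo

# Random stand-ins for the layer parameters named in the issue.
Ux = rng.standard_normal((dim, dim))
Ux_nl = rng.standard_normal((dim, dim))
Wcx = rng.standard_normal((ctx_dim, dim))
bx = rng.standard_normal(dim)
bx_nl = rng.standard_normal(dim)


def first_candidate(xx_, h, r1):
    # xx_ = Wx*state_below + bx already contains the bias,
    # so bx sits outside the gate r1.
    return np.tanh(xx_ + r1 * (h @ Ux))


def second_candidate(ctx_, h1, r2):
    # bx_nl is added before gating, so r2 scales the bias as well.
    return np.tanh(ctx_ @ Wcx + r2 * (h1 @ Ux_nl + bx_nl))


h = rng.standard_normal(dim)
closed = np.zeros(dim)  # a fully closed read gate

# Zero input (state_below = 0, hence xx_ = bx) and closed gate:
# the first candidate still sees its bias.
h1 = first_candidate(bx, h, closed)

# Zero context and closed gate: nothing is left in the second candidate.
h2 = second_candidate(np.zeros(ctx_dim), h1, closed)

print("h1 equals tanh(bx):", np.allclose(h1, np.tanh(bx)))
print("h2 is all zeros:   ", np.allclose(h2, 0.0))
```

In other words, in the first step a closed gate leaves an affine function of the input, while in the second step it leaves a purely linear function of the context.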

Is this asymmetry intended?

