File talk:Full GPT architecture.png
There is a mistake here in how the normalization is applied: the norm layer drawn before the MLP is actually internal to the MLP branch, while the residual add happens outside it. In other words, the skip-connection arrow out should branch off one step earlier, before the norm.
This can be seen clearly in the source code at https://github.com/openai/gpt-2/blob/master/src/model.py:
def block(x, scope, *, past, hparams):
    with tf.variable_scope(scope):
        nx = x.shape[-1].value
        a, present = attn(norm(x, 'ln_1'), 'attn', nx, past=past, hparams=hparams)
        x = x + a  # <- look here
        m = mlp(norm(x, 'ln_2'), 'mlp', nx*4, hparams=hparams)  # <- look here: this one
        x = x + m
        return x, present
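For readers of the diagram, the same structure can be written as a minimal sketch of the data flow. The attn, mlp, and norm callables here are placeholders standing in for the real ones in model.py; the point is only where the norm sits relative to the residual add:

    # Minimal sketch of the pre-norm residual flow, assuming placeholder
    # attn/mlp/norm callables; not the real implementation.
    def pre_norm_block(x, attn, mlp, norm):
        # norm sits inside each branch; the trunk value x is added back
        # unchanged, so the skip arrow must branch off before the norm.
        x = x + attn(norm(x))
        x = x + mlp(norm(x))
        return x

    # Toy check with scalar stand-ins: identity norm, simple branches.
    out = pre_norm_block(1.0, attn=lambda h: 2 * h, mlp=lambda h: 3 * h,
                         norm=lambda h: h)
    print(out)  # 1 + 2*1 = 3, then 3 + 3*3 = 12.0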