I am implementing a Transformer model and I apply a padding_mask plus look_ahead_mask in the attention layers, but the masks are not propagated to the outputs. Is there any way to apply the padding_mask when calculating the loss?
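A common approach is to mask the loss manually rather than relying on the attention masks to propagate. Below is a minimal sketch; the question does not name a framework, so this assumes TensorFlow (the `padding_mask`/`look_ahead_mask` naming matches the TensorFlow Transformer tutorial) and assumes token id 0 is the padding token — adjust `pad_id` to your tokenizer.

```python
import tensorflow as tf

pad_id = 0  # assumption: 0 is the padding token id

# Keep per-token losses (reduction="none") so padding positions can be zeroed out.
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="none")

def masked_loss(real, pred):
    """real: (batch, seq_len) target ids; pred: (batch, seq_len, vocab) logits."""
    # Per-token cross-entropy, shape (batch, seq_len).
    loss_ = loss_object(real, pred)
    # 1.0 for real tokens, 0.0 for padding positions.
    mask = tf.cast(tf.math.not_equal(real, pad_id), loss_.dtype)
    loss_ *= mask
    # Average only over the non-padding tokens.
    return tf.reduce_sum(loss_) / tf.reduce_sum(mask)
```

The same idea applies in PyTorch, where `torch.nn.CrossEntropyLoss(ignore_index=pad_id)` excludes padding positions from the loss directly.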
https://stackoverflow.com/questions/65386351/how-do-i-mask-output-in-transformer-model December 21, 2020 at 09:06AM