Commit d928b586 authored by Atreya Majumdar's avatar Atreya Majumdar
Browse files

Edited layer usage explanation

parent 04634cad
Loading
Loading
Loading
Loading
+4 −1
Original line number Diff line number Diff line
@@ -13,6 +13,9 @@ class ScaleNorm(nn.Module):
  The norm value is calculated as `sqrt(scale) / matrix norm`.
  Finally, the result is returned as `input_tensor * norm value`.

  This layer can be used instead of LayerNorm when a scaled version of the norm is required.
  Instead of performing the scaling operation (`scale / norm`) in a lambda-like layer, we are defining it within this layer to make prototyping more efficient.

  References
  ----------
  .. [1] Lukasz Maziarka et al. "Molecule Attention Transformer" Graph Representation Learning workshop and Machine Learning and the Physical Sciences workshop at NeurIPS 2019. 2020. https://arxiv.org/abs/2002.08264