Edited layer usage explanation (d928b586) · Commits · 钟慕尧 / deepchem

deepchem/models/torch_models/layers.py

+4 −1

		@@ -13,6 +13,9 @@ class ScaleNorm(nn.Module):
		The norm value is calculated as `sqrt(scale) / matrix norm`.
		Finally, the result is returned as `input_tensor * norm value`.

		This layer can be used instead of LayerNorm when a scaled version of the norm is required.
		The scaling operation (`sqrt(scale) / norm`) is defined within this layer, rather than in a separate lambda-like layer, to make prototyping more efficient.

		References
		----------
		.. [1] Lukasz Maziarka et al. "Molecule Attention Transformer" Graph Representation Learning workshop and Machine Learning and the Physical Sciences workshop at NeurIPS 2019. 2020. https://arxiv.org/abs/2002.08264
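
The arithmetic described in the ScaleNorm docstring above can be illustrated with a small sketch. This is not the layer's actual implementation: it is plain Python operating on a single vector, the function name `scale_norm` is hypothetical, the norm is assumed to be the L2 norm, and the `eps` clamp is an assumed safeguard against division by zero that the docstring does not mention.

```python
import math


def scale_norm(vector, scale, eps=1e-5):
    """Plain-Python sketch of the ScaleNorm arithmetic for one vector.

    Hypothetical helper for illustration only; the real layer works on tensors.
    """
    # L2 norm of the input (assumption: "matrix norm" means the L2 norm here).
    # The eps clamp is an assumed guard against dividing by zero.
    l2 = max(math.sqrt(sum(v * v for v in vector)), eps)
    # Norm value as described in the docstring: sqrt(scale) / norm.
    norm_value = math.sqrt(scale) / l2
    # Result: input * norm value.
    return [v * norm_value for v in vector]


out = scale_norm([3.0, 4.0], scale=4.0)
out_norm = math.sqrt(sum(v * v for v in out))
print(out, out_norm)
```

Under these assumptions the output always has an L2 norm of `sqrt(scale)`, regardless of the input's magnitude. That is the sense in which the layer produces a scaled version of the norm: here the input has norm 5 and the output has norm 2 (up to floating-point rounding).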