Commit 0e29b0b8 authored by Atreya Majumdar

Update

parent 7b99647f
+4 −9
@@ -52,16 +52,14 @@ class ScaleNorm(nn.Module):


class MultiHeadedMATAttention(nn.Module):
  """Converts an existing attention layer to a multi-headed attention module.
  """First constructs an attention layer tailored to the Molecular Attention Transformer [1]_ and then converts it into Multi-Headed Attention.

  Multi-Headed attention the attention mechanism multiple times parallely through the multiple attention heads.
  In Multi-Headed Attention, the attention mechanism is applied multiple times in parallel through multiple attention heads.
  Thus, different subsequences of a given sequence can be processed differently.
  The query, key and value parameters are split multiple ways and each split is passed separately through a different attention head.

  References
  ----------
  .. [1] Lukasz Maziarka et al. "Molecule Attention Transformer" Graph Representation Learning workshop and Machine Learning and the Physical Sciences workshop at NeurIPS 2019. 2020. https://arxiv.org/abs/2002.08264

  Examples
  --------
  >>> import deepchem as dc
@@ -84,7 +82,6 @@ class MultiHeadedMATAttention(nn.Module):
               dropout_p: float,
               output_bias: bool = True):
    """Initialize a multi-headed attention layer.

    Parameters
    ----------
    dist_kernel: str
@@ -128,7 +125,6 @@ class MultiHeadedMATAttention(nn.Module):
                        eps: float = 1e-6,
                        inf: float = 1e12):
    """Defining and computing output for a single MAT attention layer.

    Parameters
    ----------
    query: torch.Tensor
@@ -185,7 +181,6 @@ class MultiHeadedMATAttention(nn.Module):
              inf: float = 1e12,
              **kwargs):
    """Output computation for the MultiHeadedAttention layer.

    Parameters
    ----------
    query: torch.Tensor
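
The class docstring in the first hunk above describes query, key and value being split across several attention heads, each split attended to separately. Below is a minimal, self-contained PyTorch sketch of that head-splitting pattern, not the DeepChem MultiHeadedMATAttention implementation: the class name ToyMultiHeadAttention is hypothetical, the parameter names hsize, h, dropout_p and inf merely mirror those visible in the signatures above, and the real MAT layer additionally mixes distance- and adjacency-based attention terms as described in [1]_.

# Illustrative sketch only -- not the DeepChem MAT attention layer.
import math
import torch
import torch.nn as nn


class ToyMultiHeadAttention(nn.Module):
    """Hypothetical example of multi-headed scaled dot-product attention."""

    def __init__(self, hsize: int, h: int, dropout_p: float = 0.0):
        super().__init__()
        assert hsize % h == 0, "model size must be divisible by the number of heads"
        self.d_k = hsize // h  # per-head feature size
        self.h = h
        # One linear projection each for query, key, value and the output.
        self.q_proj = nn.Linear(hsize, hsize)
        self.k_proj = nn.Linear(hsize, hsize)
        self.v_proj = nn.Linear(hsize, hsize)
        self.out_proj = nn.Linear(hsize, hsize)
        self.dropout = nn.Dropout(dropout_p)

    def forward(self, query, key, value, mask=None, inf: float = 1e12):
        # Assumes query, key and value share the same sequence length.
        batch, seq_len, _ = query.shape

        def split_heads(x):
            # (batch, seq_len, hsize) -> (batch, h, seq_len, d_k)
            return x.view(batch, seq_len, self.h, self.d_k).transpose(1, 2)

        q = split_heads(self.q_proj(query))
        k = split_heads(self.k_proj(key))
        v = split_heads(self.v_proj(value))

        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)
        if mask is not None:
            # Masked positions get a large negative score before the softmax,
            # mirroring the role of the `inf` constant in the signatures above.
            scores = scores.masked_fill(mask == 0, -inf)
        weights = self.dropout(torch.softmax(scores, dim=-1))
        out = torch.matmul(weights, v)  # (batch, h, seq_len, d_k)
        out = out.transpose(1, 2).reshape(batch, seq_len, -1)  # concatenate heads
        return self.out_proj(out)


# Quick shape check.
layer = ToyMultiHeadAttention(hsize=64, h=8, dropout_p=0.1)
x = torch.randn(2, 10, 64)
print(layer(x, x, x).shape)  # torch.Size([2, 10, 64])

Splitting hsize into h slices of size d_k keeps the overall cost of attention roughly constant while letting each head learn a different attention pattern over the sequence.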