Commit 26bf0d9c authored by Bharath Ramsundar
tfp

parent d695f5b3
+5 −0
@@ -122,6 +122,11 @@ class A2C(object):
  except specifying the new goal.  It should return that list of states, and the rewards that would
  have been received for taking the specified actions from those states.  The output arrays may be
  shorter than the input ones, if the modified rollout would have terminated sooner.


  Note
  ----
  Using this class on continuous action spaces requires that `tensorflow_probability` be installed.
  """

  def __init__(self,