Commit bc4ca37d authored by jameskrw

minor

parent df4fa0d4
+2 −2
@@ -19,8 +19,8 @@ class FrozenLakeEnvConfig(BaseEnvConfig):
    
    # configs for process reward for grounding and world modeling
    use_state_reward: bool = False
-    grounding_reward_weight: float = 2.0
-    worldmodeling_reward_weight: float = 2.0
+    grounding_reward_weight: float = 0.5
+    worldmodeling_reward_weight: float = 0.5
    
    def config_id(self) -> str:
        id_fields=["is_slippery", "size", "p", "render_mode", "max_actions_per_step", "min_actions_to_succeed","format_reward"]
+2 −2
@@ -18,8 +18,8 @@ class PrimitiveSkillEnvConfig(BaseEnvConfig):
    
    # configs for process reward for grounding and world modeling
    use_state_reward: bool = False
-    grounding_reward_weight: float = 2.0
-    worldmodeling_reward_weight: float = 2.0
+    grounding_reward_weight: float = 0.5
+    worldmodeling_reward_weight: float = 0.5
    
    
    def config_id(self) -> str:
+2 −2
@@ -18,8 +18,8 @@ class SokobanEnvConfig(BaseEnvConfig):
    
    # configs for process reward for grounding and world modeling
    use_state_reward: bool = False
-    grounding_reward_weight: float = 2.0
-    worldmodeling_reward_weight: float = 2.0
+    grounding_reward_weight: float = 0.5
+    worldmodeling_reward_weight: float = 0.5
    
    def config_id(self) -> str:
        id_fields = ["dim_room", "max_steps", "num_boxes", "render_mode", "min_actions_to_succeed", "max_actions_per_step"]
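
Note on the weight change above: the diff only shows the config fields, not how they are consumed. The sketch below is one plausible reading, assuming each weight scales a binary pass/fail signal and that `use_state_reward` gates the whole term. `StateRewardConfig` and `process_reward` are hypothetical names for illustration, not code from this repository.

```python
from dataclasses import dataclass


@dataclass
class StateRewardConfig:
    # Field names and new defaults are copied from the diff;
    # the class itself is a hypothetical stand-in for the env configs.
    use_state_reward: bool = False
    grounding_reward_weight: float = 0.5
    worldmodeling_reward_weight: float = 0.5


def process_reward(cfg: StateRewardConfig,
                   grounding_ok: bool,
                   worldmodeling_ok: bool) -> float:
    """Assumed combination: a weighted sum of two binary judge verdicts."""
    if not cfg.use_state_reward:
        # Process rewards are disabled by default in the diff.
        return 0.0
    return (cfg.grounding_reward_weight * float(grounding_ok)
            + cfg.worldmodeling_reward_weight * float(worldmodeling_ok))


cfg = StateRewardConfig(use_state_reward=True)
both_correct = process_reward(cfg, True, True)
```

Under this assumption, the maximum process reward per step drops from 4.0 (old weights of 2.0 each) to 1.0 (new weights of 0.5 each).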
+16 −10
@@ -29,10 +29,13 @@ prompt_templates:
      Compare the description of the current state with the groundtruth current state information.
      Answer YES if the description matches the current state information, or NO if it doesn't.

-      Groundtruth Current State Information:
+      # Context
+      You are evaluating whether an agent's description accurately reflects the actual state. The description must be both correct overall AND specifically relevant to the important elements of the current state. Generic observations (like "player, box and target is on the ground") that don't capture the meaningful relationships and positions in the state are insufficient. The description should demonstrate understanding of the specific configuration and relationships that matter for decision-making.
+
+      # Groundtruth Current State Information:
      {state_information_dict}

-      State Description:
+      # State Description:
      {natural_language_description}

      Think step by step and end with your answer.
@@ -41,10 +44,13 @@ prompt_templates:
      Compare the prediction of the next state with the groundtruth next state information.
      Answer YES if the prediction accurately matches the next state information, or NO if it doesn't.

-      Groundtruth Next State Information:
+      # Context
+      You are evaluating whether an agent's prediction of the next state is accurate. The prediction must be both correct overall AND specifically relevant to the important elements of the next state. Generic predictions that don't capture the meaningful changes, relationships, and positions in the state are insufficient. The prediction should demonstrate understanding of the specific configuration and relationships that will result from the action, showing how the state will transform in ways that matter for decision-making.
+
+      # Groundtruth Next State Information:
      {state_information_dict}

-      Next State Prediction:
+      # Next State Prediction:
      {natural_language_description}

      Think step by step and end with your answer.
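
Note on the judge prompts above: both templates take `{state_information_dict}` and `{natural_language_description}` and ask for a final YES or NO. The sketch below shows one way such a template could be filled and the verdict read back. It assumes plain `str.format` substitution and a "last uppercase YES/NO wins" parsing heuristic. `JUDGE_TEMPLATE` is abbreviated, and `build_judge_prompt` and `parse_verdict` are hypothetical helpers, not functions from this repository.

```python
import re
from typing import Optional

# Abbreviated version of the grounding template; the placeholder names
# match the diff, the surrounding text is shortened for illustration.
JUDGE_TEMPLATE = (
    "# Groundtruth Current State Information:\n"
    "{state_information_dict}\n\n"
    "# State Description:\n"
    "{natural_language_description}\n\n"
    "Think step by step and end with your answer.\n"
)


def build_judge_prompt(state: dict, description: str) -> str:
    """Fill the two placeholders; the dict is rendered with str()."""
    return JUDGE_TEMPLATE.format(
        state_information_dict=state,
        natural_language_description=description,
    )


def parse_verdict(judge_output: str) -> Optional[bool]:
    """Heuristic: take the last standalone uppercase YES or NO.

    Returns None when the judge never commits to an answer.
    """
    matches = re.findall(r"\b(YES|NO)\b", judge_output)
    if not matches:
        return None
    return matches[-1] == "YES"


prompt = build_judge_prompt({"player": (1, 2)}, "The player is at row 1, column 2.")
verdict = parse_verdict("The positions match the groundtruth. YES")
```

Matching only uppercase tokens avoids counting an ordinary "no" inside the step-by-step reasoning, though a judge that answers in lowercase would be read as undecided under this heuristic.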