Checking Constraints in pyRDDLGym
In this notebook, we show how to check constraints on states or actions during simulation in pyRDDLGym.
First, install the required packages:
pip install --quiet --upgrade pip pyRDDLGym rddlrepository
Note: you may need to restart the kernel to use updated packages.
Import the required packages:
import warnings
warnings.filterwarnings('ignore')
import pyRDDLGym
from pyRDDLGym.core.policy import RandomAgent
Let’s simulate instance 0 of the Elevators domain with enforce_action_constraints=True, so that action preconditions are checked at every step:
env = pyRDDLGym.make('Elevators', '0', enforce_action_constraints=True)
agent = RandomAgent(env.action_space, num_actions=env.max_allowed_actions)
agent.evaluate(env, episodes=1, verbose=False, render=False)
---------------------------------------------------------------------------
RDDLActionPreconditionNotSatisfiedError Traceback (most recent call last)
Cell In[2], line 3
1 env = pyRDDLGym.make('Elevators', '0', enforce_action_constraints=True)
2 agent = RandomAgent(env.action_space, num_actions=env.max_allowed_actions)
----> 3 agent.evaluate(env, episodes=1, verbose=False, render=False)
File c:\Python\envs\rddl2\Lib\site-packages\pyRDDLGym\core\policy.py:93, in BaseAgent.evaluate(self, env, episodes, verbose, render, seed)
91 # take a step in the environment
92 action = self.sample_action(state)
---> 93 next_state, reward, terminated, truncated, _ = env.step(action)
94 total_reward += reward * cuml_gamma
95 cuml_gamma *= gamma
File c:\Python\envs\rddl2\Lib\site-packages\pyRDDLGym\core\env.py:231, in RDDLEnv.step(self, actions)
229 sampler.check_default_action_count(sim_actions, self.enforce_count_non_bool)
230 if self.enforce_action_constraints:
--> 231 sampler.check_action_preconditions(sim_actions, silent=False)
233 # sample next state and reward
234 obs, reward, terminated = sampler.step(sim_actions)
File c:\Python\envs\rddl2\Lib\site-packages\pyRDDLGym\core\simulator.py:382, in RDDLSimulator.check_action_preconditions(self, actions, silent)
380 if not bool(sample):
381 if not silent:
--> 382 raise RDDLActionPreconditionNotSatisfiedError(
383 f'{loc} is not satisfied for actions {actions}.\n' +
384 print_stack_trace(precond))
385 return False
386 return True
RDDLActionPreconditionNotSatisfiedError: Precondition 0 is not satisfied for actions {'move-current-dir': array([False, False]), 'open-door': array([ True, False]), 'close-door': array([ True, False])}.
>> ( forall_{?e: elevator} [ ( ( open-door(?e) + close-door(?e) ) + move-current-dir(?e) ) <= 1 ] )
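The precondition in the error message requires that each elevator have at most one of its three boolean actions set. As a minimal sketch of that check in plain Python (independent of pyRDDLGym; the helper name satisfies_precondition is hypothetical and not part of the library), the expression can be evaluated on the action reported above:

```python
# Hypothetical helper, not a pyRDDLGym function. It mirrors the precondition
# printed in the error message:
#   forall ?e: open-door(?e) + close-door(?e) + move-current-dir(?e) <= 1
def satisfies_precondition(action):
    per_elevator = zip(action['open-door'],
                       action['close-door'],
                       action['move-current-dir'])
    # Booleans sum as 0/1, so this counts the flags set for each elevator.
    return all(sum(flags) <= 1 for flags in per_elevator)

# The action from the error message, written as plain lists (two elevators).
bad_action = {
    'move-current-dir': [False, False],
    'open-door': [True, False],
    'close-door': [True, False],
}
print(satisfies_precondition(bad_action))  # False
```

The first elevator has both open-door and close-door set, so its sum is 2 and the precondition fails.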
The simulation stops with an RDDLActionPreconditionNotSatisfiedError, because the built-in random policy does not take action preconditions into account when it samples actions. Rejection sampling is a simple way to enforce the constraints, at the expense of some extra computation: keep sampling until an action passes the precondition check.
def rejection_sample_action(state):
    # resample until the simulator accepts the action preconditions
    action = agent.sample_action(state)
    sim_action = env.sampler.prepare_actions_for_sim(action)
    while not env.sampler.check_action_preconditions(sim_action, silent=True):
        action = agent.sample_action(state)
        sim_action = env.sampler.prepare_actions_for_sim(action)
    return action
Let’s use rejection_sample_action to sample actions that satisfy the preconditions:
state, _ = env.reset()
agent.reset()
total_reward = 0.0
for _ in range(env.horizon):
    action = rejection_sample_action(state)
    state, reward, term, trunc, _ = env.step(action)
    total_reward += reward
    if term or trunc:
        break
print(f'Total reward: {total_reward}')
Total reward: -1006.5
As you can see, sampling now proceeds without violating the action preconditions.
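One caveat with the loop in rejection_sample_action is that it retries indefinitely, so it would never return if the policy could not produce a valid action. A minimal sketch of a capped variant follows. It is plain Python and does not call pyRDDLGym. The names rejection_sample, sample_fn, is_valid_fn and max_tries are hypothetical, and the toy sampler assumes independent fair coin flips, which is not necessarily how RandomAgent draws actions.

```python
import random

def rejection_sample(sample_fn, is_valid_fn, max_tries=1000):
    """Draw from sample_fn until is_valid_fn accepts, or give up after max_tries."""
    for _ in range(max_tries):
        candidate = sample_fn()
        if is_valid_fn(candidate):
            return candidate
    raise RuntimeError(f'no valid sample found in {max_tries} tries')

# Toy stand-in for the Elevators action space (an assumption for illustration):
# three boolean flags (open-door, close-door, move-current-dir) per elevator.
rng = random.Random(0)

def sample_flags():
    return [[rng.random() < 0.5 for _ in range(3)] for _ in range(2)]

def at_most_one_flag(flags):
    # Same shape as the Elevators precondition: at most one flag per elevator.
    return all(sum(elevator) <= 1 for elevator in flags)

action = rejection_sample(sample_flags, at_most_one_flag)
print(action)
```

In the notebook, sample_fn would wrap agent.sample_action(state) and is_valid_fn would wrap the prepare_actions_for_sim and check_action_preconditions calls used in rejection_sample_action.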