Checking Constraints in pyRDDLGym

In this notebook, we show how to check constraints on states or actions during simulation in pyRDDLGym.

First, install the required packages:

pip install --quiet --upgrade pip pyRDDLGym rddlrepository

Note: you may need to restart the kernel to use updated packages.

Import the required packages:

import warnings
warnings.filterwarnings('ignore')

import pyRDDLGym
from pyRDDLGym.core.policy import RandomAgent

Let’s simulate the elevators domain, with constraint checking on actions:

env = pyRDDLGym.make('Elevators', '0', enforce_action_constraints=True)
agent = RandomAgent(env.action_space, num_actions=env.max_allowed_actions)
agent.evaluate(env, episodes=1, verbose=False, render=False)

---------------------------------------------------------------------------
RDDLActionPreconditionNotSatisfiedError   Traceback (most recent call last)
Cell In[2], line 3
      1 env = pyRDDLGym.make('Elevators', '0', enforce_action_constraints=True)
      2 agent = RandomAgent(env.action_space, num_actions=env.max_allowed_actions)
----> 3 agent.evaluate(env, episodes=1, verbose=False, render=False)

File c:\Python\envs\rddl2\Lib\site-packages\pyRDDLGym\core\policy.py:93, in BaseAgent.evaluate(self, env, episodes, verbose, render, seed)
     91 # take a step in the environment
     92 action = self.sample_action(state)   
---> 93 next_state, reward, terminated, truncated, _ = env.step(action)
     94 total_reward += reward * cuml_gamma
     95 cuml_gamma *= gamma

File c:\Python\envs\rddl2\Lib\site-packages\pyRDDLGym\core\env.py:231, in RDDLEnv.step(self, actions)
    229 sampler.check_default_action_count(sim_actions, self.enforce_count_non_bool)
    230 if self.enforce_action_constraints:
--> 231     sampler.check_action_preconditions(sim_actions, silent=False)
    233 # sample next state and reward
    234 obs, reward, terminated = sampler.step(sim_actions)

File c:\Python\envs\rddl2\Lib\site-packages\pyRDDLGym\core\simulator.py:382, in RDDLSimulator.check_action_preconditions(self, actions, silent)
    380     if not bool(sample):
    381         if not silent:
--> 382             raise RDDLActionPreconditionNotSatisfiedError(
    383                 f'{loc} is not satisfied for actions {actions}.\n' + 
    384                 print_stack_trace(precond))
    385         return False
    386 return True

RDDLActionPreconditionNotSatisfiedError: Precondition 0 is not satisfied for actions {'move-current-dir': array([False, False]), 'open-door': array([ True, False]), 'close-door': array([ True, False])}.
>> ( forall_{?e: elevator} [ ( ( open-door(?e) + close-door(?e) ) + move-current-dir(?e) ) <= 1 ] )
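The failing precondition in the traceback above says that, for every elevator, at most one of open-door, close-door and move-current-dir may be set in a single step. The following is a minimal sketch in plain Python, independent of pyRDDLGym, that re-evaluates that condition on the action from the error message. The helper name `precondition_holds` is hypothetical and is not part of the library; the arrays are written as plain lists for illustration.

```python
# Action reported in the error message, one entry per elevator.
action = {
    'move-current-dir': [False, False],
    'open-door': [True, False],
    'close-door': [True, False],
}

def precondition_holds(action):
    """Return True if every elevator sets at most one of the three flags.

    Hypothetical helper mirroring the RDDL precondition:
    forall ?e: open-door(?e) + close-door(?e) + move-current-dir(?e) <= 1
    """
    num_elevators = len(action['open-door'])
    return all(
        action['open-door'][e]
        + action['close-door'][e]
        + action['move-current-dir'][e] <= 1
        for e in range(num_elevators)
    )

print(precondition_holds(action))  # False: the first elevator sets two flags
```

The first elevator sets both open-door and close-door, so its sum is 2 and the check fails, which matches the error raised by the simulator.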

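As the traceback shows, the simulation stops with an RDDLActionPreconditionNotSatisfiedError, because the built-in random policy cannot account for arbitrary action preconditions. Here the sampled action sets both open-door and close-door for the first elevator, which violates the precondition that each elevator performs at most one of open-door, close-door and move-current-dir per step. Rejection sampling is a simple way to enforce such constraints, at the expense of some extra computation: resample until the preconditions are satisfied.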

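def rejection_sample_action(state):
    """Resample from the random agent until the action preconditions hold."""
    action = agent.sample_action(state)
    # convert the action to the form the simulator expects before checking it
    sim_action = env.sampler.prepare_actions_for_sim(action)
    # silent=True returns False on a violation instead of raising an error
    while not env.sampler.check_action_preconditions(sim_action, silent=True):
        action = agent.sample_action(state)
        sim_action = env.sampler.prepare_actions_for_sim(action)
    return action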

Let’s use this to sample actions that satisfy constraints:

state, _ = env.reset()
agent.reset()
total_reward = 0.0
for _ in range(env.horizon):
    action = rejection_sample_action(state)
    state, reward, term, trunc, _ = env.step(action)
    total_reward += reward
    if term or trunc:
        break
print(f'Total reward: {total_reward}')

Total reward: -1006.5

As you can see, sampling now proceeds without violating the action preconditions.
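The rejection loop above retries without limit, which is fine when valid actions are common but can stall if they are rare. The following is a hedged sketch of the same idea with a cap on the number of attempts, written against a toy stand-in for the action space rather than pyRDDLGym itself. The names `rejection_sample`, `sample_flags`, `at_most_one_flag` and the `max_attempts` parameter are all assumptions made for illustration.

```python
import random

def rejection_sample(sample_fn, is_valid, max_attempts=1000):
    """Draw from sample_fn until is_valid accepts, up to max_attempts tries."""
    for _ in range(max_attempts):
        candidate = sample_fn()
        if is_valid(candidate):
            return candidate
    raise RuntimeError(f'no valid sample found in {max_attempts} attempts')

# Toy stand-in for the Elevators actions: three boolean flags per elevator.
rng = random.Random(0)

def sample_flags(num_elevators=2):
    """Sample each flag independently, ignoring any constraint."""
    return [[rng.random() < 0.5 for _ in range(3)]
            for _ in range(num_elevators)]

def at_most_one_flag(flags):
    """Constraint: each elevator sets at most one of its three flags."""
    return all(sum(row) <= 1 for row in flags)

action = rejection_sample(sample_flags, at_most_one_flag)
print(action)
```

In the notebook's setting, `sample_fn` would correspond to sampling from the agent and `is_valid` to the silent precondition check. Raising an error after a fixed number of attempts is one possible design choice; falling back to a known-valid default action is another.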