Getting Started: Basics#
Initializing Environments#
Built-In Environments#
To initialize a built-in environment from the rddlrepository:
import pyRDDLGym
env = pyRDDLGym.make("CartPole_Continuous_gym", "0")
where “CartPole_Continuous_gym” is the name of the domain and “0” is the instance.
From RDDL Files#
To initialize an environment from RDDL description files stored on the file system:
import pyRDDLGym
env = pyRDDLGym.make("/path/to/domain.rddl", "/path/to/instance.rddl")
where both arguments must be valid file paths to domain and instance RDDL description files.
Note
make() returns an object of type RDDLEnv, which is also a gymnasium.Env, and can thus be used in
most workflows where gym or gymnasium environments are required.
Writing new domains and instances is as easy as writing a few lines of text in a mathematical fashion! The complete and up-to-date syntax of the RDDL language is described here.
Policies#
A policy interacts with an environment by providing actions or controls in each state.
Built-In Policies#
pyRDDLGym provides two simple policies, both of which are instances of pyRDDLGym.core.policy.BaseAgent:
NoOpAgent returns the default action values specified in the RDDL domain.
RandomAgent samples a random action according to the env.action_space and the max-nondef-actions.
For example, to initialize a random policy:
from pyRDDLGym.core.policy import RandomAgent
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)
All policies must implement a sample_action() function for sampling an action in each state:
action = agent.sample_action(state)
Note
Random policies respect only box constraints, due to limitations in Gym.
To handle arbitrary nonlinear constraints, implement a custom BaseAgent
with its own sample_action() function.
Custom Policies#
To implement your own custom policy, inherit from pyRDDLGym.core.policy.BaseAgent:
from pyRDDLGym.core.policy import BaseAgent
class CustomAgent(BaseAgent):

    def sample_action(self, state):
        # here goes the code that returns the current action
        ...
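As a concrete illustration of the template above, here is a minimal sketch of a policy that returns the same action in every state. The class name ConstantAgent and the fluent names "force" and "pos" are assumptions made for illustration only, and the fallback BaseAgent is a stand-in that exists solely so the sketch runs when pyRDDLGym is not installed.

```python
try:
    from pyRDDLGym.core.policy import BaseAgent
except ImportError:
    # Stand-in base class, used only when pyRDDLGym is not installed.
    class BaseAgent:
        def sample_action(self, state):
            raise NotImplementedError


class ConstantAgent(BaseAgent):
    """Hypothetical policy that ignores the state and repeats one action."""

    def __init__(self, action):
        # Copy the dict so later changes by the caller do not affect the policy.
        self.action = dict(action)

    def sample_action(self, state):
        # Return a fresh copy so the caller cannot mutate the stored action.
        return dict(self.action)


# "force" and "pos" are assumed fluent names, not taken from a real domain file.
agent = ConstantAgent({"force": 0.5})
action = agent.sample_action({"pos": 0.0})
print(action)
```

A policy written this way can be used wherever the built-in agents are used, since the only required method is sample_action().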
Interacting with an Environment#
Interaction with an environment is done by calling env.step(action)
and env.reset(), just like regular Gym/Gymnasium.
Reading and Passing Fluents#
All fluent values are passed and received as Python dict objects,
whose keys are valid fluent names as defined in the RDDL domain description.
The structure of the keys for parameterized fluents deserves attention, since the keys
need to specify not only the fluent name, but also the objects assigned to their parameters.
In pyRDDLGym, the fluent name must be followed by ___ (3 underscores), then the
list of objects separated by __ (2 underscores). To illustrate, for the fluent
put-out(?x, ?y), the required key for objects (x1, y1) is put-out___x1__y1.
Another option is to pass a dict whose keys are lifted fluent names, i.e. put-out, in which
case the values must be numpy arrays (of the necessary shape and dtype).
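The naming rule above can be sketched with two small helpers. The function names grounded_key and split_grounded_key are hypothetical and are not part of pyRDDLGym. The sketch also assumes that object names never contain a double underscore.

```python
def grounded_key(fluent, objects):
    """Build the dict key for a fluent grounded with the given objects."""
    if not objects:
        # Non-parameterized fluents are keyed by their plain name.
        return fluent
    # Name, then 3 underscores, then objects joined by 2 underscores.
    return fluent + "___" + "__".join(objects)


def split_grounded_key(key):
    """Recover the fluent name and the object list from a grounded key."""
    name, separator, arguments = key.partition("___")
    objects = arguments.split("__") if separator else []
    return name, objects


key = grounded_key("put-out", ["x1", "y1"])
print(key)
print(split_grounded_key(key))
```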
Note
When passing an action dictionary to a RDDLEnv,
any missing key-value pairs in the dictionary will be assigned the default (or no-op) values
as specified in the RDDL domain description.
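The effect of this rule can be pictured as a dictionary merge. The sketch below only illustrates the resulting semantics and is not the code pyRDDLGym runs internally. The helper fill_defaults and the default values are assumptions.

```python
def fill_defaults(action, defaults):
    """Return a complete action dict: defaults overridden by supplied values."""
    return {**defaults, **action}


# Assumed no-op values for a fluent put-out(?x, ?y) with two groundings.
defaults = {"put-out___x1__y1": False, "put-out___x1__y2": False}

# The caller specifies only the fluent it wants to change.
partial_action = {"put-out___x1__y2": True}

full_action = fill_defaults(partial_action, defaults)
print(full_action)
```

In practice this means a policy only has to return the fluents it wants to set to a non-default value.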
Interaction Loop#
We now show what a complete agent-environment loop looks like in pyRDDLGym.
The example below will run the CartPole_Continuous_gym environment for a single episode,
rendering the state to the screen in real time:
import pyRDDLGym
from pyRDDLGym.core.policy import RandomAgent
# set up the CartPole_Continuous_gym instance 0
env = pyRDDLGym.make("CartPole_Continuous_gym", "0")
# set up a random policy
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)
# perform a roll-out from the initial state
total_reward = 0
state, _ = env.reset()
for step in range(env.horizon):
    env.render()
    action = agent.sample_action(state)
    next_state, reward, terminated, truncated, _ = env.step(action)
    print(f'state = {state}, action = {action}, reward = {reward}')
    total_reward += reward
    state = next_state
    if terminated or truncated:
        break
print(f'episode ended with reward {total_reward}')
Alternatively, agent.evaluate() bypasses the need to write out the entire loop:
total_reward = agent.evaluate(env, episodes=1, render=True)['mean']
The agent.evaluate() call returns a dictionary of summary statistics about the
total rewards collected across episodes, such as mean, median, and standard deviation.
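To make the relationship between the explicit loop and the summary statistics concrete, the sketch below runs several episodes and aggregates the returns. It is not the library's implementation of evaluate(). CountdownEnv, NoOpPolicy, rollout_returns and summarize are hypothetical stand-ins that mimic the reset()/step() interface so the example runs without pyRDDLGym, and the use of the population standard deviation is an assumption.

```python
import statistics


class CountdownEnv:
    """Stand-in environment that pays reward 1.0 per step until the horizon."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self._t = 0

    def reset(self, seed=None):
        self._t = 0
        return {"t": self._t}, {}

    def step(self, action):
        self._t += 1
        truncated = self._t >= self.horizon
        return {"t": self._t}, 1.0, False, truncated, {}


class NoOpPolicy:
    """Stand-in policy that always returns an empty action dict."""

    def sample_action(self, state):
        return {}


def rollout_returns(env, agent, episodes):
    """Run the interaction loop `episodes` times and collect total rewards."""
    returns = []
    for _ in range(episodes):
        state, _ = env.reset()
        total = 0.0
        for _ in range(env.horizon):
            action = agent.sample_action(state)
            state, reward, terminated, truncated, _ = env.step(action)
            total += reward
            if terminated or truncated:
                break
        returns.append(total)
    return returns


def summarize(returns):
    """Aggregate episode returns into a dict of summary statistics."""
    return {
        "mean": statistics.fmean(returns),
        "median": statistics.median(returns),
        "std": statistics.pstdev(returns),
    }


stats = summarize(rollout_returns(CountdownEnv(horizon=5), NoOpPolicy(), episodes=3))
print(stats)
```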
Setting the Random Seed#
In order to get reproducible results, it is necessary to set the random seed.
This can be passed to env.reset() once at the start of the experiment:
env.reset(seed=42)
or alternatively to agent.evaluate():
agent.evaluate(env, seed=42)
Other objects that require randomness typically support setting random seeds.
For example, to set the seed of the RandomAgent instance:
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions, seed=42)
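The reason seeding gives reproducible results can be shown with a generic sketch that uses only Python's standard random module. It does not use pyRDDLGym's own random number generator, and sample_actions is a hypothetical helper.

```python
import random


def sample_actions(seed, count=3):
    """Draw `count` pseudo-random action values from a freshly seeded generator."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]


# The same seed always yields the same sequence of samples.
first_run = sample_actions(seed=42)
second_run = sample_actions(seed=42)
print(first_run == second_run)
```

Each source of randomness, such as the environment and the policy, typically keeps its own generator, which is why both may need a seed for a fully reproducible experiment.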
Handling Simulation Errors#
By default, evaluate() will not raise errors if action preconditions or state invariants are violated.
State invariant violations are stored in the truncated field returned by env.step().
If you wish to enforce action constraints, simply initialize your environment like this:
import pyRDDLGym
env = pyRDDLGym.make("CartPole_Continuous_gym", "0", enforce_action_constraints=True)
By default, evaluate() will not raise an exception if a numerical error occurs during an intermediate calculation,
such as divide by zero or under/overflow. If you wish to raise/catch all numerical errors, add the following
before calling evaluate():
import numpy as np
np.seterr(all='raise')
More details about controlling error handling behavior can be found here.
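As a small illustration of raising and catching numerical errors in NumPy, the sketch below uses np.errstate, the context-manager counterpart of np.seterr, so the stricter setting applies only inside the with block. The helper name strict_divide is hypothetical.

```python
import numpy as np


def strict_divide(numerator, denominator):
    """Divide two arrays, raising on any floating-point error."""
    with np.errstate(all='raise'):
        return numerator / denominator


try:
    strict_divide(np.array([1.0, 2.0]), np.array([0.0, 4.0]))
    outcome = "no error"
except FloatingPointError as error:
    # NumPy reports divide by zero, overflow, etc. as FloatingPointError.
    outcome = f"caught: {error}"

print(outcome)
```

Scoping the setting with a context manager avoids changing NumPy's global error handling for unrelated code.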
Warning
Branched error handling in operations such as if and switch
is incompatible with vectorized computation. To illustrate, an expression like
if (pvar(?x) == 0) then default(?x) else 1.0 / pvar(?x) will evaluate 1.0 / pvar(?x) first
for all values of ?x, regardless of the branch condition, and will thus trigger an exception if pvar(?x) == 0
for some value of ?x. For the time being, we recommend suppressing errors as described above.
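The same behavior can be reproduced in plain NumPy, which serves here as an analogy rather than as pyRDDLGym's internal code. In the sketch below, np.where receives both branches as fully evaluated arrays, so the division by zero happens even though that element is never selected. The array names are assumptions that mirror the expression above.

```python
import numpy as np

pvar = np.array([0.0, 2.0, 4.0])
default = np.array([-1.0, -1.0, -1.0])

# With the divide error suppressed, the unselected division result is discarded.
with np.errstate(divide='ignore'):
    result = np.where(pvar == 0, default, 1.0 / pvar)
print(result)

# With errors raised, the else branch fails before any selection takes place.
raised = False
try:
    with np.errstate(all='raise'):
        np.where(pvar == 0, default, 1.0 / pvar)
except FloatingPointError:
    raised = True
print(raised)
```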
Gym state_space and action_space#
The state and action spaces of a RDDLEnv are standard gymnasium.spaces and are
accessible via env.state_space and env.action_space, respectively.
In most cases, state and action spaces are gymnasium.spaces.Dict objects, whose key-value pairs
are fluent names and their current values.
To compute bounds on RDDL fluents, pyRDDLGym analyzes the
action-preconditions and state-invariants expressions.
For box constraints, the conversion happens as follows:
real -> Box(l, u) where (l, u) are the bounds on the fluent
int -> Discrete(l, u) where (l, u) are the bounds on the fluent
bool -> Discrete(2)
Note
Any constraints that cannot be rewritten as box constraints are ignored, due to limitations of Gymnasium.
If no valid box bounds for a fluent are available, they are set to (-np.inf, np.inf).
Visualizing Environments#
Built-In Visualizers#
Every domain has a default visualizer assigned to it, which is either a
ChartVisualizer that plots the state trajectory as a graph, or a domain-dependent implementation.
Assigning a visualizer to an environment can be done by calling
env.set_visualizer(viz) with viz as the desired visualization object (or a string identifier).
For example, to assign the ChartVisualizer or the HeatmapVisualizer,
which use line charts or heatmaps to track the state across time,
or the TextVisualizer, which produces a textual representation of the state:
env.set_visualizer("chart")
env.set_visualizer("heatmap")
env.set_visualizer("text")
Calling env.set_visualizer(viz=None, ...) will not change the visualizer already assigned: this is useful
if you want to record movies using the default viz as described later.
Custom Visualizers#
To assign a custom visualizer, define a class MyDomainViz that implements a valid render(state) method, and pass the class to env.set_visualizer():
from pyRDDLGym.core.visualizer.viz import BaseViz

class MyDomainViz(BaseViz):

    def render(self, state):
        # here goes the visualization implementation
        ...

env.set_visualizer(MyDomainViz)
Warning
The visualizer argument in set_visualizer should not contain the customary
() when initializing the visualizer object, since this is done internally.
So, instead of writing env.set_visualizer(MyDomainViz(**MyArgs)), write
env.set_visualizer(MyDomainViz, viz_kwargs=MyArgs).
All visualizers can be activated in an environment by calling env.render()
after each call to env.step() or env.reset(), just like regular Gym/Gymnasium.
Logging Information#
Recording Movies#
A MovieGenerator class is provided to capture videos of the environment interaction over time:
from pyRDDLGym.core.visualizer.movie import MovieGenerator
recorder = MovieGenerator("/folder/path/to/save/animation", "env_name", max_frames=999999)
env.set_visualizer(viz=None, movie_gen=recorder)
Upon calling env.close(), the images captured will be combined into video format and saved to the desired path.
Any temporary files created to capture individual frames during interaction will be deleted from disk.
Note
Videos will not be saved until the environment is closed with env.close(). However, frames will be recorded
to disk continuously while the environment interaction is taking place (to save RAM), which will be used to generate the video.
Therefore, it is important to not delete these images while the recording is taking place.
Logging Simulation Data#
A record of all past interactions with an environment can be logged to a machine readable CSV file for later analysis:
env = pyRDDLGym.make("CartPole_Continuous_gym", "0", log_path="/path/to/output.csv")
Upon interacting with the environment, pyRDDLGym appends the new observations to the log file at the
specified path. Logging continues until env.close() is called.
Debugging Logs#
To log information about the RDDL compilation to a file for debugging:
import pyRDDLGym
env = pyRDDLGym.make("CartPole_Continuous_gym", "0", debug_path="/path/to/log/file")
where debug_path is the full path to the debug file minus the extension.
A log file will be created in the specified path with the .log extension.
Currently, the following information is logged:
description of pvariables as they are stored in memory (e.g., parameters, data type, data shape)
dependency graph between CPFs
calculated order of evaluation of CPFs
information used by the simulator for operating on pvariables stored as arrays
simulation bounds for state and action fluents
if you are using pyRDDLGym-jax, the computation graphs will also be logged
if you are using pyRDDLGym-rl, the observation and action spaces information will also be logged