Getting Started: Basics#
Initializing Built-In Environments#
To initialize a built-in environment from the rddlrepository:
import pyRDDLGym
env = pyRDDLGym.make("CartPole_Continuous_gym", "0")
where “CartPole_Continuous_gym” is the name of the domain and “0” is the instance.
Initializing Environments from RDDL Files#
To initialize an environment from RDDL description files stored on the file system:
import pyRDDLGym
env = pyRDDLGym.make("/path/to/domain.rddl", "/path/to/instance.rddl")
where both arguments must be valid file paths to domain and instance RDDL description files.
Note
make() returns an object of type RDDLEnv, which is also a gymnasium.Env, and can thus be used in
most workflows where gym or gymnasium environments are required.
Writing new domains and instances is as easy as writing a few lines of text in a mathematical fashion! The complete and up-to-date syntax of the RDDL language is described here.
Policies#
A policy interacts with an environment by providing actions or controls in each state.
pyRDDLGym provides two simple policies, both instances of pyRDDLGym.core.policy.BaseAgent:
NoOpAgent returns the default action values specified in the RDDL domain.
RandomAgent samples a random action according to the env.action_space and the maximum number of concurrent actions specified in the RDDL file.
For example, to initialize a random policy:
from pyRDDLGym.core.policy import RandomAgent
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)
All policies must implement a sample_action() function for sampling an action in each state:
action = agent.sample_action(state)
Note
Random policies respect only box constraints, due to limitations in Gym.
To handle arbitrary nonlinear constraints, implement a custom BaseAgent with its own sample_action() function.

To implement your own custom policy, inherit from pyRDDLGym.core.policy.BaseAgent:
from pyRDDLGym.core.policy import BaseAgent

class CustomAgent(BaseAgent):
    def sample_action(self, state=None):
        # here goes the code that returns the current action
        ...
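For instance, a minimal custom policy (a hypothetical sketch, not part of pyRDDLGym) can return an empty dictionary, so that every action fluent falls back to its default no-op value, as described in the next section:
from pyRDDLGym.core.policy import BaseAgent

class DoNothingAgent(BaseAgent):
    def sample_action(self, state=None):
        # an empty dict leaves all action fluents at their RDDL default (no-op) values
        return {}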

Interacting with an Environment#
Interaction with an environment is done by calling env.step() and env.reset(), just like regular Gym/Gymnasium.
All fluent values are passed and received as Python dict objects,
whose keys are valid fluent names as defined in the RDDL domain description.
The structure of the keys for parameterized fluents deserves attention, since the keys
need to specify not only the fluent name, but also the objects assigned to their parameters.
In pyRDDLGym, the fluent name must be followed by ___ (3 underscores), then the
list of objects separated by __ (2 underscores). To illustrate, for the fluent
put-out(?x, ?y), the required key for objects (x1, y1) is put-out___x1__y1.
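For example, an action dictionary that assigns True to this fluent for those objects (assuming put-out is a boolean action fluent) would be written as:
action = {'put-out___x1__y1': True}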
Note
When passing an action dictionary to an RDDLEnv,
any missing key-value pairs in the dictionary will be assigned the default (or no-op) values
as specified in the RDDL domain description.
We now show what a complete agent-environment loop looks like in pyRDDLGym.
The example below will run the CartPole_Continuous_gym
environment for a single episode/trial,
rendering the state to the screen in real time:
import pyRDDLGym
from pyRDDLGym.core.policy import RandomAgent
# set up the CartPole_Continuous_gym instance 0
env = pyRDDLGym.make("CartPole_Continuous_gym", "0")
# set up a random policy
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)
# perform a roll-out from the initial state
total_reward = 0
state, _ = env.reset()
for step in range(env.horizon):
    env.render()
    action = agent.sample_action(state)
    next_state, reward, terminated, truncated, _ = env.step(action)
    print(f'state = {state}, action = {action}, reward = {reward}')
    total_reward += reward
    state = next_state
    if terminated or truncated:
        break
print(f'episode ended with reward {total_reward}')
Alternatively, the evaluate() function bypasses the need to write out the entire loop:
total_reward = agent.evaluate(env, episodes=1, render=True)['mean']
The agent.evaluate()
call returns a dictionary of summary statistics about the
total rewards collected across episodes, such as mean, median, and standard deviation.
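For example, to average the total reward over several episodes (only the 'mean' key is shown here, as used in the snippet above):
stats = agent.evaluate(env, episodes=20)
print(stats['mean'])  # average total reward over the 20 episodes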
Fixing the Random Seed#
In order to get reproducible results when running an experiment, it is necessary to
fix the random seed. This can be passed to env.reset()
once at the start of the experiment:
env.reset(seed=42)
or, alternatively, by passing it to agent.evaluate() as follows:
agent.evaluate(env, seed=42)
Other objects that require randomness typically support setting the random seed.
For example, to fix the seed of the RandomAgent
instance:
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions, seed=42)
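Putting these together, a fully seeded roll-out (a sketch reusing the environment and random agent from the earlier example) looks like:
import pyRDDLGym
from pyRDDLGym.core.policy import RandomAgent

env = pyRDDLGym.make("CartPole_Continuous_gym", "0")
agent = RandomAgent(action_space=env.action_space,
                    num_actions=env.max_allowed_actions, seed=42)
agent.evaluate(env, episodes=1, seed=42)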
Spaces#
The state and action spaces of an RDDLEnv are standard gymnasium.spaces and are
accessible via env.state_space and env.action_space, respectively.
In most cases, state and action spaces are gymnasium.spaces.Dict
objects, whose keys are fluent names and whose values are the corresponding sub-spaces.
To compute bounds on RDDL fluents, pyRDDLGym analyzes the
action-preconditions and state-invariants expressions.
For box constraints, the conversion happens as follows:
real -> Box(l, u), where (l, u) are the bounds on the fluent
int -> Discrete(l, u), where (l, u) are the bounds on the fluent
bool -> Discrete(2)
Note
Any constraints that cannot be rewritten as box constraints are ignored, due to limitations of Gymnasium.
If no valid box bounds for a fluent are available, they are set to (-np.inf, np.inf).
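To inspect the spaces constructed for a concrete domain, they can simply be printed (a short sketch using the CartPole_Continuous_gym environment from earlier; the exact contents depend on the domain):
import pyRDDLGym

env = pyRDDLGym.make("CartPole_Continuous_gym", "0")
print(env.state_space)   # a gymnasium.spaces.Dict keyed by state fluent names
print(env.action_space)  # a gymnasium.spaces.Dict keyed by action fluent names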
Using Built-In Visualizers#
Every domain has a default visualizer assigned to it, which is either a graphical
ChartVisualizer that plots the state trajectory over time, or a custom domain-dependent implementation.
Assigning a visualizer for an environment can be done by calling
env.set_visualizer(viz) with viz as the desired visualization object (or a string identifier).
For example, to assign the ChartVisualizer or the HeatmapVisualizer,
which use line charts or heatmaps to track the state across time,
or the TextVisualizer, which produces a textual representation of the state:
env.set_visualizer("chart")
env.set_visualizer("heatmap")
env.set_visualizer("text")
Calling env.set_visualizer(viz=None, ...)
will not change the visualizer already assigned: this is useful
if you want to record movies using the default viz as described later.
Using a Custom Visualizer#
To assign a custom visualizer object MyDomainViz that implements a valid render(state) method:
from pyRDDLGym.core.visualizer.viz import BaseViz
class MyDomainViz(BaseViz):
    def render(self, state):
        # here goes the visualization implementation
        ...

env.set_visualizer(MyDomainViz)
Warning
The visualizer argument in set_visualizer should not contain the customary ()
when initializing the visualizer object, since this is done internally.
So, instead of writing env.set_visualizer(MyDomainViz(**MyArgs)), write
env.set_visualizer(MyDomainViz, viz_kwargs=MyArgs).
All visualizers are activated by calling env.render() after each call to env.step()
or env.reset(), just like regular Gym/Gymnasium.
Recording Movies#
A MovieGenerator class is provided to capture videos of the environment interaction over time:
from pyRDDLGym.core.visualizer.movie import MovieGenerator
recorder = MovieGenerator("/folder/path/to/save/animation", "env_name", max_frames=999999)
env.set_visualizer(viz=None, movie_gen=recorder)
Upon calling env.close(), the captured images will be combined into a video and saved to the desired path.
Any temporary files created to capture individual frames during interaction will then be deleted from disk.
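A minimal end-to-end recording sketch, reusing the evaluate() helper shown earlier (the output folder is a placeholder path):
import pyRDDLGym
from pyRDDLGym.core.policy import RandomAgent
from pyRDDLGym.core.visualizer.movie import MovieGenerator

env = pyRDDLGym.make("CartPole_Continuous_gym", "0")
recorder = MovieGenerator("/folder/path/to/save/animation", "cartpole", max_frames=999999)
env.set_visualizer(viz=None, movie_gen=recorder)
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)
agent.evaluate(env, episodes=1, render=True)
env.close()  # combines the recorded frames into a video and cleans up temporary files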
Note
Videos will not be saved until the environment is closed with env.close(). However, frames are recorded
to disk continuously while the environment interaction is taking place (to save RAM), and these frames are later used to generate the video.
Therefore, it is important not to delete these images while the recording is taking place.

Logging Simulation Data#
A record of all past interactions with an environment can be logged to a machine-readable CSV file for later analysis:
env = pyRDDLGym.make("CartPole_Continuous_gym", "0", log_path="/path/to/output.csv")
Upon interacting with the environment, pyRDDLGym appends the new observations to the log file at the
specified path. Logging continues until env.close()
is called.
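Since the log is a plain CSV file, it can be inspected with any CSV tooling; for example (a sketch using pandas, which is an assumption and not a pyRDDLGym requirement):
import pandas as pd  # assumption: any CSV reader works; pandas is shown for illustration

log = pd.read_csv("/path/to/output.csv")
print(log.head())  # the first few logged transitions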