Getting Started: Basics#

Initializing Built-In Environments#

To initialize a built-in environment from the rddlrepository:

import pyRDDLGym
env = pyRDDLGym.make("CartPole_Continuous_gym", "0")

where “CartPole_Continuous_gym” is the name of the domain and “0” is the instance.

Initializing Environments from RDDL Files#

To initialize an environment from RDDL description files stored on the file system:

import pyRDDLGym
env = pyRDDLGym.make("/path/to/domain.rddl", "/path/to/instance.rddl")

where both arguments must be valid file paths to domain and instance RDDL description files.

Note

make() returns an object of type RDDLEnv, which is also a gymnasium.Env, and can thus be used in most workflows where gym or gymnasium environments are required.

Writing new domains and instances is as easy as writing a few lines of text in a mathematical fashion! The complete and up-to-date syntax of the RDDL language is described here.

Policies#

A policy interacts with an environment by providing actions or controls in each state. pyRDDLGym provides two simple policies, both of which are instances of pyRDDLGym.core.policy.BaseAgent:

  • NoOpAgent returns the default action values specified in the RDDL domain.

  • RandomAgent samples a random action according to env.action_space, respecting the maximum number of concurrent actions specified in the RDDL file.

For example, to initialize a random policy:

from pyRDDLGym.core.policy import RandomAgent
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)

All policies must implement a sample_action() function for sampling an action in each state:

action = agent.sample_action(state)

Note

Random policies respect only box constraints, due to limitations in Gym. To handle arbitrary nonlinear constraints, implement a custom BaseAgent with its own sample_action() function.

Jupyter Notebook Related example: Simulating an environment in pyRDDLGym with a built-in policy.

To implement your own custom policy, inherit from pyRDDLGym.core.policy.BaseAgent:

from pyRDDLGym.core.policy import BaseAgent

class CustomAgent(BaseAgent):

    def sample_action(self, state=None):
        # here goes the code that returns the current action
        ...

Jupyter Notebook Related example: Simulating an environment in pyRDDLGym with a custom policy.

Interacting with an Environment#

Interaction with an environment is done by calling env.step() and env.reset(), just like regular Gym/Gymnasium.

All fluent values are passed and received as Python dict objects, whose keys are valid fluent names as defined in the RDDL domain description.

The structure of the keys for parameterized fluents deserves attention, since the keys need to specify not only the fluent name, but also the objects assigned to their parameters. In pyRDDLGym, the fluent name must be followed by ___ (3 underscores), then the list of objects separated by __ (2 underscores). To illustrate, for the fluent put-out(?x, ?y), the required key for objects (x1, y1) is put-out___x1__y1.
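For example, a minimal sketch of an action dictionary for this fluent (assuming a boolean put-out action and objects x1 and y1 from a hypothetical instance) is:

# fluent name, then 3 underscores, then the objects joined by 2 underscores
action = {"put-out___x1__y1": True}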

Note

When passing an action dictionary to an RDDLEnv, any missing key-value pairs will be assigned the default (or no-op) values specified in the RDDL domain description.
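
As a minimal illustration of this behavior, stepping the environment with an empty dictionary leaves every action at its default value:

# every action falls back to its default (no-op) value from the RDDL domain
next_state, reward, terminated, truncated, _ = env.step({})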

We now show what a complete agent-environment loop looks like in pyRDDLGym. The example below will run the CartPole_Continuous_gym environment for a single episode/trial, rendering the state to the screen in real time:

import pyRDDLGym
from pyRDDLGym.core.policy import RandomAgent

# set up the CartPole_Continuous_gym environment, instance 0
env = pyRDDLGym.make("CartPole_Continuous_gym", "0")

# set up a random policy
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)

# perform a roll-out from the initial state
total_reward = 0
state, _ = env.reset()
for step in range(env.horizon):
    env.render()
    action = agent.sample_action(state)
    next_state, reward, terminated, truncated, _ = env.step(action)
    print(f'state = {state}, action = {action}, reward = {reward}')
    total_reward += reward
    state = next_state
    if terminated or truncated:
        break
print(f'episode ended with reward {total_reward}')

Alternatively, the evaluate() method bypasses the need to write out the entire loop:

total_reward = agent.evaluate(env, episodes=1, render=True)['mean']

The agent.evaluate() call returns a dictionary of summary statistics about the total rewards collected across episodes, such as mean, median, and standard deviation.
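
As a minimal sketch (the exact set of statistic keys beyond 'mean' may differ between versions):

stats = agent.evaluate(env, episodes=10)
print(stats['mean'])   # average total reward over the 10 episodes
print(stats)           # the full dictionary of summary statistics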

Fixing the Random Seed#

In order to get reproducible results when running an experiment, it is necessary to fix the random seed. This can be passed to env.reset() once at the start of the experiment:

env.reset(seed=42)

or alternatively by passing it to agent.evaluate() as follows:

agent.evaluate(env, seed=42)

Other objects that require randomness typically support setting the random seed. For example, to fix the seed of the RandomAgent instance:

agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions, seed=42)

Spaces#

The state and action spaces of a RDDLEnv are standard gymnasium.spaces and are accessible via env.state_space and env.action_space, respectively. In most cases, state and action spaces are gymnasium.spaces.Dict objects, whose key-value pairs are fluent names and their current values.

To compute bounds on RDDL fluents, pyRDDLGym analyzes the action-preconditions and state-invariants expressions. For box constraints, the conversion happens as follows:

  • real -> Box(l, u) where (l, u) are the bounds on the fluent

  • int -> Discrete(l, u) where (l, u) are the bounds on the fluent

  • bool -> Discrete(2)
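
For illustration, the resulting spaces can be inspected directly (the exact contents depend on the domain and instance):

print(env.state_space)    # typically a Dict of Box/Discrete spaces keyed by fluent name
print(env.action_space)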

Note

Any constraints that cannot be rewritten as box constraints are ignored, due to limitations of Gymnasium. If no valid box bounds for a fluent are available, they are set to (-np.inf, np.inf).

Using Built-In Visualizers#

Every domain has a default visualizer assigned to it, which is either a graphical ChartVisualizer that plots the state trajectory over time, or a custom domain-dependent implementation.

Assigning a visualizer for an environment can be done by calling env.set_visualizer(viz) with viz as the desired visualization object (or a string identifier).

For example, to assign the ChartVisualizer (line charts tracking the state over time), the HeatmapVisualizer (heatmaps tracking the state over time), or the TextVisualizer (a textual representation of the state):

env.set_visualizer("chart")
env.set_visualizer("heatmap")
env.set_visualizer("text")

Calling env.set_visualizer(viz=None, ...) will not change the visualizer already assigned: this is useful if you want to record movies using the default viz as described later.

Using a Custom Visualizer#

To assign a custom visualizer object MyDomainViz that implements a valid render(state) method,

from pyRDDLGym.core.visualizer.viz import BaseViz

class MyDomainViz(BaseViz):

    def render(self, state):
        # here goes the visualization implementation
        ...

env.set_visualizer(MyDomainViz)

Warning

The visualizer argument in set_visualizer should not contain the customary () when initializing the visualizer object, since this is done internally. So, instead of writing env.set_visualizer(MyDomainViz(**MyArgs)), write env.set_visualizer(MyDomainViz, viz_kwargs=MyArgs).

The assigned visualizer is activated by calling env.render() after each call to env.step() or env.reset(), just like in regular Gym/Gymnasium.

Recording Movies#

A MovieGenerator class is provided to capture videos of the environment interaction over time:

from pyRDDLGym.core.visualizer.movie import MovieGenerator
recorder = MovieGenerator("/folder/path/to/save/animation", "env_name", max_frames=999999)
env.set_visualizer(viz=None, movie_gen=recorder)

Upon calling env.close(), the images captured will be combined into video format and saved to the desired path. Any temporary files created to capture individual frames during interaction will be deleted from disk.
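
Putting the pieces together, a minimal recording session might look as follows (assuming env and agent are created as in the roll-out example above):

# interact with the environment while frames are captured to disk
agent.evaluate(env, episodes=1, render=True)

# combine the captured frames into a video and delete the temporary files
env.close()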

Note

Videos will not be saved until the environment is closed with env.close(). However, frames will be recorded to disk continuously while the environment interaction is taking place (to save RAM), which will be used to generate the video. Therefore, it is important to not delete these images while the recording is taking place.

Jupyter Notebook Related example: Recording a movie of a simulation in pyRDDLGym.

Logging Simulation Data#

A record of all past interactions with an environment can be logged to a machine readable CSV file for later analysis:

env = pyRDDLGym.make("CartPole_Continuous_gym", "0", log_path="/path/to/output.csv")

Upon interacting with the environment, pyRDDLGym appends the new observations to the log file at the specified path. Logging continues until env.close() is called.
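
As a minimal sketch (assuming pandas is installed; the exact column layout depends on the domain), the log can then be loaded for analysis after the environment is closed:

import pandas as pd

env.close()                               # finalize the log file
log = pd.read_csv("/path/to/output.csv")  # load the recorded interactions
print(log.head())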