Getting Started: Advanced Tools

Inspecting the Model

The pyRDDLGym compiler provides a convenient API for querying a variety of properties of RDDL constructs in a domain. These can be accessed through the model field of an RDDLEnv:

import pyRDDLGym
env = pyRDDLGym.make("Cartpole_Continuous_gym", "0")
model = env.model

Below are some commonly used fields of model that can be accessed directly.

Commonly-used properties accessible in model:

  syntax                 description
  ---------------------  ----------------------------------------------------------------
  horizon                horizon as defined in the instance
  discount               discount factor as defined in the instance
  max_allowed_actions    max-nondef-actions as defined in the instance
  variable_types         dict of pvariable types (e.g. non-fluent, …) for each variable
  variable_ranges        dict of pvariable ranges (e.g. real, …) for each variable
  variable_params        dict of parameters and their types for each variable
  type_to_objects        dict of all defined objects for each type
  non_fluents            dict of initial values for each non-fluent
  state_fluents          dict of initial values for each state-fluent
  action_fluents         dict of default values for each action-fluent
  interm_fluents         dict of initial values for each interm-fluent
  observ_fluents         dict of initial values for each observ-fluent
  cpfs                   dict of Expression objects for each cpf
  reward                 Expression object for the reward function
  preconditions          list of Expression objects for each action-precondition
  invariants             list of Expression objects for each state-invariant
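
As a quick illustration, the following sketch prints a few of these fields for the environment loaded above (the values shown in the comments are only indicative; the actual output depends on the domain and instance):

# sketch: query some commonly-used model properties
print(model.horizon)           # horizon from the instance, e.g. 200
print(model.discount)          # discount factor, e.g. 1.0
print(model.variable_ranges)   # e.g. {'pos': 'real', ...}
print(model.type_to_objects)   # e.g. {'some-type': ['x1', 'x2'], ...}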

Expression objects are abstract syntax trees that describe the flow of computations in each cpf, constraint relation, or the reward function of the RDDL domain:

  • the etype() function provides basic information about the expression, such as its type

  • the args() function provides its sub-expressions, which consist of other Expression objects, aggregation variables, or other information required by the engine.
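
For example, the following is a minimal sketch of a recursive traversal that prints the tree of the reward expression. It assumes only the etype() and args() behavior described above, and duck-types on etype() since args() may also yield non-Expression items:

def print_expression(expr, depth=0):
    # print the expression type at the current indentation level
    print('    ' * depth + str(expr.etype()))
    args = expr.args()
    # args() may return non-iterable data for leaf expressions
    if isinstance(args, (list, tuple)):
        for arg in args:
            if hasattr(arg, 'etype'):  # recurse only into sub-expressions
                print_expression(arg, depth + 1)

print_expression(model.reward)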

Grounding a Domain

By default, pyRDDLGym works directly from the (lifted) domain description. Parameterized variables (p-variables) are represented internally as numpy arrays, whose values are propagated in a vectorized manner through mathematical expressions.

However, it is sometimes necessary to work with the grounded representation. For example, given a p-variable some-var(?x, ?y) of two parameters ?x and ?y, and the expression cpf'(?x, ?y) = some-var(?x, ?y) + 1.0;, the grounded representation is as follows:

cpf___x1__y1' = some-var___x1__y1 + 1.0;
cpf___x1__y2' = some-var___x1__y2 + 1.0;
cpf___x2__y1' = some-var___x2__y1 + 1.0;
cpf___x2__y2' = some-var___x2__y2 + 1.0;
...

where x1, x2... are the values of ?x and y1, y2... are the values of ?y. In other words, all p-variables are replaced by sets of non-parameterized variables (one per valid combination of objects), and all expressions are replaced by sets of expressions whose p-variable dependencies are replaced by their non-parameterized counterparts. In all cases, the grounded and lifted representations should produce the same numerical results, albeit in a slightly different format.

pyRDDLGym provides a convenient class for producing a grounded model from a lifted domain representation, as shown below:

from pyRDDLGym.core.grounder import RDDLGrounder
grounded = RDDLGrounder(env.model._AST).ground()

The grounded object returned is also an environment model, so the properties discussed in the table at the top of the page work interchangeably with grounded and lifted models.
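
For example, for a domain with parameterized variables, comparing the CPF names before and after grounding illustrates the naming convention above (a sketch; the exact names depend on the domain and instance):

print(list(model.cpfs))      # lifted CPF names, e.g. ["cpf'"]
print(list(grounded.cpfs))   # grounded CPF names, e.g. ["cpf___x1__y1'", ...]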

Vectorized Input and Output

Some algorithms require a vectorized representation of states and/or actions. The RDDLEnv class provides a vectorized option to work directly with the tensor representations of state and action fluents.

For example, a bool action fluent put-out(?x, ?y) taking two parameters ?x and ?y, with 3 objects each, would be provided as a boolean-valued 3-by-3 matrix, and state fluents are returned in a similar format.

This option can be enabled as follows:

import pyRDDLGym
env = pyRDDLGym.make("Cartpole_Continuous_gym", "0", vectorized=True)

With this option enabled, the bounds of the observation_space and action_space of the environment are instances of gymnasium.spaces.Box with the correct shape and dtype.
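
A minimal interaction loop under this setting could look as follows (a sketch using the standard gymnasium API; each state and action is a dict of numpy arrays keyed by fluent name):

state, info = env.reset()
for _ in range(env.model.horizon):
    action = env.action_space.sample()   # arrays with the shapes described above
    state, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break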

Exception Handling

By default, env.step() will not raise errors if action preconditions or state invariants are violated; state invariant violations are instead reported through the truncated field returned by env.step(). If you wish to enforce action constraints, simply initialize your environment like this:

import pyRDDLGym
env = pyRDDLGym.make("Cartpole_Continuous_gym", "0", enforce_action_constraints=True)

By default, env.step() will also not raise an exception if a numerical error occurs during an intermediate calculation, such as division by zero or under/overflow. This behavior can be controlled through numpy.

For example, if you wish to raise/catch all numerical errors, you can add the following lines before calling env.step():

import numpy as np
np.seterr(all='raise')

More details about controlling error handling behavior can be found in the numpy documentation for numpy.seterr.
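
Alternatively, numpy's errstate context manager (a standard numpy feature) limits this behavior to a single block of code, leaving the global settings untouched:

import numpy as np

# raise numerical errors only while stepping the environment
with np.errstate(all='raise'):
    state, reward, terminated, truncated, info = env.step(action)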

Warning

Currently, branched error handling in operations such as if and switch is incompatible with vectorized computation. To illustrate, an expression like if (pvar(?x) == 0) then default(?x) else 1.0 / pvar(?x) will evaluate 1.0 / pvar(?x) first for all values of ?x, regardless of the branch condition, and will thus trigger an exception if pvar(?x) == 0 for some value of ?x. For the time being, we recommend suppressing errors as described above.

Generating Debug Logs

To log information about the RDDL compilation to a file for debugging, error reporting or diagnosis:

import pyRDDLGym
env = pyRDDLGym.make("Cartpole_Continuous_gym", "0", debug_path="\path\to\log\file")

where debug_path is the full path to the debug file minus the extension. A log file will be created in the specified path with the .log extension.

Currently, the following information is logged:

  • description of pvariables as they are stored in memory (e.g., parameters, data type, data shape)

  • dependency graph between CPFs

  • calculated order of evaluation of CPFs

  • information used by the simulator for operating on pvariables stored as arrays

  • simulation bounds for state and action fluents (unbounded or non-box constraints are represented as [-inf, inf])

  • if you are using pyRDDLGym-jax, the computation graphs will also be logged

  • if you are using pyRDDLGym-rl, the observation and action spaces information will also be logged

Running pyRDDLGym through TCP

Some older algorithms and infrastructure built around the Java rddlsim require a TCP connection to a server that provides the environment interaction. pyRDDLGym provides an RDDLSimServer class that functions in a similar way.

To create and run a server built around a specific domain or instance:

from pyRDDLGym.core.server import RDDLSimServer
server = RDDLSimServer("/path/to/domain.rddl", "/path/to/instance.rddl", rounds, time, port=2323)
server.run()

The rounds argument specifies the number of episodes/rounds of simulation to perform, and time specifies how long the server connection should remain open. The optional port parameter allows multiple connections to be established in parallel at different ports. Finally, the run() command starts the server.