pyRDDLGym-gurobi: Gurobi MIP Compiler and Planner

In this tutorial, we discuss the compilation of RDDL into a Gurobi mixed-integer program (MIP) for computing optimal actions. The Gurobi planner can optimize discrete state/action problems where the JAX planner could perform poorly.


This package requires Python 3.8+

  • pyRDDLGym>=2.0

  • gurobipy>=10.0.1

Installing via pip

You can install pyRDDLGym-gurobi and all of its requirements via pip:

pip install git+

Running the Basic Example

The basic example provided in pyRDDLGym-gurobi will run the Gurobi planner on a domain and instance of your choosing. To run this, navigate to the install directory of pyRDDLGym-gurobi, and run:

python -m pyRDDLGym_gurobi.examples.run_plan <domain> <instance> <horizon>


  • <domain> is the domain identifier as specified in rddlrepository, or a path pointing to a valid domain.rddl file

  • <instance> is the instance identifier in rddlrepository, or a path pointing to a valid instance.rddl file

  • <horizon> is the lookahead horizon used by the planner.

Running from the Python API

If you are working with the Python API, you can instantiate the environment and planner as follows:

import pyRDDLGym
from pyRDDLGym_gurobi.core.planner import GurobiStraightLinePlan, GurobiOnlineController

# Create the environment
env = pyRDDLGym.make("domain", "instance")

# Create the planner
plan = GurobiStraightLinePlan()
controller = GurobiOnlineController(rddl=env.model, plan=plan, rollout_horizon=5)

# Run the planner
controller.evaluate(env, episodes=1, verbose=True, render=True)



An online and offline controller type are provided in pyRDDLGym-gurobi, which mirror the functionality of the JAX planner discussed in the previous section. Both are instances of pyRDDLGym’s BaseAgent, so the evaluate() function can be used to streamline evaluation.

Passing Parameters to the Gurobi Backend

Gurobi is by its nature highly configurable. Parameters can be passed to the Gurobi model through a gurobi.env file, or directly through the pyRDDLGym interface.

Suppose we wish to instruct Gurobi to limit each optimization to 60 seconds, and to print progress during optimization to console. To apply the first approach, create a gurobi.env file in the same directory where your launch script is located, with the following content:

TimeLimit 60
OutputFlag 1

To apply the second approach, you can pass these parameters as a dictionary to the model_params argument of the controller instance:

controller = GurobiOnlineController(rddl=model, plan=plan, rollout_horizon=5,
                                    model_params={'TimeLimit': 60, 'OutputFlag': 1})

Current Limitations

We cite several limitations of the current baseline Gurobi optimizer:

  • Stochastic variables introduce computational difficulties since mixed-integer problems are inherently deterministic
    • the planner currently applies determinization, where stochastic variables are substituted with their means (we hope to incorporate more sophisticated techniques from optimization to better deal with stochasticity)

  • Discrete non-linear domains can require exponential computation time
    • the planner uses piecewise linear functions to approximate non-linearities, and quadratic expressions in other cases

    • if the planner does not make progress, we recommend reducing the planning horizon, simplying the RDDL description as much as possible, or tweaking the parameters of the Gurobi model.