Generic environment to learn locomotion skills for legged robots using Jiminy simulator as physics engine.

class gym_jiminy.common.envs.env_locomotion.WalkerJiminyEnv(urdf_path, hardware_path=None, mesh_path=None, simu_duration_max=20.0, step_dt=0.001, enforce_bounded_spaces=False, reward_mixture=None, std_ratio=None, config_path=None, avoid_instable_collisions=True, debug=False, **kwargs)[source]

Bases: gym_jiminy.common.envs.env_generic.BaseJiminyEnv

Gym environment for learning locomotion skills for legged robots using torque control directly.

Jiminy Engine is used to perform physics evaluation, and Meshcat is used for rendering.

The observation and action spaces are unchanged with respect to BaseJiminyEnv.

Parameters

  • urdf_path (str) – Path of the URDF model to be used for the simulation.

  • hardware_path (Optional[str]) – Path of the Jiminy hardware description toml file. Optional: if unspecified, a ‘*_hardware.toml’ file in the same folder and with the same name is used.

  • mesh_path (Optional[str]) – Path to the folder containing the model meshes. Optional: Env variable ‘JIMINY_DATA_PATH’ will be used if available.

  • simu_duration_max (float) – Maximum duration of a simulation before returning done.

  • step_dt (float) – Simulation timestep for learning.

  • enforce_bounded_spaces (Optional[bool]) – Whether to enforce finite bounds for the observation and action spaces. If so, the ‘*_MAX’ bounds are used whenever necessary.

  • reward_mixture (Optional[dict]) – Weighting factors of selected contributions to total reward.

  • std_ratio (Optional[dict]) – Relative standard deviation of selected contributions to environment stochasticity.

  • config_path (Optional[str]) – Configuration toml file to import. It will be imported AFTER loading the hardware description file. It can be generated automatically from an instance by calling the export_config_file method. Optional: if unspecified, a ‘*_options.toml’ file in the same folder and with the same name is used if present; otherwise the default configuration is used.

  • avoid_instable_collisions (bool) – Prevent numerical instabilities by replacing each collision mesh by the vertices of its minimal-volume bounding box, and each primitive box by its vertices.

  • debug (bool) – Whether the debug mode must be activated. Enabling it turns on telemetry recording.

  • kwargs (Any) – Keyword arguments to forward to BaseJiminyEnv class.

Return type

None

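As an illustration of how the reward_mixture and std_ratio arguments are structured, here is a minimal sketch. The contribution names and the model path below are hypothetical, not the environment's actual defaults:

```python
# Hypothetical contribution names: the available reward contributions and
# stochasticity sources depend on the environment implementation.
reward_mixture = {"energy": 0.2, "done": 0.8}    # weights of reward contributions
std_ratio = {"model": 0.05, "disturbance": 0.1}  # relative std of stochastic effects

# env = WalkerJiminyEnv(
#     urdf_path="walker.urdf",       # hypothetical model file
#     simu_duration_max=20.0,
#     step_dt=0.001,
#     reward_mixture=reward_mixture,
#     std_ratio=std_ratio)
```

The weights are plain floats; contributions with a zero weight are simply ignored.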
metadata = {'render.modes': ['human']}

_setup()[source]

Configure the environment.

It performs the following steps, successively:

  • update some proxies that will be used to compute the reward and termination condition,

  • enforce some options of the low-level robot and engine,

  • randomize the environment according to ‘std_ratio’.


This method is called internally by the reset method at the very beginning. One must override it to implement new contributions to the environment stochasticity, or to create a custom low-level robot if the model must be different for each learning episode.

Return type

None

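One way to add a new stochastic contribution is to override this method in a subclass. Below is a minimal sketch using a stand-in class, since the real base class is not needed to show the pattern; the ‘mass’ key and the attribute names are hypothetical, not part of the actual API:

```python
import numpy as np

class RandomizedWalker:
    """Stand-in for a WalkerJiminyEnv subclass, for illustration only."""
    def __init__(self, nominal_mass=30.0, std_ratio=None):
        self.nominal_mass = nominal_mass
        self.std_ratio = std_ratio or {}
        self.mass = nominal_mass
        self.rng = np.random.default_rng(0)

    def _setup(self):
        # Randomize the body mass according to 'std_ratio', as an example of
        # a custom contribution to the environment stochasticity.
        ratio = self.std_ratio.get("mass", 0.0)
        self.mass = self.nominal_mass * (1.0 + ratio * self.rng.uniform(-1.0, 1.0))

env = RandomizedWalker(std_ratio={"mass": 0.05})
env._setup()  # mass now lies within +/- 5% of its nominal value
```

In the real environment, the base implementation must also be called (e.g. via super()) so that the standard setup steps still run.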

_get_time_space()[source]

Get time space.

It takes advantage of knowing the maximum simulation duration to shrink down the range. Note that the observation will be out-of-bounds if steps are performed after this point.

Return type

Box

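A minimal sketch of the idea, assuming the time space is simply the closed interval [0, simu_duration_max]:

```python
import numpy as np

# Bound the time observation by the known maximum simulation duration,
# instead of using an unbounded range.
simu_duration_max = 20.0  # same default as above
time_low, time_high = np.array([0.0]), np.array([simu_duration_max])

def time_in_bounds(t):
    # True while the current time lies within the declared space.
    return bool(time_low[0] <= t <= time_high[0])
```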
_force_external_profile(t, q, v, wrench)[source]

User-specified processing of external force profiles.

Typical use cases are time rescaling (1.0 second by default), or changing the orientation of the force (x/y in world frame by default). It could also be used to clamp the force.


Beware that it updates ‘wrench’ in place for the sake of efficiency.

  • t (float) – Current time.

  • q (numpy.ndarray) – Current configuration vector of the robot.

  • v (numpy.ndarray) – Current velocity vector of the robot.

  • wrench (numpy.ndarray) – Force to apply on the robot as a vector (linear and angular) [Fx, Fy, Fz, Mx, My, Mz].

Return type

None

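A sketch of such a processing function, updating ‘wrench’ in place as described above; the time-rescaling factor, rotation schedule, and clamping bounds are illustrative assumptions:

```python
import numpy as np

def force_external_profile(t, q, v, wrench):
    # Illustrative processing: rescale time (hypothetical factor) and rotate
    # the linear force in the world x/y plane, updating 'wrench' in place.
    time_scale = 2.0              # hypothetical time-rescaling factor
    theta = 0.5 * t / time_scale  # hypothetical orientation schedule
    fx, fy = wrench[0], wrench[1]
    wrench[0] = np.cos(theta) * fx - np.sin(theta) * fy
    wrench[1] = np.sin(theta) * fx + np.cos(theta) * fy
    np.clip(wrench, -100.0, 100.0, out=wrench)  # optional clamping
```

Since the rotation only redirects the planar force, its magnitude is preserved as long as the clamping bounds are not reached.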

is_done()[source]

Determine whether the episode is over.

The termination conditions are the following:

  • fall detection (enabled if the robot has a freeflyer): the freeflyer goes lower than 75% of its height in neutral configuration.

  • maximum simulation duration exceeded.

Return type

bool

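The two termination conditions above can be sketched as a simple predicate (a toy illustration; the real method reads these quantities from the simulator state):

```python
def is_episode_over(freeflyer_height, neutral_height, t, simu_duration_max=20.0):
    # Fall detection: freeflyer lower than 75% of its neutral-configuration height.
    has_fallen = freeflyer_height < 0.75 * neutral_height
    # Timeout: maximum simulation duration exceeded.
    timed_out = t >= simu_duration_max
    return has_fallen or timed_out
```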

compute_reward(*, info)[source]

Compute reward at current episode state.

It computes the reward associated with each individual contribution according to ‘reward_mixture’.


This method can be overridden to implement new contributions to the reward, or to monitor more information.


Returns

Total reward.


Parameters

info (Dict[str, Any]) –

Return type

float

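The weighted combination according to ‘reward_mixture’ can be sketched as follows (the contribution names in the usage example are illustrative, not the environment's actual defaults):

```python
def compute_total_reward(contributions, reward_mixture):
    # Weighted sum of the individual reward contributions, with weights
    # taken from 'reward_mixture' (illustrative sketch, not the actual API).
    return sum(weight * contributions[name]
               for name, weight in reward_mixture.items())

total = compute_total_reward(
    {"energy": 1.0, "done": 2.0},   # hypothetical per-contribution rewards
    {"energy": 0.5, "done": 0.5})   # hypothetical mixture weights
```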
compute_reward_terminal(*, info)[source]

Compute the reward at the end of the episode.

It computes the terminal reward associated with each individual contribution according to ‘reward_mixture’.


Parameters

info (Dict[str, Any]) –

Return type

float

simulator: Simulator
engine: jiminy.EngineMultiRobot
stepper_state: jiminy.StepperState
system_state: jiminy.SystemState
sensors_data: jiminy.sensorsData
_registered_variables: MutableMappingT[str, Tuple[FieldNested, DataNested]]
log_headers: MappingT[str, FieldNested]
_seed: List[np.uint32]
log_path: Optional[str]
_info: Dict[str, Any]
_num_steps_beyond_done: Optional[int]
_dt_eps: Optional[float]
observe_dt: float
_observation: Optional[DataNested]
control_dt: float
_action: Optional[DataNested]