Locomotion

Generic environment for learning locomotion skills with legged robots, using the Jiminy simulator as the physics engine.

class gym_jiminy.common.envs.locomotion.WalkerJiminyEnv(urdf_path, hardware_path=None, mesh_path_dir=None, simulation_duration_max=30.0, step_dt=0.04, reward_mixture=None, std_ratio=None, config_path=None, avoid_instable_collisions=True, debug=False, *, robot=None, viewer_kwargs=None, **kwargs)[source]

Bases: BaseJiminyEnv

Gym environment for learning locomotion skills for legged robots.

Jiminy is used for both physics computations and rendering.

The observation and action spaces are unchanged with respect to BaseJiminyEnv.

Parameters:
  • urdf_path (str | None) – Path of the URDF model to be used for the simulation. It is assumed that the robot has a floating base.

  • hardware_path (str | None) – Path of the Jiminy hardware description toml file. Optional: if not provided, a ‘*_hardware.toml’ file with the same name in the same folder as the URDF is used.

  • mesh_path_dir (str | None) – Path to the folder containing the model meshes. Optional: the environment variable ‘JIMINY_DATA_PATH’ is used if available.

  • simulation_duration_max (float) – Maximum duration of a simulation before the episode is truncated.

  • step_dt (float) – Simulation timestep for learning.

  • reward_mixture (dict | None) – Weighting factors of selected contributions to total reward.

  • std_ratio (dict | None) – Relative standard deviation of selected contributions to environment stochasticity.

  • config_path (str | None) – Configuration toml file to import. It is imported AFTER loading the hardware description file. It can be generated automatically from an instance by calling the export_config_file method. Optional: if not provided, a ‘*_options.toml’ file with the same name in the same folder is used; if none is found, the default configuration is used.

  • avoid_instable_collisions (bool) – Prevent numerical instabilities by replacing each collision mesh with the vertices of its minimal-volume bounding box, and each primitive box with its vertices.

  • debug (bool) – Whether debug mode must be activated. Enabling it turns on telemetry recording.

  • robot (Robot | None) – Robot being simulated, already instantiated and initialized. If omitted, a default robot is built using ‘urdf_path’, ‘hardware_path’ and ‘mesh_path_dir’. Optional: None by default.

  • viewer_kwargs (Dict[str, Any] | None) – Keyword arguments used to override the default values whenever a viewer is instantiated. This is the only way to pass custom arguments to the viewer when calling the render method, unlike replay, which forwards extra keyword arguments. Optional: None by default.

  • kwargs (Any) – Keyword arguments to forward to Simulator and BaseJiminyEnv constructors.
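Example (a minimal instantiation sketch, assuming the Gymnasium reset/step API; the URDF path and the reward_mixture/std_ratio keys below are hypothetical placeholders, not values shipped with the package):

    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    # Hypothetical model path and tuning values, for illustration only.
    env = WalkerJiminyEnv(
        urdf_path="/path/to/my_walker.urdf",   # floating-base robot model
        simulation_duration_max=30.0,          # episode length in seconds
        step_dt=0.04,                          # learning timestep
        reward_mixture={"survival": 0.5, "energy": 0.5},  # assumed keys
        std_ratio={"model": 0.1},                          # assumed key
    )

    obs, info = env.reset(seed=0)
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)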

reward_range: Tuple[float, float] = (0.0, 1.0)
_setup()[source]

Configure the environment.

It performs the following steps, successively:

  • update some proxies that will be used for computing the reward and the termination condition,

  • enforce some options of the low-level robot and engine,

  • randomize the environment according to ‘std_ratio’.

Note

This method is called internally by the reset method at the very beginning. One must override it to implement new contributions to the environment stochasticity, or to create a custom low-level robot if the model must be different for each learning episode.

Return type:

None
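As the note above suggests, _setup is the natural place to add new stochastic contributions. A minimal override sketch, assuming std_ratio is exposed as an attribute and using a ‘custom’ key that is not defined by the base environment:

    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    class RandomizedWalkerEnv(WalkerJiminyEnv):
        def _setup(self) -> None:
            # Keep the default proxies, engine options and randomization.
            super()._setup()

            # Hypothetical extra contribution to the environment
            # stochasticity, driven by a user-defined 'custom' entry of
            # `std_ratio` (not a key handled by the base environment).
            std = self.std_ratio.get("custom", 0.0)
            if std > 0.0:
                self._custom_offset = std * self.np_random.standard_normal()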

_force_external_profile(t, q, v, wrench)[source]

User-specified processing of external force profiles.

Typical use cases are time rescaling (1.0 second by default) or changing the orientation of the force (x/y in world frame by default). It can also be used to clamp the force.

Warning

Beware it updates ‘wrench’ by reference for the sake of efficiency.

Parameters:
  • t (float) – Current time.

  • q (ndarray) – Current configuration vector of the robot.

  • v (ndarray) – Current velocity vector of the robot.

  • wrench (ndarray) – Force to apply on the robot as a vector (linear and angular) [Fx, Fy, Fz, Mx, My, Mz].

Return type:

None
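Because the wrench is updated in place, an override typically mutates the passed array rather than returning a new one. A sketch clamping the linear force, assuming the base implementation is called first to apply its default processing (the 100.0 N limit is arbitrary):

    import numpy as np
    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    class ClampedForceWalkerEnv(WalkerJiminyEnv):
        def _force_external_profile(self, t: float, q: np.ndarray,
                                    v: np.ndarray, wrench: np.ndarray) -> None:
            # Apply the default processing of the external force profile.
            super()._force_external_profile(t, q, v, wrench)

            # Clamp the linear part [Fx, Fy, Fz] in place. The 100.0 N
            # limit is illustrative, not a value from the library.
            np.clip(wrench[:3], -100.0, 100.0, out=wrench[:3])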

has_terminated(info)[source]

Determine whether the episode is over.

It terminates (terminated=True) under the following conditions:

  • fall detection: the freeflyer drops below 75% of its height in the neutral configuration.

It is truncated under the following conditions:

  • observation out-of-bounds

  • maximum simulation duration exceeded

Parameters:

info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns:

terminated and truncated flags.

Return type:

Tuple[bool, bool]
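These flags map directly onto the Gymnasium step API. A typical rollout loop (the URDF path is a hypothetical placeholder) stops on either flag:

    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    env = WalkerJiminyEnv(urdf_path="/path/to/my_walker.urdf")

    obs, info = env.reset(seed=0)
    done = False
    while not done:
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        # `terminated` flags a fall, `truncated` flags an out-of-bounds
        # observation or reaching `simulation_duration_max`.
        done = terminated or truncated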

compute_reward(terminated, info)[source]

Compute reward at current episode state.

It computes the reward associated with each individual contribution according to ‘reward_mixture’.

Note

This method can be overridden to implement new contributions to the reward, or to monitor more information.

Parameters:
  • terminated (bool) – Whether a terminal state has been reached.

  • info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns:

Aggregated reward.

Return type:

float
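A sketch of an override adding a contribution on top of the weighted mixture; the survival bonus and its 0.01 weight are arbitrary illustrative choices, not part of the library defaults:

    from typing import Any, Dict
    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    class BonusRewardWalkerEnv(WalkerJiminyEnv):
        def compute_reward(self, terminated: bool,
                           info: Dict[str, Any]) -> float:
            # Aggregated reward from the contributions in `reward_mixture`.
            reward = super().compute_reward(terminated, info)

            # Hypothetical extra contribution: a small survival bonus while
            # the episode is still running (0.01 is an arbitrary weight).
            if not terminated:
                reward += 0.01
            return reward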
