Locomotion¶

Generic environment to learn locomotion skills for legged robots using Jiminy simulator as physics engine.

class gym_jiminy.common.envs.locomotion.WalkerJiminyEnv(urdf_path, hardware_path=None, mesh_dir_path=None, simulation_duration_max=30.0, step_dt=0.04, reward_mixture=None, std_ratio=None, config_path=None, avoid_instable_collisions=True, debug=False, *, robot=None, viewer_kwargs=None, **kwargs)[source]¶

Bases: BaseJiminyEnv

Gym environment for learning locomotion skills for legged robots.

Jiminy is used for both physics computations and rendering.

The observation and action spaces are unchanged wrt BaseJiminyEnv.

Parameters:

urdf_path (str | None) – Path of the urdf model to be used for the simulation. It is assumed that the robot has a floating base.
hardware_path (str | None) – Path of Jiminy hardware description toml file. Optional: Looking for ‘*_hardware.toml’ file in the same folder and with the same name.
mesh_dir_path (str | None) – Path to the folder containing the model meshes. Optional: Env variable ‘JIMINY_DATA_PATH’ will be used if available.
simulation_duration_max (float) – Maximum duration of a simulation before returning done.
step_dt (float) – Environment timestep for learning. Note that it is independent from the controller and observation update periods. The latter are configured via engine.set_options.
reward_mixture (dict | None) – Weighting factors of selected contributions to total reward.
std_ratio (dict | None) – Relative standard deviation of selected contributions to environment stochasticity.
config_path (str | None) – Configuration toml file to import. It will be imported AFTER loading the hardware description file. It can be automatically generated from an instance by calling export_config_file method. Optional: Looking for ‘*_options.toml’ file in the same folder and with the same name. If not found, using default configuration.
avoid_instable_collisions (bool) – Prevent numerical instabilities by replacing collision mesh by vertices of associated minimal volume bounding box, and replacing primitive box by its vertices.
debug (bool) – Whether the debug mode must be activated. Doing it enables telemetry recording.
robot (Robot | None) – Robot being simulated, already instantiated and initialized. Build default robot using ‘urdf_path’, ‘hardware_path’ and ‘mesh_dir_path’ if omitted. Optional: None by default.
viewer_kwargs (Dict[str, Any] | None) – Keyword arguments used to override the original default values whenever a viewer is instantiated. This is the only way to pass custom arguments to the viewer when calling render method, unlike replay which forwards extra keyword arguments. Optional: None by default.
kwargs (Any) – Keyword arguments to forward to Simulator and BaseJiminyEnv constructors.

reward_range: Tuple[float, float] = (0.0, 1.0)¶

_setup()[source]¶

Configure the environment.

It is doing the following steps, successively:

updates some proxies that will be used for computing the reward and termination condition,

enforce some options of the low-level robot and engine,

randomize the environment according to ‘std_ratio’.

Note

This method is called internally by reset method at the very beginning. One must override it to implement new contributions to the environment stochasticity, or to create custom low-level robot if the model must be different for each learning episode.

Return type:: None

_force_external_profile(t, q, v, wrench)[source]¶

User-specified processing of external force profiles.

Typical usecases are time rescaling (1.0 second by default), or changing the orientation of the force (x/y in world frame by default). It could also be used for clamping the force.

Warning

Beware it updates ‘wrench’ by reference for the sake of efficiency.

Parameters:

t (float) – Current time.
q (ndarray) – Current configuration vector of the robot.
v (ndarray) – Current velocity vector of the robot.
wrench (ndarray) – Force to apply on the robot as a vector (linear and angular) [Fx, Fy, Fz, Mx, My, Mz].

Return type:

None

has_terminated(info)[source]¶

Determine whether the episode is over.

It terminates (terminated=True) under the following conditions:

fall detection: the freeflyer goes lower than 75% of its height in neutral configuration.

It is truncated under the following conditions:

observation out-of-bounds

maximum simulation duration exceeded

Parameters:: info (Dict[str, Any]) – Dictionary of extra information for monitoring.
Returns:: terminated and truncated flags.
Return type:: Tuple[bool, bool]

compute_reward(terminated, info)[source]¶

Compute reward at current episode state.

It computes the reward associated with each individual contribution according to ‘reward_mixture’.

Note

This method can be overwritten to implement new contributions to the reward, or to monitor more information.

Returns:

Aggregated reward.

Parameters:

terminated (bool)
info (Dict[str, Any])

Return type:

float

_abc_impl = <_abc._abc_data object>¶