Locomotion¶
Generic environment for learning locomotion skills for legged robots, using the Jiminy simulator as physics engine.
- class gym_jiminy.common.envs.locomotion.WalkerJiminyEnv(urdf_path, hardware_path=None, mesh_path_dir=None, simulation_duration_max=30.0, step_dt=0.04, reward_mixture=None, std_ratio=None, config_path=None, avoid_instable_collisions=True, debug=False, *, robot=None, viewer_kwargs=None, **kwargs)[source]¶
Bases:
BaseJiminyEnv
Gym environment for learning locomotion skills for legged robots.
Jiminy is used for both physics computations and rendering.
The observation and action spaces are unchanged with respect to BaseJiminyEnv.
- Parameters:
urdf_path (str | None) – Path of the urdf model to be used for the simulation. It is assumed that the robot has a floating base.
hardware_path (str | None) – Path of the Jiminy hardware description toml file. Optional: If not provided, looks for a ‘*_hardware.toml’ file in the same folder and with the same name.
mesh_path_dir (str | None) – Path of the folder containing the model meshes. Optional: The environment variable ‘JIMINY_DATA_PATH’ will be used if available.
simulation_duration_max (float) – Maximum duration of a simulation before returning done.
step_dt (float) – Simulation timestep for learning.
reward_mixture (dict | None) – Weighting factors of selected contributions to total reward.
std_ratio (dict | None) – Relative standard deviation of selected contributions to environment stochasticity.
config_path (str | None) – Configuration toml file to import. It is imported AFTER the hardware description file has been loaded. It can be generated automatically from an instance by calling the export_config_file method. Optional: If not provided, looks for a ‘*_options.toml’ file in the same folder and with the same name. If not found, the default configuration is used.
avoid_instable_collisions (bool) – Prevent numerical instabilities by replacing each collision mesh with the vertices of its minimal-volume bounding box, and each primitive box with its vertices.
debug (bool) – Whether debug mode must be activated. Enabling it turns on telemetry recording.
robot (Robot | None) – Robot being simulated, already instantiated and initialized. If omitted, a default robot is built using ‘urdf_path’, ‘hardware_path’ and ‘mesh_path_dir’. Optional: None by default.
viewer_kwargs (Dict[str, Any] | None) – Keyword arguments used to override the original default values whenever a viewer is instantiated. This is the only way to pass custom arguments to the viewer when calling the render method, unlike replay, which forwards extra keyword arguments. Optional: None by default.
kwargs (Any) – Keyword arguments to forward to Simulator and BaseJiminyEnv constructors.
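A minimal usage sketch is shown below. The URDF path is a hypothetical placeholder and only a subset of the constructor arguments is passed; the environment follows the standard Gymnasium reset/step interface::

    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    # Instantiate the environment from a URDF model with a floating base.
    env = WalkerJiminyEnv(
        urdf_path="/path/to/walker.urdf",  # hypothetical path
        simulation_duration_max=30.0,      # episode length in seconds
        step_dt=0.04,                      # learning timestep in seconds
    )

    # Standard Gymnasium rollout with random actions.
    obs, info = env.reset(seed=0)
    for _ in range(100):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()
    env.close()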
- _setup()[source]¶
Configure the environment.
It performs the following steps, successively:
updates some proxies that will be used for computing the reward and the termination condition;
enforces some options of the low-level robot and engine;
randomizes the environment according to ‘std_ratio’.
Note
This method is called internally by the reset method at the very beginning. One must override it to implement new contributions to the environment stochasticity (see the sketch below), or to create a custom low-level robot if the model must be different for each learning episode.
- Return type:
None
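For instance, one may override _setup to draw an additional random quantity at every reset, on top of the default randomization. This is only a hedged sketch; the push-scale attribute and its 10% spread are hypothetical::

    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    class RandomizedWalkerEnv(WalkerJiminyEnv):
        def _setup(self) -> None:
            # Keep the default proxies, engine options and 'std_ratio' randomization.
            super()._setup()
            # Hypothetical extra per-episode randomization, drawn from the
            # environment's own random generator for reproducibility.
            self._push_scale = 1.0 + 0.1 * self.np_random.uniform(-1.0, 1.0)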
- _force_external_profile(t, q, v, wrench)[source]¶
User-specified processing of external force profiles.
Typical use cases are time rescaling (1.0 second by default), or changing the orientation of the force (x/y in world frame by default). It can also be used to clamp the force, as in the sketch below.
Warning
Beware that it updates ‘wrench’ in-place (by reference) for the sake of efficiency.
- Parameters:
t (float) – Current time.
q (ndarray) – Current configuration vector of the robot.
v (ndarray) – Current velocity vector of the robot.
wrench (ndarray) – Wrench to apply on the robot, as a vector gathering the linear and angular components [Fx, Fy, Fz, Mx, My, Mz].
- Return type:
None
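A sketch of overriding this method to post-process the default external force profile. The 2.0 s time rescaling and the 100 N clamping below are arbitrary illustration values::

    import numpy as np

    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    class PushedWalkerEnv(WalkerJiminyEnv):
        def _force_external_profile(self, t, q, v, wrench) -> None:
            # Evaluate the default profile with time rescaled to a 2.0s period.
            super()._force_external_profile(t / 2.0, q, v, wrench)
            # Clamp the linear force in-place ('wrench' is updated by reference).
            np.clip(wrench[:3], -100.0, 100.0, out=wrench[:3])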
- has_terminated(info)[source]¶
Determine whether the episode is over.
It terminates (terminated=True) under the following conditions:
fall detection: the freeflyer drops below 75% of its height in the neutral configuration.
It is truncated under the following conditions:
observation out-of-bounds
maximum simulation duration exceeded
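These conditions can be extended in a subclass, for example with an extra step-count-based truncation. The sketch below assumes the method returns a (terminated, truncated) pair, consistently with the conditions listed above, and the 500-step cap is purely illustrative::

    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    class EarlyStopWalkerEnv(WalkerJiminyEnv):
        MAX_STEPS = 500  # hypothetical hard cap on the number of learning steps

        def reset(self, *, seed=None, options=None):
            self._num_steps = 0
            return super().reset(seed=seed, options=options)

        def step(self, action):
            self._num_steps += 1
            return super().step(action)

        def has_terminated(self, info):
            # Keep the default fall detection and truncation checks...
            terminated, truncated = super().has_terminated(info)
            # ...and add a hypothetical truncation based on the step counter.
            if self._num_steps >= self.MAX_STEPS:
                truncated = True
            return terminated, truncated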
- compute_reward(terminated, info)[source]¶
Compute the reward at the current episode state.
It computes the reward associated with each individual contribution according to ‘reward_mixture’.
Note
This method can be overridden to implement new contributions to the reward, or to monitor more information.
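For example, an extra reward term can be accumulated on top of the default mixture. This is a hedged sketch; the constant ‘upright bonus’ and its logging key are hypothetical::

    from gym_jiminy.common.envs.locomotion import WalkerJiminyEnv

    class BonusWalkerEnv(WalkerJiminyEnv):
        def compute_reward(self, terminated, info):
            # Default weighted sum of the contributions selected in 'reward_mixture'.
            reward = super().compute_reward(terminated, info)
            # Hypothetical extra term, logged into 'info' for monitoring.
            upright_bonus = 0.0 if terminated else 0.1
            info["upright_bonus"] = upright_bonus
            return reward + upright_bonus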