Generic

Generic gym environment specifically tailored to work with the Jiminy Simulator as backend physics engine, and the Jiminy Viewer as 3D visualizer. It implements the official OpenAI Gym API and extends it with additional functionality.

class gym_jiminy.common.envs.env_generic._LazyDictItemFilter(dict_packed, item_index)[source]

Bases: collections.abc.Mapping

Parameters
  • dict_packed (Mapping[str, Tuple[Any, ...]]) –

  • item_index (int) –

Return type

None

get(k[, d]) D[k] if k in D, else d.  d defaults to None.
items() a set-like object providing a view on D's items
keys() a set-like object providing a view on D's keys
values() an object providing a view on D's values
class gym_jiminy.common.envs.env_generic.BaseJiminyEnv(simulator, step_dt, enforce_bounded_spaces=False, debug=False, **kwargs)[source]

Bases: gym_jiminy.common.bases.generic_bases.ObserverControllerInterface, gym.core.Env

Base class to train a robot in OpenAI Gym using a user-specified Python Jiminy engine for physics computations.

It creates a Gym environment wrapping the Jiminy Engine and behaves like any other Gym environment.

The observation space is a dictionary gathering the current simulation time, the real robot state, and the sensors data. The action is a vector gathering the torques of the actuators of the robot.

There is no reward by default; it is up to the user to overload this class to implement one. The class has been designed to be highly flexible and easy to customize by overloading, in order to fit the vast majority of users’ needs.

Parameters
  • simulator (jiminy_py.simulator.Simulator) – Jiminy Python simulator used for physics computations. It must be fully initialized.

  • step_dt (float) – Simulation timestep for learning. Note that it is independent from the controller and observation update periods. The latter are configured via engine.set_options.

  • enforce_bounded_spaces (Optional[bool]) – Whether or not to enforce finite bounds for the observation and action spaces. If so, the ‘*_MAX’ constants are used whenever necessary. Note that these bounds are deliberately wide to make sure they are suitable for the vast majority of systems.

  • debug (bool) – Whether or not the debug mode must be enabled. Enabling it turns on telemetry recording.

  • kwargs (Any) – Extra keyword arguments that may be useful for derived environments with multiple inheritance, and to allow automatic pipeline wrapper generation.

Return type

None
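A minimal construction sketch is shown below. The URDF path, the step_dt value, and the use of a Simulator.build helper are illustrative assumptions; any fully initialized jiminy_py Simulator instance can be passed.

    # Sketch only: "cartpole.urdf" is a placeholder, and Simulator.build is
    # assumed to be available in jiminy_py to create a fully-initialized
    # simulator from a URDF file.
    from jiminy_py.simulator import Simulator
    from gym_jiminy.common.envs.env_generic import BaseJiminyEnv

    simulator = Simulator.build("cartpole.urdf", has_freeflyer=False)
    env = BaseJiminyEnv(simulator, step_dt=0.04)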

metadata = {'render.modes': ['human', 'rgb_array']}
observation_space: gym.spaces.space.Space = None
action_space: gym.spaces.space.Space = None
simulator: Simulator
stepper_state: jiminy.StepperState
system_state: jiminy.SystemState
sensors_data: jiminy.sensorsData
_action: Optional[DataNested]
_observation: Optional[DataNested]
_controller_handle(t, q, v, sensors_data, command)[source]

Thin wrapper around user-specified compute_command method.

Warning

This method is not supposed to be called manually nor overloaded.

Parameters
Return type

None

_get_time_space()[source]

Get time space.

Return type

gym.spaces.space.Space

_get_state_space(use_theoretical_model=None)[source]

Get state space.

This method is not meant to be overloaded in general since the definition of the state space is mostly consensual. One must rather overload _refresh_observation_space to customize the observation space as a whole.

Parameters

use_theoretical_model (Optional[bool]) – Whether to compute the state space corresponding to the theoretical model instead of the actual one. None to use the internal value ‘simulator.use_theoretical_model’. Optional: None by default.

Return type

gym.spaces.space.Space

_get_sensors_space()[source]

Get sensor space.

It gathers the sensors data in a dictionary. It maps each available type of sensor to the associated data matrix. Rows correspond to the sensor type’s fields, and columns correspond to each individual sensor.

Return type

gym.spaces.space.Space

_refresh_action_space()[source]

Configure the action space of the environment.

The action is a vector gathering the torques of the actuators of the robot.

Warning

This method is called internally by the reset method. It is not meant to be overloaded since the actual action space of the robot is uniquely defined.

Return type

None

register_variable(name, value, fieldnames=None, namespace=None)[source]

TODO: Write documentation.

Parameters
  • name (str) –

  • value (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) –

  • fieldnames (Optional[Union[str, Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]]) –

  • namespace (Optional[str]) –

Return type

None

reset(controller_hook=None)[source]

Reset the environment.

In practice, it resets the backend simulator and sets the initial state of the robot. The initial state is obtained by calling ‘_sample_state’. This method is also in charge of setting the initial action (at the beginning) and observation (at the end).

Warning

It starts the simulation immediately. As a result, it is not possible to change the robot (including its options), nor to register log variables. The only way to do so is via ‘controller_hook’.

Parameters

controller_hook (Optional[Callable[Optional[Tuple[Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData], None]], Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData, numpy.ndarray], None]]]]]]) – Used internally for chaining multiple BasePipelineWrapper. It is not meant to be defined manually. Optional: None by default.

Returns

Initial observation of the episode.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

seed(seed=None)[source]

Specify the seed of the environment.

Warning

It also resets the low-level Jiminy Engine. Therefore, one must call the reset method manually afterward.

Parameters

seed (Optional[int]) – Random seed, as a positive integer. Optional: A strongly random seed will be generated by gym if omitted.

Returns

Updated seed of the environment

Return type

List[numpy.uint32]
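A short usage sketch following the warning above, assuming ‘env’ is an existing BaseJiminyEnv instance: the environment is reset manually right after seeding.

    seed_list = env.seed(42)  # list of seeds actually in use
    obs = env.reset()         # required after seeding, per the warning above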

close()[source]

Terminate the Python Jiminy engine.

Return type

None

step(action=None)[source]

Run a simulation step for a given action.

Parameters

action (Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]) – Action to perform. None to keep the previous action unchanged.

Returns

Next observation, reward, status of the episode (done or not), and a dictionary of extra information

Return type

Tuple[Union[Dict[str, StructNested], Sequence[StructNested], ValueType], float, bool, Dict[str, Any]]
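A minimal rollout sketch using the documented reset/step/close API. ‘env’ is assumed to be an already-constructed BaseJiminyEnv instance, and the random action is a placeholder for a trained policy.

    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # placeholder for a trained policy
        obs, reward, done, info = env.step(action)
    env.close()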

render(mode='human', **kwargs)[source]

Render the current state of the robot.

Note

Multi-rendering RGB output is not supported for now.

Parameters
  • mode (str) – Rendering mode. It can be either ‘human’ to display the current simulation state, or ‘rgb_array’ to return instead a snapshot of it as an RGB array without showing it on the screen.

  • kwargs (Any) – Extra keyword arguments to forward to jiminy_py.simulator.Simulator.render method.

Returns

RGB array if ‘mode’ is ‘rgb_array’, None otherwise.

Return type

Optional[numpy.ndarray]
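A snapshot sketch using the ‘rgb_array’ mode documented above. ‘env’ is assumed to be an existing BaseJiminyEnv instance; the exact array shape depends on the viewer backend.

    frame = env.render(mode='rgb_array')
    print(frame.shape)  # e.g. (height, width, 3)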

plot(**kwargs)[source]

Display common simulation data and action over time.

Parameters

kwargs (Any) – Extra keyword arguments to forward to simulator.plot.

Return type

None

replay(enable_travelling=True, **kwargs)[source]

Replay the current episode until now.

Parameters
  • enable_travelling (bool) – Whether or not to enable travelling, i.e. making the camera follow the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: True by default.

  • kwargs (Any) – Extra keyword arguments for delegation to replay.play_trajectories method.

Return type

None

static play_interactive(env, enable_travelling=None, start_paused=True, enable_is_done=True, verbose=True, **kwargs)[source]

Activate interactive mode, enabling the robot to be controlled using the keyboard.

It stops automatically as soon as the ‘done’ flag is True. One has to press a key to start the interaction. If no key is pressed, the action is not updated and the previous one keeps being sent to the robot.

Warning

This method requires the _key_to_action method to be implemented by the user by overloading it; otherwise an exception is raised.

Parameters
  • env (Union[gym_jiminy.common.envs.env_generic.BaseJiminyEnv, gym.core.Wrapper]) – BaseJiminyEnv environment instance to play with, possibly wrapped by composition, typically using gym.Wrapper.

  • enable_travelling (Optional[bool]) – Whether or not to enable travelling, i.e. making the camera follow the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: Enabled by default if and only if the ‘panda3d’ viewer backend is used.

  • start_paused (bool) – Whether or not to start in pause. Optional: Enabled by default.

  • verbose (bool) – Whether or not to display status messages.

  • kwargs (Any) – Extra keyword arguments to forward to _key_to_action method.

  • enable_is_done (bool) –

Return type

None

train()[source]

Sets the environment in training mode.

Note

This mode is enabled by default.

Return type

None

eval()[source]

Sets the environment in evaluation mode.

This only has an effect on certain environments. See the documentation of the particular environment for details on its behavior in training and evaluation modes, if it is affected. It can be used, for instance, to activate clipping or some filtering of the action specifically at evaluation time.

Return type

None

_setup()[source]

Configure the environment. It must guarantee that its internal state is valid after calling this method.

By default, it enforces some options of the engine.

Warning

Beware this method is called BEFORE observe_dt and controller_dt are properly set, so they cannot be relied upon at this point. Yet, step_dt is available and always will be. One can still access the low-level controller update period through engine_options[‘stepper’][‘controllerUpdatePeriod’].

Note

The user must overload this method to enforce a custom observer update period; otherwise it will be the same as the controller’s.

Note

This method is called internally by reset methods.

Return type

None
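As an illustration of the note above, a subclass may overload _setup to enforce a custom observer update period through the engine options. This is a sketch only: the ‘sensorsUpdatePeriod’ key and the engine.get_options accessor are assumptions; only engine.set_options and ‘controllerUpdatePeriod’ are explicitly mentioned on this page.

    from gym_jiminy.common.envs.env_generic import BaseJiminyEnv

    class MyEnv(BaseJiminyEnv):
        def _setup(self) -> None:
            super()._setup()  # keep the default engine configuration
            # 'get_options' is assumed to mirror the documented 'set_options'.
            engine_options = self.engine.get_options()
            # Hypothetical key: enforce a 5 ms observer update period instead
            # of inheriting the controller update period.
            engine_options['stepper']['sensorsUpdatePeriod'] = 0.005
            self.engine.set_options(engine_options)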

_refresh_observation_space()[source]

Configure the observation of the environment.

By default, the observation is a dictionary gathering the current simulation time, the real robot state, and the sensors data.

Note

This method is called internally by the reset method at the very end, just before computing and returning the initial observation. This method, alongside refresh_observation, must be overloaded in order to define a custom observation space.

Return type

None

_neutral()[source]

Returns a neutral valid configuration for the robot.

The default implementation returns the neutral configuration if valid, the “mean” configuration otherwise (right in the middle of the position lower and upper bounds).

Warning

Beware there is no guarantee for this configuration to be statically stable.

Note

This method is called internally by ‘_sample_state’ to generate the initial state. It can be overloaded to ensure static stability of the configuration.

Return type

numpy.ndarray

_sample_state()[source]

Returns a valid configuration and velocity for the robot.

The default implementation returns the neutral configuration and zero velocity.

Offsets are applied on the freeflyer to ensure no contact points are going through the ground and up to three are in contact.

Note

This method is called internally by reset to generate the initial state. It can be overloaded to act as a random state generator.

Return type

Tuple[numpy.ndarray, numpy.ndarray]
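To turn this method into a random initial-state generator, as the note suggests, one may perturb the default state returned by the base implementation. The noise amplitude and the plain NumPy random call below are illustrative only; a real implementation should rely on the environment’s seeded generator and, for a floating-base robot, re-normalize the freeflyer quaternion.

    from typing import Tuple
    import numpy as np
    from gym_jiminy.common.envs.env_generic import BaseJiminyEnv

    class MyEnv(BaseJiminyEnv):
        def _sample_state(self) -> Tuple[np.ndarray, np.ndarray]:
            # Start from the default (neutral configuration, zero velocity)...
            qpos, qvel = super()._sample_state()
            # ...then add small random offsets (illustrative amplitude only).
            qpos = qpos + 0.01 * np.random.standard_normal(qpos.shape)
            qvel = qvel + 0.01 * np.random.standard_normal(qvel.shape)
            return qpos, qvel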

_refresh_internal()[source]

Refresh internal buffers.

Note

This method is called right after every internal engine.step, so it is the right place to update data shared between is_done and compute_reward. Be careful when using it to share data with refresh_observation, since the latter is called at the self.observe_dt update period, whereas the others are called at the self.step_dt update period. self.observe_dt is likely to differ from self.step_dt, unless configured otherwise by overloading the _setup method.

Return type

None

refresh_observation()[source]

Compute the observation based on the current state of the robot.

Note

This method is called at the end of every low-level Engine.step.

Note

Note that np.nan values will be automatically clipped to 0.0 by the get_observation method before being returned, so they are valid.

Warning

In practice, it updates the internal buffer directly for the sake of efficiency.

As a side note, there is no way in the current implementation to distinguish the initialization of the observation buffer from subsequent updates. The workaround is to check whether the simulation has already started. Even though it is not rigorously the same, it does the job here since it is only about preserving efficiency.

Return type

None
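Together with _refresh_observation_space above, a custom observation can be defined as sketched below. Here the observation is restricted to the sensors data only; treating ‘sensors_data’ as a mapping of sensor type to data matrix, and rebuilding the buffer instead of updating it in place, are simplifications for illustration.

    import numpy as np
    from gym_jiminy.common.envs.env_generic import BaseJiminyEnv

    class MyEnv(BaseJiminyEnv):
        def _refresh_observation_space(self) -> None:
            # Observe the sensors data only, re-using the helper above.
            self.observation_space = self._get_sensors_space()

        def refresh_observation(self) -> None:
            # Update the internal buffer (see the warning above). Iterating
            # over 'sensors_data' as a mapping is an assumption.
            self._observation = {
                key: np.asarray(data)
                for key, data in self.sensors_data.items()}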

compute_command(measure, action)[source]

Compute the motor efforts to apply to the robot.

By default, it does not perform any processing. One is responsible for overloading this method to clip the action if necessary, to make sure it does not violate the lower and upper bounds.

Parameters
  • measure (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) – Observation of the environment.

  • action (numpy.ndarray) – Desired motor efforts.

Return type

numpy.ndarray
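A sketch of the clipping mentioned above, assuming the action space is the default bounded box over motor torques:

    import numpy as np
    from gym_jiminy.common.envs.env_generic import BaseJiminyEnv

    class MyEnv(BaseJiminyEnv):
        def compute_command(self, measure, action: np.ndarray) -> np.ndarray:
            # Clip the desired motor efforts to the action space bounds
            # before they are sent to the robot.
            return np.clip(
                action, self.action_space.low, self.action_space.high)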

is_done(*args, **kwargs)[source]

Determine whether the episode is over.

By default, it returns True if the observation reaches or exceeds the lower or upper limit. It must be overloaded to implement a custom termination condition for the simulation.

Note

This method is called after refresh_observation, so that the internal buffer ‘_observation’ is up-to-date.

Parameters
  • args (Any) – Extra arguments that may be useful for derived environments, for example Gym.GoalEnv.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

Return type

bool
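A sketch of a custom termination condition. It assumes a floating-base robot whose freeflyer translation comes first in the configuration vector, so that q[2] is the base height; the 0.3 m threshold is arbitrary.

    from gym_jiminy.common.envs.env_generic import BaseJiminyEnv

    class MyEnv(BaseJiminyEnv):
        def is_done(self, *args, **kwargs) -> bool:
            # Terminate the episode as soon as the base falls below 0.3 m.
            # Reading the configuration from 'system_state.q' is assumed.
            return bool(self.system_state.q[2] < 0.3)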

_key_to_action(key, obs, reward, **kwargs)[source]

Mapping from input keyboard keys to actions.

Note

This method is systematically called before the step method, even if no key has been pressed, or the reward is not defined. In such a case, the value is None.

Note

The mapping can be state dependent, and the key can be used for something other than computing the action directly. For instance, one can provide as an extra argument to this method a custom policy taking user parameters mapped to the keyboard as input.

Warning

Overloading this method is required for calling the play_interactive method.

Parameters
  • key (Optional[str]) – Key pressed by the user as a string. None if no key has been pressed since the last step of the environment.

  • obs (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) – Previous observation from the last step of the environment. It is always available, even right after reset.

  • reward (Optional[float]) – Previous reward from last step of the environment. Not available before first step right after reset.

  • kwargs (Any) – Extra keyword argument provided by the user when calling play_interactive method.

Returns

Action to forward to the environment.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]
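A sketch of the overloading required by play_interactive. The key names and the torque amplitude below are hypothetical; the action is simply a torque vector matching the action space.

    from typing import Optional
    import numpy as np
    from gym_jiminy.common.envs.env_generic import BaseJiminyEnv

    class MyEnv(BaseJiminyEnv):
        def _key_to_action(self,
                           key: Optional[str],
                           obs,
                           reward: Optional[float],
                           **kwargs) -> np.ndarray:
            # Apply a constant torque on the first motor while a key is held.
            action = np.zeros(self.action_space.shape)
            if key == "up":
                action[0] = +1.0
            elif key == "down":
                action[0] = -1.0
            return action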

_observer_handle(t, q, v, sensors_data)

TODO Write documentation.

Parameters
Return type

None

compute_reward(*args, info, **kwargs)

Compute reward at current episode state.

See ControllerInterface.compute_reward for details.

Note

This method is called after updating the internal buffer ‘_num_steps_beyond_done’, which is None if the simulation is not done, 0 right after, and so on.

Parameters
  • args (Any) – Extra arguments that may be useful for derived environments, for example Gym.GoalEnv.

  • info (Dict[str, Any]) – Dictionary of extra information for monitoring.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

Returns

Total reward.

Return type

float
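A sketch of a custom reward, assuming ‘system_state.v’ exposes the velocity vector and that its first component is the forward linear velocity of the freeflyer for a floating-base robot:

    from typing import Any, Dict
    from gym_jiminy.common.envs.env_generic import BaseJiminyEnv

    class MyEnv(BaseJiminyEnv):
        def compute_reward(self,
                           *args: Any,
                           info: Dict[str, Any],
                           **kwargs: Any) -> float:
            # Reward forward motion of the base (illustrative assumption).
            forward_velocity = float(self.system_state.v[0])
            info['forward_velocity'] = forward_velocity  # extra monitoring
            return forward_velocity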

compute_reward_terminal(*, info)

Compute terminal reward at current episode final state.

Note

Implementation is optional. For the sake of efficiency, the terminal reward is not computed if this method is not overloaded by the user.

Warning

Similarly to compute_reward, ‘info’ can be updated by reference to log extra info for monitoring.

Parameters

info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns

Terminal reward.

Return type

float

get_observation()

Get post-processed observation.

By default, it does not perform any post-processing. One is responsible for clipping the observation if necessary to make sure it does not violate the lower and upper bounds. This can be done either by overloading this method, or in the case of pipeline design, by adding a clipping observation block at the very end.

Warning

In most cases, it is not necessary to overload this method, and doing so may lead to unexpected behavior if not done carefully.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

reward_range = (-inf, inf)
spec = None
property unwrapped

Completely unwrap this env.

Returns:

gym.Env: The base non-wrapped gym.Env instance

_dt_eps: Optional[float]
observe_dt: float
control_dt: float
class gym_jiminy.common.envs.env_generic.BaseJiminyGoalEnv(simulator, step_dt, debug=False)[source]

Bases: gym_jiminy.common.envs.env_generic.BaseJiminyEnv, gym.core.GoalEnv

Base class to train a robot in OpenAI Gym using a user-specified Jiminy Engine for physics computations.

It creates a Gym environment wrapping the Jiminy Engine and behaves like any other Gym goal-environment.

Parameters
  • simulator (Optional[jiminy_py.simulator.Simulator]) – Jiminy Python simulator used for physics computations. It must be fully initialized.

  • step_dt (float) – Simulation timestep for learning. Note that it is independent from the controller and observation update periods. The latter are configured via engine.set_options.

  • enforce_bounded_spaces – Whether or not to enforce finite bounds for the observation and action spaces. If so, the ‘*_MAX’ constants are used whenever necessary. Note that these bounds are deliberately wide to make sure they are suitable for the vast majority of systems.

  • debug (bool) – Whether or not the debug mode must be enabled. Enabling it turns on telemetry recording.

  • kwargs – Extra keyword arguments that may be useful for derived environments with multiple inheritance, and to allow automatic pipeline wrapper generation.

Return type

None

observation_space: gym.spaces.space.Space = None
get_observation()[source]

Get post-processed observation.

It gathers the original observation from the environment with the currently achieved and desired goal, as a dictionary. See ObserverInterface.get_observation documentation for details.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

reset(controller_hook=None)[source]

Reset the environment.

In practice, it resets the backend simulator and sets the initial state of the robot. The initial state is obtained by calling ‘_sample_state’. This method is also in charge of setting the initial action (at the beginning) and observation (at the end).

Warning

It starts the simulation immediately. As a result, it is not possible to change the robot (including its options), nor to register log variables. The only way to do so is via ‘controller_hook’.

Parameters

controller_hook (Optional[Callable[Optional[Tuple[Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData], None]], Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData, numpy.ndarray], None]]]]]]) – Used internally for chaining multiple BasePipelineWrapper. It is not meant to be defined manually. Optional: None by default.

Returns

Initial observation of the episode.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

_get_goal_space()[source]

Get goal space.

Note

This method is called internally at init to define the observation space. It is called BEFORE super().reset so non goal-env-specific internal buffers are NOT up-to-date. This method must be overloaded while implementing a goal environment.

Return type

gym.spaces.space.Space

_sample_goal()[source]

Sample a goal randomly.

Note

This method is called internally by reset to sample the new desired goal that the agent will have to achieve. It is called BEFORE super().reset so non goal-env-specific internal buffers are NOT up-to-date. This method must be overloaded while implementing a goal environment.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

_get_achieved_goal()[source]

Compute the achieved goal based on current state of the robot.

Note

This method can be called by refresh_observation to get the currently achieved goal. This method must be overloaded while implementing a goal environment.

Returns

Currently achieved goal.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

is_done(achieved_goal=None, desired_goal=None)[source]

Determine whether a termination condition has been reached.

By default, it uses the termination condition inherited from normal environment.

Note

This method is called right after calling refresh_observation, so that the internal buffer ‘_observation’ is up-to-date. This method can be overloaded while implementing a goal environment.

Parameters
  • achieved_goal (Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]) – Achieved goal. If set to None, one is supposed to call _get_achieved_goal instead. Optional: None by default.

  • desired_goal (Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]) – Desired goal. If set to None, one is supposed to use the internal buffer ‘_desired_goal’ instead. Optional: None by default.

Return type

bool

compute_reward(achieved_goal=None, desired_goal=None, *, info)[source]

Compute the reward for any given episode state.

Parameters
  • achieved_goal (Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]) – Achieved goal. None to evaluate the reward for the currently achieved goal.

  • desired_goal (Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]) – Desired goal. None to evaluate the reward for the currently desired goal.

  • info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns

Total reward.

Return type

float
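Putting the documented hooks together, a minimal goal environment could be sketched as below. The goal definition (target position of the first joint), the sparse reward, the 0.05 rad threshold, and reading the configuration from ‘system_state.q’ are illustrative assumptions.

    from typing import Any, Dict
    import numpy as np
    import gym
    from gym_jiminy.common.envs.env_generic import BaseJiminyGoalEnv

    class MyGoalEnv(BaseJiminyGoalEnv):
        def _get_goal_space(self) -> gym.spaces.Space:
            # Target position of the first joint, in radians (illustrative).
            return gym.spaces.Box(
                low=-1.0, high=1.0, shape=(1,), dtype=np.float64)

        def _sample_goal(self) -> np.ndarray:
            return self._get_goal_space().sample()

        def _get_achieved_goal(self) -> np.ndarray:
            # Reading the configuration from 'system_state.q' is assumed.
            return np.array([self.system_state.q[0]])

        def compute_reward(self,
                           achieved_goal=None,
                           desired_goal=None,
                           *, info: Dict[str, Any]) -> float:
            if achieved_goal is None:
                achieved_goal = self._get_achieved_goal()
            if desired_goal is None:
                desired_goal = self._desired_goal  # internal buffer (see above)
            # Sparse reward: 1.0 within 0.05 rad of the target, 0.0 otherwise.
            return float(np.linalg.norm(achieved_goal - desired_goal) < 0.05)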

_controller_handle(t, q, v, sensors_data, command)

Thin wrapper around user-specified compute_command method.

Warning

This method is not supposed to be called manually nor overloaded.

Parameters
Return type

None

_get_sensors_space()

Get sensor space.

It gathers the sensors data in a dictionary. It maps each available type of sensor to the associated data matrix. Rows correspond to the sensor type’s fields, and columns correspond to each individual sensor.

Return type

gym.spaces.space.Space

_get_state_space(use_theoretical_model=None)

Get state space.

This method is not meant to be overloaded in general since the definition of the state space is mostly consensual. One must rather overload _refresh_observation_space to customize the observation space as a whole.

Parameters

use_theoretical_model (Optional[bool]) – Whether to compute the state space corresponding to the theoretical model instead of the actual one. None to use the internal value ‘simulator.use_theoretical_model’. Optional: None by default.

Return type

gym.spaces.space.Space

_get_time_space()

Get time space.

Return type

gym.spaces.space.Space

_key_to_action(key, obs, reward, **kwargs)

Mapping from input keyboard keys to actions.

Note

This method is systematically called before the step method, even if no key has been pressed, or the reward is not defined. In such a case, the value is None.

Note

The mapping can be state dependent, and the key can be used for something other than computing the action directly. For instance, one can provide as an extra argument to this method a custom policy taking user parameters mapped to the keyboard as input.

Warning

Overloading this method is required for calling the play_interactive method.

Parameters
  • key (Optional[str]) – Key pressed by the user as a string. None if no key has been pressed since the last step of the environment.

  • obs (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) – Previous observation from the last step of the environment. It is always available, even right after reset.

  • reward (Optional[float]) – Previous reward from last step of the environment. Not available before first step right after reset.

  • kwargs (Any) – Extra keyword argument provided by the user when calling play_interactive method.

Returns

Action to forward to the environment.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

_neutral()

Returns a neutral valid configuration for the robot.

The default implementation returns the neutral configuration if valid, the “mean” configuration otherwise (right in the middle of the position lower and upper bounds).

Warning

Beware there is no guarantee for this configuration to be statically stable.

Note

This method is called internally by ‘_sample_state’ to generate the initial state. It can be overloaded to ensure static stability of the configuration.

Return type

numpy.ndarray

_observer_handle(t, q, v, sensors_data)

TODO Write documentation.

Parameters
Return type

None

_refresh_action_space()

Configure the action space of the environment.

The action is a vector gathering the torques of the actuators of the robot.

Warning

This method is called internally by the reset method. It is not meant to be overloaded since the actual action space of the robot is uniquely defined.

Return type

None

_refresh_internal()

Refresh internal buffers.

Note

This method is called right after every internal engine.step, so it is the right place to update data shared between is_done and compute_reward. Be careful when using it to share data with refresh_observation, since the latter is called at the self.observe_dt update period, whereas the others are called at the self.step_dt update period. self.observe_dt is likely to differ from self.step_dt, unless configured otherwise by overloading the _setup method.

Return type

None

_refresh_observation_space()

Configure the observation of the environment.

By default, the observation is a dictionary gathering the current simulation time, the real robot state, and the sensors data.

Note

This method is called internally by the reset method at the very end, just before computing and returning the initial observation. This method, alongside refresh_observation, must be overloaded in order to define a custom observation space.

Return type

None

_sample_state()

Returns a valid configuration and velocity for the robot.

The default implementation returns the neutral configuration and zero velocity.

Offsets are applied on the freeflyer to ensure no contact points are going through the ground and up to three are in contact.

Note

This method is called internally by reset to generate the initial state. It can be overloaded to act as a random state generator.

Return type

Tuple[numpy.ndarray, numpy.ndarray]

_setup()

Configure the environment. It must guarantee that its internal state is valid after calling this method.

By default, it enforces some options of the engine.

Warning

Beware this method is called BEFORE observe_dt and controller_dt are properly set, so they cannot be relied upon at this point. Yet, step_dt is available and always will be. One can still access the low-level controller update period through engine_options[‘stepper’][‘controllerUpdatePeriod’].

Note

The user must overload this method to enforce a custom observer update period; otherwise it will be the same as the controller’s.

Note

This method is called internally by reset methods.

Return type

None

action_space: gym.spaces.space.Space = None
close()

Terminate the Python Jiminy engine.

Return type

None

compute_command(measure, action)

Compute the motor efforts to apply to the robot.

By default, it does not perform any processing. One is responsible for overloading this method to clip the action if necessary, to make sure it does not violate the lower and upper bounds.

Parameters
  • measure (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) – Observation of the environment.

  • action (numpy.ndarray) – Desired motor efforts.

Return type

numpy.ndarray

compute_reward_terminal(*, info)

Compute terminal reward at current episode final state.

Note

Implementation is optional. For the sake of efficiency, the terminal reward is not computed if this method is not overloaded by the user.

Warning

Similarly to compute_reward, ‘info’ can be updated by reference to log extra info for monitoring.

Parameters

info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns

Terminal reward.

Return type

float

eval()

Sets the environment in evaluation mode.

This only has an effect on certain environments. See the documentation of the particular environment for details on its behavior in training and evaluation modes, if it is affected. It can be used, for instance, to activate clipping or some filtering of the action specifically at evaluation time.

Return type

None

metadata = {'render.modes': ['human', 'rgb_array']}
static play_interactive(env, enable_travelling=None, start_paused=True, enable_is_done=True, verbose=True, **kwargs)

Activate interactive mode, enabling the robot to be controlled using the keyboard.

It stops automatically as soon as the ‘done’ flag is True. One has to press a key to start the interaction. If no key is pressed, the action is not updated and the previous one keeps being sent to the robot.

Warning

This method requires the _key_to_action method to be implemented by the user by overloading it; otherwise an exception is raised.

Parameters
  • env (Union[gym_jiminy.common.envs.env_generic.BaseJiminyEnv, gym.core.Wrapper]) – BaseJiminyEnv environment instance to play with, possibly wrapped by composition, typically using gym.Wrapper.

  • enable_travelling (Optional[bool]) – Whether or not to enable travelling, i.e. making the camera follow the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: Enabled by default if and only if the ‘panda3d’ viewer backend is used.

  • start_paused (bool) – Whether or not to start in pause. Optional: Enabled by default.

  • verbose (bool) – Whether or not to display status messages.

  • kwargs (Any) – Extra keyword arguments to forward to _key_to_action method.

  • enable_is_done (bool) –

Return type

None

plot(**kwargs)

Display common simulation data and action over time.

Parameters

kwargs (Any) – Extra keyword arguments to forward to simulator.plot.

Return type

None

refresh_observation()

Compute the observation based on the current state of the robot.

Note

This method is called at the end of every low-level Engine.step.

Note

Note that np.nan values will be automatically clipped to 0.0 by the get_observation method before being returned, so they are valid.

Warning

In practice, it updates the internal buffer directly for the sake of efficiency.

As a side note, there is no way in the current implementation to distinguish the initialization of the observation buffer from subsequent updates. The workaround is to check whether the simulation has already started. Even though it is not rigorously the same, it does the job here since it is only about preserving efficiency.

Return type

None

register_variable(name, value, fieldnames=None, namespace=None)

TODO: Write documentation.

Parameters
  • name (str) –

  • value (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) –

  • fieldnames (Optional[Union[str, Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]]) –

  • namespace (Optional[str]) –

Return type

None

render(mode='human', **kwargs)

Render the current state of the robot.

Note

Multi-rendering RGB output is not supported for now.

Parameters
  • mode (str) – Rendering mode. It can be either ‘human’ to display the current simulation state, or ‘rgb_array’ to return instead a snapshot of it as an RGB array without showing it on the screen.

  • kwargs (Any) – Extra keyword arguments to forward to jiminy_py.simulator.Simulator.render method.

Returns

RGB array if ‘mode’ is ‘rgb_array’, None otherwise.

Return type

Optional[numpy.ndarray]

replay(enable_travelling=True, **kwargs)

Replay the current episode until now.

Parameters
  • enable_travelling (bool) – Whether or not to enable travelling, i.e. making the camera follow the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: True by default.

  • kwargs (Any) – Extra keyword arguments for delegation to replay.play_trajectories method.

Return type

None

reward_range = (-inf, inf)
seed(seed=None)

Specify the seed of the environment.

Warning

It also resets the low-level Jiminy Engine. Therefore, one must call the reset method manually afterward.

Parameters

seed (Optional[int]) – Random seed, as a positive integer. Optional: A strongly random seed will be generated by gym if omitted.

Returns

Updated seed of the environment

Return type

List[numpy.uint32]

spec = None
step(action=None)

Run a simulation step for a given action.

Parameters

action (Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]) – Action to perform. None to keep the previous action unchanged.

Returns

Next observation, reward, status of the episode (done or not), and a dictionary of extra information

Return type

Tuple[Union[Dict[str, StructNested], Sequence[StructNested], ValueType], float, bool, Dict[str, Any]]

train()

Sets the environment in training mode.

Note

This mode is enabled by default.

Return type

None

property unwrapped

Completely unwrap this env.

Returns:

gym.Env: The base non-wrapped gym.Env instance

simulator: Simulator
engine: jiminy.EngineMultiRobot
stepper_state: jiminy.StepperState
system_state: jiminy.SystemState
sensors_data: jiminy.sensorsData
_registered_variables: MutableMappingT[str, Tuple[FieldNested, DataNested]]
log_headers: MappingT[str, FieldNested]
_seed: List[np.uint32]
log_path: Optional[str]
_info: Dict[str, Any]
_num_steps_beyond_done: Optional[int]
_dt_eps: Optional[float]
observe_dt: float
_observation: Optional[DataNested]
control_dt: float
_action: Optional[DataNested]