Generic

Controller and observer abstract interfaces from reinforcement learning, specifically designed for the Jiminy engine, and defined as mixin classes. Any observer/controller block must inherit from and implement these interfaces.
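
For instance, a block may combine both mixins through multiple inheritance. Here is a minimal sketch; the class name, space dimensions, and observation layout are hypothetical:

    import numpy as np
    import gym

    from gym_jiminy.common.bases.generic_bases import (
        ObserverInterface, ControllerInterface)


    class PassthroughBlock(ObserverInterface, ControllerInterface):
        """Hypothetical block forwarding its action as command."""

        def _refresh_observation_space(self) -> None:
            # Observe a single scalar in [-1.0, 1.0].
            self.observation_space = gym.spaces.Box(
                low=-1.0, high=1.0, shape=(1,), dtype=np.float64)

        def _refresh_action_space(self) -> None:
            # Mirror the observation space for the action.
            self.action_space = self.observation_space

        def refresh_observation(self) -> None:
            # Placeholder observation; a real block would read the
            # simulation state here.
            self._observation = np.zeros(1)

        def compute_command(self, measure, action):
            # Forward the high-level action unchanged.
            return action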

class gym_jiminy.common.bases.generic_bases.ObserverInterface(*args, **kwargs)[source]

Bases: object

Observer interface for both observers and environments.

Initialize the observation interface.

It only allocates some attributes.

Parameters
  • args (Any) – Extra arguments that may be useful for mixin classes combined through multiple inheritance.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

Return type

None

observe_dt: float
observation_space: Optional[gym.spaces.space.Space]
_observation: Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]
get_observation()[source]

Get post-processed observation.

By default, it does not perform any post-processing. It is the user's responsibility to clip the observation if necessary, to make sure it does not violate the lower and upper bounds. This can be done either by overloading this method or, in the case of a pipeline design, by adding a clipping observation block at the very end.

Warning

In most cases, it is not necessary to overload this method, and doing so may lead to unexpected behavior if not done carefully.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]
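
For example, clipping can be implemented by overloading this method. A minimal sketch, assuming the observation space is a gym.spaces.Box:

    import numpy as np

    def get_observation(self):
        # Clip the raw observation to the space bounds before returning it.
        return np.clip(self._observation,
                       self.observation_space.low,
                       self.observation_space.high)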

_refresh_observation_space()[source]

Configure the observation space.

Return type

None
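
A derived class typically overloads this method to declare its observation space. A sketch with a hypothetical nested layout of measured positions and velocities for 7 joints:

    import numpy as np
    import gym

    def _refresh_observation_space(self) -> None:
        # Nested observation space: joint positions and velocities.
        self.observation_space = gym.spaces.Dict({
            'q': gym.spaces.Box(
                low=-np.pi, high=np.pi, shape=(7,), dtype=np.float64),
            'v': gym.spaces.Box(
                low=-np.inf, high=np.inf, shape=(7,), dtype=np.float64)})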

refresh_observation(*args, **kwargs)[source]

Update the observation based on the current simulation state.

Parameters
  • args (Any) – Extra arguments that may be useful to derived implementations.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

Return type

None
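
A derived implementation may take whatever extra arguments it needs, such as the current time, configuration, and velocity. A minimal sketch with this hypothetical signature:

    import numpy as np

    def refresh_observation(self, t, q, v) -> None:
        # Update the pre-allocated buffer in-place rather than
        # re-allocating it, since this runs at every simulation step.
        np.copyto(self._observation, np.concatenate((q, v)))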

class gym_jiminy.common.bases.generic_bases.ControllerInterface(*args, **kwargs)[source]

Bases: object

Controller interface for both controllers and environments.

Initialize the control interface.

It only allocates some attributes.

Parameters
  • args (Any) – Extra arguments that may be useful for mixin classes combined through multiple inheritance.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

Return type

None

control_dt: float
action_space: Optional[gym.spaces.space.Space]
_action: Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]
_refresh_action_space()[source]

Configure the action space of the controller.

Note

This method is called right after _setup, so both the environment to control and the controller itself should already be initialized.

Return type

None
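
A typical overload declares the action space from the actuator limits. A sketch with hypothetical effort limits:

    import numpy as np
    import gym

    def _refresh_action_space(self) -> None:
        # Bound the action by the motor effort limits (placeholder values).
        effort_limit = np.full((7,), 10.0)  # [N.m]
        self.action_space = gym.spaces.Box(
            low=-effort_limit, high=effort_limit, dtype=np.float64)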

compute_command(measure, action)[source]

Compute the action to be performed by the subsequent block, namely a lower-level controller if any, or the environment to ultimately control, based on a given high-level action.

Note

The controller is supposed to be already fully configured whenever this method might be called. Thus it can only be called manually after reset. This method does not have to deal with the initialization of the internal state; the _setup method does so.

Parameters
  • measure (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) – Observation of the environment.

  • action (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) – Target to achieve.

Returns

Action to perform.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]
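
For instance, a PD controller block maps a target position to a motor torque. A minimal sketch; the gains and the layout of 'measure' are hypothetical:

    def compute_command(self, measure, action):
        # PD feedback: 'measure' is assumed to be a dict of measured
        # positions 'q' and velocities 'v', 'action' the target positions.
        kp, kd = 100.0, 5.0  # hypothetical gains
        return kp * (action - measure['q']) - kd * measure['v']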

compute_reward(*args, info, **kwargs)[source]

Compute reward at current episode state.

Note

This method is called after updating the internal buffer ‘_num_steps_beyond_done’, which is None while the simulation is not done, 0 right after it is done, and incremented at every subsequent step.

Parameters
  • args (Any) – Extra arguments that may be useful for derived environments, for example Gym.GoalEnv.

  • info (Dict[str, Any]) – Dictionary of extra information for monitoring.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

Returns

Total reward.

Return type

float
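
A minimal sketch of an overload, assuming the block also derives from ObserverControllerInterface so that system_state is available; the base-height index and target are hypothetical:

    def compute_reward(self, *args, info, **kwargs):
        # Penalize deviation of the base height from a 1.0m target,
        # and log the raw error by updating 'info' by reference.
        height_err = abs(self.system_state.q[2] - 1.0)
        info['height_err'] = height_err
        return -height_err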

compute_reward_terminal(*, info)[source]

Compute terminal reward at current episode final state.

Note

Implementation is optional. If not overloaded by the user, no terminal reward is computed, for the sake of efficiency.

Warning

Similarly to compute_reward, ‘info’ can be updated by reference to log extra info for monitoring.

Parameters

info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns

Terminal reward.

Return type

float
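
A sketch of an optional overload granting a fixed, hypothetical bonus at episode end, logging it through ‘info’ by reference:

    def compute_reward_terminal(self, *, info):
        # Grant a fixed bonus at episode end and log it for monitoring.
        info['terminal_bonus'] = 10.0
        return 10.0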

class gym_jiminy.common.bases.generic_bases.ObserverControllerInterface(*args, **kwargs)[source]

Bases: gym_jiminy.common.bases.generic_bases.ObserverInterface, gym_jiminy.common.bases.generic_bases.ControllerInterface

Observer plus controller interface for generic pipeline blocks, including environments.

Initialize the observation interface.

It only allocates some attributes.

Parameters
  • args (Any) – Extra arguments that may be useful for mixin classes combined through multiple inheritance.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

Return type

None

simulator: Optional[jiminy_py.simulator.Simulator]
stepper_state: Optional[jiminy_py.core.StepperState]
system_state: Optional[jiminy_py.core.SystemState]
sensors_data: Optional[Dict[str, numpy.ndarray]]
_setup()[source]

Configure the observer-controller.

Note

This method must be called once, after the environment has been reset. This is done automatically when calling the reset method.

Return type

None
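
A derived class usually extends this method to initialize its internal state, chaining up to the base implementation. A minimal sketch with a hypothetical integral-error buffer:

    def _setup(self) -> None:
        # Configure the base observer-controller first.
        super()._setup()
        # Reset hypothetical internal state at every reset.
        self._integral_error = 0.0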

_observer_handle(t, q, v, sensors_data)[source]

Update the observation at the current simulation time. This method is the observation counterpart of _controller_handle.

Parameters
  • t (float) – Current simulation time.

  • q (numpy.ndarray) – Current actual configuration of the robot.

  • v (numpy.ndarray) – Current actual velocity vector.

  • sensors_data (jiminy_py.core.sensorsData) – Current sensor data.

Return type

None

_controller_handle(t, q, v, sensors_data, command)[source]

This method is the main entry-point to interact with the simulator.

Warning

This method is not supposed to be called manually nor overloaded. It must be passed to set_controller_handle so that the controller can send commands directly to the robot.

Parameters
  • t (float) – Current simulation time.

  • q (numpy.ndarray) – Current actual configuration of the robot. Note that it is not that of the theoretical model, even if ‘use_theoretical_model’ is enabled for the backend Python Simulator.

  • v (numpy.ndarray) – Current actual velocity vector.

  • sensors_data (jiminy_py.core.sensorsData) – Current sensor data. Note that this is raw data: it is not an actual dictionary, though it behaves like one.

  • command (numpy.ndarray) – Output argument to update by reference, using [:] or np.copyto, in order to apply motor torques to the robot.

Return type

None
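
The update-by-reference convention for ‘command’ means that the buffer must be written in-place and never re-bound. A short standalone illustration:

    import numpy as np

    command = np.zeros(7)        # buffer shared with the simulator
    torques = np.ones(7)

    command[:] = torques         # correct: writes into the shared buffer
    np.copyto(command, torques)  # equivalent alternative
    command = torques            # wrong: re-binds the local name only;
                                 # the simulator never sees the update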