Interfaces

Controller and observer abstract interfaces from reinforcement learning, specifically designed for the Jiminy engine, and defined as mixin classes. Any observer/controller block must inherit from and implement these interfaces.

class gym_jiminy.common.bases.interfaces.EngineObsType[source]

Bases: TypedDict

Raw observation provided by Jiminy Core Engine prior to any post-processing.

t: ndarray

Current simulation time.

states: Dict[str, Mapping[str, StructNested[ValueT]] | Iterable[StructNested[ValueT]] | ndarray]

State of the agent.

measurements: Dict[str, ndarray[Any, dtype[float64]]]

Sensor measurements. Data from individual sensors are aggregated by sensor type into 2D arrays whose first dimension gathers the measured components and whose second dimension corresponds to individual sensors, sorted by sensor index.
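For illustration only, a minimal sketch of how such a raw observation might be consumed, assuming the field layout documented above (the actual sensor type keys and state structure depend on the robot at hand):

    from gym_jiminy.common.bases.interfaces import EngineObsType

    def summarize_engine_obs(obs: EngineObsType) -> None:
        # 't' holds the current simulation time as a numpy array.
        print("time:", obs["t"])
        # 'states' maps agent names to (possibly nested) state structures.
        for agent, state in obs["states"].items():
            print("state of", agent, ":", type(state).__name__)
        # 'measurements' aggregates sensor data by type in 2D arrays:
        # the first dimension gathers the measured components, the second
        # corresponds to individual sensors sorted by index.
        for sensor_type, data in obs["measurements"].items():
            n_components, n_sensors = data.shape
            print(sensor_type, ":", n_components, "components x",
                  n_sensors, "sensors")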

class gym_jiminy.common.bases.interfaces.InterfaceObserver(*args, **kwargs)[source]

Bases: ABC, Generic[ObsT, BaseObsT]

Observer interface for both observers and environments.

Initialize the observer interface.

Parameters:
  • args (Any) – Extra arguments that may be useful when mixing interfaces through multiple inheritance.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

abstract refresh_observation(measurement)[source]

Compute observed features based on the current simulation state and the lower-level measurement.

Parameters:

measurement (BaseObsT) – Low-level measurement from the environment, to be processed into a higher-level observation.

Return type:

None
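As a minimal sketch (not part of the API reference), a custom observer could implement this method by writing into a pre-allocated observation buffer. The class name is hypothetical, and the 'observation' attribute being set up in the constructor is an assumption for illustration:

    import numpy as np
    from gym_jiminy.common.bases.interfaces import (
        EngineObsType, InterfaceObserver)

    class TimeObserver(InterfaceObserver[np.ndarray, EngineObsType]):
        """Hypothetical observer exposing the simulation time only."""

        def __init__(self, *args, **kwargs) -> None:
            super().__init__(*args, **kwargs)
            # Pre-allocated observation buffer, refreshed in place.
            self.observation = np.zeros(1)

        def refresh_observation(self, measurement: EngineObsType) -> None:
            # Write into the existing buffer rather than re-allocating it,
            # so that downstream blocks keep valid references to it.
            self.observation[0] = measurement["t"]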

class gym_jiminy.common.bases.interfaces.InterfaceController(*args, **kwargs)[source]

Bases: ABC, Generic[ActT, BaseActT]

Controller interface for both controllers and environments.

Initialize the controller interface.

Parameters:
  • args (Any) – Extra arguments that may be useful when mixing interfaces through multiple inheritance.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

abstract compute_command(action, command)[source]

Compute the action to be performed by the subsequent block, namely a lower-level controller if any, or otherwise the environment to ultimately control, based on a given high-level action.

Note

The controller is supposed to be already fully configured whenever this method might be called. Thus it can only be called manually after reset. This method is not meant to handle the initialization of the internal state; the _setup method is responsible for that.

Note

The user is expected to fetch the observation of the environment by themselves, if necessary to carry out their computations, by calling self.env.observation. Beware it will NOT contain any information provided by higher-level blocks in the pipeline.

Parameters:
  • action (ActT) – High-level target to achieve by means of the output command.

  • command (BaseActT) – Lower-level command to update in place.

Return type:

None
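As a hedged sketch, a proportional-derivative block might map a high-level position target to a motor effort command, updating 'command' in place. The class name and gains are arbitrary, and the 'q' / 'v' layout of the agent state fetched through self.env.observation is an assumption about the pipeline at hand:

    import numpy as np
    from gym_jiminy.common.bases.interfaces import InterfaceController

    class PDController(InterfaceController[np.ndarray, np.ndarray]):
        """Hypothetical PD block mapping position targets to motor efforts."""

        kp: float = 50.0   # proportional gain, arbitrary for illustration
        kd: float = 1.0    # derivative gain, arbitrary for illustration

        def compute_command(self,
                            action: np.ndarray,
                            command: np.ndarray) -> None:
            # Fetch the environment observation manually, as advised above.
            # 'self.env' is assumed to reference the controlled environment,
            # and the 'q' / 'v' state layout is an assumption.
            state = self.env.observation["states"]["agent"]
            q, v = state["q"], state["v"]
            # Update the command buffer in place; nothing is returned.
            command[:] = self.kp * (action - q) - self.kd * v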

compute_reward(terminated, truncated, info)[source]

Compute the reward related to a specific control block.

For the corresponding MDP to be stationary, the computation of the reward is supposed to involve only the transition from the previous to the current state of the simulation (possibly comprising multiple agents) under the ongoing action.

By default, it returns 0.0 no matter what. It is up to the user to provide a dedicated reward function whenever appropriate.

Warning

Only returning an aggregated scalar reward is supported. Yet, it is possible to update 'info' by reference if one wants to keep track of individual reward components or any other extra information that may be helpful for monitoring or debugging purposes.

Parameters:
  • terminated (bool) – Whether the episode has reached the terminal state of the MDP at the current step. This flag can be used to compute a specific terminal reward.

  • truncated (bool) – Whether a truncation condition outside the scope of the MDP has been satisfied at the current step. This flag can be used to adapt the reward.

  • info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns:

Aggregated reward for the current step.

Return type:

float
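For instance, a block could override this method to penalize large commands while logging the individual components in 'info', as sketched below. 'self.command' is an assumed buffer holding the command sent at the current step, and the penalty weights are arbitrary:

    from typing import Any, Dict
    import numpy as np

    def compute_reward(self,
                       terminated: bool,
                       truncated: bool,
                       info: Dict[str, Any]) -> float:
        # Quadratic penalty on the latest command. 'self.command' is an
        # assumed buffer holding the command sent at the current step.
        effort = -1e-3 * float(np.sum(np.square(self.command)))
        # Specific terminal reward exploiting the 'terminated' flag.
        terminal = -1.0 if terminated else 0.0
        # Update 'info' by reference to log individual reward components.
        info["reward_components"] = {"effort": effort, "terminal": terminal}
        return effort + terminal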

class gym_jiminy.common.bases.interfaces.InterfaceJiminyEnv(*args, **kwargs)[source]

Bases: InterfaceObserver[ObsT, EngineObsType], InterfaceController[ActT, ndarray], Env[ObsT, ActT], Generic[ObsT, ActT]

Observer plus controller interface for generic pipeline blocks, including environments.

Initialize the observer and controller interfaces.

Parameters:
  • args (Any) – Extra arguments that may be useful when mixing interfaces through multiple inheritance.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

property unwrapped: InterfaceJiminyEnv

Base environment of the pipeline.

abstract property step_dt: float

Get timestep of a single ‘step’.

abstract property is_training: bool

Check whether the environment is in ‘train’ or ‘eval’ mode.

abstract train()[source]

Sets the environment in training mode.

Return type:

None

abstract eval()[source]

Sets the environment in evaluation mode.

This only has an effect on certain environments. It can be used, for instance, to enable clipping or filtering of the action at evaluation time specifically. See the documentation of a given environment for details about its behavior in training and evaluation modes.

Return type:

None
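Typical usage of the mode switches, sketched under the assumption that 'env' is any concrete InterfaceJiminyEnv implementation:

    # 'env' is assumed to be a concrete InterfaceJiminyEnv implementation.
    env.train()
    assert env.is_training        # training-specific behaviors are enabled

    env.eval()                    # may e.g. enable action clipping/filtering
    assert not env.is_training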