Interfaces

Controller and observer abstract interfaces from reinforcement learning, specifically designed for the Jiminy engine and defined as mixin classes. Any observer/controller block must inherit from and implement these interfaces.

class gym_jiminy.common.bases.interfaces.EngineObsType[source]

Bases: TypedDict

Raw observation provided by Jiminy Core Engine prior to any post-processing.

t: ndarray

Current simulation time.

states: Dict[str, Mapping[str, StructNested[ValueT]] | Iterable[StructNested[ValueT]] | ndarray]

State of the agent.

measurements: Dict[str, ndarray[Any, dtype[float64]]]

Sensor measurements. Individual data for each sensor are aggregated by types in 2D arrays whose first dimension gathers the measured components and second dimension corresponds to individual measurements sorted by sensor indices.
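To make the layout concrete, here is a minimal sketch of a plain dictionary mirroring the `EngineObsType` structure. The sensor type name, array shapes, and state fields are illustrative assumptions, not the exact contents produced by the engine:

```python
import numpy as np

# Hypothetical raw observation mirroring the EngineObsType layout.
# Sensor data for each type is a 2D array: rows are measured components,
# columns are individual sensors sorted by sensor index.
obs = dict(
    t=np.array(0.01),
    states=dict(
        agent=dict(q=np.zeros(7), v=np.zeros(6)),
    ),
    measurements=dict(
        # e.g. 3 measured components for 2 sensors of the same type
        EncoderSensor=np.zeros((3, 2)),
    ),
)

# Column i holds the measurements of the sensor with index i
first_encoder = obs["measurements"]["EncoderSensor"][:, 0]
```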

class gym_jiminy.common.bases.interfaces.InterfaceObserver(*args, **kwargs)[source]

Bases: ABC, Generic[ObsT, BaseObsT]

Observer interface for both observers and environments.

Initialize the observer interface.

Parameters:
  • args (Any) – Extra arguments that may be useful for mixing interfaces through multiple inheritance.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

abstract refresh_observation(measurement)[source]

Compute observed features based on the current simulation state and lower-level measure.

Parameters:

measurement (BaseObsT) – Low-level measure from the environment to process to get higher-level observation.

Return type:

None
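Since `refresh_observation` returns `None`, a concrete observer is expected to write into a pre-allocated observation buffer rather than return a value. The sketch below illustrates this convention with a self-contained stand-in class; the `TiltObserver` name, the `ImuSensor` key, and the extracted components are hypothetical:

```python
from abc import ABC, abstractmethod

import numpy as np


class Observer(ABC):
    """Minimal stand-in for the observer interface."""

    @abstractmethod
    def refresh_observation(self, measurement) -> None:
        ...


class TiltObserver(Observer):
    """Hypothetical observer extracting two components of the first IMU."""

    def __init__(self) -> None:
        self.observation = np.zeros(2)

    def refresh_observation(self, measurement) -> None:
        # Update the pre-allocated buffer in place; the method returns None.
        imu = measurement["measurements"]["ImuSensor"]
        self.observation[:] = imu[:2, 0]


observer = TiltObserver()
observer.refresh_observation(
    {"measurements": {"ImuSensor": np.arange(8.0).reshape(4, 2)}})
```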

class gym_jiminy.common.bases.interfaces.InterfaceController(*args, **kwargs)[source]

Bases: ABC, Generic[ActT, BaseActT]

Controller interface for both controllers and environments.

Initialize the controller interface.

Parameters:
  • args (Any) – Extra arguments that may be useful for mixing interfaces through multiple inheritance.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

abstract compute_command(action, command)[source]

Compute the action to perform by the subsequent block, namely a lower-level controller, if any, or the environment to ultimately control, based on a given high-level action.

Note

The controller is supposed to be already fully configured whenever this method might be called. Thus it can only be called manually after reset. This method does not have to deal with the initialization of the internal state; the _setup method is responsible for that.

Note

The user is expected to fetch the observation of the environment on their own, if necessary for the computation, by calling self.env.observation. Beware it will NOT contain any information provided by higher-level blocks in the pipeline.

Parameters:
  • action (ActT) – High-level target to achieve by means of the output command.

  • command (BaseActT) – Pre-allocated buffer in which to store the resulting command, namely the action to be forwarded to the subsequent block. It is updated in place.

Return type:

None
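Because the return type is `None`, a concrete controller writes the low-level command into the pre-allocated `command` buffer in place. The PD controller below is a hypothetical sketch of this convention; in a real block the state `q`, `v` would be fetched from `self.env.observation` rather than stored as attributes:

```python
import numpy as np


class PDController:
    """Hypothetical controller sketch following the in-place convention
    of compute_command: the command buffer is updated in place and the
    method returns None."""

    def __init__(self, kp: float, kd: float) -> None:
        self.kp, self.kd = kp, kd
        # Placeholder state; a real block would read it from the environment
        self.q, self.v = np.zeros(2), np.zeros(2)

    def compute_command(self, action: np.ndarray, command: np.ndarray) -> None:
        # Proportional-derivative law on the position target
        command[:] = self.kp * (action - self.q) - self.kd * self.v


ctrl = PDController(kp=10.0, kd=1.0)
command = np.zeros(2)
ctrl.compute_command(np.array([0.5, -0.5]), command)
```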

compute_reward(terminated, info)[source]

Compute the reward related to a specific control block, plus extra information that may be helpful for monitoring or debugging purposes.

For the corresponding MDP to be stationary, the computation of the reward is supposed to involve only the transition from previous to current state of the simulation (possibly comprising multiple agents) under the ongoing action.

By default, it returns 0.0 without extra information, no matter the input. The user is expected to provide an appropriate reward on their own, either by overloading this method or by wrapping the environment with ComposedJiminyEnv for modular environment pipeline design.

Parameters:
  • terminated (bool) – Whether the episode has reached the terminal state of the MDP at the current step. This flag can be used to compute a specific terminal reward.

  • info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns:

Aggregated reward for the current step.

Return type:

float
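A typical override depends only on the current transition, and may use the `terminated` flag to return a specific terminal reward. The reward shape and the `tilt` entry of `info` below are illustrative assumptions:

```python
class UprightRewardBlock:
    """Hypothetical control block overloading compute_reward."""

    def compute_reward(self, terminated: bool, info: dict) -> float:
        # Specific terminal penalty when the episode ends in failure
        if terminated:
            return -1.0
        # Dense reward depending only on the current transition
        tilt = abs(info.get("tilt", 0.0))
        return max(0.0, 1.0 - tilt)


block = UprightRewardBlock()
```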

class gym_jiminy.common.bases.interfaces.InterfaceJiminyEnv(*args, **kwargs)[source]

Bases: InterfaceObserver[ObsT, EngineObsType], InterfaceController[ActT, ndarray], Env[ObsT, ActT], Generic[ObsT, ActT]

Observer plus controller interface for both generic pipeline blocks, including environments.

Initialize the observer interface.

Parameters:
  • args (Any) – Extra arguments that may be useful for mixing interfaces through multiple inheritance.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

num_steps: ndarray[Any, dtype[int64]]

Number of simulation steps that have been performed since the last reset of the base environment.

Note

The counter is incremented before updating the observation at the end of the step, and consequently, before evaluating the reward and the termination conditions.

stop()[source]

Stop the episode immediately without waiting for a termination or truncation condition to be satisfied.

Note

This method is mainly intended for data analysis and debugging. Stopping the episode is necessary to log the final state; otherwise it will be missing from plots and viewer replays. Moreover, sensor data will not be available when replaying via the object-oriented replay method. The helper method play_logs_data must be preferred for replaying an episode that has not been stopped yet.

Return type:

None

abstract has_terminated(info)[source]

Determine whether the episode is over, because a terminal state of the underlying MDP has been reached or an aborting condition outside the scope of the MDP has been triggered.

Note

This method is called after refresh_observation, so that the internal buffer ‘observation’ is up-to-date.

Parameters:

info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns:

terminated and truncated flags.

Return type:

Tuple[bool, bool]
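The two flags distinguish reaching a terminal state of the MDP (`terminated`) from an aborting condition outside its scope, such as a step budget (`truncated`). A self-contained sketch of this split, with hypothetical `height` and step-budget conditions:

```python
class FallTerminationCheck:
    """Hypothetical has_terminated implementation: terminal MDP state
    versus an aborting condition outside the scope of the MDP."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.num_steps = 0
        self.height = 1.0  # would be read from the up-to-date observation

    def has_terminated(self, info: dict):
        terminated = self.height < 0.3            # terminal state of the MDP
        truncated = self.num_steps >= self.max_steps  # outside MDP scope
        return terminated, truncated


check = FallTerminationCheck(max_steps=100)
```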

abstract train()[source]

Sets the environment in training mode.

Return type:

None

abstract eval()[source]

Sets the environment in evaluation mode.

This only has an effect on certain environments. It can be used, for instance, to enable clipping or filtering of the action at evaluation time specifically. See the documentation of a given environment for details about its behavior in training and evaluation modes.

Return type:

None
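The train/eval pair behaves as a simple mode switch that concrete environments may act upon. A minimal sketch of the pattern, with a hypothetical class name and flag:

```python
class ModalEnv:
    """Hypothetical environment toggling behavior between modes."""

    def __init__(self) -> None:
        self._is_training = True

    def train(self) -> None:
        self._is_training = True

    def eval(self) -> None:
        # e.g. a concrete environment could enable action clipping here
        self._is_training = False

    @property
    def is_training(self) -> bool:
        return self._is_training


env = ModalEnv()
env.eval()
```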

abstract property unwrapped: BaseJiminyEnv

The underlying base environment of the pipeline this environment is part of.

abstract property step_dt: float

Get timestep of a single ‘step’.

abstract property is_training: bool

Check whether the environment is in ‘train’ or ‘eval’ mode.