Interfaces¶
Controller and observer abstract interfaces from reinforcement learning, specifically designed for the Jiminy engine and defined as mixin classes. Any observer/controller block must inherit from and implement these interfaces.
- class gym_jiminy.common.bases.interfaces.EngineObsType[source]¶
Bases: TypedDict
Raw observation provided by the Jiminy Core Engine prior to any post-processing.
- t: ndarray¶
Current simulation time.
- class gym_jiminy.common.bases.interfaces.InterfaceObserver(*args, **kwargs)[source]¶
Bases: Generic[Obs, BaseObs]
Observer interface for both observers and environments.
Initialize the observer interface.
- Parameters:
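For illustration, a minimal observer block might look as follows. This is only a hedged sketch: the refresh_observation method name is referenced in the has_terminated note further down this page, while the observation buffer handling and its layout are assumptions.

```python
# Hedged sketch of a custom observer block, assuming the interface expects
# a `refresh_observation` method (mentioned later on this page) and an
# `observation` buffer attribute to be updated in-place.
import numpy as np

from gym_jiminy.common.bases.interfaces import EngineObsType, InterfaceObserver


class TimeObserver(InterfaceObserver[np.ndarray, EngineObsType]):
    """Toy observer exposing the current simulation time as its observation."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.observation = np.zeros((1,))  # assumed pre-allocated buffer

    def refresh_observation(self, measurement: EngineObsType) -> None:
        # 't' is the documented key holding the current simulation time.
        self.observation[:] = measurement["t"]
```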
- class gym_jiminy.common.bases.interfaces.InterfaceController(*args, **kwargs)[source]¶
Bases: Generic[Act, BaseAct]
Controller interface for both controllers and environments.
Initialize the controller interface.
- Parameters:
- abstract compute_command(action, command)[source]¶
Compute the action to be performed by the subsequent block, namely a lower-level controller, if any, or the environment to ultimately control, based on a given high-level action.
Note
The controller is supposed to be already fully configured whenever this method might be called. Thus it can only be called manually after reset. This method does not have to deal with the initialization of the internal state, as the _setup method takes care of it.
Note
Users are expected to fetch the observation of the environment by themselves if necessary to carry out their computations, by calling self.env.observation. Beware that it will NOT contain any information provided by higher-level blocks in the pipeline.
- Parameters:
action (Act) – High-level action, i.e. the target to achieve by means of the output command.
command (BaseAct) – Pre-allocated buffer in which to store the computed command, updated in-place.
- Return type:
None
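For illustration, a hedged sketch of a PD-like controller overriding this method is given below. The in-place update of the pre-allocated command buffer is consistent with the None return type, and fetching self.env.observation follows the note above, whereas the gains and the observation layout are purely illustrative assumptions.

```python
# Hedged sketch of `compute_command` for a PD-like controller block.
import numpy as np

from gym_jiminy.common.bases.interfaces import InterfaceController


class PDController(InterfaceController[np.ndarray, np.ndarray]):
    kp: float = 100.0  # assumed proportional gain
    kd: float = 1.0    # assumed derivative gain

    def compute_command(self, action: np.ndarray, command: np.ndarray) -> None:
        # Fetch the environment observation manually, as the note advises.
        obs = self.env.observation
        # Assumed layout: first half positions, second half velocities.
        q, v = np.split(obs, 2)
        # Store the resulting motor command in the buffer in-place.
        command[:] = self.kp * (action - q) - self.kd * v
```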
- compute_reward(terminated, info)[source]¶
Compute the reward related to a specific control block, plus extra information that may be helpful for monitoring or debugging purposes.
For the corresponding MDP to be stationary, the computation of the reward is supposed to involve only the transition from previous to current state of the simulation (possibly comprising multiple agents) under the ongoing action.
By default, it returns 0.0 with no extra information, no matter what. Users are expected to provide an appropriate reward on their own, either by overloading this method or by wrapping the environment with ComposedJiminyEnv for modular environment pipeline design.
- Parameters:
terminated (bool) – Whether the episode has reached a terminal state of the underlying MDP.
info (Dict[str, Any]) – Dictionary of extra information for monitoring or debugging.
- Returns:
Aggregated reward for the current step.
- Return type:
float
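For illustration, a hedged sketch of overloading this method in a derived environment, assuming the import path of BaseJiminyEnv and using a purely illustrative survival bonus as reward shaping:

```python
# Hedged sketch: overloading `compute_reward` in a derived environment.
from typing import Any, Dict

from gym_jiminy.common.envs import BaseJiminyEnv  # assumed import path


class SurvivalRewardEnv(BaseJiminyEnv):
    def compute_reward(self, terminated: bool, info: Dict[str, Any]) -> float:
        # Only the current transition is involved, keeping the MDP stationary.
        return 0.0 if terminated else 1.0
```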
- class gym_jiminy.common.bases.interfaces.InterfaceJiminyEnv(*args, **kwargs)[source]¶
Bases: InterfaceObserver[Obs, EngineObsType], InterfaceController[Act, ndarray], Env[Obs, Act], Generic[Obs, Act]
Observer plus controller interface for both generic pipeline blocks and environments.
Initialize the observer interface.
- Parameters:
- log_fieldnames: Mapping[str, Mapping[str, StructNested[ValueT]] | Sequence[StructNested[ValueT]] | str]¶
Fieldnames associated with all the variables that have been recorded to the telemetry by any of the layers of the whole pipeline environment.
- num_steps: ndarray[Any, dtype[int64]]¶
Number of simulation steps that have been performed since the last reset of the base environment.
Note
The counter is incremented before updating the observation at the end of the step, and consequently, before evaluating the reward and the termination conditions.
- property log_data: Dict[str, Any]¶
Get log data associated with the ongoing simulation if any, the previous one otherwise.
See also
See Simulator.log_data documentation for details.
- abstract stop()[source]¶
Stop the underlying simulation completely.
Note
This method has nothing to do with termination and/or truncation of episodes. Calling it manually should never be necessary for collecting samples during training.
Note
This method is mainly intended for evaluation analysis and debugging. Stopping the episode is only necessary for switching between training and evaluation mode, and to log the final state, otherwise it will be missing from plots and viewer replay (see InterfaceJiminyEnv.log_data for details). Moreover, sensor data will not be available when calling replay. The helper method jiminy_py.viewer.replay.play_logs_data must be preferred to replay an episode that cannot be stopped.
Note
Resuming a simulation is not supported, which means that calling reset to start a new simulation is necessary before calling step once again. Failing to do so will trigger an exception.
- Return type:
None
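A hedged usage sketch, assuming env is some InterfaceJiminyEnv instance driven by a placeholder random policy:

```python
# Run one episode, then stop the simulation so that the final state is
# logged before inspecting the data (see the notes above).
env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # placeholder random policy
    _, _, terminated, truncated, _ = env.step(action)
env.stop()               # finalize the log, including the final state
log_vars = env.log_data  # now covers the complete episode
```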
- abstract plot(enable_block_states=False, **kwargs)[source]¶
Plot figures of simulation data over time associated with the ongoing episode until now if any, the previous one otherwise.
- Parameters:
enable_block_states (bool) – Whether to plot the internal state of the pipeline blocks, if any. Optional: False by default.
kwargs (Any) – Implementation-specific extra keyword arguments if any.
- Return type:
- abstract replay(**kwargs)[source]¶
Replay the ongoing episode until now if any, the previous one otherwise.
- Parameters:
kwargs (Any) – Implementation-specific extra keyword arguments if any.
- Return type:
None
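A hedged usage sketch chaining both rendering methods once an episode has been stopped (see stop above); no implementation-specific keyword arguments are assumed:

```python
env.plot(enable_block_states=True)  # time series of the recorded variables
env.replay()                        # 3D replay of the same episode
```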
- abstract evaluate(policy_fn, seed=None, horizon=None, enable_stats=True, enable_replay=None, **kwargs)[source]¶
Evaluate a policy on the environment over a complete episode.
Warning
It ignores any top-level gym.Wrapper that may be used for training but is not considered part of the environment pipeline.
- Parameters:
policy_fn (Callable[[Obs, Act | None, SupportsFloat | None, bool, bool, Dict[str, Any]], Act]) – Policy to evaluate as a callback function. It must have the following signature (reward = None at reset):

policy_fn(obs: Obs,
          action_prev: Optional[Act],
          reward: Optional[float],
          terminated: bool,
          truncated: bool,
          info: InfoType) -> Act  # action

seed (int | None) – Seed of the environment to be used for the evaluation of the policy. Optional: None by default. If not specified, then a strongly random seed will be generated by gym.
horizon (float | None) – Horizon of the simulation before early termination. None to disable. Optional: None by default.
enable_stats (bool) – Whether to print high-level statistics after the simulation. Optional: Enabled by default.
enable_replay (bool | None) – Whether to enable replay of the simulation, and possibly recording if the extra keyword argument record_video_path is provided. Optional: Enabled by default if a display is available, disabled otherwise.
kwargs (Any) – Extra keyword arguments to forward to the replay method if replay is requested.
- Return type:
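A hedged usage sketch wiring a do-nothing policy through the documented callback signature; the zero action is purely illustrative:

```python
import numpy as np


def policy_fn(obs, action_prev, reward, terminated, truncated, info):
    # `reward` is None on the very first call following reset.
    return np.zeros(env.action_space.shape)  # do-nothing action


env.evaluate(policy_fn, seed=0, horizon=10.0, enable_stats=True)
```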
- abstract play_interactive(enable_travelling=None, start_paused=True, enable_is_done=True, verbose=True, **kwargs)[source]¶
Interactive evaluation mode where the robot or the world itself is actively “controlled” via keyboard inputs, with real-time rendering.
This method is not available for all pipeline environments. When it is, the available interactions and keyboard mapping are completely implementation-specific. Please refer to the documentation of the base environment being considered for details.
Warning
It ignores any top-level gym.Wrapper that may be used for training but is not considered part of the pipeline environment.
- Parameters:
enable_travelling (bool | None) – Whether to enable travelling, following the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: Enabled by default if the ‘panda3d’ viewer backend is used.
start_paused (bool) – Whether to start paused. Optional: Enabled by default.
verbose (bool) – Whether to display status messages.
kwargs (Any) – Implementation-specific extra keyword arguments if any.
enable_is_done (bool)
- Return type:
None
- abstract has_terminated(info)[source]¶
Determine whether the episode is over, either because a terminal state of the underlying MDP has been reached or because an aborting condition outside the scope of the MDP has been triggered.
Note
This method is called after refresh_observation, so that the internal buffer ‘observation’ is up-to-date.
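For illustration, a hedged sketch of a fall-detection termination condition. The (terminated, truncated) return pair, the import path of BaseJiminyEnv, and the observation layout are all assumptions:

```python
# Hedged sketch of a custom termination condition.
from typing import Any, Dict, Tuple

from gym_jiminy.common.envs import BaseJiminyEnv  # assumed import path


class FallDetectionEnv(BaseJiminyEnv):
    def has_terminated(self, info: Dict[str, Any]) -> Tuple[bool, bool]:
        # The 'observation' buffer is up-to-date at this point (see note).
        # Assumed layout: base height stored as the first entry.
        fallen = bool(self.observation[0] < 0.2)
        return fallen, False  # terminated if fallen, never truncated here
```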
- abstract set_wrapper_attr(name, value, *, force=True)[source]¶
Assign a specified value to an attribute in the first layer of the pipeline environment where it already exists, going from this wrapper down to the base environment. If the attribute does not exist in any layer, it is directly added to this wrapper.
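A hedged usage sketch; the attribute name is hypothetical and only illustrates the lookup order described above:

```python
env.set_wrapper_attr("debug", True)  # set on the first layer defining it
```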
- abstract train(mode=True)[source]¶
Sets the environment in training or evaluation mode.
- Parameters:
mode (bool) – Whether to set training (True) or evaluation mode (False). Optional: True by default.
- Return type:
None
- eval()[source]¶
Sets the environment in evaluation mode.
This only has an effect on certain environments. It can be used, for instance, to enable clipping or filtering of the action at evaluation time specifically. See the documentation of a given environment for details about its behavior in training and evaluation modes.
- Return type:
None
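A hedged usage sketch toggling between both modes:

```python
env.train()            # training mode (mode=True by default)
# ... collect training samples ...
env.train(mode=False)  # switch to evaluation mode
env.eval()             # same effect, as a convenience shortcut
```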
- abstract property unwrapped: BaseJiminyEnv¶
The underlying environment at the basis of the pipeline of which this environment is part.