Interfaces¶
Abstract controller and observer interfaces for reinforcement learning, specifically designed for the Jiminy engine and defined as mixin classes. Any observer or controller block must inherit from and implement these interfaces.
- class gym_jiminy.common.bases.interfaces.EngineObsType[source]¶
Bases: TypedDict
Raw observation provided by Jiminy Core Engine prior to any post-processing.
- t: ndarray¶
Current simulation time.
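As a concrete illustration, a raw observation of this kind can be mimicked with a minimal TypedDict. Note that `RawObs` and its single field are a hypothetical sketch for illustration, not the actual gym_jiminy definition:

```python
import numpy as np
from typing import TypedDict


class RawObs(TypedDict):
    """Hypothetical sketch of a raw engine observation dict."""
    t: np.ndarray  # current simulation time, stored as a 0-dim array


# Build and access an observation
obs: RawObs = {"t": np.array(0.5)}
print(float(obs["t"]))  # -> 0.5
```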
- class gym_jiminy.common.bases.interfaces.InterfaceObserver(*args, **kwargs)[source]¶
Bases: ABC, Generic[ObsT, BaseObsT]
Observer interface for both observers and environments.
Initialize the observer interface.
- class gym_jiminy.common.bases.interfaces.InterfaceController(*args, **kwargs)[source]¶
Bases: ABC, Generic[ActT, BaseActT]
Controller interface for both controllers and environments.
Initialize the controller interface.
- abstract compute_command(action, command)[source]¶
Compute the action to perform by the subsequent block, namely a lower-level controller, if any, or the environment to ultimately control, based on a given high-level action.
Note
The controller is supposed to be already fully configured whenever this method might be called. Thus it can only be called manually after reset. This method is not meant to deal with the initialization of the internal state; the _setup method does so.
Note
The user is expected to fetch the observation of the environment themselves, if necessary to carry out the computations, by calling self.env.observation. Beware it will NOT contain any information provided by higher-level blocks in the pipeline.
- Parameters:
action (ActT) – Target to achieve by means of the command.
command (BaseActT) – Lower-level command to update in place.
- Return type:
None
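The contract above can be sketched with a standalone toy controller. Note that `PDController`, its gains, and its state buffers are all hypothetical and not part of gym_jiminy; the point is only that the high-level action is consumed and the pre-allocated command buffer is updated in place, with nothing returned:

```python
import numpy as np


class PDController:
    """Hypothetical block illustrating the compute_command contract."""

    def __init__(self, kp: float, kd: float) -> None:
        self.kp, self.kd = kp, kd
        # Measured state; a real block would fetch it from self.env.observation.
        self.q = np.zeros(2)
        self.v = np.zeros(2)

    def compute_command(self, action: np.ndarray, command: np.ndarray) -> None:
        # Write the motor torques into the pre-allocated command buffer.
        command[:] = self.kp * (action - self.q) - self.kd * self.v


controller = PDController(kp=10.0, kd=1.0)
command = np.zeros(2)
controller.compute_command(np.array([1.0, -1.0]), command)
print(command)  # -> [ 10. -10.]
```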
- compute_reward(terminated, truncated, info)[source]¶
Compute the reward related to a specific control block.
For the corresponding MDP to be stationary, the computation of the reward is supposed to involve only the transition from previous to current state of the simulation (possibly comprising multiple agents) under the ongoing action.
By default, it returns 0.0 no matter what. It is up to the user to provide a dedicated reward function whenever appropriate.
Warning
Only returning an aggregated scalar reward is supported. However, 'info' can be updated by reference to keep track of individual reward components, or of any extra information that may be helpful for monitoring or debugging purposes.
- Parameters:
terminated (bool) – Whether the episode has reached the terminal state of the MDP at the current step. This flag can be used to compute a specific terminal reward.
truncated (bool) – Whether a truncation condition outside the scope of the MDP has been satisfied at the current step. This flag can be used to adapt the reward.
info (Dict[str, Any]) – Dictionary of extra information for monitoring.
- Returns:
Aggregated reward for the current step.
- Return type:
float
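A minimal sketch of overriding this method, assuming hypothetical reward components stored on the block (`TrackingRewardMixin` and its attributes are illustrative, not gym_jiminy API): only the aggregated scalar is returned, while 'info' is updated by reference for monitoring.

```python
from typing import Any, Dict


class TrackingRewardMixin:
    """Hypothetical block with a reward split into named components."""

    def __init__(self) -> None:
        # Stand-ins for quantities that would be computed from the simulation.
        self.tracking_error = 0.5
        self.energy_cost = 0.25

    def compute_reward(self,
                       terminated: bool,
                       truncated: bool,
                       info: Dict[str, Any]) -> float:
        reward_tracking = -self.tracking_error
        reward_energy = -self.energy_cost
        # Expose individual components for debugging; only the scalar is returned.
        info["reward_components"] = {
            "tracking": reward_tracking, "energy": reward_energy}
        return reward_tracking + reward_energy


block = TrackingRewardMixin()
info: Dict[str, Any] = {}
reward = block.compute_reward(False, False, info)
print(reward)  # -> -0.75
```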
- class gym_jiminy.common.bases.interfaces.InterfaceJiminyEnv(*args, **kwargs)[source]¶
Bases: InterfaceObserver[ObsT, EngineObsType], InterfaceController[ActT, ndarray], Env[ObsT, ActT], Generic[ObsT, ActT]
Observer plus controller interface for both generic pipeline blocks and environments.
Initialize the observer interface.
- property unwrapped: InterfaceJiminyEnv¶
Base environment of the pipeline.
- abstract eval()[source]¶
Sets the environment in evaluation mode.
This only has an effect on certain environments. It can be used, for instance, to enable clipping or filtering of the action at evaluation time specifically. See the documentation of a given environment for details about its behavior in training and evaluation modes.
- Return type:
None
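A toy sketch of such a train/evaluation switch follows. `EvalModeEnv`, its `train` method, and the action-clipping behavior are all hypothetical, merely illustrating the kind of mode-dependent processing described above:

```python
class EvalModeEnv:
    """Hypothetical environment with a train/evaluation mode switch."""

    def __init__(self) -> None:
        self.training = True

    def eval(self) -> None:
        """Set the environment in evaluation mode."""
        self.training = False

    def train(self) -> None:
        """Set the environment back in training mode."""
        self.training = True

    def process_action(self, action: float) -> float:
        # Clip the action at evaluation time only (assumed behavior).
        if not self.training:
            return max(-1.0, min(1.0, action))
        return action


env = EvalModeEnv()
print(env.process_action(2.0))  # training mode: unchanged -> 2.0
env.eval()
print(env.process_action(2.0))  # evaluation mode: clipped -> 1.0
```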