Interfaces¶
Controller and observer abstract interfaces from reinforcement learning, specifically designed for the Jiminy engine and defined as mixin classes. Any observer/controller block must inherit from and implement these interfaces.
- class gym_jiminy.common.bases.interfaces.EngineObsType[source]¶
Bases: TypedDict
Raw observation provided by Jiminy Core Engine prior to any post-processing.
- t: ndarray¶
Current simulation time.
- class gym_jiminy.common.bases.interfaces.InterfaceObserver(*args, **kwargs)[source]¶
Bases: ABC, Generic[ObsT, BaseObsT]
Observer interface for both observers and environments.
Initialize the observer interface.
- Parameters:
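To illustrate the observer side of the mixin pattern, here is a minimal standalone sketch of an observer block that post-processes a raw measurement into its own pre-allocated observation buffer. The class name `LowPassObserver`, the measurement key `"q"`, and the smoothing scheme are all assumptions for illustration, not part of the actual `InterfaceObserver` API.

```python
import numpy as np

# Hypothetical observer block mirroring the mixin pattern: it refreshes a
# pre-allocated observation buffer in-place from a raw measurement. Method
# name and measurement layout are illustrative assumptions.
class LowPassObserver:
    def __init__(self, size: int, alpha: float) -> None:
        self.alpha = alpha
        # Pre-allocated observation buffer, refreshed in-place at every step.
        self.observation = np.zeros(size)

    def refresh_observation(self, measurement: dict) -> None:
        # Exponential smoothing of the raw measurement (written in-place).
        raw = measurement["q"]
        self.observation += self.alpha * (raw - self.observation)
```

Updating the buffer in-place rather than re-allocating it each step matches the efficiency-oriented style of the rest of the interface.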
- class gym_jiminy.common.bases.interfaces.InterfaceController(*args, **kwargs)[source]¶
Bases: ABC, Generic[ActT, BaseActT]
Controller interface for both controllers and environments.
Initialize the controller interface.
- Parameters:
- abstract compute_command(action, command)[source]¶
Compute the action to perform by the subsequent block, namely a lower-level controller, if any, or the environment to ultimately control, based on a given high-level action.
Note
The controller is supposed to be already fully configured whenever this method might be called. Thus it can only be called manually after reset. This method does not have to deal with the initialization of the internal state; the _setup method does so.
Note
The user is expected to fetch the observation of the environment themselves, if necessary to carry out their computations, by reading self.env.observation. Beware it will NOT contain any information provided by higher-level blocks in the pipeline.
- Parameters:
action (ActT) – Target to achieve by means of the output command.
command (BaseActT) – Pre-allocated buffer in which to store the computed command, updated in-place.
- Returns:
None. The command is written in-place in command.
- Return type:
None
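As a concrete illustration of the in-place `compute_command` contract, here is a minimal sketch of a hypothetical PD controller block: the high-level `action` is interpreted as a target joint position, and the resulting motor torque is written into the pre-allocated `command` buffer rather than returned. The class `PDController`, the gains, and the way state is stored are assumptions for illustration; in a real block the measured position and velocity would be fetched from `self.env.observation`.

```python
import numpy as np

# Hypothetical controller block illustrating the in-place `compute_command`
# contract. Gains and state attributes are illustrative assumptions.
class PDController:
    def __init__(self, kp: float, kd: float) -> None:
        self.kp = kp
        self.kd = kd
        # In a real block, these would be read from `self.env.observation`.
        self.q = np.zeros(2)  # measured joint positions
        self.v = np.zeros(2)  # measured joint velocities

    def compute_command(self, action: np.ndarray, command: np.ndarray) -> None:
        # Write the low-level command in-place; return None per the interface.
        command[:] = self.kp * (action - self.q) - self.kd * self.v
```

The caller owns the `command` buffer and passes it to every block in the pipeline, which avoids per-step allocations.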
- compute_reward(terminated, info)[source]¶
Compute the reward related to a specific control block, plus extra information that may be helpful for monitoring or debugging purposes.
For the corresponding MDP to be stationary, the computation of the reward is supposed to involve only the transition from previous to current state of the simulation (possibly comprising multiple agents) under the ongoing action.
By default, it returns 0.0 without extra information, no matter what. The user is expected to provide an appropriate reward on their own, either by overloading this method or by wrapping the environment with ComposedJiminyEnv for modular environment pipeline design.
- Parameters:
- Returns:
Aggregated reward for the current step.
- Return type:
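The following is a minimal sketch of what overloading `compute_reward` might look like, written as a standalone class for illustration. The stationarity constraint is respected by basing the reward only on a quantity refreshed during the current transition (here a hypothetical cached tracking error); the class name, the attribute, and the exponential shaping are all assumptions, not part of the actual API.

```python
import numpy as np

# Hypothetical `compute_reward` override for a control block. The reward
# depends only on the current transition (a cached tracking error refreshed
# elsewhere at every step), keeping the MDP stationary. Names are illustrative.
class TrackingRewardBlock:
    def __init__(self) -> None:
        self.tracking_error = 0.0  # refreshed elsewhere at every step

    def compute_reward(self, terminated: bool, info: dict) -> float:
        if terminated:
            return 0.0
        # Bounded reward in (0, 1], maximal for perfect tracking.
        reward = float(np.exp(-self.tracking_error ** 2))
        info["tracking_error"] = self.tracking_error
        return reward
```

Storing diagnostic quantities in `info` keeps them available for monitoring without affecting the reward signal itself.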
- class gym_jiminy.common.bases.interfaces.InterfaceJiminyEnv(*args, **kwargs)[source]¶
Bases: InterfaceObserver[ObsT, EngineObsType], InterfaceController[ActT, ndarray], Env[ObsT, ActT], Generic[ObsT, ActT]
Observer plus controller interface for generic pipeline blocks, including environments.
Initialize the observer interface.
- Parameters:
- num_steps: ndarray[Any, dtype[int64]]¶
Number of simulation steps that have been performed since the last reset of the base environment.
Note
The counter is incremented before updating the observation at the end of the step, and consequently, before evaluating the reward and the termination conditions.
- stop()[source]¶
Stop the episode immediately without waiting for a termination or truncation condition to be satisfied.
Note
This method is mainly intended for data analysis and debugging. Stopping the episode is necessary to log the final state; otherwise it will be missing from plots and viewer replay. Moreover, sensor data will not be available when replaying via the object-oriented method replay. The helper method play_logs_data must be preferred for replaying an episode that cannot be stopped for the time being.
- Return type:
None
- abstract has_terminated(info)[source]¶
Determine whether the episode is over, because a terminal state of the underlying MDP has been reached or an aborting condition outside the scope of the MDP has been triggered.
Note
This method is called after refresh_observation, so that the internal buffer ‘observation’ is up-to-date.
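As a sketch of what an implementation of `has_terminated` might look like, here is a hypothetical fall-detection check written as a standalone class. A low base height is treated as a terminal state of the MDP, while the truncated flag is reserved for aborting conditions outside its scope. The pair-of-flags return convention follows Gymnasium and is an assumption here, as are the class name, the threshold, and the cached `base_height` attribute (which a real block would refresh from the up-to-date observation buffer).

```python
# Hypothetical `has_terminated` implementation. Termination (a fall) is a
# terminal state of the MDP; truncation is reserved for out-of-scope aborts.
# The (terminated, truncated) return pair follows the Gymnasium convention.
class FallTermination:
    def __init__(self, height_min: float = 0.3) -> None:
        self.height_min = height_min
        # Refreshed from the observation at every step in a real block.
        self.base_height = 1.0

    def has_terminated(self, info: dict) -> tuple:
        terminated = self.base_height < self.height_min
        truncated = False
        return terminated, truncated
```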
- abstract eval()[source]¶
Set the environment in evaluation mode.
This only has an effect on certain environments. It can be used, for instance, to enable clipping or filtering of the action specifically at evaluation time. See the documentation of a given environment for details about its behavior in training and evaluation modes.
- Return type:
None
- abstract property unwrapped: BaseJiminyEnv¶
The underlying environment at the basis of the pipeline that this environment is part of.