Interfaces¶
Controller and observer abstract interfaces from reinforcement learning, specifically designed for the Jiminy engine and defined as mixin classes. Any observer/controller block must inherit from and implement these interfaces.
- class gym_jiminy.common.bases.interfaces.EngineObsType[source]¶
Bases: TypedDict
Raw observation provided by the Jiminy Core Engine prior to any post-processing.
- t: ndarray¶
Current simulation time.
- class gym_jiminy.common.bases.interfaces.InterfaceObserver(*args, **kwargs)[source]¶
Bases: Generic[Obs, BaseObs]
Observer interface for both observers and environments.
Initialize the observer interface.
- Parameters:
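For illustration, a minimal observer block might look as follows. This is only a hedged sketch: the refresh_observation method name is referenced in the has_terminated note further down this page, while the observation buffer handling and its layout are assumptions.

```python
# Hedged sketch of a custom observer block, assuming the interface expects
# a `refresh_observation` method (mentioned later on this page) and an
# `observation` buffer attribute to be updated in-place.
import numpy as np

from gym_jiminy.common.bases.interfaces import EngineObsType, InterfaceObserver


class TimeObserver(InterfaceObserver[np.ndarray, EngineObsType]):
    """Toy observer exposing the current simulation time as its observation."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.observation = np.zeros((1,))  # assumed pre-allocated buffer

    def refresh_observation(self, measurement: EngineObsType) -> None:
        # 't' is the documented key holding the current simulation time.
        self.observation[:] = measurement["t"]
```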
- class gym_jiminy.common.bases.interfaces.InterfaceController(*args, **kwargs)[source]¶
Bases: Generic[Act, BaseAct]
Controller interface for both controllers and environments.
Initialize the controller interface.
- Parameters:
- abstract compute_command(action, command)[source]¶
Compute the action to be performed by the subsequent block, namely a lower-level controller, if any, or the environment to ultimately control, based on a given high-level action.
Note
The controller is supposed to be already fully configured whenever this method might be called. Thus it can only be called manually after reset. This method does not have to deal with the initialization of the internal state, as the _setup method takes care of it.
Note
Users are expected to fetch the observation of the environment by themselves if necessary to carry out their computations, by calling self.env.observation. Beware that it will NOT contain any information provided by higher-level blocks in the pipeline.
- Parameters:
action (Act) – High-level action, i.e. the target to achieve by means of the output command.
command (BaseAct) – Pre-allocated buffer in which to store the computed command, updated in-place.
- Return type:
None
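For illustration, a hedged sketch of a PD-like controller overriding this method is given below. The in-place update of the pre-allocated command buffer is consistent with the None return type, and fetching self.env.observation follows the note above, whereas the gains and the observation layout are purely illustrative assumptions.

```python
# Hedged sketch of `compute_command` for a PD-like controller block.
import numpy as np

from gym_jiminy.common.bases.interfaces import InterfaceController


class PDController(InterfaceController[np.ndarray, np.ndarray]):
    kp: float = 100.0  # assumed proportional gain
    kd: float = 1.0    # assumed derivative gain

    def compute_command(self, action: np.ndarray, command: np.ndarray) -> None:
        # Fetch the environment observation manually, as the note advises.
        obs = self.env.observation
        # Assumed layout: first half positions, second half velocities.
        q, v = np.split(obs, 2)
        # Store the resulting motor command in the buffer in-place.
        command[:] = self.kp * (action - q) - self.kd * v
```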
- compute_reward(terminated, info)[source]¶
Compute the reward related to a specific control block, plus extra information that may be helpful for monitoring or debugging purposes.
For the corresponding MDP to be stationary, the computation of the reward is supposed to involve only the transition from previous to current state of the simulation (possibly comprising multiple agents) under the ongoing action.
By default, it returns 0.0 with no extra information, no matter what. Users are expected to provide an appropriate reward on their own, either by overloading this method or by wrapping the environment with ComposedJiminyEnv for modular environment pipeline design.
- Parameters:
terminated (bool) – Whether the episode has reached a terminal state of the underlying MDP.
info (Dict[str, Any]) – Dictionary of extra information for monitoring or debugging.
- Returns:
Aggregated reward for the current step.
- Return type:
float
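For illustration, a hedged sketch of overloading this method in a derived environment, assuming the import path of BaseJiminyEnv and using a purely illustrative survival bonus as reward shaping:

```python
# Hedged sketch: overloading `compute_reward` in a derived environment.
from typing import Any, Dict

from gym_jiminy.common.envs import BaseJiminyEnv  # assumed import path


class SurvivalRewardEnv(BaseJiminyEnv):
    def compute_reward(self, terminated: bool, info: Dict[str, Any]) -> float:
        # Only the current transition is involved, keeping the MDP stationary.
        return 0.0 if terminated else 1.0
```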
- class gym_jiminy.common.bases.interfaces.InterfaceJiminyEnv(*args, **kwargs)[source]¶
Bases: InterfaceObserver[Obs, EngineObsType], InterfaceController[Act, ndarray], Env[Obs, Act], Generic[Obs, Act]
Observer plus controller interface for both generic pipeline blocks and environments.
Initialize the observer interface.
- Parameters:
- log_fieldnames: Mapping[str, Mapping[str, StructNested[ValueT]] | Sequence[StructNested[ValueT]] | str]¶
Fieldnames associated with all the variables that have been recorded to the telemetry by any of the layers of the whole pipeline environment.
- num_steps: ndarray[Any, dtype[int64]]¶
Number of simulation steps that have been performed since the last reset of the base environment.
Note
The counter is incremented before updating the observation at the end of the step, and consequently, before evaluating the reward and the termination conditions.
- property log_data: Dict[str, Any]¶
Get log data associated with the ongoing simulation if any, the previous one otherwise.
See also
See Simulator.log_data documentation for details.
- abstract stop()[source]¶
Stop the underlying simulation completely.
Note
This method has nothing to do with termination and/or truncation of episodes. Calling it manually should never be necessary for collecting samples during training.
Note
This method is mainly intended for evaluation analysis and debugging. Stopping the episode is only necessary for switching between training and evaluation mode, and to log the final state, otherwise it will be missing from plots and viewer replay (see InterfaceJiminyEnv.log_data for details). Moreover, sensor data will not be available when calling replay. The helper method jiminy_py.viewer.replay.play_logs_data must be preferred to replay an episode that cannot be stopped.
Note
Resuming a simulation is not supported, which means that calling reset to start a new simulation is necessary before calling step once again. Failing to do so will trigger an exception.
- Return type:
None
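A hedged usage sketch, assuming env is some InterfaceJiminyEnv instance driven by a placeholder random policy:

```python
# Run one episode, then stop the simulation so that the final state is
# logged before inspecting the data (see the notes above).
env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # placeholder random policy
    _, _, terminated, truncated, _ = env.step(action)
env.stop()               # finalize the log, including the final state
log_vars = env.log_data  # now covers the complete episode
```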
- abstract plot(enable_block_states=False, **kwargs)[source]¶
Plot figures of simulation data over time associated with the ongoing episode until now if any, the previous one otherwise.
- Parameters:
enable_block_states (bool) – Whether to plot the internal state of the pipeline blocks, if any. Optional: False by default.
kwargs (Any) – Implementation-specific extra keyword arguments if any.
- Return type:
- abstract replay(**kwargs)[source]¶
Replay the ongoing episode until now if any, the previous one otherwise.
- Parameters:
kwargs (Any) – Implementation-specific extra keyword arguments if any.
- Return type:
None
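A hedged usage sketch chaining both rendering methods once an episode has been stopped (see stop above); no implementation-specific keyword arguments are assumed:

```python
env.plot(enable_block_states=True)  # time series of the recorded variables
env.replay()                        # 3D replay of the same episode
```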
- abstract evaluate(policy_fn, seed=None, horizon=None, enable_stats=True, enable_replay=None, **kwargs)[source]¶
Evaluate a policy on the environment over a complete episode.
Warning
It ignores any top-level gym.Wrapper that may be used for training but is not considered part of the environment pipeline.
- Parameters:
policy_fn (Callable[[Obs, Act | None, SupportsFloat | None, bool, bool, Dict[str, Any]], Act]) – Policy to evaluate as a callback function. It must have the following signature (reward = None at reset):

policy_fn(obs: Obs,
          action_prev: Optional[Act],
          reward: Optional[float],
          terminated: bool,
          truncated: bool,
          info: InfoType) -> Act  # action

seed (int | None) – Seed of the environment to be used for the evaluation of the policy. Optional: None by default. If not specified, then a strongly random seed will be generated by gym.
horizon (float | None) – Horizon of the simulation before early termination. None to disable. Optional: None by default.
enable_stats (bool) – Whether to print high-level statistics after the simulation. Optional: Enabled by default.
enable_replay (bool | None) – Whether to enable replay of the simulation, and possibly recording if the extra keyword argument record_video_path is provided. Optional: Enabled by default if a display is available, disabled otherwise.
kwargs (Any) – Extra keyword arguments to forward to the replay method if replay is requested.
- Return type:
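A hedged usage sketch wiring a do-nothing policy through the documented callback signature; the zero action is purely illustrative:

```python
import numpy as np


def policy_fn(obs, action_prev, reward, terminated, truncated, info):
    # `reward` is None on the very first call following reset.
    return np.zeros(env.action_space.shape)  # do-nothing action


env.evaluate(policy_fn, seed=0, horizon=10.0, enable_stats=True)
```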
- abstract play_interactive(enable_travelling=None, start_paused=True, enable_is_done=True, verbose=True, **kwargs)[source]¶
Interactive evaluation mode where the robot or the world itself is actively “controlled” via keyboard inputs, with real-time rendering.
This method is not available for all pipeline environments. When it is, the available interactions and keyboard mapping are completely implementation-specific. Please refer to the documentation of the base environment being considered for details.
Warning
It ignores any top-level gym.Wrapper that may be used for training but is not considered part of the pipeline environment.
- Parameters:
enable_travelling (bool | None) – Whether to enable travelling, following the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: Enabled by default if the ‘panda3d’ viewer backend is used.
start_paused (bool) – Whether to start paused. Optional: Enabled by default.
verbose (bool) – Whether to display status messages.
kwargs (Any) – Implementation-specific extra keyword arguments if any.
enable_is_done (bool)
- Return type:
None
- abstract has_terminated(info)[source]¶
Determine whether the episode is over, either because a terminal state of the underlying MDP has been reached or because an aborting condition outside the scope of the MDP has been triggered.
Note
This method is called after refresh_observation, so that the internal buffer ‘observation’ is up-to-date.
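For illustration, a hedged sketch of a fall-detection termination condition. The (terminated, truncated) return pair, the import path of BaseJiminyEnv, and the observation layout are all assumptions:

```python
# Hedged sketch of a custom termination condition.
from typing import Any, Dict, Tuple

from gym_jiminy.common.envs import BaseJiminyEnv  # assumed import path


class FallDetectionEnv(BaseJiminyEnv):
    def has_terminated(self, info: Dict[str, Any]) -> Tuple[bool, bool]:
        # The 'observation' buffer is up-to-date at this point (see note).
        # Assumed layout: base height stored as the first entry.
        fallen = bool(self.observation[0] < 0.2)
        return fallen, False  # terminated if fallen, never truncated here
```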
- abstract set_wrapper_attr(name, value, *, force=True)[source]¶
Assign a specified value to an attribute in the first layer of the pipeline environment where it already exists, going from this wrapper down to the base environment. If the attribute does not exist in any layer, it is directly added to this wrapper.
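A hedged usage sketch; the attribute name is hypothetical and only illustrates the lookup order described above:

```python
env.set_wrapper_attr("debug", True)  # set on the first layer defining it
```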
- abstract train(mode=True)[source]¶
Sets the environment in training or evaluation mode.
- Parameters:
mode (bool) – Whether to set training (True) or evaluation mode (False). Optional: True by default.
- Return type:
None
- eval()[source]¶
Sets the environment in evaluation mode.
This only has an effect on certain environments. It can be used, for instance, to enable clipping or filtering of the action at evaluation time specifically. See the documentation of a given environment for details about its behavior in training and evaluation modes.
- Return type:
None
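A hedged usage sketch toggling between both modes:

```python
env.train()            # training mode (mode=True by default)
# ... collect training samples ...
env.train(mode=False)  # switch to evaluation mode
env.eval()             # same effect, as a convenience shortcut
```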
- abstract property unwrapped: BaseJiminyEnv¶
The underlying environment at the basis of the pipeline of which this environment is part.