Generic¶
Generic gym environment specifically tailored to work with the Jiminy Simulator as backend physics engine, and the Jiminy Viewer as 3D visualizer. It implements the official OpenAI Gym API and extends it with additional functionality.
- class gym_jiminy.common.envs.generic.BaseJiminyEnv(simulator, step_dt, simulation_duration_max=86400.0, debug=False, render_mode=None, **kwargs)[source]¶
Bases: InterfaceJiminyEnv[Obs, Act], Generic[Obs, Act]
Base class to train an agent in Gym OpenAI using the Jiminy simulator for physics computations.
It creates a Gym environment wrapping an already instantiated Jiminy simulator and behaves like any standard Gym environment.
The observation space is a dictionary gathering the current simulation time, the real robot state, and the sensor data. The action is a vector gathering the torques of the actuators of the robot.
There is no reward by default. It is up to the user to overload this class to implement one. It has been designed to be highly flexible and easy to customize by overloading it to fit the vast majority of users' needs; see the sketch after the parameter list below.
Note
In evaluation or debug mode, log files of the simulations are automatically written in the default temporary directory of the system. Writing is triggered by calling stop manually or upon reset, right before starting the next episode. The path of the log file associated with a given episode is stored under the key log_path of the extra info output when calling reset. The user is responsible for manually deleting old log files if necessary.
- Parameters:
simulator (Simulator) – Jiminy Python simulator used for physics computations. It must be fully initialized.
step_dt (float) – Environment timestep for learning. Note that it is independent from the controller and observation update periods. The latter are configured via engine.set_options.
simulation_duration_max (float) – Maximum duration of a simulation. If the current simulation time exceeds this threshold, then it triggers is_truncated=True. It cannot exceed the maximum possible duration before telemetry log time overflow, which is extremely large (about 30 years). Beware that log data are stored in RAM, which may cause an out-of-memory error if an episode lasts too long without reset. Optional: About 4GB of log data assuming a 5ms control update period and telemetry disabled for everything but the robot configuration.
render_mode (str | None) – Desired rendering mode, i.e. “human” or “rgb_array”. If “human” is specified, calling render will open a graphical window for visualization; otherwise an RGB image is returned, as a 3D numpy array whose first dimension holds the 3 red, green, blue channels and the two subsequent dimensions are the pixel height and width respectively. None to select automatically the most appropriate mode based on the user-specified rendering backend if any, or the machine environment. Note that “rgb_array” does not require a graphical window manager. Optional: None by default.
debug (bool) – Whether the debug mode must be enabled. Doing so enables telemetry recording.
kwargs (Any) – Extra keyword arguments that may be useful for derived environments with multiple inheritance, and to allow automatic pipeline wrapper generation.
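A minimal usage sketch is given below. The Simulator.build helper, URDF path, and timestep are illustrative assumptions; any fully-initialized jiminy_py Simulator can be wrapped this way.

```python
import numpy as np
from jiminy_py.simulator import Simulator
from gym_jiminy.common.envs.generic import BaseJiminyEnv

# Hypothetical model: replace "robot.urdf" with an actual URDF file.
simulator = Simulator.build("robot.urdf", has_freeflyer=True)

# Wrap the fully-initialized simulator in a Gym environment stepping at 20ms.
env = BaseJiminyEnv(simulator, step_dt=0.02)
obs, info = env.reset(seed=0)

# No reward is defined by default: subclass BaseJiminyEnv to implement one.
action = np.zeros(env.action_space.shape)
obs, reward, terminated, truncated, info = env.step(action)
```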
- stepper_state: jiminy.StepperState¶
- is_simulation_running: npt.NDArray[np.bool_]¶
- robot: jiminy.Robot¶
- robot_state: jiminy.RobotState¶
- derived: InterfaceJiminyEnv¶
Top-most block of the pipeline of which this environment is part when leveraging the modular pipeline design capability.
- log_fieldnames: Mapping[str, FieldNested]¶
Fieldnames associated with all the variables that have been recorded to the telemetry by any layer of the whole pipeline environment.
- property np_random: Generator¶
Returns the environment’s internal _np_random generator that, if not set, will be initialised with a random seed.
- Returns:
Instances of np.random.Generator
- num_steps: npt.NDArray[np.int64]¶
Number of simulation steps that have been performed since the last reset of the base environment.
Note
The counter is incremented before updating the observation at the end of the step, and consequently, before evaluating the reward and the termination conditions.
- quantities: QuantityManager¶
- observation: Obs¶
- action: Act¶
- _initialize_action_space()[source]¶
Configure the action space of the environment.
The action is a vector gathering the torques of the actuators of the robot.
Warning
This method is called internally by the reset method. It is not meant to be overloaded since the actual action space of the robot is uniquely defined.
- Return type:
None
- _initialize_seed(seed=None)[source]¶
Specify the seed of the environment.
Note
This method is not meant to be called manually.
Warning
It also resets the low-level jiminy Engine. Therefore one must call the reset method afterward.
- Parameters:
seed (int | None) – Random seed, as a positive integer. Optional: A strongly random seed will be generated by gym if omitted.
- Returns:
Updated seed of the environment
- Return type:
None
- register_variable(name, value, fieldnames=None, namespace=None, *, is_eval_only=True)[source]¶
Register variable to the telemetry.
Warning
Variables are registered by reference. Consequently, the user is responsible for managing the lifetime of the data to prevent it from being garbage collected.
See also
See gym_jiminy.common.utils.register_variables for details.
- Parameters:
name (str) – Base name of the variable. It will be used to prepend fields, using ‘.’ delimiter.
value (Mapping[str, StructNested[ValueT]] | Sequence[StructNested[ValueT]] | ndarray) – Variable to register. It supports any nested data structure whose leaves have type np.ndarray and either dtype np.float64 or np.int64.
fieldnames (str | Mapping[str, StructNested[ValueT]] | Sequence[StructNested[ValueT]] | None) – Nested fieldnames with the exact same data structure as the variable to register ‘value’. Individual elements of each leaf array must have their own fieldnames, all gathered in a nested tuple with the same shape as the array. Optional: Generic fieldnames will be generated automatically.
namespace (str | None) – Namespace used to prepend the base name ‘name’, using ‘.’ delimiter. Empty string to disable. Optional: Disabled by default.
is_eval_only (bool) – Whether to register the variable to the telemetry only in evaluation mode. Optional: True by default.
- Return type:
None
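For illustration, a possible way of registering a custom buffer to the telemetry; the buffer name, fieldnames, and the choice of doing it in an overloaded _setup (registration must happen before the simulation starts) are assumptions:

```python
import numpy as np
from gym_jiminy.common.envs.generic import BaseJiminyEnv

class MyEnv(BaseJiminyEnv):
    def _setup(self) -> None:
        super()._setup()
        # Registered by reference: keep the buffer alive as an attribute.
        self._com_position = np.zeros(3)
        self.register_variable(
            "com_position",               # base name used to prepend fieldnames
            self._com_position,           # nested structure of np.ndarray leaves
            fieldnames=("x", "y", "z"),   # one fieldname per element of the array
            is_eval_only=True)            # only recorded in evaluation mode
```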
- set_wrapper_attr(name, value, *, force=True)[source]¶
Assign an attribute to a specified value in the first layer of the pipeline environment for which it already exists, from this wrapper to the base environment. If the attribute does not exist in any layer, it is directly added to this wrapper.
- property viewer_kwargs: Dict[str, Any]¶
Default keyword arguments for the instantiation of the viewer, e.g. when render method is first called.
Warning
The default argument backend is ignored if a backend is already up and running, even if no viewer has been instantiated for the environment at hand in particular.
- property unwrapped: BaseJiminyEnv¶
The underlying environment at the basis of the pipeline of which this environment is part.
- train(mode=True)[source]¶
Sets the environment in training or evaluation mode.
- Parameters:
mode (bool) – Whether to set training (True) or evaluation mode (False). Optional: True by default.
- Return type:
None
- _update_pipeline(derived)[source]¶
Dynamically update which blocks are declared as part of the environment pipeline.
Internally, this method first unregisters all blocks of the old pipeline, then registers all blocks of the new pipeline, and finally notifies the base environment that the top-most block of the pipeline has changed and must be updated accordingly.
Warning
This method is not supposed to be called manually nor overloaded.
- Parameters:
derived (InterfaceJiminyEnv | None) – Either the top-most block of the pipeline or None. If None, unregister all blocks of the old pipeline. If not None, first unregister all blocks of the old pipeline, then register all blocks of the new pipeline.
- Return type:
None
- reset(*, seed=None, options=None)[source]¶
Reset the environment.
In practice, it resets the backend simulator and sets the initial state of the robot. The initial state is obtained by calling ‘_sample_state’. This method is also in charge of setting the initial action (at the beginning) and observation (at the end).
Warning
It starts the simulation immediately. As a result, it is not possible to change the robot (including its options), nor to register log variables.
- Parameters:
seed (int | None) – Random seed, as a positive integer. Optional: None by default. If None, then the internal random generator of the environment will be kept as-is, without updating its seed.
options (Dict[str, Any] | None) – Additional information to specify how the environment is reset. The field ‘reset_hook’ is reserved for chaining multiple BasePipelineWrapper. It is not meant to be defined manually. Optional: None by default.
- Returns:
Initial observation of the episode and some auxiliary information for debugging or monitoring purpose.
- Return type:
Tuple[Mapping[str, StructNested[ValueT]] | Sequence[StructNested[ValueT]] | ndarray, Dict[str, Any]]
- close()[source]¶
Clean up the environment after the user has finished using it.
It terminates the Python Jiminy engine.
Warning
Calling reset or step afterward is undefined behavior.
- Return type:
None
- step(action)[source]¶
Integrate the environment’s dynamics under the prescribed agent’s action over a single timestep, i.e. collect one transition step of the underlying Markov Decision Process of the learning problem.
Warning
When reaching the end of an episode (terminated or truncated), it is necessary to reset the environment for starting a new episode before being able to do steps once again.
- Parameters:
action (Act) – Action performed by the agent during the ongoing step.
- Returns:
observation (ObsType) - The next observation of the environment as a result of taking the agent’s action.
reward (float): The reward associated with the transition step.
terminated (bool): Whether the agent reaches the terminal state, which means success or failure depending on the MDP of the task.
truncated (bool): Whether some truncation condition outside the scope of the MDP is satisfied. This can be used to end an episode prematurely before a terminal state is reached, for instance if the agent’s state is going out of bounds.
info (dict): Contains auxiliary information that may be helpful for debugging and monitoring. This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms from which the total reward is derived.
- Return type:
Tuple[Mapping[str, StructNested[ValueT]] | Sequence[StructNested[ValueT]] | ndarray, SupportsFloat, bool, bool, Dict[str, Any]]
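A hedged sketch of a standard Gym rollout loop on top of reset and step; it assumes env is an already-constructed BaseJiminyEnv (or derived) instance and uses a placeholder zero-torque policy:

```python
import numpy as np

obs, info = env.reset(seed=42)
terminated = truncated = False
while not (terminated or truncated):
    # Placeholder policy: apply zero torque on every actuator.
    action = np.zeros(env.action_space.shape, dtype=np.float64)
    obs, reward, terminated, truncated, info = env.step(action)
env.stop()  # optional: finalize the log for plotting/replay (see `stop` below)
```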
- stop()[source]¶
Stop the underlying simulation completely.
Note
This method has nothing to do with termination and/or truncation of episodes. Calling it manually should never be necessary for collecting samples during training.
Note
This method is mainly intended for evaluation analysis and debugging. Stopping the episode is only necessary for switching between training and evaluation mode, and to log the final state, otherwise it will be missing from plots and viewer replay (see InterfaceJiminyEnv.log_data for details). Moreover, sensor data will not be available when calling replay. The helper method jiminy_py.viewer.replay.play_logs_data must be preferred to replay an episode that cannot be stopped.
Note
Resuming a simulation is not supported, which means that calling reset to start a new simulation is necessary prior to calling step once again. Failing to do so will trigger an exception.
- Return type:
None
- render()[source]¶
Render the agent in its environment.
Changed: This method does not take input arguments anymore due to changes of the official gym.Wrapper API. A workaround is to set simulator.viewer_kwargs beforehand. Alternatively, it is possible to call the low-level implementation simulator.render directly to avoid API restrictions.
- Returns:
RGB array if ‘render_mode’ is ‘rgb_array’, None otherwise.
- Return type:
RenderFrame | List[RenderFrame] | None
- plot(enable_block_states=False, **kwargs)[source]¶
Plot figures of simulation data over time associated with the ongoing episode until now if any, the previous one otherwise.
- Parameters:
- Return type:
- replay(**kwargs)[source]¶
Replay the ongoing episode until now if any, the previous one otherwise.
- Parameters:
kwargs (Any) – Extra keyword arguments to forward to jiminy_py.viewer.replay.play_trajectories.
- Return type:
None
- evaluate(policy_fn, seed=None, horizon=None, enable_stats=True, enable_replay=None, **kwargs)[source]¶
Evaluate a policy on the environment over a complete episode.
Warning
It ignores any top-level gym.Wrapper that may be used for training but are not considered part of the environment pipeline.
- Parameters:
policy_fn (Callable[[Obs, Act | None, SupportsFloat | None, bool, bool, Dict[str, Any]], Act]) – Policy to evaluate as a callback function. It must have the following signature (**rew** = None at reset):
policy_fn(obs: Obs, action_prev: Optional[Act], reward: Optional[float], terminated: bool, truncated: bool, info: InfoType) -> Act
seed (int | None) – Seed of the environment to be used for the evaluation of the policy. Optional: None by default. If not specified, then a strongly random seed will be generated by gym.
horizon (float | None) – Horizon of the simulation before early termination. None to disable. Optional: None by default.
enable_stats (bool) – Whether to print high-level statistics after the simulation. Optional: Enabled by default.
enable_replay (bool | None) – Whether to enable replay of the simulation, and eventually recording if the extra keyword argument record_video_path is provided. Optional: Enabled by default if display is available, disabled otherwise.
kwargs (Any) – Extra keyword arguments to forward to the replay method if replay is requested.
- Return type:
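For illustration, a possible policy_fn callback matching the signature above; the zero-torque policy and the chosen keyword arguments are assumptions, and env is an already-constructed environment:

```python
import numpy as np

def policy_fn(obs, action_prev, reward, terminated, truncated, info):
    # `reward` is None at reset. Return the action to apply for the next step.
    return np.zeros(env.action_space.shape, dtype=np.float64)

env.evaluate(policy_fn, seed=0, horizon=10.0, enable_replay=False)
```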
- play_interactive(enable_travelling=None, start_paused=True, enable_is_done=True, verbose=True, **kwargs)[source]¶
Interactive evaluation mode where the robot or the world itself is actively “controlled” via keyboard inputs, with real-time rendering.
This method is not available for all pipeline environments. When it is, the available interactions and keyboard mapping are completely implementation-specific. Please refer to the documentation of the base environment being considered for details.
Warning
It ignores any top-level gym.Wrapper that may be used for training but are not considered part of the pipeline environment.
- Parameters:
enable_travelling (bool | None) – Whether to enable travelling, following the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: Enabled by default if and only if the ‘panda3d’ viewer backend is used.
start_paused (bool) – Whether to start in pause. Optional: Enabled by default.
verbose (bool) – Whether to display status messages.
kwargs (Any) – Implementation-specific extra keyword arguments if any.
enable_is_done (bool)
- Return type:
None
- _setup()[source]¶
Configure the environment. It must guarantee that its internal state is valid after calling this method.
By default, it enforces some options of the engine.
Warning
Beware this method is called BEFORE observe_dt and controller_dt are properly set, so one cannot rely on them at this point. Yet, step_dt is available and should always be. One can still access the low-level controller update period through engine_options[‘stepper’][‘controllerUpdatePeriod’].
Note
The user must overload this method to enforce a custom observer update period, otherwise it will be the same as the controller’s.
Note
This method is called internally by the reset method.
- Return type:
None
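An illustrative sketch of overloading _setup to enforce a custom observer update period via the engine options. The accessor path (shown here as self.simulator.engine.get_options / set_options) and the ‘sensorsUpdatePeriod’ key are assumptions to be checked against the jiminy API:

```python
from gym_jiminy.common.envs.generic import BaseJiminyEnv

class MyEnv(BaseJiminyEnv):
    def _setup(self) -> None:
        super()._setup()
        # Accessor path and option keys are assumptions; check the jiminy API.
        engine_options = self.simulator.engine.get_options()
        # Refresh the sensors (and thus the observation) every 5ms instead of
        # falling back on the controller update period.
        engine_options["stepper"]["sensorsUpdatePeriod"] = 0.005
        self.simulator.engine.set_options(engine_options)
```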
- _initialize_observation_space()[source]¶
Configure the observation space of the environment.
By default, the observation is a dictionary gathering the current simulation time, the real agent state, and the sensors data.
Note
This method is called internally by the reset method at the very end, just before computing and returning the initial observation. This method, alongside refresh_observation, must be overwritten in order to define a custom observation space.
- Return type:
None
- _neutral()[source]¶
Returns a neutral valid configuration for the agent.
The default implementation returns the neutral configuration if valid, the “mean” configuration otherwise (right in the middle of the position lower and upper bounds).
Warning
Beware there is no guarantee for this configuration to be statically stable.
Note
This method is called internally by ‘_sample_state’ to generate the initial state. It can be overloaded to ensure static stability of the configuration.
- Return type:
ndarray
- _sample_state()[source]¶
Returns a randomized yet valid configuration and velocity for the robot.
The default implementation returns the neutral configuration and zero velocity.
Offsets are applied on the freeflyer to ensure no contact points are going through the ground and up to three are in contact.
Note
This method is called internally by reset to generate the initial state. It can be overloaded to act as a random state generator.
- Return type:
Tuple[ndarray, ndarray]
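A possible overload acting as a random state generator, as mentioned above; perturbing only the velocity is an arbitrary choice made here so that the configuration stays valid without renormalization:

```python
from typing import Tuple
import numpy as np
from gym_jiminy.common.envs.generic import BaseJiminyEnv

class MyEnv(BaseJiminyEnv):
    def _sample_state(self) -> Tuple[np.ndarray, np.ndarray]:
        # Start from the default neutral configuration and zero velocity, then
        # add a small random perturbation on the velocity only.
        q, v = super()._sample_state()
        v += self.np_random.normal(scale=0.01, size=v.shape)
        return q, v
```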
- _initialize_buffers()[source]¶
Initialize internal buffers for fast access to shared memory or to avoid redundant computations.
Note
This method is called at every reset, right after self.simulator.start. At this point, the simulation is running but refresh_observation has never been called, so that it can be used to initialize buffers involving the engine state but required to refresh the observation.
Note
Buffers requiring manual update must be refreshed using _refresh_buffers method.
Warning
This method is not appropriate for initializing buffers involved in compute_command. At the time being, there is no better way than taking advantage of the flag self.is_simulation_running in the method compute_command itself.
- Return type:
None
- _refresh_buffers()[source]¶
Refresh internal buffers that must be updated manually.
Note
This method is called after every internal engine.step and before refreshing the observation one last time. As such, it is the right place to update shared data between has_terminated and compute_reward. However, it is not appropriate for quantities involved in refresh_observation nor compute_command, which may be called more often than once per step.
Note
The _initialize_buffers method can be used to initialize buffers that may require special care.
Warning
Be careful when using this method to update buffers involved in refresh_observation. The latter is called at self.observe_dt update period, while this method is called at self.step_dt update period. self.observe_dt is likely to be different from self.step_dt, unless configured manually when overloading _setup method.
- Return type:
None
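An illustrative pattern combining _initialize_buffers and _refresh_buffers; the tracked quantity (height of the floating base) and the indexing of the configuration vector are hypothetical:

```python
import numpy as np
from gym_jiminy.common.envs.generic import BaseJiminyEnv

class MyEnv(BaseJiminyEnv):
    def _initialize_buffers(self) -> None:
        super()._initialize_buffers()
        # Allocated once per episode, right after the simulation has started.
        self._base_height = np.array(0.0)

    def _refresh_buffers(self) -> None:
        super()._refresh_buffers()
        # Updated once per `step`, before the reward and termination checks.
        # Assumes a freeflyer whose height is the third configuration coordinate.
        self._base_height[()] = self.robot_state.q[2]
```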
- _controller_handle(t, q, v, sensor_measurements, command)¶
Thin wrapper around user-specified refresh_observation and compute_command methods.
Note
The internal cache of managed quantities is cleared right away systematically, before anything else.
Warning
This method is not supposed to be called manually nor overloaded. It is used by the base environment to instantiate a jiminy.FunctionalController responsible for both refreshing the observations and computing the commands all the way through a given pipeline, in the correct order of the blocks, so as to finally send the command motor torques directly to the robot.
- Parameters:
t (float) – Current simulation time.
q (ndarray) – Current extended configuration vector of the robot.
v (ndarray) – Current actual velocity vector of the robot.
sensor_measurements (SensorMeasurementTree) – Current sensor measurements.
command (ndarray) – Output argument corresponding to the motor torques to apply on the robot. It must be updated by reference using [:] or np.copyto.
- Returns:
Motor torques to apply on the robot.
- Return type:
None
- _observer_handle(t, q, v, sensor_measurements)¶
Thin wrapper around user-specified refresh_observation method.
Warning
This method is not supposed to be called manually nor overloaded.
- Parameters:
t (float) – Current simulation time.
q (ndarray) – Current extended configuration vector of the robot.
v (ndarray) – Current extended velocity vector of the robot.
sensor_measurements (SensorMeasurementTree) – Current sensor data.
- Return type:
None
- compute_reward(terminated, info)¶
Compute the reward related to a specific control block, plus extra information that may be helpful for monitoring or debugging purposes.
For the corresponding MDP to be stationary, the computation of the reward is supposed to involve only the transition from previous to current state of the simulation (possibly comprising multiple agents) under the ongoing action.
By default, it returns 0.0 without extra information no matter what. The user is expected to provide an appropriate reward on their own, either by overloading this method or by wrapping the environment with ComposedJiminyEnv for modular environment pipeline design.
- Parameters:
- Returns:
Aggregated reward for the current step.
- Return type:
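A hedged example of overloading this method for a locomotion-style task; the reward terms (forward velocity of a floating base, torque penalty) and the use of robot_state.v[0] are illustrative assumptions:

```python
from typing import Any, Dict
import numpy as np
from gym_jiminy.common.envs.generic import BaseJiminyEnv

class MyEnv(BaseJiminyEnv):
    def compute_reward(self, terminated: bool, info: Dict[str, Any]) -> float:
        # Hypothetical shaping: reward the forward velocity of the floating
        # base, minus a small penalty on the torques being requested.
        forward_velocity = float(self.robot_state.v[0])
        effort_penalty = 1e-3 * float(np.sum(np.square(self.action)))
        return forward_velocity - effort_penalty
```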
- eval()¶
Sets the environment in evaluation mode.
This only has an effect on certain environments. It can be used for instance to enable clipping or filtering of the action at evaluation time specifically. See the documentation of a given environment for details about its behavior in training and evaluation modes.
- Return type:
None
- get_wrapper_attr(name)¶
Gets the attribute name from the environment.
- has_wrapper_attr(name)¶
Checks if the attribute name exists in the environment.
- property log_data: Dict[str, Any]¶
Get log data associated with the ongoing simulation if any, the previous one otherwise.
See also
See Simulator.log_data documentation for details.
- property np_random_seed: int¶
Returns the environment’s internal _np_random_seed that, if not set, will first be initialised with a random int as seed.
If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.
- Returns:
int: the seed of the current np_random or -1, if the seed of the rng is unknown
- refresh_observation(measurement)[source]¶
Compute the observation based on the current state of the robot.
In practice, it updates the internal buffer directly for the sake of efficiency.
By default, it sets the observation to the value of the measurement, which would not work unless Obs corresponds to EngineObsType.
Note
This method is called at the end of every low-level Engine.step.
Warning
This method may be called without any simulation running, either to perform basic consistency checking or to allocate and initialize buffers. There is no way at the time being to distinguish the initialization stage in particular. A workaround consists in checking whether the simulation has already started. It is not exactly the same, but it does the job as far as preserving efficiency is concerned.
Warning
One must only rely on measurement to get the state of the robot, as anything else is not reliable for this. More specifically, self.robot_state would not be valid if an adaptive stepper is being used for physics integration.
- Parameters:
measurement (EngineObsType)
- Return type:
None
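A sketch of defining a custom observation by overloading _initialize_observation_space and refresh_observation together, as recommended above. The nested layout of measurement (EngineObsType), the buffer allocation strategy, and the use of gymnasium spaces are assumptions:

```python
import numpy as np
import gymnasium as gym
from gym_jiminy.common.envs.generic import BaseJiminyEnv

class MyEnv(BaseJiminyEnv):
    def _initialize_observation_space(self) -> None:
        # Hypothetical observation: concatenated configuration and velocity.
        size = self.robot.nq + self.robot.nv
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(size,), dtype=np.float64)

    def refresh_observation(self, measurement) -> None:
        if not self.is_simulation_running:
            # No simulation running yet: allocate the internal buffer once.
            self.observation = np.zeros((self.robot.nq + self.robot.nv,))
        # The exact nested layout of `measurement` (EngineObsType) is assumed.
        agent_state = measurement["states"]["agent"]
        self.observation[:self.robot.nq] = agent_state["q"]
        self.observation[self.robot.nq:] = agent_state["v"]
```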
- action_space: gym.Space[Act]¶
- observation_space: gym.Space[Obs]¶
- measurements: EngineObsType¶
- compute_command(action, command)[source]¶
Compute the motors efforts to apply on the robot.
By default, all it does is forward the input action as-is, without performing any processing. One is responsible for overloading this method if the action space has been customized, or just to clip the action to make sure it is never out-of-bounds if necessary.
Warning
There is no good place to initialize buffers that are necessary to compute the command. The only solution for now is to define the initialization inside this method itself, using the safeguard if not self.is_simulation_running:.
- Parameters:
action (Act) – High-level target to achieve by means of the command.
command (ndarray)
- Return type:
None
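For instance, a minimal overload that clips the action before forwarding it as motor torques, assuming a Box action space exposing low/high bounds:

```python
import numpy as np
from gym_jiminy.common.envs.generic import BaseJiminyEnv

class MyEnv(BaseJiminyEnv):
    def compute_command(self, action: np.ndarray, command: np.ndarray) -> None:
        # Assuming a Box action space: clip the action to its bounds and write
        # the result in-place into the pre-allocated command buffer.
        np.clip(action, self.action_space.low, self.action_space.high, out=command)
```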
- has_terminated(info)[source]¶
Determine whether the episode is over, because a terminal state of the underlying MDP has been reached or an aborting condition outside the scope of the MDP has been triggered.
By default, it always returns terminated=False, and truncated=True if and only if the observation is out-of-bounds. One can overload this method to implement custom termination conditions for the environment at hand.
Warning
No matter what, truncation will happen when reaching the maximum simulation duration, i.e. ‘self.simulation_duration_max’.
Note
This method is called after refresh_observation, so that the internal buffer ‘observation’ is up-to-date.
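A possible overload adding a custom termination condition on top of the default logic; the (terminated, truncated) return pair is inferred from the description above, and the failure condition on the base height is hypothetical:

```python
from typing import Any, Dict, Tuple
from gym_jiminy.common.envs.generic import BaseJiminyEnv

class MyEnv(BaseJiminyEnv):
    def has_terminated(self, info: Dict[str, Any]) -> Tuple[bool, bool]:
        # Keep the default out-of-bounds truncation logic.
        terminated, truncated = super().has_terminated(info)
        # Hypothetical failure condition: the floating base dropped below 0.3m.
        if self.robot_state.q[2] < 0.3:
            terminated = True
        return terminated, truncated
```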
- _key_to_action(key, obs, reward, **kwargs)[source]¶
Mapping from input keyboard keys to actions.
Note
This method is called systematically before the step method, even if no key has been pressed, or the reward is not defined. In such cases, the value is None.
Note
The mapping can be state-dependent, and the key can be used for something different than computing the action directly. For instance, one can provide as extra argument to this method a custom policy taking user parameters mapped to the keyboard as input.
Warning
Overloading this method is required for calling play_interactive method.
- Parameters:
key (str | None) – Key pressed by the user as a string. None if no key has been pressed since the last step of the environment.
obs (Obs) – Previous observation from the last step of the environment. It is always available, including right after reset.
reward (float | None) – Previous reward from the last step of the environment. Not available before the first step right after reset.
kwargs (Any) – Extra keyword argument provided by the user when calling play_interactive method.
- Returns:
Action to forward to the environment. None to hold the previous action without updating it.
- Return type:
Act | None
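A minimal sketch of the overload required for play_interactive; the key identifiers and the mapping to the first motor torque are illustrative and depend on the viewer backend:

```python
from typing import Any, Optional
import numpy as np
from gym_jiminy.common.envs.generic import BaseJiminyEnv

class MyEnv(BaseJiminyEnv):
    def _key_to_action(self,
                       key: Optional[str],
                       obs: Any,
                       reward: Optional[float],
                       **kwargs: Any) -> Optional[np.ndarray]:
        if key is None:
            return None  # hold the previous action
        # Key identifiers are illustrative and backend-specific.
        action = np.zeros(self.action_space.shape)
        if key == "up":
            action[0] = +1.0
        elif key == "down":
            action[0] = -1.0
        return action
```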