Generic¶
Generic gym environment specifically tailored to work with the Jiminy Simulator as backend physics engine, and the Jiminy Viewer as 3D visualizer. It implements the official OpenAI Gym API and extends it with additional functionalities.
- class gym_jiminy.common.envs.env_generic._LazyDictItemFilter(dict_packed, item_index)[source]¶
Bases:
collections.abc.Mapping
- get(k[, d]) → D[k] if k in D, else d. d defaults to None.¶
- items() → a set-like object providing a view on D's items¶
- keys() → a set-like object providing a view on D's keys¶
- values() → an object providing a view on D's values¶
- class gym_jiminy.common.envs.env_generic.BaseJiminyEnv(simulator, step_dt, enforce_bounded_spaces=False, debug=False, **kwargs)[source]¶
Bases:
gym_jiminy.common.bases.generic_bases.ObserverControllerInterface, gym.core.Env
Base class to train a robot in Gym OpenAI using a user-specified Python Jiminy engine for physics computations.
It creates a Gym environment wrapping the Jiminy Engine and behaves like any other Gym environment.
The observation space is a dictionary gathering the current simulation time, the real robot state, and the sensors data. The action is a vector gathering the torques of the actuators of the robot.
There is no reward by default. It is up to the user to overload this class to implement one. It has been designed to be highly flexible and easy to customize by overloading it to fit the vast majority of users’ needs.
- Parameters
simulator (jiminy_py.simulator.Simulator) – Jiminy Python simulator used for physics computations. It must be fully initialized.
step_dt (float) – Simulation timestep for learning. Note that it is independent from the controller and observation update periods. The latter are configured via engine.set_options.
enforce_bounded_spaces (Optional[bool]) – Whether or not to enforce finite bounds for the observation and action spaces. If so, then ‘*_MAX’ are used whenever it is necessary. Note that these bounds are set very large to make sure they are suitable for the vast majority of systems.
debug (bool) – Whether or not the debug mode must be enabled. Doing it enables telemetry recording.
kwargs (Any) – Extra keyword arguments that may be useful for derived environments with multiple inheritance, and to allow automatic pipeline wrapper generation.
- Return type
None
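As an illustration, here is a minimal sketch of a derived environment. The CartPoleJiminyEnv name, the URDF path and the constant survival reward are hypothetical, and jiminy_py's Simulator.build classmethod is assumed to accept a URDF path; the only documented requirement is a fully-initialized jiminy_py.simulator.Simulator.

```python
from jiminy_py.simulator import Simulator
from gym_jiminy.common.envs.env_generic import BaseJiminyEnv


class CartPoleJiminyEnv(BaseJiminyEnv):
    """Hypothetical cart-pole environment built on top of BaseJiminyEnv."""

    def __init__(self, debug: bool = False) -> None:
        # 'cartpole.urdf' is a placeholder: any fully-initialized simulator works.
        simulator = Simulator.build("cartpole.urdf", has_freeflyer=False)
        super().__init__(simulator, step_dt=0.01, debug=debug)

    def compute_reward(self, *args, info, **kwargs) -> float:
        # There is no reward by default, so a derived class must provide one.
        # Here: a constant survival bonus at every step.
        return 1.0
```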
- simulator: jiminy_py.simulator.Simulator¶
- stepper_state: jiminy_py.core.StepperState¶
- system_state: jiminy_py.core.SystemState¶
- _action: DataNested¶
- _observation: DataNested¶
- _controller_handle(t, q, v, sensors_data, command)[source]¶
Thin wrapper around user-specified compute_command method.
Warning
This method is not supposed to be called manually nor overloaded.
- Parameters
t (float) –
q (numpy.ndarray) –
v (numpy.ndarray) –
sensors_data (jiminy_py.core.sensorsData) –
command (numpy.ndarray) –
- Return type
None
- _get_state_space(use_theoretical_model=None)[source]¶
Get state space.
This method is not meant to be overloaded in general since the definition of the state space is mostly consensual. One must rather overload _initialize_observation_space to customize the observation space as a whole.
- _get_sensors_space()[source]¶
Get sensor space.
It gathers the sensors data in a dictionary. It maps each available type of sensor to the associated data matrix. Rows correspond to the sensor type’s fields, and columns correspond to each individual sensor.
- Return type
gym.spaces.dict.Dict
- _initialize_action_space()[source]¶
Configure the action space of the environment.
The action is a vector gathering the torques of the actuators of the robot.
Warning
This method is called internally by reset method. It is not meant to be overloaded since the actual action space of the robot is uniquely defined.
- Return type
None
- reset(controller_hook=None)[source]¶
Reset the environment.
In practice, it resets the backend simulator and sets the initial state of the robot. The initial state is obtained by calling ‘_sample_state’. This method is also in charge of setting the initial action (at the beginning) and observation (at the end).
Warning
It starts the simulation immediately. As a result, it is not possible to change the robot (including its options), nor to register log variables. The only way to do so is via ‘controller_hook’.
- Parameters
controller_hook (Optional[Callable[[], Optional[Tuple[Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData], None]], Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData, numpy.ndarray], None]]]]]]) – Used internally for chaining multiple BasePipelineWrapper. It is not meant to be defined manually. Optional: None by default.
- Returns
Initial observation of the episode.
- Return type
Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]
- seed(seed=None)[source]¶
Specify the seed of the environment.
Warning
It also resets the low-level jiminy Engine. Therefore one must call the reset method manually afterward.
- step(action=None)[source]¶
Run a simulation step for a given action.
- Parameters
action (Optional[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]]) – Action to perform. None to not update the action.
- Returns
Next observation, reward, status of the episode (done or not), and a dictionary of extra information
- Return type
Tuple[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray], float, bool, Dict[str, Any]]
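A typical rollout using the 4-tuple step API documented above might look like the following sketch, where env stands for any BaseJiminyEnv instance (e.g. the hypothetical CartPoleJiminyEnv from the earlier sketch).

```python
# Standard Gym-style rollout sketch; the random action is a placeholder policy.
obs = env.reset()
done, cum_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # replace with an actual policy
    obs, reward, done, info = env.step(action)
    cum_reward += reward
print(f"Episode return: {cum_reward}")
```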
- render(mode=None, **kwargs)[source]¶
Render the world.
- Parameters
mode (Optional[str]) – Rendering mode. It can be either ‘human’ to display the current simulation state, or ‘rgb_array’ to return a snapshot as an RGB array without showing it on the screen. Optional: ‘human’ by default if available, ‘rgb_array’ otherwise.
kwargs (Any) – Extra keyword arguments to forward to jiminy_py.simulator.Simulator.render method.
- Returns
RGB array if ‘mode’ is ‘rgb_array’, None otherwise.
- Return type
Optional[numpy.ndarray]
- plot(**kwargs)[source]¶
Display common simulation data and action over time.
- Parameters
kwargs (Any) – Extra keyword arguments to forward to simulator.plot.
- Return type
None
- replay(enable_travelling=True, **kwargs)[source]¶
Replay the current episode until now.
- Parameters
enable_travelling (bool) – Whether or not to enable travelling, following the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: Enabled by default.
kwargs (Any) – Extra keyword arguments to forward to the underlying replay method.
- Return type
None
- static play_interactive(env, enable_travelling=None, start_paused=True, enable_is_done=True, verbose=True, **kwargs)[source]¶
Activate interactive mode, enabling the robot to be controlled using the keyboard.
It stops automatically as soon as ‘done’ flag is True. One has to press a key to start the interaction. If no key is pressed, the action is not updated and the previous one keeps being sent to the robot.
Warning
This method requires _key_to_action method to be implemented by the user by overloading it, otherwise it raises an exception.
- Parameters
env (Union[gym_jiminy.common.envs.env_generic.BaseJiminyEnv, gym.core.Wrapper]) – BaseJiminyEnv environment instance to play with, eventually wrapped by composition, typically using gym.Wrapper.
enable_travelling (Optional[bool]) – Whether or not to enable travelling, following the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: Enabled by default if and only if the ‘panda3d’ viewer backend is used.
start_paused (bool) – Whether or not to start in pause. Optional: Enabled by default.
verbose (bool) – Whether or not to display status messages.
kwargs (Any) – Extra keyword arguments to forward to _key_to_action method.
enable_is_done (bool) – Whether or not to stop the interactive session automatically as soon as the ‘done’ flag is True. Optional: Enabled by default.
- Return type
None
- static evaluate(env, policy_fn, seed=None, horizon=None, enable_stats=True, enable_replay=True, **kwargs)[source]¶
Evaluate a policy on the environment over a complete episode.
- Parameters
env (Union[BaseJiminyEnv, gym.core.Wrapper]) – BaseJiminyEnv environment instance to play with, eventually wrapped by composition, typically using gym.Wrapper.
policy_fn (Callable[[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray], Optional[float]], Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]]) – Policy to evaluate as a callback function. It must have the following signature (reward = None at reset):
policy_fn(obs: DataNested, reward: Optional[float]) -> DataNested  # action
seed (Optional[int]) – Seed of the environment to be used for the evaluation of the policy. Optional: Random seed if not provided.
horizon (Optional[int]) – Horizon of the simulation, namely maximum number of steps before termination. None to disable. Optional: Disabled by default.
enable_stats (bool) – Whether or not to print high-level statistics after simulation. Optional: Enabled by default.
enable_replay (bool) – Whether or not to enable replay of the simulation, and eventually recording if the extra keyword argument record_video_path is provided. Optional: Enabled by default.
kwargs (Any) – Extra keyword arguments to forward to the replay method if replay is requested.
- Return type
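For instance, a random policy could be evaluated as follows. This is a sketch only: the callback signature matches the one documented above, and env stands for any BaseJiminyEnv instance (hypothetical).

```python
from typing import Optional


def policy_fn(obs, reward: Optional[float]):
    # 'reward' is None at reset. A real policy would map 'obs' to an action.
    return env.action_space.sample()


# 'evaluate' is a static method, so it is called on the class and takes the
# environment instance explicitly.
BaseJiminyEnv.evaluate(env, policy_fn, seed=0, horizon=500, enable_replay=False)
```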
- train()[source]¶
Sets the environment in training mode.
Note
This mode is enabled by default.
- Return type
None
- eval()[source]¶
Sets the environment in evaluation mode.
This only has an effect on certain environments. See the documentation of the particular environment for details about its behavior in training and evaluation modes, if it is affected. It can be used to activate clipping or some filtering of the action specifically at evaluation time.
- Return type
None
- _setup()[source]¶
Configure the environment. It must guarantee that its internal state is valid after calling this method.
By default, it enforces some options of the engine.
Warning
Beware this method is called BEFORE observe_dt and controller_dt are properly set, so one cannot rely on them at this point. Yet, step_dt is available and should always be. One can still access the low-level controller update period through engine_options[‘stepper’][‘controllerUpdatePeriod’].
Note
The user must overload this method to enforce a custom observer update period, otherwise it will be the same as the controller’s.
Note
This method is called internally by reset methods.
- Return type
None
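Below is a sketch of a _setup overload that enforces a dedicated observer update period. The ‘sensorsUpdatePeriod’ option key and the CartPoleJiminyEnv base from the earlier sketch are assumptions to be checked against the engine options of the installed Jiminy version; only ‘controllerUpdatePeriod’ is explicitly mentioned above.

```python
class TunedEnv(CartPoleJiminyEnv):  # hypothetical subclass of the earlier sketch
    def _setup(self) -> None:
        super()._setup()
        # Enforce a custom observer update period, distinct from the controller one.
        engine_options = self.engine.get_options()
        engine_options['stepper']['sensorsUpdatePeriod'] = 0.005  # assumed option key
        self.engine.set_options(engine_options)
```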
- _initialize_observation_space()[source]¶
Configure the observation of the environment.
By default, the observation is a dictionary gathering the current simulation time, the real robot state, and the sensors data.
Note
This method is called internally by reset method at the very end, just before computing and returning the initial observation. This method, alongside refresh_observation, must be overwritten in order to define a custom observation space.
- Return type
None
- _neutral()[source]¶
Returns a neutral valid configuration for the robot.
The default implementation returns the neutral configuration if valid, the “mean” configuration otherwise (right in the middle of the position lower and upper bounds).
Warning
Beware there is no guarantee for this configuration to be statically stable.
Note
This method is called internally by ‘_sample_state’ to generate the initial state. It can be overloaded to ensure static stability of the configuration.
- Return type
numpy.ndarray
- _sample_state()[source]¶
Returns a valid configuration and velocity for the robot.
The default implementation returns the neutral configuration and zero velocity.
Offsets are applied on the freeflyer to ensure no contact points are going through the ground and up to three are in contact.
Note
This method is called internally by reset to generate the initial state. It can be overloaded to act as a random state generator.
- Return type
Tuple[numpy.ndarray, numpy.ndarray]
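Overloading _sample_state turns the environment into a random state generator, as in the sketch below. It perturbs the neutral configuration of a fixed-base model; self.robot (and its pinocchio_model) is assumed accessible, and np.random is used for brevity where the environment's own seeded generator would be preferable.

```python
import numpy as np


class RandomStartEnv(CartPoleJiminyEnv):  # hypothetical subclass of the earlier sketch
    def _sample_state(self):
        # Start from the neutral configuration with a small random offset.
        nq = self.robot.pinocchio_model.nq
        nv = self.robot.pinocchio_model.nv
        qpos = self._neutral() + np.random.uniform(-0.05, 0.05, size=nq)
        qvel = np.zeros(nv)
        return qpos, qvel
```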
- _initialize_buffers()[source]¶
Initialize internal buffers for fast access to shared memory or to avoid redundant computations.
Note
This method is called at reset, right after self.simulator.start. At this point, the simulation is running but refresh_observation has never been called, so that it can be used to initialize buffers involving the engine state but required to refresh the observation. Note that it is not appropriate to initialize buffers that would be used by compute_command.
Note
Buffers requiring manual update must be refreshed using _refresh_buffers method.
- Return type
None
- _refresh_buffers()[source]¶
Refresh internal buffers that must be updated manually.
Note
This method is called right after every internal engine.step, so it is the right place to update shared data between is_done and compute_reward.
Note
_initialize_buffers method can be used to initialize buffers that may require special care.
Warning
Be careful when using it to update buffers that are used by refresh_observation. The latter is called at the self.observe_dt update period, while the others are called at the self.step_dt update period. self.observe_dt is likely to be different from self.step_dt, unless configured manually when overloading _setup method.
- Return type
None
- refresh_observation()[source]¶
Compute the observation based on the current state of the robot.
Note
This method is called at the end of every low-level Engine.step.
Note
Note that np.nan values will be automatically clipped to 0.0 by the get_observation method before returning it, so assigning them is valid.
Warning
In practice, it updates the internal buffer directly for the sake of efficiency.
As a side note, there is no way in the current implementation to discriminate the initialization of the observation buffer from the next one. The workaround is to check if the simulation already started. Even though it is not the same rigorously speaking, it does the job here since it is only about preserving efficiency.
- Return type
None
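The pair of overloads below sketches a custom observation made of configuration and velocity only, as an illustration of combining _initialize_observation_space with refresh_observation. The self.robot and self.system_state attribute access and the simple concatenation are assumptions; updating a pre-allocated buffer in place would be more efficient, as the warning above points out.

```python
import gym.spaces
import numpy as np


class StateObsEnv(CartPoleJiminyEnv):  # hypothetical subclass of the earlier sketch
    def _initialize_observation_space(self) -> None:
        # Observe the configuration and velocity vectors only, instead of the
        # default dictionary (time, state, sensors data).
        n = self.robot.pinocchio_model.nq + self.robot.pinocchio_model.nv
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(n,), dtype=np.float64)

    def refresh_observation(self) -> None:
        # Rebuild the observation from the current robot state.
        self._observation = np.concatenate(
            (self.system_state.q, self.system_state.v))
```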
- compute_command(measure, action)[source]¶
Compute the motors efforts to apply on the robot.
By default, it does not perform any processing. One is responsible for overloading this method to clip the action if necessary, to make sure it does not violate the lower and upper bounds.
Warning
There is no good place to initialize buffers that are necessary to compute the command. The only solution for now is to perform the initialization inside this method itself, using the safeguard if not self.simulator.is_simulation_running:.
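For example, a clipping overload could look like the following sketch (a Box action space is assumed, as for the default torque action, and CartPoleJiminyEnv refers to the earlier hypothetical sketch).

```python
import numpy as np


class ClippedEnv(CartPoleJiminyEnv):  # hypothetical subclass of the earlier sketch
    def compute_command(self, measure, action):
        # Clip the action to the bounds of the action space before applying it,
        # since the default implementation performs no processing at all.
        return np.clip(action, self.action_space.low, self.action_space.high)
```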
- is_done(*args, **kwargs)[source]¶
Determine whether the episode is over.
By default, it returns True if the observation reaches or exceeds the lower or upper limit. It must be overloaded to implement a custom termination condition for the simulation.
Note
This method is called after refresh_observation, so that the internal buffer ‘_observation’ is up-to-date.
- _key_to_action(key, obs, reward, **kwargs)[source]¶
Mapping from input keyboard keys to actions.
Note
This method is called systematically before the step method, even if no key has been pressed, or the reward is not defined. In such cases, the value is None.
Note
The mapping can be state dependent, and the key can be used for something different than computing the action directly. For instance, one can provide as extra argument to this method a custom policy taking user parameters mapped to keyboard keys as input.
Warning
Overloading this method is required for calling play_interactive method.
- Parameters
key (Optional[str]) – Key pressed by the user as a string. None if no key has been pressed since the last step of the environment.
obs (Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]) – Previous observation from the last step of the environment. It is always available, including right after reset.
reward (Optional[float]) – Previous reward from last step of the environment. Not available before first step right after reset.
kwargs (Any) – Extra keyword argument provided by the user when calling play_interactive method.
- Returns
Action to forward to the environment.
- Return type
Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]
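A sketch combining a _key_to_action overload with play_interactive is given below; the key names and the single-actuator mapping are assumptions to adapt to the actual keyboard backend and robot.

```python
import numpy as np


class InteractiveEnv(CartPoleJiminyEnv):  # hypothetical subclass of the earlier sketch
    def _key_to_action(self, key, obs, reward, **kwargs):
        # Push left/right with the arrow keys, stay idle otherwise.
        action = np.zeros(self.action_space.shape)
        if key == "Left":
            action[0] = -1.0
        elif key == "Right":
            action[0] = 1.0
        return action


env = InteractiveEnv()
InteractiveEnv.play_interactive(env, start_paused=True)
```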
- _observer_handle(t, q, v, sensors_data)¶
TODO Write documentation.
- Parameters
t (float) –
q (numpy.ndarray) –
v (numpy.ndarray) –
sensors_data (jiminy_py.core.sensorsData) –
- Return type
None
- action_space: gym.Space = None¶
- compute_reward(*args, info, **kwargs)¶
Compute reward at current episode state.
See ControllerInterface.compute_reward for details.
Note
This method is called after updating the internal buffer ‘_num_steps_beyond_done’, which is None if the simulation is not done, 0 right after, and so on.
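A sketch of a reward overload reading the documented system_state buffer and logging an extra monitoring value in ‘info’ might look as follows; the velocity penalty and its weight are hypothetical.

```python
import numpy as np


class PenalizedEnv(CartPoleJiminyEnv):  # hypothetical subclass of the earlier sketch
    def compute_reward(self, *args, info, **kwargs) -> float:
        # Survival bonus minus a small penalty on the joint velocities.
        # 'info' can be updated by reference for monitoring purposes.
        velocity_penalty = float(np.sum(np.square(self.system_state.v)))
        info['velocity_penalty'] = velocity_penalty
        return 1.0 - 1e-3 * velocity_penalty
```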
- compute_reward_terminal(*, info)¶
Compute terminal reward at current episode final state.
Note
Implementation is optional. For the sake of efficiency, the terminal reward is not computed if this method is not overloaded by the user.
Warning
Similarly to compute_reward, ‘info’ can be updated by reference to log extra info for monitoring.
- get_observation()¶
Get post-processed observation.
By default, it does not perform any post-processing. One is responsible for clipping the observation if necessary to make sure it does not violate the lower and upper bounds. This can be done either by overloading this method, or in the case of pipeline design, by adding a clipping observation block at the very end.
Warning
In most cases, it is not necessary to overload this method, and doing so may lead to unexpected behavior if not done carefully.
- metadata = {'render.modes': []}¶
- observation_space: gym.Space = None¶
- reward_range = (-inf, inf)¶
- spec = None¶
- property unwrapped¶
Completely unwrap this env.
- Returns:
gym.Env: The base non-wrapped gym.Env instance
- class gym_jiminy.common.envs.env_generic.BaseJiminyGoalEnv(simulator, step_dt, debug=False)[source]¶
Bases:
gym_jiminy.common.envs.env_generic.BaseJiminyEnv
A goal-based environment. It functions just as any regular OpenAI Gym environment but it imposes a required structure on the observation_space. More concretely, the observation space is required to contain at least three elements, namely observation, desired_goal, and achieved_goal. Here, desired_goal specifies the goal that the agent should attempt to achieve. achieved_goal is the goal that it has currently achieved instead. observation contains the actual observations of the environment as per usual.
- Parameters
simulator (Optional[jiminy_py.simulator.Simulator]) – Jiminy Python simulator used for physics computations. It must be fully initialized.
step_dt (float) – Simulation timestep for learning. Note that it is independent from the controller and observation update periods. The latter are configured via engine.set_options.
enforce_bounded_spaces – Whether or not to enforce finite bounds for the observation and action spaces. If so, then ‘*_MAX’ are used whenever it is necessary. Note that these bounds are set very large to make sure they are suitable for the vast majority of systems.
debug (bool) – Whether or not the debug mode must be enabled. Doing it enables telemetry recording.
kwargs – Extra keyword arguments that may be useful for derived environments with multiple inheritance, and to allow automatic pipeline wrapper generation.
- Return type
None
- observation_space: gym.Space = None¶
- get_observation()[source]¶
Get post-processed observation.
It gathers the original observation from the environment with the currently achieved and desired goal, as a dictionary. See ObserverInterface.get_observation documentation for details.
- reset(controller_hook=None)[source]¶
Reset the environment.
In practice, it resets the backend simulator and sets the initial state of the robot. The initial state is obtained by calling ‘_sample_state’. This method is also in charge of setting the initial action (at the beginning) and observation (at the end).
Warning
It starts the simulation immediately. As a result, it is not possible to change the robot (including its options), nor to register log variables. The only way to do so is via ‘controller_hook’.
- Parameters
controller_hook (Optional[Callable[[], Optional[Tuple[Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData], None]], Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData, numpy.ndarray], None]]]]]]) – Used internally for chaining multiple BasePipelineWrapper. It is not meant to be defined manually. Optional: None by default.
- Returns
Initial observation of the episode.
- Return type
Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]
- _get_goal_space()[source]¶
Get goal space.
Note
This method is called internally at init to define the observation space. It is called BEFORE super().reset so non goal-env-specific internal buffers are NOT up-to-date. This method must be overloaded while implementing a goal environment.
- Return type
gym.spaces.space.Space
- _sample_goal()[source]¶
Sample a goal randomly.
Note
This method is called internally by reset to sample the new desired goal that the agent will have to achieve. It is called BEFORE super().reset so non goal-env-specific internal buffers are NOT up-to-date. This method must be overloaded while implementing a goal environment.
- _get_achieved_goal()[source]¶
Compute the achieved goal based on current state of the robot.
Note
This method can be called by refresh_observation to get the currently achieved goal. This method must be overloaded while implementing a goal environment.
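The three methods above are the minimum to overload for a goal environment. A sketch is shown below; the cart-position target, the URDF path and the use of the first configuration coordinate as achieved goal are hypothetical, and jiminy_py's Simulator.build is assumed as in the earlier sketch.

```python
import gym.spaces
import numpy as np
from jiminy_py.simulator import Simulator
from gym_jiminy.common.envs.env_generic import BaseJiminyGoalEnv


class CartReachGoalEnv(BaseJiminyGoalEnv):
    """Hypothetical goal environment: drive the cart to a target position."""

    def __init__(self, debug: bool = False) -> None:
        # 'cartpole.urdf' is a placeholder, as in the earlier sketch.
        simulator = Simulator.build("cartpole.urdf", has_freeflyer=False)
        super().__init__(simulator, step_dt=0.01, debug=debug)

    def _get_goal_space(self) -> gym.spaces.Space:
        # Scalar target cart position, bounded to [-1, 1] for illustration.
        return gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float64)

    def _sample_goal(self):
        # Draw a new desired goal uniformly at random within the goal space.
        return self._get_goal_space().sample()

    def _get_achieved_goal(self):
        # The first configuration coordinate is taken as the cart position here.
        return self.system_state.q[:1].copy()
```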
- is_done(achieved_goal=None, desired_goal=None)[source]¶
Determine whether a termination condition has been reached.
By default, it uses the termination condition inherited from the normal environment.
Note
This method is called right after calling refresh_observation, so that the internal buffer ‘_observation’ is up-to-date. This method can be overloaded while implementing a goal environment.
- Parameters
achieved_goal (Optional[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]]) – Achieved goal. If set to None, one is supposed to call _get_achieved_goal instead. Optional: None by default.
desired_goal (Optional[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]]) – Desired goal. If set to None, one is supposed to use the internal buffer ‘_desired_goal’ instead. Optional: None by default.
- Return type
- compute_reward(achieved_goal=None, desired_goal=None, *, info)[source]¶
Compute the step reward. This externalizes the reward function and makes it dependent on a desired goal and the one that was achieved. If you wish to include additional rewards that are independent of the goal, you can include the necessary values to derive it in ‘info’ and compute it accordingly.
- Parameters
achieved_goal (Optional[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]]) – Achieved goal. None to evaluate the reward for the currently achieved goal.
desired_goal (Optional[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]]) – Desired goal. None to evaluate the reward for the currently desired goal.
info (Dict[str, Any]) – Dictionary of extra information for monitoring.
- Returns
The reward that corresponds to the provided achieved goal with respect to the desired goal. The following should always hold true:
obs, reward, done, info = env.step()
assert reward == env.compute_reward(obs['achieved_goal'], obs['desired_goal'], info=info)
- Return type
- _controller_handle(t, q, v, sensors_data, command)¶
Thin wrapper around user-specified compute_command method.
Warning
This method is not supposed to be called manually nor overloaded.
- Parameters
t (float) –
q (numpy.ndarray) –
v (numpy.ndarray) –
sensors_data (jiminy_py.core.sensorsData) –
command (numpy.ndarray) –
- Return type
None
- _get_sensors_space()¶
Get sensor space.
It gathers the sensors data in a dictionary. It maps each available type of sensor to the associated data matrix. Rows correspond to the sensor type’s fields, and columns correspond to each individual sensor.
- Return type
gym.spaces.dict.Dict
- _get_state_space(use_theoretical_model=None)¶
Get state space.
This method is not meant to be overloaded in general since the definition of the state space is mostly consensual. One must rather overload _initialize_observation_space to customize the observation space as a whole.
- _get_time_space()¶
Get time space.
- Return type
gym.spaces.box.Box
- _initialize_action_space()¶
Configure the action space of the environment.
The action is a vector gathering the torques of the actuators of the robot.
Warning
This method is called internally by reset method. It is not meant to be overloaded since the actual action space of the robot is uniquely defined.
- Return type
None
- _initialize_buffers()¶
Initialize internal buffers for fast access to shared memory or to avoid redundant computations.
Note
This method is called at reset, right after self.simulator.start. At this point, the simulation is running but refresh_observation has never been called, so that it can be used to initialize buffers involving the engine state but required to refresh the observation. Note that it is not appropriate to initialize buffers that would be used by compute_command.
Note
Buffers requiring manual update must be refreshed using _refresh_buffers method.
- Return type
None
- _initialize_observation_space()¶
Configure the observation of the environment.
By default, the observation is a dictionary gathering the current simulation time, the real robot state, and the sensors data.
Note
This method is called internally by reset method at the very end, just before computing and returning the initial observation. This method, alongside refresh_observation, must be overwritten in order to define a custom observation space.
- Return type
None
- _key_to_action(key, obs, reward, **kwargs)¶
Mapping from input keyboard keys to actions.
Note
This method is called systematically before the step method, even if no key has been pressed, or the reward is not defined. In such cases, the value is None.
Note
The mapping can be state dependent, and the key can be used for something different than computing the action directly. For instance, one can provide as extra argument to this method a custom policy taking user parameters mapped to keyboard keys as input.
Warning
Overloading this method is required for calling play_interactive method.
- Parameters
key (Optional[str]) – Key pressed by the user as a string. None if no key has been pressed since the last step of the environment.
obs (Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]) – Previous observation from the last step of the environment. It is always available, including right after reset.
reward (Optional[float]) – Previous reward from last step of the environment. Not available before first step right after reset.
kwargs (Any) – Extra keyword argument provided by the user when calling play_interactive method.
- Returns
Action to forward to the environment.
- Return type
Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]
- _neutral()¶
Returns a neutral valid configuration for the robot.
The default implementation returns the neutral configuration if valid, the “mean” configuration otherwise (right in the middle of the position lower and upper bounds).
Warning
Beware there is no guarantee for this configuration to be statically stable.
Note
This method is called internally by ‘_sample_state’ to generate the initial state. It can be overloaded to ensure static stability of the configuration.
- Return type
numpy.ndarray
- _observer_handle(t, q, v, sensors_data)¶
TODO Write documentation.
- Parameters
t (float) –
q (numpy.ndarray) –
v (numpy.ndarray) –
sensors_data (jiminy_py.core.sensorsData) –
- Return type
None
- _refresh_buffers()¶
Refresh internal buffers that must be updated manually.
Note
This method is called right after every internal engine.step, so it is the right place to update shared data between is_done and compute_reward.
Note
_initialize_buffers method can be used to initialize buffers that may require special care.
Warning
Be careful when using it to update buffers that are used by refresh_observation. The latter is called at the self.observe_dt update period, while the others are called at the self.step_dt update period. self.observe_dt is likely to be different from self.step_dt, unless configured manually when overloading _setup method.
- Return type
None
- _sample_state()¶
Returns a valid configuration and velocity for the robot.
The default implementation returns the neutral configuration and zero velocity.
Offsets are applied on the freeflyer to ensure no contact points are going through the ground and up to three are in contact.
Note
This method is called internally by reset to generate the initial state. It can be overloaded to act as a random state generator.
- Return type
Tuple[numpy.ndarray, numpy.ndarray]
- _setup()¶
Configure the environment. It must guarantee that its internal state is valid after calling this method.
By default, it enforces some options of the engine.
Warning
Beware this method is called BEFORE observe_dt and controller_dt are properly set, so one cannot rely on them at this point. Yet, step_dt is available and should always be. One can still access the low-level controller update period through engine_options[‘stepper’][‘controllerUpdatePeriod’].
Note
The user must overload this method to enforce a custom observer update period, otherwise it will be the same as the controller’s.
Note
This method is called internally by reset methods.
- Return type
None
- action_space: gym.Space = None¶
- close()¶
Terminate the Python Jiminy engine.
- Return type
None
- compute_command(measure, action)¶
Compute the motors efforts to apply on the robot.
By default, it does not perform any processing. One is responsible for overloading this method to clip the action if necessary, to make sure it does not violate the lower and upper bounds.
Warning
There is no good place to initialize buffers that are necessary to compute the command. The only solution for now is to perform the initialization inside this method itself, using the safeguard if not self.simulator.is_simulation_running:.
- compute_reward_terminal(*, info)¶
Compute terminal reward at current episode final state.
Note
Implementation is optional. For the sake of efficiency, the terminal reward is not computed if this method is not overloaded by the user.
Warning
Similarly to compute_reward, ‘info’ can be updated by reference to log extra info for monitoring.
- eval()¶
Sets the environment in evaluation mode.
This only has an effect on certain environments. See the documentation of the particular environment for details about its behavior in training and evaluation modes, if it is affected. It can be used to activate clipping or some filtering of the action specifically at evaluation time.
- Return type
None
- static evaluate(env, policy_fn, seed=None, horizon=None, enable_stats=True, enable_replay=True, **kwargs)¶
Evaluate a policy on the environment over a complete episode.
- Parameters
env (Union[BaseJiminyEnv, gym.core.Wrapper]) – BaseJiminyEnv environment instance to play with, eventually wrapped by composition, typically using gym.Wrapper.
policy_fn (Callable[[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray], Optional[float]], Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]]) – Policy to evaluate as a callback function. It must have the following signature (reward = None at reset):
policy_fn(obs: DataNested, reward: Optional[float]) -> DataNested  # action
seed (Optional[int]) – Seed of the environment to be used for the evaluation of the policy. Optional: Random seed if not provided.
horizon (Optional[int]) – Horizon of the simulation, namely maximum number of steps before termination. None to disable. Optional: Disabled by default.
enable_stats (bool) – Whether or not to print high-level statistics after simulation. Optional: Enabled by default.
enable_replay (bool) – Whether or not to enable replay of the simulation, and eventually recording if the extra keyword argument record_video_path is provided. Optional: Enabled by default.
kwargs (Any) – Extra keyword arguments to forward to the replay method if replay is requested.
- Return type
- metadata = {'render.modes': []}¶
- static play_interactive(env, enable_travelling=None, start_paused=True, enable_is_done=True, verbose=True, **kwargs)¶
Activate interactive mode, enabling the robot to be controlled using the keyboard.
It stops automatically as soon as ‘done’ flag is True. One has to press a key to start the interaction. If no key is pressed, the action is not updated and the previous one keeps being sent to the robot.
Warning
This method requires _key_to_action method to be implemented by the user by overloading it, otherwise it raises an exception.
- Parameters
env (Union[gym_jiminy.common.envs.env_generic.BaseJiminyEnv, gym.core.Wrapper]) – BaseJiminyEnv environment instance to play with, eventually wrapped by composition, typically using gym.Wrapper.
enable_travelling (Optional[bool]) – Whether or not to enable travelling, following the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: Enabled by default if and only if the ‘panda3d’ viewer backend is used.
start_paused (bool) – Whether or not to start in pause. Optional: Enabled by default.
verbose (bool) – Whether or not to display status messages.
kwargs (Any) – Extra keyword arguments to forward to _key_to_action method.
enable_is_done (bool) – Whether or not to stop the interactive session automatically as soon as the ‘done’ flag is True. Optional: Enabled by default.
- Return type
None
- plot(**kwargs)¶
Display common simulation data and action over time.
- Parameters
kwargs (Any) – Extra keyword arguments to forward to simulator.plot.
- Return type
None
- refresh_observation()¶
Compute the observation based on the current state of the robot.
Note
This method is called at the end of every low-level Engine.step.
Note
Note that np.nan values will be automatically clipped to 0.0 by the get_observation method before returning it, so assigning them is valid.
Warning
In practice, it updates the internal buffer directly for the sake of efficiency.
As a side note, there is no way in the current implementation to discriminate the initialization of the observation buffer from the next one. The workaround is to check if the simulation already started. Even though it is not the same rigorously speaking, it does the job here since it is only about preserving efficiency.
- Return type
None
- register_variable(name, value, fieldnames=None, namespace=None)¶
TODO: Write documentation.
- render(mode=None, **kwargs)¶
Render the world.
- Parameters
mode (Optional[str]) – Rendering mode. It can be either ‘human’ to display the current simulation state, or ‘rgb_array’ to return a snapshot as an RGB array without showing it on the screen. Optional: ‘human’ by default if available, ‘rgb_array’ otherwise.
kwargs (Any) – Extra keyword arguments to forward to jiminy_py.simulator.Simulator.render method.
- Returns
RGB array if ‘mode’ is ‘rgb_array’, None otherwise.
- Return type
Optional[numpy.ndarray]
- replay(enable_travelling=True, **kwargs)¶
Replay the current episode until now.
- Parameters
enable_travelling (bool) – Whether or not to enable travelling, following the motion of the root frame of the model. This parameter is ignored if the model has no freeflyer. Optional: Enabled by default.
kwargs (Any) – Extra keyword arguments to forward to the underlying replay method.
- Return type
None
- reward_range = (-inf, inf)¶
- seed(seed=None)¶
Specify the seed of the environment.
Warning
It also resets the low-level jiminy Engine. Therefore one must call the reset method manually afterward.
- spec = None¶
- step(action=None)¶
Run a simulation step for a given action.
- Parameters
action (Optional[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray]]) – Action to perform. None to not update the action.
- Returns
Next observation, reward, status of the episode (done or not), and a dictionary of extra information
- Return type
Tuple[Union[Dict[str, StructNested], Sequence[StructNested], numpy.ndarray], float, bool, Dict[str, Any]]
- train()¶
Sets the environment in training mode.
Note
This mode is enabled by default.
- Return type
None
- property unwrapped¶
Completely unwrap this env.
- Returns:
gym.Env: The base non-wrapped gym.Env instance
- simulator: jiminy_py.simulator.Simulator¶
- stepper_state: jiminy_py.core.StepperState¶
- system_state: jiminy_py.core.SystemState¶
- _observation: DataNested¶
- _action: DataNested¶
- engine: jiminy.EngineMultiRobot¶
- _seed: List[np.uint32]¶