Frame Stacking

Wrappers for stacking observation frames in a rolling manner, optionally restricted to a subset of nested observation fields.

class gym_jiminy.common.wrappers.frame_stack.FilteredFrameStack(env, num_stack, nested_filter_keys=None, **kwargs)[source]

Bases: gym.core.Wrapper

Observation wrapper that stacks filtered observations in a rolling manner.

It combines and extends OpenAI Gym wrappers FrameStack and FilterObservation to support nested filter keys.

Note

The observation space must be gym.spaces.Dict, and the leaf fields that are ultimately stacked must be gym.spaces.Box.

Parameters
  • env (gym.core.Env) – Environment to wrap.

  • nested_filter_keys (Optional[Sequence[Union[Sequence[str], str]]]) – List of nested observation fields to stack. These fields do not have to be leaves; if they are not, every leaf field under the specified root is stacked.

  • num_stack (int) – Number of observation frames to partially stack.

  • kwargs (Any) – Extra keyword arguments to allow automatic pipeline wrapper generation.
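
A minimal usage sketch, assuming an already-registered environment whose observation space is a gym.spaces.Dict with gym.spaces.Box leaves; the environment id and the nested key used below are hypothetical placeholders:

    import gym
    from gym_jiminy.common.wrappers.frame_stack import FilteredFrameStack

    env = gym.make("Env-v0")  # hypothetical environment id
    env = FilteredFrameStack(
        env,
        num_stack=4,                                 # keep the last 4 frames
        nested_filter_keys=[("state", "position")])  # hypothetical nested field to stack
    obs = env.reset()  # still a nested dict; the selected leaves now hold the last 4 frames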

observation_space = None
_setup()[source]

TODO: Write documentation.

Return type

None

observation(observation)[source]

TODO: Write documentation.

Parameters

observation (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) –

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

compute_observation(measure)[source]

TODO: Write documentation.

Parameters

measure (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) –

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

  • observation (object): agent’s observation of the current environment

  • reward (float): amount of reward returned after previous action

  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results

  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Parameters

action (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) –

Return type

Tuple[Union[Dict[str, StructNested], Sequence[StructNested], ValueType], float, bool, Dict[str, Any]]
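
For illustration, a standard Gym interaction loop with the wrapped environment, assuming env has been built as sketched above and using a random policy purely for demonstration:

    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()          # random action, for illustration only
        obs, reward, done, info = env.step(action)  # observation contains the stacked frames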

reset(**kwargs)[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns:

observation (object): the initial observation.

Parameters

kwargs (Any) –

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

action_space = None
classmethod class_name()
close()

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

compute_reward(achieved_goal, desired_goal, info)
metadata = {'render.modes': []}
render(mode='human', **kwargs)

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note:

Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.

Args:

mode (str): the mode to render with

Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

reward_range = (-inf, inf)
seed(seed=None)

Sets the seed for this env’s random number generator(s).

Note:

Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.

Returns:

list<bigint>: Returns the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.
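
A small sketch of typical usage (the concrete seed value is arbitrary):

    seeds = env.seed(42)  # list of seeds actually used; the first entry is the "main" seed
    obs = env.reset()     # subsequent rollouts should now be reproducible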

property spec
property unwrapped

Completely unwrap this env.

Returns:

gym.Env: The base non-wrapped gym.Env instance

class gym_jiminy.common.wrappers.frame_stack.StackedJiminyEnv(env, skip_frames_ratio=0, **kwargs)[source]

Bases: gym_jiminy.common.bases.pipeline_bases.BasePipelineWrapper

TODO: Write documentation.

Parameters
  • env (gym.core.Env) –

  • skip_frames_ratio (int) –

  • kwargs (Any) –

Return type

None
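
A hypothetical usage sketch, assuming base_env is an already-instantiated gym_jiminy environment (e.g. deriving from BaseJiminyEnv) and assuming extra keyword arguments such as num_stack are forwarded to the underlying frame-stacking logic:

    from gym_jiminy.common.wrappers.frame_stack import StackedJiminyEnv

    env = StackedJiminyEnv(
        base_env,             # assumed to be an existing gym_jiminy environment instance
        skip_frames_ratio=2,  # assumption: only one frame out of three is stacked
        num_stack=3)          # assumption: forwarded to the frame-stacking wrapper
    obs = env.reset()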

action_space: Optional[gym.Space] = None
observation_space: Optional[gym.Space] = None
_action: Optional[DataNested]
_observation: Optional[DataNested]
_setup()[source]

Configure the wrapper.

By default, it only resets some internal buffers.

Note

This method must be called once, after the environment has been reset. This is done automatically when calling the reset method.

Return type

None

refresh_observation()[source]

Compute the unified observation.

By default, it forwards the observation computed by the environment.

Parameters

measure – Observation of the environment.

Return type

None

_controller_handle(t, q, v, sensors_data, command)

Thin wrapper around user-specified compute_command method.

Warning

This method is not supposed to be called manually nor overloaded.

Parameters
Return type

None

_get_block_index()

Get the index of the block. It corresponds to the “depth” of the block, namely how many blocks of the same wrapper type as the current one are already wrapped in the environment.

Return type

int

_observer_handle(t, q, v, sensors_data)

TODO: Write documentation.

Parameters
Return type

None

_refresh_action_space()

Configure the action space of the controller.

Note

This method is called right after _setup, so both the environment to control and the controller itself are expected to be already initialized.

Return type

None

_refresh_observation_space()

Configure the observation space.

Return type

None

classmethod class_name()
close()

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

compute_command(measure, action)

Compute the motors efforts to apply on the robot.

By default, it forwards the command computed by the environment.

Parameters
  • measure (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) – Observation of the environment.

  • action (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) – Target to achieve.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]
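
A minimal sketch of what the documented default amounts to in a derived pipeline wrapper; this is purely illustrative, the actual implementation may differ, and a working subclass would also need to implement the other abstract methods of BasePipelineWrapper:

    from gym_jiminy.common.bases.pipeline_bases import BasePipelineWrapper

    class PassthroughWrapper(BasePipelineWrapper):
        """Hypothetical wrapper forwarding the command computed by the wrapped environment."""
        def compute_command(self, measure, action):
            # Default behavior described above: delegate to the wrapped environment
            return self.env.compute_command(measure, action)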

compute_reward(*args, info, **kwargs)

Compute reward at current episode state.

See ControllerInterface.compute_reward for details.

Note

This method is called after updating the internal buffer ‘_num_steps_beyond_done’, which is None if the simulation is not done, 0 right after, and so on.

Parameters
  • args (Any) – Extra arguments that may be useful for derived environments, for example Gym.GoalEnv.

  • info (Dict[str, Any]) – Dictionary of extra information for monitoring.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

Returns

Total reward.

Return type

float

compute_reward_terminal(*, info)

Compute terminal reward at current episode final state.

Note

Implementation is optional. If not overloaded by the user, no terminal reward is computed, for the sake of efficiency.

Warning

Similarly to compute_reward, ‘info’ can be updated by reference to log extra info for monitoring.

Parameters

info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns

Terminal reward.

Return type

float

get_observation()

Get post-processed observation.

It performs a recursive shallow copy of the observation.

Warning

This method is not supposed to be called manually nor overloaded.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

metadata = {'render.modes': []}
render(mode='human', **kwargs)

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note:

Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.

Args:

mode (str): the mode to render with

Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

reset(controller_hook=None, **kwargs)

Reset the unified environment.

In practice, it resets the environment and initializes the generic pipeline internal buffers through the use of ‘controller_hook’.

Parameters
  • controller_hook (Optional[Callable[Optional[Tuple[Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData], None]], Optional[Callable[[float, numpy.ndarray, numpy.ndarray, jiminy_py.core.sensorsData, numpy.ndarray], None]]]]]]) – Used internally for chaining multiple BasePipelineWrapper. It is not meant to be defined manually. Optional: None by default.

  • kwargs (Any) – Extra keyword arguments to comply with OpenAI Gym API.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]

reward_range = (-inf, inf)
seed(seed=None)

Sets the seed for this env’s random number generator(s).

Note:

Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.

Returns:

list<bigint>: Returns the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.

property spec
step(action=None)

Run a simulation step for a given action.

Parameters

action (Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]) – Next action to perform. Pass None to keep the previous action unchanged.

Returns

Next observation, reward, status of the episode (done or not), and a dictionary of extra information.

Return type

Tuple[Union[Dict[str, StructNested], Sequence[StructNested], ValueType], float, bool, Dict[str, Any]]
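
A short illustrative sketch (new_action is a placeholder for an action compatible with the wrapper's action space):

    obs, reward, done, info = env.step(new_action)  # apply a new target action
    obs, reward, done, info = env.step()            # keep the previous action unchanged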

property unwrapped

Completely unwrap this env.

Returns:

gym.Env: The base non-wrapped gym.Env instance

env: Union[gym.core.Wrapper, gym_jiminy.common.envs.env_generic.BaseJiminyEnv]
simulator: Simulator
stepper_state: jiminy.StepperState
system_state: jiminy.SystemState
sensors_data: jiminy.sensorsData
_dt_eps: Optional[float]
observe_dt: float
control_dt: float