Generic

Generic reward components that may be relevant for any kind of robot, regardless of its topology (multiple or single branch, fixed or floating base…) and application (locomotion, grasping…).

class gym_jiminy.common.compositions.generic.SurviveReward(env)[source]

Bases: AbstractReward

Reward the agent for surviving, i.e. encourage episodes to last as long as possible by avoiding triggering termination conditions.

A constant positive reward equal to 1.0 is systematically returned, unless the current state of the environment is the terminal state, in which case the value 0.0 is returned instead.

Parameters:

env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.
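
A minimal usage sketch (hypothetical; ‘env’ stands for any base or wrapped Jiminy environment already instantiated):

    from gym_jiminy.common.compositions.generic import SurviveReward

    reward = SurviveReward(env)
    assert reward.is_normalized
    # Returns 1.0 for any non-terminal state, by definition of this reward
    value = reward.compute(terminated=False, info={})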

property is_terminal: bool | None

Whether the reward is terminal, non-terminal, or indefinite.

A reward is said to be “terminal” if only evaluated for the terminal state of the MDP, “non-terminal” if evaluated for all states except the terminal one, or indefinite if systematically evaluated no matter what.

All rewards are supposed to be indefinite unless stated otherwise by overloading this method. The responsibility of evaluating the reward only when necessary is delegated to compute. This allows for complex evaluation logic beyond terminal or non-terminal without restriction.

Note

Truncation is not considered the same as termination. A terminal reward is not evaluated upon truncation, which means that it will never be evaluated at all for truncated episodes.

property is_normalized: bool

Whether the reward is guaranteed to be normalized, i.e. in range [0.0, 1.0].

compute(terminated, info)[source]

Return a constant positive reward equal to 1.0 no matter what.

Parameters:
  • terminated (bool)

  • info (Dict[str, Any])

Return type:

float | None

property name: str

Name uniquely identifying every reward.

It will be used as key not only for storing reward-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.TrackingQuantityReward(env, name, quantity_creator, cutoff, *, op=<built-in function sub>, order=2)[source]

Bases: QuantityReward

Base class from which to derive rewards defined as the difference between the current and reference values of a given quantity.

A reference trajectory must be selected before evaluating this reward, otherwise an exception will be raised. See the DatasetTrajectoryQuantity and AbstractQuantity documentation for details.

The error is transformed into a normalized reward to maximize by applying an RBF kernel to it. The reward will be 1.0 if the error cancels out completely, and less than ‘CUTOFF_ESP’ above the user-specified cutoff threshold.
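
For reference, a minimal sketch of such an RBF kernel transform (not the library implementation; the helper name and the exact value of ‘CUTOFF_ESP’ are assumptions):

    import math
    import numpy as np

    CUTOFF_ESP = 1.0e-2  # assumed cutoff tolerance

    def radial_basis_function(error, cutoff, order=2):
        # Normalized reward in [0.0, 1.0]: equals 1.0 for a zero error and
        # exactly CUTOFF_ESP when the L^p-norm of the error hits the cutoff.
        distance = np.linalg.norm(np.asarray(error).ravel(), ord=order)
        return math.pow(CUTOFF_ESP, (distance / cutoff) ** 2)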

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • name (str) – Desired name of the reward. This name will be used as key for storing the current value of the reward in ‘info’, and to add the underlying quantity to the set of quantities already managed by the environment. As a result, it must be unique, otherwise an exception will be raised.

  • quantity_creator (Callable[[QuantityEvalMode], Tuple[Type[InterfaceQuantity[ValueT]], Dict[str, Any]]]) – Any callable taking a quantity evaluation mode as input argument and returning a tuple gathering the class of the underlying quantity to use as reward after some post-processing, plus any keyword-arguments of its constructor except ‘env’ and ‘parent’.

  • cutoff (float) – Cutoff threshold for the RBF kernel transform.

  • op (Callable[[ValueT, ValueT], ValueT]) – Any callable taking the true and reference values of the quantity as input argument and returning the difference between them, considering the algebra defined by their Lie Group. The basic subtraction operator operator.sub is appropriate for the Euclidean space. Optional: operator.sub by default.

  • order (int) – Order of L^p-norm that will be used as distance metric. Optional: 2 by default.
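
A hypothetical instantiation sketch, where ‘SomeQuantity’ is a placeholder for an actual quantity class and all numerical values are arbitrary:

    import operator

    reward = TrackingQuantityReward(
        env,
        "tracking_some_quantity",
        lambda mode: (SomeQuantity, dict(mode=mode)),
        cutoff=0.1,
        op=operator.sub,
        order=2)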

compute(terminated, info)

Compute the reward if necessary, depending on whether the reward and state are terminal. If so, the underlying quantity is first evaluated, then post-processing is applied if requested.

Warning

This method is not meant to be overloaded.

Parameters:
  • terminated (bool)

  • info (Dict[str, Any])

Returns:

Scalar value if the reward was evaluated, None otherwise.

Return type:

float | None

property is_normalized: bool

Whether the reward is guaranteed to be normalized, i.e. in range [0.0, 1.0].

property is_terminal: bool | None

Whether the reward is terminal, non-terminal, or indefinite.

A reward is said to be “terminal” if only evaluated for the terminal state of the MDP, “non-terminal” if evaluated for all states except the terminal one, or indefinite if systematically evaluated no matter what.

All rewards are supposed to be indefinite unless stated otherwise by overloading this method. The responsibility of evaluating the reward only when necessary is delegated to compute. This allows for complex evaluation logic beyond terminal or non-terminal without restriction.

Note

Truncation is not considered the same as termination. A terminal reward is not evaluated upon truncation, which means that it will never be evaluated at all for truncated episodes.

property name: str

Name uniquely identifying every reward.

It will be used as key not only for storing reward-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.TrackingActuatedJointPositionsReward(env, cutoff)[source]

Bases: TrackingQuantityReward

Reward the agent for tracking the positions of all the actuated joints of the robot w.r.t. some reference trajectory.

See also

See TrackingQuantityReward documentation for technical details.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • cutoff (float) – Cutoff threshold for the RBF kernel transform.
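
A minimal usage sketch (the cutoff value is arbitrary; a reference trajectory must have been selected beforehand):

    reward = TrackingActuatedJointPositionsReward(env, cutoff=0.1)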

compute(terminated, info)

Compute the reward if necessary, depending on whether the reward and state are terminal. If so, the underlying quantity is first evaluated, then post-processing is applied if requested.

Warning

This method is not meant to be overloaded.

Parameters:
  • terminated (bool)

  • info (Dict[str, Any])

Returns:

Scalar value if the reward was evaluated, None otherwise.

Return type:

float | None

property is_normalized: bool

Whether the reward is guaranteed to be normalized, i.e. in range [0.0, 1.0].

property is_terminal: bool | None

Whether the reward is terminal, non-terminal, or indefinite.

A reward is said to be “terminal” if only evaluated for the terminal state of the MDP, “non-terminal” if evaluated for all states except the terminal one, or indefinite if systematically evaluated no matter what.

All rewards are supposed to be indefinite unless stated otherwise by overloading this method. The responsibility of evaluating the reward only when necessary is delegated to compute. This allows for complex evaluation logic beyond terminal or non-terminal without restriction.

Note

Truncation is not considered the same as termination. A terminal reward is not evaluated upon truncation, which means that it will never be evaluated at all for truncated episodes.

property name: str

Name uniquely identifying every reward.

It will be used as key not only for storing reward-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.DriftTrackingQuantityTermination(env, name, quantity_creator, low, high, horizon, grace_period=0.0, *, op=<built-in function sub>, post_fn=None, is_truncation=False, is_training_only=False)[source]

Bases: QuantityTermination

Base class from which to derive termination conditions based on the difference between the current and reference drift of a given quantity.

The drift is defined as the difference between the most recent and oldest values of a time series. In this case, a variable-length horizon bounded by ‘max_stack’ is considered.

All elements must be within bounds for at least one time step over the fixed horizon. If so, the episode continues, otherwise it is either truncated or terminated according to the ‘is_truncation’ constructor argument. This only applies after the end of the grace period; before that, the episode continues no matter what.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • name (str) – Desired name of the termination condition. This name will be used as key for storing the current episode state from the perspective of this specific condition in ‘info’, and to add the underlying quantity to the set of quantities already managed by the environment. As a result, it must be unique, otherwise an exception will be raised.

  • quantity_creator (Callable[[QuantityEvalMode], Tuple[Type[InterfaceQuantity[ndarray | number | float | int | bool | complex]], Dict[str, Any]]]) – Any callable taking a quantity evaluation mode as input argument and returning a tuple gathering the class of the underlying quantity to use after some post-processing, plus any keyword-arguments of its constructor except ‘env’ and ‘parent’.

  • low (ndarray | number | float | int | bool | complex | Sequence[float | int | bool | complex | number] | None) – Lower bound below which termination is triggered.

  • high (ndarray | number | float | int | bool | complex | Sequence[float | int | bool | complex | number] | None) – Upper bound above which termination is triggered.

  • horizon (float) – Horizon over which values of the quantity will be stacked before computing the drift.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • op (Callable[[ndarray | number | float | int | bool | complex, ndarray | number | float | int | bool | complex], ndarray | number | float | int | bool | complex]) – Any callable taking the true and reference values of the quantity as input argument and returning the difference between them, considering the algebra defined by their Lie Group. The basic subtraction operator operator.sub is appropriate for Euclidean space. Optional: operator.sub by default.

  • is_truncation (bool) – Whether the episode should be considered terminated or truncated whenever the termination condition is triggered. Optional: False by default.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.

  • post_fn (Callable[[ndarray | number | float | int | bool | complex], ndarray | number | float | int | bool | complex] | None) – Optional callable taking the drift error of the quantity, i.e. the difference between its true and reference drifts, as input argument and returning some post-processed value to which bound checking will be applied. None to skip post-processing entirely. Optional: None by default.
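
As an illustration of the drift definition above, a minimal sketch (not the library implementation), assuming values of the quantity are stacked along the first axis:

    import numpy as np

    def drift(stack):
        # Difference between the most recent and oldest values of the
        # time series, with timesteps stacked along the first axis.
        return stack[-1] - stack[0]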

_compute_drift_error(left, right)[source]

Compute the difference between the true and reference drift over a given horizon, then apply some post-processing on it if requested.

Parameters:
  • left (ndarray) – True value of the drift as an N-dimensional array.

  • right (ndarray) – Reference value of the drift as an N-dimensional array.

Return type:

ndarray | number | float | int | bool | complex

compute(info)

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if all the elements of its value are within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.ShiftTrackingQuantityTermination(env, name, quantity_creator, thr, horizon, grace_period=0.0, *, op=<built-in function sub>, is_truncation=False, is_training_only=False)[source]

Bases: QuantityTermination[ndarray]

Base class from which to derive termination conditions based on the shift between the current and reference values of a given quantity.

The shift is defined as the minimum time-aligned distance (L^2-norm of the difference) between two multivariate time series. In this case, a variable-length horizon bounded by ‘max_stack’ is considered.

All elements must be within bounds for at least one time step over the fixed horizon. If so, the episode continues, otherwise it is either truncated or terminated according to the ‘is_truncation’ constructor argument. This only applies after the end of the grace period; before that, the episode continues no matter what.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • name (str) – Desired name of the termination condition. This name will be used as key for storing the current episode state from the perspective of this specific condition in ‘info’, and to add the underlying quantity to the set of quantities already managed by the environment. As a result, it must be unique, otherwise an exception will be raised.

  • quantity_creator (Callable[[QuantityEvalMode], Tuple[Type[InterfaceQuantity[ndarray | number | float | int | bool | complex]], Dict[str, Any]]]) – Any callable taking a quantity evaluation mode as input argument and returning a tuple gathering the class of the underlying quantity to use after some post-processing, plus any keyword-arguments of its constructor except ‘env’ and ‘parent’.

  • thr (float) – Termination is triggered if the shift exceeds this threshold.

  • horizon (float) – Horizon over which values of the quantity will be stacked before computing the shift.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • op (Callable[[ndarray, ndarray], ndarray]) – Any callable taking the true and reference stacked values of the quantity as input arguments and returning the difference between them, considering the algebra defined by their Lie Group. True and reference values are stacked in contiguous N-dimensional arrays along the first axis, namely the first dimension gathers individual timesteps. For instance, the common subtraction operator operator.sub is appropriate for the Euclidean space. Optional: operator.sub by default.

  • order – Order of L^p-norm that will be used as distance metric.

  • is_truncation (bool) – Whether the episode should be considered terminated or truncated whenever the termination condition is triggered. Optional: False by default.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.

_compute_min_distance(left, right)[source]

Compute the minimum time-aligned Euclidean distance between two multivariate time series kept in sync.

Internally, the time-aligned difference between the two time series will first be computed according to the user-specified binary operator ‘op’. The classical Euclidean norm of the difference is then computed over all timestamps individually and the minimum value is returned.

Parameters:
  • left (ndarray) – Time series as an N-dimensional array whose first dimension corresponds to individual timestamps over a finite horizon. The value at each timestamp will be regarded as a 1D vector for computing its Euclidean norm. It will be passed as the left-hand side of the binary operator ‘op’.

  • right (ndarray) – Time series as an N-dimensional array with the exact same shape as ‘left’. See ‘left’ for details. It will be passed as the right-hand side of the binary operator ‘op’.

Return type:

float
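
A minimal sketch of this computation (not the library implementation), assuming the plain subtraction operator ‘op’:

    import numpy as np

    def min_time_aligned_distance(left, right):
        # Time-aligned difference between the two time series, with
        # timesteps stacked along the first axis ('op' is operator.sub).
        diff = left - right
        # Euclidean norm at each timestep, flattening other dimensions.
        distances = np.linalg.norm(diff.reshape(len(diff), -1), axis=1)
        return float(distances.min())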

compute(info)

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if all the elements of its value are within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic._MultiActuatedJointBoundDistance(env, parent)[source]

Bases: InterfaceQuantity[Tuple[ndarray, ndarray]]

Distance of the actuated joints from their respective lower and upper mechanical stops.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • parent (InterfaceQuantity | None) – Higher-level quantity from which this quantity is a requirement if any, None otherwise.

  • mode – Desired mode of evaluation for this quantity.

initialize()[source]

Initialize internal buffers.

This is typically useful to refresh shared memory proxies or to re-initialize pre-allocated buffers.

Warning

Intermediary quantities ‘requirements’ are NOT initialized automatically because they can be initialized lazily in most cases, or are optional depending on the most efficient computation path at run-time. It is up to the developer implementing quantities to take care of it.

Note

This method must be called before starting a new episode.

Note

Lazy-initialization is used for efficiency, i.e. initialize is only called right before the first call to refresh, which may never happen if the cache is shared between multiple identical instances of the same quantity.

Return type:

None

refresh()[source]

Evaluate this quantity based on the agent state at the end of the current agent step.

Return type:

Tuple[ndarray, ndarray]
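
Conceptually, the returned pair corresponds to the following computation (a sketch under assumed inputs, not the actual implementation):

    def bound_distances(position, lower, upper):
        # Signed distances of each actuated joint from its lower and upper
        # mechanical stops; both are positive strictly within bounds.
        return (position - lower, upper - position)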

allow_update_graph: ClassVar[bool] = True

Whether dynamic computation graph update is allowed. This implies that the quantity can be reset at any point in time to re-compute the optimal computation path, typically after deletion or addition of some other node to its dependent sub-graph. When this happens, the quantity gets reset on the spot, even if a simulation is already running. This is not always acceptable, hence the capability to disable this feature at class-level.

property cache: SharedCache[ValueT]

Get shared cache if available, otherwise raises an exception.

Warning

This method is not meant to be overloaded.

get()

Get cached value of requested quantity if available, otherwise evaluate it and store it in cache.

This quantity is considered active as soon as this method has been called at least once since the previous tracking reset. The method is_active will return True even before initialize is called.

Warning

This method is not meant to be overloaded.

Return type:

ValueT

is_active(any_cache_owner=False)

Whether this quantity is considered active, namely initialize has been called at least once since previous tracking reset.

Parameters:
  • any_cache_owner (bool) – False to check only whether this exact instance is active, True if any of the identical quantities (sharing the same cache) being active is considered sufficient. Optional: False by default.

Return type:

bool

reset(reset_tracking=False, *, ignore_other_instances=False)

Consider that the quantity must be re-initialized before being evaluated once again.

If a shared cache is available, then it will be cleared and all identical quantities will jointly be reset.

Note

This method must be called right before performing any agent step, otherwise this quantity will not be refreshed if it was evaluated previously.

Warning

This method is not meant to be overloaded.

Parameters:
  • reset_tracking (bool) – Do not consider this quantity as active anymore until the get method gets called once again. Optional: False by default.

  • ignore_other_instances (bool) – Whether to skip reset of intermediary quantities as well as any shared cache co-owner quantity instances. Optional: False by default.

Return type:

None

requirements: Dict[str, InterfaceQuantity]

Intermediary quantities on which this quantity may rely for its evaluation at some point, depending on the optimal computation path at runtime. They will be exposed to the user as usual attributes.

class gym_jiminy.common.compositions.generic.MechanicalSafetyTermination(env, position_margin, velocity_max, grace_period=0.0, *, is_training_only=False)[source]

Bases: AbstractTerminationCondition

Discourage the agent from hitting the mechanical stops by immediately terminating the episode if the actuated joints approach them at excessive speed.

Hitting the lower and upper mechanical stops is inconvenient, but forbidding it completely is not desirable, as it induces safety margins that constrain the problem too strictly. This is particularly true as the maximum motor torque gets increasingly limited and PD controllers are used for low-level motor control, which turns out to be the case in most instances. Overall, such a hard constraint would impede performance, while completing the task successfully remains the highest priority. Still, the impact velocity must be restricted to prevent destructive damage. It is recommended to estimate an acceptable threshold from real experimental data.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • position_margin (float) – Distance of actuated joints from their respective mechanical bounds below which their speed is being watched.

  • velocity_max (float) – Maximum velocity above which further approaching the mechanical stops triggers termination, which only applies while a joint is being watched for being close to them.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.
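
A minimal usage sketch (all numerical values are arbitrary):

    termination = MechanicalSafetyTermination(
        env, position_margin=0.05, velocity_max=1.0, grace_period=0.1)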

compute(info)[source]

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if its value is within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.MechanicalPowerConsumptionTermination(env, max_power, horizon, generator_mode=EnergyGenerationMode.CHARGE, grace_period=0.0, *, is_training_only=False)[source]

Bases: QuantityTermination

Terminate the episode immediately if the average mechanical power consumption is too high.

High power consumption is undesirable as it means that the motion is suboptimal, and probably unnatural and fragile. Moreover, limiting it helps to accommodate hardware capabilities, avoiding motor overheating while increasing battery autonomy and lifespan. Finally, it may be necessary to deal with hardware limitations on maximum power drain.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • max_power (float) – Maximum average mechanical power consumption above which termination is triggered.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • horizon (float) – Horizon over which values of the quantity will be stacked before computing the average.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.

  • generator_mode (EnergyGenerationMode)
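
A minimal usage sketch (all numerical values are arbitrary):

    termination = MechanicalPowerConsumptionTermination(
        env, max_power=400.0, horizon=2.0, grace_period=0.5)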

compute(info)

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if all the elements of its value are within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.ShiftTrackingMotorPositionsTermination(env, thr, horizon, grace_period=0.0, *, is_training_only=False)[source]

Bases: ShiftTrackingQuantityTermination

Terminate the episode if the selected reference trajectory is not tracked with expected accuracy regarding the actuated joint positions, whatever the timestep being considered over some fixed-size sliding window.

The robot must track the reference trajectory whenever there is no hazard, only applying minor corrections to keep balance. Rewarding the agent for doing so is not effective, as favoring robustness remains more profitable: the agent would anticipate disturbances, lowering its current reward to maximize the future return, primarily by averting termination. Limiting the shift over a given horizon still allows for large deviations when handling strong pushes. Moreover, assuming the agent cannot keep track of the flow of time, meaning that only the observation at the current step is provided to the agent and no stateful network architecture such as an LSTM is being used, restricting the shift also urges the agent to do whatever it takes to get back to normal as soon as possible, for fear of triggering termination, which may happen any time the deviation exceeds the maximum acceptable shift, irrespective of its scale.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • thr (float) – Maximum shift above which termination is triggered.

  • horizon (float) – Horizon over which values of the quantity will be stacked before computing the shift.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.
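
A minimal usage sketch (all numerical values are arbitrary):

    termination = ShiftTrackingMotorPositionsTermination(
        env, thr=0.3, horizon=1.0, grace_period=0.5)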

_compute_min_distance(left, right)

Compute the minimum time-aligned Euclidean distance between two multivariate time series kept in sync.

Internally, the time-aligned difference between the two time series will first be computed according to the user-specified binary operator ‘op’. The classical Euclidean norm of the difference is then computed over all timestamps individually and the minimum value is returned.

Parameters:
  • left (ndarray) – Time series as an N-dimensional array whose first dimension corresponds to individual timestamps over a finite horizon. The value at each timestamp will be regarded as a 1D vector for computing its Euclidean norm. It will be passed as the left-hand side of the binary operator ‘op’.

  • right (ndarray) – Time series as an N-dimensional array with the exact same shape as ‘left’. See ‘left’ for details. It will be passed as the right-hand side of the binary operator ‘op’.

Return type:

float

compute(info)

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if all the elements of its value are within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.