Generic

Generic reward components that may be relevant for any kind of robot, regardless of its topology (multiple or single branch, fixed or floating base…) and application (locomotion, grasping…).

class gym_jiminy.common.compositions.generic.SurviveReward(env)[source]

Bases: AbstractReward

Reward the agent for surviving, i.e. encourage episodes to last as long as possible by avoiding triggering termination conditions.

A constant positive reward equal to 1.0 is systematically returned, unless the current state of the environment is the terminal state, in which case the value 0.0 is returned instead.

Parameters:

env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.
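
A minimal usage sketch (hypothetical; ‘env’ stands for any base or wrapped Jiminy environment already instantiated):

    from gym_jiminy.common.compositions.generic import SurviveReward

    reward = SurviveReward(env)
    assert reward.is_normalized
    # Returns 1.0 for any non-terminal state, by definition of this reward
    value = reward.compute(terminated=False, info={})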

property is_terminal: bool | None

Whether the reward is terminal, non-terminal, or indefinite.

A reward is said to be “terminal” if only evaluated for the terminal state of the MDP, “non-terminal” if evaluated for all states except the terminal one, or indefinite if systematically evaluated no matter what.

All rewards are supposed to be indefinite unless stated otherwise by overloading this method. The responsibility of evaluating the reward only when necessary is delegated to compute. This allows for complex evaluation logic beyond terminal or non-terminal without restriction.

Note

Truncation is not considered the same as termination. A terminal reward is not evaluated upon truncation, which means that it will never be evaluated at all for truncated episodes.

property is_normalized: bool

Whether the reward is guaranteed to be normalized, i.e. in range [0.0, 1.0].

compute(terminated, info)[source]

Return a constant positive reward equal to 1.0 no matter what.

Parameters:
  • terminated (bool)

  • info (Dict[str, Any])

Return type:

float | None

property name: str

Name uniquely identifying every reward.

It will be used as key not only for storing reward-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.TrackingQuantityReward(env, name, quantity_creator, cutoff, *, op=<built-in function sub>, order=2)[source]

Bases: QuantityReward

Base class from which to derive rewards defined as the difference between the current and reference values of a given quantity.

A reference trajectory must be selected before evaluating this reward, otherwise an exception will be raised. See the DatasetTrajectoryQuantity and AbstractQuantity documentation for details.

The error is transformed into a normalized reward to maximize by applying an RBF kernel to it. The reward will be 1.0 if the error cancels out completely, and less than ‘CUTOFF_ESP’ above the user-specified cutoff threshold.
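
For reference, a minimal sketch of such an RBF kernel transform (not the library implementation; the helper name and the exact value of ‘CUTOFF_ESP’ are assumptions):

    import math
    import numpy as np

    CUTOFF_ESP = 1.0e-2  # assumed cutoff tolerance

    def radial_basis_function(error, cutoff, order=2):
        # Normalized reward in [0.0, 1.0]: equals 1.0 for a zero error and
        # exactly CUTOFF_ESP when the L^p-norm of the error hits the cutoff.
        distance = np.linalg.norm(np.asarray(error).ravel(), ord=order)
        return math.pow(CUTOFF_ESP, (distance / cutoff) ** 2)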

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • name (str) – Desired name of the reward. This name will be used as key for storing the current value of the reward in ‘info’, and to add the underlying quantity to the set of quantities already managed by the environment. As a result, it must be unique, otherwise an exception will be raised.

  • quantity_creator (Callable[[QuantityEvalMode], Tuple[Type[InterfaceQuantity[ValueT]], Dict[str, Any]]]) – Any callable taking a quantity evaluation mode as input argument and returning a tuple gathering the class of the underlying quantity to use as reward after some post-processing, plus any keyword-arguments of its constructor except ‘env’ and ‘parent’.

  • cutoff (float) – Cutoff threshold for the RBF kernel transform.

  • op (Callable[[ValueT, ValueT], ValueT]) – Any callable taking the true and reference values of the quantity as input argument and returning the difference between them, considering the algebra defined by their Lie Group. The basic subtraction operator operator.sub is appropriate for the Euclidean space. Optional: operator.sub by default.

  • order (int) – Order of L^p-norm that will be used as distance metric. Optional: 2 by default.
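
A hypothetical instantiation sketch, where ‘SomeQuantity’ is a placeholder for an actual quantity class and all numerical values are arbitrary:

    import operator

    reward = TrackingQuantityReward(
        env,
        "tracking_some_quantity",
        lambda mode: (SomeQuantity, dict(mode=mode)),
        cutoff=0.1,
        op=operator.sub,
        order=2)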

compute(terminated, info)

Compute the reward if necessary, depending on whether the reward and state are terminal. If so, the underlying quantity is first evaluated, then post-processing is applied if requested.

Warning

This method is not meant to be overloaded.

Parameters:
  • terminated (bool)

  • info (Dict[str, Any])

Returns:

Scalar value if the reward was evaluated, None otherwise.

Return type:

float | None

property is_normalized: bool

Whether the reward is guaranteed to be normalized, i.e. in range [0.0, 1.0].

property is_terminal: bool | None

Whether the reward is terminal, non-terminal, or indefinite.

A reward is said to be “terminal” if only evaluated for the terminal state of the MDP, “non-terminal” if evaluated for all states except the terminal one, or indefinite if systematically evaluated no matter what.

All rewards are supposed to be indefinite unless stated otherwise by overloading this method. The responsibility of evaluating the reward only when necessary is delegated to compute. This allows for complex evaluation logic beyond terminal or non-terminal without restriction.

Note

Truncation is not considered the same as termination. A terminal reward is not evaluated upon truncation, which means that it will never be evaluated at all for truncated episodes.

property name: str

Name uniquely identifying every reward.

It will be used as key not only for storing reward-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.TrackingActuatedJointPositionsReward(env, cutoff)[source]

Bases: TrackingQuantityReward

Reward the agent for tracking the positions of all the actuated joints of the robot w.r.t. some reference trajectory.

See also

See TrackingQuantityReward documentation for technical details.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • cutoff (float) – Cutoff threshold for the RBF kernel transform.
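
A minimal usage sketch (the cutoff value is arbitrary; a reference trajectory must have been selected beforehand):

    reward = TrackingActuatedJointPositionsReward(env, cutoff=0.1)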

compute(terminated, info)

Compute the reward if necessary, depending on whether the reward and state are terminal. If so, the underlying quantity is first evaluated, then post-processing is applied if requested.

Warning

This method is not meant to be overloaded.

Parameters:
  • terminated (bool)

  • info (Dict[str, Any])

Returns:

Scalar value if the reward was evaluated, None otherwise.

Return type:

float | None

property is_normalized: bool

Whether the reward is guaranteed to be normalized, i.e. in range [0.0, 1.0].

property is_terminal: bool | None

Whether the reward is terminal, non-terminal, or indefinite.

A reward is said to be “terminal” if only evaluated for the terminal state of the MDP, “non-terminal” if evaluated for all states except the terminal one, or indefinite if systematically evaluated no matter what.

All rewards are supposed to be indefinite unless stated otherwise by overloading this method. The responsibility of evaluating the reward only when necessary is delegated to compute. This allows for complex evaluation logic beyond terminal or non-terminal without restriction.

Note

Truncation is not considered the same as termination. A terminal reward is not evaluated upon truncation, which means that it will never be evaluated at all for truncated episodes.

property name: str

Name uniquely identifying every reward.

It will be used as key not only for storing reward-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.DriftTrackingQuantityTermination(env, name, quantity_creator, low, high, horizon, grace_period=0.0, *, op=<built-in function sub>, post_fn=None, is_truncation=False, is_training_only=False)[source]

Bases: QuantityTermination

Base class from which to derive termination conditions based on the difference between the current and reference drift of a given quantity.

The drift is defined as the difference between the most recent and oldest values of a time series. In this case, a variable-length horizon bounded by ‘max_stack’ is considered.

All elements must be within bounds for at least one time step over the fixed horizon. If so, the episode continues, otherwise it is either truncated or terminated according to the ‘is_truncation’ constructor argument. This only applies after the end of the grace period; before that, the episode continues no matter what.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • name (str) – Desired name of the termination condition. This name will be used as key for storing the current episode state from the perspective of this specific condition in ‘info’, and to add the underlying quantity to the set of quantities already managed by the environment. As a result, it must be unique, otherwise an exception will be raised.

  • quantity_creator (Callable[[QuantityEvalMode], Tuple[Type[InterfaceQuantity[ndarray | number | float | int | bool | complex]], Dict[str, Any]]]) – Any callable taking a quantity evaluation mode as input argument and returning a tuple gathering the class of the underlying quantity to use after some post-processing, plus any keyword-arguments of its constructor except ‘env’ and ‘parent’.

  • low (ndarray | number | float | int | bool | complex | Sequence[float | int | bool | complex | number] | None) – Lower bound below which termination is triggered.

  • high (ndarray | number | float | int | bool | complex | Sequence[float | int | bool | complex | number] | None) – Upper bound above which termination is triggered.

  • horizon (float) – Horizon over which values of the quantity will be stacked before computing the drift.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • op (Callable[[ndarray | number | float | int | bool | complex, ndarray | number | float | int | bool | complex], ndarray | number | float | int | bool | complex]) – Any callable taking the true and reference values of the quantity as input argument and returning the difference between them, considering the algebra defined by their Lie Group. The basic subtraction operator operator.sub is appropriate for Euclidean space. Optional: operator.sub by default.

  • is_truncation (bool) – Whether the episode should be considered terminated or truncated whenever the termination condition is triggered. Optional: False by default.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.

  • post_fn (Callable[[ndarray | number | float | int | bool | complex], ndarray | number | float | int | bool | complex] | None) – Optional callable taking the drift error of the quantity, i.e. the difference between its true and reference drifts, as input argument and returning some post-processed value to which bound checking will be applied. None to skip post-processing entirely. Optional: None by default.
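
As an illustration of the drift definition above, a minimal sketch (not the library implementation), assuming values of the quantity are stacked along the first axis:

    import numpy as np

    def drift(stack):
        # Difference between the most recent and oldest values of the
        # time series, with timesteps stacked along the first axis.
        return stack[-1] - stack[0]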

_compute_drift_error(left, right)[source]

Compute the difference between the true and reference drift over a given horizon, then apply some post-processing on it if requested.

Parameters:
  • left (ndarray) – True value of the drift as an N-dimensional array.

  • right (ndarray) – Reference value of the drift as an N-dimensional array.

Return type:

ndarray | number | float | int | bool | complex

compute(info)

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if all the elements of its value are within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.ShiftTrackingQuantityTermination(env, name, quantity_creator, thr, horizon, grace_period=0.0, *, op=<built-in function sub>, is_truncation=False, is_training_only=False)[source]

Bases: QuantityTermination[ndarray]

Base class from which to derive termination conditions based on the shift between the current and reference values of a given quantity.

The shift is defined as the minimum time-aligned distance (L^2-norm of the difference) between two multivariate time series. In this case, a variable-length horizon bounded by ‘max_stack’ is considered.

All elements must be within bounds for at least one time step over the fixed horizon. If so, the episode continues, otherwise it is either truncated or terminated according to the ‘is_truncation’ constructor argument. This only applies after the end of the grace period; before that, the episode continues no matter what.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • name (str) – Desired name of the termination condition. This name will be used as key for storing the current episode state from the perspective of this specific condition in ‘info’, and to add the underlying quantity to the set of quantities already managed by the environment. As a result, it must be unique, otherwise an exception will be raised.

  • quantity_creator (Callable[[QuantityEvalMode], Tuple[Type[InterfaceQuantity[ndarray | number | float | int | bool | complex]], Dict[str, Any]]]) – Any callable taking a quantity evaluation mode as input argument and returning a tuple gathering the class of the underlying quantity to use after some post-processing, plus any keyword-arguments of its constructor except ‘env’ and ‘parent’.

  • thr (float) – Termination is triggered if the shift exceeds this threshold.

  • horizon (float) – Horizon over which values of the quantity will be stacked before computing the shift.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • op (Callable[[ndarray, ndarray], ndarray]) – Any callable taking the true and reference stacked values of the quantity as input arguments and returning the difference between them, considering the algebra defined by their Lie Group. True and reference values are stacked in contiguous N-dimensional arrays along the first axis, namely the first dimension gathers individual timesteps. For instance, the common subtraction operator operator.sub is appropriate for the Euclidean space. Optional: operator.sub by default.

  • order – Order of L^p-norm that will be used as distance metric.

  • is_truncation (bool) – Whether the episode should be considered terminated or truncated whenever the termination condition is triggered. Optional: False by default.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.

_compute_min_distance(left, right)[source]

Compute the minimum time-aligned Euclidean distance between two multivariate time series kept in sync.

Internally, the time-aligned difference between the two time series will first be computed according to the user-specified binary operator ‘op’. The classical Euclidean norm of the difference is then computed over all timestamps individually and the minimum value is returned.

Parameters:
  • left (ndarray) – Time series as an N-dimensional array whose first dimension corresponds to individual timestamps over a finite horizon. The value at each timestamp will be regarded as a 1D vector for computing its Euclidean norm. It will be passed as the left-hand side of the binary operator ‘op’.

  • right (ndarray) – Time series as an N-dimensional array with the exact same shape as ‘left’. See ‘left’ for details. It will be passed as the right-hand side of the binary operator ‘op’.

Return type:

float
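
A minimal sketch of this computation (not the library implementation), assuming the plain subtraction operator ‘op’:

    import numpy as np

    def min_time_aligned_distance(left, right):
        # Time-aligned difference between the two time series, with
        # timesteps stacked along the first axis ('op' is operator.sub).
        diff = left - right
        # Euclidean norm at each timestep, flattening other dimensions.
        distances = np.linalg.norm(diff.reshape(len(diff), -1), axis=1)
        return float(distances.min())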

compute(info)

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if all the elements of its value are within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic._MultiActuatedJointBoundDistance(env, parent)[source]

Bases: InterfaceQuantity[Tuple[ndarray, ndarray]]

Distance of the actuated joints from their respective lower and upper mechanical stops.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • parent (InterfaceQuantity | None) – Higher-level quantity from which this quantity is a requirement if any, None otherwise.

  • mode – Desired mode of evaluation for this quantity.

initialize()[source]

Initialize internal buffers.

This is typically useful to refresh shared memory proxies or to re-initialize pre-allocated buffers.

Warning

Intermediary quantities ‘requirements’ are NOT initialized automatically because they can be initialized lazily in most cases, or are optional depending on the most efficient computation path at run-time. It is up to the developer implementing quantities to take care of it.

Note

This method must be called before starting a new episode.

Note

Lazy-initialization is used for efficiency, i.e. initialize is only called right before the first call to refresh, which may never happen if the cache is shared between multiple identical instances of the same quantity.

Return type:

None

refresh()[source]

Evaluate this quantity based on the agent state at the end of the current agent step.

Return type:

Tuple[ndarray, ndarray]
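
Conceptually, the returned pair corresponds to the following computation (a sketch under assumed inputs, not the actual implementation):

    def bound_distances(position, lower, upper):
        # Signed distances of each actuated joint from its lower and upper
        # mechanical stops; both are positive strictly within bounds.
        return (position - lower, upper - position)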

allow_update_graph: ClassVar[bool] = True

Whether dynamic computation graph update is allowed. This implies that the quantity can be reset at any point in time to re-compute the optimal computation path, typically after deletion or addition of some other node to its dependent sub-graph. When this happens, the quantity gets reset on the spot, even if a simulation is already running. This is not always acceptable, hence the capability to disable this feature at class-level.

property cache: SharedCache[ValueT]

Get shared cache if available, otherwise raises an exception.

Warning

This method is not meant to be overloaded.

get()

Get cached value of requested quantity if available, otherwise evaluate it and store it in cache.

This quantity is considered active as soon as this method has been called at least once since the previous tracking reset. The method is_active will return True even before initialize is called.

Warning

This method is not meant to be overloaded.

Return type:

ValueT

is_active(any_cache_owner=False)

Whether this quantity is considered active, namely initialize has been called at least once since previous tracking reset.

Parameters:
  • any_cache_owner (bool) – False to check only whether this exact instance is active, True if any of the identical quantities (sharing the same cache) being active is considered sufficient. Optional: False by default.

Return type:

bool

reset(reset_tracking=False, *, ignore_other_instances=False)

Consider that the quantity must be re-initialized before being evaluated once again.

If a shared cache is available, then it will be cleared and all identical quantities will jointly be reset.

Note

This method must be called right before performing any agent step, otherwise this quantity will not be refreshed if it was evaluated previously.

Warning

This method is not meant to be overloaded.

Parameters:
  • reset_tracking (bool) – Do not consider this quantity as active anymore until the get method gets called once again. Optional: False by default.

  • ignore_other_instances (bool) – Whether to skip reset of intermediary quantities as well as any shared cache co-owner quantity instances. Optional: False by default.

Return type:

None

requirements: Dict[str, InterfaceQuantity]

Intermediary quantities on which this quantity may rely for its evaluation at some point, depending on the optimal computation path at runtime. They will be exposed to the user as usual attributes.

class gym_jiminy.common.compositions.generic.MechanicalSafetyTermination(env, position_margin, velocity_max, grace_period=0.0, *, is_training_only=False)[source]

Bases: AbstractTerminationCondition

Discourage the agent from hitting the mechanical stops by immediately terminating the episode if the actuated joints approach them at excessive speed.

Hitting the lower and upper mechanical stops is inconvenient, but forbidding it completely is not desirable, as it induces safety margins that constrain the problem too strictly. This is particularly true as the maximum motor torque gets increasingly limited and PD controllers are used for low-level motor control, which turns out to be the case in most instances. Overall, such a hard constraint would impede performance, while completing the task successfully remains the highest priority. Still, the impact velocity must be restricted to prevent destructive damage. It is recommended to estimate an acceptable threshold from real experimental data.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • position_margin (float) – Distance of actuated joints from their respective mechanical bounds below which their speed is being watched.

  • velocity_max (float) – Maximum velocity above which further approaching the mechanical stops triggers termination, which only applies while a joint is being watched for being close to them.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.
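
A minimal usage sketch (all numerical values are arbitrary):

    termination = MechanicalSafetyTermination(
        env, position_margin=0.05, velocity_max=1.0, grace_period=0.1)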

compute(info)[source]

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if its value is within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.MechanicalPowerConsumptionTermination(env, max_power, horizon, generator_mode=EnergyGenerationMode.CHARGE, grace_period=0.0, *, is_training_only=False)[source]

Bases: QuantityTermination

Terminate the episode immediately if the average mechanical power consumption is too high.

High power consumption is undesirable as it means that the motion is suboptimal, and probably unnatural and fragile. Moreover, limiting it helps to accommodate hardware capabilities, avoiding motor overheating while increasing battery autonomy and lifespan. Finally, it may be necessary to deal with hardware limitations on maximum power drain.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • max_power (float) – Maximum average mechanical power consumption above which termination is triggered.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • horizon (float) – Horizon over which values of the quantity will be stacked before computing the average.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.

  • generator_mode (EnergyGenerationMode)
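
A minimal usage sketch (all numerical values are arbitrary):

    termination = MechanicalPowerConsumptionTermination(
        env, max_power=400.0, horizon=2.0, grace_period=0.5)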

compute(info)

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if all the elements of its value are within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.

class gym_jiminy.common.compositions.generic.ShiftTrackingMotorPositionsTermination(env, thr, horizon, grace_period=0.0, *, is_training_only=False)[source]

Bases: ShiftTrackingQuantityTermination

Terminate the episode if the selected reference trajectory is not tracked with expected accuracy regarding the actuated joint positions, whatever the timestep being considered over some fixed-size sliding window.

The robot must track the reference trajectory whenever there is no hazard, only applying minor corrections to keep balance. Rewarding the agent for doing so is not effective, as favoring robustness remains more profitable: the agent would anticipate disturbances, lowering its current reward to maximize the future return, primarily by averting termination. Limiting the shift over a given horizon still allows for large deviations when handling strong pushes. Moreover, assuming the agent cannot keep track of the flow of time, meaning that only the observation at the current step is provided to the agent and no stateful network architecture such as an LSTM is being used, restricting the shift also urges the agent to do whatever it takes to get back to normal as soon as possible, for fear of triggering termination, which may happen any time the deviation exceeds the maximum acceptable shift, irrespective of its scale.

Parameters:
  • env (InterfaceJiminyEnv) – Base or wrapped jiminy environment.

  • thr (float) – Maximum shift above which termination is triggered.

  • horizon (float) – Horizon over which values of the quantity will be stacked before computing the shift.

  • grace_period (float) – Grace period effective only at the very beginning of the episode, during which the latter is bound to continue whatever happens. Optional: 0.0 by default.

  • is_training_only (bool) – Whether the termination condition should be completely by-passed if the environment is in evaluation mode. Optional: False by default.
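
A minimal usage sketch (all numerical values are arbitrary):

    termination = ShiftTrackingMotorPositionsTermination(
        env, thr=0.3, horizon=1.0, grace_period=0.5)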

_compute_min_distance(left, right)

Compute the minimum time-aligned Euclidean distance between two multivariate time series kept in sync.

Internally, the time-aligned difference between the two time series will first be computed according to the user-specified binary operator ‘op’. The classical Euclidean norm of the difference is then computed over all timestamps individually and the minimum value is returned.

Parameters:
  • left (ndarray) – Time series as an N-dimensional array whose first dimension corresponds to individual timestamps over a finite horizon. The value at each timestamp will be regarded as a 1D vector for computing its Euclidean norm. It will be passed as the left-hand side of the binary operator ‘op’.

  • right (ndarray) – Time series as an N-dimensional array with the exact same shape as ‘left’. See ‘left’ for details. It will be passed as the right-hand side of the binary operator ‘op’.

Return type:

float

compute(info)

Evaluate the termination condition.

The underlying quantity is first evaluated. The episode continues if all the elements of its value are within bounds, otherwise the episode is either truncated or terminated according to ‘is_truncation’.

Warning

This method is not meant to be overloaded.

Parameters:

info (Dict[str, Any])

Return type:

bool

property name: str

Name uniquely identifying every termination condition.

It will be used as key not only for storing termination condition-specific monitoring and debugging information in ‘info’, but also for adding the underlying quantity to the ones already managed by the environment.