Proportional Derivative Controller¶
Implementation of a basic Proportional-Derivative controller block, compatible with the gym_jiminy reinforcement learning pipeline environment design.
- gym_jiminy.common.blocks.proportional_derivative_controller.integrate_zoh(state, state_min, state_max, dt)[source]¶
Second order approximate integration scheme assuming Zero-Order-Hold for the acceleration, taking position, velocity and acceleration bounds into account.
Internally, it simply chains two first-order integrators in cascade. The acceleration will be updated in-place if clipping is necessary to satisfy bounds.
- Parameters:
state (ndarray) – Current state, ordered from lowest to highest derivative order, i.e. s[i](t) = s[i](t-1) + ∫_{t-1}^{t} s[i+1](τ) dτ, as a 2D array whose first dimension gathers the 3 derivative orders. It will be updated in-place.
state_min (ndarray) – Lower bounds of the state as a 2D array.
state_max (ndarray) – Upper bounds of the state as a 2D array.
dt (float) – Integration delta of time since previous state update.
- Return type:
None
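For reference, here is a minimal NumPy sketch of the kind of bounded cascaded integration described above. It is purely illustrative, not the actual implementation, and the helper name integrate_zoh_sketch is made up for the example.
    import numpy as np

    def integrate_zoh_sketch(state, state_min, state_max, dt):
        # `state` stacks position, velocity and acceleration as a (3, N) array.
        pos, vel, acc = state
        # Clip the acceleration against its own bounds first.
        np.clip(acc, state_min[2], state_max[2], out=acc)
        # First integrator: velocity under constant acceleration, clipped.
        vel_next = np.clip(vel + dt * acc, state_min[1], state_max[1])
        # Second integrator: position under the updated velocity, clipped.
        pos_next = np.clip(pos + dt * vel_next, state_min[0], state_max[0])
        # Back-propagate the clipping so that the stored velocity and
        # acceleration remain consistent with the integrated position.
        vel_next = (pos_next - pos) / dt
        acc[:] = (vel_next - vel) / dt
        vel[:] = vel_next
        pos[:] = pos_next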
- gym_jiminy.common.blocks.proportional_derivative_controller.pd_controller(q_measured, v_measured, command_state, command_state_lower, command_state_upper, kp, kd, motors_effort_limit, control_dt, out)[source]¶
Compute command torques under discrete-time proportional-derivative feedback control.
Internally, it integrates the command state (position, velocity and acceleration) over the controller update period to obtain the desired motor positions ‘q_desired’ and velocities ‘v_desired’, while taking some lower and upper bounds into account. Computed this way, the target motor positions and velocities can be interpreted as targets that must be reached right before the command is updated once again. Enforcing consistency between target position and velocity in such a way is a necessary but insufficient condition for the motors to be able to track them.
The command effort is computed as follows:
tau = - kp * ((q_measured - q_desired) + kd * (v_measured - v_desired))
The torque will be held constant until the next controller update.
See also
See the PDController documentation for more information, and the integrate_zoh documentation for details about the state integration.
- Parameters:
q_measured (ndarray) – Current position of the actuators.
v_measured (ndarray) – Current velocity of the actuators.
command_state (ndarray) – Current command state, namely, all the derivatives of the target motors positions up to acceleration order.
command_state_lower (ndarray) – Lower bound of the command state that must be satisfied at all cost.
command_state_upper (ndarray) – Upper bound of the command state that must be satisfied at all cost.
kp (ndarray) – PD controller position-proportional gain in motor order.
kd (ndarray) – PD controller velocity-proportional gain in motor order.
motors_effort_limit (ndarray) – Maximum effort that the actuators can output.
control_dt (float) – Controller update period. It will be involved in the integration of the command state.
out (ndarray) – Pre-allocated memory to store the command motor torques.
- Return type:
None
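The sketch below shows how the documented control law could be combined with integrate_zoh. The helper pd_controller_sketch and the way the torque is saturated are assumptions made for illustration, not the actual implementation.
    import numpy as np
    from gym_jiminy.common.blocks.proportional_derivative_controller import (
        integrate_zoh)

    def pd_controller_sketch(q_measured, v_measured, command_state,
                             command_state_lower, command_state_upper,
                             kp, kd, motors_effort_limit, control_dt, out):
        # Integrate the command state over one controller period to get the
        # desired motor positions and velocities.
        integrate_zoh(
            command_state, command_state_lower, command_state_upper, control_dt)
        q_desired, v_desired = command_state[0], command_state[1]
        # Documented control law, saturated to the motor effort limits.
        out[:] = np.clip(
            - kp * ((q_measured - q_desired) + kd * (v_measured - v_desired)),
            -motors_effort_limit, motors_effort_limit)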
- gym_jiminy.common.blocks.proportional_derivative_controller.pd_adapter(action, order, command_state, command_state_lower, command_state_upper, dt, out)[source]¶
Compute the target motor accelerations that must be held constant over a given time interval in order to reach the desired value of some derivative of the target motor positions at the end of that interval, if possible.
Internally, it applies backward in time the same integration scheme as the PD controller. Knowing the initial and final values of the derivative over the time interval, constant target motor accelerations can be uniquely deduced. In practice, it consists in successive clipped finite differences of that derivative up to acceleration level.
- Parameters:
action (ndarray) – Desired value of the n-th derivative of the command motor positions at the end of the controller update.
order (int) – Derivative order of the position associated with the action.
command_state (ndarray) – Current command state, namely, all the derivatives of the target motors positions up to acceleration order.
command_state_lower (ndarray) – Lower bound of the command state that must be satisfied at all cost.
command_state_upper (ndarray) – Upper bound of the command state that must be satisfied at all cost.
dt (float) – Time interval during which the target motor accelerations will be held constant.
out (ndarray) – Pre-allocated memory to store the target motor accelerations.
- Return type:
None
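As an illustration, the sketch below applies clipped backward finite differences for orders 0 and 1, in the spirit of the description above. The helper pd_adapter_sketch is hypothetical, only computes the target accelerations, and does not claim to match the actual implementation.
    import numpy as np

    def pd_adapter_sketch(action, order, command_state,
                          command_state_lower, command_state_upper, dt, out):
        if order == 1:
            # The action is the desired velocity at the end of the interval.
            v_target = np.clip(
                action, command_state_lower[1], command_state_upper[1])
        else:
            # The action is the desired position: deduce the constant velocity
            # required to reach it over the interval, then clip it.
            q_target = np.clip(
                action, command_state_lower[0], command_state_upper[0])
            v_target = np.clip(
                (q_target - command_state[0]) / dt,
                command_state_lower[1], command_state_upper[1])
        # Constant acceleration required to reach the target velocity.
        out[:] = np.clip(
            (v_target - command_state[1]) / dt,
            command_state_lower[2], command_state_upper[2])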
- gym_jiminy.common.blocks.proportional_derivative_controller.get_encoder_to_motor_map(robot)[source]¶
Get the mapping from encoder sensors to motors.
Warning
If reordering is necessary, then a list of indices is returned, which can be used to permute the encoder sensor data so that it matches the command torque vector. However, it relies on so-called “fancy” or “advanced” indexing, which means that the returned data is a copy of the original data rather than a reference. On the contrary, if reordering is not necessary, a slice is returned instead and no copy happens whatsoever.
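The snippet below illustrates the copy-versus-view distinction mentioned in this warning using plain NumPy indexing; the index values are arbitrary.
    import numpy as np

    encoder_data = np.arange(6.0)

    # Reordering needed: a list of indices relies on "advanced" indexing,
    # so the result is a copy of the original data.
    reordered = encoder_data[[2, 0, 1, 3, 4, 5]]
    assert not np.shares_memory(reordered, encoder_data)

    # No reordering needed: a plain slice returns a view, with no copy.
    aliased = encoder_data[slice(None)]
    assert np.shares_memory(aliased, encoder_data)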
- class gym_jiminy.common.blocks.proportional_derivative_controller.PDController(name, env, *, update_ratio=1, kp, kd, joint_position_margin=0.0, joint_velocity_limit=inf, joint_acceleration_limit=inf)[source]¶
Bases: BaseControllerBlock[ndarray, ndarray, BaseObsT, ndarray]
Low-level Proportional-Derivative controller.
The action is the target motor accelerations. The latter are integrated twice using two first-order integrators in cascade, considering that the acceleration is constant until the next controller update:
v_{t+1} = v_{t} + dt * a_{t}
q_{t+1} = q_{t} + dt * v_{t+1}
Note
The position and velocity bounds on the command correspond to the joint limits specified by the dynamical model of the robot. Lax default acceleration bounds are then extrapolated. More precisely, they are chosen to be sufficient either to span the whole range of velocity or to reach the command effort limits, whichever is the most restrictive. Position, velocity and acceleration bounds are all enforced when integrating the command state.
Warning
It must be connected directly to the base environment under control, without any intermediary controller altering its action space.
- Parameters:
name (str) – Name of the block.
env (InterfaceJiminyEnv[BaseObsT, BaseActT]) – Environment to connect with.
update_ratio (int) – Ratio between the update period of the controller and the one of the subsequent controller. -1 to match the simulation timestep of the environment. Optional: 1 by default.
kp (float | List[float] | ndarray) – PD controller position-proportional gains in motor order.
kd (float | List[float] | ndarray) – PD controller velocity-proportional gains in motor order.
joint_position_margin (float) – Minimum distance of the joint target positions from their respective bounds. Optional: 0.0 by default.
joint_velocity_limit (float) – Further restriction of the maximum joint target velocities with respect to their hardware specifications. Optional: “inf” by default.
joint_acceleration_limit (float) – Maximum joint target acceleration. Optional: “inf” by default.
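A minimal usage sketch is given below. It assumes env is an already-instantiated gym_jiminy base environment, and the gains are arbitrary placeholders; it merely follows the constructor signature above.
    from gym_jiminy.common.blocks.proportional_derivative_controller import (
        PDController)

    def add_pd_controller(env):
        # `env` is assumed to be an already-instantiated base environment.
        return PDController(
            "pd_controller",
            env,
            update_ratio=1,      # same update period as the subsequent controller
            kp=1500.0,           # placeholder position-proportional gain
            kd=0.01,             # placeholder velocity-proportional gain
            joint_position_margin=0.0,
            joint_velocity_limit=float("inf"),
            joint_acceleration_limit=float("inf"))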
- property fieldnames: List[str]¶
Get mapping between each scalar element of the action space of the controller block and the associated fieldname for logging.
It is expected to return an object with the same structure as the action space, but with lists of strings as leaves. Generic fieldnames are used by default.
- compute_command(action, command)[source]¶
Compute the target motor torques using a PD controller.
It is proportional to the error between the observed motor positions/velocities and the target ones.
Warning
Calling this method manually while a simulation is running is forbidden, because it would mess with the controller update period.
- Parameters:
action (ndarray) – Desired target motor acceleration.
command (ndarray) – Current motor torques that will be updated in-place.
- Return type:
None
- class gym_jiminy.common.blocks.proportional_derivative_controller.PDAdapter(name, env, *, update_ratio=-1, order=1)[source]¶
Bases: BaseControllerBlock[ndarray, ndarray, BaseObsT, ndarray]
Adapt the action of a lower-level Proportional-Derivative controller to be the target motor positions or velocities rather than accelerations.
The action is the desired value of some derivative of the target motor positions. The target motor accelerations are then deduced so as to reach this value by the next update of this controller, if possible, without exceeding the position, velocity and acceleration bounds. Finally, these target motor accelerations are passed to a lower-level PD controller, usually running at a higher frequency.
Note
The higher the derivative order of the action, the smoother the command motor torques. Thus, a high order is generally beneficial for robotic applications. However, it introduces some delay between the action and its effect, which limits the responsiveness of the agent and therefore impedes its optimal performance. Besides, it adds an extra layer of indirection between the action and its effect, which may be difficult for the agent to grasp. Finally, exploration usually consists in adding a temporally uncorrelated Gaussian random process at the action level. The effect of such random processes tends to vanish when integrated, making exploration very inefficient.
- Parameters:
update_ratio (int) – Ratio between the update period of the controller and the one of the subsequent controller. -1 to match the environment step env.step_dt. Optional: -1 by default.
order (int) – Derivative order of the action. It accepts position or velocity (respectively 0 or 1). Optional: 1 by default.
name (str)
env (InterfaceJiminyEnv[BaseObsT, BaseActT])
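A minimal usage sketch, assuming pd_env is an environment that is already controlled by a PDController block; the block name and settings are arbitrary.
    from gym_jiminy.common.blocks.proportional_derivative_controller import (
        PDAdapter)

    def add_pd_adapter(pd_env):
        # `pd_env` is assumed to be wrapped with a `PDController` block already.
        return PDAdapter(
            "pd_adapter",
            pd_env,
            update_ratio=-1,   # match the environment step `env.step_dt`
            order=1)           # the action is the target motor velocities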
- property fieldnames: List[str]¶
Get mapping between each scalar element of the action space of the controller block and the associated fieldname for logging.
It is expected to return an object with the same structure as the action space, but with lists of strings as leaves. Generic fieldnames are used by default.
- compute_command(action, command)[source]¶
Compute the target motor accelerations from the desired value of some derivative of the target motor positions.
- Parameters:
action (ndarray) – Desired value of the n-th derivative of the target motor positions.
command (ndarray) – Current motor torques that will be updated in-place.
- Return type:
None