PD Controller

Implementation of a basic Proportional-Derivative controller block compatible with the gym_jiminy reinforcement learning pipeline environment design.

gym_jiminy.common.controllers.proportional_derivative._compute_command_impl(q_target, v_target, encoders_data, motor_to_encoder, pid_kp, pid_kd)[source]

Implementation of PD control law.

Note

Used internally by PDController to compute the command, but separated to allow precompilation. It is not meant to be called manually.

Parameters
  • q_target (numpy.ndarray) –

  • v_target (numpy.ndarray) –

  • encoders_data (numpy.ndarray) –

  • motor_to_encoder (numpy.ndarray) –

  • pid_kp (numpy.ndarray) –

  • pid_kd (numpy.ndarray) –

Return type

numpy.ndarray
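
For reference, a minimal sketch of what such a PD law might look like is given below. It mirrors the signature of _compute_command_impl, but the layout of encoders_data (positions in the first row, velocities in the second) and the reordering via motor_to_encoder are assumptions for illustration, not guarantees of the actual implementation:

    import numpy as np

    def pd_law_sketch(q_target: np.ndarray,
                      v_target: np.ndarray,
                      encoders_data: np.ndarray,
                      motor_to_encoder: np.ndarray,
                      pid_kp: np.ndarray,
                      pid_kd: np.ndarray) -> np.ndarray:
        """Hedged sketch of a PD control law, not the actual implementation."""
        # Reorder encoder measurements to match motor order (assumed layout:
        # first row positions, second row velocities).
        q_measured = encoders_data[0, motor_to_encoder]
        v_measured = encoders_data[1, motor_to_encoder]
        # Torque proportional to position and velocity tracking errors.
        return (pid_kp * (q_target - q_measured)
                + pid_kd * (v_target - v_measured))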

class gym_jiminy.common.controllers.proportional_derivative.PDController(env, update_ratio=1, pid_kp=0.0, pid_kd=0.0, **kwargs)[source]

Bases: gym_jiminy.common.bases.block_bases.BaseControllerBlock

Low-level Proportional-Derivative controller.

Warning

It must be connected directly to the environment to control, without any intermediary controller.

Parameters
  • update_ratio (int) – Ratio between the update period of the controller and the one of the subsequent controller.

  • pid_kp (Union[float, List[float], numpy.ndarray]) – PD controller position-proportional gain in motor order.

  • pid_kd (Union[float, List[float], numpy.ndarray]) – PD controller velocity-proportional gain in motor order.

  • kwargs (Any) – Keyword arguments used to allow automatic pipeline wrapper generation.

  • env (gym_jiminy.common.envs.env_generic.BaseJiminyEnv) –

Return type

None
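
As a hedged usage sketch, the block is instantiated on top of the environment it is meant to control. The make_env helper and the gain values below are placeholders for illustration, not recommended settings:

    from gym_jiminy.common.controllers.proportional_derivative import PDController

    # `make_env()` is a hypothetical helper standing in for whatever builds the
    # BaseJiminyEnv instance to be controlled.
    env = make_env()

    controller = PDController(
        env,
        update_ratio=2,   # controller updated at half the rate of the subsequent block (illustrative)
        pid_kp=1500.0,    # placeholder position gain, broadcast to every motor
        pid_kd=10.0,      # placeholder velocity gain, broadcast to every motor
    )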

_refresh_action_space()[source]

Configure the action space of the controller.

The action space corresponds to the positions and velocities of the motors instead of the torques.

Return type

None

get_fieldnames()[source]

Get mapping between each scalar element of the action space of the controller and the associated fieldname for logging.

It is expected to return an object with the same structure as the action space, the difference being that numerical arrays are replaced by lists of strings.

By default, generic fieldnames are generated for np.ndarray, using ‘Action’ as prefix and the index as suffix.

Note

This method is not supposed to be called before reset, so the controller should already be initialized at this point.

Return type

Union[Dict[str, StructNested], Sequence[StructNested], ValueType]
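
As an illustration of the default behaviour described above, a hedged sketch of generic fieldname generation for a flat np.ndarray action is given below; the exact naming scheme used by gym_jiminy is an assumption here:

    import numpy as np
    import gym

    def default_fieldnames_sketch(action_space: gym.spaces.Box) -> list:
        """Hedged sketch: one fieldname per scalar element, 'Action' prefix."""
        n_elems = int(np.prod(action_space.shape))
        return [f"Action{i}" for i in range(n_elems)]

    # A 6-dimensional Box action space would yield ['Action0', ..., 'Action5'].
    fieldnames = default_fieldnames_sketch(
        gym.spaces.Box(low=-1.0, high=1.0, shape=(6,)))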

compute_command(measure, action)[source]

Compute the motor torques using a PD controller.

The torque is proportional to the error between the measured motor positions/velocities and the target ones.

Parameters
  • measure (Union[Dict[str, StructNested], Sequence[StructNested], ValueType]) – Observation of the environment.

  • action (gym.spaces.dict.Dict) – Desired motor positions and velocities as a dictionary.

Return type

numpy.ndarray
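
As a tiny numeric illustration of the proportional-error law described above (not a call to the actual method, whose measure and action layouts depend on the environment):

    import numpy as np

    # Single-motor example: the measured state lags behind the target.
    q_measured, v_measured = np.array([0.10]), np.array([0.0])
    q_target, v_target = np.array([0.25]), np.array([0.5])
    pid_kp, pid_kd = np.array([100.0]), np.array([2.0])

    # Torque proportional to position and velocity errors:
    # 100 * (0.25 - 0.10) + 2 * (0.5 - 0.0) = 16.0
    u = pid_kp * (q_target - q_measured) + pid_kd * (v_target - v_measured)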

_refresh_observation_space()

Configure the observation space of the controller.

It does nothing but return the observation space of the environment, since the block only affects the action space.

Warning

This method must not be overloaded. If one needs to overload it, then using BaseObserverBlock or BlockInterface directly is probably the way to go.

Return type

None

_setup()

Configure the controller.

It includes:

  • refreshing the action space of the controller

  • allocating memory for the controller’s internal state and initializing it

Note

Note that the environment to ultimately control, env, has already been fully initialized at this point, so each of its internal buffers is up-to-date, but the simulation is not running yet. As a result, it is still possible to update the configuration of the simulator and, for example, to register some extra variables to monitor the internal state of the controller.

Return type

None
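
As a hedged sketch of the pattern described in the note, a derived controller block could overload _setup to allocate its own internal buffers after calling the parent implementation. The class name and buffer below are hypothetical:

    import numpy as np
    from gym_jiminy.common.controllers.proportional_derivative import PDController

    class FilteredPDController(PDController):
        """Hypothetical derived block keeping an internal low-pass state."""

        def _setup(self) -> None:
            # Let the parent refresh the action space and its own buffers first.
            super()._setup()
            # Allocate and initialize this block's internal state. Since the
            # simulation is not running yet, this is also the point where extra
            # variables could be registered with the simulator for monitoring.
            self._filtered_target = np.zeros(12)  # size chosen for illustration only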

compute_reward(*args, info, **kwargs)

Compute reward at current episode state.

See ControllerInterface.compute_reward for details.

Note

This method is called after updating the internal buffer ‘_num_steps_beyond_done’, which is None if the simulation is not done, 0 right after, and so on.

Parameters
  • args (Any) – Extra arguments that may be useful for derived environments, for example Gym.GoalEnv.

  • info (Dict[str, Any]) – Dictionary of extra information for monitoring.

  • kwargs (Any) – Extra keyword arguments. See ‘args’.

Returns

Total reward.

Return type

float

compute_reward_terminal(*, info)

Compute terminal reward at current episode final state.

Note

Implementation is optional. For the sake of efficiency, the terminal reward is not computed if this method is not overloaded by the user.

Warning

Similarly to compute_reward, ‘info’ can be updated by reference to log extra info for monitoring.

Parameters

info (Dict[str, Any]) – Dictionary of extra information for monitoring.

Returns

Terminal reward.

Return type

float
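
Since the implementation is optional, a derived block would simply overload the method. A hedged sketch with a hypothetical terminal bonus, updating info by reference as suggested in the warning, could look like this:

    from typing import Any, Dict

    from gym_jiminy.common.controllers.proportional_derivative import PDController

    class MyPDController(PDController):  # hypothetical derived block
        def compute_reward_terminal(self, *, info: Dict[str, Any]) -> float:
            # Hypothetical terminal bonus, logged for monitoring by updating
            # `info` by reference.
            reward = 1.0
            info["terminal_bonus"] = reward
            return reward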

control_dt: float
action_space: Optional[gym.spaces.space.Space]
_action: Optional[Union[Dict[str, StructNested], Sequence[StructNested], ValueType]]
observation_space: Optional[gym.Space]