I have recently implemented a model predictive controller (MPC) to calculate the necessary reaction forces for a legged robot. The work presented here is based on this paper by Kim et al. If you don’t know what model predictive control is, I recommend this excellent explanation by Steve Brunton. I also found this guide to model predictive control with CasADi to be extremely helpful. CasADi is an open-source nonlinear optimization tool, which I’m using for the MPC.
However, I’m not entirely sure yet whether this code works the way it’s intended: I’m not getting any error messages, but the force values I’m getting seem too low. If you spot anything wrong with my implementation, please let me know.
Here’s the code.
Part 1. The Dynamics Formulation
From the paper, we have the following discrete dynamics:
$$ x(k+1) = A_kx(k) + B_k \hat{f}_k + \hat{g} $$
where,
$$ \begin{equation} x = \begin{bmatrix} \theta^{\top} & p^{\top} & \omega^{\top} & \dot{p}^{\top} \end{bmatrix}^{\top} \end{equation} $$
$$ \begin{equation} \hat{f} = \begin{bmatrix} f_1 & … & f_n \end{bmatrix}^{\top} \end{equation} $$
$$ \begin{equation} \hat{g} = \begin{bmatrix} 0_{1\times{}3} & 0_{1\times{}3} & 0_{1\times{}3} & g^{\top} \end{bmatrix}^{\top} \end{equation} $$
$$ \begin{equation} A = \begin{bmatrix} 1_{3\times3} & 0_{3\times3} & R_z(\psi)\Delta t & 0_{3\times3} \newline 0_{3\times3} & 1_{3\times3} & 0_{3\times3} & 1_{3\times3} \Delta t \newline 0_{3\times3} & 0_{3\times3} & 1_{3\times3} & 0_{3\times3} \newline 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 1_{3\times3} \end{bmatrix} \end{equation} $$
$$ \begin{equation} B = \begin{bmatrix} 0_{3\times3} & 0_{3\times3} \newline 0_{3\times3} & 0_{3\times3} \newline {}_GI^{-1}[r_1]_\times \Delta t & {}_GI^{-1}[r_2]_\times \Delta t \newline 1_{3\times3} \Delta t /m & 1_{3\times3} \Delta t /m \end{bmatrix} \end{equation} $$
First off, you may be wondering what this notation means.
$[r_1]_\times$ refers to a skew-symmetric matrix. The $3\times1$ footstep position vectors $r_1$ and $r_2$ are each converted into a $3\times3$ matrix so that the cross product $r \times f$ can be written as the matrix product $[r]_\times f$, which is then premultiplied by the inverse of the global inertia tensor.
% matlab code
syms r_x r_y r_z % components of a footstep position vector
r = [[0, -r_z, r_y]; % skew-symmetric matrix [r]_x
[r_z, 0, -r_x];
[-r_y, r_x, 0]];
i_inv = sym('i', [3, 3]); % symbolic inverse inertia tensor
mtimes(i_inv, r) % the cross product expressed as a matrix product
A few more notes:
- The state term $x$ is a $12\times1$ vector composed of the orientation, position, angular velocity, and linear velocity of the body w.r.t. the global frame.
- In my case, there are only two legs, so there are six scalar control values: $f_{1x}$, $f_{1y}$, $f_{1z}$, $f_{2x}$, $f_{2y}$, $f_{2z}$.
- The gravity term $\hat{g}$ is a $12\times1$ vector of zeros except for the last entry, the gravitational acceleration (which, like every other term in the discrete dynamics, picks up a factor of $\Delta t$).
- $R_z$ is a $3\times3$ yaw rotation matrix (shown below) that maps the angular velocity to the rate of change of the Euler angles; the approximation holds for small roll and pitch.
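For reference, here is the yaw rotation in the standard convention (the paper’s convention may be its transpose, so check the signs against your frame definitions):
$$ R_z(\psi) = \begin{bmatrix} \cos{\psi} & -\sin{\psi} & 0 \newline \sin{\psi} & \cos{\psi} & 0 \newline 0 & 0 & 1 \end{bmatrix} $$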
% matlab code
syms theta_x theta_y theta_z p_x p_y p_z omega_x omega_y omega_z pdot_x ...
pdot_y pdot_z f1_x f1_y f1_z f2_x f2_y f2_z ...
dt i_inv r1x r1y r1z r2x r2y r2z mass gravity...
i11 i12 i13 i21 i22 i23 i31 i32 i33...
rz11 rz12 rz13 rz21 rz22 rz23 rz31 rz32 rz33
x = [[theta_x]; % states
[theta_y];
[theta_z];
[p_x];
[p_y];
[p_z];
[omega_x];
[omega_y];
[omega_z];
[pdot_x];
[pdot_y];
[pdot_z]];
f = [[f1_x]; % controls
[f1_y];
[f1_z];
[f2_x];
[f2_y];
[f2_z]];
g = [zeros(11, 1); % gravity affects only the z-velocity row
gravity * dt]; % discrete gravity contribution is g*dt, matching the dt scaling of A and B
A = [[1, 0, 0, 0, 0, 0, rz11*dt, rz12*dt, rz13*dt, 0, 0, 0];
[0, 1, 0, 0, 0, 0, rz21*dt, rz22*dt, rz23*dt, 0, 0, 0];
[0, 0, 1, 0, 0, 0, rz31*dt, rz32*dt, rz33*dt, 0, 0, 0];
[0, 0, 0, 1, 0, 0, 0, 0, 0, dt, 0, 0];
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, dt, 0];
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, dt];
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0];
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0];
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0];
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0];
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0];
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]];
B = [[0, 0, 0, 0, 0, 0];
[0, 0, 0, 0, 0, 0];
[0, 0, 0, 0, 0, 0];
[0, 0, 0, 0, 0, 0];
[0, 0, 0, 0, 0, 0];
[0, 0, 0, 0, 0, 0];
[(i12*r1z - i13*r1y)*dt, (-i11*r1z + i13*r1x)*dt, (i11*r1y - i12*r1x)*dt, ...
(i12*r2z - i13*r2y)*dt, (-i11*r2z + i13*r2x)*dt, (i11*r2y - i12*r2x)*dt];
[(i22*r1z - i23*r1y)*dt, (-i21*r1z + i23*r1x)*dt, (i21*r1y - i22*r1x)*dt, ...
(i22*r2z - i23*r2y)*dt, (-i21*r2z + i23*r2x)*dt, (i21*r2y - i22*r2x)*dt];
[(i32*r1z - i33*r1y)*dt, (-i31*r1z + i33*r1x)*dt, (i31*r1y - i32*r1x)*dt, ...
(i32*r2z - i33*r2y)*dt, (-i31*r2z + i33*r2x)*dt, (i31*r2y - i32*r2x)*dt];
[dt/mass, 0, 0, dt/mass, 0, 0];
[0, dt/mass, 0, 0, dt/mass, 0];
[0, 0, dt/mass, 0, 0, dt/mass]];
x_next = mtimes(A, x) + mtimes(B, f) + g;
The end result is a set of very long expressions that will be used as equality constraints in the QP. The states and controls are defined as symbolics, but other values such as the inertia tensor, rotation matrix, and foot positions are fed in as numerical values from sensor feedback.
# python
# the states and controls (theta_x through f2_z) are assumed to have been
# declared earlier as CasADi scalars, e.g. theta_x = cs.SX.sym('theta_x')
gravity = -9.807
dt = self.dt
mass = self.mass
x_next = [dt * omega_x * rz11 + dt * omega_y * rz12 + dt * omega_z * rz13 + theta_x,
dt * omega_x * rz21 + dt * omega_y * rz22 + dt * omega_z * rz23 + theta_y,
dt * omega_x * rz31 + dt * omega_y * rz32 + dt * omega_z * rz33 + theta_z,
dt * pdot_x + p_x,
dt * pdot_y + p_y,
dt * pdot_z + p_z,
dt * f1_x * (i12 * r1z - i13 * r1y) + dt * f1_y * (-i11 * r1z + i13 * r1x)
+ dt * f1_z * (i11 * r1y - i12 * r1x) + dt * f2_x * (i12 * r2z - i13 * r2y)
+ dt * f2_y * (-i11 * r2z + i13 * r2x) + dt * f2_z * (i11 * r2y - i12 * r2x) + omega_x,
dt * f1_x * (i22 * r1z - i23 * r1y) + dt * f1_y * (-i21 * r1z + i23 * r1x)
+ dt * f1_z * (i21 * r1y - i22 * r1x) + dt * f2_x * (i22 * r2z - i23 * r2y)
+ dt * f2_y * (-i21 * r2z + i23 * r2x) + dt * f2_z * (i21 * r2y - i22 * r2x) + omega_y,
dt * f1_x * (i32 * r1z - i33 * r1y) + dt * f1_y * (-i31 * r1z + i33 * r1x)
+ dt * f1_z * (i31 * r1y - i32 * r1x) + dt * f2_x * (i32 * r2z - i33 * r2y)
+ dt * f2_y * (-i31 * r2z + i33 * r2x) + dt * f2_z * (i31 * r2y - i32 * r2x) + omega_z,
dt * f1_x / mass + dt * f2_x / mass + pdot_x,
dt * f1_y / mass + dt * f2_y / mass + pdot_y,
dt * f1_z / mass + dt * f2_z / mass + dt * gravity + pdot_z] # gravity is scaled by dt like every other discrete term
self.fn = cs.Function('fn', [theta_x, theta_y, theta_z,
p_x, p_y, p_z,
omega_x, omega_y, omega_z,
pdot_x, pdot_y, pdot_z,
f1_x, f1_y, f1_z,
f2_x, f2_y, f2_z],
x_next) # nonlinear mapping of function f(x,u)
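Before wiring fn into the optimizer, it can be evaluated numerically as a sanity check. This snippet is my addition, with made-up force values:
# python
# hypothetical check: body at rest, 25 N straight up on each foot
x0 = [0.0] * 12
f_test = [0.0, 0.0, 25.0, 0.0, 0.0, 25.0]
x1 = self.fn(*x0, *f_test) # returns the 12 next-state entries
print(x1[11]) # next pdot_z = dt*(50/mass + gravity): positive iff 50 N exceeds mass*g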
Part 2. The Objective Function
$$ \begin{align} \min_{x,f} \quad & \sum^{m}_{k=0} \lVert x(k+1) - x^{ref}(k+1) \rVert_Q + \lVert f(k) \rVert_R \end{align} $$
The objective function is shown above.
$$ \lVert f(k) \rVert_R $$
Let’s start with this notation. What does it mean? Well, the double bars denote a norm. The subscript $R$ clues us into the fact that this is a weighted norm, the one induced by the inner product $\langle a, b \rangle_R = a^{\top} R b$. $Q$ and $R$ are weighting matrices and should be diagonal.
$$ \begin{equation} \lVert f(k) \rVert_R = \sqrt{f(k)^{\top} R \, f(k)} \end{equation} $$
However, I’m not sure what the purpose of the square root would be here. It seems to me that minimizing the square root of a nonnegative objective is equivalent to minimizing the objective itself, just more computationally expensive. In fact, running the code with the square root operation actually causes a `QP is infeasible!` error, so I’m leaving the square root operation out.
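To make that precise: writing $J(f) = f(k)^{\top} R \, f(k) \geq 0$,
$$ \arg\min_{f} \sqrt{J(f)} = \arg\min_{f} J(f) $$
since the square root is monotonically increasing on $[0, \infty)$. Dropping it also keeps the objective quadratic, which is what a QP solver expects.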
So, the objective function minimizes over $x$ and $f$ (the state and control vectors): it penalizes the difference between the actual and reference states, weighted by $Q$, and the magnitude of the control inputs, weighted by $R$, summed over time steps $k = 0$ to $m$. Here, $m$ refers to the prediction horizon, which is chosen by the user.
u = cs.SX.sym('u', self.n_controls, self.N) # decision variables, control action matrix
st_ref = cs.SX.sym('st_ref', self.n_states + self.n_states) # initial and reference states
x = cs.SX.sym('x', self.n_states, (self.N + 1)) # represents the states over the opt problem.
obj = 0 # cs.SX(0) # objective function
constr = [] # constraints vector
Q = np.eye(12) # state weighting matrix (identity until tuned)
R = np.eye(6) # control weighting matrix (identity until tuned)
st = x[:, 0] # initial state
constr = cs.vertcat(constr, st - st_ref[0:self.n_states]) # initial condition constraints
# compute objective and constraints
for k in range(0, self.N):
    st = x[:, k] # state
    con = u[:, k] # control action
    # calculate objective
    obj = obj + cs.mtimes(cs.mtimes((st - st_ref[self.n_states:(self.n_states * 2)]).T, Q),
                          st - st_ref[self.n_states:(self.n_states * 2)]) \
              + cs.mtimes(cs.mtimes(con.T, R), con)
Both my Q and R are identity matrices right now, as I have not yet tuned them.
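When tuning does happen, the identity matrices can be swapped for a diagonal weighting along these lines; the numbers below are purely illustrative guesses, not values from the paper:
# python
# illustrative weights only: track orientation and body height more tightly
Q = np.diag([10.0, 10.0, 10.0, # theta
             5.0, 5.0, 50.0, # position (height weighted heavily)
             1.0, 1.0, 1.0, # omega
             1.0, 1.0, 1.0]) # pdot
R = np.diag([0.1] * 6) # keep control effort cheap relative to tracking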
Part 3. The Constraints Vector
In the same for loop, we can now add the constraints vector. For each iteration of the loop, the discrete dynamics equation from Part 1 is subtracted from the next state vector, $x(k+1)$.
for k in range(0, self.N):
    # ... (same loop as above; st and con are already defined)
    st_next = x[:, k + 1]
    f_value = self.fn(st[0], st[1], st[2], st[3], st[4], st[5],
                      st[6], st[7], st[8], st[9], st[10], st[11],
                      con[0], con[1], con[2], con[3], con[4], con[5])
    st_n_e = cs.vertcat(*f_value) # stack the 12 scalar outputs into one symbolic column vector
    constr = cs.vertcat(constr, st_next - st_n_e) # dynamics equality constraints
There are more constraints, however: namely, friction cone constraints, which come from the friction between the feet and the ground. As stated in the paper:
$$ \begin{equation} \vert{}f_x\vert{} \leq \mu f_z \end{equation} $$
$$ \begin{equation} \vert{}f_y\vert{} \leq \mu f_z \end{equation} $$
$$ \begin{equation} f_z > 0 \end{equation} $$
Here’s my implementation below. The $f_x$ and $f_y$ inequalities must each be written twice due to the absolute value function. The $f_z$ constraint is not included here because it’s a simple bound and can be implemented as an input constraint (see Part 5).
# add additional constraints
for k in range(0, self.N):
    constr = cs.vertcat(constr, u[0, k] - self.mu * u[2, k]) # f1x - mu*f1z
    constr = cs.vertcat(constr, -u[0, k] - self.mu * u[2, k]) # -f1x - mu*f1z
    constr = cs.vertcat(constr, u[1, k] - self.mu * u[2, k]) # f1y - mu*f1z
    constr = cs.vertcat(constr, -u[1, k] - self.mu * u[2, k]) # -f1y - mu*f1z
    constr = cs.vertcat(constr, u[3, k] - self.mu * u[5, k]) # f2x - mu*f2z
    constr = cs.vertcat(constr, -u[3, k] - self.mu * u[5, k]) # -f2x - mu*f2z
    constr = cs.vertcat(constr, u[4, k] - self.mu * u[5, k]) # f2y - mu*f2z
    constr = cs.vertcat(constr, -u[4, k] - self.mu * u[5, k]) # -f2y - mu*f2z
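As a post-hoc check (my addition, not part of the original code), the first control set returned in Part 6 can be tested against these inequalities; u_cl is the 6-vector returned there:
# python
# hypothetical post-solve check of the friction pyramid
f1x, f1y, f1z, f2x, f2y, f2z = np.ravel(u_cl)
tol = 1e-6 # small slack for solver tolerance
assert abs(f1x) <= self.mu * f1z + tol and abs(f1y) <= self.mu * f1z + tol
assert abs(f2x) <= self.mu * f2z + tol and abs(f2y) <= self.mu * f2z + tol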
Part 4. Problem Setup
The solver used here is qpOASES. The optimization variables (in this case, the state and controls), objective function, constraint functions, and reference state are declared as shown. It is important to specify a bound tolerance and termination tolerance.
opt_variables = cs.vertcat(cs.reshape(x, self.n_states * (self.N + 1), 1),
                           cs.reshape(u, self.n_controls * self.N, 1))
qp = {'x': opt_variables, 'f': obj, 'g': constr, 'p': st_ref}
opts = {'print_time': 0, 'error_on_fail': 0, 'printLevel': "low", 'boundTolerance': 1e-6,
'terminationTolerance': 1e-6}
solver = cs.qpsol('S', 'qpoases', qp, opts)
Part 5. Upper and Lower Bounds
The upper and lower bounds for the constraint vector still need to be set. The dynamics equations are equality constraints, so their upper and lower bounds are both zero. The initial condition constraint is also an equality constraint, since the first state vector must equal the input state vector. The remaining constraints are inequalities with an upper bound of zero and a lower bound large and negative enough to act like negative infinity.
c_length = np.shape(constr)[0]
o_length = np.shape(opt_variables)[0]
lbg = list(itertools.repeat(-1e10, c_length)) # lower bounds: negative enough to act like -infinity
lbg[0:(self.n_states * (self.N + 1))] = itertools.repeat(0, self.n_states * (self.N + 1)) # IC + dynamics equality block: n_states rows per step
ubg = list(itertools.repeat(0, c_length)) # upper bounds: zero for every row
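A small consistency check (my addition) makes the intended layout of the constraint vector explicit:
# python
# rows [0, n_states*(N+1)) are equalities (IC + dynamics), lbg = ubg = 0;
# the remaining 8*N rows are the friction pyramid, lbg = -1e10, ubg = 0
assert c_length == self.n_states * (self.N + 1) + 8 * self.N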
Bounds on the optimization variables are set separately. This is where the friction cone constraint on $f_z$ comes into play. The reaction force perpendicular to the ground (assumed flat) must point upward, because the robot can only push itself away from the ground, not pull itself toward it. Therefore, the lower bound on every $f_z$ is 0.
Additionally, the MPC must check if each leg is actually in contact with the ground when calculating forces for it. If that leg is not in contact, all bounds are set equal to zero.
# constraints for optimization variables
lbx = list(itertools.repeat(-1e10, o_length)) # input inequality constraints
ubx = list(itertools.repeat(1e10, o_length)) # input inequality constraints
dyn_len = self.n_states * (self.N + 1)
lbx[(dyn_len + 2)::3] = [0] * (2 * self.N) # lower bound on all f1z and f2z
if c_l == 0: # if left leg is not in contact... don't calculate output forces for that leg.
    ubx[dyn_len::6] = [0] * self.N # upper bound on all f1x
    ubx[(dyn_len + 1)::6] = [0] * self.N # upper bound on all f1y
    lbx[dyn_len::6] = [0] * self.N # lower bound on all f1x
    lbx[(dyn_len + 1)::6] = [0] * self.N # lower bound on all f1y
    ubx[(dyn_len + 2)::6] = [0] * self.N # upper bound on all f1z
if c_r == 0: # if right leg is not in contact... don't calculate output forces for that leg.
    ubx[(dyn_len + 3)::6] = [0] * self.N # upper bound on all f2x
    ubx[(dyn_len + 4)::6] = [0] * self.N # upper bound on all f2y
    lbx[(dyn_len + 3)::6] = [0] * self.N # lower bound on all f2x
    lbx[(dyn_len + 4)::6] = [0] * self.N # lower bound on all f2y
    ubx[(dyn_len + 5)::6] = [0] * self.N # upper bound on all f2z
Part 6. Solving
Firstly, initial values must be fed into the solver.
I’ve set the initial values for my control inputs as an array of zeros, although I’m wondering if that should really be the last set of control inputs applied.
The initial values for the state are pulled in and calculated from the simulator. This $12\times1$ array is repeated $N+1$ times, where $N$ is the prediction horizon, in order to fill the entire array including future states.
u0 = np.zeros((self.N, self.n_controls)) # six control inputs per time step
X0 = np.tile(x_in, (1, self.N + 1)).T # initialization of the state's decision variables
parameters = cs.vertcat(x_in, x_ref) # set values of parameters vector
# init value of optimization variables
x0 = cs.vertcat(np.reshape(X0.T, (self.n_states * (self.N + 1), 1)),
np.reshape(u0.T, (self.n_controls * self.N, 1)))
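Before calling the solver, a quick shape check (again, my addition) can catch indexing mistakes between the decision vector and its bounds:
# python
# the stacked initial guess must line up with the bounds on the decision vector
assert x0.shape[0] == o_length == len(lbx) == len(ubx)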
Finally, the initial states, bounds, and parameters are plugged into the solver. The controls vector is pulled from the solution, and the first row of optimized control values is returned. The rest of the values (representing future time steps) are ignored.
sol = solver(x0=x0, lbx=lbx, ubx=ubx, lbg=lbg, ubg=ubg, p=parameters)
solu = np.array(sol['x'][self.n_states * (self.N + 1):])
u = np.reshape(solu.T, (self.n_controls, self.N)).T # get controls from the solution
u_cl = u[0, :] # ignore rows other than first row
return u_cl
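For context, here is a minimal sketch of how this routine might be driven from a simulation loop. The names mpc, sim, get_state, desired_state, and apply_forces are hypothetical placeholders for whatever your setup provides:
# python
# hypothetical outer loop; all names here are placeholders
for step in range(1000):
    x_in = sim.get_state() # 12x1 state vector from the simulator
    x_ref = desired_state(step) # 12x1 reference state for this step
    u_cl = mpc.solve(x_in, x_ref, c_l, c_r) # the routine walked through above
    sim.apply_forces(u_cl) # apply only the first optimized control set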