Pieter van Goor

The 2D Special Euclidean Group SE(2)

2024-03-12T00:00:00+00:00

Introduction

The Special Euclidean group SE(n) describes the orientation-preserving isometries of the Euclidean space $\mathbb{R}^n$, where $n \in \mathbb{N}$ is a positive integer. This group can be represented using matrices, as

\[\begin{aligned} \mathbf{SE}(n) = \left\{ X = \begin{pmatrix} R & p \\ 0_{1 \times n} & 1 \end{pmatrix} \in \mathbb{R}^{(n+1) \times (n+1)} \; \middle| \; R^\top R = I_n, \; \det(R) = 1, \; p \in \mathbb{R}^n \right\} \end{aligned}\]

In this post, we will focus solely on the 2D case. This is particularly relevant for ground-based robotics, where the group reflects the invariance of the robot’s dynamics. The 2D Special Euclidean group is given by

\[\begin{aligned} \mathbf{SE}(2) &:= \left\{ X = \begin{pmatrix} R(\theta) & p \\ 0_{1 \times 2} & 1 \end{pmatrix} \in \mathbb{R}^{3\times 3} \; \middle| \; \theta \in (-\pi,\pi], \; p \in \mathbb{R}^2 \right\}, \\ R(\theta) &:= \begin{pmatrix} \cos(\theta) & - \sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \end{aligned}\]

The group properties are easily verified. The identity is $I_3 \in \mathbf{SE}(2)$ and the product and inverse are

\[\begin{aligned} X_1 X_2 &= \begin{pmatrix} R(\theta_1) & p_1 \\ 0_{1 \times 2} & 1 \end{pmatrix} \begin{pmatrix} R(\theta_2) & p_2 \\ 0_{1 \times 2} & 1 \end{pmatrix} = \begin{pmatrix} R(\theta_1 + \theta_2) & p_1 + R(\theta_1) p_2 \\ 0_{1 \times 2} & 1 \end{pmatrix}, \\ X^{-1} &= \begin{pmatrix} R(\theta) & p \\ 0_{1 \times 2} & 1 \end{pmatrix}^{-1} = \begin{pmatrix} R(-\theta) & - R(-\theta) p \\ 0_{1 \times 2} & 1 \end{pmatrix} \end{aligned}\]

Putting these together, we obtain the conjugation

\[\begin{aligned} \mathrm{Cn}_{X_1}(X_2) &= X_1 X_2 X_1^{-1} \\ &= \begin{pmatrix} R(\theta_1) & p_1 \\ 0_{1 \times 2} & 1 \end{pmatrix} \begin{pmatrix} R(\theta_2) & p_2 \\ 0_{1 \times 2} & 1 \end{pmatrix} \begin{pmatrix} R(\theta_1) & p_1 \\ 0_{1 \times 2} & 1 \end{pmatrix}^{-1} \\ &= \begin{pmatrix} R(\theta_1 + \theta_2) & p_1 + R(\theta_1) p_2 \\ 0_{1 \times 2} & 1 \end{pmatrix} \begin{pmatrix} R(-\theta_1) & - R(-\theta_1) p_1 \\ 0_{1 \times 2} & 1 \end{pmatrix} \\ &= \begin{pmatrix} R(\theta_1 + \theta_2 - \theta_1) & p_1 + R(\theta_1) p_2 - R(\theta_1 + \theta_2) R(-\theta_1) p_1 \\ 0_{1 \times 2} & 1 \end{pmatrix} \\ &= \begin{pmatrix} R(\theta_2) & (I_2 - R(\theta_2))p_1 + R(\theta_1) p_2 \\ 0_{1 \times 2} & 1 \end{pmatrix} \end{aligned}\]
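The product, inverse, and conjugation formulas can be checked numerically. The following is a minimal sketch in plain Python, assuming a pose is stored as a tuple `(theta, px, py)`; the function names are hypothetical and are not taken from any particular library. Angles are not wrapped back into $(-\pi, \pi]$ here.

```python
import math

def compose(X1, X2):
    """Group product: (R(th1 + th2), p1 + R(th1) p2)."""
    th1, x1, y1 = X1
    th2, x2, y2 = X2
    c, s = math.cos(th1), math.sin(th1)
    return (th1 + th2, x1 + c * x2 - s * y2, y1 + s * x2 + c * y2)

def inverse(X):
    """Group inverse: (R(-th), -R(-th) p), with R(-th) = [[c, s], [-s, c]]."""
    th, x, y = X
    c, s = math.cos(th), math.sin(th)
    return (-th, -(c * x + s * y), -(-s * x + c * y))

def conjugate(X1, X2):
    """Cn_{X1}(X2) = X1 X2 X1^{-1}, computed directly from the definition."""
    return compose(compose(X1, X2), inverse(X1))

def conjugate_closed_form(X1, X2):
    """Closed form: (R(th2), (I - R(th2)) p1 + R(th1) p2)."""
    th1, x1, y1 = X1
    th2, x2, y2 = X2
    c1, s1 = math.cos(th1), math.sin(th1)
    c2, s2 = math.cos(th2), math.sin(th2)
    px = (x1 - (c2 * x1 - s2 * y1)) + (c1 * x2 - s1 * y2)
    py = (y1 - (s2 * x1 + c2 * y1)) + (s1 * x2 + c1 * y2)
    return (th2, px, py)

# Example poses (arbitrary values chosen for illustration).
X1 = (0.3, 1.0, -2.0)
X2 = (-1.1, 0.5, 0.25)
print(conjugate(X1, X2))
print(conjugate_closed_form(X1, X2))
```

Both printed tuples should agree up to floating-point rounding.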

Lie algebra

The Lie algebra can be obtained by differentiating the matrix $X$ at $\theta = 0$ and $p = 0_{2\times 1}$. We get that

\[\begin{aligned} \mathfrak{se}(2) &= \left\{ U = \begin{pmatrix} \omega^\times & v \\ 0_{1 \times 2} & 0 \end{pmatrix} \in \mathbb{R}^{3\times 3} \; \middle| \; \omega \in \mathbb{R}, \; v \in \mathbb{R}^2 \right\}, \\ \omega^\times &:= \begin{pmatrix} 0 & - \omega \\ \omega & 0 \end{pmatrix}. \end{aligned}\]

The notation $\omega^\times$ is a very useful one. The operator $\cdot^\times : \mathbb{R} \to \mathbb{R}^{2\times 2}$ is linear, so $(\omega_1 + \omega_2)^\times = \omega_1^\times + \omega_2^\times$ and, in particular, $\omega^\times = \omega 1^\times$. This is a particularly nice feature since $1^\times = R(\pi/2)$.

The Lie algebra $\mathfrak{se}(2)$ is a vector space by definition, and we can relate it to $\mathbb{R}^3$ by choosing a basis, or simply by defining a “wedge” map $\cdot^\wedge : \mathbb{R}^3 \to \mathfrak{se}(2)$. This map is required to be an isomorphism in the vector space sense, so it must be linear and invertible. We choose the wedge map and its inverse, the vee map, to be

\[\begin{aligned} \begin{pmatrix} \omega \\ v \end{pmatrix}^\wedge &:= \begin{pmatrix} \omega^\times & v \\ 0_{1\times 2} & 0 \end{pmatrix}, & \begin{pmatrix} \omega^\times & v \\ 0_{1\times 2} & 0 \end{pmatrix}^\vee &:= \begin{pmatrix} \omega \\ v \end{pmatrix}, \end{aligned}\]

where $\omega \in \mathbb{R}$ and $v \in \mathbb{R}^2$. In other words, we define a basis of $\mathfrak{se}(2)$ by $\begin{aligned} E_1 &:= \begin{pmatrix} 1^\times & 0_{2 \times 1} \\ 0_{1\times 2} & 0 \end{pmatrix}, & E_2 &:= \begin{pmatrix} 0_{2\times 2} & \mathbf{e}_1 \\ 0_{1\times 2} & 0 \end{pmatrix}, & E_3 &:= \begin{pmatrix} 0_{2\times 2} & \mathbf{e}_2 \\ 0_{1\times 2} & 0 \end{pmatrix}, \end{aligned}$ where $\mathbf{e}_1, \mathbf{e}_2 \in \mathbb{R}^2$ are the standard basis vectors.

Adjoint and Lie bracket

Using the wedge and vee operators, we can obtain expressions for the Adjoint operator and Lie bracket. The simplest way is to differentiate the conjugation operation. We have

\[\begin{aligned} \mathrm{Ad}_{X}(U) &= \mathrm{D}_Z |_{I_3} \mathrm{Cn}_{X}(Z)[U] = X U X^{-1} \\ &= \begin{pmatrix} \omega^\times & -\omega^\times p + R(\theta) v \\ 0_{1 \times 2} & 0 \end{pmatrix}. \end{aligned}\]

We obtain a matrix expression for $\mathrm{Ad}_X$ by using the wedge and vee isomorphisms. Specifically, for any linear operator $L : \mathfrak{se}(2) \to \mathfrak{se}(2)$, we may define $L^\vee \in \mathbb{R}^{3\times 3}$ to be the matrix such that $L^\vee u = L(u^\wedge)^\vee$ for all $u \in \mathbb{R}^3$. From this definition, we obtain the Adjoint matrix

\[\begin{aligned} \mathrm{Ad}_{X}^\vee (U^\vee) &= \begin{pmatrix} \omega^\times & -\omega^\times p + R(\theta) v \\ 0_{1 \times 2} & 0 \end{pmatrix}^\vee \\ &= \begin{pmatrix} \omega \\ -1^\times p \omega + R(\theta) v \end{pmatrix} \\ &= \begin{pmatrix} 1 & 0_{1\times 2} \\ -1^\times p & R(\theta) \end{pmatrix} \begin{pmatrix} \omega \\ v \end{pmatrix}, \\ \mathrm{Ad}_X^\vee &= \begin{pmatrix} 1 & 0_{1\times 2} \\ -1^\times p & R(\theta) \end{pmatrix} \end{aligned}\]

Differentiating this matrix with respect to the variable $X$ at the identity, in the direction $U$, provides the “little” adjoint matrix and the Lie bracket

\[\begin{aligned} \mathrm{ad}_{U}^\vee&= \begin{pmatrix} 0 & 0_{1\times 2} \\ -1^\times v & \omega^\times \end{pmatrix}, \\ [U_1, U_2] &= \mathrm{ad}_{U_1}(U_2) = \begin{pmatrix} 0 \\ -\omega_2^\times v_1 + \omega_1^\times v_2 \end{pmatrix}^\wedge. \end{aligned}\]
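As a sanity check, the Adjoint and adjoint matrices can be compared against the matrix expressions $X U X^{-1}$ and $U_1 U_2 - U_2 U_1$. The sketch below uses plain nested lists and hypothetical helper names; it assumes the coordinate ordering $(\omega, v_x, v_y)$ chosen by the wedge map above.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def pose(theta, px, py):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, px], [s, c, py], [0.0, 0.0, 1.0]]

def pose_inv(theta, px, py):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, -(c * px + s * py)], [-s, c, s * px - c * py], [0.0, 0.0, 1.0]]

def wedge(u):
    w, vx, vy = u
    return [[0.0, -w, vx], [w, 0.0, vy], [0.0, 0.0, 0.0]]

def vee(U):
    return [U[1][0], U[0][2], U[1][2]]

def Ad_matrix(theta, px, py):
    """Block form [[1, 0], [-1^x p, R(theta)]], using -1^x p = (py, -px)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [py, c, -s], [-px, s, c]]

def ad_matrix(u):
    """Block form [[0, 0], [-1^x v, w^x]]."""
    w, vx, vy = u
    return [[0.0, 0.0, 0.0], [vy, 0.0, -w], [-vx, w, 0.0]]

def adjoint_by_conjugation(X, u):
    """Coordinates of X U X^{-1}."""
    return vee(matmul(matmul(pose(*X), wedge(u)), pose_inv(*X)))

def bracket_by_commutator(u1, u2):
    """Coordinates of U1 U2 - U2 U1."""
    P = matmul(wedge(u1), wedge(u2))
    Q = matmul(wedge(u2), wedge(u1))
    return vee([[P[i][j] - Q[i][j] for j in range(3)] for i in range(3)])

X = (0.7, 1.5, -0.4)
u1, u2 = (0.2, -1.0, 3.0), (-0.9, 0.4, 0.1)
print(matvec(Ad_matrix(*X), u1), adjoint_by_conjugation(X, u1))
print(matvec(ad_matrix(u1), u2), bracket_by_commutator(u1, u2))
```

Each printed pair should match up to rounding.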

Exponential and Logarithm

The exponential is “simply” given by the matrix exponential. However, it is nice to have formulas that do not rely on solving infinite power series, or, at least, hide these solutions in well-known elementary functions like $\sin$ and $\cos$. Let $U \in \mathfrak{se}(2)$. Then, we have that

\[\begin{aligned} U^2 &= \begin{pmatrix} \omega^\times & v \\ 0_{1 \times 2} & 0 \end{pmatrix}^2 = \begin{pmatrix} (\omega^\times)^2 & \omega^\times v \\ 0_{1 \times 2} & 0 \end{pmatrix} = \begin{pmatrix} -\omega^2 I_2 & \omega^\times v \\ 0_{1 \times 2} & 0 \end{pmatrix}, \\ U^3 &= \begin{pmatrix} \omega^\times & v \\ 0_{1 \times 2} & 0 \end{pmatrix}^3 = \begin{pmatrix} -\omega^2 \omega^\times & \omega^\times \omega^\times v \\ 0_{1 \times 2} & 0 \end{pmatrix} = \begin{pmatrix} -\omega^2 \omega^\times & - \omega^2 v \\ 0_{1 \times 2} & 0 \end{pmatrix} = -\omega^2 U. \end{aligned}\]

This is the property that lets us simplify the exponential formula. It follows that $U^{2k+1} = -\omega^2 U^{2k-1}$ for all $k \geq 1$, and hence $U^{2k+1} = (-1)^k\omega^{2k} U$ for all $k \geq 0$. We now evaluate the matrix exponential, assuming for the moment that $\omega \neq 0$. We have that

\[\begin{aligned} \exp(U) &= \sum_{k=0}^\infty \frac{1}{k!} U^k, \\ &= I_3 + \sum_{k=1}^\infty \frac{1}{(2k)!} U^{2k} + \sum_{k=0}^\infty \frac{1}{(2k+1)!} U^{2k+1}, \\ &= I_3 + \left(\sum_{k=1}^\infty \frac{1}{(2k)!} U^{2k-1}\right) U + \sum_{k=0}^\infty \frac{1}{(2k+1)!} U^{2k+1}, \\ &= I_3 + \left(\sum_{k=1}^\infty \frac{(-1)^{k-1}}{(2k)!} \omega^{2k-2} U\right) U + \sum_{k=0}^\infty \frac{(-1)^{k}}{(2k+1)!} \omega^{2k} U, \\ &= I_3 - \left(\sum_{k=1}^\infty \frac{(-1)^{k}}{(2k)!} \omega^{2k} \right) \omega^{-2}U^2 + \left(\sum_{k=0}^\infty \frac{(-1)^{k}}{(2k+1)!} \omega^{2k+1}\right) \omega^{-1}U, \\ &= I_3 - \left(\cos(\omega) - 1 \right) \omega^{-2}U^2 + \sin(\omega) \omega^{-1}U, \\ &= I_3 + \frac{\sin(\omega)}{\omega} U + \frac{1 - \cos(\omega)}{\omega^2} U^2. \end{aligned}\]

Written in terms of the expanded matrix, we get

\[\begin{aligned} \exp(U) &= I_3 + \frac{\sin(\omega)}{\omega} U + \frac{1 - \cos(\omega)}{\omega^2} U^2 \\ &= \begin{pmatrix} I_2 & 0_{2\times 1} \\ 0_{1\times 2} & 1 \end{pmatrix} + \frac{\sin(\omega)}{\omega} \begin{pmatrix} \omega^\times & v \\ 0_{1\times 2} & 0 \end{pmatrix} + \frac{1 - \cos(\omega)}{\omega^2} \begin{pmatrix} -\omega^2 I_2 & \omega^\times v \\ 0_{1\times 2} & 0 \end{pmatrix} \\ &= \begin{pmatrix} I_2 + \frac{\sin(\omega)}{\omega} \omega^\times -\omega^2 \frac{1 - \cos(\omega)}{\omega^2} I_2 & \frac{\sin(\omega)}{\omega} v + \frac{1 - \cos(\omega)}{\omega^2} \omega^\times v \\ 0_{1\times 2} & 1 \end{pmatrix} \\ &= \begin{pmatrix} \sin(\omega) 1^\times + \cos(\omega) I_2 & \frac{1}{\omega} (\sin(\omega)I_2 + (1 - \cos(\omega))1^\times ) v \\ 0_{1\times 2} & 1 \end{pmatrix} \\ &= \begin{pmatrix} R(\omega) & \frac{1}{\omega} (- \sin(\omega)1^\times + I_2 - \cos(\omega)I_2 ) 1^\times v \\ 0_{1\times 2} & 1 \end{pmatrix} \\ &= \begin{pmatrix} R(\omega) & \frac{I_2 - R(\omega)}{\omega} 1^\times v \\ 0_{1\times 2} & 1 \end{pmatrix} \\ \end{aligned}\]

When $\omega = 0$, the formula simplifies to

\[\begin{aligned} \exp(U) = I_3 + U. \end{aligned}\]

The expanded formula tells us how to take the logarithm as well. Given a matrix $X \in \mathbf{SE}(2)$, we match the terms in $\exp(U)= X$ to obtain

\[\begin{aligned} R(\theta) &= R(\omega), & p &= \frac{I_2 - R(\omega)}{\omega} 1^\times v. \end{aligned}\]

The first equation is solved by $\omega = \theta + 2k \pi$ for any $k \in \mathbb{Z}$, so we choose $\omega \in (-\pi, \pi]$ as the standard solution. The second equation is then solved for $v$, assuming $\omega \neq 0$ so that $I_2 - R(\omega)$ is invertible:

\[\begin{aligned} p &= \frac{I_2 - R(\omega)}{\omega} 1^\times v, \\ p &= \frac{I_2 - R(\omega)}{\omega} R(\pi/2) v, \\ v &= \omega R(-\pi/2) (I_2 - R(\omega) )^{-1} p. \end{aligned}\]

Observe that $(I_2 - R(\omega) ) (I_2 - R(-\omega) ) = 2(1-\cos(\omega))I_2$. Thus,

\[\begin{aligned} v &= \omega R(-\pi/2) (I_2 - R(\omega) )^{-1} p \\ &= \omega R(-\pi/2) \frac{(I_2 - R(-\omega) )}{2 (1-\cos(\omega))} p \\ &= \frac{\omega}{2 (1-\cos(\omega))} (I_2 - R(-\omega)) R(-\pi/2) p \\ \end{aligned}\]

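In summary, for $\omega \neq 0$ (when $\omega = 0$, matching terms in $\exp(U) = I_3 + U$ gives simply $v = p$),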

\[\begin{aligned} X &= \begin{pmatrix} R & p \\ 0_{1\times 2} & 1 \end{pmatrix}, \qquad \log(X) = \begin{pmatrix} \omega^\times & v \\ 0_{1\times 2} & 0 \end{pmatrix}, \\ \omega &:= \mathrm{atan2}(R_{2,1}, R_{1,1}) = \mathrm{atan2}(\sin(\theta), \cos(\theta)), \\ v &:= \frac{\omega}{2 (1-\cos(\omega))} (I_2 - R(-\omega)) R(-\pi/2) p \end{aligned}\]
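The exponential and logarithm formulas translate into a few lines of code. This is a sketch under some assumptions of mine rather than the pylie implementation: poses are tuples `(theta, px, py)`, algebra elements are tuples `(w, vx, vy)`, the threshold `eps` and the first-order fallback near $\omega = 0$ are illustrative choices, and $1 - \cos(\omega)$ is evaluated as $2\sin^2(\omega/2)$ to avoid cancellation for small angles.

```python
import math

def se2_exp(u, eps=1e-9):
    """exp(U): rotation R(w), translation (1/w)(sin(w) I + (1 - cos(w)) 1^x) v."""
    w, vx, vy = u
    if abs(w) < eps:
        a, b = 1.0, 0.5 * w          # first-order limits of the two coefficients
    else:
        a = math.sin(w) / w
        b = 2.0 * math.sin(0.5 * w) ** 2 / w   # (1 - cos(w)) / w
    return (w, a * vx - b * vy, b * vx + a * vy)

def se2_log(X, eps=1e-9):
    """log(X) with w = atan2(sin(theta), cos(theta)) in (-pi, pi]."""
    theta, px, py = X
    w = math.atan2(math.sin(theta), math.cos(theta))
    if abs(w) < eps:
        return (w, px + 0.5 * w * py, py - 0.5 * w * px)
    one_minus_cos = 2.0 * math.sin(0.5 * w) ** 2
    k = w / (2.0 * one_minus_cos)
    s = math.sin(w)
    # (I - R(-w)) R(-pi/2) = [[sin(w), 1 - cos(w)], [-(1 - cos(w)), sin(w)]]
    return (w, k * (s * px + one_minus_cos * py), k * (-one_minus_cos * px + s * py))

u = (0.8, 1.0, -2.0)
print(se2_log(se2_exp(u)))  # should reproduce u up to rounding
```

A quarter turn at unit forward speed, `se2_exp((math.pi / 2, 1.0, 0.0))`, traces a circular arc of radius $2/\pi$, which gives a quick geometric check of the translation term.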

Conclusion

The formulas presented in this summary are intended to be useful and practical for implementation, which is what I have done in the pylie library. I hope you find it helpful too, and please let me know if you find any issues or mistakes, or have suggestions for improvement!

The Non-Zero Quaternions as a Lie Group

2024-02-12T00:00:00+00:00

Introduction

Quaternions are well-known to people working in robotics and aerospace. They (the unit quaternions, specifically) provide a smooth representation of attitude using only four numbers, in contrast to rotation matrices that require 9 and Euler angles that are not smooth. In this post, I will explore the quaternions from a slightly different perspective: the quaternions (excluding zero) form a Lie group under multiplication. We will not restrict ourselves to the unit quaternions, instead exploring the full four-dimensional Lie group.

Basic group properties

Throughout this article, we will write a quaternion $q \in \mathbb{H}$ as $q = (r, u),$ where $r \in \mathbb{R}$ and $u \in \mathbb{R}^3$ represent the real and imaginary parts of $q$, respectively, and $(r, u) \neq (0, 0_3)$ since we exclude the zero quaternion. The product is defined by

\[\begin{aligned} q_1 * q_2 &= (r_1, u_1) * (r_2, u_2) \\ &= (r_1 r_2 - u_1^\top u_2, \; r_1 u_2 + r_2 u_1 + u_1 \times u_2). \end{aligned}\]

The inverse of a quaternion is defined by

\[q^{-1} = (r^2 + \vert u \vert^2)^{-1} (r, -u).\]

And the group identity is given by $ e := (1, 0_3) $.

The quaternions act on themselves by conjugation. Specifically,

\[\begin{aligned} \mathrm{Cn}_{q_1}(q_2) &= q_1 * q_2 * q_1^{-1} \\ &= (r_1^2 + \vert u_1 \vert^2)^{-1} (r_1 r_2 - u_1^\top u_2, \; r_1 u_2 + r_2 u_1 + u_1 \times u_2) * (r_1, -u_1)\\ &= (r_1^2 + \vert u_1 \vert^2)^{-1} ((r_1 r_2 - u_1^\top u_2) r_1 + (r_1 u_2 + r_2 u_1 + u_1 \times u_2)^\top u_1, \\ &\hspace{1cm} r_1(r_1 u_2 + r_2 u_1 + u_1 \times u_2) - (r_1 r_2 - u_1^\top u_2)u_1 -(r_1 u_2 + r_2 u_1 + u_1 \times u_2) \times u_1 )\\ &= (r_1^2 + \vert u_1 \vert^2)^{-1} (r_1^2 r_2 - r_1 u_1^\top u_2 + r_1 u_2^\top u_1 + r_2 u_1^\top u_1 , \\ &\hspace{1cm} r_1^2 u_2 + r_1 r_2 u_1 + r_1 u_1 \times u_2 - r_1 r_2 u_1 + u_1 u_1^\top u_2 - r_1 u_2 \times u_1 - (u_1 \times u_2) \times u_1 )\\ &= (r_1^2 + \vert u_1 \vert^2)^{-1} (r_1^2 r_2 + r_2 u_1^\top u_1 , \\ &\hspace{1cm} r_1^2 u_2 + 2 r_1 u_1 \times u_2 + u_1 u_1^\top u_2 + u_1 \times (u_1 \times u_2) )\\ &= (r_2 , \; (r_1^2 + \vert u_1 \vert^2)^{-1}(r_1^2 u_2 + \vert u_1 \vert^2 u_2 + 2 r_1 u_1 \times u_2 + 2 u_1 \times (u_1 \times u_2)) )\\ &= (r_2 , \; u_2 + (2 r_1 u_1 \times u_2 + 2 u_1 \times (u_1 \times u_2))(r_1^2 + \vert u_1 \vert^2)^{-1}). \end{aligned}\]

Let us denote $ \vert q_1 \vert = \sqrt{r_1^2 + \vert u_1 \vert^2}$ and define $u_1^\times \in \mathbb{R}^{3\times 3}$ to be the ‘skew’ matrix such that $u_1^\times u_2 = u_1 \times u_2$. Then we end up with a nice and simple formula:

\[\mathrm{Cn}_{q_1}(q_2) = (r_2 , \; (I_3 + (2 r_1 u_1^\times + 2 (u_1^\times)^2 )\vert q_1 \vert^{-2})u_2 ).\]
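The simplified conjugation formula is easy to get wrong by a sign, so a numerical comparison against the definition is worthwhile. A minimal sketch, assuming a quaternion is stored as `(r, (u1, u2, u3))`; all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def qmul(q1, q2):
    """(r1 r2 - u1.u2, r1 u2 + r2 u1 + u1 x u2)."""
    (r1, u1), (r2, u2) = q1, q2
    c = cross(u1, u2)
    return (r1 * r2 - dot(u1, u2),
            tuple(r1 * b + r2 * a + k for a, b, k in zip(u1, u2, c)))

def qinv(q):
    """(r, -u) / (r^2 + |u|^2); defined for any non-zero quaternion."""
    r, u = q
    n2 = r * r + dot(u, u)
    return (r / n2, tuple(-x / n2 for x in u))

def conj_definition(q1, q2):
    return qmul(qmul(q1, q2), qinv(q1))

def conj_closed_form(q1, q2):
    """(r2, u2 + (2 r1 u1 x u2 + 2 u1 x (u1 x u2)) / |q1|^2)."""
    (r1, u1), (r2, u2) = q1, q2
    n2 = r1 * r1 + dot(u1, u1)
    c1 = cross(u1, u2)
    c2 = cross(u1, c1)
    return (r2, tuple(x + (2 * r1 * a + 2 * b) / n2 for x, a, b in zip(u2, c1, c2)))

# Deliberately non-unit quaternions, since the group is all of H minus zero.
q1 = (0.5, (1.0, -2.0, 0.3))
q2 = (-1.2, (0.4, 0.0, 2.0))
print(conj_definition(q1, q2))
print(conj_closed_form(q1, q2))
```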

The Quaternion Lie Algebra

There are many ways to think of the Lie algebra of a given Lie group. Since our main interest is computation, we will choose the way that is easiest to work with for computation. The Lie algebra $\mathfrak{h}$ of $\mathbb{H}$ can be identified with the tangent space at the identity $e$. This definition is abstract, so we assign some coordinates. A Lie algebra element is described as $w^\vee := (s, v) \in \mathbb{R}^4$, where the $\vee$ operator is the map from the abstract Lie algebra to the coordinates in $\mathbb{R}^4$. Near the identity, quaternion group elements can be written as

\[q = e + t w, \quad (r,u) = (1+t s, t v),\]

for small values of $t \in \mathbb{R}$.

Exponential and Logarithm

The exponential relates the Lie algebra to the Lie group. We will use the ‘1-parameter subgroup’ definition here. Given a Lie algebra element $w^\vee = (s,v)$, the exponential $\exp(w)$ is defined as the solution to the initial value problem

\[q(0) = e, \quad \dot{q}(t) = q(t) * w,\]

at $t = 1$. Let us evaluate the differential equation to find

\[\begin{aligned} \dot{q} &= q * w \\ &:= \left. \frac{\mathrm{d}}{\mathrm{d} t} \right\vert_{t=0} (r,u) * (1+t s, t v) \\ &= \left. \frac{\mathrm{d}}{\mathrm{d} t} \right\vert_{t=0} (r (1+ts) - t u^\top v, \; r t v + (1+ts) u + t u \times v) \\ (\dot{r}, \dot{u}) &= (r s - u^\top v, \; r v + s u + u \times v). \end{aligned}\]

This ODE is not straightforward to solve, unless we realise that this system is, in fact, linear! Writing $q$ as a vector in $\mathbb{R}^4$, we have

\[\begin{aligned} \frac{\mathrm{d}}{\mathrm{d} t} \begin{pmatrix} r \\ u \end{pmatrix} &= \begin{pmatrix} s & - v^\top \\ v & s I_3 - v^\times \end{pmatrix} \begin{pmatrix} r \\ u \end{pmatrix} = \begin{pmatrix} 0 & - v^\top \\ v & - v^\times \end{pmatrix} \begin{pmatrix} r \\ u \end{pmatrix} + s \begin{pmatrix} r \\ u \end{pmatrix}. \end{aligned}\]

Since $s$ acts as a scaling factor, we can pull it out of the equation for now, and solve the problem without it. Specifically, writing $A$ for the $s$-independent matrix in the equation above, so that $\dot{q} = A q + s q$, we have

\[\frac{\mathrm{d}}{\mathrm{d} t} \left( e^{-t s} q \right) = e^{-t s} \dot{q} - s e^{-t s} q = A e^{-t s} q,\]

so if we solve the problem while ignoring $s$, we can add it back in at the end. To solve the ODE now, we only have to compute the matrix exponential

\[\begin{aligned} A &:= \begin{pmatrix} 0 & - v^\top \\ v & - v^\times \end{pmatrix} & \exp(A) &= \sum_{k=0}^\infty \frac{1}{k!} A^k. \end{aligned}\]

Examining the first nontrivial power of $A$ reveals that

\[\begin{aligned} A^2 &= \begin{pmatrix} 0 & - v^\top \\ v & - v^\times \end{pmatrix}^2 = \begin{pmatrix} -\vert v \vert^2 & 0_{1\times 3} \\ 0_{3\times 1} & (v^\times)^2 - v v^\top \end{pmatrix} = \begin{pmatrix} -\vert v \vert^2 & 0_{1\times 3} \\ 0_{3\times 1} & - \vert v \vert^2 I_3 \end{pmatrix} = - \vert v \vert^2 I_4. \end{aligned}\]

Substituting this into the exponential formula yields

\[\begin{aligned} \exp(A) &= \sum_{k=0}^\infty \frac{1}{k!} A^k \\ &= \sum_{k=0}^\infty \frac{1}{(2k)!} A^{2k} + \sum_{k=0}^\infty \frac{1}{(2k+1)!} A^{2k+1} \\ &= \sum_{k=0}^\infty \frac{1}{(2k)!} (- \vert v \vert^2 I_4)^k + \sum_{k=0}^\infty \frac{1}{(2k+1)!} (- \vert v \vert^2 I_4)^{k} A \\ &= \sum_{k=0}^\infty \frac{(-1)^k}{(2k)!} \vert v \vert^{2k} I_4+ \vert v \vert^{-1} \sum_{k=0}^\infty \frac{(-1)^k}{(2k+1)!} \vert v \vert^{2k+1} A \\ &= \cos(\vert v \vert) I_4 + \frac{\sin(\vert v \vert)}{\vert v \vert} A. \end{aligned}\]

Therefore, we have our final solution,

\[\begin{aligned} \exp(w) &= e^s \exp(A) \begin{pmatrix} 1 \\ 0_3 \end{pmatrix} \\ &= \left( \cos(\vert v \vert) I_4 + \frac{\sin(\vert v \vert)}{\vert v \vert} A \right) \begin{pmatrix} e^s \\ 0_3 \end{pmatrix} \\ &= \cos(\vert v \vert)\begin{pmatrix} e^s \\ 0_3 \end{pmatrix} + \frac{\sin(\vert v \vert)}{\vert v \vert} \begin{pmatrix} 0 & - v^\top \\ v & - v^\times \end{pmatrix} \begin{pmatrix} e^s \\ 0_3 \end{pmatrix} \\ &= \begin{pmatrix} e^s \cos(\vert v \vert) \\ e^s \sin(\vert v \vert) \frac{v}{\vert v \vert} \end{pmatrix}. \end{aligned}\]

Note that, if $\vert v \vert = 0$, then the whole computation simplifies and the solution is simply $\exp(w) = ( e^s, 0_3)$.

The logarithm is found by inverting this formula, although there may be multiple solutions for a given $q \in \mathbb{H}$. Suppose that $q = \exp(w)$. Then we wish to determine the components of $w = (s, v)$ in terms of $q = (r, u)$. We have

\[\begin{aligned} q &= \exp(w), \\ (r, u) &= (e^s \cos(\vert v \vert), e^s \sin(\vert v \vert) \frac{v}{\vert v \vert}). \end{aligned}\]

Immediately, we see that $e^s = r / \cos(\vert v \vert)$. Substituting this into the $u$-component,

\[\begin{aligned} u &= r \tan(\vert v \vert) \frac{v}{\vert v \vert}, \\ \frac{u}{\vert u \vert} \vert u \vert &= r \tan(\vert v \vert) \frac{v}{\vert v \vert}, \\ r^{-1} \vert u \vert \frac{u}{\vert u \vert} &= \tan(\vert v \vert) \frac{v}{\vert v \vert}, \\ v &= \frac{\arctan(r^{-1} \vert u \vert)}{\vert u \vert} u \end{aligned}\]

Rather than substitute this back into the formula for $e^s$, we observe that the norm of both sides of the original equation satisfies

\[\begin{aligned} \vert q \vert &= \vert \exp(w) \vert, \\ \sqrt{r^2 + \vert u \vert^2} &= e^{s} , \\ s &= \ln(\sqrt{r^2 + \vert u \vert^2}). \end{aligned}\]

In summary, we have thus found the logarithm to be

\[\begin{aligned} \log(q) &= \left( \frac{1}{2} \ln(r^2 + \vert u \vert^2), \; \frac{\arctan(r^{-1} \vert u \vert)}{\vert u \vert} u \right). \end{aligned}\]

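Similarly to the exponential formula, we should note that, if $\vert u \vert = 0$ and $r > 0$, the formula simplifies to $\log(q) = (\ln(r), 0_3)$. Note also that $\arctan(r^{-1} \vert u \vert)$ recovers $\vert v \vert$ only when $r > 0$; for $r \leq 0$ it should be replaced by $\mathrm{atan2}(\vert u \vert, r) \in [0, \pi]$, which agrees with it whenever $r > 0$. If $\vert u \vert = 0$ and $r < 0$, the logarithm is not unique: any $v$ with $\vert v \vert = \pi$ works.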
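The closed-form exponential and logarithm can be exercised with a round trip. The sketch below is an illustration under a few assumptions: quaternions and algebra elements are stored as `(scalar, (x, y, z))`, the function names are hypothetical, and the angle is computed with `atan2(|u|, r)` instead of $\arctan(r^{-1}\vert u \vert)$ so that quaternions with $r \leq 0$ are handled as well (the two agree when $r > 0$).

```python
import math

def qexp(w):
    """exp(s, v) = e^s (cos|v|, sin|v| v / |v|)."""
    s, v = w
    nv = math.sqrt(sum(x * x for x in v))
    es = math.exp(s)
    if nv == 0.0:
        return (es, (0.0, 0.0, 0.0))
    k = es * math.sin(nv) / nv
    return (es * math.cos(nv), tuple(k * x for x in v))

def qlog(q):
    """log(r, u) = (ln|q|, angle * u / |u|) with angle = atan2(|u|, r)."""
    r, u = q
    nu = math.sqrt(sum(x * x for x in u))
    s = 0.5 * math.log(r * r + nu * nu)
    if nu == 0.0:
        if r < 0.0:
            # Any v with |v| = pi would do; refuse rather than pick one.
            raise ValueError("logarithm of a negative real quaternion is not unique")
        return (s, (0.0, 0.0, 0.0))
    angle = math.atan2(nu, r)
    return (s, tuple(angle / nu * x for x in u))

w = (0.3, (0.2, -0.5, 0.1))
print(qlog(qexp(w)))   # should reproduce w up to rounding

q = (-1.0, (0.0, 2.0, 0.0))   # negative real part
print(qexp(qlog(q)))   # should reproduce q up to rounding
```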

Adjoint Operators and Lie Bracket

The big and little Adjoint operators are another important aspect of the Quaternion Lie algebra. The ‘big’ Adjoint operator $\mathrm{Ad} : \mathbb{H} \times \mathfrak{h} \to \mathfrak{h}$ is defined by

\[\begin{aligned} \mathrm{Ad}_q (w) &= \left. \frac{\mathrm{d}}{\mathrm{d} t} \right\vert_{t=0} \mathrm{Cn}_q(e + t w) \\ &= \left. \frac{\mathrm{d}}{\mathrm{d} t} \right\vert_{t=0} (1+ t s , \; (I_3 + (2 r u^\times + 2 (u^\times)^2 )\vert q \vert^{-2})(t v) ) \\ &= (s , \; (I_3 + (2 r u^\times + 2 (u^\times)^2 )\vert q \vert^{-2})v ), \end{aligned}\]

where $q = (r, u)$ and $w^\vee = (s, v)$. In matrix form,

\[\begin{aligned} \mathrm{Ad}_q \simeq \begin{pmatrix} 1 & 0_{1\times 3} \\ 0_{3\times 1} & I_3 + (2 r u^\times + 2 (u^\times)^2 )\vert q \vert^{-2} \end{pmatrix}. \end{aligned}\]

The ‘little’ adjoint operator $\mathrm{ad} : \mathfrak{h} \times \mathfrak{h} \to \mathfrak{h}$ is defined as the derivative of the big Adjoint operator,

\[\begin{aligned} \mathrm{ad}_{w_1} (w_2) &= \left. \frac{\mathrm{d}}{\mathrm{d} t} \right\vert_{t=0} \mathrm{Ad}_{e+ t w_1}w_2 \\ &= \left. \frac{\mathrm{d}}{\mathrm{d} t} \right\vert_{t=0} (s_2 , \; (I_3 + (2 (1+t s_1) (t v_1)^\times + 2 ((t v_1)^\times)^2 )\vert e + t w_1 \vert^{-2})v_2 ) \\ &= (0 , \; 2 v_1^\times v_2 ). \end{aligned}\]

Once more, in matrix form,

\[\begin{aligned} \mathrm{ad}_w \simeq \begin{pmatrix} 0 & 0_{1\times 3} \\ 0_{3\times 1} & 2v^\times \end{pmatrix}. \end{aligned}\]

The Lie bracket is equivalent to the adjoint operator, in the sense that

\[\begin{aligned} \left[w_1, w_2\right] := \mathrm{ad}_{w_1}(w_2) = (0 , \; 2 v_1^\times v_2 ). \end{aligned}\]
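Because quaternion multiplication is bilinear and associative, the bracket should coincide with the commutator $w_1 * w_2 - w_2 * w_1$ when Lie algebra elements are multiplied as quaternions. The following sketch checks this against $(0, 2 v_1^\times v_2)$; the storage format `(s, (v1, v2, v3))` and the function names are assumptions made for illustration.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def qmul(q1, q2):
    (r1, u1), (r2, u2) = q1, q2
    c = cross(u1, u2)
    d = sum(a * b for a, b in zip(u1, u2))
    return (r1 * r2 - d, tuple(r1 * b + r2 * a + k for a, b, k in zip(u1, u2, c)))

def bracket(w1, w2):
    """Closed form: [w1, w2] = (0, 2 v1 x v2)."""
    (_, v1), (_, v2) = w1, w2
    return (0.0, tuple(2.0 * k for k in cross(v1, v2)))

def commutator(w1, w2):
    """w1 * w2 - w2 * w1, using the quaternion product."""
    (ra, ua), (rb, ub) = qmul(w1, w2), qmul(w2, w1)
    return (ra - rb, tuple(a - b for a, b in zip(ua, ub)))

w1 = (0.7, (1.0, 2.0, 3.0))
w2 = (-0.4, (0.5, -1.0, 2.0))
print(bracket(w1, w2))
print(commutator(w1, w2))
```

The real parts $s_1$ and $s_2$ drop out entirely, which reflects the fact that real scalars commute with every quaternion.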

Matrix Representation

The final topic of interest for computations is the matrix representation of $\mathbb{H}$. Matrix representations are rarely unique, but sometimes can be nice. The matrix representation we consider is $\rho : \mathbb{H} \to \mathbf{GL}(4)$, given by

\[\begin{aligned} \rho(q) := \begin{pmatrix} r& u_1& u_2& u_3 \\ -u_1& r& -u_3& u_2 \\ -u_2& u_3& r& -u_1 \\ -u_3&-u_2&u_1&r \\ \end{pmatrix}. \end{aligned}\]

The matrix representation of the Lie algebra $\mathfrak{h}$ is basically the same. Verifying that these are indeed representations is a messy and time-consuming computation. However, working out a matrix representation is very rewarding in that it provides a way to check all the other computations we have done. Specifically, we can check things like the inverse $\rho(q)^{-1} = \rho(q^{-1})$, the exponential $\mathrm{expm}(\mathrm{d}\rho(w)) = \rho(\exp(w))$, and the adjoint operators $\mathrm{d}\rho(\mathrm{Ad}_q w) = \rho(q) \mathrm{d}\rho(w) \rho(q)^{-1}$.
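The messy verification can at least be spot-checked numerically. The sketch below tests the homomorphism property $\rho(q_1 * q_2) = \rho(q_1)\rho(q_2)$ and the inverse property on example values; it assumes quaternions stored as `(r, (u1, u2, u3))` and uses hypothetical helper names. A numerical check on a few values is evidence, not a proof.

```python
import math

def qmul(q1, q2):
    (r1, (a1, a2, a3)), (r2, (b1, b2, b3)) = q1, q2
    return (r1 * r2 - a1 * b1 - a2 * b2 - a3 * b3,
            (r1 * b1 + r2 * a1 + a2 * b3 - a3 * b2,
             r1 * b2 + r2 * a2 + a3 * b1 - a1 * b3,
             r1 * b3 + r2 * a3 + a1 * b2 - a2 * b1))

def qinv(q):
    r, u = q
    n2 = r * r + sum(x * x for x in u)
    return (r / n2, tuple(-x / n2 for x in u))

def rho(q):
    """The 4x4 matrix representation given in the text."""
    r, (u1, u2, u3) = q
    return [[r, u1, u2, u3],
            [-u1, r, -u3, u2],
            [-u2, u3, r, -u1],
            [-u3, -u2, u1, r]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def close(A, B, tol=1e-12):
    return all(math.isclose(a, b, abs_tol=tol)
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))

q1 = (0.5, (1.0, -2.0, 0.3))
q2 = (-1.2, (0.4, 0.0, 2.0))
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(close(rho(qmul(q1, q2)), matmul(rho(q1), rho(q2))))
print(close(matmul(rho(q1), rho(qinv(q1))), identity))
```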

Summary

I decided to write this post when I needed these formulas for the $n$th time, and I realised that deriving them every time I needed them was taking too long. I hope they are helpful to anyone else who reads them, and please let me know if you spot any mistakes!

The Unscented Transform

2022-10-11T00:00:00+00:00

Introduction

The unscented transform is a way to approximate the probability distribution that results when a normally distributed variable is put through a nonlinear function. Concretely, let’s suppose that $x \sim N(\mu, \Sigma)$, and let $f : \mathbb{R}^n \to \mathbb{R}^m$ be some nonlinear function. Then we might ask: how can we approximate the distribution of $f(x)$ by another normal distribution?

The ‘traditional’ approach is through linearisation. We know that if $f$ is a linear or affine mapping, $f(x) = A x + b$, then the distribution of $f(x)$ is given by

\[Ax+b \sim N(A\mu + b, A\Sigma A^\top)\]

So one way to approximate the distribution of $f(x)$ for a nonlinear $f$ is to simply say

\[f(x) \approx f(\mu) + D f(\mu)[x-\mu] \sim N(f(\mu), D f(\mu) \Sigma D f(\mu)^\top),\]

where $Df(\mu)$ is the differential (Jacobian) of $f$ at $\mu$. The problem is that this may not be a very good approximation, so what else can we do?

Sigma Points

What if, instead of trying to approximate $f$ and then applying it to the distribution, we approximate the distribution and then apply $f$? This is the idea of the unscented transform. But how can we approximate the distribution? The answer is ‘sigma points’. A set of sigma points $S$ for the distribution $N(\mu, \Sigma)$ consists of points $x^{(i)}$ and weights $w^{(i)}$ so that

$\sum_i w^{(i)} = 1$,
$\sum_i w^{(i)} x^{(i)} = \mu$,
$\sum_i w^{(i)} (x^{(i)} - \mu)(x^{(i)} - \mu)^\top = \Sigma$.

This approximates the original distribution if you interpret it as a discrete probability function, which requires the weights to be non-negative. Let $\Omega = \{ x^{(i)} \mid i=1,\ldots,N \}$ be the set of $N$ sigma points, define a probability function $p: \Omega \to \mathbb{R}$ by $p(x^{(i)}) = w^{(i)}$, and let $x$ be a random variable distributed according to $p$. Then,

$\sum_{x \in \Omega} p(x) = \sum_i w^{(i)} = 1$,
$\mathbb{E}[x] = \sum_i w^{(i)} x^{(i)} = \mu$,
$\mathrm{cov}(x,x) = \mathbb{E}\left[ (x - \mathbb{E}[x])(x - \mathbb{E}[x])^\top \right] = \sum_i w^{(i)} (x^{(i)} - \mu)(x^{(i)} - \mu)^\top = \Sigma$.

Now, this is only possible if there are at least $n+1$ points, but there is no unique choice of sigma points. However, the original choice by Uhlmann gives us a ‘canonical’ choice; for $i = 1,…, 2n$ define

\[\begin{aligned} w^{(i)} &= \frac{1}{2n}, & x^{(i)} &= \mu + \begin{cases} (\sqrt{n \Sigma})_i & \text{if } i \leq n \\ -(\sqrt{n \Sigma})_{i-n} & \text{if } i > n \end{cases} \end{aligned}\]

Here $(\sqrt{n\Sigma})_j$ denotes the $j$th column of a matrix square root of $n\Sigma$. Specifically, if $A = \sqrt{n \Sigma}$, then $A A^\top = n \Sigma$.
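The three defining conditions can be verified directly for the canonical sigma points. The sketch below is one possible implementation in plain Python, assuming the Cholesky factor is used as the matrix square root (any $A$ with $A A^\top = n\Sigma$ would do) and that $\Sigma$ is symmetric positive definite; the function names are hypothetical.

```python
import math

def cholesky(M):
    """Lower-triangular L with L L^T = M, for symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L

def sigma_points(mu, Sigma):
    """Canonical 2n sigma points mu +/- columns of sqrt(n Sigma), weights 1/(2n)."""
    n = len(mu)
    A = cholesky([[n * Sigma[i][j] for j in range(n)] for i in range(n)])
    points = []
    for sign in (1.0, -1.0):
        for col in range(n):
            points.append([mu[i] + sign * A[i][col] for i in range(n)])
    weights = [1.0 / (2 * n)] * (2 * n)
    return points, weights

def weighted_mean(points, weights):
    n = len(points[0])
    return [sum(w * x[i] for w, x in zip(weights, points)) for i in range(n)]

def weighted_cov(points, weights, mean):
    n = len(mean)
    return [[sum(w * (x[i] - mean[i]) * (x[j] - mean[j])
                 for w, x in zip(weights, points))
             for j in range(n)] for i in range(n)]

mu = [1.0, -2.0]
Sigma = [[2.0, 0.6], [0.6, 1.0]]
points, weights = sigma_points(mu, Sigma)
mean = weighted_mean(points, weights)
print(sum(weights), mean, weighted_cov(points, weights, mean))
```

The printed values should reproduce 1, $\mu$, and $\Sigma$ up to rounding.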

Unscented Transform

Now that we have a set of sigma points that approximate the original distribution, we need to apply the nonlinear function $f$. This time, instead of approximating $f$, we simply apply it to our sigma points and gather statistics at the end. Let $x$ be a random variable distributed according to the sigma points. Then we compute the mean and covariance of $f(x)$ as follows.

\[\begin{aligned} \hat{\eta} &:= \mathbb{E} \left[ f(x) \right], \\ &= \sum_i w^{(i)} f(x^{(i)}), \\ \hat{\Sigma} &:= \mathbb{E} \left[ (f(x) - \mathbb{E} \left[ f(x) \right])(f(x) - \mathbb{E} \left[ f(x) \right])^\top \right], \\ &= \mathbb{E} \left[ (f(x) - \hat{\eta})(f(x) - \hat{\eta})^\top \right], \\ &= \sum_i w^{(i)} (f(x^{(i)}) - \hat{\eta})(f(x^{(i)}) - \hat{\eta})^\top. \end{aligned}\]

This is how we now approximate $f(x)$: we say that, approximately, $f(x) \sim N(\hat{\eta}, \hat{\Sigma})$. This is quite different from the linearisation approach, but is it any better? That depends on the function $f$, the choice of sigma points, and also on what you mean by “better”.
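Put together, the whole procedure is short. The following is a self-contained sketch, assuming the canonical sigma points with a Cholesky square root and a function `f` that maps a list of length $n$ to a list of length $m$; `unscented_transform` and the other names are hypothetical. The polar-to-Cartesian conversion at the end is only an example input.

```python
import math

def cholesky(M):
    """Lower-triangular L with L L^T = M, for symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L

def sigma_points(mu, Sigma):
    n = len(mu)
    A = cholesky([[n * Sigma[i][j] for j in range(n)] for i in range(n)])
    points = [[mu[i] + sign * A[i][col] for i in range(n)]
              for sign in (1.0, -1.0) for col in range(n)]
    return points, [1.0 / (2 * n)] * (2 * n)

def unscented_transform(f, mu, Sigma):
    """Push the sigma points through f and gather the weighted statistics."""
    points, weights = sigma_points(mu, Sigma)
    ys = [f(x) for x in points]
    m = len(ys[0])
    mean = [sum(w * y[i] for w, y in zip(weights, ys)) for i in range(m)]
    cov = [[sum(w * (y[i] - mean[i]) * (y[j] - mean[j])
                for w, y in zip(weights, ys))
            for j in range(m)] for i in range(m)]
    return mean, cov

def polar_to_cartesian(x):
    """Example nonlinear function: (range, bearing) -> (x, y)."""
    return [x[0] * math.cos(x[1]), x[0] * math.sin(x[1])]

mean, cov = unscented_transform(polar_to_cartesian, [1.0, 0.5], [[0.01, 0.0], [0.0, 0.09]])
print(mean)
print(cov)
```

For an affine `f` the output should coincide with $N(A\mu + b, A\Sigma A^\top)$, which makes a convenient regression test for an implementation like this.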

Analysis

How does the unscented transform compare to the true distribution of $f(x)$? Let $x \sim N(\mu, \Sigma)$, and let $s = x - \mu \sim N(0, \Sigma)$. Then we can calculate the expected value of $f(x)$ by using a Taylor expansion,

\[\begin{aligned} \mathbb{E} \left[ f(x) \right] &= \mathbb{E} \left[ f(\mu + s) \right], \\ &= \mathbb{E} \left[ f(\mu) + Df(\mu)[s] + \frac{1}{2} D^2 f(\mu)[s,s] + \cdots \right], \\ &= \mathbb{E} [f(\mu)] + Df(\mu) \mathbb{E} [s] + \frac{1}{2} D^2 f(\mu) \mathbb{E}[s,s] + \cdots, \\ &= f(\mu) + Df(\mu) [0] + \frac{1}{2} D^2 f(\mu) \mathrm{cov}(s,s) + \cdots, \\ &= f(\mu) + \frac{1}{2} D^2 f(\mu) \Sigma + O(\mathbb{E}[\vert s \vert^3]). \end{aligned}\]

In other words, the expected value of $f(x)$ is determined by the mean and covariance of $x$ up to third order deviations from the mean. In particular, this means that the unscented transform gives the correct mean for all functions $f$ where $D^k f = 0$ for $k \geq 3$; that is, functions $f$ that are degree 2 polynomials. In fact, using the canonical choice of sigma points, this is true for polynomials of degree 3 as well, because the symmetric sigma points have vanishing third central moments, just like the normal distribution.

What about the covariance obtained from the unscented transform? It turns out that this is guaranteed to equal the true covariance of $f(x)$ only when $f$ is a first-order (linear-affine) function. However, practical experience has shown that it can offer better performance than linearisation in a range of applications.
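A small worked example makes both statements concrete. Take the scalar case $n = 1$ with $f(x) = x^2$ and $x \sim N(\mu, \sigma^2)$ (an illustrative choice). The canonical sigma points are $\mu \pm \sigma$ with weights $\tfrac{1}{2}$, so

\[\begin{aligned} \hat{\eta} &= \tfrac{1}{2}(\mu + \sigma)^2 + \tfrac{1}{2}(\mu - \sigma)^2 = \mu^2 + \sigma^2 = \mathbb{E}[x^2], \\ \hat{\Sigma} &= \tfrac{1}{2}\left((\mu+\sigma)^2 - \hat{\eta}\right)^2 + \tfrac{1}{2}\left((\mu-\sigma)^2 - \hat{\eta}\right)^2 = 4\mu^2\sigma^2, \\ \mathrm{var}(x^2) &= \mathbb{E}[x^4] - \mathbb{E}[x^2]^2 = (\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4) - (\mu^2 + \sigma^2)^2 = 4\mu^2\sigma^2 + 2\sigma^4. \end{aligned}\]

The mean is exact, whereas linearisation would give $f(\mu) = \mu^2$. The covariance misses the $2\sigma^4$ term, as expected for a function that is not affine; in this particular example it coincides with the linearised covariance $(2\mu)^2\sigma^2$.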

Conclusion

The unscented transform provides a different way to propagate uncertainty through nonlinear functions than the “standard” approach of linearisation. Rather than approximate the function as a linear function, it approximates the probability distribution as a discrete probability function. This has the advantage that there is no need to compute the derivative of a function to linearise it, and propagates the mean of the function more accurately than the linearisation approach. In practice, a number of applications show that the unscented transform can outperform the linearisation approach. Overall, in my view it is a very interesting perspective on approximations to probability that is not widely enough understood.

The Lie-Theoretic Exponential is Not Surjective in General

2022-07-26T00:00:00+00:00

In order to test that mathjax is indeed working on my site, I thought I would write a short post about an example that shows the exponential map from a Lie algebra to a Lie group is not necessarily surjective. In the special orthogonal group $\mathbf{SO}(3)$, one of the most commonly discussed Lie groups in robotics, the exponential is surjective, and it is even invertible almost everywhere. However, this is not true for every Lie group, and a classic example is $\mathbf{SL}(2)$.

The Lie group $\mathbf{SL}(2)$ is defined to be the set of $2 \times 2$ matrices with determinant 1. Its Lie algebra $\mathfrak{sl}(2)$ is exactly the set of $2 \times 2$ matrices with trace 0. Consider the matrix

\[H = \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix} \in \mathbf{SL}(2).\]

It is trivial to see that, indeed, the determinant of this matrix is 1. Now let $A \in \mathfrak{sl}(2)$. Since the trace of $A$ is 0, we may write

\[A = \begin{pmatrix} a & b \\ c & -a \end{pmatrix} \in \mathfrak{sl}(2).\]

The exponential of $A$ is given by the power series

\[\begin{aligned} \exp(A) &= \sum_{k=0}^\infty \frac{1}{k!} A^k. \end{aligned}\]

In order to calculate this, let’s compute some powers of $A$.

\[\begin{aligned} A^2 &= \begin{pmatrix} a & b \\ c & -a \end{pmatrix} \begin{pmatrix} a & b \\ c & -a \end{pmatrix} = \begin{pmatrix} a^2 + bc & ab - ba \\ c a -a c & a^2 + bc \end{pmatrix} = \begin{pmatrix} a^2 + bc & 0 \\ 0 & a^2 + bc \end{pmatrix}, \\ A^3 &= \begin{pmatrix} a & b \\ c & -a \end{pmatrix} \begin{pmatrix} a^2 + bc & 0 \\ 0 & a^2 + bc \end{pmatrix} = \begin{pmatrix} a^3 + a b c & a^2 b + b^2 c \\ a^2 c + b c^2 & -a^3 - a bc \end{pmatrix}, \\ A^4 &= \begin{pmatrix} a & b \\ c & -a \end{pmatrix} \begin{pmatrix} a^3 + a b c & a^2 b + b^2 c \\ a^2 c + b c^2 & -a^3 - a bc \end{pmatrix} = \begin{pmatrix} a^4 + 2 a^2 bc + b^2 c^2 & 0 \\ 0 & a^4 + 2 a^2 bc + b^2 c^2 \end{pmatrix}. \end{aligned}\]

The emerging pattern is a consequence of $A^2 = (a^2 + bc) I_2$. In fact, it follows that, for any $k \in \mathbb{N}$,

\[A^{2k} = (a^2 + bc)^k I_2, \quad A^{2k+1} = (a^2 + bc)^k A.\]

Using this insight, let us return to computing the exponential. We have

\[\begin{aligned} \exp(A) &= \sum_{k=0}^\infty \frac{1}{k!} A^k, \\ &= \left( \sum_{k=0}^\infty \frac{1}{(2k)!} A^{2k} \right) + \left( \sum_{k=0}^\infty \frac{1}{(2k+1)!} A^{2k+1} \right), \\ &= \left( \sum_{k=0}^\infty \frac{1}{(2k)!} (a^2 + bc)^k I_2 \right) + \left( \sum_{k=0}^\infty \frac{1}{(2k+1)!} (a^2 + bc)^k A \right), \\ &= \left( \sum_{k=0}^\infty \frac{(\sqrt{a^2 + bc})^{2k}}{(2k)!} I_2 \right) + \left( \sum_{k=0}^\infty \frac{(\sqrt{a^2 + bc})^{2k}}{(2k+1)!} A \right), \\ &= \cosh(\sqrt{a^2 + bc}) I_2 + \frac{\sinh(\sqrt{a^2 + bc})}{\sqrt{a^2 + bc}} A. \end{aligned}\]

Isn’t that interesting! If you’re not sure where the hyperbolic sin and cos functions came from, have a look at their series expansions; note that the odd series is one power of $\sqrt{a^2 + bc}$ short of the series for $\sinh$, which is why we divide by it. From here, let $\theta = \sqrt{a^2 + bc}$. If $a^2 + bc < 0$, then $\theta$ is imaginary and the two coefficients become $\cos(\vert \theta \vert)$ and $\sin(\vert \theta \vert) / \vert \theta \vert$, which are still real; if $\theta = 0$, we read $\sinh(\theta)/\theta$ as its limit, 1. Then,

\[\begin{aligned} \exp(A) &= \cosh(\theta) I_2 + \frac{\sinh(\theta)}{\theta} A, \\ &= \begin{pmatrix} \cosh(\theta) & 0 \\ 0 & \cosh(\theta) \end{pmatrix} + \begin{pmatrix} \frac{\sinh(\theta)}{\theta} a & \frac{\sinh(\theta)}{\theta} b \\ \frac{\sinh(\theta)}{\theta} c & -\frac{\sinh(\theta)}{\theta} a \end{pmatrix} , \\ &= \begin{pmatrix} \cosh(\theta) + \frac{\sinh(\theta)}{\theta} a & \frac{\sinh(\theta)}{\theta} b \\ \frac{\sinh(\theta)}{\theta} c & \cosh(\theta)-\frac{\sinh(\theta)}{\theta} a \end{pmatrix}. \end{aligned}\]

In order for this to be the matrix $H$ we considered at the start, we require that

\[\cosh(\theta) + \frac{\sinh(\theta)}{\theta} a = \cosh(\theta) - \frac{\sinh(\theta)}{\theta} a = -1\]

Therefore either $\sinh(\theta)/\theta = 0$ or $a = 0$. The first case is impossible because we also require that $ \frac{\sinh(\theta)}{\theta} b = 1 $ for the top-right cell, so this cannot be. In the second case $a = 0$, but looking at the bottom-left cell $ \frac{\sinh(\theta)}{\theta} c = 0$, and since $\sinh(\theta)/\theta \neq 0$, then $c = 0$ as well. But if $c = 0$ and $a = 0$, then $\theta = \sqrt{0^2 + 0b} = 0$, so $\cosh(\theta) = 1$ and the diagonal entries of $\exp(A)$ are $1$ rather than $-1$! Therefore, neither case is possible, and we arrive at a contradiction: there is no choice of $a,b,c \in \mathbb{R}$ such that $\exp(A) = H$.
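A closed form for $\exp(A)$ is easy to test against the power series it came from. The sketch below compares a truncated series with $\cosh(\theta) I_2 + \frac{\sinh(\theta)}{\theta} A$, where $\theta = \sqrt{a^2 + bc}$ and the odd-power series sums to $\sinh(\theta)/\theta$. It uses `cmath` so that a negative $a^2 + bc$ (imaginary $\theta$) is handled by the same expression; the function names, the number of series terms, and the small-$\theta$ threshold are assumptions made for illustration.

```python
import cmath

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_series(A, terms=40):
    """Truncated power series sum_k A^k / k!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul2(term, A)]   # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def exp_closed_form(a, b, c):
    """cosh(theta) I + (sinh(theta) / theta) A for A = [[a, b], [c, -a]]."""
    theta = cmath.sqrt(a * a + b * c)
    if abs(theta) < 1e-12:
        ch, sh_over_theta = 1.0, 1.0              # limits as theta -> 0
    else:
        # Both values are real whether theta is real or purely imaginary.
        ch = cmath.cosh(theta).real
        sh_over_theta = (cmath.sinh(theta) / theta).real
    return [[ch + sh_over_theta * a, sh_over_theta * b],
            [sh_over_theta * c, ch - sh_over_theta * a]]

for a, b, c in [(1.0, 2.0, 0.5), (0.3, 1.2, -0.5), (0.0, 1.0, 0.0)]:
    print(exp_series([[a, b], [c, -a]]))
    print(exp_closed_form(a, b, c))
```

The second test case has $a^2 + bc < 0$, so it exercises the trigonometric branch.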

Conclusion

This example shows that the exponential map of a Lie group is not always surjective: there can be group elements that are not the exponential of any element of the Lie algebra. While this seems like a minor detail and a bit of an artificial problem, in my opinion examples like this are important for building intuition around mathematical structures, and often lead to better insight down the track.

It also looks like my mathjax is working!

EqVIO Paper and Code

2022-05-05T00:00:00+00:00

Our latest research on equivariant filtering for VIO is now publicly available on arxiv. The code used in the paper is also available on github under the GNU GPLv3 license. This paper is an extension and improvement over our previous work published in ICRA, and we have decided to call the resulting VIO system “EqVIO”. Have a look at the following videos to see the system working on V2_03_difficult from the EuRoC dataset and indoor_forward_7 from the UZH FPV dataset.

Abstract

Visual Inertial Odometry (VIO) is the problem of estimating a robot’s trajectory by combining information from an inertial measurement unit (IMU) and a camera, and is of great interest to the robotics community. This paper develops a novel Lie group symmetry for the VIO problem and applies the recently proposed equivariant filter. The symmetry is shown to be compatible with the invariance of the VIO reference frame, lead to exact linearisation of bias-free IMU dynamics, and provide equivariance of the visual measurement function. As a result, the equivariant filter (EqF) based on this Lie group is a consistent estimator for VIO with lower linearisation error in the propagation of state dynamics and a higher order equivariant output approximation than standard formulations. Experimental results on the popular EuRoC and UZH-FPV datasets demonstrate that the proposed system outperforms other state-of-the-art VIO algorithms in terms of both speed and accuracy.

Presentation for CDC 2020

2021-05-07T00:00:00+00:00

My presentation from CDC 2020 is now available on YouTube! It is a 12-minute presentation on our paper, “Equivariant Filter (EqF): A General Filter Design for Systems on Homogeneous Spaces”. The paper itself is quite technical and uses a lot of advanced mathematical tools, but I tried to make the presentation a bit more accessible by including a lot of figures.

Abstract

The kinematics of many mechanical systems encountered in robotics and other fields, such as single-bearing attitude estimation and SLAM, are naturally posed on homogeneous spaces: That is, their state lies in a smooth manifold equipped with a transitive Lie-group symmetry. This paper shows that any system posed in a homogeneous space can be extended to a larger system that is equivariant under a symmetry action. The equivariant structure of the system is exploited to propose a novel new filter, the Equivariant Filter (EqF), based on linearisation of global error dynamics derived from the symmetry action. The EqF is applied to an example of estimating the positions of stationary landmarks relative to a moving monocular camera that is intractable for previously proposed symmetry based filter design methodologies.

Ardupilot Conference Presentation: EqF VIO

2021-03-17T05:38:21+00:00

Prof. Mahony and I gave a talk at the ardupilot developer conference, discussing the Equivariant Filter for Visual Inertial Odometry. In the first half of the talk (until around 33min), Rob covers some of the theory and geometric concepts that are used to develop the filter. In the second half, I describe the technical challenges in taking our research code and applying it to real-world data from an outdoor flight at ANU’s spring valley field robotics facility.

I would like to thank Tridge and the other ardupilot developers for the opportunity to give this talk. I would also like to thank the ardupilot community for their friendly engagement with us during the questions and afterwards via private emails.

The code used to generate the results shown during the presentation is based on research to be published in ICRA 2021 (preprint available on arxiv). The research version of the code is available at eqf_vio. We are actively working on making improved code available to the ardupilot community.

A New Paper Accepted at IFAC

2020-06-01T00:00:00+00:00

Our new paper “An Observer Design for Visual Simultaneous Localisation and Mapping with Output Equivariance” was accepted for publication at IFAC 2020 in February this year. The preprint version is now on arxiv. Thank you to my co-authors and supervisors Robert Mahony, Tarek Hamel, and Jochen Trumpf.

GIFT: General Invariant Feature Tracker

2019-09-26T00:00:00+00:00

Recently, I released the code for my feature tracking library, GIFT. GIFT lets you specify camera parameters, and then it will detect and track features between subsequent images taken by that camera. All the boilerplate code associated with feature tracking and providing geometric descriptions of image points is taken care of. For example, a GIFT landmark contains the pixel coordinates (corresponding to the image plane), the normalised coordinates, and the spherical coordinates of a given image point. I hope GIFT will be useful to others who need to work with tracked features. An example of the output from GIFT is now available on youtube.

Please check out the code on github!
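
To give a feel for the three coordinate representations mentioned above, here is a small Python sketch. It is not GIFT's API or implementation: it assumes an ideal pinhole camera without lens distortion, the intrinsics `fx`, `fy`, `cx`, `cy` and the function names are hypothetical, and "spherical coordinates" is interpreted here as the unit bearing vector of the image point.

```python
import math


def pixel_to_normalised(u, v, fx, fy, cx, cy):
    """Map pixel coordinates (u, v) to normalised image coordinates (pinhole, no distortion)."""
    return ((u - cx) / fx, (v - cy) / fy)


def normalised_to_sphere(x, y):
    """Project the normalised image point (x, y, 1) onto the unit sphere."""
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


# Hypothetical intrinsics for a 640x480 camera.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

pixel = (420.0, 240.0)
normalised = pixel_to_normalised(*pixel, fx, fy, cx, cy)
bearing = normalised_to_sphere(*normalised)
print("pixel:", pixel)
print("normalised:", normalised)
print("bearing on the unit sphere:", bearing)
```

A real camera model would also need to handle lens distortion before the normalised coordinates are meaningful.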