Wednesday, May 21, 2014

Quadcopter Stability and Neural Networks

This is a fairly long post, so here's a TLDR: quadcopters use an algorithm called a PID controller to keep from falling over in the air, but a more complicated algorithm involving neural networks may be just as stable, if not more so.

A few times now, I've mentioned my desire to write a post about the hexacopter I built long ago. The project started almost two years ago and never really finished. The goal was not just to build a hexacopter for my own personal use, but to build and program my own flight controller from scratch.

Building a hexacopter or quadcopter using a standard pre-programmed flight controller is not too hard. You need a frame, a battery, motors, propellers, speed controllers (motor drivers), a flight controller, and other miscellaneous electronics (radio, gps, etc). The flight controller tells the speed controllers how fast each motor and propeller should be spinning. It monitors the orientation of the craft using an accelerometer and gyroscope and continually adjusts the motor speeds to keep the platform stable. If for some reason the copter tips to the left, the flight controller needs to bump up the thrust on the left side to rotate the body back to a level position. The flight controller can also intentionally tip the body in some direction in response to commands received from the radio link.

Now, why have I used the word quadcopter (four propellers) in the title of this post instead of hexacopter (six propellers)? Because in terms of the flight controller, there is little difference. Sure, the process of assembling a hexacopter is different from that of a quadcopter, but only in that the frame is a little different and there are two more of a couple of components. Quadcopter is the more widely recognized term, so I'll stick with that from now on.

1 - A Problem with a Solution

So what's the most important part about making your own flight controller? Giving it the tools to keep the quadcopter flying. Handling radio commands, GPS coordinates, data logging, and pathfinding are really all extraneous tasks compared to the ability to stay in the air without flipping over and nose-diving into the ground. Any amount of force from wind or otherwise will cause the quadcopter to tip in a random direction. It's up to the flight controller to detect these forces and compensate for them by adjusting how fast each propeller turns.

To simplify my discussion on how a quadcopter can stabilize itself, I'll reduce the problem some. Imagine a solid rod with a motor and propeller pointed up at one end, and an immovable hinge at the other end. Gravity pulls the arm down, and the thrust produced by the motor and propeller pushes the arm up. A sensor can measure the current angle of the arm as well as a few other related quantities. The goal is to develop an algorithm that will cause the arm to spend most of its time perfectly horizontal. The only way to influence the position of the arm is to adjust the motor speed, so you can think of the algorithm as having a single output.

This situation has many similarities to the problem of quadcopter stability, but also a few key differences. The primary difference is that the arm can only rotate itself around the hinge in one direction, while a quadcopter can leverage motors on opposite sides of the body to have controlled rotation in every direction. To make this simple problem more applicable, I'll allow the motor to spin either way, providing thrust either up or down.

Instead of attempting to derive an optimal algorithm to achieve this goal based on the underlying physics, I'll go ahead and jump to a well-accepted solution: the PID controller. This is an algorithm that takes sensor measurements of the arm and returns a motor speed. This process of turning sensor measurements into motor speed is typically done a few hundred times per second. How does a PID controller work? The best way to explain it is to explain the name:

        P = Proportional
    Adjust the motor speed proportionally to how far away the arm is from the target position. The farther the arm is below level, the higher the motor speed will be set to. If the arm is perfectly level, the proportional term has nothing to add.
        I = Integral
    Adjust the motor speed to account for any systematic difference between the arm and the target. If the arm tends to droop below the target position, increase the motor speed. The quantity used to determine this is the integrated difference between the arm position and the target position.
        D = Derivative
    If the arm is rapidly approaching the target, slow it down to avoid overshooting. The derivative of the difference between the arm position and the target position is used here.

These three components of the algorithm are computed at every time step and added together to come up with the 'optimal' motor speed that will get the arm to the right position. Each component has a tuning parameter that can increase or decrease its relative importance. Given proper tuning parameters, the PID controller can be a very effective method of stabilization.
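Summed up in code, one step of this algorithm is only a few lines. The sketch below is a generic PID step with made-up gains and time step, not the controller from any particular flight stack.

```python
# Minimal PID step. All names and values (kp, ki, kd, dt) are
# illustrative choices, not taken from any real flight controller.

def make_pid(kp, ki, kd, dt):
    """Return a controller function closing over its accumulated state."""
    state = {"integral": 0.0, "prev_error": 0.0}

    def step(target, measured):
        error = target - measured
        state["integral"] += error * dt                  # I: accumulated error
        derivative = (error - state["prev_error"]) / dt  # D: rate of change
        state["prev_error"] = error
        # P + I + D, summed into a single motor-speed command
        return kp * error + ki * state["integral"] + kd * derivative

    return step

pid = make_pid(kp=2.0, ki=0.5, kd=1.0, dt=0.01)
command = pid(target=0.0, measured=-0.1)  # arm drooping below level
```

Because the arm is below the target, all three terms push the command positive, asking the motor for more thrust.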

Why does the PID controller work? How do these three components make sense? To answer these questions, we start by describing the stability problem with math. The governing equation for the arm rotating about the base hinge is Newton's Second Law for Rotation:

\[ \sum_i \mathscr{T}_i = I \ddot{\theta} \]

where $\mathscr{T}_i$ is a torque exerted on the arm about the hinge, $I$ is the moment of inertia (has to do with the mass of the arm and its size), $\theta$ is the position/angle of the arm, and the double dots indicate the second derivative with respect to time. The left hand side is a sum over many torques because there will be multiple forces at work. In this language, our goal is to have $\theta = 0$, and we can apply a torque $\mathscr{T}$ using the motor and propeller to get us there. Gravity and wind can also apply torques, throwing the arm away from the solution we want.

The first torque to include is gravity, which exerts itself the most when the arm is horizontal:

\[ \mathscr{T}_{grav} = -g m L \cos \theta \]

Here, we approximate the arm and motor as a mass $m$ at a length $L$ away from the hinge. I could elaborate this model and go back to express the moment of inertia $I$ in terms of the mass and length, but it adds little to the discussion. The next source of torque is that exerted by the motor and propeller. If we use the PID controller, the torque applied is a function of the arm angle along with the derivative and the integral:

\[ \mathscr{T}_{PID} = f(\theta, \dot{\theta}, \int \! \theta \, dt) = K_P \theta + K_I \int_0^{t} \! \theta \, dt' + K_D \dot{\theta} \]

Here $K_P$, $K_I$, and $K_D$ are the tunable gains (written with a $K$ to avoid confusion with the moment of inertia $I$), and the integral accumulates the arm angle from the start of the motion up to the current time.

Combining these pieces together, we get an equation describing the battle between gravity and the PID controller to stabilize the arm:

\[ I \ddot{\theta} = -g m L \cos \theta + K_P \theta + K_I \int_0^{t} \! \theta \, dt' + K_D \dot{\theta} \]

The $\cos \theta$ term makes things a little complicated, but if we assume the arm won't deviate very far from horizontal, we can approximate this term as constant. Rearranging terms and collecting constants for readability, we end up with a nice textbook-looking 2nd order inhomogeneous integro-differential equation:

\[ a \ddot{\theta} + b \dot{\theta} + c \theta + d \int_0^{t} \! \theta \, dt' = e \]

If there's one thing I've learned in my many years of schooling about integro-differential equations (and there was in fact only one thing I learned), it's that they are a pain and should be solved numerically. But before we give up on mathematical beauty and resort to number crunching, we can gain a bit of intuition for how the system acts under various conditions.
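Since the equation is best handled numerically anyway, here is a minimal sketch of that number crunching: a plain Euler integration of the arm under gravity plus a PID torque. Every constant here (inertia, gravity scale, gains, time step) is an invented placeholder, not a value from any real arm.

```python
import math

# Crude Euler integration of  I*theta'' = -g*m*L*cos(theta) + PID torque.
# All constants are invented for illustration.
I_arm, mgl, dt = 1.0, 1.0, 0.001
kp, ki, kd = -8.0, -2.0, -4.0   # negative gains: torque opposes the error

theta, omega, integral = -0.5, 0.0, 0.0   # start 0.5 rad below horizontal
for _ in range(40000):                    # 40 seconds of simulated time
    integral += theta * dt
    torque = -mgl * math.cos(theta) + kp * theta + ki * integral + kd * omega
    omega += (torque / I_arm) * dt        # Newton's second law for rotation
    theta += omega * dt
```

With these made-up gains the arm settles back to horizontal while the integral term quietly converges to the value needed to cancel gravity; flip the signs of the gains and the arm runs away instead.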

If we turned off the PID controller completely, we would end up with a very simple equation to solve. Unfortunately, this solution involves the arm rotating a significant way away from horizontal, which breaks our earlier assumption. In that case, the equation we would have solved would no longer be valid. Instead, we will start by turning off the I and D components of the PID controller. With that, we are left with:

\[ a \ddot{\theta} + c \theta = e \]

This is a simple inhomogeneous second order differential equation that has a correspondingly simple solution:

\[ \theta(t) = A \cos(\sqrt{c/a}\, t + \tau) + e/c \]

This is a simple harmonic oscillator. Depending on the initial conditions, the arm will bounce endlessly around some angle a little below horizontal. Plotting the arm angle as a function of time for some reasonable values of the various coefficients, we can see that this solution is not exactly optimal:

The next step in understanding the full PID solution is to use the P and D terms, but keep the I term off. This produces a similarly simple equation that can be solved using standard freshman-level differential equation solving methods:

\[ a \ddot{\theta} + b \dot{\theta} + c \theta = e \]

\[ \theta(t) = A e^{C_- t} + B e^{-C_+ t} + e/c \]
\[ C_{\pm} = \frac{\sqrt{b^2 - 4 a c} \pm b}{2 a} \]

While this solution might seem much more complicated than the previous, that is primarily because I have decided to express what could be a simple sine wave as a complex exponential. Plotting this solution for a particular set of initial conditions demonstrates its character:

Including the D term in the PID controller has helped damp out the oscillations from the previous example. A lot is known about damped oscillators, including the fact that certain values of the P and D gains will cause the system to be 'critically damped'. This means that the system will achieve a steady state as fast as possible. Below is a plot showing three different cases: under-damped, critically damped, and over-damped.
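For the damped equation $a \ddot{\theta} + b \dot{\theta} + c \theta = e$, which of the three cases you get is decided entirely by the sign of the discriminant $b^2 - 4ac$ from the characteristic equation. A tiny classifier makes the cases explicit (coefficients invented for illustration):

```python
def damping_regime(a, b, c):
    """Classify a*x'' + b*x' + c*x = e by its discriminant b^2 - 4ac."""
    disc = b * b - 4 * a * c
    if disc > 0:
        return "over-damped"      # two real decay rates, no oscillation
    if disc < 0:
        return "under-damped"     # complex roots: decaying oscillation
    return "critically damped"    # fastest settling without overshoot

# Illustrative coefficients only: a is inertia-like, b the D gain, c the P gain.
print(damping_regime(1.0, 1.0, 4.0))   # under-damped
print(damping_regime(1.0, 4.0, 4.0))   # critically damped
print(damping_regime(1.0, 8.0, 4.0))   # over-damped
```

Critical damping is the knife's edge $b^2 = 4ac$, which is why finding it by hand-tuning takes some patience.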

This version of the algorithm does a decent job at stabilizing the arm, but still tends to settle on an angle slightly below horizontal. To fix this, we have to turn back on the I component of the PID controller. The full integro-differential equation can be solved for certain values of the coefficients, but it's difficult to gain a fundamental understanding of the system by looking at the solution. Instead, it's better to reason out what the final term does.

By looking at the previous solutions, we can see that even when the arm was stable, it would settle at an angle below horizontal (the solid black line). If we were to sum up how far below horizontal the arm was over time, that sum would continue to grow as long as the arm sat below horizontal. The purpose of the I component of the PID controller is to bump up the motor speed in proportion to this accumulated sum. If this additional term brings the arm back to horizontal, the sum stops growing and stays at whatever value it had reached before the arm was restored to the proper position. In this way, the I component describes an offset that only has to be determined once by watching how the arm tends to settle. Comparing a properly tuned PID controller to the critically damped PD controller from before, we can see how the new component affects the solution.
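The settling offset can also be seen numerically. The sketch below (using the same small-angle approximation as above, with invented coefficients) integrates the arm equation once with the I term off and once with it on; the PD version settles below the target, while the full PID version drives the droop away.

```python
def settle(kp, ki, kd, steps=60000, dt=0.001):
    """Euler-integrate  theta'' = -1 + kp*theta + ki*int(theta) + kd*theta'
    (unit inertia, constant gravity torque) and return the final angle."""
    theta = omega = integral = 0.0
    for _ in range(steps):
        integral += theta * dt
        alpha = -1.0 + kp * theta + ki * integral + kd * omega
        omega += alpha * dt
        theta += omega * dt
    return theta

pd_offset  = settle(kp=-8.0, ki=0.0,  kd=-4.0)   # settles below level
pid_offset = settle(kp=-8.0, ki=-2.0, kd=-4.0)   # integral removes the droop
```

Without the integral term the torque balance forces a steady offset of about $e/c$ below the target; with it, the accumulated sum supplies exactly the extra thrust gravity demands.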

Finding the values of the PID coefficients that give an optimal solution such as the one shown requires either knowing the mechanical properties of the system to a high precision, or a decent amount of guesswork. The usual method for finding the appropriate coefficients for a real world system is the latter. The procedure is similar to what I have done here: turn on one component of the PID controller at a time, tuning them independently until the solution is as good as possible without the next component.

2 - Solution to the Wrong Problem

To show how the PID controller is useful for stabilizing a system, I have had to simplify the problem and ignore various effects. This has allowed me to solve the governing equations and explain why the PID controller works, but my 'solution' is then only truly applicable to the idealized scenario. What happens to the PID controller when we add some complications, like wind or noise? Wind will act as a random torque on the arm in either direction at any time. The algorithm can't predict the wind, and it can't even measure the wind directly. All it can do is notice that the system has suddenly and unexpectedly gone off course and that corrections need to be made. Any sensor used to measure the arm angle and inform the algorithm will have noise, a random variation between the true answer and the reported answer. How well can the PID controller handle incorrect information? Finally, my model has assumed that the algorithm can assimilate sensor measurements and adjust the thrust instantaneously. In the real world, there will be a delay between the arm moving and the sensor picking up on it, a delay due to the computation of the motor speed correction, and a delay between telling the motor to change speed and the propeller actually generating a different amount of force.
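To see how these effects enter a simulation, here is a rough sketch that bolts all three onto the arm model from before: Gaussian sensor noise on the measured angle, a random wind torque, and a fixed actuation delay implemented as a queue of pending motor commands. All magnitudes are invented for illustration.

```python
import random
import collections

random.seed(0)
dt, kp, ki, kd = 0.001, -8.0, -2.0, -4.0
theta = omega = integral = 0.0
# 20 steps of lag: each command only reaches the motor 20 ms later.
delay = collections.deque([0.0] * 20, maxlen=20)

for _ in range(40000):
    measured = theta + random.gauss(0.0, 0.01)   # noisy angle sensor
    integral += measured * dt
    # (For simplicity the D term uses the true rate; a real controller
    # would have to differentiate the noisy measurement.)
    command = kp * measured + ki * integral + kd * omega
    delay.append(command)                        # command applied late
    wind = random.gauss(0.0, 0.2)                # random wind torque
    alpha = -1.0 + delay[0] + wind               # unit inertia, level gravity
    omega += alpha * dt
    theta += omega * dt
```

With these modest amounts of noise, wind, and delay the controller still holds the arm near level, but cranking any of them up (shorter `maxlen` aside) quickly exposes the failure modes the interactive app below demonstrates.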

These three non-ideal effects (wind, noise, delay) are difficult to model mathematically. It's certainly possible to do so, but the amount of fundamental insight gained from such an analytic solution is limited. Instead, we can turn to a numerical simulation of our system. I've written a simple JavaScript app that simulates an arm and motor just as I have described in the equations above, but can also add wind, noise, and delay in varying amounts to the system. I've initialized the PID coefficients to a reasonably stable solution so you can press RUN and see what happens. Clicking the circles next to WIND, NOISE, and DELAY will increase the amount of each present in the system, and clicking the POKE button on the left will nudge the arm in a random direction to test the stability. Clicking the circles next to P, I, and D will toggle each component. The sliders next to them determine the coefficients, but will only apply to the system if the circle next to them is green. Pressing RESET will put the arm back at horizontal, but will keep the PID coefficients the same.

If you are feeling adventurous, try setting the PID coefficients randomly and then tuning the PID controller from scratch. If the arm just flails about randomly, press RESET to calm it down. The buttons next to PID and NN determine which controller is being used to stabilize the arm. I'll describe what the NN controller is doing in the next section.


3 - A Smarter Solution

The PID controller does a decent job at stabilizing the arm under certain conditions. If it gets pushed around too much by noise, gets too many flawed measurements, has too large of a delay, or simply ends up outside its comfort zone, it tends to have trouble. Is there another option?

I, like many others, have recently finished up with Andrew Ng's Coursera course on Machine Learning. A significant fraction of my PhD research has utilized large nonlinear optimization codes, so I found most of the details of machine learning presented in the course to be pretty straightforward. The biggest concept I was able to take away from the course was that of an artificial neural network. There seems to be a lot of mystery and awe when talking about artificial neural networks in a casual setting, but I think this is largely due to the name. If I were to rename neural networks to something a little more down-to-earth, I would call them Arbitrary Nonlinear Function Approximators. Not nearly as magical sounding, but a little more to the point. But until I am the king of naming things, I'll call them neural networks.

What does a neural network have to do with stability? The PID controller was attempting to model the 'optimal' response to a set of inputs that would stabilize a rotating arm. We know that the PID controller is not a perfect solution, but it seems to have some ability to mimic a perfect solution in some cases. We might imagine that the truly optimal solution would be something far more complex than the PID controller, but we don't really have a way of knowing what that solution is. That's not to say we can never know; it's just really hard to figure out. A neural network is a useful tool for approximating an unknown nonlinear function, as long as we have some examples of what the function looks like at various points.

A neural network works by taking a set of inputs, combining them back and forth in all kinds of nonlinear ways, and producing a final output. The way in which inputs are combined with each other can be varied incrementally, allowing the network to 'learn' how to appropriately combine them to match a specified output. Given enough examples of the function it needs to approximate, the neural network will eventually (and hopefully) converge on the right answer. This is of course a gross simplification of how a neural network works, but the point of this post is not to delve into the specifics of sigmoid functions and backpropagation. If you want to know more, use the internet! That's what I did.
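As a concrete (if toy) picture of that input-combining, here is the forward pass of a tiny network: one hidden layer of sigmoid units feeding a single linear output. The layer sizes and random weights are arbitrary illustration, not a trained controller.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One hidden layer of 4 sigmoid units feeding a single linear output.
# Sizes and random weights are invented; nothing here has been trained.
n_in, n_hid = 3, 4
w_hid = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
b_hid = [random.uniform(-1, 1) for _ in range(n_hid)]
w_out = [random.uniform(-1, 1) for _ in range(n_hid)]
b_out = random.uniform(-1, 1)

def forward(inputs):
    """Combine the inputs nonlinearly and produce one output."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hid, b_hid)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

motor_speed = forward([0.1, -0.05, 0.02])  # angle, derivative, integral
```

'Learning' amounts to nudging those weight values until the outputs match the examples you care about; that nudging (backpropagation) is what the rest of this section glosses over.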

So let's say a neural network could replace the PID controller. How do we train the network to know what the right answer is? The only example we have so far to learn from is the PID controller, but if that's the only information we give it, we might as well just use the PID controller and avoid the confusion the neural network will add. We need a way of letting the neural network controller (NNC) explore possibilities and recognize when it has come across a good part of the solution. To do this, we use reinforcement learning. Specifically, I've generally followed the procedure put forth in this paper by van Hasselt and Wiering for using neural networks to perform reinforcement learning in a continuous action space. It would have been possible to do this with a discrete action space, but I wanted to generalize the problem for other projects.

With the knowledge that an artificial neural network can be used to approximate a complicated and unknown function given examples of the function for different inputs, how can we apply this to the problem of learning stability? The idea behind reinforcement learning is that you are training an 'actor' (the part of the algorithm that looks at inputs and decides on the optimal action) by giving it rewards or punishments. The reward is a single number that is used to communicate how good the actor's previous action was. For example, you could reward the actor in this situation one point for every second that it keeps the arm near horizontal, but take away a hundred points if the arm falls down. The actor is continually building up an idea of the appropriate function that translates the current state of the arm into an optimal action through trial and error. At every step, the actor either picks the action it thinks is optimal based on its current neural network or picks a random action close to the optimal guess. Since the neural network can only provide an approximate answer for the optimal action while learning is in progress, the randomness allows for exploration of new possibilities that may be better. After performing an action, an appropriate reward is returned. If this reward is better than what the actor anticipated getting for a given action, then the action taken is reinforced. There are a few other details to this method that I'm not mentioning, but if you are interested, start by reading up on Temporal Difference Learning.
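A heavily stripped-down sketch of this actor-critic idea is below. Following the spirit of the van Hasselt and Wiering approach, but replacing their neural networks with one-weight linear models and the arm with toy one-line dynamics purely to keep the sketch short, the actor explores with Gaussian noise around its current best action and is nudged toward the performed action only when the temporal-difference error comes out positive (i.e., things went better than the critic expected).

```python
import random

random.seed(2)
gamma, lr_v, lr_a = 0.95, 0.05, 0.02
v_w = 0.0   # critic weight:  V(theta) ~= v_w * |theta|
a_w = 0.0   # actor weight:   action   ~= a_w * theta

def step(theta, action, dt=0.05):
    """Toy stand-in dynamics, not the full arm simulation."""
    return (1 - dt) * theta + dt * action

for episode in range(2000):
    theta = random.uniform(-1.0, 1.0)
    for _ in range(20):
        action = a_w * theta + random.gauss(0.0, 0.1)   # explore near actor
        new_theta = step(theta, action)
        reward = -abs(new_theta)                        # stay near level
        td_error = reward + gamma * v_w * abs(new_theta) - v_w * abs(theta)
        v_w += lr_v * td_error * abs(theta)             # critic: TD update
        if td_error > 0:                                # better than expected:
            a_w += lr_a * (action - a_w * theta) * theta  # imitate the action
        theta = new_theta
```

After training, the actor's weight has gone negative: it has learned, purely from rewards, to push against the error the way a hand-tuned controller would.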

Given enough attempts to find the optimal action to achieve stability, the artificial neural network controller will (hopefully) converge on a function that can take the same inputs given to the PID controller and output an optimal motor speed to keep the arm stable.

4 - Learning vs Knowing

I wrote up a simple code to perform reinforcement learning on a simulation of the arm using the FANN library to simplify my use of artificial neural networks. As inputs, I provided the same quantities that the PID controller uses (current angle, time derivative of angle, integral of angle) and had the desired motor speed as the only output. To initialize the NN controller, I preconditioned the actor by teaching it to approximate a linear combination of the inputs, resulting in a standard PID controller.
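The preconditioning step can be sketched as plain supervised learning: generate states, label each with the linear PID response, and run backpropagation until the network imitates it. The gains, layer sizes, and learning rate below are all hypothetical stand-ins, and this hand-rolled gradient descent is only a stand-in for what the FANN library does internally.

```python
import math
import random

random.seed(3)
# Hypothetical PID gains the network is preconditioned to imitate.
kp, kd, ki = -8.0, -4.0, -2.0
n_in, n_hid, lr = 3, 8, 0.02

w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [0.0] * n_hid
w2 = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
b2 = 0.0

def forward(x):
    """Return hidden activations and the network's motor-speed output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return h, sum(w * hi for w, hi in zip(w2, h)) + b2

# Random (angle, derivative, integral) states, labeled with the PID response.
states = [[random.uniform(-0.3, 0.3) for _ in range(n_in)] for _ in range(200)]
data = [(x, kp * x[0] + kd * x[1] + ki * x[2]) for x in states]

def avg_loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

before = avg_loss()
for _ in range(8000):                       # plain stochastic backprop
    x, target = random.choice(data)
    h, out = forward(x)
    err = out - target                      # gradient of 0.5*err^2
    for j in range(n_hid):
        grad_h = err * w2[j] * (1 - h[j] ** 2)
        w2[j] -= lr * err * h[j]
        b1[j] -= lr * grad_h
        for i in range(n_in):
            w1[j][i] -= lr * grad_h * x[i]
    b2 -= lr * err
after = avg_loss()
```

Starting reinforcement learning from a network that already behaves like a PID controller means the actor begins its exploration from a sane baseline instead of from random flailing.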

The following animation shows the progression of learning over many iterations. Since the actor starts without knowing anything about the appropriate action to take, it spends a lot of time completely failing to keep the arm anywhere near horizontal. As more simulations are run and the actor explores new possibilities, it picks up on the rewards reaped for keeping the arm in the right place and starts to make a concerted effort to keep the arm in line.

It's hard to quantify the performance of this new controller relative to the PID controller in a way that is simple to understand and appropriate for a post like this. Instead of presenting you, the reader, with a definitive answer for which controller is better, I am providing you with the ability to try out a neural network controller that I have trained. If you scroll back up to the PID controller app and click the NN button, you will switch the algorithm from PID to NN (neural network). See how well it can keep the arm horizontal under various conditions. The neural network controller isn't doing any learning in this app; it is only applying lessons it has been taught beforehand. I trained it on a simulation almost identical to the one in the app here, and ran it through about 10k learning iterations to settle on an answer. Running on a single core of my desktop, the training procedure took around 5 minutes.

As you may find, the new controller performs just as well as the PID controller at keeping the arm horizontal. When the PID controller is exposed to signal noise, it tends to amplify this noise in the output motor speed. The NN controller seems to react a little more gracefully to noise, passing less of it through to the final output. The NN controller also seems to be a little lighter on the trigger for correcting sudden jumps in arm angle. The PID controller will respond to a sudden change in state with a sudden change in output, while the NN controller seems to have a smoother response curve.

It's hard to determine exactly why the NN controller acts the way it does, other than to say it has found an optimal solution to the training data I provided. While there is some mystery around how to interpret the non-linear solution the controller has found, I hope this post has cleared up some of the mystery around how the controller operates and how it is able to 'learn'. In the end, it's all about twisting and turning a complicated function into something that best matches the training data.

So what does all of this mean for quadcopters? Should they all implement neural network stability algorithms with the ability to learn from current conditions? The process of learning is fairly computationally intense and can involve quite a few failures, so should they instead use pre-trained networks like in my app? I'm not sure. Neural networks are much more difficult to deal with than standard linear controllers. Not only are they harder to implement and tune, they are harder to understand. I believe that one of the greatest things about the growing quadcopter / UAV field is the lowering investment cost, both monetarily and intellectually. I'm not advocating for everyone to start replacing simple controllers that work pretty well with much more complicated ones that work slightly better; I would leave that to the researchers who are working out how best to implement such complicated systems. Instead, I advocate people using this field as a testbed for their own experiments. There are many levels of complexity to an unmanned vehicle, and a lot can be gained from picking one and thoroughly examining it.