EWMA filter example using pandas and python

This article gives an example of how to use an exponentially weighted moving average filter to remove noise from a data set using the pandas library in python 3. I am writing this as the syntax for the library function has changed. The syntax I had been using is shown in Connor Johnoson's well explained example here.
I will give some example code, plot the data sets then explain the code. The pandas documentation for this function is here. Like a lot of pandas documentation it is thorough, but could do with some more worked examples. I hope this article will plug some of that gap.
Here's the example code:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
ewma = pd.Series.ewm

x = np.linspace(0, 2 * np.pi, 100)
y = 2 * np.sin(x) + 0.1 * np.random.normal(x)
df = pd.Series(y)
# take EWMA in both directions then average them
fwd = ewma(df,span=10).mean() # take EWMA in fwd direction
bwd = ewma(df[::-1],span=10).mean() # take EWMA in bwd direction
filtered = np.vstack(( fwd, bwd[::-1] )) # lump fwd and bwd together
filtered = np.mean(filtered, axis=0 ) # average
plt.title('filtered and raw data')
plt.plot(y, color = 'orange')
plt.plot(filtered, color='green')
plt.plot(fwd, color='red')
plt.plot(bwd, color='blue')

This produces the following plot. Orange line = noisy data set. Blue line = backwards filtered EWMA data set. Red line = forwards filtered EWMA data set. Green line = sum and average of the two EWMA data sets. This is the final filtered output.

EWMA fiiltered and raw data.

Let's look at the example code. After importing the libraries I will need in lines 1-5, I create some example data. Line 6 creates 100 x values with values spaced evenly from 0 to 2 * pi. Line 7 creates 100 y-values from these 100 x-values. Each y value = 2*sin(x)+some noise. The noise is generated using the np.random.normal function. This noisy sine function is plotted in line 15 and can be seen as the jagged orange line on the plot.
Forwards and backwards EWMA filtered data sets are created in lines 10 and 11.
Line 10 starts with the first x-sample and the corresponding y-sample and works forwards and creates an EWMA filtered data set called fwd. This is plotted in line 17 as the red line.
Line 11 starts at the opposite end of the data set and works backwards to the first - this is the backwards EWMA filtered set, called bwd. This is plotted in line 18 as the blue line.
These two EWMA filtered data sets are added and averaged in lines 12-13. This data set is called filtered. This data set is plotted in line 16 as the green line.
If you look at the ewma functions in line 10 and 11, there is a parameter called span. This controls the width of the filter. The lag of the backwards EWMA data behind the final averaged filtered output is equal to this value. Similarly the forward EWMA data set has an offset forwards of the noisy data set equal to this value. Increasing the span increases the smoothing and the lag. Increasing the value will also reduce the peaks of the filtered data in relation to the unfiltered data. You need to try out different values.
My present application for this filter is removing jitter from accelerometer data. I have also used this filter to smooth signals from hydrophones.

One minus alpha filter

I've got some real time accelerometer and gyroscope data coming in on my project to recognise hand gestures here. Naturally, I would like to be able to remove jitter and noise from the data as painlessly as possible. So we are into the world of real time digital filtering. Many books have been written on this subject and it is easy to dive 'down the rabbit hole' and lose a lot of your life testing filters. I want something that is 'good enough' quickly. From this stackoverflow answer, I got the idea for a simple implementation of a moving average filter:
x' ←x'+α(x−x')

where α<1

The first term, x', is the new filtered value, calculated by taking the last filtered value and adding α.(last measured value-last filtered value).
So how to simulate and implement this?


First of all, I will simulate the idea. Filters are implemented by using convolution. The input data is convolved with a filter operator. So I tried out a filter operator:

(1-α, α)

Here's a simple script I ran using a Jupyter notebook and Python 3. I lifted the base code from the matplotlib examples page. I generated a sine wave and add some random noise in line 6. The filter function is defined in line 5, I am using alpha = 0.5 for this example. The function np.convolve in line 7 implements the filter. I had to knock off the last element of the filtered and difference data to get them all to plot, as one of the characteristics of a filter is that it will elongate the data set. Really, you need to 'pad' a data set at each end before applying convolution to remove 'edge effects' of the filter. But we are looking to quickly test and implement a filter here, not get bogged down in the technical minutia of filter design. Rabbit hole. Avoid.

import numpy as np
import matplotlib.pyplot as plt
ALPHA = 0.5
x = np.linspace(0, 2 * np.pi, 100)
filter = (1-ALPHA,ALPHA*1)
y = 2 * np.sin(x) + 0.1 * np.random.normal(x)
y_filt = np.convolve(y, filter)
y_diff = y - y_filt[:-1]


fig, (ax0, ax1, ax2) = plt.subplots(nrows=3)

ax0.plot(x, y)

ax1.plot(x, y_filt[:-1])

ax2.plot(x, y_diff)

# Hide the right and top spines
# Only show ticks on the left and bottom spines

# Tweak spacing between subplots to prevent labels from overlapping


For an input, filtered output and difference plot, see below. Note that the difference plot is on a different scaling to the input and filtered output. Looks to have the same amplitude and that some of the random noise has been removed. It is not perfect, but it has helped and was fast and easy to implement.


I am using micropython v1.7 on a pyboard v1.0 with an mpu6050 accelerometer/gyroscope for my hardware platform - see the diagram below. So how hard could it be to implement a simple one point filter? Errrr....

The filter code is straightforwards, see the snippet below. The function filter takes the latest sensor value as new_value and uses the last filtered value as old_value, returning the latest filtered value. I am using ALPHA as 0.5 for this test.

def filter(self, old_value, new_value):
''' simple moving average filter '''
return (new_value*ALPHA + (1-ALPHA)*old_value)

This function is called from the main sensor scan and process while loop for each of the x,y and z accelerometers, shown in the snippet below.

if (self.run_flag):
self.counter+= 1
(delta, x, y, z) = self.read_acc()
x_acc = x
x = self.filter(old_x, x)
y = self.filter(old_y, y)
z = self.filter(old_z, z)
print(START, self.counter, delta, x_acc, x, x-x_acc, END)
old_x, old_y, old_z = x, y, z

Have a look at the plot below. This shows the x-axis from an imu6050 module being sampled through a pyboard v1.0. at 100Hz. I wrote the firmware for this board using micropython v1.7 and the display software using python 3.4 with the pyqtgraph library. The x scale shows samples, the y scale shows acceleration in g.

So what can we see? The raw data looks jittery, the filtered data looks smoother and we can see the jitter that has been taken out in the difference plot. To characterise this filter properly I would need to start looking at the frequency spectrum of the raw and filtered data. But this is heading down the rabbit hole again.
I've quickly implemented a filter that looks to be doing what I want it to - removing noise from data. I can play with the alpha value to change the amount of smoothing. 'The proof is in the eating'. If I can get my gesture recognition system to work with this simple filter implemented, then it is good enough.