parsing and unpacking python3 serial data containing double backslashes
Last updated: Jan 24, 2023
I lost a day of my life figuring out how to parse serial data sent as bytes from the BBC micro:bit using MicroPython. The problem is that the byte string arrives with double backslash characters instead of single backslashes when read in over a serial interface. In other words, each non-printable byte arrives as the four characters of its escape sequence (such as \x00) rather than as a single byte. Actual data:
b'ST\\x00\\x00\\x00\\xe0\\xeaE\\x00\\x00HB\\x00\\x00`\\xc3\\x00\\x00\\x10C\\x00\\x00t\\xc4EN'
What I wanted as data:
b'ST\x00\x00\x00\xe0\xeaE\x00\x00HB\x00\x00`\xc3\x00\x00\x10C\x00\x00t\xc4EN'
So how do I convert the malformed byte string into the clean one that Python 3 would use? I really went around in circles on this one. In the end I used a kludge. But it works. My life can now move on. I convert the double-backslash bytes to a string with str(). Then I use the replace method to replace '\\' with '\'. Then I use the literal_eval function to turn it back into a bytes object. I am open to suggestions for a cleaner way of doing this! Here's some example code I used in a Jupyter notebook session. test2 is the malformed byte string received over the serial interface and test3 is the cleaned bytes object that I can now unpack and extract the data from.
```python
from struct import unpack
from ast import literal_eval

PACKER = '2s5f2s'
test2 = b'ST\\x00\\x00\\x00\\xe0\\xeaE\\x00\\x00HB\\x00\\x00`\\xc3\\x00\\x00\\x10C\\x00\\x00t\\xc4EN'
test3 = str(test2)                   # text of the bytes literal, backslashes doubled again
test3 = test3.replace('\\\\', '\\')  # collapse each pair of backslashes into one
print('{}'.format(test3))
test3 = literal_eval(test3)          # evaluate the text as a bytes literal
print(test3)
print(unpack(PACKER, test3))
```
output:
```
b'ST\x00\x00\x00\xe0\xeaE\x00\x00HB\x00\x00`\xc3\x00\x00\x10C\x00\x00t\xc4EN'
b'ST\x00\x00\x00\xe0\xeaE\x00\x00HB\x00\x00`\xc3\x00\x00\x10C\x00\x00t\xc4EN'
(b'ST', 7516.0, 50.0, -224.0, 144.0, -976.0, b'EN')
```
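One possibly cleaner route, offered as a sketch rather than a fix tested against every payload: the received bytes are the text of Python escape sequences, so the standard unicode_escape codec can interpret them, and encoding the result as latin-1 maps each character back to a single byte. This assumes the payload contains only ASCII characters and \xNN-style escapes, as in the sample above. The helper name unescape_bytes is my own invention.

```python
from struct import unpack

PACKER = '2s5f2s'

def unescape_bytes(raw):
    # unicode_escape interprets the literal escape sequences in the payload;
    # latin-1 then maps code points 0-255 back to one byte each.
    return raw.decode('unicode_escape').encode('latin-1')

test2 = b'ST\\x00\\x00\\x00\\xe0\\xeaE\\x00\\x00HB\\x00\\x00`\\xc3\\x00\\x00\\x10C\\x00\\x00t\\xc4EN'
test3 = unescape_bytes(test2)
print(test3)
print(unpack(PACKER, test3))
```

This avoids building and evaluating a Python literal, so no str(), replace or literal_eval step is needed.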
The data was produced by reading the accelerometer on a BBC micro:bit board and then packing the readings with struct.pack(PACKER, *values). I am programming the boards using MicroPython. The data is packed using the packer format:

```python
PACKER = '2s5f2s'
```

The transmitted scan is constructed using:

```python
values = (START, counter, DELTA, x, y, z, END)
scan = struct.pack(PACKER, *values)
```
Here values contains START and END markers ('ST' and 'EN' respectively), a counter, a constant called DELTA which represents the time between samples, and the x, y and z readings from the accelerometer. So PACKER means '2 characters followed by 5 floats followed by 2 characters'.

I was being obstinate in sending bytes over the serial interface instead of a string. Why use bytes and not just send a text string? Using pack and unpack enforces a structure on the data packets and reduces the amount of data that has to be transmitted compared with a string. Consider the number 7516.0 sent using the pack function. It is coded as an 'f', meaning a 32-bit float, which is always 4 bytes long whatever the value. Sending '7516.0' as a string would require 6 bytes, one for each character. If I encode 'ST 7516.0 50.0 -224.0 144.0 -976.0 EN' using the format '2s5f2s', the message is 26 bytes: 2 for 'ST', 2 bytes of alignment padding, 20 for the five floats and 2 for 'EN'. If I send it as a string, it will be 37 bytes. Please see the example code and its output below.
```python
from struct import pack, unpack

PACKER = '2s5f2s'
test = 'ST 7516.0 50.0 -224.0 144.0 -976.0 EN'
test2 = (b'ST', 7516.0, 50.0, -224.0, 144.0, -976.0, b'EN')
print('string length: {}'.format(len(test)))
packed_data = pack(PACKER, *test2)
print('packed length: {}'.format(len(packed_data)))
print('unpacked data: {}'.format(unpack(PACKER, packed_data)))
```
output:
```
string length: 37
packed length: 26
unpacked data: (b'ST', 7516.0, 50.0, -224.0, 144.0, -976.0, b'EN')
```
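The packet size can also be checked without packing anything, using struct.calcsize. This is a small sketch. In native mode (no prefix, as used here) the result depends on the platform's sizes and alignment rules. The '<' prefix does not appear in my code and is shown only for comparison, since it selects standard sizes with no padding.

```python
from struct import calcsize

# Native mode: platform sizes, with padding so that the floats are aligned.
print(calcsize('f'))        # size of a single float
print(calcsize('2s5f2s'))   # whole packet, including any alignment padding

# Standard little-endian mode: fixed sizes and no padding.
print(calcsize('<f'))       # 4
print(calcsize('<2s5f2s'))  # 24 = 2 + 5 * 4 + 2
```

If the two padding bytes matter on a slow link, a prefixed format would remove them, but both ends would then have to use the same format string.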
The second reason for using pack and unpack for data packet transmission, rather than sending a text stream, is that it gives some basic error checking. If a sensor reading cannot be packed as a float (for example, it comes back as None), an error is raised during the pack process at the transmitter end. If the packet arrives with the wrong number of bytes, struct.error is raised during the unpack process at the receiving end. This can be caught using a try-except clause. Bytes that are altered without changing the packet length still unpack without an error, so it is also worth checking that the START and END markers arrive intact.
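A minimal sketch of what that receiving-end check might look like. The function name parse_packet and the choice to return None for a bad packet are my own assumptions, not part of the original receiver code. Because unpack itself only validates the packet length, the sketch also compares the first and last fields against the expected markers.

```python
import struct

PACKER = '2s5f2s'

def parse_packet(packet):
    """Return the unpacked tuple, or None if the packet is not usable."""
    try:
        values = struct.unpack(PACKER, packet)
    except struct.error:
        # Raised when the packet length does not match the format.
        return None
    # unpack does not notice altered bytes, so check the markers as well.
    if values[0] != b'ST' or values[-1] != b'EN':
        return None
    return values

good = struct.pack(PACKER, b'ST', 7516.0, 50.0, -224.0, 144.0, -976.0, b'EN')
print(parse_packet(good))               # the seven unpacked values
print(parse_packet(good[:-1]))          # None: one byte short
print(parse_packet(b'XX' + good[2:]))   # None: START marker damaged
```

Damage confined to the float fields would still pass both checks; catching that would need something like a checksum, which is beyond what I did here.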