Parsing GPS receiver output via regex in Python

Question

Asked by crashsystems

I have a friend who is finishing up his master's degree in aerospace engineering. For his final project, he is on a small team tasked with writing a program for tracking weather balloons, rockets and satellites. The program receives input from a GPS device, does calculations with the data, and uses the results of those calculations to control a series of motors that orient a directional communication antenna, so the balloon, rocket or satellite always stays in focus.

Though I am something of an (eternal) beginner myself, I have more programming experience than my friend. So when he asked me for advice, I convinced him to write the program in Python, my language of choice.

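At this point in the project, we are working on the code that parses the input from the GPS device. Here is some example input. The data we need to extract (shown in bold in the original post) are fields 3 through 6 (latitude, N/S, longitude, E/W) and field 10 (the value just before the M):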

$GPRMC,092204.999,4250.5589,S,14718.5084,E,1,12,24.4,89.6,M,,,0000*1F
$GPRMC,093345.679,4234.7899,N,11344.2567,W,3,02,24.5,1000.23,M,,,0000*1F
$GPRMC,044584.936,1276.5539,N,88734.1543,E,2,04,33.5,600.323,M,,,*00
$GPRMC,199304.973,3248.7780,N,11355.7832,W,1,06,02.2,25722.5,M,,,*00
$GPRMC,066487.954,4572.0089,S,45572.3345,W,3,09,15.0,35000.00,M,,,*1F

Here is some further explanation of the data:

"It looks like I'll need five things out of every line. And bear in mind that any one of these areas may be empty, meaning there will be just two commas right next to each other, such as ',,,'. There are two fields that may be full at any time. Some of them only have two or three options that they may be, but I don't think I should be counting on that."

Two days ago my friend was able to acquire the full log from the GPS receiver used to track a recent weather balloon launch. The data is quite long, so I put it all in this pastebin.

I am still rather new to regular expressions myself, so I am looking for some assistance.

Answer 1

Answered by Claudiu

Splitting should do the trick. Here's a good way to extract the data as well:

>>> line = "$GPRMC,199304.973,3248.7780,N,11355.7832,W,1,06,02.2,25722.5,M,,,*00"
>>> fields = line.split(",")
>>> neededData = (float(fields[2]), fields[3], float(fields[4]), fields[5], float(fields[9]))
>>> print(neededData)
(3248.778, 'N', 11355.7832, 'W', 25722.5)
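
The question notes that any field may be empty, and float('') raises a ValueError. The following is a minimal sketch of the same split-based extraction that tolerates empty fields. The helper names to_float and extract_needed are made up for illustration and are not part of the answer above. A field that is non-empty but corrupt would still raise ValueError here.

```python
def to_float(field):
    """Return the field as a float, or None if the field is empty."""
    field = field.strip()
    return float(field) if field else None


def extract_needed(line):
    """Pull fields 3-6 and field 10 out of one comma-separated sentence."""
    fields = line.strip().split(",")
    if len(fields) < 10:
        return None  # too short to contain the fields we need
    return (to_float(fields[2]), fields[3] or None,
            to_float(fields[4]), fields[5] or None,
            to_float(fields[9]))


print(extract_needed("$GPRMC,199304.973,3248.7780,N,11355.7832,W,1,06,02.2,25722.5,M,,,*00"))
print(extract_needed("$GPRMC,199304.973,,,11355.7832,W,1,06,02.2,,M,,,*00"))
```

Empty fields come back as None, so the caller can decide whether to skip the line or keep the partial data.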

Answer 2

Answered by S.Lott

It's simpler to use split than a regex.

>>> line="$GPRMC,092204.999,4250.5589,S,14718.5084,E,1,12,24.4,89.6,M,,,0000*1F "
>>> line.split(',')
['$GPRMC', '092204.999', '4250.5589', 'S', '14718.5084', 'E', '1', '12', '24.4', '89.6', 'M', '', '', '0000*1F ']
>>>
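
For comparison, since the question asked about regular expressions, here is a rough sketch of what a regex for the same job might look like. The pattern and the group names (time, lat, ns, lon, ew, tenth) are assumptions chosen for illustration. Each field is written as [^,]* so that it may be empty. It is noticeably harder to read than split, which is the point made above.

```python
import re

# Hypothetical pattern: the group names are illustrative only.
# [^,]* matches any run of non-comma characters, including an empty one.
SENTENCE_RE = re.compile(
    r"^\$GPRMC,"
    r"(?P<time>[^,]*),"
    r"(?P<lat>[^,]*),(?P<ns>[^,]*),"
    r"(?P<lon>[^,]*),(?P<ew>[^,]*),"
    r"(?:[^,]*,){3}"           # skip three fields that are not needed
    r"(?P<tenth>[^,]*),"       # the tenth field, just before the 'M'
)


def match_sentence(line):
    """Return a dict of the named fields, or None if the line does not match."""
    m = SENTENCE_RE.match(line.strip())
    return m.groupdict() if m else None


print(match_sentence("$GPRMC,092204.999,4250.5589,S,14718.5084,E,1,12,24.4,89.6,M,,,0000*1F "))
```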

Answer 3

Answered by Jerub

Those are comma-separated values, so using a csv library is the easiest solution.

I threw that sample data you have into /var/tmp/sampledata, then I did this:

>>> import csv
>>> for line in csv.reader(open('/var/tmp/sampledata')):
...   print(line)
['$GPRMC', '092204.999', '**4250.5589', 'S', '14718.5084', 'E**', '1', '12', '24.4', '**89.6**', 'M', '', '', '0000*1F']
['$GPRMC', '093345.679', '**4234.7899', 'N', '11344.2567', 'W**', '3', '02', '24.5', '**1000.23**', 'M', '', '', '0000*1F']
['$GPRMC', '044584.936', '**1276.5539', 'N', '88734.1543', 'E**', '2', '04', '33.5', '**600.323**', 'M', '', '', '*00']
['$GPRMC', '199304.973', '**3248.7780', 'N', '11355.7832', 'W**', '1', '06', '02.2', '**25722.5**', 'M', '', '', '*00']
['$GPRMC', '066487.954', '**4572.0089', 'S', '45572.3345', 'W**', '3', '09', '15.0', '**35000.00**', 'M', '', '', '*1F']

You can then process the data however you wish. It looks a little odd with the '**' at the start and end of some of the values. If you want to strip that off, you can do:

>>> eastwest = 'E**'
>>> eastwest = eastwest.strip('*')
>>> print(eastwest)
E

You will have to convert some values to floats. For example, the third value on the first line of sample data is:

>>> data = '**4250.5589'
>>> print(float(data.strip('*')))
4250.5589
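
Once the values are floats, keep in mind that NMEA latitude and longitude are encoded as degrees and minutes (ddmm.mmmm and dddmm.mmmm), not as decimal degrees. Below is a small sketch of the conversion. The function name nmea_to_degrees is hypothetical, and it strips the '*' markers in the same way as above.

```python
def nmea_to_degrees(value, hemisphere):
    """Convert an NMEA (d)ddmm.mmmm string to signed decimal degrees."""
    value = float(value.strip('*'))
    degrees = int(value // 100)        # digits in front of the two minute digits
    minutes = value - degrees * 100    # mm.mmmm
    decimal = degrees + minutes / 60.0
    # South and West are negative by convention.
    if hemisphere.strip('*') in ('S', 'W'):
        return -decimal
    return decimal


print(nmea_to_degrees('**4250.5589', 'S'))
print(nmea_to_degrees('14718.5084', 'E**'))
```

For the first sample line this gives roughly 42.84 degrees south and 147.31 degrees east.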

Answer 4

Answered by Brian C. Lane

You should also first check the checksum of the data. It is calculated by XORing the characters between the $ and the * (not including them) and comparing it to the hex value at the end.

Your pastebin looks like it has some corrupt lines in it. Here is a simple check; it assumes that the line starts with $ and has no CR/LF at the end. To build a more robust parser, you need to search for the '$' and work through the string until hitting the '*'.

def check_nmea0183(s):
    """
    Check a string to see if it is a valid NMEA 0183 sentence
    """
    if s[0] != '$':
        return False
    if s[-3] != '*':
        return False

    checksum = 0
    for c in s[1:-3]:
        checksum ^= ord(c)

    if int(s[-2:],16) != checksum:
        return False

    return True
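
The more robust approach described above (search for the '$', work forward to the '*', then compare checksums) might be sketched as follows. This is an illustration under assumptions, not tested against the pastebin data. The generator name find_valid_sentences is made up, and the choice to restart at a later '$' when one appears before the '*' is one possible way to handle truncated sentences.

```python
import string


def find_valid_sentences(text):
    """Yield each '$...*hh' sentence in text whose checksum matches."""
    pos = 0
    while True:
        start = text.find('$', pos)
        if start == -1:
            return
        star = text.find('*', start + 1)
        if star == -1:
            return
        # A second '$' before the '*' suggests the first sentence was cut off,
        # so restart from the last '$' in front of the '*'.
        restart = text.rfind('$', start + 1, star)
        if restart != -1:
            start = restart
        body = text[start + 1:star]
        digits = text[star + 1:star + 3]
        pos = star + 1
        # Require exactly two hex digits after the '*'.
        if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
            continue
        checksum = 0
        for c in body:
            checksum ^= ord(c)
        if checksum == int(digits, 16):
            yield text[start:star + 3]
```

Because it scans the whole text rather than one line at a time, it does not care about CR/LF or about junk between sentences.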

Answer 5

Answered by Knio

You could use a library like pynmea2 for parsing the NMEA log.

>>> import pynmea2
>>> msg = pynmea2.parse('$GPGGA,142927.829,2831.4705,N,08041.0067,W,1,07,1.0,7.9,M,-31.2,M,0.0,0000*4F')
>>> msg.timestamp, msg.latitude, msg.longitude, msg.altitude
(datetime.time(14, 29, 27), 28.524508333333333, -80.683445, 7.9)

Disclaimer: I am the author of pynmea2.

Answer 6

Answered by PaulMcG

If you need to do some more extensive analysis of your GPS data streams, here is a pyparsing solution that breaks up your data into named data fields. I extracted your pastebin'ned data to a file gpsstream.txt, and parsed it with the following:

"""
 Parse NMEA 0183 codes for GPS data
 http://en.wikipedia.org/wiki/NMEA_0183

 (data formats from http://www.gpsinformation.org/dale/nmea.htm)
"""
from functools import reduce  # reduce is not a builtin in Python 3

from pyparsing import *

lead = "$"
code = Word(alphas.upper(),exact=5)
end = "*"
COMMA = Suppress(',')
cksum = Word(hexnums,exact=2).setParseAction(lambda t:int(t[0],16))

# define basic data value forms, and attach conversion actions
word = Word(alphanums)
N,S,E,W = map(Keyword,"NSEW")
integer = Regex(r"-?\d+").setParseAction(lambda t:int(t[0]))
real = Regex(r"-?\d+\.\d*").setParseAction(lambda t:float(t[0]))
timestamp = Regex(r"\d{2}\d{2}\d{2}\.\d+")
timestamp.setParseAction(lambda t: t[0][:2]+':'+t[0][2:4]+':'+t[0][4:])
def lonlatConversion(t):
    t["deg"] = int(t.deg)
    t["min"] = float(t.min)
    t["value"] = ((t.deg + t.min/60.0) 
                    * {'N':1,'S':-1,'':1}[t.ns] 
                    * {'E':1,'W':-1,'':1}[t.ew])
lat = Regex(r"(?P<deg>\d{2})(?P<min>\d{2}\.\d+),(?P<ns>[NS])").setParseAction(lonlatConversion)
lon = Regex(r"(?P<deg>\d{3})(?P<min>\d{2}\.\d+),(?P<ew>[EW])").setParseAction(lonlatConversion)

# define expression for a complete data record
value = timestamp | Group(lon) | Group(lat) | real | integer | N | S | E | W | word
item = lead + code("code") + COMMA + delimitedList(Optional(value,None))("datafields") + end + cksum("cksum")


def parseGGA(tokens):
    keys = "time lat lon qual numsats horiz_dilut alt _ geoid_ht _ last_update_secs stnid".split()
    for k,v in zip(keys, tokens.datafields):
        if k != '_':
            tokens[k] = v
    #~ print tokens.dump()

def parseGSA(tokens):
    keys = "auto_manual _3dfix prn prn prn prn prn prn prn prn prn prn prn prn pdop hdop vdop".split()
    tokens["prn"] = []
    for k,v in zip(keys, tokens.datafields):
        if k != 'prn':
            tokens[k] = v
        else:
            if v is not None:
                tokens[k].append(v)
    #~ print tokens.dump()

def parseRMC(tokens):
    keys = "time active_void lat lon speed track_angle date mag_var _ signal_integrity".split()
    for k,v in zip(keys, tokens.datafields):
        if k != '_':
            if k == 'date' and v is not None:
                v = "%06d" % v
                tokens[k] = '20%s/%s/%s' % (v[4:],v[2:4],v[:2])
            else:
                tokens[k] = v
    #~ print tokens.dump()


# process sample data
data = open("gpsstream.txt").read().expandtabs()

count = 0
for i,s,e in item.scanString(data):
    # use checksum to validate input 
    linebody = data[s+1:e-3]
    checksum = reduce(lambda a,b:a^b, map(ord, linebody))
    if i.cksum != checksum:
        continue
    count += 1

    # parse out specific data fields, depending on code field
    fn = {'GPGGA' : parseGGA, 
          'GPGSA' : parseGSA,
          'GPRMC' : parseRMC,}[i.code]
    fn(i)

    # print out time/position/speed values
    if i.code == 'GPRMC':
        print("%s %8.3f %8.3f %4d" % (i.time, i.lat.value, i.lon.value, i.speed or 0))


print(count)

The $GPRMC records in your pastebin don't seem to quite match with the ones you included in your post, but you should be able to adjust this example as necessary.

Answer 7

Answered by jbdupont

I suggest a small fix in your code: if it is used to parse data from the previous century, the date appears to be in the future (for instance, 2094 instead of 1994).

My fix is not fully accurate, but I take the position that no GPS data existed before the 1970s.

In the parse function for RMC sentences (parseRMC), just replace the format line with:

p = int(v[4:])  # two-digit year taken from the ddmmyy string
if p > 70:
    # two-digit years above 70 are taken to be 19xx
    tokens[k] = '19%s/%s/%s' % (v[4:],v[2:4],v[:2])
else:
    tokens[k] = '20%s/%s/%s' % (v[4:],v[2:4],v[:2])

This looks at the two yy digits of the year and assumes that any value above 70 belongs to the previous century. A better approach would be to compare against today's date and assume that any date that would otherwise fall in the future is in fact from the past century.
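
The comparison against today's date could be sketched like this. It is an illustration only: the function name expand_ddmmyy and its today parameter are hypothetical, and it compares years only, so a date later in the current year is not treated as being in the future.

```python
from datetime import date


def expand_ddmmyy(v, today=None):
    """Turn a 'ddmmyy' string into 'yyyy/mm/dd' without choosing a future year."""
    today = today or date.today()
    yy = int(v[4:])
    century = today.year - today.year % 100
    year = century + yy
    if year > today.year:
        year -= 100  # a year in the future must belong to the previous century
    return '%04d/%s/%s' % (year, v[2:4], v[:2])


print(expand_ddmmyy('230394'))
```

Inside the RMC parse function, the format line would then become tokens[k] = expand_ddmmyy(v).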

Thanks for all the pieces of code you provided above... I had some fun with this.
