Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2452861/

Time: 2020-11-04 00:41:03  Source: igfitidea

Python library for converting plain text (ASCII) into GSM 7-bit character set?

python encoding sms

Asked by M K Saravanan

Is there a Python library for encoding ASCII data to the 7-bit GSM character set (for sending SMS)?

Answer by John La Rooy

There is now :)

Thanks to Chad for pointing out that this wasn't quite right.

Python 2 version

# -*- coding: utf8 -*-
# GSM 03.38 default alphabet; the string index is the septet value
gsm = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
       u"¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
# Extension table (sent after ESC, 0x1b); "`" marks unused positions
ext = (u"````````````````````^```````````````````{}`````\\````````````[~]`"
       u"|````````````````````````````````````€``````````````````````````")

def gsm_encode(plaintext):
    res = ""
    for c in plaintext:
        idx = gsm.find(c)
        if idx != -1:
            res += chr(idx)
            continue
        idx = ext.find(c)
        if idx != -1:
            res += chr(27) + chr(idx)
    return res.encode('hex')

print gsm_encode(u"Hello World")

The output is hex. You can skip that step if you want the binary stream.

Python 3 version

# -*- coding: utf8 -*-
import binascii
# GSM 03.38 default alphabet; the string index is the septet value
gsm = ("@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
       "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
# Extension table (sent after ESC, 0x1b); "`" marks unused positions
ext = ("````````````````````^```````````````````{}`````\\````````````[~]`"
       "|````````````````````````````````````€``````````````````````````")

def gsm_encode(plaintext):
    res = ""
    for c in plaintext:
        idx = gsm.find(c)
        if idx != -1:
            res += chr(idx)
            continue
        idx = ext.find(c)
        if idx != -1:
            res += chr(27) + chr(idx)
    return binascii.b2a_hex(res.encode('utf-8'))

print(gsm_encode("Hello World"))
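
Both versions above emit one septet per byte. On the wire, GSM 7-bit text is usually packed so that eight characters share seven octets. A minimal Python 3 sketch of that packing step follows; `pack_septets` is a hypothetical helper name (not from the answer), and it assumes the input is already a sequence of septet values in the range 0..127, filled least significant bit first:

```python
def pack_septets(septets):
    """Pack 7-bit values into octets, least significant bits first."""
    packed = bytearray()
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError("not a 7-bit value: %r" % (s,))
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:
            packed.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        # Flush the remaining bits; the unused high bits stay zero.
        packed.append(acc & 0xFF)
    return bytes(packed)


# Lowercase ASCII letters have the same code points in the GSM alphabet.
septets = [ord(c) for c in "hellohello"]
print(pack_septets(septets).hex().upper())  # E8329BFD4697D9EC37
```

The input here is plain lowercase ASCII, whose code points coincide with the GSM ones; for other text, map the characters to septets first.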

Answer by Jimmy Ilenloa

I got tips from gnibbler's answer. Here is a script I put together after looking at an online converter (http://smstools3.kekekasvi.com/topic.php?id=288); it works correctly for me, for both encoding and decoding.

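#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 script: relies on str.encode('hex') and str.decode('utf-8').

# GSM 03.38 default alphabet; the string index is the septet value
gsm = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
   u"¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
# Extension table (sent after ESC, 0x1b); "`" marks unused positions
ext = (u"````````````````````^```````````````````{}`````\\````````````[~]`"
   u"|````````````````````````````````````€``````````````````````````")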

def get_encode(currentByte, index, bitRightCount, position, nextPosition, leftShiftCount, bytesLength, bytes):
    if index < 8:
        byte = currentByte >> bitRightCount
        if nextPosition < bytesLength:
            idx2 = bytes[nextPosition]
            byte = byte | ((idx2) << leftShiftCount)
            byte = byte & 0x000000FF
        else:
            byte = byte & 0x000000FF
        return chr(byte).encode('hex').upper()
    return ''

def getBytes(plaintext):
    if type(plaintext) != str:
         plaintext = str(plaintext)
    bytes = []
    for c in plaintext.decode('utf-8'):
        idx = gsm.find(c)
        if idx != -1:
            bytes.append(idx)
        else:
            idx = ext.find(c)
            if idx != -1:
                bytes.append(27)
                bytes.append(idx)
    return bytes

def gsm_encode(plaintext):
    res = ""
    f = -1
    t = 0
    bytes = getBytes(plaintext)
    bytesLength = len(bytes)
    for b in bytes:
        f = f+1
        t = (f%8)+1
        res += get_encode(b, t, t-1, f, f+1, 8-t, bytesLength, bytes)

    return res


def chunks(l, n):
    if n < 1:
        n = 1
    return [l[i:i + n] for i in range(0, len(l), n)]

def gsm_decode(codedtext):
    hexparts = chunks(codedtext, 2)
    number   = 0
    bitcount = 0
    output   = ''
    found_external = False
    for byte in hexparts:
        byte = int(byte, 16)  # hex pair -> octet value
        # add data on to the end
        number = number + (byte << bitcount)
        # increase the counter
        bitcount = bitcount + 1
        # output the first 7 bits
        if number % 128 == 27:
             '''skip'''
             found_external = True
        else:
            if found_external == True:                
                 character = ext[number % 128]
                 found_external = False
            else:
                 character = gsm[number % 128]
            output = output + character

        # then throw them away
        number = number >> 7
        # every 7th letter you have an extra one in the buffer
        if bitcount == 7:
            if number % 128 == 27:
                '''skip'''
                found_external = True
            else:
                if found_external == True:                
                    character = ext[number % 128]
                    found_external = False
                else:
                    character = gsm[number % 128]
                output = output + character

            bitcount = 0
            number = 0
    return output

Answer by sanjeev

I faced a similar issue recently: we were receiving GSM 7-bit encoded text messages from the aggregator, mostly for the Verizon carrier and containing Spanish characters, and we were not able to decode them successfully. Here is the decoder I wrote with the help of the other answers here. It is for Python 2.7.x.

def gsm7bitdecode(text):
    # GSM 03.38 default alphabet; the string index is the septet value
    gsm = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
           u"¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
    # Extension table (sent after ESC, 0x1b); "`" marks unused positions
    ext = (u"````````````````````^```````````````````{}`````\\````````````[~]`"
           u"|````````````````````````````````````€``````````````````````````")

    text = ''.join(["{0:08b}".format(int(text[i:i+2], 16)) for i in range(0, len(text), 2)][::-1])

    text = [(int(text[::-1][i:i+7][::-1], 2)) for i in range(0, len(text), 7)]
    text = text[:len(text)-1] if text[-1] == 0 else text
    text = iter(text)

    result = []
    for i in text:
        if i == 27:
            i = next(text)
            result.append(ext[i])
        else:
            result.append(gsm[i])

    return "".join(result).rstrip()

Answer by Pratik Deoghare

I could not find any library, but I don't think this needs one. It's fairly easy to do.

Here is Jon Skeet himself on the same topic.

Example:

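s = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def ascii_to_gsm(ch):
    # 'A'..'Z' occupy codes 65..90 in the GSM 7-bit alphabet, same as ASCII
    return bin(65 + s.index(ch))

print(ascii_to_gsm('A'))
print('--')

# Strip the '0b' prefix; every code here is exactly 7 bits wide
binary_stream = ''.join([ascii_to_gsm(ch)[2:] for ch in s])
print(binary_stream)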

You can also use a dict to store the mapping between ASCII and the GSM 7-bit character set.