How do I convert Unicode characters to floats in Python?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1263796/

Date: 2020-11-03 21:48:24  Source: igfitidea

Tags: python, unicode, floating-point

Asked by Paul

I am parsing a webpage which has Unicode representations of fractions. I would like to be able to take those strings directly and convert them to floats. For example:

"⅕" would become 0.2

Any suggestions of how to do this in Python?

Answered by Karl Voigtland

You want to use the unicodedata module:

import unicodedata
unicodedata.numeric(u'⅕')

This will print:

0.20000000000000001

If the character does not have a numeric value, then unicodedata.numeric(unichr[, default]) will return default, or, if default is not given, will raise ValueError.
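
The default and ValueError behaviour described above can be sketched as follows. This is a minimal illustration that assumes Python 3, where plain string literals are already Unicode; the letter "x" is just an arbitrary character with no numeric value.

```python
import unicodedata

# A vulgar fraction character has a numeric value.
print(unicodedata.numeric("⅕"))        # 0.2

# A letter has none, so the supplied default is returned instead.
print(unicodedata.numeric("x", None))  # None

# Without a default, the same lookup raises ValueError.
try:
    unicodedata.numeric("x")
except ValueError as exc:
    print("no numeric value:", exc)
```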

Answered by Jason Lewallen

Those Unicode representations of floats are called vulgar fractions.

You can convert them to floats using unicodedata.numeric(char).

However, numeric(char) won't work on something like 3¼. That takes a bit more effort:

from unicodedata import numeric

samples = ["3¼", "19¼", "3 ¼", "10"]

for i in samples:
    if len(i) == 1:
        v = numeric(i)
    elif i[-1].isdigit():
        # normal number, ending in [0-9]
        v = float(i)
    else:
        # Assume the last character is a vulgar fraction
        v = float(i[:-1]) + numeric(i[-1])
    print(i, v)

Output:

3¼ 3.25
19¼ 19.25
3 ¼ 3.25
10 10.0

You might also be interested in isolating these vulgar fractions from broader user input using regular expressions. You can do so using ranges of their Unicode code points:

/[\u2150-\u215E\u00BC-\u00BE]/g

Sample: https://regexr.com/3p8nd
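
The expression above is written in JavaScript-style notation. As a rough sketch, the same character ranges might be used from Python's re module as shown here; the name VULGAR_FRACTION and the sample sentence are made up for illustration and are not part of the original answer.

```python
import re
import unicodedata

# Same character ranges as the expression above:
# U+2150-U+215E and U+00BC-U+00BE.
VULGAR_FRACTION = re.compile("[\u2150-\u215E\u00BC-\u00BE]")

# Hypothetical user input containing two vulgar fractions.
text = "Add 1½ cups of flour and ¾ cup of sugar"

found = VULGAR_FRACTION.findall(text)
values = [unicodedata.numeric(ch) for ch in found]

print(found)   # ['½', '¾']
print(values)  # [0.5, 0.75]
```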

Answered by Greg Hewgill

Since there are only a fixed number of fractions defined in Unicode, a dictionary seems appropriate:

Fractions = {
    u'¼': 0.25,
    u'½': 0.5,
    u'¾': 0.75,
    u'⅕': 0.2,
    # add any other fractions here
}
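
Such a table does not have to be typed by hand. The sketch below, which assumes Python 3, scans the Latin-1 Supplement and Number Forms blocks and keeps every character whose Unicode name starts with "VULGAR FRACTION". The name FRACTIONS is illustrative, and the exact set of characters found depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Build the lookup table from the Unicode database instead of typing it.
# Blocks scanned: Latin-1 Supplement (U+00A0-U+00FF) and
# Number Forms (U+2150-U+218F).
FRACTIONS = {}
for code_point in list(range(0x00A0, 0x0100)) + list(range(0x2150, 0x2190)):
    char = chr(code_point)
    # name() returns the default "" for unassigned code points.
    if unicodedata.name(char, "").startswith("VULGAR FRACTION"):
        FRACTIONS[char] = unicodedata.numeric(char)

print(FRACTIONS["¼"])  # 0.25
print(FRACTIONS["⅕"])  # 0.2
```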

Update: the unicodedata module is a much better solution.

Answered by akent

Maybe you could decompose the fraction using the "unicodedata" module and then look for the FRACTION SLASH character; after that, it's just a matter of simple division.

For example:

>>> import unicodedata
>>> unicodedata.lookup('VULGAR FRACTION ONE QUARTER')
u'\xbc'
>>> unicodedata.decomposition(unicodedata.lookup('VULGAR FRACTION ONE QUARTER'))
'<fraction> 0031 2044 0034'
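
The division step is not shown in the session above. One possible way to finish it, assuming Python 3, is sketched here; fraction_from_decomposition is a hypothetical helper name, not a function of the unicodedata module.

```python
import unicodedata

def fraction_from_decomposition(char):
    """Divide numerator by denominator using the character's decomposition."""
    # For U+00BC the decomposition is '<fraction> 0031 2044 0034'.
    parts = unicodedata.decomposition(char).split()
    if not parts or parts[0] != "<fraction>":
        raise ValueError("not a vulgar fraction: %r" % char)
    # Turn the hex code points back into text such as '1⁄4'.
    text = "".join(chr(int(code, 16)) for code in parts[1:])
    # Split on U+2044 FRACTION SLASH and do the division.
    numerator, denominator = text.split("\u2044")
    return int(numerator) / int(denominator)

print(fraction_from_decomposition("\u00bc"))  # 0.25
```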

Update: I'll leave this answer here for reference, but using unicodedata.numeric() as per Karl's answer is a much better idea.

Answered by riza

In Python 3.1, you don't need the 'u', and it will produce 0.2 instead of 0.20000000000000001.

>>> unicodedata.numeric('⅕')
0.2

Answered by Tex

I'm stating the obvious here, but it's very simple to extend this for cases when people write "1¾" meaning "1.75", so I'm just going to share it here for quick reference:

import unicodedata

# Assumes that the vulgar fraction is always the last character. You are always going to see
# things like "3¼" or "19¼", whereas something like "3¼5" does not have a clear interpretation.

def convertVulgarFractions(vulgarFraction):

    # A single character is looked up directly.
    if len(vulgarFraction) == 1:
        return unicodedata.numeric(vulgarFraction)

    # Everything before the last character must be plain digits.
    if len(vulgarFraction) > 1 and not vulgarFraction[:-1].isdigit():
        raise ArithmeticError("The format needs to be numbers ending with a vulgar fraction. The number inserted was " +
                              str(vulgarFraction))

    if vulgarFraction[-1].isdigit():
        # Ordinary number with no fraction character.
        return float(vulgarFraction)
    else:
        # Whole part plus the value of the trailing vulgar fraction.
        return float(vulgarFraction[:-1]) + unicodedata.numeric(vulgarFraction[-1])

Answered by gerrit

Although not exactly what was asked, perhaps someone wants it converted to a fraction rather than to a float. After all, a fraction is what it really represents.

unicodedata.normalize("NFKC", "⅕") results in "1⁄5". This is not (presently) understood by fractions.Fraction, which expects a fraction written with / rather than ⁄ (FRACTION SLASH, U+2044). However, that is easy to replace:

In [313]: def unifrac_to_frac(s):
     ...:     return fractions.Fraction(unicodedata.normalize("NFKC", s).replace("⁄", "/"))
     ...: 

In [315]: unifrac_to_frac("⅕")
Out[315]: Fraction(1, 5)

In [316]: unifrac_to_frac("½")
Out[316]: Fraction(1, 2)

In [317]: unifrac_to_frac("↉")
Out[317]: Fraction(0, 1)
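
The same idea might be extended to mixed numbers such as "3¼". The following is only a sketch under the assumption that any vulgar fraction is the last character of the string; mixed_to_fraction is a hypothetical name, not part of the fractions or unicodedata modules.

```python
import fractions
import unicodedata

def mixed_to_fraction(s):
    """Convert text such as '3¼', '3 ¼', '¼' or '10' to a Fraction."""
    s = s.strip()
    if s and not s[-1].isdigit():
        # Assumption: the last character is a vulgar fraction.
        whole, last = s[:-1].strip(), s[-1]
        # NFKC turns '¼' into '1⁄4'; swap U+2044 for an ASCII slash.
        frac = fractions.Fraction(
            unicodedata.normalize("NFKC", last).replace("\u2044", "/"))
        return (fractions.Fraction(whole) if whole else 0) + frac
    # Plain number with no fraction character.
    return fractions.Fraction(s)

print(mixed_to_fraction("3 ¼"))  # 13/4
```

Anything that is neither a number nor a number followed by a vulgar fraction makes fractions.Fraction raise ValueError, so invalid input is not silently accepted.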