Python: 'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)?

Notice: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/19270165/

Time: 2020-08-19 13:18:21  Source: igfitidea

'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)?

Tags: python, python-2.7, unicode, encoding

Asked by dhana

Here I am encoding the data

post = """
='Brand New News Fr0m The Timber Industry!!'=

========Latest Profile==========
Energy & Asset Technology, Inc. (EGTY)
Current Price 
.15 ================================ Recognize this undiscovered gem which is poised to jump!! Please read the following Announcement in its Entierty and Consider the Possibilities? Watch this One to Trad,e! Because, EGTY has secured the global rights to market genetically enhanced fast growing, hard-wood trees! EGTY trading volume is beginning to surge with landslide Announcement. The value of this Stoc,k appears poised for growth! This one will not remain on the ground floor for long. KEEP READING!!!!!!!!!!!!!!! =============== "BREAKING NEWS" =============== -Energy and Asset Technology, Inc. (EGTY) owns a global license to market the genetically enhanced Global Cedar growth trees, with plans to REVOLUTIONIZE the forest-timber industry. These newly enhanced Globa| Cedar trees require only 9-12 years of growth before they can be harvested for lumber, whereas worldwide growth time for lumber is 30-50 years. Other than growing at an astonishing rate, the Global Cedar has a number of other benefits. Its natural elements make it resistant to termites, and the lack of oils and sap found in the wood make it resistant to forest fire, ensuring higher returns on investments. T he wood is very lightweight and strong, lighter than Poplar and over twice as strong as Balsa, which makes it great for construction. It also has the unique ability to regrow itself from the stump, minimizing the land and time to replant and develop new root systems. Based on current resources and agreements, EGTY projects revenues of 0 Million with an approximate profit margin of 40% for each 9-year cycle. With anticipated growth, EGTY is expected to challenge Deltic Timber Corp. during its initial 9-year cycle. Deltic Timber Corp. currently trades at over .00 a share with about 3 Million in revenues. As the reputation and demand for the Global Cedar tree continues to grow around the world EGTY believes additional multi-million dollar agreements will be forthcoming. 
The Global Cedar nursery has produced about 100,000 infant plants and is developing a production growth target of 250,000 infant plants per month. Energy and Asset Technology is currently in negotiations with land and business owners in New Zealand, Greece and Malaysia regarding the purchase of their popular and profitable fast growing infant tree plants. Inquiries from the governments of Brazil and Ecuador are also being evaluated. Conclusion: The examples above show the Awesome, Earning Potential of little known Companies That Explode onto Investor?s Radar Screens. This s-t0ck will not be a Secret for long. Then You May Feel the Desire to Act Right Now! And Please Watch This One Trade!! GO EGTY! All statements made are our express opinion only and should be treated as such. We may own, take position and sell any securities mentioned at any time. Any statements that express or involve discussions with respect to predictions, goals, expectations, beliefs, plans, projections, object'ives, assumptions or future events or perfo'rmance are not statements of historical fact and may be "forward,|ooking statements." forward,|ooking statements are based on expectations, estimates and projections at the time the statements are made that involve a number of risks and uncertainties which could cause actual results or events to differ materially from those presently anticipated. This newsletter was paid ,000 from third party (IR Marketing). Forward,|ooking statements in this action may be identified through the use of words such as: "pr0jects", "f0resee", "expects". in compliance with Se'ction 17. {b), we disclose the holding of EGTY shares prior to the publication of this report. Be aware of an inherent conflict of interest resulting from such holdings due to our intent to profit from the liquidation of these shares. Shar,es may be sold at any time, even after positive statements have been made regarding the above company. 
Since we own shares, there is an inherent conflict of interest in our statements and opinions. Readers of this publication are cautioned not to place undue reliance on forward,|ooking statements, which are based on certain assumptions and expectations involving various risks and uncertainties that could cause results to differ materially from those set forth in the forward- looking statements. This is not solicitation to buy or sell st-0cks, this text is or informational purpose only and you should seek professional advice from registered financial advisor before you do anything related with buying or selling st0ck-s, penny st'0cks are very high risk and you can lose your entire inves,tment.
"""

In [147]: post.encode('utf-8')

and I am getting the output

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)

Accepted answer by Don Question

Unicode is a table that tries to encompass all known letters, characters and signs, often also called glyphs. At the time of writing it holds somewhat over 110,000 meaningful signs. So the DECODED state is a (code) point in this table. But because a byte can hold no more than 8 bits = 256 states, you have to ENCODE the Unicode representation into a byte stream. The most widely used encoding is UTF-8, which succeeds the older ASCII encoding. UTF-8 lets you ENCODE each Unicode glyph in one to four bytes.

    [decode]     [encode]
ASCII ---> UNICODE ---> UTF-8
1 Glyph                 1 Glyph
  =        1 Glyph        =
1 Byte                  1-4 Bytes
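The one-to-four-byte range can be checked directly. The following is only a small sketch in Python 3 syntax (where `str` holds code points and `bytes` holds encoded data); the sample characters and the helper name `utf8_length` are assumptions chosen for illustration, not part of the original answer.

```python
# Python 3 syntax. The sample characters are hypothetical, picked so that
# each one needs a different number of bytes in UTF-8.
samples = ["A", "\u00ef", "\u20ac", "\U0001F600"]

def utf8_length(ch):
    """Return how many bytes UTF-8 needs to encode one character."""
    return len(ch.encode("utf-8"))

for ch in samples:
    # ord() gives the code point, i.e. the position in the Unicode table.
    print("U+%04X -> %d byte(s)" % (ord(ch), utf8_length(ch)))
```

Only the first sample is plain ASCII; the others fall outside the 128 values that the ASCII codec accepts.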

So decoding always goes towards Unicode, and encoding always starts from Unicode. If you want to convert from one encoding to another, you have to go through Unicode:

   unicode_str = mystring.decode('ascii')
   utf8_str = unicode_str.encode('utf-8')


(not the best example, because ASCII ALWAYS fits into UTF-8)

So if you want to decode your post variable, you have to know which encoding the string is in. In Python 2.x the default encoding is normally ASCII. In Python 3.x it should be UTF-8. You can check the default with:

import sys
# Typically 'ascii' on Python 2 and 'utf-8' on Python 3
print(sys.getdefaultencoding())
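Here is a minimal sketch of why the source encoding matters, written in Python 3 syntax. In Python 2.7 the same methods live on `str` and `unicode`, and calling `encode` on a byte `str` first triggers an implicit decode with the default ASCII codec, which is what raises the error in the question. The sample bytes and the `transcode` helper are assumptions made for illustration only.

```python
# Python 3 syntax. The byte string is a hypothetical sample, not the
# question's data: "naive" with an i-diaeresis, stored as Latin-1 bytes.
raw = b"na\xefve"

def transcode(data, source_encoding, target_encoding="utf-8"):
    """Decode bytes with the known source encoding, then re-encode."""
    text = data.decode(source_encoding)    # bytes -> Unicode text
    return text.encode(target_encoding)    # Unicode text -> bytes

error = None
try:
    transcode(raw, "ascii")                # wrong guess: 0xef is not ASCII
except UnicodeDecodeError as exc:
    error = exc                            # keep it for inspection
    print(exc)

utf8_bytes = transcode(raw, "latin-1")     # correct source encoding
print(repr(utf8_bytes))
```

With the wrong codec the decode step fails on byte 0xef, just as in the question; with the right one the same bytes convert cleanly to UTF-8.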

If your post variable is not defined in your source code but read from an external byte stream, you MUST KNOW the encoding or you will be out of luck.

Answer by Wooble

First, tell Python what encoding you're using by making this the second line of your file (or the first, if you don't use a shebang):

# coding=utf-8

(see PEP 263)

Then, instead of using a byte string, always use unicode literals for textual content:

post = u"""
='Brand New News Fr0m The Timber Industry!!'=
etc. etc. etc."""
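As a rough illustration of the difference a text literal makes, the sketch below uses Python 3 syntax, where the `u` prefix is accepted but optional because every string literal is already Unicode text. The sample string is hypothetical and is not taken from the question's data.

```python
# Python 3 syntax. The sample text is a hypothetical stand-in containing
# one non-ASCII character (a right single quotation mark, U+2019).
text = u"Investor\u2019s Radar"

# Encoding text goes straight from Unicode to UTF-8 bytes; there is no
# hidden ASCII decode step that could fail.
data = text.encode("utf-8")

print(repr(data))
print(data.decode("utf-8") == text)   # round trip restores the text
```

Because `text` is already Unicode, `encode` has nothing to decode first, which is the step that failed for the byte string in the question.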