'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)?
Original question: http://stackoverflow.com/questions/19270165/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by dhana
Here I am encoding the data:
post = """
='Brand New News Fr0m The Timber Industry!!'=
========Latest Profile==========
Energy & Asset Technology, Inc. (EGTY)
Current Price $0.15
================================
Recognize this undiscovered gem which is poised to jump!!
Please read the following Announcement in its Entierty and
Consider the Possibilities?
Watch this One to Trad,e!
Because, EGTY has secured the global rights to market
genetically enhanced fast growing, hard-wood trees!
EGTY trading volume is beginning to surge with landslide Announcement.
The value of this Stoc,k appears poised for growth! This one will not
remain on the ground floor for long.
KEEP READING!!!!!!!!!!!!!!!
===============
"BREAKING NEWS"
===============
-Energy and Asset Technology, Inc. (EGTY) owns a global license to market
the genetically enhanced Global Cedar growth trees, with plans to
REVOLUTIONIZE the forest-timber industry.
These newly enhanced Globa| Cedar trees require only 9-12 years of growth
before they can be harvested for lumber, whereas worldwide growth time for
lumber is 30-50 years.
Other than growing at an astonishing rate, the Global Cedar has a number
of other benefits. Its natural elements make it resistant to termites, and
the lack of oils and sap found in the wood make it resistant to forest fire,
ensuring higher returns on investments.
T
he wood is very lightweight and strong, lighter than Poplar and over twice
as strong as Balsa, which makes it great for construction. It also has
the unique ability to regrow itself from the stump, minimizing the land and
time to replant and develop new root systems.
Based on current resources and agreements, EGTY projects revenues of 0
Million with an approximate profit margin of 40% for each 9-year cycle. With
anticipated growth, EGTY is expected to challenge Deltic Timber Corp. during
its initial 9-year cycle.
Deltic Timber Corp. currently trades at over .00 a share with about 3
Million in revenues. As the reputation and demand for the Global Cedar tree
continues to grow around the world EGTY believes additional multi-million
dollar agreements will be forthcoming. The Global Cedar nursery has produced
about 100,000 infant plants and is developing a production growth target of
250,000 infant plants per month.
Energy and Asset Technology is currently in negotiations with land and business
owners in New Zealand, Greece and Malaysia regarding the purchase of their popular
and profitable fast growing infant tree plants. Inquiries from the governments of
Brazil and Ecuador are also being evaluated.
Conclusion:
The examples above show the Awesome, Earning Potential of little
known Companies That Explode onto Investor?s Radar Screens.
This s-t0ck will not be a Secret for long. Then You May Feel the Desire to Act Right
Now! And Please Watch This One Trade!!
GO EGTY!
All statements made are our express opinion only and should be treated as such.
We may own, take position and sell any securities mentioned at any time. Any
statements that express or involve discussions with respect to predictions,
goals, expectations, beliefs, plans, projections, object'ives, assumptions or
future events or perfo'rmance are not
statements of historical fact and may be
"forward,|ooking statements." forward,|ooking statements are based on expectations,
estimates and projections at the time the statements are made that involve a number
of risks and uncertainties which could cause actual results or events to differ
materially from those presently anticipated. This newsletter was paid ,000 from
third party (IR Marketing). Forward,|ooking statements in this action may be identified
through the use of words such as: "pr0jects", "f0resee", "expects". in compliance with
Se'ction 17. {b), we disclose the holding of EGTY shares prior to the publication of
this report. Be aware of an inherent conflict of interest resulting from such holdings
due to our intent to profit from the liquidation of these shares. Shar,es may be sold
at any time, even after positive statements have been made regarding the above company.
Since we own shares, there is an inherent conflict of interest in our statements and
opinions. Readers of this publication are cautioned not
to place undue reliance on
forward,|ooking statements, which are based on certain assumptions and expectations
involving various risks and uncertainties that could cause results to differ materially
from those set forth in the forward- looking statements. This is not solicitation to
buy or sell st-0cks, this text is or informational purpose only and you should seek
professional advice from registered financial advisor before you do anything related
with buying or selling st0ck-s, penny st'0cks are very high risk and you can lose your
entire inves,tment.
"""
In [147]: post.encode('utf-8')
and I am getting the output:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)
Accepted answer by Don Question
Unicode is a table that tries to encompass all known letters, characters and signs, often loosely called glyphs. At the moment it holds somewhat more than 110,000 meaningful signs. The DECODED state of a character is therefore a code point in this table. But because a byte cannot hold more than 8 bits = 256 states, you have to ENCODE the Unicode representation into a byte stream. The most widely used encoding is UTF-8, which succeeds the older ASCII encoding. UTF-8 encodes each Unicode code point in one to four bytes.
So encoding or decoding always goes from Unicode or towards Unicode. If you want to transform one encoding into another, you have to go through Unicode:
        [decode]        [encode]
ASCII    --->   UNICODE   --->   UTF-8
1 Glyph                          1 Glyph
   =            1 Glyph             =
1 Byte                           1-4 Bytes
unicode_str = mystring.decode('ascii')
utf8_str = unicode_str.encode('utf-8')
(Not the best example, because ASCII always fits into UTF-8.)
So if you want to decode your post variable, you have to know which encoding the string is in. In Python 2.x it is normally ASCII encoded. In Python 3.x it should be UTF-8. The interpreter's default encoding can be checked like this:
import sys
print sys.getdefaultencoding()
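As a rough sketch of the point above (not code from the original answers): once the source encoding is known, decoding with it and re-encoding as UTF-8 works, while decoding the same bytes as ASCII fails with the kind of error shown in the question. In Python 2, calling .encode() on a byte string first decodes it implicitly with the default codec, which is why the message says "can't decode". The sample bytes and the helper name to_utf8 are made up for illustration, and the snippet is meant to run on both Python 2 and Python 3.

```python
# Hypothetical sample: UTF-8 bytes containing one non-ASCII character
# (0xEF 0xBC 0x87 is the UTF-8 encoding of U+FF07).
raw = b"Investor\xef\xbc\x87s Radar Screens"


def to_utf8(data, source_encoding):
    """Decode bytes with the known source encoding, then encode as UTF-8."""
    text = data.decode(source_encoding)  # bytes -> Unicode text
    return text.encode("utf-8")          # Unicode text -> UTF-8 bytes


# Decoding as ASCII fails on the first byte >= 0x80.
try:
    to_utf8(raw, "ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    failing_byte = raw[exc.start:exc.start + 1]  # the offending byte

# Decoding with the correct codec succeeds.
utf8_bytes = to_utf8(raw, "utf-8")

# Going through Unicode also converts between two different encodings.
latin1_as_utf8 = to_utf8(b"caf\xe9", "latin-1")
```

The design choice here is simply to make the decode step explicit, so the codec is chosen by the programmer rather than by the interpreter's default.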
If your post variable is not defined in your source code but read from an external byte stream, you MUST KNOW the encoding, or you will be out of luck.
Answer by Wooble
First, tell Python what encoding you're using by making this the second line of your file (or the first, if you don't use a shebang):
# coding=utf-8
(see PEP 263)
Then, instead of using a byte string, always use unicode literals for textual content:
post = u"""
='Brand New News Fr0m The Timber Industry!!'=
etc. etc. etc."""
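Put together, the two steps might look like the sketch below. The shortened text and the curly apostrophe in it are assumptions for illustration (the full post is not reproduced), and the file is assumed to be saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# Source-encoding declaration (PEP 263); it has to be on line 1 or 2 of the file.

# A unicode literal: the text is already decoded, so .encode() has nothing
# to decode implicitly. The apostrophe in "Investor’s" is an assumed
# non-ASCII character (U+2019), used here only for illustration.
post = u"""
='Brand New News Fr0m The Timber Industry!!'=
known Companies That Explode onto Investor’s Radar Screens.
"""

encoded = post.encode('utf-8')  # Unicode text -> UTF-8 bytes

# U+2019 takes three bytes in UTF-8, so the byte string is longer than the text.
extra_bytes = len(encoded) - len(post)
```

On Python 3 the u prefix is optional, since every string literal is already Unicode text.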