Python: 'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)?

Question

Asked by dhana

Here I am encoding the data:

post = """
='Brand New News Fr0m The Timber Industry!!'=

========Latest Profile==========
Energy & Asset Technology, Inc. (EGTY)
Current Price $0.15
================================

Recognize this undiscovered gem which is poised to jump!! 

Please read the following Announcement in its Entierty and 
Consider the Possibilities?
Watch this One to Trad,e!

Because, EGTY has secured the global rights to market 
genetically enhanced fast growing, hard-wood trees!

EGTY trading volume is beginning to surge with landslide Announcement. 
The value of this Stoc,k appears poised for growth! This one will not 
remain on the ground floor for long.

KEEP READING!!!!!!!!!!!!!!!

===============
"BREAKING NEWS"
===============

-Energy and Asset Technology, Inc. (EGTY) owns a global license to market
the genetically enhanced Global Cedar growth trees, with plans to 
REVOLUTIONIZE the forest-timber industry. 

These newly enhanced Globa| Cedar trees require only 9-12 years of growth 
before they can be harvested for lumber, whereas worldwide growth time for 
lumber is 30-50 years. 

Other than growing at an astonishing rate, the Global Cedar has a number 
of other benefits. Its natural elements make it resistant to termites, and 
the lack of oils and sap found in the wood make it resistant to forest fire, 
ensuring higher returns on investments.
T
he wood is very lightweight and strong, lighter than Poplar and over twice
as strong as Balsa, which makes it great for construction. It also has 
the unique ability to regrow itself from the stump, minimizing the land and
time to replant and develop new root systems.

Based on current resources and agreements, EGTY projects revenues of 0 
Million with an approximate profit margin of 40% for each 9-year cycle. With 
anticipated growth, EGTY is expected to challenge Deltic Timber Corp. during 
its initial 9-year cycle.

Deltic Timber Corp. currently trades at over .00 a share with about 3 
Million in revenues. As the reputation and demand for the Global Cedar tree 
continues to grow around the world EGTY believes additional multi-million 
dollar agreements will be forthcoming. The Global Cedar nursery has produced 
about 100,000 infant plants and is developing a production growth target of 
250,000 infant plants per month.

Energy and Asset Technology is currently in negotiations with land and business 
owners in New Zealand, Greece and Malaysia regarding the purchase of their popular 
and profitable fast growing infant tree plants. Inquiries from the governments of 
Brazil and Ecuador are also being evaluated.

Conclusion:

The examples above show the Awesome, Earning Potential of little
known Companies That Explode onto Investor?s Radar Screens. 
This s-t0ck will not be a Secret for long. Then You May Feel the Desire to Act Right 
Now! And Please Watch This One Trade!!


GO EGTY!


All statements made are our express opinion only and should be treated as such.
We may own, take position and sell any securities mentioned at any time. Any 
statements that express or involve discussions with respect to predictions, 
goals, expectations, beliefs, plans, projections, object'ives, assumptions or 
future events or perfo'rmance are not
statements of historical fact and may be 
"forward,|ooking statements." forward,|ooking statements are based on expectations, 
estimates and projections at the time the statements are made that involve a number 
of risks and uncertainties which could cause actual results or events to differ 
materially from those presently anticipated. This newsletter was paid ,000 from 
third party (IR Marketing). Forward,|ooking statements in this action may be identified 
through the use of words such as: "pr0jects", "f0resee", "expects". in compliance with 
Se'ction 17. {b), we disclose the holding of EGTY shares prior to the publication of 
this report. Be aware of an inherent conflict of interest resulting from such holdings 
due to our intent to profit from the liquidation of these shares. Shar,es may be sold 
at any time, even after positive statements have been made regarding the above company. 
Since we own shares, there is an inherent conflict of interest in our statements and 
opinions. Readers of this publication are cautioned not 
to place undue reliance on 
forward,|ooking statements, which are based on certain assumptions and expectations 
involving various risks and uncertainties that could cause results to differ materially 
from those set forth in the forward- looking statements. This is not solicitation to 
buy or sell st-0cks, this text is or informational purpose only and you should seek 
professional advice from registered financial advisor before you do anything related 
with buying or selling st0ck-s, penny st'0cks are very high risk and you can lose your 
entire inves,tment.
"""

In [147]: post.encode('utf-8')

and I am getting this output:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)

Answer 1

Accepted answer by Don Question

Unicode is a table that tries to encompass all known letters, characters, and signs, often loosely called glyphs. At the moment that is somewhat over 110,000 meaningful signs. The DECODED state of a character is a code point in this table. Because a byte cannot hold more than 8 bits = 256 states, you have to ENCODE the Unicode representation into a byte stream. The most widely used encoding is UTF-8, which succeeds the older ASCII encoding. UTF-8 encodes each Unicode code point with one to four bytes.

    [decode]     [encode]
ASCII ---> UNICODE ---> UTF-8
1 Glyph                 1 Glyph 
  =        1 Glyph        =
1 Byte                  1-4 Bytes
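To make the "one to four bytes" claim concrete, here is a minimal sketch. It is written in Python 3 syntax (an assumption; the question itself uses Python 2), and the four sample characters are arbitrary picks for illustration, not taken from the question.

```python
# Python 3 sketch: how many bytes UTF-8 needs for a few sample code points.
# The samples are arbitrary: "A", e-acute, the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

byte_lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")  # Unicode code point -> UTF-8 bytes
    byte_lengths.append(len(encoded))
    print("U+%04X -> %d byte(s): %s" % (ord(ch), len(encoded), encoded.hex()))

# ASCII characters keep their single-byte value in UTF-8,
# which is why ASCII text always fits into UTF-8 unchanged.
ascii_is_subset = "A".encode("ascii") == "A".encode("utf-8")
```

Each character is a single code point in the decoded state; only the encoded byte length differs.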

So encoding or decoding always goes from Unicode or towards Unicode. If you want to transform from one encoding to another, you have to go through Unicode:

   unicode_str = mystring.decode('ascii')
   utf8_str = unicode_str.encode('utf-8')

(not the best example, because ASCII ALWAYS fits into UTF-8)

This also explains why calling encode raises a decode error: in Python 2.x, calling .encode() on a byte string first decodes it implicitly with the default codec, and only then encodes the result.

So if you want to decode your post variable, you have to know which encoding the string is in. In Python 2.x the default codec for such implicit conversions is normally ASCII. In Python 3.x it should be UTF-8. You can check the default like this:

import sys
print sys.getdefaultencoding()
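The following sketch replays that failure one step at a time. It uses Python 3 syntax, where bytes and text are separate types, so the implicit step has to be written out. The helper `py2_style_encode` and the sample bytes in `raw` are hypothetical stand-ins: the actual character at position 319 of the question's string is not known, only that its first byte is 0xef.

```python
# Python 3 sketch of the implicit decode that Python 2 performs.
# `raw` is a made-up byte string that happens to contain the byte 0xef.
raw = b"Current Price \xef\xbc\x840.15"


def py2_style_encode(byte_string, target="utf-8", implicit="ascii"):
    """Mimic Python 2's str.encode(): decode implicitly, then encode."""
    return byte_string.decode(implicit).encode(target)


try:
    py2_style_encode(raw)  # implicit ASCII decode hits the byte 0xef
    failure = None
except UnicodeDecodeError as exc:
    failure = exc
    print(exc)

# Naming the real source encoding makes the trip through Unicode succeed.
round_trip = py2_style_encode(raw, implicit="utf-8")
```

The design point is that the fix belongs on the decode side: once the bytes are decoded with the encoding they were actually written in, encoding to UTF-8 cannot fail.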

If your post variable is not defined in your source code but read from an external byte stream, you MUST KNOW the encoding or you will be out of luck.

Answer 2

Answer by Wooble

First, tell Python what encoding you're using by making this the second line of your file (or the first, if you don't use a shebang):

# coding=utf-8

(see PEP 263)

Then, instead of using a byte string, always use unicode literals for textual content:

post = u"""
='Brand New News Fr0m The Timber Industry!!'=
etc. etc. etc."""
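As a rough illustration of both answers together, the sketch below keeps text as Unicode inside the program and names the encoding explicitly at the byte boundary. The sample string and the in-memory stream are assumptions made for the example; with a real file or socket you would still need to know the encoding it uses.

```python
import io

# A unicode literal: the u prefix works in Python 2 and in Python 3.3+.
# The fullwidth dollar sign (U+FF04) is a hypothetical non-ASCII character.
post = u"Current Price \uff040.15"

# Encoding text to bytes never needs an implicit ASCII decode.
utf8_bytes = post.encode("utf-8")

# Reading from an external byte stream: state the encoding explicitly.
# io.BytesIO stands in here for a file or network stream.
stream = io.BytesIO(utf8_bytes)
text = io.TextIOWrapper(stream, encoding="utf-8").read()
```

Here `text` comes back equal to `post`, because the same encoding is used in both directions.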
