Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/45424414/

Time: 2020-09-14 04:08:39  Source: igfitidea

Pandas convert dataframe to Utf-8

python pandas utf-8

Asked by Chris Johnson

I have a df that consists of 100 rows and 24 columns. The column type is string. It throws the following error when I try to append the data frame to KDB:

UnicodeEncodeError: 'ascii' codec can't encode character '\xd3' in position 9: ordinal not in range(128)

Here is an example of the first row in my df.

                        AnnouncementDate AuctionDate    BBT  \
_id
00000067   2012-12-11T00:00:00.000+00:00         NaN   FHLB

           CouponDividendRate DaysToSettle  \
_id
00000067                 0.61            1

                                        Description  \
_id
00000067                         FHLB 0.61 12/28/16

                     FirstSettlementDate           ISN IsAgency IsWhenIssued  \
_id
00000067   2012-12-28T00:00:00.000+00:00  US313381K796     True        False


           ...  OnTheRunTreasury OperationalIndicator  \
_id        ...
00000067   ...               NaN                False


          OriginalAmountOfPrincipal OriginalMaturityDate  \
_id
00000067                 13000000.0                  NaN


          PrincipalAmountOutstanding       SCSP       SMCP  \
_id
00000067                         0.0  313381K79   76000000

           SecurityTypeLevel1 SecurityTypeLevel2   TCK
_id
00000067          US-DOMESTIC                NaN   NaN

My question is: is there an easy way to convert my df to utf-8 format?

Possibly something like df = df.encode('utf-8')

Thanks

Answered by Ricky McMaster

It depends on how you're outputting the data. If you're simply using csv files, which you then import to KDB, then you can specify that easily:

df.to_csv('df_output.csv', encoding='utf-8')

Or, you can set the encoding when you import the data to Pandas originally, using the same syntax.
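
As a rough illustration of both directions, here is a minimal round-trip sketch. The two-column frame and its values are hypothetical (they are not the asker's data), and the non-ASCII character is the '\xd3' from the error message. It only shows that the encoding argument is accepted by both to_csv and read_csv; it does not show how KDB ingests the file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; "\xd3" is the character named in the error message.
df = pd.DataFrame({"_id": ["00000067"], "Description": ["FHLB \xd3 0.61"]})

# Use a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_output.csv")

    # Write the frame as UTF-8 encoded CSV.
    df.to_csv(path, index=False, encoding="utf-8")

    # Read it back with the same encoding; dtype=str keeps "00000067"
    # from being parsed as the integer 67.
    df_roundtrip = pd.read_csv(path, dtype=str, encoding="utf-8")

    # Keep the raw bytes to inspect how the character was stored on disk.
    with open(path, "rb") as f:
        raw = f.read()
```
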

If you're connecting directly to KDB using SQLAlchemy or something similar, you should try specifying this in the connection itself - see this question: Another UnicodeEncodeError when using pandas method to_sql with MySQL
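
For the MySQL case that the linked question covers, the charset is usually passed as a query parameter on the connection URL. The sketch below only builds such a URL with the standard library. The helper name build_mysql_url, the credentials, the host, and the database name are all hypothetical, and whether a KDB connection accepts an equivalent option is an assumption the source does not confirm.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database, charset="utf8mb4"):
    """Return a SQLAlchemy-style MySQL URL that requests a UTF-8 connection charset."""
    # quote_plus escapes characters such as "@" or spaces in the credentials.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?charset={charset}"
    )


# Hypothetical credentials and database name, for illustration only.
url = build_mysql_url("user", "p@ss word", "localhost", "mydb")

# With SQLAlchemy and the PyMySQL driver installed, the URL would be used like:
# engine = sqlalchemy.create_engine(url)
# df.to_sql("my_table", engine, if_exists="append")
```
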
