Python: setting column types while reading CSV with pandas

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/36195485/

Date: 2020-08-19 17:31:59  Source: igfitidea

Setting column types while reading csv with pandas

python csv dictionary pandas types

Asked by user2738815

I am trying to read a CSV file into a pandas DataFrame using the following call:

dp = pd.read_csv('products.csv', header = 0,  dtype = {'name': str,'review': str,
                                                      'rating': int,'word_count': dict}, engine = 'c')
print dp.shape
for col in dp.columns:
    print 'column', col,':', type(col[0])
print type(dp['rating'][0])
dp.head(3)

This is the output:

(183531, 4)
column name : <type 'str'>
column review : <type 'str'>
column rating : <type 'str'>
column word_count : <type 'str'>
<type 'numpy.int64'>

I can sort of understand that pandas might find it difficult to convert a string representation of a dictionary into a dictionary, given this and this (links in the original question). But how can the content of the "rating" column be both str and numpy.int64?

By the way, tweaks like not specifying an engine or header do not change anything.

Thanks and regards

Accepted answer by Colonel Beauvel

Just do:

for col in dp.columns:
    print 'column', col,':', col[0]

You will see that this prints the first letter of each column name, which is a string. Note that the loop iterates over the column names, not over each Series, so type(col[0]) is always str.

What you want is to check the type of each column's values in the loop, so do this instead:

for col in dp.columns:
    print 'column', col,':', type(dp[col][0])

...just as you did for the rating column.
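
The difference can be seen on a small in-memory example. This is only a sketch in Python 3 syntax: the three-row CSV below is made-up stand-in data, not the asker's products.csv, and the word_count column is left out for brevity.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for products.csv
csv_text = """name,review,rating
Bib,Nice and soft,5
Wipes,Too thin,2
Blanket,Very warm,4
"""
dp = pd.read_csv(io.StringIO(csv_text),
                 dtype={"name": str, "review": str, "rating": int})

# Iterating over dp.columns yields the column labels (plain strings),
# so col[0] is only the first character of each label.
first_chars = [col[0] for col in dp.columns]
print(first_chars)

# Indexing the frame with the label gives the Series;
# the first element of that Series carries the real value type.
for col in dp.columns:
    print("column", col, ":", type(dp[col].iloc[0]))
```

Using .iloc[0] rather than [0] is a design choice here: it picks the first row by position, so it keeps working if the index does not start at 0.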

Answer by Mike Müller

Use:

dp.info()

to see the data types of the columns. dp.columns refers to the column header names, which are strings.
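
As a rough illustration (Python 3, with made-up two-row sample data), the header labels and the per-column dtypes can be inspected side by side. The dtypes attribute is an alternative to info() when the result is needed as a value rather than printed.

```python
import io
import pandas as pd

csv_text = "name,rating\nBib,5\nWipes,2\n"  # hypothetical sample data
dp = pd.read_csv(io.StringIO(csv_text))

labels = list(dp.columns)  # column header names (strings)
dtypes = dp.dtypes         # Series mapping each column label to its dtype
print(labels)
print(dtypes)
dp.info()                  # prints dtypes plus non-null counts and memory usage
```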

Answer by Sourav Das

Just use read_table with "," as the delimiter, together with literal_eval as the converter function for the columns concerned.

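from ast import literal_eval
import pandas as pd

# Parse the string-encoded columns with literal_eval while reading.
# header=0 discards the file's own header row in favour of `names`.
recipes = pd.read_table(r"\souravD\PP_recipes.csv", sep=",", header=0,
                        names=["id", "i", "name_tokens", "ingredient_tokens", "steps_tokens",
                               "techniques", "calorie_level", "ingredient_ids"],
                        converters={"name_tokens": literal_eval,
                                    "ingredient_tokens": literal_eval,
                                    "steps_tokens": literal_eval,
                                    "techniques": literal_eval,
                                    "ingredient_ids": literal_eval})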

[Image: recipes DataFrame after changing the data types]
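
The same converter idea can be applied to a dictionary column like the word_count column in the question. The snippet below is a sketch under assumptions: the sample rows are invented, and it assumes the dictionary text is quoted in the CSV so that its commas are not treated as field separators.

```python
import io
from ast import literal_eval
import pandas as pd

# Hypothetical sample; the dictionary column is quoted because it contains commas.
csv_text = '''name,rating,word_count
Bib,5,"{'soft': 1, 'nice': 2}"
Wipes,2,"{'thin': 1}"
'''
dp = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"name": str, "rating": int},       # plain scalar columns
    converters={"word_count": literal_eval},  # parse the text into a dict
)

first = dp["word_count"].iloc[0]
print(type(first), first)
```

literal_eval only evaluates Python literals (dicts, lists, numbers, strings and the like), which makes it a safer choice than eval for text read from a file.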

Answer by taotao.li

I think you should check this one first: Pandas: change data type of columns

When you google "pandas dataframe column type", it is among the top 5 results.
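
For reference, changing a column's type after loading might look like the sketch below. The sample data is made up, and astype is only one of the options that the linked question covers.

```python
import io
import pandas as pd

csv_text = "name,rating\nBib,5\nWipes,2\n"  # hypothetical sample data
dp = pd.read_csv(io.StringIO(csv_text), dtype=str)  # read every column as text

# Convert one column after loading; astype raises if a value is not numeric.
dp["rating"] = dp["rating"].astype(int)
# pd.to_numeric(dp["rating"], errors="coerce") would instead turn
# unparseable values into NaN.

total = dp["rating"].sum()
print(dp.dtypes)
print(total)
```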