Python: setting column types when reading a CSV with pandas

Question

Asked by user2738815

I am trying to read a CSV file into a pandas DataFrame with the following formatting:

dp = pd.read_csv('products.csv', header = 0,  dtype = {'name': str,'review': str,
                                                      'rating': int,'word_count': dict}, engine = 'c')
print dp.shape
for col in dp.columns:
    print 'column', col,':', type(col[0])
print type(dp['rating'][0])
dp.head(3)

This is the output:

(183531, 4)
column name : <type 'str'>
column review : <type 'str'>
column rating : <type 'str'>
column word_count : <type 'str'>
<type 'numpy.int64'>

I can sort of understand that pandas might find it difficult to convert a string representation of a dictionary into a dictionary, given this and this. But how can the content of the "rating" column be both str and numpy.int64?

By the way, tweaks like not specifying an engine or header do not change anything.

Thanks and regards

Answer 1

Accepted answer by Colonel Beauvel

Just do:

for col in dp.columns:
    print 'column', col,':', col[0]

You will see that this prints the first letter of each column name, which is a string. Beware: here you are iterating over the column names, not over each Series.

What you want is to check the type of the values in each column through a loop, so do this instead:

for col in dp.columns:
    print 'column', col,':', type(dp[col][0])

...as you did for the rating column!
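To make the difference concrete, here is a minimal sketch in Python 3 syntax (the question uses Python 2 print statements). The inline CSV text is an invented stand-in for products.csv, whose contents are not shown in the question, and the names label_types and value_types are only for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for products.csv
csv_text = "name,review,rating\nbib,great,5\nrattle,ok,3\n"
dp = pd.read_csv(io.StringIO(csv_text),
                 dtype={"name": str, "review": str, "rating": int})

label_types = {}
value_types = {}
for col in dp.columns:
    # col is the column label (a string), so col[0] is its first character
    label_types[col] = type(col[0])
    # dp[col] is the Series; .iloc[0] is the first value stored in it
    value_types[col] = type(dp[col].iloc[0])
    print("column", col, ":", label_types[col], value_types[col])
```

Every entry in label_types is str regardless of the data, while value_types reflects what read_csv actually stored in each column.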

Answer 2

Answer by Mike Müller

Use:

dp.info()

to see the data types of the columns. dp.columns refers to the column header names, which are strings.
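As a rough illustration, the sketch below contrasts dp.info() and dp.dtypes with dp.columns. The sample data is invented for the example and is not the asker's file.

```python
import io
import pandas as pd

# Hypothetical sample data; the real products.csv is not shown in the question
csv_text = "name,review,rating\nbib,great,5\nrattle,ok,3\n"
dp = pd.read_csv(io.StringIO(csv_text))

dp.info()                 # prints a summary with each column's dtype
print(dp.dtypes)          # Series mapping each column name to its dtype
print(list(dp.columns))   # only the header labels, which are strings
```

dp.dtypes is handy when the types need to be inspected programmatically rather than just printed.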

Answer 3

Answer by Sourav Das

Just use read_table with "," as the delimiter, along with literal_eval as the converter function for the values in the columns concerned.

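import pandas as pd
from ast import literal_eval

# literal_eval parses each cell's text into the Python object it represents;
# the raw string keeps the backslashes in the path from being read as escapes
recipes = pd.read_table(r"\souravD\PP_recipes.csv", sep=',',
                      names=["id", "i", "name_tokens", "ingredient_tokens", "steps_tokens", "techniques","calorie_level","ingredient_ids"],
                      converters = {'name_tokens' : literal_eval,
                                    'ingredient_tokens' : literal_eval,
                                    'steps_tokens' : literal_eval,
                                    'techniques' : literal_eval,
                                    'ingredient_ids' : literal_eval},header=0)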

[Image: the recipes DataFrame after changing the data types]
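The same idea could be applied to the word_count column from the question. The following is a minimal sketch in Python 3, assuming the dictionaries are stored as quoted Python-literal strings; the sample rows are invented, and it uses read_csv rather than read_table, which should behave the same way with a comma separator.

```python
import io
from ast import literal_eval
import pandas as pd

# Hypothetical sample; each dict cell is quoted because it contains commas
csv_text = '''name,rating,word_count
bib,5,"{'soft': 1, 'great': 2}"
rattle,3,"{'loud': 1}"
'''

dp = pd.read_csv(io.StringIO(csv_text),
                 dtype={"name": str, "rating": int},
                 # a converter receives each cell as a string and returns its value
                 converters={"word_count": literal_eval})

first = dp["word_count"].iloc[0]
print(type(first), first)
```

Note that literal_eval raises an error on text that is not a valid Python literal, so a file with empty or malformed cells would need a small wrapper function instead.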

Answer 4

Answer by taotao.li

I think you should check this one first: "Pandas: change data type of columns"

If you google "pandas dataframe column type", it is among the top five results.
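For readers who do not follow the link, the sketch below shows two common ways to change a column's type after loading. It is not taken from the linked answer; the sample data and the variable name parsed are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose rating column was read as text
dp = pd.DataFrame({"name": ["bib", "rattle"], "rating": ["5", "3"]})

# astype converts the whole column and raises if any value cannot be parsed
dp["rating"] = dp["rating"].astype("int64")

# to_numeric with errors="coerce" replaces unparseable values with NaN instead
parsed = pd.to_numeric(pd.Series(["5", "oops"]), errors="coerce")

print(dp.dtypes)
print(parsed)
```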
