Original source: http://stackoverflow.com/questions/36195485/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Setting column types while reading csv with pandas
Asked by user2738815
I am trying to read a CSV file into a pandas DataFrame, specifying the column types as follows:
dp = pd.read_csv('products.csv', header = 0, dtype = {'name': str,'review': str,
                 'rating': int,'word_count': dict}, engine = 'c')
print dp.shape
for col in dp.columns:
    print 'column', col,':', type(col[0])
print type(dp['rating'][0])
dp.head(3)
This is the output:
(183531, 4)
column name : <type 'str'>
column review : <type 'str'>
column rating : <type 'str'>
column word_count : <type 'str'>
<type 'numpy.int64'>
I can sort of understand that pandas might find it difficult to convert a string representation of a dictionary into a dictionary, given this and this. But how can the content of the "rating" column be both str and numpy.int64?
By the way, tweaks like not specifying an engine or header do not change anything.
Thanks and regards
Accepted answer by Colonel Beauvel
Just do:
for col in dp.columns:
    print 'column', col,':', col[0]
You will see that this prints the first letter of each column name, which is a string. Note that the loop iterates over the column names, not over each Series.
What you want is to check the type of each column's values inside the loop, so do this instead:
for col in dp.columns:
    print 'column', col,':', type(dp[col][0])
...as you did for the rating column!
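The difference between the two loops can be sketched with a small in-memory CSV. This is only an illustration in Python 3 syntax: the sample rows and the names csv_text, label_types and value_types are made up for the example and are not from the original question.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for products.csv.
csv_text = "name,review,rating\nbib,great,5\nrattle,ok,3\n"
dp = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"name": str, "review": str, "rating": int},
)

# Iterating over dp.columns yields the column labels (plain strings),
# so col[0] is only the first character of each label.
label_types = {col: type(col[0]).__name__ for col in dp.columns}

# Indexing the DataFrame with the label returns the Series; its first
# element carries the type of the stored values.
value_types = {col: type(dp[col].iloc[0]).__name__ for col in dp.columns}

print(label_types)
print(value_types)
```

The first dictionary reports str for every column because it inspects the labels; only the second one reflects the data.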
Answer by Mike Müller
Use:
dp.info()
to see the data types of the columns. dp.columns refers to the column header names, which are strings.
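As a rough sketch of that distinction (Python 3 syntax, with made-up sample data), dp.columns gives the labels while dp.dtypes and dp.info() report the types of the stored values. The name column_labels is introduced here only for illustration.

```python
import io

import pandas as pd

# Hypothetical two-column CSV used only for this illustration.
csv_text = "name,rating\nbib,5\nrattle,3\n"
dp = pd.read_csv(io.StringIO(csv_text))

# The labels themselves: always strings here, whatever the data holds.
column_labels = list(dp.columns)
print(column_labels)

# One dtype per column, as a Series indexed by column label.
print(dp.dtypes)

# A summary that also includes non-null counts and memory usage.
dp.info()
```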
Answer by Sourav Das
Just use read_table with "," as the delimiter, along with literal_eval as the converter function for the columns concerned.
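import pandas as pd
from ast import literal_eval

# Raw string so the backslashes in the path are not treated as escapes.
recipes = pd.read_table(r"\souravD\PP_recipes.csv", sep=',',
    names=["id", "i", "name_tokens", "ingredient_tokens", "steps_tokens", "techniques", "calorie_level", "ingredient_ids"],
    # literal_eval turns the text of each cell into a Python object.
    converters={'name_tokens': literal_eval,
                'ingredient_tokens': literal_eval,
                'steps_tokens': literal_eval,
                'techniques': literal_eval,
                'ingredient_ids': literal_eval},
    header=0)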
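The same idea could be applied to the word_count column from the question. The following is a minimal sketch in Python 3, assuming the column stores a Python dict literal as quoted text; the sample rows and the name first_counts are invented for the example, and pd.read_csv is used in place of read_table.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical data: word_count holds a dict literal stored as text.
csv_text = (
    "name,rating,word_count\n"
    "bib,5,\"{'soft': 1, 'great': 2}\"\n"
    "rattle,3,\"{'loud': 1}\"\n"
)

dp = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"name": str, "rating": int},       # plain scalar types go in dtype
    converters={"word_count": literal_eval},  # per-cell parsing goes in converters
)

first_counts = dp["word_count"].iloc[0]
print(type(first_counts), first_counts)
```

literal_eval only accepts Python literals, so it is a safer choice than eval for text that comes from a file.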
Answer by taotao.li
I think you should check this one first: Pandas: change data type of columns
If you google "pandas dataframe column type", it is among the top 5 results.
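For readers who cannot follow the link, here is a small sketch of changing a column's type after the file has been read. It is an assumption about what is relevant from that question, not a summary of it; the sample data and the rating_text column are made up.

```python
import io

import pandas as pd

# Hypothetical CSV, read with every column kept as text.
csv_text = "name,rating\nbib,5\nrattle,3\n"
dp = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Keep a text copy so both conversion routes can be shown.
dp["rating_text"] = dp["rating"]

# astype converts the whole column and raises on values it cannot parse.
dp["rating"] = dp["rating"].astype("int64")

# pd.to_numeric with errors="coerce" turns unparseable values into NaN instead.
dp["rating_numeric"] = pd.to_numeric(dp["rating_text"], errors="coerce")

print(dp.dtypes)
```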