Python: change the data type of a specific column of a pandas DataFrame

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/41590884/

Date: 2020-08-20 01:17:21  Source: igfitidea

Change data type of a specific column of a pandas dataframe

Tags: python, pandas

Asked by DougKruger

I want to sort a dataframe with many columns by a specific column, but first I need to change its type from object to int. How can I change the data type of this specific column while keeping the original column positions?

Accepted answer by jezrael

You can use reindex with the index of the column sorted by sort_values, after casting it to int with astype:

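import pandas as pd

# Note: the outputs below show the columns as A B D E F colname because
# older pandas versions sorted dict keys; recent versions keep insertion order.
df = pd.DataFrame({'A':[1,2,3],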
                   'B':[4,5,6],
                   'colname':['7','3','9'],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   A  B  D  E  F colname
0  1  4  1  5  7       7
1  2  5  3  3  4       3
2  3  6  5  6  3       9

print (df.colname.astype(int).sort_values())
1    3
0    7
2    9
Name: colname, dtype: int32

print (df.reindex(df.colname.astype(int).sort_values().index))
   A  B  D  E  F colname
1  2  5  3  3  4       3
0  1  4  1  5  7       7
2  3  6  5  6  3       9

print (df.reindex(df.colname.astype(int).sort_values().index).reset_index(drop=True))
   A  B  D  E  F colname
0  2  5  3  3  4       3
1  1  4  1  5  7       7
2  3  6  5  6  3       9
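
If the converted values should also be stored in the frame, one possible sketch (assuming the same sample data as above) is to assign the cast column back and then sort with sort_values. Assigning to an existing column name changes its dtype without moving the column. The names `original_order` and `sorted_df` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'colname': ['7', '3', '9'],
                   'D': [1, 3, 5],
                   'E': [5, 3, 6],
                   'F': [7, 4, 3]})

# Remember the column order so it can be compared afterwards.
original_order = list(df.columns)

# Overwriting an existing column changes its dtype but not its position.
df['colname'] = df['colname'].astype(int)

# Sort the rows by the now-numeric column and renumber the index.
sorted_df = df.sort_values('colname').reset_index(drop=True)

print(sorted_df.dtypes)
print(sorted_df)
```

Whether to overwrite the column or only use the converted values for ordering (as in the reindex version) depends on whether the string values are still needed later.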

If the first solution does not work because of None or bad data, use to_numeric with errors='coerce', which turns values that cannot be parsed into NaN:

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'colname':['7','3','None'],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   A  B  D  E  F colname
0  1  4  1  5  7       7
1  2  5  3  3  4       3
2  3  6  5  6  3    None

print (pd.to_numeric(df.colname, errors='coerce').sort_values())
1    3.0
0    7.0
2    NaN
Name: colname, dtype: float64
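
The coerced Series can be plugged into the same reindex pattern. The sketch below assumes the sample frame with the 'None' string; sort_values places NaN last by default, so the row that could not be parsed ends up at the bottom. Storing the result in the nullable 'Int64' dtype is an optional extra that the answer does not mention, and the names `numeric` and `sorted_df` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'colname': ['7', '3', 'None'],
                   'D': [1, 3, 5],
                   'E': [5, 3, 6],
                   'F': [7, 4, 3]})

# Values that cannot be parsed as numbers become NaN (dtype float64).
numeric = pd.to_numeric(df['colname'], errors='coerce')

# Reorder the rows by the numeric values; NaN sorts last by default.
sorted_df = df.reindex(numeric.sort_values().index).reset_index(drop=True)
print(sorted_df)

# Optional: keep integers alongside missing values with the nullable dtype.
df['colname'] = numeric.astype('Int64')
print(df.dtypes)
```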

Answer by JimmyOnThePage

df['colname'] = df['colname'].astype(int) works, at least when changing from float values to int.
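
Two caveats may be worth checking before relying on this for float data: astype(int) drops the fractional part instead of rounding, and it raises an error if the column contains NaN. A small sketch with made-up values (the names `with_nan` and `cast_failed` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'colname': [1.9, -1.9, 2.0]})

# Casting float to int truncates toward zero; it does not round.
df['colname'] = df['colname'].astype(int)
print(df['colname'].tolist())

# A float column containing NaN cannot be cast to a plain integer dtype.
with_nan = pd.Series([1.0, np.nan])
try:
    with_nan.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)
```

If rounding is wanted, calling round() before astype(int) is one option.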

Answer by user19120

I have tried the following:

df['column']=df.column.astype('int64')

and it worked for me.

Answer by Kripalu Sar

To simply change one column, here is what you can do: df.column_name.apply(int)

You can replace int with the desired data type, e.g. np.int64 or str. For category, use astype('category') instead, since it is a dtype name rather than a callable that apply can use.
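
A short sketch of these conversions on a hypothetical frame (the columns `column_name` and `other` and the name `as_text` are made up for illustration). Note that apply returns a new Series, so the result has to be assigned back for the change to persist:

```python
import pandas as pd

df = pd.DataFrame({'column_name': ['7', '3', '9'],
                   'other': ['x', 'y', 'x']})

# apply calls int() on every element and returns a new Series.
df['column_name'] = df['column_name'].apply(int)

# astype takes the target type or a dtype name.
as_text = df['column_name'].astype(str)
df['other'] = df['other'].astype('category')

print(df.dtypes)
```

For plain type changes, astype is usually the more direct choice, because apply runs a Python-level call per element.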

For multiple datatype changes, I would recommend the following:

df = pd.read_csv(data, dtype={'Col_A': str, 'Col_B': 'int64'})
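
To try this without a file on disk, here is a minimal sketch that reads from an in-memory buffer. The CSV contents and the extra column `Col_C` are invented for illustration, and the integer dtype is given as the string 'int64'. Columns not listed in the dtype mapping are still inferred by pandas.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file.
csv_text = "Col_A,Col_B,Col_C\n001,10,1.5\n002,20,2.5\n"
data = io.StringIO(csv_text)

# Col_A stays text (leading zeros are preserved), Col_B is read as int64,
# and Col_C is left to pandas' type inference.
df = pd.read_csv(data, dtype={'Col_A': str, 'Col_B': 'int64'})

print(df.dtypes)
print(df)
```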
