Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/36475838/

Date: 2020-09-14 01:00:25  Source: igfitidea

Remove comma from objects in a pandas dataframe column

Tags: python, csv, pandas, comma

Asked by djhc

I have imported a csv file using pandas.

My dataframe has multiple columns titled "Farm", "Total Apples" and "Good Apples".

The numerical data imported for "Total Apples" and "Good Apples" contains commas as thousands separators, e.g. 1,200. I want to remove the commas so the data looks like 1200.

The dtype of the "Total Apples" and "Good Apples" columns shows up as object.

I tried using df.str.replace and df.strip but have not been successful.

I also tried changing the variable type from object to string, and from object to integer, but couldn't make it work.

Any help would be greatly appreciated.

****EDIT****

Excerpt of data from csv file imported using pd.read_csv:

Farm_Name  Total Apples  Good Apples
EM         18,327        14,176
EE         18,785        14,146
IW         635           486
L          33,929        24,586
NE         12,497        9,609
NW         30,756        23,765
SC         8,515         6,438
SE         22,896        17,914
SW         11,972        9,114
WM         27,251        20,931
Y          21,495        16,662
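
The df.str.replace attempt fails because a DataFrame has no .str accessor; .str exists only on a Series. If the file has already been loaded with the commas intact, one possible fix is to strip the commas column by column and then cast to int64. This is a minimal sketch: the three-row frame is sample data copied from the excerpt above, and it assumes every cell holds a number (a blank cell would make astype("int64") fail).

```python
import pandas as pd

# Sample rows copied from the excerpt; the numbers are strings such as
# "18,327", so both apple columns have dtype object, as in the question.
df = pd.DataFrame({
    "Farm_Name": ["EM", "EE", "IW"],
    "Total Apples": ["18,327", "18,785", "635"],
    "Good Apples": ["14,176", "14,146", "486"],
})

cols = ["Total Apples", "Good Apples"]

# DataFrame has no .str accessor (df.str raises AttributeError), so apply
# the literal (non-regex) replacement to each column Series, then cast.
df[cols] = df[cols].apply(
    lambda s: s.str.replace(",", "", regex=False).astype("int64")
)

print(df.dtypes)
print(df)
```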

Answer by jezrael

I think you can add the parameter thousands to read_csv; the values in the columns Total Apples and Good Apples are then converted to integers:

Your separator may be different, so don't forget to change it. If the separator is whitespace, use sep=r'\s+'.

import io

import pandas as pd

temp = """Farm_Name;Total Apples;Good Apples
EM;18,327;14,176
EE;18,785;14,146
IW;635;486
L;33,929;24,586
NE;12,497;9,609
NW;30,756;23,765
SC;8,515;6,438
SE;22,896;17,914
SW;11,972;9,114
WM;27,251;20,931
Y;21,495;16,662"""
# after testing, replace io.StringIO(temp) with the path to your file
df = pd.read_csv(io.StringIO(temp), sep=";", thousands=",")
print(df)
#    Farm_Name  Total Apples  Good Apples
# 0         EM         18327        14176
# 1         EE         18785        14146
# 2         IW           635          486
# 3          L         33929        24586
# 4         NE         12497         9609
# 5         NW         30756        23765
# 6         SC          8515         6438
# 7         SE         22896        17914
# 8         SW         11972         9114
# 9         WM         27251        20931
# 10         Y         21495        16662
df.info()
# (output from an older pandas; recent versions print info() as a table)
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 11 entries, 0 to 10
# Data columns (total 3 columns):
# Farm_Name       11 non-null object
# Total Apples    11 non-null int64
# Good Apples     11 non-null int64
# dtypes: int64(2), object(1)
# memory usage: 336.0+ bytes
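
As an illustration of the note about the separator: assuming the file is tab-separated (the question does not say; the excerpt is only whitespace-aligned), the same thousands="," parameter works with sep="\t". A tab separator also keeps headers such as "Total Apples" in one piece, whereas sep=r'\s+' would split them into separate column names. The three rows below are copied from the excerpt; the tab layout is the assumption.

```python
import io

import pandas as pd

# Assumed tab-separated layout; swap "\t" for whatever your file uses.
data = (
    "Farm_Name\tTotal Apples\tGood Apples\n"
    "EM\t18,327\t14,176\n"
    "IW\t635\t486\n"
    "L\t33,929\t24,586\n"
)

# thousands="," tells the parser to drop the grouping commas,
# so the apple columns come out as integers instead of strings.
df = pd.read_csv(io.StringIO(data), sep="\t", thousands=",")
print(df.dtypes)
print(df)
```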

Answer by Grr

Try this:

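import locale
# Adopt the environment's locale; it must group thousands with ',' (e.g. en_US.UTF-8).
locale.setlocale(locale.LC_NUMERIC, '')
df = df[['Farm_Name']].join(df[['Total Apples', 'Good Apples']].applymap(locale.atof))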
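
This approach depends on the process locale: locale.setlocale(locale.LC_NUMERIC, '') adopts whatever the environment specifies, and locale.atof only removes that locale's own thousands separator before calling float. The converted columns are therefore floats (18327.0 rather than 18327). The sketch below forces the portable "C" locale, which defines no thousands separator, to show the failure mode on an unsuitable environment; it illustrates the caveat and is not part of the original answer.

```python
import locale

# The "C" locale is always available and defines no thousands separator.
locale.setlocale(locale.LC_NUMERIC, "C")
print(repr(locale.localeconv()["thousands_sep"]))

try:
    locale.atof("18,327")
except ValueError as exc:
    # Nothing is stripped, so float("18,327") is attempted and rejected.
    print("C locale cannot parse '18,327':", exc)

# Values without grouping still parse, but as floats.
print(locale.atof("635"))
```

If this is what happens on your machine, either switch to a locale that groups with ',' (such as en_US.UTF-8, if it is installed) or use read_csv(..., thousands=',') from the first answer.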