Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/36475838/
Remove comma from objects in a pandas dataframe column
Asked by djhc
I have imported a csv file using pandas.
My dataframe has multiple columns titled "Farm", "Total Apples" and "Good Apples".
The numerical data imported for "Total Apples" and "Good Apples" contains commas as thousands separators, e.g. 1,200. I want to remove the commas so the data looks like 1200.
The variable type for the "Total Apples" and "Good Apples" columns comes up as object.
I tried using df.str.replace and df.strip but have not been successful.
I also tried to change the variable type from object to string and from object to integer, but couldn't make it work.
Any help would be greatly appreciated.
****EDIT****
Excerpt of data from csv file imported using pd.read_csv:
Farm_Name  Total Apples  Good Apples
EM         18,327        14,176
EE         18,785        14,146
IW         635           486
L          33,929        24,586
NE         12,497        9,609
NW         30,756        23,765
SC         8,515         6,438
SE         22,896        17,914
SW         11,972        9,114
WM         27,251        20,931
Y          21,495        16,662
Answered by jezrael
I think you can add the thousands parameter to read_csv; the values in the Total Apples and Good Apples columns are then converted to integers.
Your separator may be different, so don't forget to change it. If the separator is whitespace, change it to sep='\s+'.
import pandas as pd
import io
temp=u"""Farm_Name;Total Apples;Good Apples
EM;18,327;14,176
EE;18,785;14,146
IW;635;486
L;33,929;24,586
NE;12,497;9,609
NW;30,756;23,765
SC;8,515;6,438
SE;22,896;17,914
SW;11,972;9,114
WM;27,251;20,931
Y;21,495;16,662"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep=";", thousands=',')
print(df)
   Farm_Name  Total Apples  Good Apples
0         EM         18327        14176
1         EE         18785        14146
2         IW           635          486
3          L         33929        24586
4         NE         12497         9609
5         NW         30756        23765
6         SC          8515         6438
7         SE         22896        17914
8         SW         11972         9114
9         WM         27251        20931
10         Y         21495        16662
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 3 columns):
Farm_Name       11 non-null object
Total Apples    11 non-null int64
Good Apples     11 non-null int64
dtypes: int64(2), object(1)
memory usage: 336.0+ bytes
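If the file really is whitespace-delimited, as the excerpt in the question suggests, the column names themselves contain spaces, so the header line cannot be split reliably on whitespace. Below is a minimal sketch under that assumption. The names raw and df_ws are made up for illustration, and only three rows of the excerpt are used. It skips the header line and passes the column names explicitly.

```python
import io

import pandas as pd

# Assumed layout: whitespace-delimited text, as in the question's excerpt.
raw = """Farm_Name Total Apples Good Apples
EM 18,327 14,176
IW 635 486
SC 8,515 6,438"""

# The header names contain spaces, so skip the header line
# and supply the three column names by hand.
df_ws = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",        # one or more whitespace characters
    thousands=",",     # treat "," as the thousands separator
    skiprows=1,
    header=None,
    names=["Farm_Name", "Total Apples", "Good Apples"],
)
print(df_ws.dtypes)
```

Whether this fits depends on the real file. If it uses a proper delimiter such as ";" or a tab, passing that as sep and keeping the header row is simpler.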
Answered by Grr
Try this:
import locale
# use the system default locale; it must treat "," as the thousands separator
locale.setlocale(locale.LC_NUMERIC, '')
df = df[['Farm_Name']].join(df[['Total Apples', 'Good Apples']].applymap(locale.atof))
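For a DataFrame that is already loaded, the string approach attempted in the question can also work, provided it is applied per column: the .str accessor exists on a Series, not on the DataFrame itself, which is probably why df.str.replace failed. Here is a minimal sketch, assuming the two columns hold strings such as "18,327". The sample frame and the name numeric_cols are made up for illustration.

```python
import pandas as pd

# Hypothetical frame mimicking the question's data: the numbers arrive as strings.
df = pd.DataFrame({
    "Farm_Name": ["EM", "IW", "SC"],
    "Total Apples": ["18,327", "635", "8,515"],
    "Good Apples": ["14,176", "486", "6,438"],
})

numeric_cols = ["Total Apples", "Good Apples"]
for col in numeric_cols:
    # .str works on a single column (Series); drop the literal commas first
    cleaned = df[col].str.replace(",", "", regex=False)
    # then convert the cleaned strings to a numeric dtype
    df[col] = pd.to_numeric(cleaned)

print(df.dtypes)
```

Unlike the locale-based version, this does not depend on the machine's locale settings, but it only handles "," as the separator.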