pandas: remove commas from objects in a pandas DataFrame column

Question

Asked by djhc

I have imported a CSV file using pandas.

My DataFrame has multiple columns titled "Farm", "Total Apples" and "Good Apples".

The numerical data imported for "Total Apples" and "Good Apples" contains commas as thousands separators, e.g. 1,200. I want to remove the commas so that the data looks like 1200.

The dtype of the "Total Apples" and "Good Apples" columns comes up as object.

I tried using df.str.replace and df.strip but have not been successful.

I also tried to change the dtype from object to string and from object to integer, but couldn't make it work.

Any help would be greatly appreciated.

****EDIT****

Excerpt of data from the CSV file imported using pd.read_csv:

Farm_Name   Total Apples    Good Apples
EM  18,327  14,176
EE  18,785  14,146
IW  635 486
L   33,929  24,586
NE  12,497  9,609
NW  30,756  23,765
SC  8,515   6,438
SE  22,896  17,914
SW  11,972  9,114
WM  27,251  20,931
Y   21,495  16,662

Answer 1

Answered by jezrael

I think you can add the parameter thousands to read_csv; the values in the columns Total Apples and Good Apples are then converted to integers:

Your separator may be different, so don't forget to change it. If the separator is whitespace, change it to sep='\s+'.

import pandas as pd
import io

temp=u"""Farm_Name;Total Apples;Good Apples
EM;18,327;14,176
EE;18,785;14,146
IW;635;486
L;33,929;24,586
NE;12,497;9,609
NW;30,756;23,765
SC;8,515;6,438
SE;22,896;17,914
SW;11,972;9,114
WM;27,251;20,931
Y;21,495;16,662"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep=";", thousands=',')
print(df)
   Farm_Name  Total Apples  Good Apples
0         EM         18327        14176
1         EE         18785        14146
2         IW           635          486
3          L         33929        24586
4         NE         12497         9609
5         NW         30756        23765
6         SC          8515         6438
7         SE         22896        17914
8         SW         11972         9114
9         WM         27251        20931
10         Y         21495        16662

print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 3 columns):
Farm_Name       11 non-null object
Total Apples    11 non-null int64
Good Apples     11 non-null int64
dtypes: int64(2), object(1)
memory usage: 336.0+ bytes
None
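
If the DataFrame is already loaded and re-reading the file is not convenient, the commas can also be stripped column by column. The df.str.replace attempt from the question fails because .str is a Series accessor and does not exist on a DataFrame. The following is a minimal sketch under that assumption; the three-row frame and the cols list are hypothetical stand-ins for the real data, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for a frame already loaded with comma-formatted strings.
df = pd.DataFrame({
    "Farm_Name": ["EM", "IW", "SC"],
    "Total Apples": ["18,327", "635", "8,515"],
    "Good Apples": ["14,176", "486", "6,438"],
})

cols = ["Total Apples", "Good Apples"]
for col in cols:
    # .str works on one Series at a time; regex=False treats "," as a literal.
    df[col] = df[col].str.replace(",", "", regex=False).astype("int64")

print(df.dtypes)
```

The astype call raises if a value still is not a valid integer after the commas are removed, which makes leftover formatting problems visible instead of silently keeping strings.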

Answer 2

Answered by Grr

Try this:

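import locale
# '' selects the user's default locale, which must use ',' as the thousands separator
locale.setlocale(locale.LC_NUMERIC, '')
df = df[['Farm_Name']].join(df[['Total Apples', 'Good Apples']].apply(lambda col: col.map(locale.atof)))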
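
Note that locale.atof only strips commas when the active numeric locale uses ',' as its thousands separator. In the default "C" locale the separator is empty, so a value such as '18,327' raises ValueError. A small sketch for checking this before converting; the helper name thousands_separator is hypothetical:

```python
import locale


def thousands_separator():
    """Return the thousands separator of the user's default numeric locale."""
    try:
        locale.setlocale(locale.LC_NUMERIC, '')
    except locale.Error:
        # The environment names a locale that is not installed; keep the current one.
        pass
    return locale.localeconv()['thousands_sep']


sep = thousands_separator()
print(repr(sep))
```

If this does not print ',', the locale-based conversion will not remove the commas on that machine.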
