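Trying to remove commas and dollar signs with Pandas in Python
Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/38516481/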
Asked by Mark
I am trying to remove the commas and dollar signs from the columns. But when I do, the table still prints with them in there. Is there a different way to remove the commas and dollar signs using a pandas function? I was unable to find anything in the API docs, or maybe I was looking in the wrong place.
import pandas as pd
import pandas_datareader.data as web
players = pd.read_html('http://www.usatoday.com/sports/mlb/salaries/2013/player/p/')
df1 = pd.DataFrame(players[0])
df1.drop(df1.columns[[0,3,4, 5, 6]], axis=1, inplace=True)
df1.columns = ['Player', 'Team', 'Avg_Annual']
df1['Avg_Annual'] = df1['Avg_Annual'].replace(',', '')
print (df1.head(10))
Answered by mechanical_meat
You have to access the str attribute, per http://pandas.pydata.org/pandas-docs/stable/text.html
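# .str.replace works on substrings; regex=False treats '$' as a literal character
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace(',', '', regex=False)
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace('$', '', regex=False)
df1['Avg_Annual'] = df1['Avg_Annual'].astype(int)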
Alternatively:
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace(',', '', regex=False).str.replace('$', '', regex=False).astype(int)
if you want to prioritize saving typing time over readability.
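As a rough illustration of why the call in the question appeared to do nothing, the sketch below contrasts Series.replace with Series.str.replace. The sample values are made up and merely stand in for the scraped salary column.

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped salary column.
s = pd.Series(['$1,000', '$25,500', ','])

# Series.replace matches whole cell values, so only the cell that is
# exactly ',' changes; commas inside longer strings are left alone.
whole = s.replace(',', '')

# Series.str.replace works on substrings inside each cell.
# regex=False treats '$' as a literal character instead of the
# regular-expression end-of-string anchor.
sub = s.str.replace(',', '', regex=False).str.replace('$', '', regex=False)

print(whole.tolist())  # ['$1,000', '$25,500', '']
print(sub.tolist())    # ['1000', '25500', '']
```

Passing regex=False explicitly keeps the behaviour the same on pandas versions where str.replace treats the pattern as a regular expression by default.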
Answered by Hendy
Shamelessly stolen from this answer... but, that answer is only about changing one character and doesn't complete the coolness: since it takes a dictionary, you can replace any number of characters at once, as well as in any number of columns.
# if you want to operate on multiple columns, put them in a list like so:
cols = ['col1', 'col2', ..., 'colN']
# pass them to df.replace(), specifying each char and its replacement;
# with regex=True the '$' must be escaped, or it only matches the end of the string:
df[cols] = df[cols].replace({r'\$': '', ',': ''}, regex=True)
@shivsn caught that you need to use regex=True; you already knew about replace (but didn't show trying to use it on multiple columns, or on both the dollar sign and comma simultaneously).
This answer simply spells out, in one place, the details I found from others, for those like me (e.g. noobs to python and pandas). Hope it's helpful.
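For a concrete picture of the dictionary approach, here is a minimal sketch on a made-up frame. The column names and values are hypothetical, and the final cast to int is an extra step that the answer itself does not include.

```python
import pandas as pd

# Hypothetical frame; the column names are invented for illustration.
df = pd.DataFrame({
    'Player': ['A', 'B'],
    'Salary': ['$1,000', '$25,500'],
    'Bonus': ['$500', '$1,250'],
})

cols = ['Salary', 'Bonus']

# With regex=True each dict key is a regular expression, so '$' has to be
# escaped; an unescaped '$' only matches the end of the string.
df[cols] = df[cols].replace({r'\$': '', ',': ''}, regex=True).astype(int)

print(df)
```

Both symbols are stripped from both columns in a single call, and the Player column is left untouched because it is not listed in cols.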
Answered by BiGYaN
@bernie's answer is spot on for your problem. Here's my take on the general problem of loading numerical data in pandas.
Often the source of the data is a report generated for direct consumption, hence the presence of extra formatting like %, thousands separators, currency symbols, etc. All of these are useful for reading but cause problems for the default parser. My solution is to typecast the column to string, replace these symbols one by one, then cast it back to the appropriate numerical format. Having a boilerplate function which retains only [0-9.] is tempting, but it causes problems where the thousands separator and decimal point are swapped, and also in the case of scientific notation. Here's my code, which I wrap into a function and apply as needed.
df[col] = df[col].astype(str)  # cast to string
# all the string surgery goes in here; .str.replace works on substrings,
# and regex=False treats '$' as a literal character
df[col] = df[col].str.replace('$', '', regex=False)
df[col] = df[col].str.replace(',', '', regex=False)  # assuming ',' is the thousands separator in your locale
df[col] = df[col].str.replace('%', '', regex=False)
df[col] = df[col].astype(float)  # cast back to appropriate type
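One possible way to wrap those steps into a reusable function is sketched below. The name to_number, its symbols parameter, and the sample frame are all hypothetical, and the sketch assumes ',' is the thousands separator and '.' the decimal point.

```python
import pandas as pd

def to_number(series, symbols=('$', ',', '%')):
    """Strip the given formatting symbols and cast the result to float.

    Hypothetical helper: assumes ',' is a thousands separator.
    """
    cleaned = series.astype(str)  # cast to string
    for symbol in symbols:
        # regex=False removes each symbol as a literal substring
        cleaned = cleaned.str.replace(symbol, '', regex=False)
    return cleaned.astype(float)  # cast back to a numeric type

# Made-up sample data for illustration.
df = pd.DataFrame({'price': ['$1,250.50', '$3.00'], 'share': ['12.5%', '7%']})
for col in ['price', 'share']:
    df[col] = to_number(df[col])

print(df)
```

Listing the symbols explicitly, rather than keeping only digits and dots, leaves values such as scientific notation intact and makes the locale assumption visible at the call site.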