Original URL: http://stackoverflow.com/questions/30763351/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Removing space in dataframe python
Asked by jenryb
I am getting an error in my code because I build a DataFrame from a CSV file and then try to access one of its columns by a name that contains a space. I read two columns from the file: CompanyName and QualityIssue. There are three types of quality issues: Equipment Quality, User, and Neither. I run into problems when I try to use df.Equipment Quality, which obviously doesn't work because of the space. I want to take Equipment Quality from the original file and replace the space with an underscore.
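A minimal sketch of why the attribute form fails, using a small hypothetical DataFrame (the values are made up). `df.Equipment Quality` is a Python syntax error. Bracket indexing accepts any column label, spaces included. Attribute access only works once the label is a valid Python identifier:

```python
import pandas as pd

# Hypothetical frame whose column label contains a space.
df = pd.DataFrame({"Equipment Quality": [2, 0], "User": [2, 2]},
                  index=["Customer 3", "Customer 1"])

# df.Equipment Quality  -> SyntaxError: the space ends the attribute name.
# Bracket indexing works with any label:
equipment = df["Equipment Quality"]

# After renaming to a valid identifier, attribute access works too.
df = df.rename(columns={"Equipment Quality": "Equipment_Quality"})
print(df.Equipment_Quality.tolist())  # [2, 0]
```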
Input:
Top Calling Customers, Equipment Quality, User, Neither,
Customer 3, 2, 2, 0,
Customer 1, 0, 2, 1,
Customer 2, 0, 1, 0,
Customer 4, 0, 1, 0,
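The sample input has a space after every comma and a trailing comma on each line. A plain `pd.read_csv` would therefore produce labels such as `' Equipment Quality'` (with a leading space) plus an extra empty column. The following is a minimal sketch, assuming the file looks exactly like the sample above. `skipinitialspace=True` drops the space after each delimiter, and `dropna(axis=1, how="all")` removes the empty trailing column:

```python
import io
import pandas as pd

# The sample input from the question, inlined so the sketch is self-contained.
raw = """Top Calling Customers, Equipment Quality, User, Neither,
Customer 3, 2, 2, 0,
Customer 1, 0, 2, 1,
Customer 2, 0, 1, 0,
Customer 4, 0, 1, 0,
"""

# skipinitialspace=True ignores the space that follows each comma.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# The trailing comma creates an extra, entirely empty column; drop it.
df = df.dropna(axis=1, how="all")

print(list(df.columns))
# ['Top Calling Customers', 'Equipment Quality', 'User', 'Neither']
```

The labels still contain inner spaces at this point. The underscore replacement shown in the answers below is still needed.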
Here is my code:
import pandas as pd

# Get the data. (DataFrame.from_csv has been removed from pandas;
# read_csv with index_col=0 mirrors its default.)
data = pd.read_csv('MYDATA.csv', index_col=0)
# Count rows per (CompanyName, QualityIssue) pair.
byqualityissue = data.groupby(["CompanyName", "QualityIssue"]).size()
# Make a pandas dataframe of the grouped data.
df = pd.DataFrame(byqualityissue)
# Change the formatting of the data to match what I want SpiderPlot to read.
formatted = df.unstack(level=-1)[0]
# Replace NaN values with zero.
formatted = formatted.fillna(0)
# Add a Total column holding the row sums.
includingtotals = pd.concat([formatted,
                             pd.DataFrame(formatted.sum(axis=1), columns=['Total'])],
                            axis=1)
# Sort by Total, largest first (sort_index(by=...) is no longer supported).
sortedtotal = includingtotals.sort_values(by='Total', ascending=False)
sortedtotal.to_csv('byqualityissue.csv')
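For reference, here is a compact sketch of the same reshaping on a few made-up rows. The real layout of MYDATA.csv isn't shown, so the `CompanyName`/`QualityIssue` rows below are an assumption. The sketch uses `unstack(fill_value=0)` instead of a separate NaN replacement. It also cleans the column labels right after the unstack, which is the point where the quality-issue values become column names:

```python
import pandas as pd

# Hypothetical long-format data: one row per reported issue.
data = pd.DataFrame({
    "CompanyName": ["Customer 3", "Customer 3", "Customer 3", "Customer 1", "Customer 1"],
    "QualityIssue": ["Equipment Quality", "User", "User", "User", "Neither"],
})

# Count issues per company; unstack turns the QualityIssue values into columns.
formatted = data.groupby(["CompanyName", "QualityIssue"]).size().unstack(fill_value=0)

# The issue names are now column labels, so clean them here.
formatted.columns = [c.strip().replace(" ", "_") for c in formatted.columns]

# Add a total per company and sort by it, largest first.
formatted["Total"] = formatted.sum(axis=1)
sortedtotal = formatted.sort_values("Total", ascending=False)
print(sortedtotal)
```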
This seems to be a frequently asked question, and I tried many of the suggested solutions, but none of them worked. Here is what I tried:
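import csv
def read_stripped(path='byqualityissue.csv'):
    # Read the CSV and strip whitespace from every cell.
    with open(path, 'r') as f:
        reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        return [[x.strip() for x in row] for row in reader]
# ...and then, on each string: sentence.replace(" ", "_")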
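The `csv` route can work if the underscore replacement is applied to the header row only. Below is a minimal standard-library sketch. The helper name `clean_csv_rows` is hypothetical. The sketch reads from an in-memory string instead of `byqualityissue.csv` so that it runs on its own. With a real file, open it with `newline=''` as the `csv` docs recommend:

```python
import csv
import io

def clean_csv_rows(f):
    """Strip every cell and replace spaces with underscores in the header row."""
    reader = csv.reader(f, delimiter=",", quoting=csv.QUOTE_NONE)
    rows = [[cell.strip() for cell in row] for row in reader]
    if rows:
        rows[0] = [name.replace(" ", "_") for name in rows[0]]
    return rows

sample = "CompanyName, Equipment Quality, User\nCustomer 3, 2, 2\n"
rows = clean_csv_rows(io.StringIO(sample))
print(rows)  # [['CompanyName', 'Equipment_Quality', 'User'], ['Customer 3', '2', '2']]
```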
And
sortedtotal['QualityIssue'] = sortedtotal['QualityIssue'].map(lambda x: x.rstrip(' '))
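A likely reason this attempt fails is that, after `unstack`, the QualityIssue values are no longer a column of `sortedtotal` but its column labels. As a result, `sortedtotal['QualityIssue']` raises a KeyError. One alternative, sketched here with made-up rows and assuming the raw data has a `QualityIssue` column as described in the question, is to clean the values before grouping. The labels produced by `unstack` then already contain underscores:

```python
import pandas as pd

# Hypothetical raw rows with stray spaces around the issue names.
data = pd.DataFrame({
    "CompanyName": ["Customer 3", "Customer 1", "Customer 1"],
    "QualityIssue": [" Equipment Quality", "User ", "Equipment Quality"],
})

# Clean the values while QualityIssue is still an ordinary column.
data["QualityIssue"] = data["QualityIssue"].str.strip().str.replace(" ", "_")

formatted = data.groupby(["CompanyName", "QualityIssue"]).size().unstack(fill_value=0)
print(list(formatted.columns))  # ['Equipment_Quality', 'User']
```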
And the one I thought was most promising, from http://pandas.pydata.org/pandas-docs/stable/text.html:
formatted.columns = formatted.columns.str.strip().str.replace(' ', '_')
but I got this error: AttributeError: 'Index' object has no attribute 'str'
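For what it's worth, current pandas releases do provide a `.str` accessor on a plain `Index`, so the line from the docs runs as written there. The AttributeError suggests an older pandas version that predates this accessor. Here is a quick check on a hypothetical set of labels. Note that `.str` does not work on a `MultiIndex`, so the columns must be a flat `Index`, as they are after the `[0]` selection in the question:

```python
import pandas as pd

# Hypothetical labels like the ones produced by unstack in the question.
columns = pd.Index([" Equipment Quality", "User", "Neither "])

cleaned = columns.str.strip().str.replace(" ", "_")
print(list(cleaned))  # ['Equipment_Quality', 'User', 'Neither']
```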
Thanks for your help in advance!
Answer by Alexander
Try:
formatted.columns = [x.strip().replace(' ', '_') for x in formatted.columns]
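Because this only iterates over the column labels with plain Python string methods, it should also work on older pandas versions that lack `Index.str`. Below is a small usage sketch on a made-up frame. It assumes every label is a string; `.strip()` would fail on, say, an integer label:

```python
import pandas as pd

# Hypothetical frame with padded, space-containing labels.
formatted = pd.DataFrame([[2, 2, 0]], columns=[" Equipment Quality", " User", " Neither"])

# Rebuild the labels with plain str methods, then assign them back.
formatted.columns = [x.strip().replace(" ", "_") for x in formatted.columns]
print(list(formatted.columns))  # ['Equipment_Quality', 'User', 'Neither']
```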
Answer by JBWhitmore
As I understand your question, the following should work (if you want to be careful, first run it with inplace=False, which returns a renamed copy so you can see how it looks):
sortedtotal.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)
And if the column names have whitespace around them, like "This example " (note the trailing space):
sortedtotal.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)
which strips leading/trailing whitespace, then converts internal spaces to "_".
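Below is a short sketch of both `rename` calls on a made-up frame. With the default `inplace=False`, `rename` returns a renamed copy and leaves the original untouched. This is an easy way to preview the result before committing to `inplace=True`. It also shows why the `strip()` matters:

```python
import pandas as pd

# Hypothetical frame; " Neither " has surrounding whitespace.
sortedtotal = pd.DataFrame([[2, 2, 0, 4]],
                           columns=["Equipment Quality", "User", " Neither ", "Total"])

# Preview: returns a new DataFrame; sortedtotal itself is unchanged.
preview = sortedtotal.rename(columns=lambda x: x.replace(" ", "_"))
print(list(preview.columns))  # ['Equipment_Quality', 'User', '_Neither_', 'Total']

# Strip first so surrounding whitespace doesn't turn into underscores.
sortedtotal.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)
print(list(sortedtotal.columns))  # ['Equipment_Quality', 'User', 'Neither', 'Total']
```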

