pandas: removing spaces from DataFrame column names in Python

Question

Asked by jenryb

I am getting an error in my code because I tried to make a dataframe by calling an element from a csv. I call two columns from a file: CompanyName and QualityIssue. There are three types of quality issue: Equipment Quality, User, and Neither. I run into problems when I try to use df.Equipment Quality, which obviously doesn't work because of the space in the name. I want to take Equipment Quality from the original file and replace the space with an underscore.

input:

Top Calling Customers,         Equipment Quality,    User,    Neither,
Customer 3,                      2,           2,        0,
Customer 1,                      0,           2,        1,
Customer 2,                      0,           1,        0,
Customer 4,                      0,           1,        0,

Here is my code:

import numpy as np
import pandas as pd
import pandas.util.testing as tm; tm.N = 3

# Get the data.
data = pd.DataFrame.from_csv('MYDATA.csv')   
# Group the data by calling CompanyName and QualityIssue columns.
byqualityissue = data.groupby(["CompanyName", "QualityIssue"]).size() 
# Make a pandas dataframe of the grouped data.
df = pd.DataFrame(byqualityissue) 
# Change the formatting of the data to match what I want SpiderPlot to read.
formatted = df.unstack(level=-1)[0]  
# Replace NaN values with zero.
formatted[np.isnan(formatted)] = 0 
includingtotals = pd.concat([formatted,pd.DataFrame(formatted.sum(axis=1), 
                             columns=['Total'])], axis=1)
sortedtotal = includingtotals.sort_index(by=['Total'], ascending=[False])
sortedtotal.to_csv('byqualityissue.csv')

This seems to be a frequently asked question and I tried lots of the solutions but they didn't seem to work. Here is what I tried:

with open('byqualityissue.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]
    sentence.replace(" ", "_")

And

sortedtotal['QualityIssue'] = sortedtotal['QualityIssue'].map(lambda x: x.rstrip(' '))

And what I thought was the most promising from here http://pandas.pydata.org/pandas-docs/stable/text.html:

formatted.columns = formatted.columns.str.strip().str.replace(' ', '_')

but I got this error: AttributeError: 'Index' object has no attribute 'str'

Thanks for your help in advance!

Answer 1

Answered by Alexander

Try:

formatted.columns = [x.strip().replace(' ', '_') for x in formatted.columns]
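
A minimal sketch of this approach on a small frame modeled on the question's input. The sample values, the padded header, and the reuse of the name `formatted` are assumptions for illustration, not the asker's real data. Note that the list comprehension assumes every column label is a string.

```python
import pandas as pd

# Illustrative frame modeled on the question's input (values are assumed).
# The first header carries leading whitespace, as a padded CSV header might.
formatted = pd.DataFrame(
    {"   Equipment Quality": [2, 0], "User": [2, 2], "Neither": [0, 1]},
    index=["Customer 3", "Customer 1"],
)

# Strip surrounding whitespace, then replace inner spaces with underscores.
formatted.columns = [x.strip().replace(" ", "_") for x in formatted.columns]

print(list(formatted.columns))

# The cleaned label is a valid Python identifier, so attribute access works.
print(formatted.Equipment_Quality.tolist())
```

Assigning a plain list to `formatted.columns` replaces all labels at once, so it does not depend on the `.str` accessor that raised the error in the question.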

Answer 2

Answered by JBWhitmore

As I understand your question, the following should work (if you want to be careful, run it with inplace=False first to see how the result looks):

sortedtotal.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)
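
The cautious variant mentioned above could look like the following sketch. The small `sortedtotal` frame is a hypothetical stand-in for the one built in the question, and `preview` is a name introduced here only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `sortedtotal` from the question (values are assumed).
sortedtotal = pd.DataFrame({"Equipment Quality": [2, 0], "Total": [4, 3]})

# Without inplace=True, rename returns a new DataFrame and leaves the original untouched.
preview = sortedtotal.rename(columns=lambda x: x.replace(" ", "_"))

print(list(preview.columns))       # renamed labels
print(list(sortedtotal.columns))   # original labels, unchanged
```

Once the preview looks right, either keep working with `preview` or rerun the call with inplace=True.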

And if you have whitespace surrounding the column names, like "This example ":

sortedtotal.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)

which strips leading/trailing whitespace, then converts internal spaces to "_".
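
As a sketch of the strip-then-replace behaviour, assuming a small illustrative frame `df` (not the asker's data): the first part uses `rename` as above, and the second uses the `Index.str` accessor from the question. The `AttributeError` quoted in the question suggests the asker's installed pandas predates that accessor; on versions that have it, both routes should give the same labels.

```python
import pandas as pd

# Illustrative frame; the first label has leading and trailing whitespace.
df = pd.DataFrame({"  Equipment Quality ": [2, 0], "User": [2, 2]})

# Route 1: rename with a callable applied to every column label.
renamed = df.rename(columns=lambda x: x.strip().replace(" ", "_"))

# Route 2: vectorized string methods on the column Index.
# regex=False treats " " as a literal string rather than a pattern.
via_str = df.columns.str.strip().str.replace(" ", "_", regex=False)

print(list(renamed.columns))
print(list(via_str))
```

Stripping first matters: replacing spaces before stripping would turn the padding itself into leading and trailing underscores.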
