pandas: How do I replace whitespace in strings in a pandas DataFrame?

Question

Asked by katus

Suppose I have a pandas dataframe like this:

    Person_1     Person_2     Person_3 
0   John Smith   Jane Smith   Mark Smith 
1   Harry Jones  Mary Jones   Susan Jones

Reproducible form:

import pandas as pd

df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'],
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']],
                  columns=['Person_1', 'Person_2', 'Person_3'])

What is the nicest way to replace the whitespace between the first and last name in each name with an underscore _ to get:

    Person_1     Person_2     Person_3 
0   John_Smith   Jane_Smith   Mark_Smith 
1   Harry_Jones  Mary_Jones   Susan_Jones

Thank you in advance!

Answer 1

Answered by miradulo

I think you could also just opt for DataFrame.replace.

df.replace(' ', '_', regex=True)

Outputs

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones
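
The `regex=True` flag matters here. As a minimal sketch using the question's sample frame (the variable names `unchanged` and `replaced` are only for illustration): without the flag, `DataFrame.replace` compares each cell's whole value to `' '`, so nothing in this frame matches.

```python
import pandas as pd

df = pd.DataFrame(
    [['John Smith', 'Jane Smith', 'Mark Smith'],
     ['Harry Jones', 'Mary Jones', 'Susan Jones']],
    columns=['Person_1', 'Person_2', 'Person_3'],
)

# Without regex=True, only cells whose entire value equals ' ' are replaced,
# so this frame comes back with the same values.
unchanged = df.replace(' ', '_')

# With regex=True, the pattern is applied inside each string.
replaced = df.replace(' ', '_', regex=True)

print(replaced)
```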

From some rough benchmarking, it predictably seems like piRSquared's NumPy solution is indeed the fastest, for this small sample at least, followed by DataFrame.replace.

%timeit df.values[:] = np.core.defchararray.replace(df.values.astype(str), ' ', '_')
10000 loops, best of 3: 78.4 μs per loop

%timeit df.replace(' ', '_', regex=True)
1000 loops, best of 3: 932 μs per loop

%timeit df.stack().str.replace(' ', '_').unstack()
100 loops, best of 3: 2.29 ms per loop

Interestingly, however, it appears that piRSquared's pandas solution scales much better with larger DataFrames than DataFrame.replace, and even outperforms the NumPy solution.

df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith']*10000,
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']*10000])

%timeit df.values[:] = np.core.defchararray.replace(df.values.astype(str), ' ', '_')
10 loops, best of 3: 181 ms per loop

%timeit df.replace(' ', '_', regex=True)
1 loop, best of 3: 4.14 s per loop

%timeit df.stack().str.replace(' ', '_').unstack()
10 loops, best of 3: 99.2 ms per loop
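
For anyone who wants to rerun a comparison like this outside IPython, here is a rough sketch using the standard `timeit` module. The frame size, the helper names (`with_replace`, `with_stack`, `with_numpy`), and the use of the public `np.char.replace` alias instead of `np.core.defchararray.replace` are my assumptions, not part of the original benchmark. Absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 2 rows x 300 columns (smaller than the one above).
wide = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'] * 100,
                     ['Harry Jones', 'Mary Jones', 'Susan Jones'] * 100])


def with_replace(frame):
    # DataFrame.replace with a regex pattern.
    return frame.replace(' ', '_', regex=True)


def with_stack(frame):
    # Reshape to one long Series, replace, then reshape back.
    return frame.stack().str.replace(' ', '_', regex=False).unstack()


def with_numpy(frame):
    # Vectorised string replace on the underlying array.
    values = np.char.replace(frame.to_numpy().astype(str), ' ', '_')
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


candidates = {'replace': with_replace, 'stack': with_stack, 'numpy': with_numpy}

# Check that all approaches agree before comparing their speed.
results = {name: func(wide) for name, func in candidates.items()}

# Average seconds per call over a few runs.
timings = {
    name: timeit.timeit(lambda f=func: f(wide), number=5) / 5
    for name, func in candidates.items()
}

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1e3:.2f} ms per call')
```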

Answer 2

Answered by Serenity

Use the replace method of the DataFrame:

df.replace(r'\s+', '_', regex=True, inplace=True)
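
A whitespace-run pattern behaves differently from a single literal space when names contain double spaces or tabs. The sketch below uses made-up sample data (the `name` column and its values are hypothetical) to show the difference. A raw string (`r'\s+'`) keeps the backslash from being read as a Python string escape.

```python
import pandas as pd

# Hypothetical data: a double space and a tab between the names.
df = pd.DataFrame({'name': ['John  Smith', 'Jane\tSmith']})

# A literal space is replaced one character at a time; the tab is untouched.
single = df.replace(' ', '_', regex=True)

# r'\s+' collapses any run of whitespace (spaces, tabs, ...) into one '_'.
run = df.replace(r'\s+', '_', regex=True)

print(single['name'].tolist())
print(run['name'].tolist())
```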

Answer 3

Answered by piRSquared

`pandas`

stack / unstack with str.replace

df.stack().str.replace(' ', '_').unstack()

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones

`numpy`

pd.DataFrame(
    np.core.defchararray.replace(df.values.astype(str), ' ', '_'),
    df.index, df.columns)

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones
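
In recent NumPy releases the `np.core` namespace is treated as private, so a variant that goes through the public `np.char.replace` alias may be preferable. This is a sketch under that assumption, not the original answer's code. Note that `astype(str)` also turns missing values into the text `'nan'`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['John Smith', 'Jane Smith', 'Mark Smith'],
     ['Harry Jones', 'Mary Jones', 'Susan Jones']],
    columns=['Person_1', 'Person_2', 'Person_3'],
)

# Convert to a fixed-width unicode array; NaN would become the string 'nan'.
as_text = df.to_numpy().astype(str)

# np.char.replace applies str.replace element-wise.
underscored = pd.DataFrame(
    np.char.replace(as_text, ' ', '_'),
    index=df.index,
    columns=df.columns,
)

print(underscored)
```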

time testing

Answer 4

Answered by Aravinda P K

I used the code below to replace whitespace in multiple (specific) columns.

df[['Col1','Col2','Col3']] = df[['Col1','Col2','Col3']].replace(' ', '', regex=True)

df[['Col1','Col2','Col3']] = df[['Col1','Col2','Col3']].replace('','', regex=True)
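
A self-contained sketch of the column-subset idea, with made-up data (the `Other` column and all values are hypothetical), shows that columns outside the selection are left alone:

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['a b', 'c d'],
    'Col2': ['e f', 'g h'],
    'Other': ['keep me', 'as is'],
})

# Only the listed columns are rewritten; names must match the frame exactly,
# including case.
cols = ['Col1', 'Col2']
df[cols] = df[cols].replace(' ', '', regex=True)

print(df)
```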
