Python: strip/trim all strings of a DataFrame

Question

Asked by bold

While cleaning the values of a mixed-type data frame in Python/pandas, I want to trim the strings. I am currently doing it in two statements:

import pandas as pd

df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])

df.replace(r'^\s+', '', regex=True, inplace=True)  # leading whitespace
df.replace(r'\s+$', '', regex=True, inplace=True)  # trailing whitespace

df.values

This is quite slow. What could I improve?

Answer 1

Answered by jezrael

You can use DataFrame.select_dtypes to select the string columns and then apply the function str.strip to them.

Note: the selected columns should not hold values such as dicts or lists, because those columns also have dtype object.

df_obj = df.select_dtypes(['object'])
print (df_obj)
       0
0    a  
1    c  

df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)

   0   1
0  a  10
1  c   5

But if there are only a few columns, use str.strip on each one directly:

df[0] = df[0].str.strip()
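
The note about object columns can be illustrated with a small sketch. When an object column mixes strings with other values, .str.strip() returns NaN for the non-string elements, so the original values have to be put back afterwards. The helper name strip_string_columns and the sample column names are assumptions made for this illustration, not part of the answer:

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def strip_string_columns(df):
    """Strip strings column by column, keeping non-string values intact."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if is_object_dtype(s) or is_string_dtype(s):
            stripped = s.str.strip()  # NaN where the element is not a string
            # Put the original value back wherever strip produced a missing value
            out[col] = stripped.where(stripped.notna(), s)
    return out


# Hypothetical sample data: "mixed" holds a string and an integer
df = pd.DataFrame({"name": ["  a  ", "  c  "], "mixed": ["  x  ", 7], "n": [10, 5]})
result = strip_string_columns(df)
print(result)
```

Numeric columns are skipped entirely, so their dtype is left unchanged.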

Answer 2

Answered by Jonathan B.

Money Shot

Here's a compact version that uses applymap with a straightforward lambda expression to call strip only when the value is a string:

df.applymap(lambda x: x.strip() if isinstance(x, str) else x)

Full Example

A more complete example:

import pandas as pd


def trim_all_columns(df):
    """
    Trim whitespace from ends of each value across all series in dataframe
    """
    trim_strings = lambda x: x.strip() if isinstance(x, str) else x
    return df.applymap(trim_strings)


# simple example of trimming whitespace from data elements
df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
df = trim_all_columns(df)
print(df)


>>>
   0   1
0  a  10
1  c   5
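
A note for newer pandas releases: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, which applies a function element-wise in the same way. Here is a minimal sketch of a helper that works on either side of that change. The name trim_all_cells is made up for this illustration:

```python
import pandas as pd


def trim_all_cells(df):
    """Strip whitespace from every string cell, leaving other values as-is."""
    def strip_if_str(value):
        return value.strip() if isinstance(value, str) else value

    # DataFrame.map exists from pandas 2.1 on; fall back to applymap before that
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip_if_str)
    return df.applymap(strip_if_str)


df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
trimmed = trim_all_cells(df)
print(trimmed)
```

The check is made on the class rather than on the instance so that a column that happens to be named "map" cannot be mistaken for the method.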

Working Example

Here's a working example hosted by trinket: https://trinket.io/python3/e6ab7fb4ab

Answer 3

Answered by Roman Pekar

If you really want to use a regex, then:

>>> df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)
>>> df
   0   1
0  a  10
1  c   5

But it should be faster to do it like this:

>>> df[0] = df[0].str.strip()
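
To check the speed claim on your own data, a rough timing sketch could look like the following. The frame size, the column names, and the repeat count are arbitrary assumptions, and the measured numbers will vary by machine and pandas version:

```python
import timeit

import pandas as pd

# Hypothetical test frame: one padded string column and one numeric column
df = pd.DataFrame({"text": ["  a  ", "  c  "] * 5000, "n": range(10000)})


def with_regex():
    # Single regex pass over the whole frame
    return df.replace(r"^\s+|\s+$", "", regex=True)


def with_str_strip():
    # Vectorized string method on the one string column
    out = df.copy()
    out["text"] = out["text"].str.strip()
    return out


regex_seconds = timeit.timeit(with_regex, number=5)
strip_seconds = timeit.timeit(with_str_strip, number=5)
print(f"regex replace: {regex_seconds:.4f}s, str.strip: {strip_seconds:.4f}s")
```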

Answer 4

Answered by Aakash Makwana

You can try:

df[0] = df[0].str.strip()

or, more generally, for all string columns:

non_numeric_columns = list(set(df.columns)-set(df._get_numeric_data().columns))
# apply passes each column as a Series, so strip through the .str accessor
df[non_numeric_columns] = df[non_numeric_columns].apply(lambda x: x.str.strip())
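
One detail worth spelling out: DataFrame.apply hands each column to the function as a whole Series, so calling str() on it produces a single string (the printed form of the Series) rather than trimming each element. A small sketch of the difference, with variable names chosen only for illustration:

```python
import pandas as pd

col = pd.Series(["  a  ", "  c  "])

# str() on a Series returns one string containing its printed representation
whole_series_as_text = str(col).strip()

# The .str accessor applies strip to every element instead
per_element = col.str.strip()

print(type(whole_series_as_text).__name__)
print(per_element.tolist())
```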

Answer 5

Answered by Dekel

You can use the apply function of the Series object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df[0][0]
'  a  '
>>> df[0] = df[0].apply(lambda x: x.strip())
>>> df[0][0]
'a'

Note the use of strip rather than a regex, which is much faster.

Another option is to use the apply function of the DataFrame object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df.apply(lambda x: x.apply(lambda y: y.strip() if type(y) == type('') else y), axis=0)

   0   1
0  a  10
1  c   5

Answer 6

Answered by hyunwoo jeong

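def trim(x):
    # Only object-dtype columns can hold strings; other columns pass through
    if x.dtype == object:
        # strip removes both leading and trailing whitespace
        x = x.str.strip()
    return x

df = df.apply(trim)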
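
A short sketch of why str.strip is the safer call for trimming than splitting on a single space: a value with leading spaces splits into empty pieces first, and a value with an inner space loses everything after it. The sample values are assumptions chosen to show both cases:

```python
import pandas as pd

s = pd.Series(["  a  ", "b c"])

# Splitting on ' ' and taking the first piece: '  a  ' starts with an empty
# piece, and 'b c' is cut at the inner space
first_piece = s.str.split(" ").str[0]

# strip only removes whitespace at the two ends
stripped = s.str.strip()

print(first_piece.tolist())  # ['', 'b']
print(stripped.tolist())     # ['a', 'b c']
```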
