Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40950310/

Time: 2020-08-20 00:14:42  Source: igfitidea

Strip / trim all strings of a dataframe

Tags: python, regex, pandas, dataframe, trim

Question by bold

While cleaning the values of a mixed-type DataFrame in Python/pandas, I want to trim the strings. I am currently doing it with two statements:

import pandas as pd

df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])

# raw strings keep \s from being treated as a (invalid) string escape
df.replace(r'^\s+', '', regex=True, inplace=True)  # front
df.replace(r'\s+$', '', regex=True, inplace=True)  # end

print(df.values)

This is quite slow. What could I improve?

Answer by jezrael

You can use DataFrame.select_dtypes to select the string (object) columns and then apply the function str.strip to each of them.

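Note: the values must not be of types like dicts or lists, because columns holding them also have dtype object. Such columns would be selected as well, and str.strip turns those non-string values into NaN.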

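import pandas as pd

df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])

# keep only the object (string) columns
df_obj = df.select_dtypes(['object'])
print(df_obj)
# 0    a  
# 1    c  

# strip every selected column and write the results back
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print(df)
#    0   1
# 0  a  10
# 1  c   5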
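The caveat about dicts and lists can be seen in a small sketch (the Series below is illustrative data, not from the question): .str.strip turns a non-string element of an object column into NaN, whereas an isinstance check leaves it unchanged.

```python
import pandas as pd

# Illustrative object column holding a padded string and a list
s = pd.Series(['  a  ', [1, 2]])

# .str.strip() returns NaN for elements that are not strings
via_str = s.str.strip()
print(via_str.tolist())

# An isinstance check strips strings and leaves other objects as they are
via_check = s.map(lambda v: v.strip() if isinstance(v, str) else v)
print(via_check.tolist())
```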

But if there are only a few columns, call str.strip on each one directly:

df[0] = df[0].str.strip()

Answer by Jonathan B.

Money Shot

Here's a compact version that uses applymap with a straightforward lambda expression to call strip only when the value is of string type:

df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
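In pandas 2.1 and later, DataFrame.applymap is deprecated and the same element-wise operation is called DataFrame.map. A minimal sketch that uses whichever of the two the installed pandas version provides (strip_strings is a hypothetical helper name):

```python
import pandas as pd


def strip_strings(value):
    # Strip str values only; numbers, NaN, lists, etc. pass through unchanged
    return value.strip() if isinstance(value, str) else value


df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    trimmed = df.map(strip_strings)
else:
    trimmed = df.applymap(strip_strings)
print(trimmed)
```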

Full Example

A more complete example:

import pandas as pd


def trim_all_columns(df):
    """
    Trim whitespace from ends of each value across all series in dataframe
    """
    trim_strings = lambda x: x.strip() if isinstance(x, str) else x
    return df.applymap(trim_strings)


# simple example of trimming whitespace from data elements
df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
df = trim_all_columns(df)
print(df)

# output:
#    0   1
# 0  a  10
# 1  c   5

Working Example

Here's a working example hosted by trinket: https://trinket.io/python3/e6ab7fb4ab

Answer by Roman Pekar

If you really want to use a regex, you can combine both patterns into a single replace call:

>>> df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)
>>> df
   0   1
0  a  10
1  c   5

But it should be faster to call str.strip on the string column:

>>> df[0] = df[0].str.strip()
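To check the speed claim on your own data, a rough timeit comparison like the following can help. The 100,000-row frame and the function names are assumptions for illustration; actual timings depend on the pandas version and the data, so none are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: 100,000 padded strings plus a numeric column
df = pd.DataFrame({0: ['  a  ', '  c  '] * 50_000, 1: [10, 5] * 50_000})


def with_regex(frame):
    # One regex pass over the whole frame, returning a new DataFrame
    return frame.replace(r'(^\s+|\s+$)', '', regex=True)


def with_str_strip(frame):
    # Strip only the string column, leaving the numeric column alone
    out = frame.copy()
    out[0] = out[0].str.strip()
    return out


for fn in (with_regex, with_str_strip):
    seconds = timeit.timeit(lambda: fn(df), number=5)
    print(f'{fn.__name__}: {seconds:.3f} s for 5 runs')
```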

Answer by Aakash Makwana

You can try:

df[0] = df[0].str.strip()

or, more generally, for all non-numeric columns:

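# columns that are not numeric (_get_numeric_data is a private pandas helper)
non_numeric_columns = list(set(df.columns) - set(df._get_numeric_data().columns))
# strip the values of each column; str(x).strip() would only strip the
# text representation of the whole Series
df[non_numeric_columns] = df[non_numeric_columns].apply(lambda x: x.str.strip())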

Answer by Dekel

You can use the apply method of the Series object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df[0][0]
'  a  '
>>> df[0] = df[0].apply(lambda x: x.strip())
>>> df[0][0]
'a'

Note the use of strip rather than a regex; strip is much faster.
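One practical difference between Series.str.strip and apply(lambda x: x.strip()) is missing values: the .str accessor passes NaN through, while the plain lambda calls .strip() on a float and raises. A small sketch with illustrative data:

```python
import numpy as np
import pandas as pd

s = pd.Series(['  a  ', np.nan, '  c  '])

# The .str accessor skips missing values, so NaN stays NaN
stripped = s.str.strip()
print(stripped.tolist())

# A plain lambda calls .strip() on the float NaN, which raises AttributeError
try:
    s.apply(lambda x: x.strip())
    apply_failed = False
except AttributeError:
    apply_failed = True
print('apply raised AttributeError:', apply_failed)
```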

Another option is to use the apply method of the DataFrame object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df.apply(lambda x: x.apply(lambda y: y.strip() if isinstance(y, str) else y), axis=0)

   0   1
0  a  10
1  c   5

Answer by hyunwoo jeong

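import pandas as pd

def trim(x):
    # only object (string) columns are trimmed; other dtypes pass through
    if x.dtype == object:
        # strip surrounding whitespace; split(' ').str[0] would return ''
        # for values that start with a space, such as '  a  '
        x = x.str.strip()
    return x

df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
df = df.apply(trim)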
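Trimming with str.split(' ').str[0] is not equivalent to str.strip: for a value with a leading space the first piece of the split is an empty string, and for a value with an inner space everything after that space is dropped. A quick check with illustrative values:

```python
import pandas as pd

s = pd.Series(['  a  ', 'new york '])

# First piece of a split on ' ': '' for '  a  ', 'new' for 'new york '
split_first = s.str.split(' ').str[0].tolist()
print(split_first)

# str.strip removes only the surrounding whitespace: 'a', 'new york'
stripped = s.str.strip().tolist()
print(stripped)
```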