Strip / trim all strings of a dataframe
Original source: http://stackoverflow.com/questions/40950310/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by bold
While cleaning the values of a mixed-type data frame in Python/pandas, I want to trim the strings. I am currently doing it in two statements:
import pandas as pd
df = pd.DataFrame([[' a ', 10], [' c ', 5]])
df.replace(r'^\s+', '', regex=True, inplace=True)  # leading whitespace
df.replace(r'\s+$', '', regex=True, inplace=True)  # trailing whitespace
df.values
This is quite slow. What could I improve?
Answered by jezrael
You can use DataFrame.select_dtypes to select the string columns and then apply the function str.strip to them.
Note: the values must not be of types such as dict or list, because columns holding them also have dtype object.
df_obj = df.select_dtypes(['object'])
print (df_obj)
0 a
1 c
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)
0 1
0 a 10
1 c 5
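The note about dicts and lists can be handled with a per-value check. The following is a minimal sketch, not the answerer's code: the helper name strip_object_columns and the column names are made up for illustration, and it assumes the columns of interest have dtype object. It strips only values that are actually strings and leaves everything else untouched.

```python
import pandas as pd

def strip_object_columns(df):
    """Return a copy of df with strings in object columns stripped.

    Hypothetical helper: non-string values (lists, dicts, None, ...)
    found in object columns are returned unchanged.
    """
    out = df.copy()
    # Assumption: the columns to clean are stored with dtype object.
    obj_cols = out.select_dtypes(include=["object"]).columns
    for col in obj_cols:
        out[col] = out[col].map(
            lambda v: v.strip() if isinstance(v, str) else v
        )
    return out

# A column mixing a string and a list, plus a numeric column.
df = pd.DataFrame({"name": [" a ", ["x"]], "n": [10, 5]})
cleaned = strip_object_columns(df)
print(cleaned)
```

The element-wise check is slower than the vectorized .str accessor, so it is probably only worth using when object columns really do hold mixed types.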
But if there are only a few columns, use str.strip directly:
df[0] = df[0].str.strip()
Answered by Jonathan B.
Money Shot
Here's a compact version that uses applymap with a straightforward lambda expression to call strip only when the value is of a string type:
df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
Full Example
A more complete example:
import pandas as pd
def trim_all_columns(df):
    """
    Trim whitespace from ends of each value across all series in dataframe
    """
    trim_strings = lambda x: x.strip() if isinstance(x, str) else x
    return df.applymap(trim_strings)
# simple example of trimming whitespace from data elements
df = pd.DataFrame([[' a ', 10], [' c ', 5]])
df = trim_all_columns(df)
print(df)
>>>
0 1
0 a 10
1 c 5
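Since pandas 2.1, DataFrame.applymap is deprecated in favor of DataFrame.map. The sketch below is one possible way to write the same idea so that it runs on both older and newer versions. The function name trim_all_strings is an assumption for illustration and does not come from the answer.

```python
import pandas as pd

def strip_if_str(value):
    """Strip whitespace from strings; return any other value unchanged."""
    return value.strip() if isinstance(value, str) else value

def trim_all_strings(df):
    """Apply strip_if_str to every cell (hypothetical helper name)."""
    # DataFrame.map exists from pandas 2.1 on; fall back to applymap otherwise.
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip_if_str)
    return df.applymap(strip_if_str)

df = pd.DataFrame([[" a ", 10], [" c ", 5]])
trimmed = trim_all_strings(df)
print(trimmed)
```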
Working Example
Here's a working example hosted by trinket: https://trinket.io/python3/e6ab7fb4ab
Answered by Roman Pekar
If you really want to use regex, then:
>>> df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)
>>> df
0 1
0 a 10
1 c 5
But it should be faster to do it like this:
>>> df[0] = df[0].str.strip()
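The speed claim can be checked on your own data. The following is a rough sketch with timeit, under the assumption of a made-up 10,000-row frame with a column named text. It first confirms that both approaches give the same result and then times each one. No timings are quoted here because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Made-up test data: 10,000 rows with padded strings and a numeric column.
df = pd.DataFrame({"text": ["  a  ", " c "] * 5000, "n": [10, 5] * 5000})

def with_regex(frame):
    # Regex-based trimming of leading and trailing whitespace.
    return frame["text"].replace(r"(^\s+|\s+$)", "", regex=True)

def with_strip(frame):
    # Vectorized string method.
    return frame["text"].str.strip()

# Both approaches should produce identical values.
same = with_regex(df).tolist() == with_strip(df).tolist()

regex_seconds = timeit.timeit(lambda: with_regex(df), number=5)
strip_seconds = timeit.timeit(lambda: with_strip(df), number=5)
print(f"same result: {same}")
print(f"regex: {regex_seconds:.4f}s, str.strip: {strip_seconds:.4f}s")
```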
Answered by Aakash Makwana
You can try:
df[0] = df[0].str.strip()
Or, more generally, for all string columns:
non_numeric_columns = list(set(df.columns) - set(df._get_numeric_data().columns))
# Use the .str accessor: str(x) would stringify the whole column rather than each value
df[non_numeric_columns] = df[non_numeric_columns].apply(lambda x: x.str.strip())
Answered by Dekel
You can use the apply function of the Series object:
>>> df = pd.DataFrame([[' a ', 10], [' c ', 5]])
>>> df[0][0]
' a '
>>> df[0] = df[0].apply(lambda x: x.strip())
>>> df[0][0]
'a'
Note the use of strip rather than the regex; strip is much faster.
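One caveat with calling strip through Series.apply is that it fails as soon as the column contains a missing value, because NaN is a float and has no strip method. The sketch below uses a made-up Series to show the difference from the .str accessor, which passes missing values through.

```python
import numpy as np
import pandas as pd

# Made-up data: two padded strings and one missing value.
s = pd.Series([" a ", np.nan, " c "])

# The .str accessor strips the strings and leaves the missing value alone.
stripped = s.str.strip()

# A plain x.strip() via apply raises AttributeError on the NaN entry.
try:
    s.apply(lambda x: x.strip())
    apply_failed = False
except AttributeError:
    apply_failed = True

print(stripped)
print("apply failed on NaN:", apply_failed)
```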
Another option is to use the apply function of the DataFrame object:
>>> df = pd.DataFrame([[' a ', 10], [' c ', 5]])
>>> df.apply(lambda x: x.apply(lambda y: y.strip() if type(y) == type('') else y), axis=0)
0 1
0 a 10
1 c 5
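Answered by hyunwoo jeong
def trim(x):
    # Strip only object (string) columns; splitting on ' ' and taking
    # the first piece would return '' for values with a leading space
    if x.dtype == object:
        x = x.str.strip()
    return x
df = df.apply(trim)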