Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original URL and author information. Original StackOverflow question: http://stackoverflow.com/questions/50134687/

Date: 2020-09-14 05:31:21  Source: igfitidea

Filter all rows that do not contain letters (alpha) in pandas

Tags: python, regex, python-2.7, pandas, dataframe

Asked by owwoow14

I am trying to filter a pandas DataFrame using regular expressions. I want to delete the rows that do not contain any letters. For example:

Col A.
50000
7848
dog
cat 583
rabbit 444

My desired result is:

Col A.
dog
cat 583
rabbit 444

I have been trying, without success, to solve this problem with the regex and pandas filter options. See below. I run into problems specifically when I try to combine two conditions in the filter. How can I achieve this?

Option 1:

df['Col A.'] = ~df['Col A.'].filter(regex='\d+')

Option 2

df['Col A.'] = df['Col A.'].filter(regex=\w+)

Option 3

from string import digits, letters
df['Col A.'] = (df['Col A.'].filter(regex='|'.join(letters)))

OR

df['Col A.'] = ~(df['Col A.'].filter(regex='|'.join(digits)))

OR

df['Col A.'] = df[~(df['Col A.'].filter(regex='|'.join(digits))) & (df['Col A.'].filter(regex='|'.join(letters)))]

Answered by jezrael

I think you need str.contains to select the values that contain letters, by means of boolean indexing:

df = df[df['Col A.'].str.contains('[A-Za-z]')]
print(df)
       Col A.
2         dog
3     cat 583
4  rabbit 444

If the column contains NaN values, you can pass the na parameter so that they count as non-matches:

df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print(df)
       Col A.
3         dog
4     cat 583
5  rabbit 444
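
For readers who want to try this end to end, here is a minimal self-contained sketch. The sample data is hypothetical: it is modelled on the question, with one missing value added so that the effect of na=False is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question, plus one missing value.
df = pd.DataFrame(
    {'Col A.': ['50000', '7848', np.nan, 'dog', 'cat 583', 'rabbit 444']}
)

# str.contains gives a missing result for missing values;
# na=False makes those rows count as "no match" instead.
mask = df['Col A.'].str.contains('[A-Za-z]', na=False)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

The original row labels are preserved, so the surviving rows keep the positions they had before filtering.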

Answered by Samuel GIFFARD

Have you tried:

df['Col A.'].filter(regex=r'\D')  # Keeps only if there's a non-digit character

or:

df['Col A.'].filter(regex=r'[A-Za-z]')  # Keeps only if there's a letter (alpha)

or:

df['Col A.'].filter(regex=r'[^\W\d_]')  # More info in the link below...

Explanation: https://stackoverflow.com/a/2039476/8933502
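
One caveat worth checking before relying on the calls above: filter() on a Series or DataFrame selects by index or column label, not by cell value, so filter(regex=...) would be matched against the row labels 0, 1, 2, ... rather than the strings in the column. A minimal sketch, using hypothetical sample data, that applies the same three patterns to the values through str.contains instead:

```python
import pandas as pd

# Hypothetical sample data modelled on the question.
s = pd.Series(['50000', '7848', 'dog', 'cat 583', 'rabbit 444'], name='Col A.')

# filter() matches the regex against the index labels (0..4 here),
# not the values, so a letter pattern selects nothing.
by_label = s.filter(regex=r'[A-Za-z]')

# str.contains() tests the values themselves.
non_digit = s[s.str.contains(r'\D')]           # at least one non-digit character
ascii_letter = s[s.str.contains(r'[A-Za-z]')]  # at least one ASCII letter
any_letter = s[s.str.contains(r'[^\W\d_]')]    # a word character that is not a digit or underscore

print(by_label.empty)
print(ascii_letter)
```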

Answered by Vaidic

df['Col A.'].str.contains(r'^\d+$', na=True)  # True for digit-only strings; int/float values give NaN, which na=True converts to True

For example, [50000, '$927848', 'dog', 'cat 583', 'rabbit 444', '3 e 3', 'e 3', '33', '3 e'] gives [True, False, False, False, False, False, False, True, False]
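
The mask in this answer is True for the rows to discard (digit-only strings and non-string values), so keeping the remaining rows presumably means negating it. A minimal sketch using the example values from this answer; the variable names are only illustrative:

```python
import pandas as pd

# Example values from the answer above; 50000 is an int, the rest are strings.
values = [50000, '$927848', 'dog', 'cat 583', 'rabbit 444',
          '3 e 3', 'e 3', '33', '3 e']
df = pd.DataFrame({'Col A.': values})

# True for digit-only strings; non-string values give NaN,
# which na=True turns into True.
digits_only = df['Col A.'].str.contains(r'^\d+$', na=True)

# Negate the mask to keep every row that is not purely numeric.
kept = df[~digits_only]
print(kept)
```

Note that this keeps '$927848', which has no letters, so it answers "not purely digits" rather than "contains a letter".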

Answered by Benoît Zu

You can use ^.*[a-zA-Z].*$

https://regex101.com/r/b84ji1/1

Details

^: Start of the line

.*: Match any character, zero or more times

[a-zA-Z]: Match one letter

$: End of the line
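
As a rough sketch of how this pattern could be applied in pandas (the sample data and variable names are hypothetical): str.contains searches anywhere in the string, so for single-line strings the anchored pattern should select the same rows as the bare character class [a-zA-Z].

```python
import pandas as pd

# Hypothetical sample data modelled on the question.
df = pd.DataFrame({'Col A.': ['50000', '7848', 'dog', 'cat 583', 'rabbit 444']})

# Anchored pattern from this answer: any characters, one letter, any characters.
anchored = df['Col A.'].str.contains(r'^.*[a-zA-Z].*$')

# Unanchored search for a single letter anywhere in the string.
bare = df['Col A.'].str.contains(r'[a-zA-Z]')

print(df[anchored])
```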