Filter all rows that do not contain letters (alpha) in pandas
Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/50134687/
Asked by owwoow14
I am trying to filter a pandas dataframe using regular expressions. I want to delete those rows that do not contain any letters. For example:
Col A.
50000
7848
dog
cat 583
rabbit 444
My desired result is:
Col A.
dog
cat 583
rabbit 444
I have been trying, unsuccessfully, to solve this problem with regex and pandas filter options. See below. I am specifically running into problems when I try to merge two conditions for the filter. How can I achieve this?
Option 1:
df['Col A.'] = ~df['Col A.'].filter(regex='\d+')
Option 2:
df['Col A.'] = df['Col A.'].filter(regex=\w+)
Option 3:
from string import digits, letters
df['Col A.'] = (df['Col A.'].filter(regex='|'.join(letters)))
OR
df['Col A.'] = ~(df['Col A.'].filter(regex='|'.join(digits)))
OR
df['Col A.'] = df[~(df['Col A.'].filter(regex='|'.join(digits))) & (df['Col A.'].filter(regex='|'.join(letters)))]
Answered by jezrael
I think you need str.contains to filter values which contain letters by means of boolean indexing:
df = df[df['Col A.'].str.contains('[A-Za-z]')]
print (df)
Col A.
2 dog
3 cat 583
4 rabbit 444
If there are some NaN values, you can pass the na parameter:
df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print (df)
Col A.
3 dog
4 cat 583
5 rabbit 444
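For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is my own assumption (the question's values plus one NaN); only str.contains and boolean indexing come from the answer above.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question, plus one missing value (an assumption).
df = pd.DataFrame(
    {"Col A.": ["50000", "7848", np.nan, "dog", "cat 583", "rabbit 444"]}
)

# True where the value contains at least one ASCII letter.
# na=False makes missing values count as "no letter" instead of propagating NaN.
has_letter = df["Col A."].str.contains("[A-Za-z]", na=False)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[has_letter]
print(filtered)
```

Without na=False the mask would contain NaN for the missing row, and indexing with such a mask typically raises an error, which is why the parameter matters here.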
Answered by Samuel GIFFARD
Have you tried:
df['Col A.'].filter(regex=r'\D') # Keeps only if there's a non-digit character
or:
df['Col A.'].filter(regex=r'[A-Za-z]') # Keeps only if there's a letter (alpha)
or:
df['Col A.'].filter(regex=r'[^\W\d_]') # More info in the link below...
Explanation: https://stackoverflow.com/a/2039476/8933502
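One caveat worth checking before relying on the snippets above: as far as I know, Series.filter matches its regex against the index labels, not the values, so on a default integer index it selects nothing. A hedged sketch of the same three patterns applied to the values through str.contains follows. The sample series (including the "ñ 12" entry) and the object dtype are my assumptions, chosen so that Python's re module does the matching.

```python
import pandas as pd

# Sample values modelled on the question; "ñ 12" is added to show a non-ASCII letter.
s = pd.Series(
    ["50000", "7848", "dog", "cat 583", "rabbit 444", "ñ 12"],
    dtype=object,
    name="Col A.",
)

# Series.filter looks at the index labels (0, 1, 2, ...), not at the values,
# so a letter pattern finds nothing on this default integer index.
by_label = s.filter(regex=r"[A-Za-z]")

# The same patterns applied to the values:
non_digit = s[s.str.contains(r"\D")]           # any non-digit character, spaces included
ascii_letter = s[s.str.contains(r"[A-Za-z]")]  # ASCII letters only
any_letter = s[s.str.contains(r"[^\W\d_]")]    # letters in any script

print(by_label.empty)
print(ascii_letter.tolist())
print(any_letter.tolist())
```

Note that \D also keeps values such as "12 34" or "$5", since a space or a symbol is a non-digit, so it is looser than "contains a letter".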
Answered by Vaidic
df['Col A.'].str.contains(r'^\d+$', na=True)
# True if the string consists only of digits; an int/float value yields NaN, which na=True converts to True
e.g. [50000, '$927848', 'dog', 'cat 583', 'rabbit 444', '3 e 3', 'e 3', '33', '3 e'] will give: [True, False, False, False, False, False, False, True, False]
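The mask above marks the rows to drop, so it has to be inverted to get the rows to keep. A minimal sketch, assuming an object-dtype Series built from the example list (the names digits_only and kept are mine):

```python
import pandas as pd

values = [50000, "$927848", "dog", "cat 583", "rabbit 444", "3 e 3", "e 3", "33", "3 e"]
s = pd.Series(values, dtype=object, name="Col A.")

# True for digit-only strings; non-string values become NaN, which na=True turns into True.
digits_only = s.str.contains(r"^\d+$", na=True)

# Cast to bool before inverting so that ~ is a logical NOT, then keep the remaining rows.
kept = s[~digits_only.astype(bool)]
print(kept.tolist())
```

This drops purely numeric rows, not rows without letters, so a value like '$927848' survives even though it contains no letters.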
Answered by Benoît Zu
You can use ^.*[a-zA-Z].*$
https://regex101.com/r/b84ji1/1
Details
^ : Start of the line
.* : Match any character, zero or more times
[a-zA-Z] : Match letters
$ : End of the line
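To tie the pattern back to pandas, here is a rough sketch of how it could be applied. It assumes str.match for the anchored form and compares it with the shorter str.contains('[a-zA-Z]'). The sample series is my own, taken from the question's values.

```python
import pandas as pd

s = pd.Series(
    ["50000", "7848", "dog", "cat 583", "rabbit 444"], dtype=object, name="Col A."
)

# str.match anchors at the start of each value, so the full-line pattern fits it.
full_line = s.str.match(r"^.*[a-zA-Z].*$")

# str.contains searches anywhere in the value, so the bare character class is enough.
search_only = s.str.contains(r"[a-zA-Z]")

# For single-line values like these, both masks select the same rows.
same = full_line.equals(search_only)
result = s[full_line]
print(same)
print(result.tolist())
```

The two forms can differ when a value contains a newline, because . does not match newlines by default. For plain single-line cells the shorter pattern should be sufficient.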