Filter out all rows in pandas that do not contain letters (alpha)

Question

Asked by owwoow14

I am trying to filter a pandas DataFrame using regular expressions. I want to delete the rows that do not contain any letters. For example:

Col A.
50000
7848
dog
cat 583
rabbit 444

My desired result is:

Col A.
dog
cat 583
rabbit 444

I have been trying, unsuccessfully, to solve this with regex and the pandas filter options; see below. I run into problems specifically when I try to combine two conditions in the filter. How can I achieve this?

Option 1:

df['Col A.'] = ~df['Col A.'].filter(regex=r'\d+')

Option 2:

df['Col A.'] = df['Col A.'].filter(regex=r'\w+')

Option 3:

from string import digits, letters
df['Col A.'] = (df['Col A.'].filter(regex='|'.join(letters)))

OR

df['Col A.'] = ~(df['Col A.'].filter(regex='|'.join(digits)))

OR

df['Col A.'] = df[~(df['Col A.'].filter(regex='|'.join(digits))) & (df['Col A.'].filter(regex='|'.join(letters)))]
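
A likely reason none of these options behave as intended: according to the pandas documentation, Series.filter and DataFrame.filter select by index labels and do not look at the cell contents. The short sketch below contrasts the two; the index labels ('row1', '42', 'x') are invented purely for illustration.

```python
import pandas as pd

# Hypothetical index labels, chosen so that labels and values disagree.
s = pd.Series(['50000', 'dog', 'cat 583'], index=['row1', '42', 'x'])

# Series.filter applies the regex to the index labels, not the values:
# 'row1' and 'x' contain letters, so their values are kept.
by_label = s.filter(regex=r'[A-Za-z]')
print(by_label.tolist())

# Testing the values themselves needs the .str accessor and a boolean mask.
by_value = s[s.str.contains(r'[A-Za-z]')]
print(by_value.tolist())
```

Here by_label keeps '50000' and 'cat 583' (because of their labels), while by_value keeps 'dog' and 'cat 583' (because of their contents).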

Answer 1

Answered by jezrael

I think you need str.contains to select the values that contain letters, combined with boolean indexing:

df = df[df['Col A.'].str.contains('[A-Za-z]')]
print(df)
       Col A.
2         dog
3     cat 583
4  rabbit 444
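
A self-contained version of the snippet above, assuming the question's sample values are stored as strings in a column named 'Col A.':

```python
import pandas as pd

# Sample data from the question (assumed to be strings).
df = pd.DataFrame({'Col A.': ['50000', '7848', 'dog', 'cat 583', 'rabbit 444']})

# Boolean mask: True where the value contains at least one ASCII letter.
has_letter = df['Col A.'].str.contains('[A-Za-z]')

# Boolean indexing keeps only the rows where the mask is True.
df = df[has_letter]
print(df)
```

This should print the same three rows as above, with their original index labels 2, 3 and 4 preserved.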

If the column contains NaN values, pass na=False so those rows are treated as not matching:

df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print(df)
       Col A.
3         dog
4     cat 583
5  rabbit 444
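
A sketch of why na=False matters, assuming a hypothetical column where the numbers were read in as ints and one value is missing. For non-string cells, str.contains produces NaN instead of True/False, and a mask containing NaN cannot be used for boolean indexing; na=False replaces those entries with False.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed column: ints, strings and a missing value.
df = pd.DataFrame({'Col A.': [50000, 7848, 'dog', 'cat 583', 'rabbit 444', np.nan]})

# Non-string cells (the ints and NaN) become False instead of NaN.
mask = df['Col A.'].str.contains('[A-Za-z]', na=False)
print(mask.tolist())

df = df[mask]
print(df['Col A.'].tolist())
```

The mask should come out as [False, False, True, True, True, False], leaving only 'dog', 'cat 583' and 'rabbit 444'.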

Answer 2

Answered by Samuel GIFFARD

Have you tried:

df['Col A.'].filter(regex=r'\D')  # Keeps only if there's a non-digit character

or:

df['Col A.'].filter(regex=r'[A-Za-z]')  # Keeps only if there's a letter (alpha)

or:

df['Col A.'].filter(regex=r'[^\W\d_]')  # More info in the link below...

Explanation: https://stackoverflow.com/a/2039476/8933502
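
The three patterns above can also be applied to the cell values with str.contains (Series.filter matches them against the index labels instead). The sketch below uses sample values, partly invented ('$927', 'ü 12'), to show how the patterns differ. It sets dtype=object on the assumption that this makes pandas use Python's re module, where \w is Unicode-aware; string dtypes backed by another regex engine may treat non-ASCII letters differently.

```python
import pandas as pd

# dtype=object: matching is done by Python's re, so 'ü' counts as a word character.
s = pd.Series(['50000', '7848', 'dog', 'cat 583', 'rabbit 444', '$927', 'ü 12'], dtype=object)

patterns = {
    'non-digit': r'\D',           # any character that is not a digit
    'ascii letter': r'[A-Za-z]',  # ASCII letters only
    'any letter': r'[^\W\d_]',    # word character that is not a digit or underscore
}

# Keep the values in which each pattern finds at least one match.
results = {name: s[s.str.contains(pat)].tolist() for name, pat in patterns.items()}
for name, kept in results.items():
    print(name, kept)
```

With these values, r'\D' also keeps '$927' (the '$' is a non-digit), r'[A-Za-z]' drops 'ü 12', and r'[^\W\d_]' keeps 'ü 12' but drops '$927'.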

Answer 3

Answered by Vaidic

df['Col A.'].str.contains(r'^\d+$', na=True)  # True for strings made only of digits; an int/float gives NaN, which na=True turns into True

For example, [50000, '$927848', 'dog', 'cat 583', 'rabbit 444', '3 e 3', 'e 3', '33', '3 e'] gives: [True, False, False, False, False, False, False, True, False]
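
A minimal sketch of using this mask to drop the rows, assuming the example values above sit in a plain Series; the mask marks rows to remove, so it is inverted with ~ before indexing.

```python
import pandas as pd

# Example values from the answer: one int plus several strings.
s = pd.Series([50000, '$927848', 'dog', 'cat 583', 'rabbit 444', '3 e 3', 'e 3', '33', '3 e'])

# True for digit-only strings and for non-strings (their NaN is filled with True).
digits_only = s.str.contains(r'^\d+$', na=True)

# Invert the mask to keep everything that is not purely digits.
kept = s[~digits_only]
print(kept.tolist())
```

Note that this pattern removes digit-only values rather than requiring a letter, so '$927848' is kept even though it contains no letters; a letter test such as '[A-Za-z]' would drop it.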

Answer 4

Answered by Benoît Zu

You can use ^.*[a-zA-Z].*$

https://regex101.com/r/b84ji1/1

Details

^: Start of the line

.*: Match any character (except a newline), zero or more times

[a-zA-Z]: Match one ASCII letter

$: End of the line
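
One way to apply this pattern in pandas is str.match, which anchors at the start of each value. The sketch below, using the question's sample values as strings, checks that for single-line values it selects the same rows as an unanchored search for a single letter.

```python
import pandas as pd

s = pd.Series(['50000', '7848', 'dog', 'cat 583', 'rabbit 444'])

# The pattern's own ^ and $ make it cover the whole value.
anchored = s.str.match(r'^.*[a-zA-Z].*$')

# Unanchored search for at least one letter.
unanchored = s.str.contains(r'[a-zA-Z]')

print(s[anchored].tolist())
print((anchored == unanchored).all())
```

Because . does not match a newline, the two can disagree on multi-line values (for example 'abc\n1' fails the anchored pattern but contains letters), so the unanchored form is the simpler choice for a plain "contains a letter" test.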
