Original source: http://stackoverflow.com/questions/41689722/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
How to select rows that do not start with some str in pandas?
Asked by running man
I want to select the rows whose values do not start with certain strings. For example, I have a pandas df, and I want to select the rows that do not start with t or c. In this sample, the output should be mext1 and okl1.
import pandas as pd
df=pd.DataFrame({'col':['text1','mext1','cext1','okl1']})
df
col
0 text1
1 mext1
2 cext1
3 okl1
I want this:
col
0 mext1
1 okl1
Answered by Ted Petrou
You can use the str accessor to get string functionality. The get method can grab the character at a given index of each string.
df[~df.col.str.get(0).isin(['t', 'c'])]
col
1 mext1
3 okl1
It looks like you can also use startswith with a tuple (not a list) of the values you want to exclude.
df[~df.col.str.startswith(('t', 'c'))]
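The filters above keep the original row labels (1 and 3), while the output requested in the question is numbered 0 and 1. A minimal sketch, assuming a fresh index is actually wanted, chains reset_index(drop=True) after the filter; the variable names mask and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col': ['text1', 'mext1', 'cext1', 'okl1']})

# True for rows whose value starts with any prefix in the tuple
mask = df['col'].str.startswith(('t', 'c'))

# Keep the non-matching rows, then renumber them from 0
result = df[~mask].reset_index(drop=True)
print(result)
```

Without drop=True, the old labels would be kept as an extra column named index.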
Answered by piRSquared
option 1
use str.match and a negative lookahead
df[df.col.str.match('^(?![tc])')]
option 2
within query
df.query('col.str[0] not in list("tc")')
option 3
numpy broadcasting
df[~(df.col.str[0].values[:, None] == ['t', 'c']).any(axis=1)]
col
1 mext1
3 okl1
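To make the broadcasting in option 3 easier to follow, here is a minimal sketch that spells out the intermediate shapes. The names first, prefixes, and hits are illustrative and not part of the original answer. The comparison is negated because the goal is to keep the rows that do not start with t or c.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['text1', 'mext1', 'cext1', 'okl1']})

# First character of every value as a 1-D NumPy array, shape (4,)
first = df['col'].str[0].to_numpy()

# Prefixes to exclude, shape (2,)
prefixes = np.array(['t', 'c'], dtype=object)

# (4, 1) compared with (2,) broadcasts to a (4, 2) boolean table:
# one row per value, one column per prefix
hits = first[:, None] == prefixes

# A row is dropped if any column is True
result = df[~hits.any(axis=1)]
print(result)
```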
time testing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from string import ascii_lowercase
from timeit import timeit
def ted(df):
    return df[~df.col.str.get(0).isin(['t', 'c'])]
def adele(df):
    return df[~df['col'].str.startswith(('t', 'c'))]
def yohanes(df):
    return df[df.col.str.contains('^[^tc]')]
def pir1(df):
    return df[df.col.str.match('^(?![tc])')]
def pir2(df):
    return df.query('col.str[0] not in list("tc")')
def pir3(df):
    # negate the broadcast comparison so rows starting with t or c are dropped
    return df[~(df.col.str[0].values[:, None] == ['t', 'c']).any(axis=1)]
functions = pd.Index(['ted', 'adele', 'yohanes', 'pir1', 'pir2', 'pir3'], name='Method')
lengths = pd.Index([10, 100, 1000, 5000, 10000], name='Length')
# float dtype so the timings can be plotted directly
results = pd.DataFrame(index=lengths, columns=functions, dtype=float)
for i in lengths:
    a = np.random.choice(list(ascii_lowercase), i)
    df = pd.DataFrame(dict(col=a))
    for j in functions:
        # .at replaces set_value, which newer pandas versions no longer provide
        results.at[i, j] = timeit(
            '{}(df)'.format(j),
            'from __main__ import df, {}'.format(j),
            number=1000
        )
fig, axes = plt.subplots(3, 1, figsize=(8, 12))
results.plot(ax=axes[0], title='All Methods')
results.drop(columns='pir2').plot(ax=axes[1], title='Drop `pir2`')
results[['ted', 'adele', 'pir3']].plot(ax=axes[2], title='Just the fast ones')
fig.tight_layout()
Answered by ade1e
You can use str.startswith and negate it.
df[~df['col'].str.startswith('t') &
~df['col'].str.startswith('c')]
col
1 mext1
3 okl1
Or the better option, with multiple characters in a tuple as per @Ted Petrou:
df[~df['col'].str.startswith(('t','c'))]
col
1 mext1
3 okl1
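Because startswith accepts a tuple, the same idea extends to prefixes longer than one character, which the first-character approaches cannot handle. The sketch below wraps it in a hypothetical helper, drop_rows_starting_with (the name and signature are made up for illustration), and passes na=False so that missing values, if any, count as non-matching and are kept.

```python
import pandas as pd

def drop_rows_starting_with(frame, column, prefixes):
    """Return the rows of frame whose column does not start with any prefix."""
    # str.startswith needs a tuple, so convert any iterable of prefixes
    mask = frame[column].str.startswith(tuple(prefixes), na=False)
    return frame[~mask]

df = pd.DataFrame({'col': ['text1', 'mext1', 'cext1', 'okl1', None]})

# 'te' is a two-character prefix; the missing value is treated as a non-match
result = drop_rows_starting_with(df, 'col', ['te', 'c'])
print(result)
```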
Answered by Yohanes Gultom
Just another alternative in case you prefer regex:
df[df.col.str.contains('^[^tc]')]
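One subtle difference between the regex variants is worth checking on your own data: the character class [^tc] has to consume a first character, so it rejects empty strings, whereas a negative lookahead or a negated startswith keeps them. A small sketch, using a made-up Series that contains an empty string (object dtype is forced only to keep the regex handling in plain Python):

```python
import pandas as pd

s = pd.Series(['text1', 'mext1', '', 'cext1'], dtype=object)

# Needs at least one character that is not t or c, so '' is rejected
keep_char_class = s.str.contains('^[^tc]')

# Only asserts that the next character is not t or c, so '' is accepted
keep_lookahead = s.str.match('^(?![tc])')

# '' does not start with t or c, so it is kept as well
keep_startswith = ~s.str.startswith(('t', 'c'))

print(pd.DataFrame({
    'value': s,
    'char_class': keep_char_class,
    'lookahead': keep_lookahead,
    'startswith': keep_startswith,
}))
```

If the column never holds empty strings, the three masks agree.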
Answered by Yantao Xie
You can use the apply method.
Taking your question as an example, the code looks like this:
df[df['col'].apply(lambda x: x[0] not in ['t', 'c'])]
I think apply is a more general and flexible method.
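One caveat with indexing x[0] inside the lambda is that it raises an IndexError on an empty string and a TypeError on non-string values. A possible variant, assuming such values should simply be kept, moves the check into a small function; keep_value and its prefixes parameter are hypothetical names used only for this sketch.

```python
import pandas as pd

def keep_value(value, prefixes=('t', 'c')):
    """Return True unless value is a string starting with one of prefixes."""
    # Non-strings and empty strings never match a prefix, so they are kept
    return not (isinstance(value, str) and value.startswith(prefixes))

df = pd.DataFrame({'col': ['text1', 'mext1', '', 'cext1', 'okl1']})

result = df[df['col'].apply(keep_value)]
print(result)
```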


