pandas: How to select rows that do not start with certain strings?

Question

Asked by running man

I want to select the rows whose values do not start with certain strings. For example, I have a pandas df and want to keep only the rows that do not start with t or c. In this sample, the output should be mext1 and okl1.

import pandas as pd

df=pd.DataFrame({'col':['text1','mext1','cext1','okl1']})
df

    col
0   text1
1   mext1
2   cext1
3   okl1

I want this:

    col
0   mext1
1   okl1

Answer 1

Answered by Ted Petrou

You can use the str accessor to get string functionality. The get method grabs the character at a given index of each string.

df[~df.col.str.get(0).isin(['t', 'c'])]

     col
1  mext1
3   okl1

It looks like you can also use startswith with a tuple (not a list) of the values you want to exclude.

df[~df.col.str.startswith(('t', 'c'))]
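As a minimal, self-contained sketch of the tuple form above: the second half assumes a column that may contain missing values, which the question does not mention. There the `na=False` argument of `str.startswith` keeps the mask boolean so that `~` can still negate it. The names `prefixes`, `with_missing`, and `safe_mask` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col': ['text1', 'mext1', 'cext1', 'okl1']})

# Prefixes to exclude, passed as a tuple (not a list).
prefixes = ('t', 'c')

# True where the value starts with any of the prefixes.
mask = df['col'].str.startswith(prefixes)
result = df[~mask]
print(result)

# Assumption: the column may contain missing values.
# na=False treats a missing value as "does not start with the prefix",
# so the mask stays boolean and the row is kept after negation.
with_missing = pd.DataFrame({'col': ['text1', None, 'okl1']})
safe_mask = with_missing['col'].str.startswith(prefixes, na=False)
result_missing = with_missing[~safe_mask]
print(result_missing)
```

Whether rows with missing values should be kept or dropped depends on the data; passing `na=True` instead would drop them.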

Answer 2

Answered by piRSquared

option 1
use str.match and a negative lookahead

df[df.col.str.match('^(?![tc])')]

option 2
within query

df.query('col.str[0] not in list("tc")')

option 3
numpy broadcasting

df[~(df.col.str[0].values[:, None] == ['t', 'c']).any(1)]

         col
1  mext1
3   okl1
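The broadcasting option can be unpacked step by step. The sketch below is one possible spelling, not necessarily the answerer's exact code: it converts the first characters to a NumPy array before adding the extra axis, and the intermediate names (`first`, `excluded`, `hits`) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['text1', 'mext1', 'cext1', 'okl1']})

# First character of every value as a 1-D NumPy array, shape (4,).
first = df['col'].str[0].to_numpy(dtype=object)

# Characters to exclude, shape (2,).
excluded = np.array(['t', 'c'], dtype=object)

# Adding an axis gives shape (4, 1); comparing against shape (2,)
# broadcasts to a (4, 2) boolean table: one row per value, one column
# per excluded character.
hits = first[:, None] == excluded

# A row is excluded when any of its columns is True.
starts_with_excluded = hits.any(axis=1)

# Negate the mask to keep only rows that start with neither character.
result = df[~starts_with_excluded]
print(result)
```

Without the `~`, the mask selects the rows that do start with t or c, which is the opposite of what the question asks for.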

time testing

def ted(df):
    return df[~df.col.str.get(0).isin(['t', 'c'])]

def adele(df):
    return df[~df['col'].str.startswith(('t','c'))]

def yohanes(df):
    return df[df.col.str.contains('^[^tc]')]

def pir1(df):
    return df[df.col.str.match('^(?![tc])')]

def pir2(df):
    return df.query('col.str[0] not in list("tc")')

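def pir3(df):
    # Return the result and negate the mask so that rows starting with t or c are dropped
    return df[~(df.col.str[0].values[:, None] == ['t', 'c']).any(1)]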

import numpy as np
import matplotlib.pyplot as plt
from string import ascii_lowercase
from timeit import timeit

functions = pd.Index(['ted', 'adele', 'yohanes', 'pir1', 'pir2', 'pir3'], name='Method')
lengths = pd.Index([10, 100, 1000, 5000, 10000], name='Length')
# Float dtype so the timings can be plotted directly
results = pd.DataFrame(index=lengths, columns=functions, dtype=float)

for i in lengths:
    # Random single-letter strings as test data
    a = np.random.choice(list(ascii_lowercase), i)
    df = pd.DataFrame(dict(col=a))
    for j in functions:
        # .at sets a single cell (DataFrame.set_value is no longer available in recent pandas)
        results.at[i, j] = timeit(
            '{}(df)'.format(j),
            'from __main__ import df, {}'.format(j),
            number=1000
        )

fig, axes = plt.subplots(3, 1, figsize=(8, 12))
results.plot(ax=axes[0], title='All Methods')
results.drop(columns='pir2').plot(ax=axes[1], title='Drop `pir2`')
results[['ted', 'adele', 'pir3']].plot(ax=axes[2], title='Just the fast ones')
fig.tight_layout()

Answer 3

Answered by ade1e

You can use str.startswith and negate it.

df[~df['col'].str.startswith('t') &
   ~df['col'].str.startswith('c')]

     col
1  mext1
3   okl1

Or, the better option, pass multiple prefixes in a tuple, as per @Ted Petrou:

df[~df['col'].str.startswith(('t','c'))]

    col
1   mext1
3   okl1

Answer 4

Answered by Yohanes Gultom

Just another alternative in case you prefer regex:

df[df.col.str.contains('^[^tc]')]
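The two regex spellings in this thread differ in one edge case, shown in the sketch below. It assumes the column may contain an empty string, which the question's data does not: the character class `[^tc]` requires an actual first character, while the negative lookahead `(?![tc])` only requires that the next character is not t or c. The column is built with `dtype=object` here so that the patterns are evaluated on plain Python strings; that choice is an assumption of this sketch.

```python
import pandas as pd

# Sample data from the question plus an empty string (an added assumption).
col = pd.Series(['text1', 'mext1', 'cext1', 'okl1', ''], dtype=object)
df = pd.DataFrame({'col': col})

# Character class: the first character must exist and be neither t nor c,
# so the empty string is dropped.
by_class = df[df['col'].str.contains(r'^[^tc]')]

# Negative lookahead: only asserts that t or c does not come next,
# so the empty string is kept.
by_lookahead = df[df['col'].str.match(r'^(?![tc])')]

print(by_class)
print(by_lookahead)
```

On data without empty strings, both patterns select the same rows.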

Answer 5

Answered by Yantao Xie

You can use the apply method.

Taking your question as an example, the code looks like this:

df[df['col'].apply(lambda x: x[0] not in ['t', 'c'])]

I think apply is a more general and flexible method.
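To illustrate the flexibility of apply, here is a rough sketch with a named function instead of a lambda. The helper `keep_value`, the case-insensitive comparison, and the decision to keep empty or missing values are all assumptions made for this example, not part of the question. Note that indexing `x[0]` would raise an error on an empty string or a missing value, which is why the sketch uses `str.startswith` instead.

```python
import pandas as pd

# Sample data extended with mixed case, an empty string, and a missing value
# (all added for illustration).
df = pd.DataFrame({'col': ['text1', 'Text2', 'mext1', '', None, 'okl1']})

def keep_value(value, prefixes=('t', 'c')):
    """Hypothetical helper: return True when the row should be kept."""
    # Missing or non-string values have no prefix; keep them (an assumption).
    if not isinstance(value, str):
        return True
    # Case-insensitive check; startswith is safe on the empty string.
    return not value.lower().startswith(prefixes)

result = df[df['col'].apply(keep_value)]
print(result)
```

The trade-off is speed: apply calls the Python function once per row, so the vectorized str methods are usually faster on large frames.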
