pandas: filter DataFrame rows based on the length of column values

Question

Asked by D.prd

I have a pandas dataframe as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])

df
              A  B
0             1  2
1           NaN  1
2  test string1  5

I am using pandas 0.20. What is the most efficient way to remove every row in which any column value has length > 10?

len('test string1')
12

So for the above e.g., I am expecting an output as follows:

df
              A  B
0             1  2
1           NaN  1

Answer 1

Answered by Zero

If based on column A

In [865]: df[~(df.A.str.len() > 10)]
Out[865]:
     A  B
0    1  2
1  NaN  1
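
The negation is doing real work here. For values that are not strings (the integer 1 and the missing value), .str.len() yields NaN, and any comparison with NaN is False. A minimal sketch of the difference, rebuilding the question's sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])

# NaN for the integer and the missing value, 12 for the string
lengths = df['A'].str.len()

# NaN > 10 is False, so negating it keeps rows 0 and 1
keep_negated = ~(lengths > 10)

# NaN <= 10 is also False, so this mask would drop rows 0 and 1 as well
keep_direct = lengths <= 10

print(df[keep_negated])
print(df[keep_direct])
```

So writing the condition as lengths <= 10 is not equivalent to the answer above whenever the column holds non-string values.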

If based on all columns

In [866]: df[~df.applymap(lambda x: len(str(x)) > 10).any(axis=1)]
Out[866]:
     A  B
0    1  2
1  NaN  1
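
The question targets pandas 0.20, where applymap is the element-wise method. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map. The following is a hedged sketch that picks whichever method is available; the helper name too_long and the variable elementwise are illustrative, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])

def too_long(value, limit=10):
    # Hypothetical helper: measure the printed form of any value
    return len(str(value)) > limit

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

# True for each row in which at least one cell is too long
mask = elementwise(too_long).any(axis=1)
result = df[~mask]
print(result)
```

Note that str(x) measures numbers and NaN by their printed form ('nan' has length 3), so this treats every cell as text.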

Answer 2

Answered by Elizabeth

I had to cast to a string for Diego's answer to work:

df = df[df['A'].apply(lambda x: len(str(x)) <= 10)]
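
The cast is needed because len() raises TypeError for the integer 1 and for NaN in column A. One side effect is that numbers are then measured by their printed form, so a long float would be dropped too. If only real strings should count, a possible alternative is to check the type first. This is a sketch under assumptions: the helper name is_short is hypothetical, and the fourth row is added here purely to show the difference.

```python
import numpy as np
import pandas as pd

# Sample frame from the question plus one illustrative row holding a long float
df = pd.DataFrame(
    [[1, 2], [np.nan, 1], ['test string1', 5], [3.14159265358979, 7]],
    columns=['A', 'B'],
)

def is_short(value, limit=10):
    # Hypothetical helper: non-strings always pass, strings pass only if short enough
    return not (isinstance(value, str) and len(value) > limit)

# Cast-based filter: the float's printed form has more than 10 characters, so it is dropped
by_cast = df[df['A'].apply(lambda x: len(str(x)) <= 10)]

# Type-aware filter: only the long string is dropped
by_type = df[df['A'].apply(is_short)]

print(by_cast)
print(by_type)
```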

Answer 3

Answered by MaxU

In [42]: df
Out[42]:
              A  B                         C          D
0             1  2                         2 2017-01-01
1           NaN  1                       NaN 2017-01-02
2  test string1  5  test string1test string1 2017-01-03

In [43]: df.dtypes
Out[43]:
A            object
B             int64
C            object
D    datetime64[ns]
dtype: object

In [44]: df.loc[~df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)]
Out[44]:
     A  B    C          D
0    1  2    2 2017-01-01
1  NaN  1  NaN 2017-01-02

Explanation:

df.select_dtypes(['object']) selects only the columns of object (str) dtype:

In [45]: df.select_dtypes(['object'])
Out[45]:
              A                         C
0             1                         2
1           NaN                       NaN
2  test string1  test string1test string1

In [46]: df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10))
Out[46]:
       A      C
0  False  False
1  False  False
2   True   True

Now we can "aggregate" it row by row as follows:

In [47]: df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)
Out[47]:
0    False
1    False
2     True
dtype: bool

Finally, we can select only those rows where the value is False:

In [48]: df.loc[~df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)]
Out[48]:
     A  B    C          D
0    1  2    2 2017-01-01
1  NaN  1  NaN 2017-01-02
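
The steps above can be bundled into a small function. This is only a sketch: the name drop_rows_with_long_strings and its limit parameter are hypothetical, and the frame is rebuilt here to match the four-column example shown in In [42].

```python
import numpy as np
import pandas as pd

def drop_rows_with_long_strings(df, limit=10):
    # Hypothetical helper wrapping the select_dtypes / str.len / any steps
    obj_cols = df.select_dtypes(['object'])
    # True where a string in an object column is longer than `limit`;
    # non-strings give NaN lengths, and NaN.gt(limit) is False
    too_long = obj_cols.apply(lambda col: col.str.len().gt(limit)).any(axis=1)
    return df.loc[~too_long]

df = pd.DataFrame({
    'A': [1, np.nan, 'test string1'],
    'B': [2, 1, 5],
    'C': [2, np.nan, 'test string1test string1'],
    'D': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03']),
})

result = drop_rows_with_long_strings(df)
print(result)
```

Restricting the check to object columns means the integer and datetime columns are never converted to text, which is the main difference from the str(x) based answers.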

Answer 4

Answered by Diego Aguado

Use the apply method of the Series to keep only the rows whose value in column A has length at most 10 (this assumes every value in A supports len()):

df = df[df['A'].apply(lambda x: len(x) <= 10)]
