
Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/26112785/

Time: 2020-09-13 22:31:36  Source: igfitidea

'Where clause' on a list in a pandas Dataframe

python pandas dataframe

Asked by woshitom

I have a pandas DataFrame named df that looks like this:

     email        | list
___________________________
[email protected]  | [0,1]
[email protected]  | [2,1]
[email protected]  | [0,3]
[email protected]  | [0,0]
[email protected]  | [0,1]

I want to retrieve all the rows of df whose list contains only zeros: [0,0]

This is what I'm doing:

df2 = df[df['list'] == [0,0]]

But I'm getting the following error:

ValueError: Arrays were different lengths: 5 vs 2

Answered by firelynx

The reason this does not work:

df2 = df[df['list'] == [0, 0]]

is that df['list'] is a Series of 5 elements, while [0, 0] is a list of two elements. pandas tries to compare the two position by position, so the lengths have to match, and it fails while evaluating your mask:

df['list'] == [0, 0]
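The length mismatch described above can be reproduced with a small sketch. The email addresses here are made-up placeholders (the originals are not readable on this page); only the list column follows the question. The exact wording of the error message depends on the pandas version, so the sketch only checks that a ValueError is raised.

```python
import pandas as pd

# Placeholder data: the addresses are invented, the lists match the question.
df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com",
              "d@example.com", "e@example.com"],
    "list": [[0, 1], [2, 1], [0, 3], [0, 0], [0, 1]],
})

# Comparing a 5-element column with a 2-element list is treated as an
# element-by-element comparison, so the lengths do not line up.
try:
    df["list"] == [0, 0]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```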

Updated proper solution

I believe the fastest way to solve this is to create a Series of [0, 0] elements with the same length as your dataframe, and compare that Series to your column:

df['list'] == pd.Series([[0, 0]] * len(df))

0    False
1    False
2    False
3    True
4    False

This creates a mask by comparing each element of the column to [0, 0], instead of comparing the column df['list'] as a whole to [0, 0].

Using this mask, you can then create your new dataframe:

mask = df['list'] == pd.Series([[0, 0]] * len(df))
df2 = df[mask]
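Put together, the mask approach might look like the sketch below. The email addresses are placeholders, and passing index=df.index is an addition that is not in the answer: pandas compares two Series label by label, so the helper Series needs the same index as the column whenever df does not use the default 0..n-1 index.

```python
import pandas as pd

# Placeholder data: the addresses are invented, the lists match the question.
df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com",
              "d@example.com", "e@example.com"],
    "list": [[0, 1], [2, 1], [0, 3], [0, 0], [0, 1]],
})

target = [0, 0]

# One copy of the target list per row, sharing df's index so that the
# comparison lines up row by row.
targets = pd.Series([target] * len(df), index=df.index)

mask = df["list"] == targets   # boolean Series, one value per row
df2 = df[mask]                 # keep only the rows where the mask is True
print(df2)
```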

Answered by ragingSloth

You're comparing the whole column of lists to a single entry. You should instead filter df by using iterrows(). iterrows() creates a generator which yields (index, row) tuples, where the second entry is a Series holding that row's values keyed by column name. You can iterate through the rows, match against each one, and then build a new dataframe from the matches.

import pandas

# Collect the values of the matching rows, column by column
df2 = {'email': [], 'list': []}
for index, row in df.iterrows():
    # row holds the values of one row, keyed by column name
    if row['list'] == [0, 0]:
        for key in df2.keys():
            df2[key].append(row[key])
df2 = pandas.DataFrame.from_dict(df2)

Because the dictionary's own keys are used to populate it, you can use this method on any dataframe.
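To make the "any dataframe" point concrete, the same loop could be wrapped in a small function. This is only a sketch: filter_rows_by_list is a hypothetical name, not a pandas function, and the sample addresses are placeholders. Note that the result gets a fresh 0..n-1 index rather than keeping the original row labels.

```python
import pandas as pd


def filter_rows_by_list(df, column, target):
    """Hypothetical helper: keep the rows whose `column` value equals `target`."""
    # One empty list per column, taken from the dataframe itself.
    collected = {key: [] for key in df.columns}
    for _, row in df.iterrows():
        # row[column] is the plain Python list stored in that cell.
        if row[column] == target:
            for key in collected:
                collected[key].append(row[key])
    return pd.DataFrame(collected)


# Placeholder data: the addresses are invented, the lists match the question.
df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com",
              "d@example.com", "e@example.com"],
    "list": [[0, 1], [2, 1], [0, 3], [0, 0], [0, 1]],
})

df2 = filter_rows_by_list(df, "list", [0, 0])
print(df2)
```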