Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/47513408/

Date: 2020-09-14 04:49:33  Source: igfitidea

Pandas Dataframe Check if column value is in column list

Tags: python, pandas, where, list-comprehension, apply

Asked by clg4

I have a dataframe df:

import pandas as pd

data = {'id': [12, 112],
        'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]}
df = pd.DataFrame.from_dict(data)

which looks like this:

    id                idlist
0   12    [1, 5, 7, 12, 112]
1  112  [5, 7, 12, 111, 113]

I need to check whether id is in idlist for each row, and then select or flag that row. I have tried variations of the following and receive the errors shown in the comments:

df=df.loc[df.id.isin(df.idlist),:] #TypeError: unhashable type: 'list'
df['flag']=df.where(df.idlist.isin(df.idlist),1,0) #TypeError: unhashable type: 'list'
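
For context, here is a minimal sketch of why both attempts fail. isin tests each value for membership in the whole collection passed to it and, as the error message suggests, relies on hashing those elements. It does not compare row by row. The frame below is rebuilt from the question's data; the helper names list_is_hashable and row_wise are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

# Lists are mutable, so Python refuses to hash them.
try:
    hash(df['idlist'].iloc[0])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A row-wise membership test has to pair each id with its own list.
row_wise = [i in lst for i, lst in zip(df['id'], df['idlist'])]
print(list_is_hashable, row_wise)  # False [True, False]
```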

Other possible approaches might be .apply or a list comprehension?

I am looking for a solution that either selects the rows where id is in idlist, or flags those rows with a 1. The resulting df should be either:

   id              idlist
0  12  [1, 5, 7, 12, 112]

or:

   flag   id                idlist
0     1   12    [1, 5, 7, 12, 112]
1     0  112  [5, 7, 12, 111, 113]

Thanks for the help!

Answered by jezrael

Use apply:

df['flag'] = df.apply(lambda x: int(x['id'] in x['idlist']), axis=1)
print (df)
    id                idlist  flag
0   12    [1, 5, 7, 12, 112]     1
1  112  [5, 7, 12, 111, 113]     0

Similar:

df['flag'] = df.apply(lambda x: x['id'] in x['idlist'], axis=1).astype(int)
print (df)
    id                idlist  flag
0   12    [1, 5, 7, 12, 112]     1
1  112  [5, 7, 12, 111, 113]     0

With list comprehension:

df['flag'] = [int(x[0] in x[1]) for x in df[['id', 'idlist']].values.tolist()]
print (df)
    id                idlist  flag
0   12    [1, 5, 7, 12, 112]     1
1  112  [5, 7, 12, 111, 113]     0
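
As a possible variant of the list comprehension, zip can pair each id with the list in the same row, which avoids building an intermediate nested list with .values.tolist(). This is only a sketch on the question's data, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

# zip walks both columns in lockstep: (12, [1, 5, ...]), (112, [5, 7, ...])
df['flag'] = [int(i in lst) for i, lst in zip(df['id'], df['idlist'])]
print(df['flag'].tolist())  # [1, 0]
```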


Solutions for filtering:

df = df[df.apply(lambda x: x['id'] in x['idlist'], axis=1)]
print (df)
   id              idlist
0  12  [1, 5, 7, 12, 112]

df = df[[x[0] in x[1] for x in df[['id', 'idlist']].values.tolist()]]
print (df)

   id              idlist
0  12  [1, 5, 7, 12, 112]
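
Both filtering snippets reassign df, so running them one after the other means the second one operates on an already filtered frame. A hedged alternative is to compute the boolean mask once and reuse it for both selecting and flagging, leaving the original frame intact. The names mask, selected and flagged are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

# Build the mask once, aligned to df's index.
mask = pd.Series([i in lst for i, lst in zip(df['id'], df['idlist'])],
                 index=df.index)

selected = df[mask]                         # rows where id is in idlist
flagged = df.assign(flag=mask.astype(int))  # copy of df with a 0/1 flag column
print(selected)
print(flagged)
```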

Answered by Aafaque Abdullah

You can use df.apply to process each row and create a new flag column that checks the condition, which gives the second output you requested.

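df['flag'] = df[['id', 'idlist']].apply(lambda x: 1 if x['id'] in x['idlist'] else 0, axis=1)

print(df)

where x['id'] is the row's id and x['idlist'] is its idlist. (The original answer used x[0] and x[1]; positional access with integers on a label-indexed row is deprecated in recent pandas versions, so labels are used here.)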

Answered by YOBEN_S

By using issubset

df.apply(lambda  x : set([x.id]).issubset(x.idlist),1).astype(int)
Out[378]: 
0    1
1    0
dtype: int32

By using np.vectorize

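import numpy as np

def myfun(x, y):
    # np.isin replaces the older np.in1d; bool() turns the result into a plain scalar
    return bool(np.isin(x, y))

np.vectorize(myfun)(df.id, df.idlist).astype(int)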

Timing:

%timeit np.vectorize(myfun)(df.id,df.idlist).astype(int)
10000 loops, best of 3: 92.3 μs per loop
%timeit df.apply(lambda  x : set([x.id]).issubset(x.idlist),1).astype(int)
1000 loops, best of 3: 353 μs per loop
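
%timeit only works inside IPython or Jupyter. Outside of those, a rough comparison can be sketched with the standard timeit module. The two functions below are illustrative stand-ins (an apply version and a list-comprehension version), not the exact expressions timed above, and results on a two-row frame say little about behaviour on large data.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

def with_apply():
    # Row-wise apply, as in the first answer
    return df.apply(lambda x: x['id'] in x['idlist'], axis=1).astype(int)

def with_comprehension():
    # Plain Python loop over the two columns
    return pd.Series([int(i in lst) for i, lst in zip(df['id'], df['idlist'])],
                     index=df.index)

runs = 100
for fn in (with_apply, with_comprehension):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} microseconds per call")
```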

Answered by rnso

Try a simple for loop:

flaglist = []
for i in range(len(df)):
    if df.id[i] in df.idlist[i]:
        flaglist.append(1)
    else:
        flaglist.append(0)
df["flag"] = flaglist 

df:

    id                idlist  flag
0   12    [1, 5, 7, 12, 112]     1
1  112  [5, 7, 12, 111, 113]     0

To drop rows:

flaglist = []
for i in range(len(df)):
    if df.id[i] not in df.idlist[i]:
        flaglist.append(i)
df = df.drop(flaglist)

df:

   id              idlist  flag
0  12  [1, 5, 7, 12, 112]     1
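
One caveat worth noting: df.id[i] looks values up by index label, so the loops above assume the default 0..n-1 index and may fail once rows have been dropped or the index has been changed. The sketch below iterates over the values instead, so it does not depend on the labels. The index values 10 and 20 are arbitrary and chosen only to illustrate a non-default index.

```python
import pandas as pd

# Same data as the question, but with a non-default index (an assumption for illustration)
df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]},
                  index=[10, 20])

# Iterate over values, not labels
keep = [i in lst for i, lst in zip(df['id'], df['idlist'])]

df['flag'] = [int(k) for k in keep]  # flag column
kept = df[keep]                      # boolean list acts as a row mask
print(kept)
```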

The above can be converted to a list comprehension to create a flag column:

df["flag"] = [df.id[i] in df.idlist[i] for i in range(len(df))]
print(df)
#     id                idlist   flag
# 0   12    [1, 5, 7, 12, 112]   True
# 1  112  [5, 7, 12, 111, 113]  False

or

df["flag"] = [1 if df.id[i] in df.idlist[i] else 0 for i in range(len(df))]
print(df)
#     id                idlist  flag
# 0   12    [1, 5, 7, 12, 112]     1
# 1  112  [5, 7, 12, 111, 113]     0

and for selecting rows:

flaglist = [i for i in range(len(df)) if df.id[i] in df.idlist[i]]
print(df.iloc[flaglist])
#    id              idlist
# 0  12  [1, 5, 7, 12, 112]