Pandas DataFrame: check if column value is in column list
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/47513408/
Asked by clg4
I have a dataframe df:
import pandas as pd

data = {'id': [12, 112],
        'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]}
df = pd.DataFrame.from_dict(data)
which looks like this:
id idlist
0 12 [1, 5, 7, 12, 112]
1 112 [5, 7, 12, 111, 113]
I need to check whether id is in idlist, and select or flag the row. I have tried variations of the following and receive the errors shown in the comments:
df=df.loc[df.id.isin(df.idlist),:] #TypeError: unhashable type: 'list'
df['flag']=df.where(df.idlist.isin(df.idlist),1,0) #TypeError: unhashable type: 'list'
Might another approach work, such as .apply or a list comprehension?
I am looking for a solution that either selects the rows where id is in idlist, or flags each row with a 1 where id is in idlist. The resulting df should be either:
id idlist
0 12 [1, 5, 7, 12, 112]
or:
flag id idlist
0 1 12 [1, 5, 7, 12, 112]
1 0 112 [5, 7, 12, 111, 113]
Thanks for the help!
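A note on the isin attempts above: Series.isin compares every value in the column against one shared collection, so it cannot express a row-by-row test of id against that row's own idlist. The sketch below illustrates this with the question's data; the names first_list and shared_check are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

# isin tests each id against ONE flat collection (here, the first row's list),
# not against the list stored in the same row.
first_list = df['idlist'].iloc[0]
shared_check = df['id'].isin(first_list)

# Both ids appear in the first list, so both rows come back True,
# even though 112 is not in its own row's idlist.
print(shared_check.tolist())
```

A row-wise membership test therefore needs a per-row operation, which is what the answers below provide.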
Answered by jezrael
Use apply:
df['flag'] = df.apply(lambda x: int(x['id'] in x['idlist']), axis=1)
print (df)
id idlist flag
0 12 [1, 5, 7, 12, 112] 1
1 112 [5, 7, 12, 111, 113] 0
Similar:
df['flag'] = df.apply(lambda x: x['id'] in x['idlist'], axis=1).astype(int)
print (df)
id idlist flag
0 12 [1, 5, 7, 12, 112] 1
1 112 [5, 7, 12, 111, 113] 0
With a list comprehension:
df['flag'] = [int(x[0] in x[1]) for x in df[['id', 'idlist']].values.tolist()]
print (df)
id idlist flag
0 12 [1, 5, 7, 12, 112] 1
1 112 [5, 7, 12, 111, 113] 0
Solutions for filtering:
df = df[df.apply(lambda x: x['id'] in x['idlist'], axis=1)]
print (df)
id idlist
0 12 [1, 5, 7, 12, 112]
df = df[[x[0] in x[1] for x in df[['id', 'idlist']].values.tolist()]]
print (df)
id idlist
0 12 [1, 5, 7, 12, 112]
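The list-comprehension variants above can also be written with zip over the two columns, which avoids building an intermediate list of rows. This is a minimal sketch using the question's data, not part of the original answer; mask, flagged and selected are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

# Pair each id with the list from the same row and test membership.
mask = [value in candidates for value, candidates in zip(df['id'], df['idlist'])]

# Flag column with 1/0, without modifying df in place.
flagged = df.assign(flag=[int(m) for m in mask])

# Keep only the rows whose id appears in their own idlist.
selected = df[mask]

print(flagged)
print(selected)
```

Building the mask once and reusing it covers both requested outputs, the flag column and the filtered frame.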
Answered by Aafaque Abdullah
You can use df.apply to process each row and create a new flag column that checks the condition, giving the second output you requested.
df['flag'] = df.loc[:, ('id', 'idlist')].apply(lambda x: 1 if x[0] in x[1] else 0, axis=1)
print(df)
where x[0] is id and x[1] is idlist.
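Inside apply with axis=1, each row arrives as a Series labeled by column name, so x[0] and x[1] rely on integer keys being treated as positions. To my understanding, recent pandas releases warn about that usage, so a label-based variant may be more robust. The following is a sketch under that assumption, not the answerer's code:

```python
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

# Access each row's values by column label instead of by position.
df['flag'] = df[['id', 'idlist']].apply(
    lambda row: 1 if row['id'] in row['idlist'] else 0,
    axis=1,
)
print(df)
```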
Answered by YOBEN_S
Using issubset:
df.apply(lambda x : set([x.id]).issubset(x.idlist),1).astype(int)
Out[378]:
0 1
1 0
dtype: int32
Using np.vectorize:
import numpy as np

def myfun(x, y):
    return np.in1d(x, y)
np.vectorize(myfun)(df.id, df.idlist).astype(int)
Timing:
%timeit np.vectorize(myfun)(df.id,df.idlist).astype(int)
10000 loops, best of 3: 92.3 μs per loop
%timeit df.apply(lambda x : set([x.id]).issubset(x.idlist),1).astype(int)
1000 loops, best of 3: 353 μs per loop
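NumPy documents np.vectorize as a convenience wrapper that is essentially a Python-level loop, so the gap measured above likely reflects the per-row overhead of DataFrame.apply rather than true vectorization. Newer NumPy versions also deprecate np.in1d in favor of np.isin. The sketch below is a variant under those assumptions, not the answerer's code; it uses the plain in operator, and contains is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]})

def contains(value, candidates):
    # Plain Python membership test for a single row.
    return value in candidates

# otypes fixes the output dtype so NumPy does not infer it from a trial call.
flags = np.vectorize(contains, otypes=[int])(
    df['id'].to_numpy(), df['idlist'].to_numpy()
)
print(flags)
```

Timings vary with data size and library versions, so it is worth re-measuring on your own data before choosing between these approaches.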
Answered by rnso
Try a simple for loop:
flaglist = []
for i in range(len(df)):
    if df.id[i] in df.idlist[i]:
        flaglist.append(1)
    else:
        flaglist.append(0)
df["flag"] = flaglist
df:
id idlist flag
0 12 [1, 5, 7, 12, 112] 1
1 112 [5, 7, 12, 111, 113] 0
To drop rows:
flaglist = []
for i in range(len(df)):
    if df.id[i] not in df.idlist[i]:
        flaglist.append(i)
df = df.drop(flaglist)
df:
id idlist flag
0 12 [1, 5, 7, 12, 112] 1
The above can be converted to a list comprehension for creating a flag column:
df["flag"] = [df.id[i] in df.idlist[i] for i in range(len(df))]
print(df)
# id idlist flag
# 0 12 [1, 5, 7, 12, 112] True
# 1 112 [5, 7, 12, 111, 113] False
or
df["flag"] = [1 if df.id[i] in df.idlist[i] else 0 for i in range(len(df))]
print(df)
# id idlist flag
# 0 12 [1, 5, 7, 12, 112] 1
# 1 112 [5, 7, 12, 111, 113] 0
and for selecting rows:
flaglist = [i for i in range(len(df)) if df.id[i] in df.idlist[i]]
print(df.iloc[flaglist])
# id idlist
# 0 12 [1, 5, 7, 12, 112]
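With an integer index, df.id[i] looks rows up by label, so the loops above depend on the default 0..n-1 index. If the DataFrame carries a different index (for example after filtering or dropping rows), a position-based lookup with .iloc is a safer variant. A minimal sketch, assuming a hypothetical non-default index [10, 20]:

```python
import pandas as pd

df = pd.DataFrame({'id': [12, 112],
                   'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]},
                  index=[10, 20])  # hypothetical non-default index

# .iloc looks values up by position, so it works regardless of the index labels.
df['flag'] = [int(df['id'].iloc[i] in df['idlist'].iloc[i])
              for i in range(len(df))]

# Select the matching rows with a boolean comparison on the new column.
selected = df[df['flag'] == 1]
print(selected)
```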