VLOOKUP equivalent function to look up value in pandas DataFrame

Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/21030693/

Time: 2020-09-13 21:32:31  Source: igfitidea

Tags: pandas, lookup

Asked by Alexis Eggermont

I have a pandas dataframe with the following structure:

DF_Cell, DF_Site
C1,A
C2,A
C3,B
C4,B
C5,B

I also have a very long loop (100 million iterations) in which I process, one at a time, strings that correspond to the "DF_Cell" column in the DataFrame (the first loop iteration creates C1, the second creates C2, and so on).

I would like to look up, in the DataFrame, the DF_Site corresponding to the cell (DF_Cell) processed in the loop.

One way I could think of was to put the processed cell in a one-cell DataFrame and then do a left merge on it, but this is far too inefficient for data this large.

Is there a better way?

Answered by Andy Hayden

Perhaps you want to set DF_Cell as the index*:

In [11]: df = pd.read_csv('foo.csv', index_col='DF_Cell')
         # or df.set_index('DF_Cell', inplace=True)

In [12]: df
Out[12]: 
        DF_Site
DF_Cell        
C1            A
C2            A
C3            B
C4            B
C5            B

You can then refer to the row, or specific entry, using loc:

In [13]: df.loc['C1']
Out[13]: 
DF_Site    A
Name: C1, dtype: object

In [14]: df.loc['C1', 'DF_Site']
Out[14]: 'A'

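*Assuming the file has only these two columns, you could pass squeeze=True to read_csv to get a Series instead of a one-column DataFrame. (That argument was deprecated in pandas 1.4 and removed in 2.0; on newer versions, call .squeeze("columns") on the result instead.)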
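As a minimal sketch of the single-column case from the footnote: the sample data is built inline rather than read from foo.csv so the snippet is self-contained, and the names `sites`, `site_for_c3`, `cells_in_loop` and `batch_sites` are illustrative assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Inline copy of the question's sample data (stands in for 'foo.csv').
csv_text = "DF_Cell,DF_Site\nC1,A\nC2,A\nC3,B\nC4,B\nC5,B\n"

df = pd.read_csv(io.StringIO(csv_text), index_col="DF_Cell")

# Take the single remaining column as a Series indexed by DF_Cell.
sites = df["DF_Site"]

# Scalar lookup by label, as each loop iteration in the question would do.
site_for_c3 = sites.loc["C3"]

# If the cells are known up front, one vectorised call can replace the loop.
cells_in_loop = ["C1", "C4", "C5"]  # hypothetical batch of cells
batch_sites = sites.reindex(cells_in_loop).tolist()
```

Here `reindex` returns NaN for any label missing from the index, whereas `loc` would raise a KeyError; which behaviour is preferable depends on whether unknown cells can occur in the loop.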

Answered by Sean Geoffrey Pietz

I don't really understand what you mean in your first paragraph, but for looking up a field value by its corresponding key in a different column, I agree that Andy Hayden's example is the most idiomatic and efficient way to do it in pandas. However, if this is really representative of your data structure, you can just use a dict.

    data = {'a': 1, 'b': 2, 'c': 3}

    data['a']
    # 1

    list(map(lambda y: data[y] + 1, ['c', 'b', 'a']))
    # [4, 3, 2]
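Applied to the DataFrame from the question, the dict idea might look like the sketch below. The DataFrame is rebuilt inline from the sample rows, and `cell_to_site` and `lookup_site` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt inline.
df = pd.DataFrame({
    "DF_Cell": ["C1", "C2", "C3", "C4", "C5"],
    "DF_Site": ["A", "A", "B", "B", "B"],
})

# Build the mapping once, before the long loop starts.
cell_to_site = dict(zip(df["DF_Cell"], df["DF_Site"]))


def lookup_site(cell, default=None):
    """Return the site for a cell, or `default` if the cell is unknown."""
    return cell_to_site.get(cell, default)
```

Inside the loop, each iteration would then call `lookup_site(cell)`. A plain dict lookup skips the per-call indexing overhead of pandas, which tends to matter over 100 million iterations, though it is worth timing on the real data.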