Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/45983017/

Date: 2020-09-14 04:21:44  Source: igfitidea

Extracting an element of a list in a pandas column

Tags: python, python-3.x, pandas

Asked by Ignacio Vergara Kausel

I have a DataFrame in which every cell holds a list, as shown in the example below with only two columns.

    Gamma                                           Beta
0   [1.4652917656926299, 0.9326935235505321, float] [91, 48.611034768515864, int]
1   [2.6008354611105995, 0.7608529935313189, float] [59, 42.38646954167245, int]
2   [2.6386970166722348, 0.9785848171888037, float] [89, 37.9011122659478, int]
3   [3.49336632573625, 1.0411524946972244, float]   [115, 36.211134224288344, int]
4   [2.193991200007534, 0.7955134305428825, float]  [128, 50.03563864975485, int]
5   [3.4574527664490997, 0.9399880977511021, float] [120, 41.841146628802875, int]
6   [3.1190582380554863, 1.0839109431114795, float] [148, 55.990072419824514, int]
7   [2.7757359940789916, 0.8889801332053203, float] [142, 51.08885697101243, int]
8   [3.23820908493237, 1.0587479742892683, float]   [183, 43.831293356668425, int]
9   [2.2509032790941985, 0.8896196407231622, float] [66, 35.9377662201882, int]

For every column, I'd like to extract the first element of the list in each row, to get a DataFrame that looks as follows.

    Gamma               Beta
0   1.4652917656926299  91
1   2.6008354611105995  59
2   2.6386970166722348  89
...

So far, my solution is something like [row[1][0] for row in df_params.itertuples()], which I could repeat for every column position in the row and then assemble the results into a new DataFrame.

An alternative is new_df = df_params['Gamma'].apply(lambda x: x[0]), repeated in a loop over all the columns.
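For reference, the two approaches described above could be written out roughly as follows. This is only a sketch: the three-row df_params is rebuilt by hand from the first rows of the example, and the names via_itertuples and via_apply are made up for illustration.

```python
import pandas as pd

# Hand-built sample shaped like the first three rows of the example frame.
df_params = pd.DataFrame({
    "Gamma": [
        [1.4652917656926299, 0.9326935235505321, float],
        [2.6008354611105995, 0.7608529935313189, float],
        [2.6386970166722348, 0.9785848171888037, float],
    ],
    "Beta": [
        [91, 48.611034768515864, int],
        [59, 42.38646954167245, int],
        [89, 37.9011122659478, int],
    ],
})

# Approach 1: itertuples. Position 0 of each row tuple is the index,
# so the column at position `pos` in the frame sits at `pos + 1`.
via_itertuples = pd.DataFrame(
    {
        name: [row[pos + 1][0] for row in df_params.itertuples()]
        for pos, name in enumerate(df_params.columns)
    },
    index=df_params.index,
)

# Approach 2: Series.apply with a lambda, repeated for every column.
via_apply = pd.DataFrame(
    {name: df_params[name].apply(lambda x: x[0]) for name in df_params.columns}
)
```

Both variants produce one value per row and column, but each still needs an explicit loop over the columns, which is the cumbersome part.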

My question is, is there a less cumbersome way to perform this operation?

Answered by IanS

You can use the str accessor for lists, e.g.:

df_params['Gamma'].str[0]

This should work for all columns:

df_params.apply(lambda col: col.str[0])
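A self-contained sketch of the two calls above, assuming a small hand-built df_params with the same shape as the one in the question (the names first_gamma and first_all are only illustrative):

```python
import pandas as pd

# Hand-built sample shaped like the first three rows of the example frame.
df_params = pd.DataFrame({
    "Gamma": [
        [1.4652917656926299, 0.9326935235505321, float],
        [2.6008354611105995, 0.7608529935313189, float],
        [2.6386970166722348, 0.9785848171888037, float],
    ],
    "Beta": [
        [91, 48.611034768515864, int],
        [59, 42.38646954167245, int],
        [89, 37.9011122659478, int],
    ],
})

# Single column: .str[0] picks element 0 out of each list in the Series.
first_gamma = df_params["Gamma"].str[0]

# Whole frame: apply the same indexing to each column in turn.
first_all = df_params.apply(lambda col: col.str[0])

print(first_all)
```

Despite its name, the str accessor's indexing is not limited to strings: it extracts by position from the lists stored in each cell, so no explicit Python loop over the columns is needed.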

Answered by A.Kot

Using itertuples would be pretty slow. You could speed this up with the following:

for column_name in df_params.columns:
    df_params[column_name] = [i[0] for i in df_params[column_name]]
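Note that the loop above overwrites df_params in place, so the original lists are lost. If they are still needed, a possible variant (a sketch, not part of the original answer; first_elements is a made-up name and the sample frame is hand-built) collects the results into a new DataFrame instead:

```python
import pandas as pd

# Hand-built sample shaped like the first three rows of the example frame.
df_params = pd.DataFrame({
    "Gamma": [
        [1.4652917656926299, 0.9326935235505321, float],
        [2.6008354611105995, 0.7608529935313189, float],
        [2.6386970166722348, 0.9785848171888037, float],
    ],
    "Beta": [
        [91, 48.611034768515864, int],
        [59, 42.38646954167245, int],
        [89, 37.9011122659478, int],
    ],
})

# Same list comprehension per column, but gathered into a new frame
# so that df_params keeps its original lists.
first_elements = pd.DataFrame(
    {name: [cell[0] for cell in df_params[name]] for name in df_params.columns},
    index=df_params.index,
)

print(first_elements)
```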