Python: add a column to a dataframe from a list

Note: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/26666919/

Time: 2020-08-19 00:50:47  Source: igfitidea

Add column in dataframe from list

python pandas dataframe

Asked by mane

I have a dataframe with some columns like this:

A   B   C  
0   
4
5
6
7
7
6
5

The values in A can only range from 0 to 7.

Also, I have a list of 8 elements like this:

List = [2, 5, 6, 8, 12, 16, 26, 32]  # there are only 8 elements in this list

If the element in column A is n, I need to insert the nth element of List (counting from 0, so 0 gives 2 and 4 gives 12) in a new column, say 'D'.

How can I do this in one go without looping over the whole dataframe?

The resulting dataframe would look like this:

A   B   C   D
0           2
4           12
5           16
6           26
7           32
7           32
6           26
5           16

Note: The dataframe is huge and iteration is the last option. But I can also arrange the elements in 'List' in any other data structure, like a dict, if necessary.

Accepted answer by DSM

IIUC, if you make your (unfortunately named) List into an ndarray, you can simply index into it naturally.

>>> import numpy as np
>>> m = np.arange(16)*10
>>> m[df.A]
array([  0,  40,  50,  60, 150, 150, 140, 130])
>>> df["D"] = m[df.A]
>>> df
    A   B   C    D
0   0 NaN NaN    0
1   4 NaN NaN   40
2   5 NaN NaN   50
3   6 NaN NaN   60
4  15 NaN NaN  150
5  15 NaN NaN  150
6  14 NaN NaN  140
7  13 NaN NaN  130

Here I built a new m, but if you use m = np.asarray(List), the same thing should work: the values in df.A will pick out the appropriate elements of m.
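As a minimal sketch of that suggestion applied to the question's own data (columns B and C are left out here as an assumption, since they play no role in the lookup), something like the following should produce the requested column D:

```python
import numpy as np
import pandas as pd

# Data from the question; B and C are omitted because they are not needed.
df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})
List = [2, 5, 6, 8, 12, 16, 26, 32]  # name kept from the question

m = np.asarray(List)   # the list as an ndarray
df["D"] = m[df["A"]]   # each value n in A selects m[n]
print(df)
```

The lookup is a single vectorized indexing operation, so no explicit loop over the rows is involved.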

Note that if you're using an old version of numpy, you might have to use m[df.A.values] instead. In the past, numpy didn't play well with others, and some refactoring in pandas caused some headaches. Things have improved now.
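A small sketch of the explicit-array variant, for illustration only. It assumes the question's data, and it also shows Series.to_numpy(), which is not mentioned in the answer but is available in pandas 0.24 and later as an alternative to .values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})
m = np.asarray([2, 5, 6, 8, 12, 16, 26, 32])

# Pass numpy a plain integer array instead of a pandas Series.
via_values = m[df["A"].values]
via_to_numpy = m[df["A"].to_numpy()]

print(via_values)
print(via_to_numpy)
```

Both forms hand numpy an ordinary integer array, so the result does not depend on how a given numpy version handles a Series used as an index.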

Answer by Phil Cooper

First let's create the dataframe you had. I'll ignore columns B and C, as they are not relevant.

import pandas as pd
df = pd.DataFrame({'A': [0, 4, 5, 6, 7, 7, 6, 5]})

And the mapping that you desire:

mapping = dict(enumerate([2,5,6,8,12,16,26,32]))

df['D'] = df['A'].map(mapping)

Done!

print(df)

Output:

   A   D
0  0   2
1  4  12
2  5  16
3  6  26
4  7  32
5  7  32
6  6  26
7  5  16
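One practical difference between the dict-based map and indexing into an array is how a value outside the expected range is handled. The sketch below uses a hypothetical value 9 in column A, which the question says cannot occur, purely to illustrate the behaviour:

```python
import numpy as np
import pandas as pd

values = [2, 5, 6, 8, 12, 16, 26, 32]
mapping = dict(enumerate(values))

# Hypothetical data: 9 is outside the 0..7 range the question guarantees.
df = pd.DataFrame({"A": [0, 4, 9]})

# map() leaves keys missing from the dict as NaN, so the result is float.
mapped = df["A"].map(mapping)
print(mapped)

# Array indexing refuses an out-of-range position instead.
try:
    np.asarray(values)[df["A"].to_numpy()]
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True  # index 9 does not exist in an 8-element array
print(out_of_range_raised)
```

Which behaviour is preferable depends on whether unexpected values should surface as missing data or as an immediate error.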

Answer by sparrow

Just assign the list directly:

df['new_col'] = mylist
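Direct assignment places the list into the column row by row, so the list needs exactly one element per row. A minimal sketch, with made-up values for df and mylist:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5]})

mylist = [10, 20, 30]    # same length as df: works
df["new_col"] = mylist

too_short = [10, 20]     # hypothetical list with one element missing
try:
    df["bad_col"] = too_short
    length_error = False
except ValueError:
    length_error = True  # pandas rejects a length mismatch
print(df)
print(length_error)
```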


Alternative
Convert the list to a series or array and then assign:

se = pd.Series(mylist)
df['new_col'] = se.values

or

df['new_col'] = np.array(mylist)
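The .values step matters when the dataframe does not use the default 0, 1, 2, ... index: a Series assigned to a column is aligned on index labels, while a plain array is assigned by position. A small sketch with a hypothetical non-default index:

```python
import pandas as pd

# Hypothetical dataframe whose index is 10, 11, 12 rather than 0, 1, 2.
df = pd.DataFrame({"A": [0, 4, 5]}, index=[10, 11, 12])
mylist = [2, 12, 16]

se = pd.Series(mylist)         # index 0, 1, 2
df["aligned"] = se             # aligned on labels: none match, so all NaN
df["positional"] = se.values   # plain array: assigned row by row
print(df)
```

With the default index both assignments give the same result, which is why the difference is easy to overlook.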

Answer by Salvatore Cosentino

A solution improving on the great one from @sparrow.

Let df be your dataset, and mylist the list with the values you want to add to the dataframe.

Let's suppose you want to call your new column simply new_column.

First make the list into a Series:

column_values = pd.Series(mylist)

Then use the insert function to add the column. This function has the advantage of letting you choose the position at which to place the column. In the following example we put the new column in the first position from the left (by setting loc=0).

df.insert(loc=0, column='new_column', value=column_values)
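Put together as a runnable sketch, with made-up contents for df and mylist, the steps above might look like this. Note that DataFrame.insert modifies the dataframe in place and returns None:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [0, 4, 5]})
mylist = [2, 12, 16]

column_values = pd.Series(mylist)

# insert() changes df itself; there is no new dataframe to capture.
result = df.insert(loc=0, column="new_column", value=column_values)
print(result)
print(list(df.columns))
```

Because the value here is a Series, it is matched to df by index label, which works in this sketch because both use the default index.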

Answer by Mehdi

Old question, but I always try to use the fastest code!

I had a huge list of 69 million uint64 values. np.array() was fastest for me.

df['hashes'] = hashes
Time spent: 17.034842014312744

df['hashes'] = pd.Series(hashes).values
Time spent: 17.141014337539673

df['key'] = np.array(hashes)
Time spent: 10.724546194076538
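The answer does not show how the timings were taken. One way to reproduce the comparison is sketched below, assuming time.perf_counter as the timer and a much smaller list than the 69 million elements above. The helper name timed and the column names are made up for illustration, and the numbers will vary with machine, list size, and library versions:

```python
import time

import numpy as np
import pandas as pd

n = 1_000_000  # far smaller than the 69 million elements in the answer
hashes = list(range(n))
df = pd.DataFrame({"A": np.zeros(n)})


def timed(label, assign):
    """Run assign() once and report the elapsed wall-clock time."""
    start = time.perf_counter()
    assign()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return elapsed


def assign_list():
    df["from_list"] = hashes


def assign_series_values():
    df["from_series"] = pd.Series(hashes).values


def assign_array():
    df["from_array"] = np.array(hashes)


timings = {
    "list": timed("plain list", assign_list),
    "series": timed("pd.Series(...).values", assign_series_values),
    "array": timed("np.array(...)", assign_array),
}
```

A single run per variant is noisy, so repeating each measurement several times would give a fairer comparison.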