Python: adding a column to a dataframe from a list

Question

Asked by mane

I have a dataframe with some columns like this:

The values in A can only range from 0 to 7.

Also, I have a list of 8 elements like this:

List = [2, 5, 6, 8, 12, 16, 26, 32]  # there are only 8 elements in this list

If the element in column A is n, I need to insert the nth element of List (counting from 0) into a new column, say 'D'.

How can I do this in one go without looping over the whole dataframe?

The resulting dataframe would look like this:

A   B   C   D
0           2
4           12
5           16
6           26
7           32
7           32
6           26
5           16

Note: The dataframe is huge, so iteration is the last option. I can also arrange the elements of 'List' in another data structure, such as a dict, if necessary.

Answer 1

Accepted answer by DSM

IIUC, if you make your (unfortunately named) List into an ndarray, you can simply index into it naturally.

>>> import numpy as np
>>> m = np.arange(16)*10
>>> m[df.A]
array([  0,  40,  50,  60, 150, 150, 140, 130])
>>> df["D"] = m[df.A]
>>> df
    A   B   C    D
0   0 NaN NaN    0
1   4 NaN NaN   40
2   5 NaN NaN   50
3   6 NaN NaN   60
4  15 NaN NaN  150
5  15 NaN NaN  150
6  14 NaN NaN  140
7  13 NaN NaN  130

Here I built a new m, but if you use m = np.asarray(List), the same thing should work: the values in df.A will pick out the appropriate elements of m.

Note that if you're using an old version of numpy, you might have to use m[df.A.values] instead. In the past, numpy didn't play well with others, and some refactoring in pandas caused some headaches. Things have improved now.
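Applied to the list from the question, the indexing idea looks roughly like this. It is a minimal sketch: the dataframe is rebuilt from the A values shown in the question, `lookup` is a name chosen here for illustration, and `.to_numpy()` is used as the current equivalent of `.values`.

```python
import numpy as np
import pandas as pd

# Column A as shown in the question (B and C omitted; they play no role here).
df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})

List = [2, 5, 6, 8, 12, 16, 26, 32]  # name kept from the question
lookup = np.asarray(List)            # illustrative name for the ndarray

# Integer-array indexing: one lookup per row, with no Python-level loop.
df["D"] = lookup[df["A"].to_numpy()]

print(df["D"].tolist())  # [2, 12, 16, 26, 32, 32, 26, 16]
```

A value in A that is 8 or larger would raise an IndexError, while negative values wrap around silently, so it may be worth validating the column first.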

Answer 2

Answered by Phil Cooper

First let's create the dataframe you had. I'll ignore columns B and C as they are not relevant.

import pandas as pd

df = pd.DataFrame({'A': [0, 4, 5, 6, 7, 7, 6, 5]})

And the mapping that you desire:

mapping = dict(enumerate([2,5,6,8,12,16,26,32]))

df['D'] = df['A'].map(mapping)

Done!

print(df)

Output:

   A   D
0  0   2
1  4  12
2  5  16
3  6  26
4  7  32
5  7  32
6  6  26
7  5  16
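One practical difference from array indexing is how unexpected keys behave. As a small sketch (the value 9 is added purely for illustration and is not in the original data), `Series.map` with a dict yields NaN for keys the dict does not contain, instead of raising:

```python
import pandas as pd

mapping = dict(enumerate([2, 5, 6, 8, 12, 16, 26, 32]))  # {0: 2, 1: 5, ..., 7: 32}

# 9 is deliberately outside the 0..7 range to show the missing-key behavior.
s = pd.Series([0, 4, 9])
mapped = s.map(mapping)

print(mapped.tolist())         # [2.0, 12.0, nan]
print(mapped.isna().tolist())  # [False, False, True]
```

Because NaN is a float, the resulting column becomes float64 whenever a key is missing.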

Answer 3

Answered by sparrow

Just assign the list directly:

df['new_col'] = mylist

Alternative: convert the list to a Series or array and then assign:

se = pd.Series(mylist)
df['new_col'] = se.values

or

df['new_col'] = np.array(mylist)
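The likely reason for `.values` in the Series variant is index alignment. A short sketch, using a made-up dataframe with a non-default index: a list or array is assigned by position and must have the same length as the dataframe, whereas a Series is first aligned on index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the index deliberately does not start at 0.
df = pd.DataFrame({"A": [10, 20, 30]}, index=[5, 6, 7])
mylist = [1, 2, 3]

# A list or ndarray is assigned positionally; its length must equal len(df).
df["by_position"] = np.array(mylist)

# A Series is aligned on index labels; labels 0..2 do not match 5..7,
# so every row ends up NaN.
df["by_label"] = pd.Series(mylist)

print(df["by_position"].tolist())   # [1, 2, 3]
print(df["by_label"].isna().all())  # True
```

Note that this assigns the list row by row; it does not look values up by the contents of another column, which is what the question asks for.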

Answer 4

Answered by Salvatore Cosentino

A solution improving on the great one from @sparrow.

Let df be your dataset, and mylist the list with the values you want to add to the dataframe.

Suppose you want to call your new column simply new_column.

First make the list into a Series:

column_values = pd.Series(mylist)

Then use the insert function to add the column. This function has the advantage of letting you choose the position at which to place the column. In the following example we place the new column in the first position from the left (by setting loc=0):

df.insert(loc=0, column='new_column', value=column_values)
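Put together as a runnable sketch (the dataframe contents and list values here are invented for illustration), the call behaves like this:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5], "B": [1.0, 2.0, 3.0]})  # hypothetical data
mylist = ["x", "y", "z"]                                   # hypothetical values

# insert() modifies df in place and returns None.
result = df.insert(loc=0, column="new_column", value=mylist)

print(result)            # None
print(list(df.columns))  # ['new_column', 'A', 'B']
```

By default `insert` raises a ValueError if the column name already exists, and, as with plain assignment, a Series passed as `value` is aligned on the index rather than assigned by position.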

Answer 5

Answered by Mehdi

Old question, but I always try to use the fastest code!

I had a huge list of 69 million uint64 values. np.array() was fastest for me.

df['hashes'] = hashes
Time spent: 17.034842014312744

df['hashes'] = pd.Series(hashes).values
Time spent: 17.141014337539673

df['key'] = np.array(hashes)
Time spent: 10.724546194076538
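A comparison like this can be reproduced with a small timing harness. The sketch below is an assumption about how such numbers might be collected, not the original benchmark: the list is stand-in data and much smaller, `timed_assign` is a helper invented here, and the results will vary by machine and library version.

```python
import time

import numpy as np
import pandas as pd

n = 200_000              # far smaller than the 69 million values mentioned above
hashes = list(range(n))  # stand-in data; the original list is not available


def timed_assign(make_values):
    """Assign a new column to a fresh dataframe; return (df, elapsed seconds)."""
    df = pd.DataFrame(index=range(n))
    start = time.perf_counter()
    df["hashes"] = make_values()
    return df, time.perf_counter() - start


df_list, t_list = timed_assign(lambda: hashes)
df_series, t_series = timed_assign(lambda: pd.Series(hashes).values)
df_array, t_array = timed_assign(lambda: np.array(hashes))

print(f"list:             {t_list:.4f} s")
print(f"Series(...).values: {t_series:.4f} s")
print(f"np.array:         {t_array:.4f} s")
```

All three variants produce the same column contents, so the choice only affects conversion time.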
