Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/42952672/

Build pandas data frame from list of numpy arrays

Tags: python, pandas, dataframe

Asked by Whir

I wonder if there is an easy way to do the seemingly obvious task of generating a pandas DataFrame from a list of numpy arrays, where the arrays become the columns. The default behavior seems to treat the arrays as rows, and I don't understand why. Here is a quick example:

import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]
df = pd.DataFrame(data=data, columns=names)  # raises: 3 names given for 10 columns

This gives an error, indicating pandas expects 10 columns.

If I do

df = pd.DataFrame(data=data)

I get a DataFrame with 10 columns and 3 rows.
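
To make the two observations above concrete, here is a minimal sketch that reproduces both. The exact exception type and message vary between pandas versions (recent releases raise ValueError, some older ones raised AssertionError), so both are caught here as an assumption.

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Each array in the list becomes one row, so the frame has 3 rows and 10 columns.
df_rows = pd.DataFrame(data=data)
print(df_rows.shape)

# Passing three column names for ten columns therefore fails.
try:
    pd.DataFrame(data=data, columns=names)
    raised = False
except (ValueError, AssertionError) as exc:
    raised = True
    print(exc)
```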

Given that appending rows to a DataFrame is generally much harder than appending columns, this behavior puzzles me. For example, if I quickly want to add a fourth data array to the DataFrame, I want the data organized in columns so that I can do

df['data4'] = new_array

How can I quickly build the DataFrame I want?

Accepted answer by Cleb

I would use .from_items:

pd.DataFrame.from_items(zip(names, data))

which gives

   data1  data2  data3
0      0      0      0
1      1      1      1
2      2      2      2
3      3      3      3
4      4      4      4
5      5      5      5
6      6      6      6
7      7      7      7
8      8      8      8
9      9      9      9
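
Note that DataFrame.from_items was deprecated in pandas 0.23 and removed in pandas 1.0, so the call above fails on current releases. A rough equivalent, assuming Python 3.7+ (where dicts keep insertion order), might look like this:

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# dict(zip(...)) pairs each name with its array; every pair becomes one column.
df = pd.DataFrame.from_dict(dict(zip(names, data)))
print(df.shape)
```

This should produce the same 10 x 3 frame as shown above.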

That should also be faster than transposing:

%timeit pd.DataFrame.from_items(zip(names, data))

1000 loops, best of 3: 281 μs per loop

%timeit pd.DataFrame(data, index=names).T

1000 loops, best of 3: 730 μs per loop
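
The %timeit lines only work in IPython or Jupyter. For a plain script, a comparison along the same lines could be sketched with the standard timeit module. The helper names build_from_dict and build_by_transposing are made up for this illustration, and the dict constructor stands in for from_items, which newer pandas versions no longer provide. Absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]


def build_from_dict():
    # Column-wise construction: each name maps to one array.
    return pd.DataFrame(dict(zip(names, data)))


def build_by_transposing():
    # Row-wise construction followed by a transpose.
    return pd.DataFrame(data, index=names).T


# Time 200 calls of each builder and report the totals in seconds.
t_dict = timeit.timeit(build_from_dict, number=200)
t_transpose = timeit.timeit(build_by_transposing, number=200)
print(f"dict: {t_dict:.4f} s, transpose: {t_transpose:.4f} s")
```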

Adding a fourth column is then also fairly simple:

df['data4'] = range(1, 11)

which gives

   data1  data2  data3  data4
0      0      0      0      1
1      1      1      1      2
2      2      2      2      3
3      3      3      3      4
4      4      4      4      5
5      5      5      5      6
6      6      6      6      7
7      7      7      7      8
8      8      8      8      9
9      9      9      9     10
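
One thing to keep in mind when adding a column this way: the new values need to match the number of rows. A small sketch, where the column name data5 and the short array are invented purely to trigger the mismatch:

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
df = pd.DataFrame(dict(zip(names, [np.arange(10) for _ in names])))

# A new column must supply as many values as the frame has rows (10 here).
df['data4'] = np.arange(1, 11)

# A shorter array is rejected rather than padded.
try:
    df['data5'] = np.arange(5)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(list(df.columns), mismatch_rejected)
```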

EDIT:

As mentioned by @jezrael, a third option would be the following (beware: a plain dict does not guarantee key order on older Python versions, which is why columns=names is passed explicitly):

pd.DataFrame(dict(zip(names, data)), columns=names)
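
To illustrate what columns=names buys you, the sketch below builds the dict in reverse order on purpose (the variable shuffled is just for this demonstration) and still gets the columns back in the requested order:

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Build the dict in reverse order to mimic a mapping with unreliable ordering.
shuffled = dict(zip(reversed(names), reversed(data)))

# columns=names selects the keys in exactly the requested order.
df = pd.DataFrame(shuffled, columns=names)
print(list(df.columns))
```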

Timing:

%timeit pd.DataFrame(dict(zip(names, data)))

1000 loops, best of 3: 281 μs per loop

Answer by blacksite

There are many ways to solve your problem, but the easiest seems to be df.T (T being shorthand for pandas.DataFrame.transpose):

>>> df = pd.DataFrame(data=data, index=names)
>>> df
       0  1  2  3  4  5  6  7  8  9
data1  0  1  2  3  4  5  6  7  8  9
data2  0  1  2  3  4  5  6  7  8  9
data3  0  1  2  3  4  5  6  7  8  9

>>> df.T 
   data1  data2  data3
0      0      0      0
1      1      1      1
2      2      2      2
3      3      3      3
4      4      4      4
5      5      5      5
6      6      6      6
7      7      7      7
8      8      8      8
9      9      9      9
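
A possible caveat with the transpose route, sketched under the assumption that the arrays do not all share one dtype (the nums/letters data is hypothetical, not from the question): building row-wise mixes the types within each column before the transpose, so the per-array dtypes may be lost, whereas column-wise construction keeps them.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed data: one numeric array and one array of strings.
mixed = {'nums': np.arange(3), 'letters': np.array(['a', 'b', 'c'])}

# Column-wise construction keeps one dtype per column.
df_cols = pd.DataFrame(mixed)

# Row-wise construction puts a number and a string into every column,
# so the transposed result carries generic object columns.
df_t = pd.DataFrame(list(mixed.values()), index=list(mixed.keys())).T

print(df_cols.dtypes)
print(df_t.dtypes)
```

With arrays of a single shared dtype, as in the question, this difference does not arise.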

Answer by Lak

from_items is now deprecated. Use from_dict instead:

df = pd.DataFrame.from_dict({
  'data1': np.arange(10),
  'data2': np.arange(10),
  'data3': np.arange(10)
})

This returns:

   data1  data2  data3
0      0      0      0
1      1      1      1
2      2      2      2
3      3      3      3
4      4      4      4
5      5      5      5
6      6      6      6
7      7      7      7
8      8      8      8
9      9      9      9
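
As a side note, from_dict also takes an orient parameter that controls whether the dict keys become columns (the default) or row labels. A short sketch of the difference, with the dict name arrays chosen just for this example:

```python
import numpy as np
import pandas as pd

arrays = {name: np.arange(10) for name in ['data1', 'data2', 'data3']}

# Default orient='columns': each key becomes a column (10 rows, 3 columns).
by_columns = pd.DataFrame.from_dict(arrays)

# orient='index': each key becomes a row label instead (3 rows, 10 columns).
by_rows = pd.DataFrame.from_dict(arrays, orient='index')

print(by_columns.shape, by_rows.shape)
```

The default orientation is the one the question asks for, so no extra argument is needed there.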