Original question: http://stackoverflow.com/questions/42952672/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Build pandas data frame from list of numpy arrays
Asked by Whir
Is there an easy way to generate a pandas DataFrame from a list of numpy arrays, where each array becomes a column? The default behavior seems to treat the arrays as rows, and I don't understand why. Here is a quick example:
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]
df = pd.DataFrame(data=data, columns=names)  # raises an error: 3 names for 10 columns
This gives an error, indicating pandas expects 10 columns.
If I do
df = pd.DataFrame(data=data)
I get a DataFrame with 10 columns and 3 rows.
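The behavior described above can be reproduced with a minimal sketch like the following. It assumes a reasonably recent pandas; the exact exception type and message may differ between versions, so both ValueError and AssertionError are caught. The names rows_df and mismatch_raised are only illustrative.

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Each array in the list is treated as one row, so the result is 3 rows x 10 columns.
rows_df = pd.DataFrame(data)
print(rows_df.shape)

# Passing 3 column names for 10 columns is rejected by pandas.
try:
    pd.DataFrame(data, columns=names)
    mismatch_raised = False
except (ValueError, AssertionError) as exc:
    mismatch_raised = True
    print(type(exc).__name__, exc)
```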
Given that it is generally much more difficult to append rows than columns to a DataFrame, this behavior puzzles me. For example, if I want to quickly add a fourth data array to the DataFrame, I want the data organized in columns so that I can do
df['data4'] = new_array
How can I quickly build the DataFrame I want?
Accepted answer by Cleb
I would use .from_items:
pd.DataFrame.from_items(zip(names, data))
which gives
   data1  data2  data3
0      0      0      0
1      1      1      1
2      2      2      2
3      3      3      3
4      4      4      4
5      5      5      5
6      6      6      6
7      7      7      7
8      8      8      8
9      9      9      9
That should also be faster than transposing:
%timeit pd.DataFrame.from_items(zip(names, data))
1000 loops, best of 3: 281 μs per loop
%timeit pd.DataFrame(data, index=names).T
1000 loops, best of 3: 730 μs per loop
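The %timeit lines above need IPython. As a rough sketch, a similar comparison can be run with the standard timeit module. Because from_items is deprecated (see the later answer), this sketch substitutes the dict-based constructor for it; the helper names build_from_dict and build_by_transpose are made up for illustration, and absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

def build_from_dict():
    # Column-wise construction: each name maps to one array.
    return pd.DataFrame(dict(zip(names, data)))

def build_by_transpose():
    # Row-wise construction followed by a transpose.
    return pd.DataFrame(data, index=names).T

runs = 200
dict_time = timeit.timeit(build_from_dict, number=runs)
transpose_time = timeit.timeit(build_by_transpose, number=runs)

print(f"dict:      {dict_time / runs * 1e6:.1f} us per call")
print(f"transpose: {transpose_time / runs * 1e6:.1f} us per call")
```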
Adding a fourth column is then also fairly simple:
df['data4'] = range(1, 11)
which gives
   data1  data2  data3  data4
0      0      0      0      1
1      1      1      1      2
2      2      2      2      3
3      3      3      3      4
4      4      4      4      5
5      5      5      5      6
6      6      6      6      7
7      7      7      7      8
8      8      8      8      9
9      9      9      9     10
EDIT:
As mentioned by @jezrael, a third option would be the following (beware: without columns=names the column order is not guaranteed on older Python versions):
pd.DataFrame(dict(zip(names, data)), columns=names)
Timing:
%timeit pd.DataFrame(dict(zip(names, data)))
1000 loops, best of 3: 281 μs per loop
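To make the ordering caveat concrete, here is a small sketch of the dict-based option with deliberately non-alphabetical names. The names and scaled arrays are invented for illustration. Passing columns=names pins the column order explicitly, so the result does not depend on how the dict happens to be ordered.

```python
import numpy as np
import pandas as pd

# Hypothetical names chosen so that sorted order differs from the intended order.
names = ['zeta', 'alpha', 'mid']
data = [np.arange(10) * k for k in (1, 2, 3)]

# columns=names selects and orders the columns explicitly.
df = pd.DataFrame(dict(zip(names, data)), columns=names)

print(list(df.columns))
print(df.head(3))
```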
Answer by blacksite
There are many ways to solve your problem, but the easiest way seems to be df.T (T being shorthand for pandas.DataFrame.transpose):
>>> df = pd.DataFrame(data=data, index=names)
>>> df
       0  1  2  3  4  5  6  7  8  9
data1  0  1  2  3  4  5  6  7  8  9
data2  0  1  2  3  4  5  6  7  8  9
data3  0  1  2  3  4  5  6  7  8  9
>>> df.T
   data1  data2  data3
0      0      0      0
1      1      1      1
2      2      2      2
3      3      3      3
4      4      4      4
5      5      5      5
6      6      6      6
7      7      7      7
8      8      8      8
9      9      9      9
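A related variation, not from the answers here, is to do the transposition on the NumPy side before building the DataFrame. The sketch below uses np.column_stack, which places each 1-D array in its own column. It assumes all arrays have the same length and a compatible dtype.

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Stack the 1-D arrays side by side: the result has shape (10, 3).
stacked = np.column_stack(data)

# Now each name lines up with one column of the 2-D array.
df = pd.DataFrame(stacked, columns=names)
print(df.shape)
```

Because the arrays are merged into one 2-D array, mixed dtypes would be coerced to a common type, which may not be what you want for heterogeneous columns.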
Answer by Lak
from_items is now deprecated. Use from_dict instead:
df = pd.DataFrame.from_dict({
    'data1': np.arange(10),
    'data2': np.arange(10),
    'data3': np.arange(10)
})
This returns:
   data1  data2  data3
0      0      0      0
1      1      1      1
2      2      2      2
3      3      3      3
4      4      4      4
5      5      5      5
6      6      6      6
7      7      7      7
8      8      8      8
9      9      9      9
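Applied to the names and data lists from the question, the same idea might look like the sketch below. It assumes Python 3.7+ (where dicts keep insertion order) and a pandas version that preserves that order. The fourth column is then added the way the question intended.

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Pair each name with its array, then build the frame column-wise.
df = pd.DataFrame.from_dict(dict(zip(names, data)))

# Adding another array as a new column is a plain assignment.
df['data4'] = np.arange(1, 11)

print(df.tail(3))
```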