Original URL: http://stackoverflow.com/questions/36539396/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
How to create a DataFrame while preserving order of the columns?
Question by ceiling cat
How can I create a DataFrame from multiple numpy arrays, Pandas Series, or Pandas DataFrames while preserving the order of the columns?
For example, I have these two numpy arrays and I want to combine them into a Pandas DataFrame.
import numpy as np
import pandas as pd
foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )
If I do this, the bar column would come first because dict doesn't preserve order.
pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } )
bar foo
0 4 1
1 5 2
2 6 3
I can do this, but it gets tedious when I need to combine many variables.
pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) }, columns = [ 'foo', 'bar' ] )
EDIT: Is there a way to specify the variables to be joined and to organize the column order in one operation? That is, I don't mind using multiple lines to complete the entire operation, but I'd rather not have to specify the variables to be joined multiple times (since I will be changing the code a lot and this is pretty error-prone).
EDIT2: One more point. If I want to add or remove one of the variables to be joined, I only want to add/remove in one place.
Accepted answer by Eddo Hintoso
Original Solution: Incorrect Usage of collections.OrderedDict
In my original solution, I proposed using OrderedDict from the collections module in Python's standard library.
>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )
foo bar
0 1 4
1 2 5
2 3 6
Right Solution: Passing Key-Value Tuple Pairs for Order Preservation
However, as noted, if a normal dictionary is passed to OrderedDict, the order may still not be preserved, because the dictionary (with its arbitrary order) is built before OrderedDict ever sees it. A workaround is to convert a list of key-value tuple pairs into an OrderedDict, as suggested in this SO post:
>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )
a c b
0 1 7 4
1 2 8 5
2 3 9 6
>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )
a b c
0 1 4 7
1 2 5 8
2 3 6 9
Answer by blokeley
Use the columns keyword when creating the DataFrame:
pd.DataFrame({'foo': foo, 'bar': bar}, columns=['foo', 'bar'])
Also, note that you don't need to create the Series.
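A short sketch of how columns= behaves when the data is a dict, assuming a recent pandas version: it both orders and selects the keys, so a key left out is dropped and a misspelled name quietly becomes an all-NaN column instead of raising an error. The names only_bar, typo, and baz are just for illustration.

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])
data = {'foo': foo, 'bar': bar}

# columns= fixes the order of the result; plain arrays need no Series wrapper.
df = pd.DataFrame(data, columns=['foo', 'bar'])
print(list(df.columns))  # ['foo', 'bar']

# Keys left out of columns= are dropped...
only_bar = pd.DataFrame(data, columns=['bar'])
print(list(only_bar.columns))  # ['bar']

# ...and a name with no matching key (here a typo) becomes an all-NaN
# column rather than an error.
typo = pd.DataFrame(data, columns=['foo', 'baz'])
print(typo['baz'].isna().all())  # True
```

This is one reason keeping the column names in a single place, as the question asks, is worthwhile.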
Answer by Vidhya G
To preserve column order, pass your numpy arrays as a list of tuples to DataFrame.from_items:
>>> df = pd.DataFrame.from_items([('foo', foo), ('bar', bar)])
>>> df
foo bar
0 1 4
1 2 5
2 3 6
Update
From pandas 0.23, from_items is deprecated (it was removed in pandas 1.0), so pass the numpy arrays using from_dict instead. To use from_dict, you need to pass the items as a dictionary:
>>> from collections import OrderedDict
>>> df = pd.DataFrame.from_dict(OrderedDict(zip(['foo', 'bar'], [foo, bar])))
From Python 3.7, you can rely on dicts preserving insertion order (see https://mail.python.org/pipermail/python-dev/2017-December/151283.html), so:
>>> df = pd.DataFrame.from_dict(dict(zip(['foo', 'bar'], [foo, bar])))
or simply:
>>> df = pd.DataFrame(dict(zip(['foo', 'bar'], [foo, bar])))
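Two parallel lists passed to zip() can drift apart when columns are added or removed, and zip() stops silently at the shorter one. A minimal sketch that keeps each name next to its array, so adding or removing a column is a one-line change, assuming Python 3.7+ and pandas 0.23 or later; the name pairs is illustrative:

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])

# One (name, array) pair per column, listed in the wanted column order.
pairs = [
    ('foo', foo),
    ('bar', bar),
]

# dict() keeps insertion order on Python 3.7+, and pandas >= 0.23
# keeps that order for the columns.
df = pd.DataFrame(dict(pairs))
print(list(df.columns))  # ['foo', 'bar']
```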
Answer by tfv
After creating your DataFrame, you can reorder the columns however you like with:
df = df[['foo', 'bar']]
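A sketch of reordering after the fact, assuming a recent pandas. Selecting with a list of labels returns the columns in that order and, unlike columns= in the constructor, raises KeyError for an unknown name. The last step, moving one column to the front without retyping the others, is an illustrative addition:

```python
import numpy as np
import pandas as pd

# Built with 'bar' first, so the columns start out in the unwanted order.
df = pd.DataFrame({'bar': np.array([4, 5, 6]), 'foo': np.array([1, 2, 3])})

# Selecting with a list of labels returns the columns in that order.
df = df[['foo', 'bar']]
print(list(df.columns))  # ['foo', 'bar']

# An unknown label raises instead of silently adding a column.
try:
    df[['foo', 'baz']]
except KeyError as err:
    print('KeyError:', err)

# Move one column to the front without retyping the others.
first = 'bar'
df = df[[first] + [c for c in df.columns if c != first]]
print(list(df.columns))  # ['bar', 'foo']
```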
Answer by Eric
I can't comment yet, so I'll ask here: how will you specify the order of the columns in the first place (since you can't with a regular dictionary)?
If you want to maintain an ordered dictionary:
from collections import OrderedDict
import numpy as np
import pandas as pd
data = OrderedDict()
data['foo'] = np.array([1, 2, 3])
data['bar'] = np.array([4, 5, 6])
df = pd.DataFrame(data)
If you just have a list of keys that defines the order:
data = dict(data)  # a plain dict: its own order no longer matters
keys = ['foo', 'bar']  # the column order, written once
df = pd.concat([pd.Series(data[key]) for key in keys], keys=keys, axis=1)
@tfv's answer is likely the most concise way to do what you want.
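One behavioural difference worth knowing: pd.concat aligns Series on their index, so arrays of unequal length are padded with NaN, whereas pd.DataFrame on a dict of plain arrays rejects them with a ValueError. A minimal sketch, assuming a recent pandas; the shorter bar array is made up for the example:

```python
import numpy as np
import pandas as pd

data = {'foo': np.array([1, 2, 3]), 'bar': np.array([4, 5])}
keys = ['foo', 'bar']  # the column order, written once

# pd.concat needs Series/DataFrames, so each array is wrapped; keys= names
# the columns in the order the Series are passed.
df = pd.concat([pd.Series(data[k]) for k in keys], keys=keys, axis=1)
print(list(df.columns))  # ['foo', 'bar']
print(df['bar'].isna().sum())  # 1: the shorter Series is padded with NaN

# The plain constructor refuses arrays of different lengths.
try:
    pd.DataFrame(data)
except ValueError as err:
    print('ValueError:', err)
```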
Answer by Joe T. Boka
This may be another way to approach it:
import numpy as np
import pandas as pd
foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )
stacked = np.vstack((foo, bar)).T
stacked
array([[1, 4],
[2, 5],
[3, 6]])
new_df = pd.DataFrame(stacked, columns = ['foo', 'bar'] )
new_df
foo bar
0 1 4
1 2 5
2 3 6
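A caveat with stacking: a 2-D numpy array holds a single dtype, so mixing integer and float arrays upcasts every column. This sketch assumes a float bar (not in the original) and uses np.column_stack, which for 1-D arrays gives the same result as np.vstack(...).T:

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4.5, 5.5, 6.5])

# Stacking puts the 1-D arrays side by side in one 2-D array.
stacked = np.column_stack((foo, bar))
df_stacked = pd.DataFrame(stacked, columns=['foo', 'bar'])

# The 2-D array has one dtype, so the integer column became float.
print(df_stacked['foo'].dtype.kind)  # 'f'

# Building from a dict keeps a separate dtype per column.
df_dict = pd.DataFrame({'foo': foo, 'bar': bar})
print(df_dict['foo'].dtype.kind)  # 'i'
```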
Answer by Saminfeld
Create the DataFrame from the data alone and transpose it, then set the column names.
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> df = pd.DataFrame([foo, bar]).T
>>> df.columns = ['foo','bar']
>>> df
foo bar
0 1 4
1 2 5
2 3 6
Answer by Borja_042
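Another sketchy solution might be to prefix each column name with X_, where X is the column's position. This relies on pandas sorting the dict keys, which it did before version 0.23 (and still does on Python older than 3.6):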
pd.DataFrame( { '1_foo': pd.Series(foo), '2_bar': pd.Series(bar) } )
After that, you can use df.columns or something similar to rename the columns again! The least pythonic code in the world!!!
Good luck mates!
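A sketch of the same prefix idea that sorts explicitly instead of depending on how a given pandas version orders dict keys, then strips the prefixes. The 1_/2_ names are illustrative, and because the sort is by string, more than nine columns would need zero-padded prefixes (01_, 02_, and so on):

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])

# Illustrative prefixes encode each column's wanted position; the dict is
# deliberately written in the "wrong" order.
df = pd.DataFrame({'2_bar': bar, '1_foo': foo})

# Sort the labels as strings, then drop everything up to the first '_'.
df = df[sorted(df.columns)]
df.columns = [name.split('_', 1)[1] for name in df.columns]
print(list(df.columns))  # ['foo', 'bar']
```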
Answer by Leonardo Pmont
What I did is as follows:
import pandas as pd
# Creating a list of dicts
list_of_dicts = [{'key1': 'valueA', 'key2': 'valueB'}, {'key1': 'valueC', 'key2': 'valueD'}]
# Getting the list of keys from the first dict
keys_list = list(list_of_dicts[0].keys())
# And finally
df = pd.DataFrame(list_of_dicts, columns=keys_list)
Worked perfectly for me.
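If the records don't all share the same keys, taking the column list from one record drops any key that only appears later. A minimal sketch, assuming Python 3.7+, that collects every key in first-seen order; records, ordered_keys, and the extra key3 are illustrative:

```python
import pandas as pd

records = [
    {'key1': 'valueA', 'key2': 'valueB'},
    {'key1': 'valueC', 'key2': 'valueD', 'key3': 'valueE'},
]

# dict.fromkeys keeps the first-seen order of every key across all records
# (dicts preserve insertion order on Python 3.7+).
ordered_keys = list(dict.fromkeys(k for record in records for k in record))

df = pd.DataFrame(records, columns=ordered_keys)
print(list(df.columns))  # ['key1', 'key2', 'key3']
print(df['key3'].isna().tolist())  # [True, False]: record 0 has no key3
```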