Note: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/36539396/

Date: 2020-08-19 18:00:36  Source: igfitidea

How to create a DataFrame while preserving order of the columns?

Tags: python, pandas

Asked by ceiling cat

How can I create a DataFrame from multiple numpy arrays, Pandas Series, or Pandas DataFrames while preserving the order of the columns?

For example, I have these two numpy arrays and I want to combine them as a Pandas DataFrame.

foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )

If I do this, the bar column would come first because dict doesn't preserve order.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } )

    bar foo
0   4   1
1   5   2
2   6   3

I can do this, but it gets tedious when I need to combine many variables.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) }, columns = [ 'foo', 'bar' ] )

EDIT: Is there a way to specify the variables to be joined and to organize the column order in one operation? That is, I don't mind using multiple lines to complete the entire operation, but I'd rather not have to specify the variables to be joined multiple times (since I will be changing the code a lot and this is pretty error-prone).

EDIT2: One more point. If I want to add or remove one of the variables to be joined, I only want to add/remove in one place.

Accepted answer by Eddo Hintoso

Original Solution: Incorrect Usage of collections.OrderedDict

In my original solution, I proposed to use OrderedDict from the collections module in Python's standard library.

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )

   foo  bar
0    1    4
1    2    5
2    3    6

Right Solution: Passing Key-Value Tuple Pairs for Order Preservation

However, as noted, if a normal dictionary is passed to OrderedDict, the order may still not be preserved, since the order is already randomized when the dictionary is constructed. A workaround is to convert a list of key-value tuple pairs into an OrderedDict, as suggested in this SO post:

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )

   a  c  b
0  1  7  4
1  2  8  5
2  3  9  6

>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )

   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
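
To cover the "add or remove a variable in only one place" requirement from the question, the pairs can be kept in a single list and handed to OrderedDict. This is only a minimal sketch, assuming the arrays a, b and c from above; the name columns_in_order is made up for illustration:

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

# Hypothetical name: the single place where variables are added or removed.
columns_in_order = [('a', a), ('b', b), ('c', c)]

# A list of (key, value) pairs keeps its order when converted to an OrderedDict.
df = pd.DataFrame(OrderedDict(columns_in_order))
print(list(df.columns))
```

Because the list holds both the names and the data, the column order and the set of variables cannot drift apart.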

Answer by blokeley

Use the columns keyword when creating the DataFrame:

pd.DataFrame({'foo': foo, 'bar': bar}, columns=['foo', 'bar'])

Also, note that you don't need to create the Series.

Answer by Vidhya G

To preserve column order, pass in your numpy arrays as a list of tuples to DataFrame.from_items:

>>> df = pd.DataFrame.from_items([('foo', foo), ('bar', bar)])

   foo  bar
0    1    4
1    2    5
2    3    6

Update

From pandas 0.23, from_items is deprecated and will be removed (it was later removed in pandas 1.0), so pass the numpy arrays using from_dict instead. To use from_dict you need to pass the items as a dictionary:

>>> from collections import OrderedDict
>>> df = pd.DataFrame.from_dict(OrderedDict(zip(['foo', 'bar'], [foo, bar])))

From Python 3.7 you can depend on insertion order being preserved (see https://mail.python.org/pipermail/python-dev/2017-December/151283.html), so:

>>> df = pd.DataFrame.from_dict(dict(zip(['foo', 'bar'], [foo, bar])))

or simply:

>>> df = pd.DataFrame(dict(zip(['foo', 'bar'], [foo, bar])))
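
As a quick check of the behaviour described above, here is a short sketch, assuming Python 3.7+ and pandas 0.23 or later (the versions in which a plain dict's insertion order is kept by the DataFrame constructor). The names names and arrays are illustrative:

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])

# Illustrative names: the desired column order and the matching data.
names = ['foo', 'bar']
arrays = [foo, bar]

# With insertion order preserved, 'foo' stays ahead of 'bar'
# even though 'bar' sorts first alphabetically.
df = pd.DataFrame(dict(zip(names, arrays)))
print(df.columns.tolist())
```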

Answer by tfv

After having created your dataframe, you can simply reorder the columns the way you want by using:

df = df[['foo', 'bar']]

Answer by Eric

I couldn't comment to ask, but how will you specify the order of the columns in the first place (since you can't with a regular dictionary)?

If you want to maintain an ordered dictionary:

from collections import OrderedDict
import numpy as np
import pandas as pd

data = OrderedDict()
data['foo'] = np.array([1, 2, 3])
data['bar'] = np.array([4, 5, 6])

df = pd.DataFrame(data)

If you just have a list of keys giving the order:

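keys = ['foo', 'bar']
# pd.concat needs Series (not raw arrays), so wrap each value first
data = {key: pd.Series(value) for key, value in data.items()}
df = pd.concat([data[key] for key in keys], keys=keys, axis=1)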

@tfv's answer is likely the most concise way to do what you want.

Answer by Alexander

>>> pd.concat([pd.Series(eval(col), name=col) for col in ['foo', 'bar']], axis=1)
   foo  bar
0    1    4
1    2    5
2    3    6

This works using eval. Each name in your list of column names must match the corresponding variable name.

>>> eval('foo')
array([1, 2, 3])

Answer by Joe T. Boka

This may be another way to approach it:

foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )
stacked = np.vstack((foo, bar)).T
stacked
array([[1, 4],
       [2, 5],
       [3, 6]])

new_df = pd.DataFrame(stacked, columns = ['foo', 'bar'] )
new_df
   foo  bar
0   1   4
1   2   5
2   3   6
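
One thing to keep in mind with stacking is that NumPy puts everything into a single array with one common dtype. The sketch below uses np.column_stack, which gives the same result as vstack(...).T for 1-D inputs; the float values for bar are made up purely to show the upcast:

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4.5, 5.5, 6.5])  # made-up float data, chosen to show the upcast

# column_stack places each 1-D array in its own column, in the order given.
stacked = np.column_stack((foo, bar))

df = pd.DataFrame(stacked, columns=['foo', 'bar'])
print(df.dtypes)  # both columns share the dtype of the stacked array
```

If the columns need to keep different dtypes, building the DataFrame from a dict or a list of pairs is probably the safer route.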

Answer by Saminfeld

Make the dataframe with just the data in it, and transpose it.

Then add the columns.

>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>     
>>> df = pd.DataFrame([foo, bar]).T
>>> df.columns = ['foo','bar']
>>> df
  foo bar
0  1   4
1  2   5
2  3   6

Answer by Borja_042

Another sketchy solution might be to prefix each column title with X_, where X is the column's position in the desired order:

pd.DataFrame( { '2_foo': pd.Series(foo), '1_bar': pd.Series(bar) } )

After that you can use columns or something similar to rename the columns again! The least Pythonic code in the world!!!

Good luck mates!

Answer by Leonardo Pmont

What I did is as follows:

# Create a list of dicts
list_of_dicts = [{'key1': 'valueA', 'key2': 'valueB'}, {'key1': 'valueC', 'key2': 'valueD'}]

# Get the list of keys from the first dict
keys_list = list(list_of_dicts[0].keys())

# And finally
df = pd.DataFrame(list_of_dicts, columns=keys_list)

Worked perfectly for me.
