Python: Pandas version of rbind

Question

Asked by N. McA.

In R, you can combine two dataframes by sticking the columns of one onto the bottom of the columns of the other using rbind. In pandas, how do you accomplish the same thing? It seems bizarrely difficult.

EDIT: I was creating the DataFrames in a stupid way, which was causing the issues. To all intents and purposes, append = rbind. See the answer below.

Using append results in a horrible mess, including NaNs, for reasons I don't understand. I'm just trying to "rbind" two identical frames that look like this:

        0         1       2        3          4          5        6                    7
0   ADN.L  20130220   437.4   442.37   436.5000   441.9000  2775364  2013-02-20 18:47:42
1   ADM.L  20130220  1279.0  1300.00  1272.0000  1285.0000   967730  2013-02-20 18:47:42
2   AGK.L  20130220  1717.0  1749.00  1709.0000  1739.0000   834534  2013-02-20 18:47:43
3  AMEC.L  20130220  1030.0  1040.00  1024.0000  1035.0000  1972517  2013-02-20 18:47:43
4   AAL.L  20130220  1998.0  2014.50  1942.4999  1951.0000  3666033  2013-02-20 18:47:44
5  ANTO.L  20130220  1093.0  1097.00  1064.7899  1068.0000  2183931  2013-02-20 18:47:44
6   ARM.L  20130220   941.5   965.10   939.4250   951.5001  2994652  2013-02-20 18:47:45

But I'm getting something horrible, like this:

        0         1        2        3          4         5        6                    7       0         1       2        3          4          5        6                    7
0     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADN.L  20130220   437.4   442.37   436.5000   441.9000  2775364  2013-02-20 18:47:42
1     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADM.L  20130220  1279.0  1300.00  1272.0000  1285.0000   967730  2013-02-20 18:47:42
2     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   AGK.L  20130220  1717.0  1749.00  1709.0000  1739.0000   834534  2013-02-20 18:47:43
3     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN  AMEC.L  20130220  1030.0  1040.00  1024.0000  1035.0000  1972517  2013-02-20 18:47:43
4     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   AAL.L  20130220  1998.0  2014.50  1942.4999  1951.0000  3666033  2013-02-20 18:47:44
5     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN  ANTO.L  20130220  1093.0  1097.00  1064.7899  1068.0000  2183931  2013-02-20 18:47:44
6     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ARM.L  20130220   941.5   965.10   939.4250   951.5001  2994652  2013-02-20 18:47:45
0     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADN.L  20130220   437.4   442.37   436.5000   441.9000  2775364  2013-02-20 18:47:42
1     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADM.L  20130220  1279.0  1300.00  1272.0000  1285.0000   967730  2013-02-20 18:47:42
2     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   AGK.L  20130220  1717.0  1749.00  1709.0000  1739.0000   834534  2013-02-20 18:47:43
3     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN

And I don't understand why. I'm starting to miss R :(

我不明白为什么。我开始想念 R :(

Answer 1

Accepted answer by N. McA.

Ah, this has to do with how I created the DataFrames, not with how I was combining them. The long and the short of it is: if you are creating a frame using a loop and a statement that looks like this:

Frame = Frame.append(pandas.DataFrame(data = SomeNewLineOfData))

You must ignore the index:

Frame = Frame.append(pandas.DataFrame(data = SomeNewLineOfData), ignore_index=True)

Or you will have issues later when combining data.
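A minimal sketch of the same loop on current pandas, assuming pandas 2.x: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat takes its place and accepts the same ignore_index flag. The rows_of_data list is a hypothetical stand-in for SomeNewLineOfData.

```python
import pandas as pd

# Hypothetical stand-in for "SomeNewLineOfData": each item is one row of values.
rows_of_data = [["ADN.L", 437.4], ["ADM.L", 1279.0], ["AGK.L", 1717.0]]

# Without ignore_index, every one-row frame keeps its own index label 0,
# so the combined frame ends up with the label 0 repeated.
with_dupes = pd.concat([pd.DataFrame([row]) for row in rows_of_data])
print(with_dupes.index.tolist())  # [0, 0, 0]

# With ignore_index=True the result gets a fresh 0..n-1 index.
frame = pd.concat(
    [pd.DataFrame([row]) for row in rows_of_data],
    ignore_index=True,
)
print(frame.index.tolist())  # [0, 1, 2]
print(frame)
```

Collecting the pieces in a list and calling pd.concat once also avoids copying the whole frame on every loop iteration, which is what repeated append or concat inside a loop does.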

或者您稍后在合并数据时会遇到问题。

Answer 2

Answer by abudis

This worked for me:

这对我有用：

import numpy as np
import pandas as pd

dates = np.asarray(pd.date_range('1/1/2000', periods=8))
df1 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df2 = df1.copy()
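# Originally df1.append(df2); DataFrame.append was removed in pandas 2.0,
# and pd.concat produces the same stacked result.
df = pd.concat([df1, df2])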

Yields:

产量：

                   A         B         C         D
2000-01-01 -0.327208  0.552500  0.862529  0.493109
2000-01-02  1.039844 -2.141089 -0.781609  1.307600
2000-01-03 -0.462831  0.066505 -1.698346  1.123174
2000-01-04 -0.321971 -0.544599 -0.486099 -0.283791
2000-01-05  0.693749  0.544329 -1.606851  0.527733
2000-01-06 -2.461177 -0.339378 -0.236275  0.155569
2000-01-07 -0.597156  0.904511  0.369865  0.862504
2000-01-08 -0.958300 -0.583621 -2.068273  0.539434
2000-01-01 -0.327208  0.552500  0.862529  0.493109
2000-01-02  1.039844 -2.141089 -0.781609  1.307600
2000-01-03 -0.462831  0.066505 -1.698346  1.123174
2000-01-04 -0.321971 -0.544599 -0.486099 -0.283791
2000-01-05  0.693749  0.544329 -1.606851  0.527733
2000-01-06 -2.461177 -0.339378 -0.236275  0.155569
2000-01-07 -0.597156  0.904511  0.369865  0.862504
2000-01-08 -0.958300 -0.583621 -2.068273  0.539434

If you are not already using the latest version of pandas, I highly recommend upgrading. It is now possible to operate on DataFrames that contain duplicate indices.
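To see what "duplicate indices" means in practice, here is a small sketch that stacks a frame on top of a copy of itself, as in the example above, but smaller and with deterministic values instead of random ones. index.is_unique, index.duplicated() and the verify_integrity parameter of pd.concat are standard pandas API; the frame itself is only illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", periods=3)
df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=dates, columns=["A", "B"])
df2 = df1.copy()

stacked = pd.concat([df1, df2])
print(stacked.index.is_unique)           # False: each date appears twice
print(stacked.index.duplicated().sum())  # 3 labels repeat an earlier one

# verify_integrity=True makes concat refuse overlapping index labels.
refused = False
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError as err:
    refused = True
    print("concat refused:", err)
```

Duplicates are allowed by default; verify_integrity is useful when repeated labels would indicate a bug in how the pieces were built.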

Answer 3

Answer by Bem Ostap

import pandas as pd 
import numpy as np

If you have a DataFrame like this:

array = np.random.randint(0, 10, size=(2, 4))
df = pd.DataFrame(array, columns=['A', 'B', 'C', 'D'],
                  index=['10aa', '20bb'])  # some arbitrary string index labels
df

      A  B  C  D
10aa  4  2  4  6
20bb  5  1  0  2

And you want to add a new row that is a list (or another iterable object):

List = [i**3 for i in range(df.shape[1]) ]
List
[0, 1, 8, 27]

You should transform the list into a dictionary whose keys are the DataFrame's columns, using the zip() function:

Dict = dict(  zip(df.columns, List)  )
Dict
{'A': 0, 'B': 1, 'C': 8, 'D': 27}

Then you can use the append() method to add the new dictionary:

df = df.append(Dict, ignore_index=True)
df
    A   B   C   D
0   7   5   5   4
1   5   8   4   1
2   0   1   8   27

N.B. the original index labels are dropped.
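On pandas 2.0 and later, DataFrame.append no longer exists, so a dict-based row like the one above has to go through pd.concat instead. A minimal sketch, assuming the same columns and reusing the values from the first printout in place of random ones; the label "30cc" is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame([[4, 2, 4, 6], [5, 1, 0, 2]],
                  columns=["A", "B", "C", "D"],
                  index=["10aa", "20bb"])
new_row = [i ** 3 for i in range(df.shape[1])]  # [0, 1, 8, 27]
row_dict = dict(zip(df.columns, new_row))

# Wrap the dict in a one-row DataFrame and concatenate;
# ignore_index=True discards the "10aa"/"20bb" labels, as append did.
df_reset = pd.concat([df, pd.DataFrame([row_dict])], ignore_index=True)
print(df_reset)

# Alternative that keeps the existing labels: give the new row its own label.
df_labeled = pd.concat([df, pd.DataFrame([row_dict], index=["30cc"])])
print(df_labeled)
```

The second form avoids losing the original index, which is the main drawback noted above.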

And yeah, it's not as simple as rbind() in R :(

Answer 4

Answer by B.Mr.W.

pd.concat serves the purpose of rbind in R.

import pandas as pd
df1 = pd.DataFrame({'col1': [1,2], 'col2':[3,4]})
df2 = pd.DataFrame({'col1': [5,6], 'col2':[7,8]})
print(df1)
print(df2)
print(pd.concat([df1, df2]))

The output will look like:

   col1  col2
0     1     3
1     2     4
   col1  col2
0     5     7
1     6     8
   col1  col2
0     1     3
1     2     4
0     5     7
1     6     8
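Two details of pd.concat are worth seeing side by side: ignore_index=True renumbers the rows, and rows are matched up by column label. The second part shows one possible way (an assumption; the question does not show how its frames were built) to end up with the doubled, NaN-filled columns from the question: frames whose column labels print identically but are different values, such as the integer 0 and the string "0".

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
df2 = pd.DataFrame({"col1": [5, 6], "col2": [7, 8]})

# ignore_index=True renumbers the rows 0..3 instead of repeating 0, 1.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)

# concat aligns on column labels. If the labels differ (here the integer 0
# versus the string "0"), the columns do not line up and NaNs fill the gaps.
a = pd.DataFrame({0: [1, 2]})
b = pd.DataFrame({"0": [3, 4]})
mismatched = pd.concat([a, b], ignore_index=True)
print(mismatched)
```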

If you read the documentation carefully enough, you will find it also explains other operations, such as cbind.
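For the cbind case, a minimal sketch: pd.concat with axis=1 places frames next to each other. Note that it aligns rows by index label rather than by position, so frames with different indexes produce NaNs where R's cbind would simply paste the rows together.

```python
import pandas as pd

left = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
right = pd.DataFrame({"col3": [5, 6]})

# axis=1 concatenates side by side, aligning rows on the index labels,
# which is the closest analogue to R's cbind.
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)
```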