Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/38838059/

Date: 2020-09-14 01:45:52  Source: igfitidea

Pandas equivalent rbind operation

python pandas

Question by TTT

Basically, I am looping through a bunch of CSV files and in the end would like to append each dataframe into one. Actually, all I need is an rbind-type function. So, I did some searching and followed the guide. However, I still could not get the ideal solution.

Sample code is attached below. For instance, the shape of data1 is always 47 by 42, but the shape of data_out_final becomes (47, 42), (47, 84), and (47, 126) after the first three files. Ideally, it should be (141, 42). In addition, I checked the index of data1, which is RangeIndex(start=0, stop=47, step=1). I'd appreciate any suggestions!

My pandas version is 0.18.1

code

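import pandas as pd

appended_data = []
for csv_each in csv_pool:  # csv_pool: list of CSV file paths (defined elsewhere)
    data1 = pd.read_csv(csv_each, header=0)
    data2 = data1  # do something here (processing step omitted)
    appended_data.append(data2)
# axis=1 concatenates the frames side by side (column-wise)
data_out_final = pd.concat(appended_data, axis=1)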

If I use data_out_final = pd.concat(appended_data, axis=0) instead, the shape of data_out_final becomes (141, 94).
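
A small sketch of why both shapes show up. The frames df_a and df_b and their values are hypothetical stand-ins for two CSV files, not the asker's data: axis=1 glues frames side by side, so the row count stays fixed and the column count grows, while axis=0 stacks rows but aligns columns by label, so any header that is spelled differently becomes an extra NaN-padded column.

```python
import numpy as np
import pandas as pd

# Two hypothetical 3 x 2 frames; the second "file" spells one header
# differently ("B" instead of "b").
df_a = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])
df_b = pd.DataFrame(np.arange(6, 12).reshape(3, 2), columns=["a", "B"])

# axis=1: rows stay at 3, columns add up (like 47 x 42 -> 47 x 84).
side_by_side = pd.concat([df_a, df_b], axis=1)
print(side_by_side.shape)  # (3, 4)

# axis=0: rows stack, but the result gets the union of all column
# labels, and cells with no matching label are filled with NaN.
stacked = pd.concat([df_a, df_b], axis=0)
print(stacked.shape)  # (6, 3)
print(stacked.isna().sum().sum())  # 6 missing cells
```

By the same mechanism, mismatched headers across the 42-column files would explain getting 141 rows but more than 42 columns.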

PS

I kind of figured it out: you have to standardize the column names before calling pd.concat.
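
A minimal sketch of that fix, assuming every file has the same 42 columns in the same order and only the header spellings differ. The random frames, the fileN_colM labels, and the choice of the first file's headers as the standard are all hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for three 47 x 42 CSV files whose headers differ.
frames = [
    pd.DataFrame(np.random.rand(47, 42),
                 columns=["file%d_col%d" % (i, j) for j in range(42)])
    for i in range(3)
]

# Assumption: use the first file's labels as the standard set.
standard_cols = list(frames[0].columns)

renamed = []
for df in frames:
    df = df.copy()
    df.columns = standard_cols  # positional relabel; assumes identical column order
    renamed.append(df)

# With identical labels, axis=0 stacks rows without widening the frame;
# ignore_index=True gives one continuous 0..n-1 index.
data_out_final = pd.concat(renamed, axis=0, ignore_index=True)
print(data_out_final.shape)  # (141, 42)
```

Relabeling by position is only safe if the column order really matches; if the files differ in order as well, a mapping passed to DataFrame.rename is the safer choice.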

Answer by Asish M.

>>> df1
          a         b
0 -1.417866 -0.828749
1  0.212349  0.791048
2 -0.451170  0.628584
3  0.612671 -0.995330
4  0.078460 -0.322976
5  1.244803  1.576373
6  1.169629 -1.135926
7 -0.652443  0.506388
8  0.549604 -0.691054
9 -0.512829 -0.959398

>>> df2
          a         b
0 -0.652161  0.940932
1  2.495067  0.004833
2 -2.187792  1.692402
3  1.900738  0.372425
4  0.245976  1.894527
5  0.627297  0.029331
6 -0.828628 -1.600014
7 -0.991835 -0.061202
8  0.543389  0.703457
9 -0.755059  1.239968

>>> pd.concat([df1, df2])
          a         b
0 -1.417866 -0.828749
1  0.212349  0.791048
2 -0.451170  0.628584
3  0.612671 -0.995330
4  0.078460 -0.322976
5  1.244803  1.576373
6  1.169629 -1.135926
7 -0.652443  0.506388
8  0.549604 -0.691054
9 -0.512829 -0.959398
0 -0.652161  0.940932
1  2.495067  0.004833
2 -2.187792  1.692402
3  1.900738  0.372425
4  0.245976  1.894527
5  0.627297  0.029331
6 -0.828628 -1.600014
7 -0.991835 -0.061202
8  0.543389  0.703457
9 -0.755059  1.239968

Unless I'm misinterpreting what you need, this is what you need.
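
Applied to the loop from the question, a sketch might look like the following. The in-memory io.StringIO "files" are hypothetical stand-ins for the real csv_pool paths. Note that the output above repeats the index labels 0 to 9; passing ignore_index=True gives a single continuous index instead.

```python
import io

import pandas as pd

# Hypothetical in-memory CSVs standing in for csv_pool; in practice these
# would be file paths passed to pd.read_csv.
csv_pool = [
    io.StringIO("a,b\n1,2\n3,4\n"),
    io.StringIO("a,b\n5,6\n"),
    io.StringIO("a,b\n7,8\n9,10\n"),
]

appended_data = []
for csv_each in csv_pool:
    data1 = pd.read_csv(csv_each, header=0)
    appended_data.append(data1)

# The default axis=0 stacks rows, like R's rbind; ignore_index=True
# renumbers the rows 0..n-1 instead of keeping each file's own labels.
data_out_final = pd.concat(appended_data, ignore_index=True)
print(data_out_final.shape)  # (5, 2)
```

Collecting the frames in a list and calling pd.concat once is generally cheaper than growing a DataFrame inside the loop, since each concatenation copies the data.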

Answer by Jon

Try: http://pandas.pydata.org/pandas-docs/stable/10min.html?highlight=concat#concat

"pandas provides various facilities for easily combining together Series, DataFrame, and Panel objects with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations."
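
One example of the "set logic for the indexes" mentioned in that quote is the join parameter of pd.concat, which decides what happens to column labels that only some frames have. The frames left and right below are hypothetical:

```python
import pandas as pd

# Two hypothetical frames that share column "a" but not "b" or "c".
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

# join="outer" (the default) keeps the union of column labels, padding with NaN.
outer = pd.concat([left, right], join="outer", ignore_index=True)
# join="inner" keeps only the labels that appear in every frame.
inner = pd.concat([left, right], join="inner", ignore_index=True)

print(outer.shape)  # (4, 3)
print(inner.shape)  # (4, 1)
```

In the question's situation, join="inner" would hide the mismatch by dropping columns, so standardizing the names first is usually the better fix.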