Disclaimer: this page is drawn from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license and attribute it to the original authors (not me) on Stack Overflow.
Original link: http://stackoverflow.com/questions/41225604/
Concat list of pandas data frame, but ignoring column name
Asked by Darren Cook
Sub-title: Dumb it down pandas, stop trying to be clever.
I've a list (res) of single-column pandas data frames, each containing the same kind of numeric data, but each with a different column name. The row indices have no meaning. I want to put them into a single, very long, single-column data frame.
When I do pd.concat(res) I get one column per input file (and loads and loads of NaN cells). I've tried various values for the parameters (*), but none do what I'm after.
Edit: Sample data:
res = [
    pd.DataFrame({'A':[1,2,3]}),
    pd.DataFrame({'B':[9,8,7,6,5,4]}),
    pd.DataFrame({'C':[100,200,300,400]}),
]
I have an ugly-hack solution: copy every data frame and give it a new column name:
newList = []
for r in res:
    r.columns = ["same"]
    newList.append(r)
pd.concat(newList, ignore_index=True)
Surely that is not the best way to do it??
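As an editorial aside: a less intrusive sketch of the same idea (my own variant, not from the original post) renames each column without mutating the frames in res, using DataFrame.set_axis:

```python
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]

# set_axis returns a renamed copy (in recent pandas versions),
# so the frames in res keep their original column names.
out = pd.concat(
    [df.set_axis(['same'], axis=1) for df in res],
    ignore_index=True,
)
```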
BTW, pandas: concat data frame with different column name is similar, but my question is even simpler, as I don't want the index maintained. (I also start with a list of N single-column data frames, not a single N-column data frame.)
*: E.g. axis=0 is the default behaviour. axis=1 gives an error. join="inner" is just silly (I only get the index). ignore_index=True renumbers the index, but I still get lots of columns and lots of NaNs.
UPDATE for empty lists
I was having problems (with all the given solutions) when the data had an empty list, something like:
res = [
    pd.DataFrame({'A':[1,2,3]}),
    pd.DataFrame({'B':[9,8,7,6,5,4]}),
    pd.DataFrame({'C':[]}),
    pd.DataFrame({'D':[100,200,300,400]}),
]
The trick was to force the type by adding .astype('float64'). E.g.
pd.Series(np.concatenate([df.values.ravel().astype('float64') for df in res]))
or:
pd.concat(res,axis=0).astype('float64').stack().reset_index(drop=True)
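As a quick check (an editorial sketch over the sample data above), the cast version handles the empty frame cleanly:

```python
import numpy as np
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': []}),  # the troublesome empty frame
    pd.DataFrame({'D': [100, 200, 300, 400]}),
]

# Casting each flattened block to float64 means the empty frame
# contributes a zero-length float array instead of changing the
# result's dtype.
s = pd.Series(np.concatenate([df.values.ravel().astype('float64') for df in res]))
```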
Accepted answer by Steven G
I would use a list comprehension, such as:
import pandas as pd
res = [
    pd.DataFrame({'A':[1,2,3]}),
    pd.DataFrame({'B':[9,8,7,6,5,4]}),
    pd.DataFrame({'C':[100,200,300,400]}),
]
x = []
[x.extend(df.values.tolist()) for df in res]
pd.DataFrame(x)
Out[49]:
0
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
8 4
9 100
10 200
11 300
12 400
I tested speed for you.
%timeit x = []; [x.extend(df.values.tolist()) for df in res]; pd.DataFrame(x)
10000 loops, best of 3: 196 μs per loop
%timeit pd.Series(pd.concat(res, axis=1).values.ravel()).dropna()
1000 loops, best of 3: 920 μs per loop
%timeit pd.concat(res, axis=1).stack().reset_index(drop=True)
1000 loops, best of 3: 902 μs per loop
%timeit pd.DataFrame(pd.concat(res, axis=1).values.ravel(), columns=['col']).dropna()
1000 loops, best of 3: 1.07 ms per loop
%timeit pd.Series(np.concatenate([df.values.ravel() for df in res]))
10000 loops, best of 3: 70.2 μs per loop
looks like
pd.Series(np.concatenate([df.values.ravel() for df in res]))
is the fastest.
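Editor's note: if you want the winning one-liner as a reusable helper, a sketch (the function name is my own) might look like:

```python
import numpy as np
import pandas as pd

def concat_ignore_names(frames, name='value'):
    """Flatten single-column DataFrames into one Series, discarding
    column names and row indices."""
    return pd.Series(
        np.concatenate([df.values.ravel() for df in frames]),
        name=name,
    )

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]
out = concat_ignore_names(res)
```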
Answered by jezrael
I think you need concat with stack:
print (pd.concat(res, axis=1))
A B C
0 1.0 9 100.0
1 2.0 8 200.0
2 3.0 7 300.0
3 NaN 6 400.0
4 NaN 5 NaN
5 NaN 4 NaN
print (pd.concat(res, axis=1).stack().reset_index(drop=True))
0 1.0
1 9.0
2 100.0
3 2.0
4 8.0
5 200.0
6 3.0
7 7.0
8 300.0
9 6.0
10 400.0
11 5.0
12 4.0
dtype: float64
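One subtlety worth noting (an editorial observation): stacking after a column-wise concat walks the aligned frame row by row, so values from the inputs are interleaved rather than kept contiguous, as the output above shows. A sketch that makes the ordering explicit (the extra dropna() is belt-and-braces for pandas versions where stack() no longer drops NaN itself):

```python
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]

# Row 0 of the aligned frame is (1, 9, 100), row 1 is (2, 8, 200),
# and so on, which is why the stacked output interleaves the inputs.
interleaved = pd.concat(res, axis=1).stack().dropna().reset_index(drop=True)
```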
Another solution with numpy.ravel for flattening:
print (pd.Series(pd.concat(res, axis=1).values.ravel()).dropna())
0 1.0
1 9.0
2 100.0
3 2.0
4 8.0
5 200.0
6 3.0
7 7.0
8 300.0
10 6.0
11 400.0
13 5.0
16 4.0
dtype: float64
print (pd.DataFrame(pd.concat(res, axis=1).values.ravel(), columns=['col']).dropna())
col
0 1.0
1 9.0
2 100.0
3 2.0
4 8.0
5 200.0
6 3.0
7 7.0
8 300.0
10 6.0
11 400.0
13 5.0
16 4.0
Solution with list comprehension:
print (pd.Series(np.concatenate([df.values.ravel() for df in res])))
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
8 4
9 100
10 200
11 300
12 400
dtype: int64
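Editor's addendum: one more alternative (my own sketch, not from any answer) is to squeeze each one-column frame into a Series first; Series concatenated along axis 0 ignore their names entirely:

```python
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]

# squeeze('columns') turns a one-column DataFrame into a Series;
# ignore_index=True renumbers the combined rows 0..N-1.
out = pd.concat([df.squeeze('columns') for df in res], ignore_index=True)
```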