Disclaimer: this page is drawn from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license and attribute it to the original authors (not me) on Stack Overflow.
Original link: http://stackoverflow.com/questions/41225604/
Concat list of pandas data frame, but ignoring column name
Asked by Darren Cook
Sub-title: Dumb it down pandas, stop trying to be clever.
I've a list (res) of single-column pandas data frames, each containing the same kind of numeric data, but each with a different column name. The row indices have no meaning. I want to put them into a single, very long, single-column data frame.
When I do pd.concat(res) I get one column per input file (and loads and loads of NaN cells). I've tried various values for the parameters (*), but none do what I'm after.
Edit: Sample data:
res = [
    pd.DataFrame({'A':[1,2,3]}),
    pd.DataFrame({'B':[9,8,7,6,5,4]}),
    pd.DataFrame({'C':[100,200,300,400]}),
]
I have an ugly-hack solution: copy every data frame and give it a new column name:
newList = []
for r in res:
    r.columns = ["same"]
    newList.append(r)
pd.concat(newList, ignore_index=True)
Surely that is not the best way to do it??
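As an editorial aside: a less intrusive sketch of the same idea (my own variant, not from the original post) renames each column without mutating the frames in res, using DataFrame.set_axis:

```python
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]

# set_axis returns a renamed copy (in recent pandas versions),
# so the frames in res keep their original column names.
out = pd.concat(
    [df.set_axis(['same'], axis=1) for df in res],
    ignore_index=True,
)
```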
BTW, pandas: concat data frame with different column name is similar, but my question is even simpler, as I don't want the index maintained. (I also start with a list of N single-column data frames, not a single N-column data frame.)
*: E.g. axis=0 is the default behaviour. axis=1 gives an error. join="inner" is just silly (I only get the index). ignore_index=True renumbers the index, but I still get lots of columns and lots of NaNs.
UPDATE for empty lists
I was having problems (with all the given solutions) when the data had an empty list, something like:
res = [
    pd.DataFrame({'A':[1,2,3]}),
    pd.DataFrame({'B':[9,8,7,6,5,4]}),
    pd.DataFrame({'C':[]}),
    pd.DataFrame({'D':[100,200,300,400]}),
]
The trick was to force the type by adding .astype('float64'). E.g.
pd.Series(np.concatenate([df.values.ravel().astype('float64') for df in res]))
or:
pd.concat(res,axis=0).astype('float64').stack().reset_index(drop=True)
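As a quick check (an editorial sketch over the sample data above), the cast version handles the empty frame cleanly:

```python
import numpy as np
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': []}),  # the troublesome empty frame
    pd.DataFrame({'D': [100, 200, 300, 400]}),
]

# Casting each flattened block to float64 means the empty frame
# contributes a zero-length float array instead of changing the
# result's dtype.
s = pd.Series(np.concatenate([df.values.ravel().astype('float64') for df in res]))
```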
Accepted answer by Steven G
I would use a list comprehension, such as:
import pandas as pd
res = [
    pd.DataFrame({'A':[1,2,3]}),
    pd.DataFrame({'B':[9,8,7,6,5,4]}),
    pd.DataFrame({'C':[100,200,300,400]}),
]
x = []
[x.extend(df.values.tolist()) for df in res]
pd.DataFrame(x)
Out[49]:
0
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
8 4
9 100
10 200
11 300
12 400
I tested speed for you.
%timeit x = []; [x.extend(df.values.tolist()) for df in res]; pd.DataFrame(x)
10000 loops, best of 3: 196 μs per loop
%timeit pd.Series(pd.concat(res, axis=1).values.ravel()).dropna()
1000 loops, best of 3: 920 μs per loop
%timeit pd.concat(res, axis=1).stack().reset_index(drop=True)
1000 loops, best of 3: 902 μs per loop
%timeit pd.DataFrame(pd.concat(res, axis=1).values.ravel(), columns=['col']).dropna()
1000 loops, best of 3: 1.07 ms per loop
%timeit pd.Series(np.concatenate([df.values.ravel() for df in res]))
10000 loops, best of 3: 70.2 μs per loop
looks like
pd.Series(np.concatenate([df.values.ravel() for df in res]))
is the fastest.
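Editor's note: if you want the winning one-liner as a reusable helper, a sketch (the function name is my own) might look like:

```python
import numpy as np
import pandas as pd

def concat_ignore_names(frames, name='value'):
    """Flatten single-column DataFrames into one Series, discarding
    column names and row indices."""
    return pd.Series(
        np.concatenate([df.values.ravel() for df in frames]),
        name=name,
    )

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]
out = concat_ignore_names(res)
```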
Answered by jezrael
I think you need concat with stack:
print (pd.concat(res, axis=1))
A B C
0 1.0 9 100.0
1 2.0 8 200.0
2 3.0 7 300.0
3 NaN 6 400.0
4 NaN 5 NaN
5 NaN 4 NaN
print (pd.concat(res, axis=1).stack().reset_index(drop=True))
0 1.0
1 9.0
2 100.0
3 2.0
4 8.0
5 200.0
6 3.0
7 7.0
8 300.0
9 6.0
10 400.0
11 5.0
12 4.0
dtype: float64
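One subtlety worth noting (an editorial observation): stacking after a column-wise concat walks the aligned frame row by row, so values from the inputs are interleaved rather than kept contiguous, as the output above shows. A sketch that makes the ordering explicit (the extra dropna() is belt-and-braces for pandas versions where stack() no longer drops NaN itself):

```python
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]

# Row 0 of the aligned frame is (1, 9, 100), row 1 is (2, 8, 200),
# and so on, which is why the stacked output interleaves the inputs.
interleaved = pd.concat(res, axis=1).stack().dropna().reset_index(drop=True)
```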
Another solution with numpy.ravel for flattening:
print (pd.Series(pd.concat(res, axis=1).values.ravel()).dropna())
0 1.0
1 9.0
2 100.0
3 2.0
4 8.0
5 200.0
6 3.0
7 7.0
8 300.0
10 6.0
11 400.0
13 5.0
16 4.0
dtype: float64
print (pd.DataFrame(pd.concat(res, axis=1).values.ravel(), columns=['col']).dropna())
col
0 1.0
1 9.0
2 100.0
3 2.0
4 8.0
5 200.0
6 3.0
7 7.0
8 300.0
10 6.0
11 400.0
13 5.0
16 4.0
Solution with list comprehension:
print (pd.Series(np.concatenate([df.values.ravel() for df in res])))
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
8 4
9 100
10 200
11 300
12 400
dtype: int64
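Editor's addendum: one more alternative (my own sketch, not from any answer) is to squeeze each one-column frame into a Series first; Series concatenated along axis 0 ignore their names entirely:

```python
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]

# squeeze('columns') turns a one-column DataFrame into a Series;
# ignore_index=True renumbers the combined rows 0..N-1.
out = pd.concat([df.squeeze('columns') for df in res], ignore_index=True)
```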