Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/43347939/

Time: 2020-09-14 03:22:38  Source: igfitidea

All possible combinations of columns in dataframe -pandas/python

pandas combinations

Asked by S.Peters

I'm trying to take one dataframe and create another containing all possible pairwise combinations of the columns, where each new column holds the difference between the corresponding values, i.e., on 11-apr column AB should be (B - A) = 0, etc.

E.g., starting with

        Dt              A           B           C          D
        11-apr          1           1           1          1
        10-apr          2           3           1          2

how do I get a new frame that looks like this:

[Figure: desired result]

I have come across the post below, but have not been able to adapt it to work for columns.

Aggregate all dataframe row pair combinations using pandas

Answered by jezrael

You can use:

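import pandas as pd
from itertools import combinations

# Use the date column as the index so only the value columns are combined
df = df.set_index('Dt')

# All unordered column pairs: ('A', 'B'), ('A', 'C'), ...
cc = list(combinations(df.columns, 2))
# For each pair (x, y) compute y - x, e.g. AB = B - A
df = pd.concat([df[c[1]].sub(df[c[0]]) for c in cc], axis=1, keys=cc)
# Flatten the tuple column labels into strings such as 'AB'
df.columns = df.columns.map(''.join)
print(df)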
        AB  AC  AD  BC  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1   0  -2  -1   1
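
For readers who want to run this end to end, here is a minimal self-contained sketch of the same approach. The sample frame is rebuilt from the table in the question; the names `pairs` and `diffs` are illustrative and not part of the original answer.

```python
from itertools import combinations

import pandas as pd

# Sample frame reconstructed from the question (assumption: integer values).
df = pd.DataFrame(
    {"Dt": ["11-apr", "10-apr"], "A": [1, 2], "B": [1, 3], "C": [1, 1], "D": [1, 2]}
).set_index("Dt")

# All unordered column pairs: ('A', 'B'), ('A', 'C'), ..., ('C', 'D').
pairs = list(combinations(df.columns, 2))

# For each pair (left, right) compute right - left, e.g. AB = B - A.
diffs = pd.concat(
    [df[right] - df[left] for left, right in pairs], axis=1, keys=pairs
)

# Flatten the tuple labels into plain strings such as 'AB'.
diffs.columns = ["".join(pair) for pair in pairs]
print(diffs)
```

With four columns there are 4 * 3 / 2 = 6 pairs, which matches the six result columns shown above.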

Answered by piRSquared

Make sure your index is Dt

df = df.set_index('Dt')

Using numpy's np.tril_indices and slicing. See below for an explanation of np.tril_indices.

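import numpy as np

v = df.values

# Row/column positions of the strict lower triangle (k=-1 excludes the diagonal)
i, j = np.tril_indices(len(df.columns), -1)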

We can create a pd.MultiIndex for the columns. This generalizes better to column names that are longer than one character.

pd.DataFrame(
    v[:, i] - v[:, j],
    df.index,
    [df.columns[j], df.columns[i]]
)

        A     B  A  B  C
        B  C  C  D  D  D
Dt                      
11-apr  0  0  0  0  0  0
10-apr  1 -1 -2  0 -1  1

But we can also do

pd.DataFrame(
    v[:, i] - v[:, j],
    df.index,
    df.columns[j] + df.columns[i]
)

        AB  AC  BC  AD  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1  -2   0  -1   1
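
The numpy variant can be sketched as one runnable piece as follows. The sample frame is rebuilt from the question, and `labels` and `diffs` are illustrative names. Note that the column order follows the lower-triangle traversal (AB, AC, BC, AD, ...) rather than the `itertools.combinations` order.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {"Dt": ["11-apr", "10-apr"], "A": [1, 2], "B": [1, 3], "C": [1, 1], "D": [1, 2]}
).set_index("Dt")

v = df.to_numpy()

# Strict lower triangle: every (i, j) with i > j, so j is the "earlier" column.
i, j = np.tril_indices(df.shape[1], -1)

# Label each pair as earlier + later, e.g. 'A' + 'B' -> 'AB'.
labels = [df.columns[lo] + df.columns[hi] for hi, lo in zip(i, j)]

# Later column minus earlier column, computed for all pairs at once.
diffs = pd.DataFrame(v[:, i] - v[:, j], index=df.index, columns=labels)
print(diffs)

# Optional: reorder to the itertools.combinations order.
print(diffs[sorted(diffs.columns)])
```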


np.tril_indices explained

This is a numpy function that returns two arrays which, when used together, give the locations of the lower triangle of a square matrix. This is handy when working with all pairwise combinations, because the strict lower triangle (below the diagonal) contains each unordered pairing of one axis with the other exactly once.

Consider the dataframe d for illustration

d = pd.DataFrame(np.array(list('abcdefghijklmnopqrstuvwxy')).reshape(-1, 5))
d

   0  1  2  3  4
0  a  b  c  d  e
1  f  g  h  i  j
2  k  l  m  n  o
3  p  q  r  s  t
4  u  v  w  x  y

The triangle indices, when viewed as coordinate pairs, look like this

i, j = np.tril_indices(5, -1)
list(zip(i, j))

[(1, 0),
 (2, 0),
 (2, 1),
 (3, 0),
 (3, 1),
 (3, 2),
 (4, 0),
 (4, 1),
 (4, 2),
 (4, 3)]
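
To make the link to combinations concrete, the following small check (a sketch, with illustrative variable names) confirms that the strict lower-triangle pairs are exactly the 2-element combinations of the positions 0..4, just with each pair written as (larger, smaller).

```python
from itertools import combinations

import numpy as np

n = 5
i, j = np.tril_indices(n, -1)

# Flip each (larger, smaller) pair to (smaller, larger) to match itertools' ordering.
tril_pairs = {(int(lo), int(hi)) for hi, lo in zip(i, j)}
comb_pairs = set(combinations(range(n), 2))

print(tril_pairs == comb_pairs)   # same set of unordered pairs
print(len(i), n * (n - 1) // 2)   # number of pairs is n choose 2
```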

I can manipulate d's values with i and j

d.values[i, j] = 'z'
d

   0  1  2  3  4
0  a  b  c  d  e
1  z  g  h  i  j
2  z  z  m  n  o
3  z  z  z  s  t
4  z  z  z  z  y

And you can see it targeted just that lower triangle
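
Depending on the pandas version and settings (for example with copy-on-write enabled), `d.values` may be a copy or a read-only array, so assigning through it might not modify `d` or might raise. A hedged alternative, shown here only as a sketch, is to do the assignment on an explicit numpy array and build the frame afterwards; `letters` and `arr` are illustrative names.

```python
import numpy as np
import pandas as pd

letters = np.array(list("abcdefghijklmnopqrstuvwxy")).reshape(-1, 5)
i, j = np.tril_indices(5, -1)

# Work on an explicit array copy, then wrap it in a DataFrame.
arr = letters.copy()
arr[i, j] = "z"   # overwrite only the strict lower triangle
d = pd.DataFrame(arr)
print(d)
```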

[Figure: naive time test]

Answered by languitar

itertools.combinations will help you:

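import itertools
# Assumes df is already indexed by Dt.
# Note: this computes a - b for each pair; use df[b] - df[a] to get (B - A) as in the question.
pd.DataFrame({'{}{}'.format(a, b): df[a] - df[b] for a, b in itertools.combinations(df.columns, 2)})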

Which results in:

        AB  AC  AD  BC  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr  -1   1   0   2   1  -1

Answered by Nipun Batra

The itertools module should help you create the required combinations/permutations.

from itertools import combinations
import pandas as pd

# Creating a new pd.DataFrame
new_df = pd.DataFrame(index=df.index)

# list of columns
columns = df.columns

# Create all combinations of length 2, e.g. AB, BC, etc.
for combination in combinations(columns, 2):
    combination_string = "".join(combination)
    new_df[combination_string] = df[combination[1]] - df[combination[0]]

print(new_df)


         AB  AC  AD  BC  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1   0  -2  -1   1
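
When column names are longer than one character, plain concatenation can produce ambiguous labels. One possible way to handle that is a small helper with a separator; `pairwise_column_differences` and its `sep` parameter are hypothetical names introduced here for illustration, not part of any answer above or of pandas itself.

```python
from itertools import combinations

import pandas as pd


def pairwise_column_differences(frame, sep=""):
    """Hypothetical helper: return right - left for every unordered column pair."""
    return pd.DataFrame(
        {
            f"{left}{sep}{right}": frame[right] - frame[left]
            for left, right in combinations(frame.columns, 2)
        },
        index=frame.index,
    )


# Example with multi-character column names (made-up data for illustration).
frame = pd.DataFrame(
    {"x1": [1, 2], "x2": [1, 3], "y": [1, 1]}, index=["11-apr", "10-apr"]
)
wide = pairwise_column_differences(frame, sep="_")
print(wide)
```

A non-empty separator keeps labels such as `x1_x2` readable and makes it possible to split them back into the two source column names, provided the separator does not occur in the names themselves.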