All possible combinations of columns in dataframe - pandas/python
Original question: http://stackoverflow.com/questions/43347939/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep it under the same CC BY-SA license.
Asked by S.Peters
I'm trying to take one dataframe and create another, with all possible combinations of the columns and the difference between the corresponding values, i.e. on 11-apr, column AB should be (B - A) = 0, etc.
E.g., starting with:
Dt A B C D
11-apr 1 1 1 1
10-apr 2 3 1 2
How do I get a new frame that looks like this:
Dt AB AC AD BC BD CD
11-apr 0 0 0 0 0 0
10-apr 1 -1 0 -2 -1 1
I have come across the below post, but have not been able to adapt it to work for columns.
Answered by jezrael
You can use:
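import pandas as pd
from itertools import combinations

df = df.set_index('Dt')
# all unordered pairs of column labels, e.g. ('A', 'B')
cc = list(combinations(df.columns, 2))
# second column minus first column, one Series per pair
df = pd.concat([df[c[1]].sub(df[c[0]]) for c in cc], axis=1, keys=cc)
# flatten the (first, second) column labels to 'AB', 'AC', ...
df.columns = df.columns.map(''.join)
print(df)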
AB AC AD BC BD CD
Dt
11-apr 0 0 0 0 0 0
10-apr 1 -1 0 -2 -1 1
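As a self-contained check, the sketch below rebuilds the sample frame from the question and wraps the same idea in a small helper. The helper name pairwise_differences and its sep parameter are hypothetical, introduced here only for illustration; they are not part of the original answer.

```python
from itertools import combinations

import pandas as pd


def pairwise_differences(frame, sep=""):
    """Return later-minus-earlier differences for every pair of columns.

    Hypothetical helper for illustration. `sep` joins the two column
    labels, which helps when labels are longer than one character.
    """
    pairs = list(combinations(frame.columns, 2))
    # One Series per pair: second column minus first column.
    diffs = [frame[b].sub(frame[a]) for a, b in pairs]
    out = pd.concat(diffs, axis=1)
    # Assign flat string labels such as 'AB' or 'A_B'.
    out.columns = [f"{a}{sep}{b}" for a, b in pairs]
    return out


# Sample data from the question, with Dt as the index.
df = pd.DataFrame(
    {"Dt": ["11-apr", "10-apr"], "A": [1, 2], "B": [1, 3], "C": [1, 1], "D": [1, 2]}
).set_index("Dt")

result = pairwise_differences(df)
print(result)
```

Passing a separator, e.g. pairwise_differences(df, sep="_"), keeps labels readable when column names are longer than one character.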
Answered by piRSquared
Make sure your index is Dt:
df = df.set_index('Dt')
Using numpy's np.tril_indices and slicing (see below for an explanation of np.tril_indices):
import numpy as np
v = df.values
# positions strictly below the diagonal: every column pair with i > j
i, j = np.tril_indices(len(df.columns), -1)
We can create a pd.MultiIndex for the columns. This makes it more generalizable for column names that are longer than one character.
pd.DataFrame(
v[:, i] - v[:, j],
df.index,
[df.columns[j], df.columns[i]]
)
        A     B  A  B  C
        B  C  C  D  D  D
Dt
11-apr  0  0  0  0  0  0
10-apr  1 -1 -2  0 -1  1
But we can also do:
pd.DataFrame(
v[:, i] - v[:, j],
df.index,
df.columns[j] + df.columns[i]
)
AB AC BC AD BD CD
Dt
11-apr 0 0 0 0 0 0
10-apr 1 -1 -2 0 -1 1
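Note that the column order above (AB, AC, BC, AD, ...) differs from the itertools.combinations order (AB, AC, AD, BC, ...). A minimal sketch, assuming the sample frame from the question, that uses np.triu_indices with an offset of 1 to enumerate the pairs in the combinations order instead. The variable names first, second and upper are chosen here for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with Dt as the index.
df = pd.DataFrame(
    {"Dt": ["11-apr", "10-apr"], "A": [1, 2], "B": [1, 3], "C": [1, 1], "D": [1, 2]}
).set_index("Dt")

v = df.to_numpy()
# Positions strictly above the diagonal: first < second, listed row by row.
first, second = np.triu_indices(len(df.columns), 1)

upper = pd.DataFrame(
    v[:, second] - v[:, first],  # later column minus earlier column
    index=df.index,
    columns=[a + b for a, b in zip(df.columns[first], df.columns[second])],
)
print(upper)
```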
np.tril_indices explained
This is a numpy function that returns two arrays that, when used together, give the positions of the lower triangle of a square matrix. This is handy when working with all combinations of things, because the lower triangle (taken with an offset of -1 to skip the diagonal) pairs each element of one axis with each element of the other exactly once.
Consider the dataframe d for illustration:
d = pd.DataFrame(np.array(list('abcdefghijklmnopqrstuvwxy')).reshape(-1, 5))
d
0 1 2 3 4
0 a b c d e
1 f g h i j
2 k l m n o
3 p q r s t
4 u v w x y
The triangle indices, viewed as coordinate pairs, look like this:
i, j = np.tril_indices(5, -1)
list(zip(i, j))
[(1, 0),
(2, 0),
(2, 1),
(3, 0),
(3, 1),
(3, 2),
(4, 0),
(4, 1),
(4, 2),
(4, 3)]
I can manipulate d's values with i and j:
d.values[i, j] = 'z'
d
0 1 2 3 4
0 a b c d e
1 z g h i j
2 z z m n o
3 z z z s t
4 z z z z y
And you can see it targeted just that lower triangle.
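The same idea can be sketched on a plain NumPy array. This version works on a copy rather than writing through d.values, on the assumption that whether such writes reach the DataFrame can depend on the pandas version and the column dtypes, so it is worth checking in your own environment. The names letters, marked and pairs are illustrative only.

```python
import numpy as np

# Same 5x5 grid of letters as the dataframe d above, as a NumPy array.
letters = np.array(list("abcdefghijklmnopqrstuvwxy")).reshape(5, 5)

# Offset -1 skips the diagonal, leaving only positions with row > column.
i, j = np.tril_indices(5, -1)
pairs = list(zip(i.tolist(), j.tolist()))

# Mark the strictly-lower triangle on a copy, leaving `letters` untouched.
marked = letters.copy()
marked[i, j] = "z"

print(len(pairs))  # number of pairs is n * (n - 1) / 2 for n = 5
print(marked)
```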
Answered by languitar
itertools.combinations will help you:
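import itertools
# Note: df[a] - df[b] is first minus second; use df[b] - df[a] for the (B - A) differences asked for in the question.
pd.DataFrame({'{}{}'.format(a, b): df[a] - df[b] for a, b in itertools.combinations(df.columns, 2)})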
Which results in:
AB AC AD BC BD CD
Dt
11-apr 0 0 0 0 0 0
10-apr -1 1 0 2 1 -1
Answered by Nipun Batra
The itertools module should help you create the required combinations/permutations.
import pandas as pd
from itertools import combinations

# Create a new pd.DataFrame that shares the original index
new_df = pd.DataFrame(index=df.index)
# List of columns
columns = df.columns
# Create all combinations of length 2, e.g. AB, BC, etc.
for combination in combinations(columns, 2):
    combination_string = "".join(combination)
    new_df[combination_string] = df[combination[1]] - df[combination[0]]
print(new_df)
AB AC AD BC BD CD
Dt
11-apr 0 0 0 0 0 0
10-apr 1 -1 0 -2 -1 1
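Since the answer mentions both combinations and permutations, here is a short sketch of how the two differ for the four column names in the question. The variable names pair_labels and ordered_labels are illustrative only.

```python
from itertools import combinations, permutations

columns = ["A", "B", "C", "D"]

# Unordered pairs: each pair appears once, following the column order.
pair_labels = ["".join(p) for p in combinations(columns, 2)]
# Ordered pairs: both directions appear, e.g. AB and BA.
ordered_labels = ["".join(p) for p in permutations(columns, 2)]

print(pair_labels)
print(len(ordered_labels))
```

combinations is the right fit when AB and BA carry the same information up to sign, as with the differences here; permutations would produce each difference twice with opposite signs.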