Is there a pythonic way to make a contingency table in Pandas?

Question

Asked by hernanavella

Given a dataframe that looks like this:

            A   B      
2005-09-06  5  -2  
2005-09-07 -1   3  
2005-09-08  4   5 
2005-09-09 -8   2
2005-09-10 -2  -5
2005-09-11 -7   9 
2005-09-12  2   8  
2005-09-13  6  -5  
2005-09-14  6  -5

Is there a pythonic way to create a 2x2 matrix like this:

    1  0
 1  a  b
 0  c  d

Where:

a = number of observations where the corresponding elements of columns A and B are both positive.

b = number of observations where the element of column A is positive and the element of column B is negative.

c = number of observations where the element of column A is negative and the element of column B is positive.

d = number of observations where the corresponding elements of columns A and B are both negative.

For this example the output would be:

    1  0
 1  2  3
 0  3  1

Thanks

Answer 1

Accepted answer by lanenok

Let us call your dataframe data. Try:

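a = data['A'] > 0  # True where A is positive
b = data['B'] > 0  # True where B is positive
data.groupby([a, b]).count()  # non-null values per column in each (A > 0, B > 0) group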
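
As a minimal, self-contained sketch of the approach above: the data frame below is rebuilt by hand from the sample in the question (that construction is an assumption for illustration, not part of the original answer). It swaps count() for size(), which returns a single row count per group, and then uses unstack() to pivot the result into a 2x2 table.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration.
data = pd.DataFrame(
    {
        "A": [5, -1, 4, -8, -2, -7, 2, 6, 6],
        "B": [-2, 3, 5, 2, -5, 9, 8, -5, -5],
    },
    index=pd.date_range("2005-09-06", periods=9, freq="D"),
)

a = data["A"] > 0  # True where A is positive
b = data["B"] > 0  # True where B is positive

# size() counts the rows in each (a, b) group; unstack() moves B to the columns.
# fill_value=0 shows combinations that never occur as 0 instead of NaN.
table = data.groupby([a, b]).size().unstack(fill_value=0)
print(table)
```

Rows are labelled by A > 0 and columns by B > 0, both in False, True order.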

Answer 2

Answered by Dyno Fu

import pandas as pd
from io import StringIO

table = """dt          A   B
2005-09-06  5  -2
2005-09-07 -1   3
2005-09-08  4   5
2005-09-09 -8   2
2005-09-10 -2  -5
2005-09-11 -7   9
2005-09-12  2   8
2005-09-13  6  -5
2005-09-14  6  -5
"""
sio = StringIO(table)
# Parse the whitespace-separated text and use the date column as the index.
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)

a = df['A'] > 0
b = df['B'] > 0
df1 = df.groupby([a, b]).count()
# Both columns hold the same counts; unstack one of them into a 2x2 table.
print(df1["A"].unstack())

Output:

B      False  True
A
False      1      3
True       3      2

This is just lanenok's answer, with unstack() used to make it more readable. Credit should go to lanenok.

Answer 3

Answered by Tyr Wiesner-Hanks

It is probably easiest to just use the pandas function crosstab. Borrowing the setup from Dyno Fu above:

import pandas as pd
from io import StringIO
table = """dt          A   B
2005-09-06  5  -2
2005-09-07 -1   3
2005-09-08  4   5
2005-09-09 -8   2
2005-09-10 -2  -5
2005-09-11 -7   9
2005-09-12  2   8
2005-09-13  6  -5
2005-09-14  6  -5
"""
sio = StringIO(table)
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)

pd.crosstab(df.A > 0, df.B > 0)

Output:

B      False  True 
A                  
False      1      3
True       3      2

[2 rows x 2 columns]

The table is also usable if you want to run a Fisher exact test with scipy.stats, etc.:

from scipy.stats import fisher_exact
tab = pd.crosstab(df.A > 0, df.B > 0)
fisher_exact(tab)
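
To make the return value concrete, here is a small sketch that assumes the counts from the output above are typed in by hand rather than recomputed. fisher_exact takes a 2x2 table and returns the odds ratio and the p-value, which can be unpacked directly:

```python
import pandas as pd
from scipy.stats import fisher_exact

# Counts copied from the crosstab output above (an assumption for illustration):
# rows are A > 0 (False, True), columns are B > 0 (False, True).
tab = pd.DataFrame([[1, 3], [3, 2]], index=[False, True], columns=[False, True])

# fisher_exact accepts a 2x2 array-like; the default test is two-sided.
odds_ratio, p_value = fisher_exact(tab)
print(odds_ratio, p_value)
```

The odds ratio reported here is the sample odds ratio, (1 * 2) / (3 * 3) for this table.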

Answer 4

Answered by Joseph Whiting

Here's a really useful page about the pandas crosstab function:

http://chrisalbon.com/python/pandas_crosstabs.html

So for what you'd like to do, I think you should use:

import pandas as pd
pd.crosstab(data['A']>0, data['B']>0)
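
The question asks for 1/0 labels with 1 listed first, whereas crosstab on boolean masks gives False/True in that order. One possible way to get the requested layout, sketched here with the sample data rebuilt by hand (an assumption for illustration), is to cast the masks to integers and reindex both axes. Note that with a `> 0` test, any zero values would be counted as 0 (not positive).

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration.
data = pd.DataFrame(
    {
        "A": [5, -1, 4, -8, -2, -7, 2, 6, 6],
        "B": [-2, 3, 5, 2, -5, 9, 8, -5, -5],
    }
)

# Cast the boolean masks to 1/0 so the labels match the question.
a = (data["A"] > 0).astype(int)
b = (data["B"] > 0).astype(int)

# reindex puts 1 before 0 on both axes; fill_value=0 covers absent categories.
table = pd.crosstab(a, b).reindex(index=[1, 0], columns=[1, 0], fill_value=0)
print(table)
```

With this ordering the cells line up with a, b, c, d as defined in the question.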

Hope that helps!
