Original source: http://stackoverflow.com/questions/29901436/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Is there a pythonic way to do a contingency table in Pandas?
Asked by hernanavella
Given a dataframe that looks like this:
A B
2005-09-06 5 -2
2005-09-07 -1 3
2005-09-08 4 5
2005-09-09 -8 2
2005-09-10 -2 -5
2005-09-11 -7 9
2005-09-12 2 8
2005-09-13 6 -5
2005-09-14 6 -5
Is there a pythonic way to create a 2x2 matrix like this:
1 0
1 a b
0 c d
Where:
a = number of observations where the corresponding elements of columns A and B are both positive.
b = number of observations where the element of column A is positive and the corresponding element of column B is negative.
c = number of observations where the element of column A is negative and the corresponding element of column B is positive.
d = number of observations where the corresponding elements of columns A and B are both negative.
For this example the output would be:
1 0
1 2 3
0 3 1
Thanks
Accepted answer by lanenok
Let us call your dataframe data. Try:
a = data['A']>0
b = data['B']>0
data.groupby([a,b]).count()
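As a self-contained sketch of the groupby approach above, the snippet below rebuilds the question's data by hand and counts the rows in each sign combination. Using size() with unstack(fill_value=0) instead of count() is a substitution made here for illustration, not part of the original answer, and the name counts is hypothetical.

```python
import pandas as pd

# Sample data copied from the question; `data` follows the naming in the answer above.
data = pd.DataFrame(
    {
        "A": [5, -1, 4, -8, -2, -7, 2, 6, 6],
        "B": [-2, 3, 5, 2, -5, 9, 8, -5, -5],
    },
    index=pd.date_range("2005-09-06", periods=9),
)

a = data["A"] > 0
b = data["B"] > 0

# size() counts the rows in each (a, b) group; unstack() pivots the B level into columns.
# fill_value=0 (an addition in this sketch) covers combinations that never occur.
counts = data.groupby([a, b]).size().unstack(fill_value=0)
print(counts)
```

The rows and columns are labelled False/True rather than 0/1, so counts.loc[True, True] corresponds to a in the question.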
Answer by Dyno Fu
import pandas as pd
from io import StringIO  # Python 3; in Python 2 this was `from StringIO import StringIO`
table = """dt A B
2005-09-06 5 -2
2005-09-07 -1 3
2005-09-08 4 5
2005-09-09 -8 2
2005-09-10 -2 -5
2005-09-11 -7 9
2005-09-12 2 8
2005-09-13 6 -5
2005-09-14 6 -5
"""
sio = StringIO(table)
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)
a = df['A'] > 0
b = df['B'] > 0
df1 = df.groupby([a,b]).count()
print(df1["A"].unstack())
output:
B False True
A
False 1 3
True 3 2
This is just lanenok's answer, with unstack() added to make the output more readable. Credit should go to lanenok.
Answer by Tyr Wiesner-Hanks
It is probably easiest to just use the pandas function crosstab. Borrowing from Dyno Fu's answer above:
import pandas as pd
from io import StringIO  # Python 3; in Python 2 this was `from StringIO import StringIO`
table = """dt A B
2005-09-06 5 -2
2005-09-07 -1 3
2005-09-08 4 5
2005-09-09 -8 2
2005-09-10 -2 -5
2005-09-11 -7 9
2005-09-12 2 8
2005-09-13 6 -5
2005-09-14 6 -5
"""
sio = StringIO(table)
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)
pd.crosstab(df.A > 0, df.B > 0)
Output:
B False True
A
False 1 3
True 3 2
[2 rows x 2 columns]
The table can also be used directly if you want to run a Fisher exact test with scipy.stats, etc.:
from scipy.stats import fisher_exact
tab = pd.crosstab(df.A > 0, df.B > 0)
fisher_exact(tab)
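As a minimal sketch of how the test result might be read, the snippet below rebuilds the same table from hand-entered data and unpacks the return value of fisher_exact into an odds ratio and a p-value. The variable names odds_ratio and p_value are chosen here for illustration, and the default two-sided alternative is assumed.

```python
import pandas as pd
from scipy.stats import fisher_exact

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "A": [5, -1, 4, -8, -2, -7, 2, 6, 6],
        "B": [-2, 3, 5, 2, -5, 9, 8, -5, -5],
    }
)

tab = pd.crosstab(df["A"] > 0, df["B"] > 0)

# fisher_exact accepts any 2x2 array-like; its result unpacks to (odds ratio, p-value).
odds_ratio, p_value = fisher_exact(tab)
print(odds_ratio, p_value)
```

With only nine observations the test has little power, so the p-value here is mainly useful as a demonstration of the call.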
Answer by Joseph Whiting
Here's a really useful page about the pandas crosstab function:
http://chrisalbon.com/python/pandas_crosstabs.html
So I think for what you'd like to do, you should use:
import pandas as pd
pd.crosstab(data['A']>0, data['B']>0)
Hope that helps!
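The crosstab output is labelled False/True with False first, while the question asks for a 1/0 layout with 1 first. One possible way to get that layout, sketched here under the assumption that the data is rebuilt by hand, is to cast the boolean masks to integers and reorder with reindex. The names pos_a, pos_b and tab are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame(
    {
        "A": [5, -1, 4, -8, -2, -7, 2, 6, 6],
        "B": [-2, 3, 5, 2, -5, 9, 8, -5, -5],
    }
)

# Cast the boolean masks to 1/0 so the labels match the question's matrix.
pos_a = (data["A"] > 0).astype(int)
pos_b = (data["B"] > 0).astype(int)

# reindex puts 1 before 0 on both axes; fill_value=0 covers any missing combination.
tab = pd.crosstab(pos_a, pos_b).reindex(index=[1, 0], columns=[1, 0], fill_value=0)
print(tab)
```

Read row by row, the result holds a, b on the first row and c, d on the second, matching the 2x2 matrix in the question.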