Python equivalent of R table
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/25710875/
Asked by Donbeo
I have a list:
[[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]
I want to count the frequency of each element in this list. Something like:
freq[[12,6]] = 40
In R this can be obtained with the table function. Is there anything similar in Python 3?
Accepted answer by Brionius
A Counter object from the collections library will function like that.
from collections import Counter
x = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]
# Since the elements passed to a `Counter` must be hashable, we have to change the lists to tuples.
x = [tuple(element) for element in x]
freq = Counter(x)
print(freq[(12, 6)])
# Result: 28
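As a small sketch of how the resulting Counter behaves like R's table, assuming a shortened, made-up sample rather than the full list from the question: most_common() lists the counts in descending order, and looking up an element that never occurred returns 0 instead of raising an error.

```python
from collections import Counter

# Shortened, made-up sample (not the full list from the question).
sample = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 6], [12, 0]]

# Lists are unhashable, so convert each pair to a tuple first.
freq = Counter(tuple(pair) for pair in sample)

# most_common() returns (element, count) pairs, highest count first.
for pair, count in freq.most_common():
    print(pair, count)

# An element that never occurred yields 0 rather than a KeyError.
print(freq[(6, 0)])
```

The zero default is convenient when you probe for combinations that may be absent from the data.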
Answer by andilabs
import pandas
x = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]
ps = pandas.Series([tuple(i) for i in x])
counts = ps.value_counts()
print(counts)
You will get a result like:
(12, 0) 33
(12, 6) 28
(6, 0) 20
(0, 6) 19
and for [(12, 6)] you will get the exact number, here 28.
You can read more about pandas, a powerful Python data analysis toolkit, in the official documentation: http://pandas.pydata.org/pandas-docs/stable/
UPDATE:
If order does not matter, just use sorted:
ps = pandas.Series([tuple(sorted(i)) for i in x])
After that, the result is:
(0, 6) 39
(0, 12) 33
(6, 12) 28
Answer by Shankar Chavan
Pandas has a built-in method called value_counts().
Example: if your DataFrame has a column whose values are 0s and 1s, and you want to count how often each of them occurs, then simply use this:
df.colName.value_counts()
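A minimal sketch of that call, assuming a hypothetical DataFrame with a single made-up column named "flag" (neither the data nor the column name comes from the question):

```python
import pandas as pd

# Hypothetical data; the column name "flag" is made up for illustration.
df = pd.DataFrame({"flag": [0, 1, 1, 0, 1, 1]})

# value_counts() returns a Series indexed by the distinct values,
# sorted by count in descending order.
counts = df["flag"].value_counts()
print(counts)

# Look up the count of one specific value by label.
print(counts.loc[1])
```

Using .loc makes it explicit that the lookup is by value (label) and not by position.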
Answer by thorbjornwolf
Supposing you need to convert the data to a pandas DataFrame anyway, so that you have
L = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]
import pandas as pd
df = pd.DataFrame(L, columns=('a', 'b'))
then you can do as suggested in this answer, using groupby.size():
tab = df.groupby(['a', 'b']).size()
tab looks as follows:
In [5]: tab
Out[5]:
a   b
0   6    19
6   0    20
12  0    33
    6    28
dtype: int64
and can easily be changed to a table form with unstack():
In [6]: tab.unstack()
Out[6]:
b      0     6
a
0    NaN  19.0
6   20.0   NaN
12  33.0  28.0
Fill the NaNs and convert to int at your leisure!
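One possible way to do that last step, sketched on a shortened, made-up sample rather than the full list: unstack() accepts a fill_value argument, so missing combinations can be filled with 0 during the reshape, which avoids the detour through NaN and floats. The explicit fillna/astype route is shown next to it for comparison.

```python
import pandas as pd

# Shortened, made-up sample (not the full list from the question).
L = [[12, 6], [12, 0], [0, 6], [6, 0], [12, 0]]
df = pd.DataFrame(L, columns=["a", "b"])

tab = df.groupby(["a", "b"]).size()

# Fill missing (a, b) combinations with 0 while reshaping,
# so the counts remain integers.
table = tab.unstack(fill_value=0)
print(table)

# Equivalent two-step route: fill the NaNs afterwards, then cast back to int.
same = tab.unstack().fillna(0).astype(int)
```

Both variants should give the same table; the first just skips the intermediate float conversion.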
Answer by erickfis
IMHO, pandas offers a better solution for this "tabulation" problem:
One dimension:
my_tab = pd.crosstab(index = df["feature_you_r_interested_in"],
columns="count")
Proportion count:
my_tab/my_tab.sum()
Two dimensions (with totals):
cross = pd.crosstab(index=df["feat1"],
columns=df["feat2"],
margins=True)
cross
Also, as mentioned by other colleagues, the pandas value_counts method could be all you need. It can even return the counts as proportions if you want:
df['your feature'].value_counts(normalize=True)
I'm very grateful for this blog:
http://hamelg.blogspot.com.br/2015/11/python-for-data-analysis-part-19_17.html
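To make the crosstab calls concrete, here is a rough sketch applied to pairs shaped like the ones in the question. The sample is shortened and made up, and the column names "a" and "b" are assumptions chosen for illustration.

```python
import pandas as pd

# Shortened, made-up sample; column names "a" and "b" are illustrative.
sample = [[12, 6], [12, 0], [0, 6], [6, 0], [12, 0]]
df = pd.DataFrame(sample, columns=["a", "b"])

# Two-way table of counts; margins=True adds an "All" row and column
# holding the totals.
cross = pd.crosstab(index=df["a"], columns=df["b"], margins=True)
print(cross)

# normalize=True divides every cell by the grand total,
# giving proportions instead of counts.
props = pd.crosstab(index=df["a"], columns=df["b"], normalize=True)
print(props)
```

Combinations that never occur show up as 0 in both tables, much like in R's table output.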
Answer by nachoes
You can probably do a one-dimensional count with a comprehension.
L = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]
countey = [tuple(x) for x in L]
freq = {x:countey.count(x) for x in set(countey)}
In [2]: %timeit {x:countey.count(x) for x in set(countey)}
100000 loops, best of 3: 15.2 μs per loop
In [4]: print(freq)
Out[4]: {(0, 6): 19, (6, 0): 20, (12, 0): 33, (12, 6): 28}
In [5]: print(freq[(12,6)])
Out[5]: 28
Answer by Sam Mason
In NumPy, the best way I've found of doing this is to use unique, e.g.:
import numpy as np
# OPs data
arr = np.array([[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]])
values, counts = np.unique(arr, axis=0, return_counts=True)
# into a dict for presentation
{tuple(a):b for a,b in zip(values, counts)}
giving me: {(0, 6): 19, (6, 0): 20, (12, 0): 33, (12, 6): 28}, which matches the other answers.
This example is a bit more complicated than what I normally see, hence the need for the axis=0 option. If you just want the unique values across the whole array, you can leave it out:
# generate random values
x = np.random.negative_binomial(10, 10/(6+10), 100000)
# get table
values, counts = np.unique(x, return_counts=True)
# plot
import matplotlib.pyplot as plt
plt.vlines(values, 0, counts, lw=2)
R seems to make this sort of thing much more convenient! The above Python code is just plot(table(rnbinom(100000, 10, mu=6))) in R.
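The effect of axis=0 is perhaps easiest to see side by side on a tiny array. The sketch below uses a shortened, made-up sample: with axis=0 each row is treated as one item, while without it the array is flattened and the individual numbers are counted.

```python
import numpy as np

# Shortened, made-up sample (not the full array from the question).
arr = np.array([[12, 6], [12, 0], [0, 6], [12, 0]])

# axis=0: each row counts as one item, so we get counts per pair.
rows, row_counts = np.unique(arr, axis=0, return_counts=True)
by_row = {tuple(int(v) for v in r): int(c) for r, c in zip(rows, row_counts)}
print(by_row)

# No axis: the array is flattened, so we get counts per individual number.
vals, val_counts = np.unique(arr, return_counts=True)
by_value = {int(v): int(c) for v, c in zip(vals, val_counts)}
print(by_value)
```

The int() conversions only turn NumPy scalars into plain Python integers so the dictionaries print cleanly.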


