Python equivalent of R's table

Question

Asked by Donbeo

I have a list

[[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]

I want to count the frequency of each element in this list. Something like

freq[[12,6]] = 40

In R this can be obtained with the table function. Is there anything similar in Python 3?

Answer 1

Accepted answer by Brionius

A Counter object from the collections library works like that.

from collections import Counter

x = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]

# Since the elements passed to a `Counter` must be hashable, we have to change the lists to tuples.
x = [tuple(element) for element in x]

freq = Counter(x)

print(freq[(12, 6)])

# Result:  28
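
To get something closer to the full table that R prints, the counts can be listed in descending order with Counter.most_common(). A minimal sketch, using a short made-up sample (`pairs`) rather than the full list above:

```python
from collections import Counter

# Hypothetical short sample; the real data would be the list `x` above.
pairs = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 6], [12, 0]]

# Lists are unhashable, so convert each pair to a tuple before counting.
freq = Counter(tuple(p) for p in pairs)

# most_common() returns (element, count) pairs sorted by descending count.
table = freq.most_common()
for element, count in table:
    print(element, count)

# A Counter returns 0 for elements it has never seen instead of raising KeyError.
missing = freq[(6, 0)]
```

Looking up an absent pair gives 0 here, which is convenient when some combinations never occur in the data.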

Answer 2

Answered by andilabs

import pandas
x = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]] 
ps = pandas.Series([tuple(i) for i in x])
counts = ps.value_counts()
print(counts)

You will get a result like:

(12, 0)    33
(12, 6)    28
(6, 0)     20
(0, 6)     19

and for [(12, 6)] you will get the exact number, here 28.

You can read more about pandas, a powerful Python data analysis toolkit, in the official documentation: http://pandas.pydata.org/pandas-docs/stable/

UPDATE:

If order does not matter, just use sorted: ps = pandas.Series([tuple(sorted(i)) for i in x]). After that, the result is:

(0, 6)     39
(0, 12)    33
(6, 12)    28

Answer 3

Answered by Shankar Chavan

Pandas has a built-in function called value_counts().

Example: if your DataFrame has a column with values as 0's and 1's, and you want to count the total frequencies for each of them, then simply use this:

df.colName.value_counts()
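
A self-contained sketch of that call; the DataFrame `df` and the column name `flag` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: one column holding 0s and 1s.
df = pd.DataFrame({"flag": [0, 1, 1, 0, 1, 1]})

# value_counts() returns a Series indexed by the distinct values,
# sorted by descending count.
counts = df["flag"].value_counts()
print(counts)
```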

Answer 4

Answered by thorbjornwolf

Supposing you need to convert the data to a pandas DataFrame anyway, so that you have

L = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]
import pandas as pd  # needed for pd.DataFrame
df = pd.DataFrame(L, columns=('a', 'b'))

then you can do as suggested in this answer, using groupby.size():

tab = df.groupby(['a', 'b']).size()

tab looks as follows:

In [5]: tab
Out[5]:
a   b
0   6    19
6   0    20
12  0    33
    6    28
dtype: int64

and can easily be changed to a table form with unstack():

In [6]: tab.unstack()
Out[6]:
b      0     6
a
0    NaN  19.0
6   20.0   NaN
12  33.0  28.0

Fill the NaNs and convert to int at your own leisure!
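
One way to do both in a single step is the fill_value argument of unstack(), which fills the missing combinations as the table is built. A minimal sketch on a short made-up sample (`sample`), not the full list:

```python
import pandas as pd

# Hypothetical short sample with the same column names as above.
sample = [[12, 6], [12, 0], [0, 6], [12, 0], [6, 0]]
df = pd.DataFrame(sample, columns=["a", "b"])

# Count each (a, b) combination, then pivot b into columns.
# fill_value=0 puts a 0 where a combination never occurs.
tab = df.groupby(["a", "b"]).size().unstack(fill_value=0)
print(tab)
```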

Answer 5

Answered by erickfis

IMHO, pandas offers a better solution for this "tabulation" problem:

One dimension:

my_tab = pd.crosstab(index = df["feature_you_r_interested_in"],
                              columns="count")

Proportion count:

my_tab/my_tab.sum()

Two-dimensions (with totals):

cross = pd.crosstab(index=df["feat1"], 
                             columns=df["feat2"],
                             margins=True)

cross
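
Applied to pairs like the ones in the question, the two-dimensional call might look like this. The sample data and the column names `a` and `b` are assumptions for illustration:

```python
import pandas as pd

# Hypothetical short sample of (a, b) pairs.
sample = [[12, 6], [12, 0], [0, 6], [12, 0], [6, 0]]
df = pd.DataFrame(sample, columns=["a", "b"])

# Rows are values of a, columns are values of b; margins=True adds
# an "All" row and column holding the totals.
cross = pd.crosstab(index=df["a"], columns=df["b"], margins=True)
print(cross)
```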

Also, as mentioned by other colleagues, the pandas value_counts method could be all you need. It can even return the counts as proportions if you want:

df['your feature'].value_counts(normalize=True)

I'm very grateful for this blog:

http://hamelg.blogspot.com.br/2015/11/python-for-data-analysis-part-19_17.html

Answer 6

Answered by nachoes

You can probably do a 1-dimensional count with a list comprehension and a dict comprehension.

L = [[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]
countey = [tuple(x) for x in L]
freq = {x:countey.count(x) for x in set(countey)}

In [2]: %timeit {x:countey.count(x) for x in set(countey)}
        100000 loops, best of 3: 15.2 μs per loop   

In [4]: print(freq)
Out[4]: {(0, 6): 19, (6, 0): 20, (12, 0): 33, (12, 6): 28}

In [5]: print(freq[(12,6)])
Out[5]: 28

Answer 7

Answered by Sam Mason

In NumPy, the best way I've found of doing this is to use np.unique, e.g.:

import numpy as np

# OPs data
arr = np.array([[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]])

values, counts = np.unique(arr, axis=0, return_counts=True)

# into a dict for presentation
{tuple(a):b for a,b in zip(values, counts)}

giving me: {(0, 6): 19, (6, 0): 20, (12, 0): 33, (12, 6): 28}, which matches the other answers.

This example is a bit more complicated than what I normally see, hence the need for the axis=0 option. If you just want unique values across the whole array, you can leave that out:

# generate random values
x = np.random.negative_binomial(10, 10/(6+10), 100000)

# get table
values, counts = np.unique(x, return_counts=True)

# plot
import matplotlib.pyplot as plt
plt.vlines(values, 0, counts, lw=2)

R seems to make this sort of thing much more convenient! The above Python code is just plot(table(rnbinom(100000, 10, mu=6))).
