Python: What is the difference between size and count in pandas?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/33346591/

Date: 2020-08-19 13:13:08  Source: igfitidea

Tags: python, pandas, numpy, nan, difference

Asked by Donovan Thomson

What is the difference between groupby("x").count and groupby("x").size in pandas?

Does size just exclude nil?

Accepted answer by EdChum

size includes NaN values, count does not:

In [46]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [0, 0, 1, 2, 2, 2], 'b': [1, 2, 3, 4, np.nan, 4], 'c': np.random.randn(6)})
df

Out[46]:
   a   b         c
0  0   1  1.067627
1  0   2  0.554691
2  1   3  0.458084
3  2   4  0.426635
4  2 NaN -2.238091
5  2   4  1.256943

In [48]:
print(df.groupby(['a'])['b'].count())
print(df.groupby(['a'])['b'].size())

a
0    2
1    1
2    2
Name: b, dtype: int64

a
0    2
1    1
2    3
dtype: int64 
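
A self-contained version of the comparison above, offered as a sketch. It assumes pandas and NumPy are installed, drops the random column 'c', and uses np.nan instead of the older np.NaN alias, which NumPy 2.0 removed. It also checks that, per group, size minus count is exactly the number of NaNs.

```python
import numpy as np
import pandas as pd

# Same 'a' and 'b' data as the answer above (the random 'c' column is omitted).
df = pd.DataFrame({'a': [0, 0, 1, 2, 2, 2],
                   'b': [1, 2, 3, 4, np.nan, 4]})

counts = df.groupby('a')['b'].count()  # non-NaN values per group
sizes = df.groupby('a')['b'].size()    # all rows per group, NaN included

print(counts.tolist())  # [2, 1, 2]
print(sizes.tolist())   # [2, 1, 3]

# The gap between the two is the number of NaNs in each group.
nan_per_group = df['b'].isna().astype(int).groupby(df['a']).sum()
print((sizes - counts).tolist())  # [0, 0, 1]
print(nan_per_group.tolist())     # [0, 0, 1]
```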

Answer by Bubble Bubble Bubble Gut

Just to add a little to @EdChum's answer: even if the data has no NA values, the result of count() is more verbose. Using the example from before:

grouped = df.groupby('a')
grouped.count()
Out[197]: 
   b  c
a      
0  2  2
1  1  1
2  2  3
grouped.size()
Out[198]: 
a
0    2
1    1
2    3
dtype: int64
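
A minimal sketch of the structural difference. It replaces the random 'c' column with a hypothetical deterministic one so that the output is reproducible. grouped.count() returns a DataFrame with one column per non-grouping column, while grouped.size() returns a single Series.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the example above: 'b' has one NaN, 'c' has none.
df = pd.DataFrame({'a': [0, 0, 1, 2, 2, 2],
                   'b': [1, 2, 3, 4, np.nan, 4],
                   'c': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})

grouped = df.groupby('a')
per_column = grouped.count()  # DataFrame: non-NaN count per group, per column
per_group = grouped.size()    # Series: row count per group

print(type(per_column).__name__, list(per_column.columns))  # DataFrame ['b', 'c']
print(type(per_group).__name__)                             # Series
print(per_column['b'].tolist(), per_column['c'].tolist())   # [2, 1, 2] [2, 1, 3]
print(per_group.tolist())                                   # [2, 1, 3]
```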

Answer by Mukul Taneja

When dealing with ordinary DataFrames, the only difference is the handling of NaN values: count does not include NaN values when counting rows.

But when using these functions with groupby, to get the correct result from count() we have to associate a column (one without NaNs, since count() still skips them) with the groupby to get the exact number of rows per group, whereas size() needs no such association.
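
A small sketch of this point, using hypothetical columns 'b' (with a NaN) and 'c' (without). size() needs no column. count() on a NaN-free column matches it, but count() on a column containing a NaN undercounts that group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'b' contains a NaN, 'c' does not.
df = pd.DataFrame({'a': [0, 0, 1, 2, 2, 2],
                   'b': [1, 2, 3, 4, np.nan, 4],
                   'c': [5, 6, 7, 8, 9, 10]})

rows_per_group = df.groupby('a').size()     # no column needed
count_via_c = df.groupby('a')['c'].count()  # NaN-free column: matches size()
count_via_b = df.groupby('a')['b'].count()  # NaN in 'b': group 2 undercounts

print(rows_per_group.tolist())  # [2, 1, 3]
print(count_via_c.tolist())     # [2, 1, 3]
print(count_via_b.tolist())     # [2, 1, 2]
```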

Answer by cs95

What is the difference between size and count in pandas?

The other answers have pointed out the difference; however, it is not completely accurate to say "size counts NaNs while count does not". While size does indeed count NaNs, this is actually a consequence of the fact that size returns the size (or the length) of the object it is called on. Naturally, this also includes rows/values which are NaN.

So, to summarize, size returns the size of the Series/DataFrame¹,

df = pd.DataFrame({'A': ['x', 'y', np.nan, 'z']})
df

     A
0    x
1    y
2  NaN
3    z

df.A.size
# 4

...while count counts the non-NaN values:

df.A.count()
# 3 

Notice that size is an attribute (here it gives the same result as len(df) or len(df.A), since df has a single column). count is a method.
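
A minimal sketch of the attribute-versus-method distinction on a Series built from the same values. The Series name s is an assumption for illustration. The last line spells out what count() computes.

```python
import numpy as np
import pandas as pd

s = pd.Series(['x', 'y', np.nan, 'z'], name='A')

print(s.size)           # 4: attribute, NaN included
print(len(s))           # 4: same as s.size for a Series
print(s.count())        # 3: method, NaN excluded
print(s.notna().sum())  # 3: the non-NaN values, counted explicitly
```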

¹ DataFrame.size is also an attribute and returns the number of elements in the DataFrame (rows × columns).
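
A short sketch of the footnote, using a hypothetical two-column frame. DataFrame.size counts every cell, len(df) counts only rows, and DataFrame.count() returns one non-NaN count per column.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with NaNs in both columns.
df = pd.DataFrame({'A': ['x', 'y', np.nan, 'z'],
                   'B': [1.0, np.nan, np.nan, 4.0]})

rows, cols = df.shape
print(df.size, rows * cols)  # 8 8: every cell, NaN or not
print(len(df))               # 4: only the number of rows
print(df.count().tolist())   # [3, 2]: non-NaN values per column
print(df.count().sum())      # 5: non-NaN cells in total
```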



Behaviour with GroupBy - Output Structure

Besides the basic difference, there's also a difference in the structure of the output generated when calling GroupBy.size() vs GroupBy.count().

df = pd.DataFrame({'A': list('aaabbccc'), 'B': ['x', 'x', np.nan, np.nan, np.nan, np.nan, 'x', 'x']})
df
   A    B
0  a    x
1  a    x
2  a  NaN
3  b  NaN
4  b  NaN
5  c  NaN
6  c    x
7  c    x

Consider,

df.groupby('A').size()

A
a    3
b    2
c    3
dtype: int64

Versus,

df.groupby('A').count()

   B
A   
a  2
b  0
c  2

GroupBy.count returns a DataFrame when you call count on all columns, while GroupBy.size returns a Series.

The reason is that size is the same for all columns, so only a single result is returned. Meanwhile, count is called for each column, because the results depend on how many NaNs each column has.
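
A sketch that checks the return types on the same data. It shows that group 'b', whose B values are all NaN, still appears in both results: size() reports 2 rows for it and count() reports 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('aaabbccc'),
                   'B': ['x', 'x', np.nan, np.nan, np.nan, np.nan, 'x', 'x']})

g = df.groupby('A')
size_result = g.size()    # one value per group -> Series
count_result = g.count()  # one value per group and per remaining column -> DataFrame

print(isinstance(size_result, pd.Series))      # True
print(isinstance(count_result, pd.DataFrame))  # True
print(size_result.tolist())                    # [3, 2, 3]
print(count_result['B'].tolist())              # [2, 0, 2]
```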



Behavior with pivot_table

Another example is how pivot_table treats this data. Suppose we would like to compute the cross tabulation of

df

   A  B
0  0  1
1  0  1
2  1  2
3  0  2
4  0  0

pd.crosstab(df.A, df.B)  # Result we expect, but with `pivot_table`.

B  0  1  2
A         
0  1  2  1
1  0  0  1

With pivot_table, you can pass aggfunc='size':

df.pivot_table(index='A', columns='B', aggfunc='size', fill_value=0)

B  0  1  2
A         
0  1  2  1
1  0  0  1
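
A quick sketch that checks the two approaches agree on this data. Only the cell values are compared, because index/column names and dtypes may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 0], 'B': [1, 1, 2, 2, 0]})

via_crosstab = pd.crosstab(df.A, df.B)
via_pivot = df.pivot_table(index='A', columns='B', aggfunc='size', fill_value=0)

# Compare the numbers only; labels and dtypes are not checked here.
print(np.array_equal(via_pivot.to_numpy(), via_crosstab.to_numpy()))  # True
```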

But count does not work; an empty DataFrame is returned:

df.pivot_table(index='A', columns='B', aggfunc='count')

Empty DataFrame
Columns: []
Index: [0, 1]

I believe the reason for this is that 'count' must be done on the series that is passed to the values argument, and when nothing is passed, pandas decides to make no assumptions.
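
If that explanation holds, giving 'count' an explicit values column should make it work. The following is a sketch under that assumption. The helper column 'one' is hypothetical and exists only so that count has a Series to operate on; the result is compared against pd.crosstab by value only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 0], 'B': [1, 1, 2, 2, 0]})

# Hypothetical helper column: gives 'count' something to count.
df['one'] = 1

counted = df.pivot_table(index='A', columns='B', values='one',
                         aggfunc='count', fill_value=0)
expected = pd.crosstab(df.A, df.B)

print(np.array_equal(counted.to_numpy(), expected.to_numpy()))  # True
```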

Answer by drp

In addition to all the answers above, I would like to point out one more difference which I think is significant.

You can compare the size and count of a pandas DataFrame with the size and length of a Java Vector. When we create a Vector, some predefined memory is allocated to it; as we add elements and get close to the number of elements it can hold, more memory is allocated. Similarly, as we add elements to a DataFrame, the memory allocated to it increases.

The size attribute gives the number of cells in the DataFrame (rows × columns), whereas count gives the number of non-NaN elements actually present in each column. For example: [image]

You can see that although there are 3 rows in the DataFrame, its size is 6.
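
The original screenshot is not available, so the following is a sketch assuming it showed a frame with 3 rows and 2 columns. The frame itself is hypothetical. Under that assumption, a size of 6 is exactly rows × columns, and count() still reports the non-NaN values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row, 2-column frame with one missing value.
df = pd.DataFrame({'x': [1.0, 2.0, np.nan], 'y': ['a', 'b', 'c']})

print(df.shape)             # (3, 2)
print(df.size)              # 6 = 3 rows * 2 columns, NaN included
print(df.count().tolist())  # [2, 3]: non-NaN values per column
```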

This answer covers the difference between size and count for a DataFrame, not a pandas Series. I have not checked what happens with a Series.
