What is the difference between size and count in pandas?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/33346591/
Asked by Donovan Thomson
What is the difference between groupby("x").count and groupby("x").size in pandas?
Does size just exclude nil?
Accepted answer by EdChum
size includes NaN values, count does not:
In [46]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 2, 2, 2], 'b': [1, 2, 3, 4, np.nan, 4], 'c': np.random.randn(6)})
df
Out[46]:
   a    b         c
0  0    1  1.067627
1  0    2  0.554691
2  1    3  0.458084
3  2    4  0.426635
4  2  NaN -2.238091
5  2    4  1.256943
In [48]:
print(df.groupby(['a'])['b'].count())
print(df.groupby(['a'])['b'].size())
a
0    2
1    1
2    2
Name: b, dtype: int64
a
0    2
1    1
2    3
dtype: int64
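As a rough illustration of the point above, the gap between size and count within each group is the number of NaN values in that group. The sketch below rebuilds the same 'a' and 'b' columns (the random column 'c' is left out), and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same 'a' and 'b' data as in the example; 'b' has one NaN in group a == 2.
df = pd.DataFrame({'a': [0, 0, 1, 2, 2, 2],
                   'b': [1, 2, 3, 4, np.nan, 4]})

grouped_b = df.groupby('a')['b']
sizes = grouped_b.size()    # rows per group, NaN included
counts = grouped_b.count()  # non-NaN values per group

# The difference is the number of NaN values of 'b' in each group.
nan_per_group = sizes - counts
print(nan_per_group)
```

Only group 2 shows a non-zero difference, because that is the only group where 'b' contains a NaN.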
Answer by Bubble Bubble Bubble Gut
Just to add a little to @EdChum's answer: even if the data has no NA values, the result of count() is more verbose. Using the example from before:
grouped = df.groupby('a')
grouped.count()
Out[197]:
   b  c
a
0  2  2
1  1  1
2  2  3
grouped.size()
Out[198]:
a
0    2
1    1
2    3
dtype: int64
Answer by Mukul Taneja
When we are dealing with ordinary DataFrames, the only difference is the treatment of NaN values: count does not include NaN values when counting rows.
But when we use these functions with groupby, count() needs a column to work on: to get the exact number of rows per group we have to associate a field (for example, any numeric field) with the groupby, whereas size() needs no such association.
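A minimal sketch of that contrast, assuming the small frame from the accepted answer: size() works directly on the groupby, while count() reports one result per remaining column unless a single column is selected first. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 2, 2, 2],
                   'b': [1, 2, 3, 4, np.nan, 4]})

grouped = df.groupby('a')

rows_per_group = grouped.size()   # Series: rows per group, no column needed
counts_all = grouped.count()      # DataFrame: one count per non-key column
counts_b = grouped['b'].count()   # Series: non-NaN values of 'b' per group

print(rows_per_group)
print(counts_all)
print(counts_b)
```

Note that the selected column only gives the number of rows per group if it contains no NaN values, which is why size() is the simpler choice for that purpose.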
Answer by cs95
What is the difference between size and count in pandas?
The other answers have pointed out the difference; however, it is not completely accurate to say "size counts NaNs while count does not". While size does indeed count NaNs, this is actually a consequence of the fact that size returns the size (or the length) of the object it is called on. Naturally, this also includes rows/values which are NaN.
So, to summarize, size returns the size of the Series/DataFrame [1],
df = pd.DataFrame({'A': ['x', 'y', np.nan, 'z']})
df
     A
0    x
1    y
2  NaN
3    z
df.A.size
# 4
...while count counts the non-NaN values:
df.A.count()
# 3
Notice that size is an attribute (here it gives the same result as len(df) or len(df.A)), while count is a method.
[1] DataFrame.size is also an attribute and returns the number of elements in the DataFrame (rows x columns).
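To make the attribute/method distinction and the footnote concrete, here is a small sketch. It extends the one-column example with a hypothetical second column 'B', so that DataFrame.size and len() no longer coincide; the frame and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# 'B' is a hypothetical extra column, added only to show rows x columns.
df2 = pd.DataFrame({'A': ['x', 'y', np.nan, 'z'],
                    'B': [1.0, np.nan, np.nan, 4.0]})

series_size = df2['A'].size      # attribute: 4 elements, NaN included
series_count = df2['A'].count()  # method: 3 non-NaN values
frame_size = df2.size            # attribute: 4 rows x 2 columns = 8
frame_count = df2.count()        # method: per-column non-NaN counts

print(series_size, series_count, frame_size, len(df2))
print(frame_count)
```

With more than one column, df2.size is no longer equal to len(df2), which is the case the footnote warns about.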
Behaviour with GroupBy - Output Structure
Besides the basic difference, there is also a difference in the structure of the output generated by GroupBy.size() vs GroupBy.count().
df = pd.DataFrame({'A': list('aaabbccc'), 'B': ['x', 'x', np.nan, np.nan, np.nan, np.nan, 'x', 'x']})
df
   A    B
0  a    x
1  a    x
2  a  NaN
3  b  NaN
4  b  NaN
5  c  NaN
6  c    x
7  c    x
Consider,
df.groupby('A').size()
A
a    3
b    2
c    3
dtype: int64
Versus,
df.groupby('A').count()
   B
A
a  2
b  0
c  2
GroupBy.count returns a DataFrame when you call count on all columns, while GroupBy.size returns a Series.
The reason is that size is the same for all columns, so only a single result is returned. Meanwhile, count is computed for each column, as the results depend on how many NaNs each column has.
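The structural difference is easier to see with more than one non-key column. The sketch below reuses the frame above and adds a hypothetical column 'C' with a different NaN pattern; the column and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('aaabbccc'),
                   'B': ['x', 'x', np.nan, np.nan, np.nan, np.nan, 'x', 'x'],
                   # 'C' is a hypothetical column with its own NaN pattern.
                   'C': [1, 2, 3, 4, np.nan, 6, 7, 8]})

g = df.groupby('A')

size_result = g.size()    # Series: one number per group
count_result = g.count()  # DataFrame: one column of counts per non-key column

print(type(size_result).__name__)   # Series
print(type(count_result).__name__)  # DataFrame
print(count_result)
```

Each column of count_result can differ from the others, whereas size_result has a single value per group regardless of how many columns there are.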
Behavior with pivot_table
Another example is how pivot_table treats this data. Suppose we would like to compute the cross tabulation of
df
   A  B
0  0  1
1  0  1
2  1  2
3  0  2
4  0  0
pd.crosstab(df.A, df.B)  # Result we expect, but with `pivot_table`.
B  0  1  2
A
0  1  2  1
1  0  0  1
With pivot_table, you can use size:
df.pivot_table(index='A', columns='B', aggfunc='size', fill_value=0)
B  0  1  2
A
0  1  2  1
1  0  0  1
But count does not work; an empty DataFrame is returned:
df.pivot_table(index='A', columns='B', aggfunc='count')
Empty DataFrame
Columns: []
Index: [0, 1]
I believe the reason for this is that 'count' must be done on the series that is passed to the values argument, and when nothing is passed, pandas decides to make no assumptions.
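If that explanation is right, then giving 'count' something to count should recover the cross tabulation. The sketch below tries this by adding a helper column and passing it as values; the column name 'n' is a hypothetical helper, not part of the original data, and this is one possible workaround rather than the recommended approach.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 0],
                   'B': [1, 1, 2, 2, 0]})

expected = pd.crosstab(df['A'], df['B'])

# 'size' needs no values column: it counts rows per (A, B) pair.
by_size = df.pivot_table(index='A', columns='B', aggfunc='size', fill_value=0)

# 'count' needs a column to count, so pass a helper column as values.
# 'n' is a hypothetical helper column holding a constant.
by_count = df.assign(n=1).pivot_table(index='A', columns='B', values='n',
                                      aggfunc='count', fill_value=0)

print(expected)
print(by_size)
print(by_count)
```

Both pivot tables should hold the same numbers as the crosstab, with missing (A, B) combinations filled with 0.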
Answer by drp
In addition to the answers above, I would like to point out one more difference that seems significant to me.
You can loosely compare a pandas DataFrame's size and count with the capacity and size of a Java Vector. When we create a Vector, some predefined memory is allocated to it. As we add elements and approach the number it can hold, more memory is allocated to it. Similarly, as we add elements to a DataFrame, the memory allocated to it increases.
The size attribute gives the total number of cells in the DataFrame (rows x columns), whereas count gives the number of elements that are actually present (non-NA) in the DataFrame. For example,
You can see that although there are 3 rows in the DataFrame, its size is 6.
This answer covers the difference between size and count with respect to DataFrame and not pandas Series. I have not checked what happens with Series.
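The example the answer refers to is not included in this page. As a stand-in, here is a sketch with a hypothetical frame of 3 rows and 2 columns, which gives a size of 6 because DataFrame.size is rows x columns. It also checks the Series case the answer leaves open. The data and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 3 rows, 2 columns, one missing value in 'x'.
df = pd.DataFrame({'x': [1.0, 2.0, np.nan],
                   'y': ['a', 'b', 'c']})

n_rows = len(df)     # 3 rows
n_cells = df.size    # 6 cells = 3 rows x 2 columns
present = df.count() # per-column non-NaN counts

# For a Series, size is simply its length, NaN included.
s = df['x']
series_size = s.size
series_count = s.count()

print(n_rows, n_cells)
print(present)
print(series_size, series_count)
```

For a Series, size equals its length, so the rows x columns effect only appears with DataFrames.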