Original URL: http://stackoverflow.com/questions/17468878/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
pandas python how to count the number of records or rows in a dataframe
Question by IcemanBerlin
Obviously new to pandas. How can I simply count the number of records in a DataFrame?
I would have thought something as simple as this would do it, and I can't even seem to find the answer in searches... probably because it is too simple.
cnt = df.count
print(cnt)
The above code actually just prints the whole df.
Accepted answer by tshauck
Regarding your question... counting one field? I decided to read it that way, but I hope it helps...
Say I have the following DataFrame
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.normal(0, 1, (5, 2)), columns=["A", "B"])
You can count the non-NA values in a single column with:
df.A.count()
#or
df['A'].count()
Both evaluate to 5.
The cool thing (one of many with pandas) is that if you have NA values, count takes that into consideration: it counts only the non-NA values.
So if I did
df.loc[1::2, 'A'] = np.nan  # set rows 1 and 3 of column A to NaN
df.count()
The result would be
A 3
B 5
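A self-contained version of the steps above, for anyone who wants to run it; it also prints len(df) so the per-column counts can be compared with the row count. The NaN assignment uses .loc instead of chained indexing, which is my substitution; treat this as a sketch, not the answer's exact code.

```python
import numpy as np
import pandas as pd

# Same shape as the example above: 5 rows, 2 columns of random numbers
df = pd.DataFrame(np.random.normal(0, 1, (5, 2)), columns=["A", "B"])

# Set every second row of column A (labels 1 and 3) to NaN.
# .loc writes into df directly instead of into a possible intermediate copy.
df.loc[1::2, "A"] = np.nan

per_column = df.count()  # non-NA values per column: A -> 3, B -> 5
total_rows = len(df)     # number of rows, whether or not they contain NaN

print(per_column)
print(total_rows)
```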
Answer by user2314737
To get the number of rows in a dataframe use:
df.shape[0]
(and df.shape[1] to get the number of columns).
As an alternative you can use
len(df)
or
len(df.index)
(and len(df.columns) for the columns)
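A quick sketch checking that these approaches agree, on a small made-up frame (the column names and values are arbitrary):

```python
import pandas as pd

# Hypothetical example frame: 4 rows, 3 columns
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["w", "x", "y", "z"], "C": [0.1, 0.2, 0.3, 0.4]})

# Three equivalent ways to get the row count
by_shape = df.shape[0]
by_len = len(df)
by_index = len(df.index)

# Two equivalent ways to get the column count
cols_by_shape = df.shape[1]
cols_by_len = len(df.columns)

print(by_shape, by_len, by_index)  # 4 4 4
print(cols_by_shape, cols_by_len)  # 3 3
```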
shape is more versatile and more convenient than len(), especially for interactive work (it just needs to be added at the end), but len is a bit faster (see also this answer).
To avoid: count(), because it returns the number of non-NA/null observations over the requested axis, not the number of rows.
len(df.index) is faster:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(24).reshape(8, 3),columns=['A', 'B', 'C'])
df.loc[5, 'A'] = np.nan  # column A becomes float so it can hold NaN
df
# Out:
# A B C
# 0 0 1 2
# 1 3 4 5
# 2 6 7 8
# 3 9 10 11
# 4 12 13 14
# 5 NaN 16 17
# 6 18 19 20
# 7 21 22 23
%timeit df.shape[0]
# 100000 loops, best of 3: 4.22 μs per loop
%timeit len(df)
# 100000 loops, best of 3: 2.26 μs per loop
%timeit len(df.index)
# 1000000 loops, best of 3: 1.46 μs per loop
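The %timeit lines above need IPython or Jupyter. A rough plain-Python equivalent using the standard timeit module is sketched below; the call counts are my own arbitrary choice, and absolute timings will vary with your machine and pandas version, so only the relative ordering is of interest (and it may differ from the numbers above).

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=["A", "B", "C"])

candidates = {
    "df.shape[0]": lambda: df.shape[0],
    "len(df)": lambda: len(df),
    "len(df.index)": lambda: len(df.index),
}

NUMBER = 10_000  # calls per timing run (arbitrary)
results = {}
for label, fn in candidates.items():
    # Best of 3 runs, converted to microseconds per call
    best = min(timeit.repeat(fn, number=NUMBER, repeat=3))
    results[label] = best / NUMBER * 1e6
    print(f"{label:>14}: {results[label]:.3f} microseconds per call")
```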
df.__len__ is just a call to len(df.index):
import inspect
print(inspect.getsource(pd.DataFrame.__len__))
# Out:
# def __len__(self):
# """Returns length of info axis, but here we use the index """
# return len(self.index)
Why you should not use count()
df.count()
# Out:
# A 7
# B 8
# C 8
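To make the difference concrete, here is a small sketch on the same 8×3 frame (built with a float dtype so NaN can be stored directly). The len(df.dropna()) line goes beyond the answer: if what you actually want is the number of rows with no missing values at all, that is one way to get it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24, dtype=float).reshape(8, 3), columns=["A", "B", "C"])
df.loc[5, "A"] = np.nan  # one missing value in column A

total_rows = len(df)                # every row, regardless of NaN
non_na_in_a = int(df["A"].count())  # count() skips the NaN in column A
complete_rows = len(df.dropna())    # rows with no NaN in any column

print(total_rows, non_na_in_a, complete_rows)  # 8 7 7
```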
Answer by Surya
Simply, row_num = df.shape[0] gives the number of rows. Here's an example:
import pandas as pd
import numpy as np
In [322]: df = pd.DataFrame(np.random.randn(5,2), columns=["col_1", "col_2"])
In [323]: df
Out[323]:
col_1 col_2
0 -0.894268 1.309041
1 -0.120667 -0.241292
2 0.076168 -1.071099
3 1.387217 0.622877
4 -0.488452 0.317882
In [324]: df.shape
Out[324]: (5, 2)
In [325]: df.shape[0] ## Gives no. of rows/records
Out[325]: 5
In [326]: df.shape[1] ## Gives no. of columns
Out[326]: 2
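Because shape is an ordinary (rows, columns) tuple, it can also be unpacked in one step. The sketch below (with made-up values) also shows that an empty frame reports zero rows while keeping its column count:

```python
import pandas as pd

df = pd.DataFrame({"col_1": [0.5, -1.2, 3.3], "col_2": [1.0, 2.0, 3.0]})

# shape is a plain tuple, so it unpacks directly
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2

# An empty frame with named columns has 0 rows but still 2 columns
empty = pd.DataFrame(columns=["col_1", "col_2"])
print(empty.shape)  # (0, 2)
```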
Answer by ekta
The NaN example above misses one piece, which makes it less generic. To do this more "generically", use df['column_name'].value_counts().
This will give you the count of each value in that column.
import pandas as pd

d = ['A', 'A', 'A', 'B', 'C', 'C', " ", " ", " ", " ", " ", "-1"]  # for simplicity
df = pd.DataFrame(d)
df.columns = ["col1"]
df["col1"].value_counts()
5
A 3
C 2
-1 1
B 1
dtype: int64
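len(df) gives you 12. Since value_counts() leaves out NaN by default, any shortfall between len(df) and the sum of the counts is the number of NaN entries. The output also gives you a peek at other invalid entries, especially ones you might want to ignore, like -1, 0, "", or the blank " " strings here.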
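A variation on the example above, assuming the missing entries are real NaN values rather than blank strings (the data here is made up): value_counts(dropna=False) lists NaN as its own entry, and len(df) minus the sum of the default counts gives the number of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: like the example above, but with real NaN values mixed in
df = pd.DataFrame({"col1": ["A", "A", "A", "B", "C", "C", np.nan, np.nan, " ", "-1"]})

counts = df["col1"].value_counts()                       # NaN dropped by default
counts_with_nan = df["col1"].value_counts(dropna=False)  # NaN kept as its own entry

n_missing = len(df) - counts.sum()  # rows not covered by the default counts
print(counts_with_nan)
print(n_missing)  # 2
```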