pandas python how to count the number of records or rows in a dataframe

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/17468878/

Date: 2020-09-09 00:10:27  Source: igfitidea

Tags: count, dataframe, pandas

Asked by IcemanBerlin

Obviously new to Pandas. How can I simply count the number of records in a dataframe?

I would have thought something as simple as this would do it, and I can't seem to even find the answer in searches... probably because it is too simple.

cnt = df.count
print cnt

The above code actually just prints the whole df.
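A likely explanation for this behaviour, sketched on a small hypothetical DataFrame (the column names and values are assumptions for illustration): `df.count` without parentheses is the bound method object itself, and its repr embeds the frame, so printing it shows the whole df. Calling the method returns the per-column counts instead.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

cnt = df.count             # no parentheses: a bound method, nothing is counted yet
is_method = callable(cnt)

per_column = df.count()    # calling it returns the non-NA count of each column
n_rows = len(df)           # number of rows in the frame

print(is_method)
print(per_column.to_dict())
print(n_rows)
```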

Accepted answer by tshauck

Regarding your question... counting one field? I decided to make it a question, but I hope it helps...

Say I have the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, (5, 2)), columns=["A", "B"])

You could count a single column with:

df.A.count()
#or
df['A'].count()

Both evaluate to 5.

The cool thing (or one of many, w.r.t. pandas) is that if you have NA values, count takes that into consideration.

So if I did

df.loc[1::2, 'A'] = np.nan  # .loc avoids chained assignment; the np.NAN alias was removed in NumPy 2.0
df.count()

The result would be

 A    3
 B    5
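As a minimal sketch of the point above, the following compares the per-column non-NA counts with the total row count. The seeded random generator and the variable names are assumptions added for reproducibility; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.normal(0, 1, (5, 2)), columns=["A", "B"])

# Blank out every second value of column A (rows 1 and 3).
df.loc[1::2, "A"] = np.nan

non_na = df.count()          # non-NA values per column
n_rows = len(df)             # total rows, regardless of NA
n_missing = df.isna().sum()  # NA values per column

print(non_na.to_dict())
print(n_rows)
print(n_missing.to_dict())
```

Here `count()` and `len()` answer different questions: the first ignores missing values column by column, the second reports the length of the index.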

Answer by user2314737

To get the number of rows in a dataframe, use:

df.shape[0]

(and df.shape[1] to get the number of columns).

As an alternative, you can use

len(df)

or

len(df.index)

(and len(df.columns) for the columns).

shape is more versatile and more convenient than len(), especially for interactive work (it just needs to be added at the end), but len is a bit faster (see also this answer).

To avoid: count(), because it returns the number of non-NA/null observations over the requested axis.
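A short sketch of how these options relate, assuming an 8x3 frame with one missing value in column A (the float dtype is an assumption made so that NaN can be stored without a dtype change):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24, dtype=float).reshape(8, 3), columns=["A", "B", "C"])
df.loc[5, "A"] = np.nan  # one missing value in column A

n_rows, n_cols = df.shape     # (rows, columns) in one call
by_len = len(df)              # rows
by_index = len(df.index)      # rows
by_columns = len(df.columns)  # columns
non_na_in_a = df["A"].count() # non-NA values only, so one fewer than the rows

print(n_rows, n_cols, by_len, by_index, by_columns, non_na_in_a)
```

The three row counts agree, while `count()` on column A comes out lower because it skips the NaN.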

len(df.index) is faster

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(24).reshape(8, 3),columns=['A', 'B', 'C'])
df['A'] = df['A'].astype(float)  # NaN needs a float column
df.loc[5, 'A'] = np.nan          # .loc avoids chained assignment
df
# Out:
#     A   B   C
# 0   0   1   2
# 1   3   4   5
# 2   6   7   8
# 3   9  10  11
# 4  12  13  14
# 5 NaN  16  17
# 6  18  19  20
# 7  21  22  23

%timeit df.shape[0]
# 100000 loops, best of 3: 4.22 μs per loop

%timeit len(df)
# 100000 loops, best of 3: 2.26 μs per loop

%timeit len(df.index)
# 1000000 loops, best of 3: 1.46 μs per loop
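The `%timeit` lines above are IPython magics and will not run in a plain Python script. A rough equivalent using the standard-library `timeit` module might look like the sketch below; the call count and the `candidates` and `timings` names are assumptions, and the absolute numbers depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=["A", "B", "C"])

# The three ways of getting the row count that are compared above.
candidates = {
    "df.shape[0]": lambda: df.shape[0],
    "len(df)": lambda: len(df),
    "len(df.index)": lambda: len(df.index),
}

n_calls = 10_000  # arbitrary number of repetitions
timings = {}
for label, func in candidates.items():
    total_seconds = timeit.timeit(func, number=n_calls)
    timings[label] = total_seconds / n_calls * 1e6  # microseconds per call

for label, microseconds in timings.items():
    print(f"{label:15s} {microseconds:.2f} µs per call")
```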

df.__len__ is just a call to len(df.index)

import inspect 
print(inspect.getsource(pd.DataFrame.__len__))
# Out:
#     def __len__(self):
#         """Returns length of info axis, but here we use the index """
#         return len(self.index)

Why you should not use count()

df.count()
# Out:
# A    7
# B    8
# C    8

Answer by Surya

Simply, row_num = df.shape[0] gives the number of rows. Here's an example:

import pandas as pd
import numpy as np

In [322]: df = pd.DataFrame(np.random.randn(5,2), columns=["col_1", "col_2"])

In [323]: df
Out[323]: 
      col_1     col_2
0 -0.894268  1.309041
1 -0.120667 -0.241292
2  0.076168 -1.071099
3  1.387217  0.622877
4 -0.488452  0.317882

In [324]: df.shape
Out[324]: (5, 2)

In [325]: df.shape[0]   ## Gives no. of rows/records
Out[325]: 5

In [326]: df.shape[1]   ## Gives no. of columns
Out[326]: 2

Answer by ekta

The NaN example above misses one piece, which makes it less generic. To do this more "generically", use df['column_name'].value_counts(). This will give you the counts of each value in that column.

import pandas as pd

d = ['A', 'A', 'A', 'B', 'C', 'C', " ", " ", " ", " ", " ", "-1"]  # for simplicity

df = pd.DataFrame(d)
df.columns = ["col1"]
df["col1"].value_counts()
# Out:
#       5
# A     3
# C     2
# -1    1
# B     1
# dtype: int64

# len(df) gives 12. If the counts above summed to less than that, the rest
# would be NaN of some form, since value_counts() skips NaN by default. The
# counts also give a peek at other invalid entries you might want to ignore,
# such as -1, 0, or "".
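To make the comparison with len(df) concrete, here is a small sketch on a hypothetical column that does contain missing values (the data is an assumption, not taken from the answer). It uses the `dropna=False` option of `value_counts()` so that NaN gets its own row in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical column: two real categories, one blank string, two NaN entries.
df = pd.DataFrame({"col1": ["A", "A", "B", np.nan, " ", np.nan]})

counts = df["col1"].value_counts()                       # NaN is skipped by default
counts_with_nan = df["col1"].value_counts(dropna=False)  # NaN counted as its own entry

# Rows not accounted for by the default counts are the missing ones.
n_missing = len(df) - counts.sum()

print(counts.sum(), counts_with_nan.sum(), n_missing)
```

The blank string still shows up as an ordinary value, which is why inspecting the counts helps spot entries that are invalid without being NaN.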