pandas value_counts applied to each column

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/23197324/

Time: 2020-09-13 21:56:59  Source: igfitidea

python pandas dataframe

Asked by Edouard

I have a dataframe with numerous columns (≈30) from an external source (a csv file), but several of them have no value or always contain the same value. I would therefore like to quickly see the value_counts for each column. How can I do that?

For example

  Id, temp, name
1 34, null, mark
2 22, null, mark
3 34, null, mark

This would return an object stating that

  • Id: 34 -> 2, 22 -> 1
  • temp: null -> 3
  • name: mark -> 3

So I would know that temp is irrelevant and name is not interesting (always the same)

Accepted answer by tanemaki

For the dataframe,

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3]) 

the following code

for c in df.columns:
    print("---- %s ---" % c)
    print(df[c].value_counts())

will produce the following result:

---- id ---
34    2
22    1
dtype: int64
---- temp ---
null    3
dtype: int64
---- name ---
mark    3
dtype: int64
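
Since the question asks for an object rather than printed text, the same loop can be sketched as a dictionary comprehension. This is only a minimal sketch: the names `counts_by_column` and `constant_columns` are hypothetical, and the `nunique` check is one possible way to flag columns that always hold the same value.

```python
import pandas as pd

# Same sample frame as in the answer above.
df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

# Hypothetical name: maps each column label to a plain dict of value -> count.
counts_by_column = {c: df[c].value_counts().to_dict() for c in df.columns}
print(counts_by_column)

# Hypothetical name: columns with at most one distinct value (NaN counted as a value).
constant_columns = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
print(constant_columns)
```

Here `constant_columns` lists the columns that could be dropped as uninformative, which is what the question is ultimately after.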

Answer by Napitupulu Jon

You can use df.apply, which applies the provided function to each column; in this case, it counts missing values. It looks like this:

df.apply(lambda x: x.isnull().value_counts())
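
A small sketch of what this returns, assuming a variant of the sample frame in which temp holds real missing values (np.nan) instead of the string 'null'. The names `missing_flags` and `missing_per_column` are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumption: temp contains real NaN values, not the string 'null'.
df = pd.DataFrame(
    data=[[34, np.nan, 'mark'], [22, np.nan, 'mark'], [34, np.nan, 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

# One row per flag (False = present, True = missing), one column per original column.
missing_flags = df.apply(lambda x: x.isnull().value_counts())
print(missing_flags)

# A shorter alternative when only the number of missing values per column is needed.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

Note that with the string 'null' from the question, isnull() would report nothing as missing, because 'null' is an ordinary string value.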

Answer by Martín Fixman

A nice way to do this and return a nicely formatted Series is to combine pandas.Series.value_counts and pandas.DataFrame.stack.

For the DataFrame

df = pandas.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3]) 

You can do something like

df.apply(lambda x: x.value_counts()).T.stack()

In this code, df.apply(lambda x: x.value_counts()) applies value_counts to every column and combines the results into a single DataFrame, so you end up with a DataFrame with the same columns and one row for each distinct value found in any column (and a lot of NaN entries for values that do not appear in a given column).

After that, T transposes the DataFrame (so you end up with a DataFrame whose index equals the original columns and whose columns equal the possible values), and stack turns the columns of the DataFrame into a new level of the MultiIndex and "deletes" all the NaN values, making the whole thing a Series.

The result of this is

id    22      1
      34      2
temp  null    3
name  mark    3
dtype: float64
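
The chain can be sketched step by step as follows. Two things here are assumptions rather than part of the answer: the explicit dropna() call, added because newer pandas versions may keep NaN entries after stack, and the final astype(int), which turns the float counts back into integers. The names `counts` and `stacked` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

# Step 1: one row per distinct value, one column per original column, NaN where absent.
counts = df.apply(lambda x: x.value_counts())

# Step 2: transpose, stack into a (column, value) MultiIndex, drop absent combinations.
stacked = counts.T.stack().dropna().astype(int)
print(stacked)
```

Individual counts can then be looked up by pair, for example stacked[('id', 34)].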

Answer by Jagie

Code like the following

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=["id", 'temp', 'name'], index=[1, 2, 3]) 
result2 = df.apply(pd.Series.value_counts)
result2

will produce:

[Image: display of result2]
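
Because the screenshot is not available, here is a minimal sketch for inspecting the shape of the result yourself. It passes pd.Series.value_counts to apply, on the assumption that this behaves the same as the top-level function across pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

# Rows are the distinct values found anywhere in the frame; columns are the original columns.
result2 = df.apply(pd.Series.value_counts)
print(result2)

# Cells are NaN where a value never occurs in that column, so the counts come back as floats.
print(result2.dtypes)
```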

Answer by Igor Fobia

This is similar to @Jagie's reply but in addition:

  1. Put zero for values absent in a column
  2. Convert the counts to integer
    df = pd.DataFrame(
        data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
        columns=["id", 'temp', 'name'],
        index=[1, 2, 3]
    )
    result2 = df.apply(pd.Series.value_counts).fillna(0).astype(int)
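
With zeros in place of NaN, the table is easy to query. The sketch below counts how many distinct values each column actually contains, which is one way to spot the always-the-same columns from the question. The names `filled` and `distinct_per_column` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

# Integer table of counts: 0 means the value never occurs in that column.
filled = df.apply(pd.Series.value_counts).fillna(0).astype(int)
print(filled)

# Number of distinct values per column = number of nonzero cells in each column.
distinct_per_column = (filled > 0).sum()
print(distinct_per_column)
```

A column whose entry in `distinct_per_column` is 1 holds a single repeated value.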

Answer by Arnau Mercader

You can replace:

fillna(0).astype(int)

with

fillna(0, downcast='infer')