pandas value_counts applied to each column

Question

Question by Edouard

I have a DataFrame with numerous columns (≈30) from an external source (a CSV file), but several of them have no values or always contain the same value. So I would like to quickly see the value_counts for each column. How can I do that?

For example

  Id, temp, name
1 34, null, mark
2 22, null, mark
3 34, null, mark

Should return an object stating:

Id: 34 -> 2, 22 -> 1
temp: null -> 3
name: mark -> 3

So I would know that temp is irrelevant (always empty) and name is not interesting (always the same value).

Answer 1

Accepted answer by tanemaki

For the DataFrame

import pandas as pd

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3])

the following code

for c in df.columns:
    # print() is a function in Python 3; the original used Python 2 print statements
    print("---- %s ---" % c)
    print(df[c].value_counts())

will produce the following result:

---- id ---
34    2
22    1
dtype: int64
---- temp ---
null    3
dtype: int64
---- name ---
mark    3
dtype: int64
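To get output in the exact shape the question asks for (id: 34 -> 2, 22 -> 1), the same loop can collect each column's counts into a string instead of printing whole Series. A minimal sketch; summarize_counts is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

def summarize_counts(frame):
    """Hypothetical helper: one 'column: value -> count, ...' line per column."""
    lines = []
    for col in frame.columns:
        # dropna=False keeps real missing values (NaN) as their own bucket
        counts = frame[col].value_counts(dropna=False)
        pairs = ", ".join(f"{value} -> {n}" for value, n in counts.items())
        lines.append(f"{col}: {pairs}")
    return lines

for line in summarize_counts(df):
    print(line)
```

value_counts sorts by count in descending order, so the most frequent value of each column comes first.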

Answer 2

Answer by Napitupulu Jon

You can use df.apply, which applies the provided function to each column; in this case the function counts missing values. This is what it looks like:

df.apply(lambda x: x.isnull().value_counts())
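One caveat with the example frame: there 'null' is a literal string, so isnull() finds nothing missing. When the data is loaded from a CSV file with pd.read_csv, the text null is treated as missing by default, and value_counts(dropna=False) then reports NaN as its own bucket. A minimal sketch, assuming the CSV looks like the question's example (the csv_text contents are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV text mirroring the question's example
csv_text = """id,temp,name
34,null,mark
22,null,mark
34,null,mark
"""

# read_csv treats "null" as a missing value by default, so temp becomes all-NaN
df = pd.read_csv(io.StringIO(csv_text))

# Number of missing values per column
missing = df.isnull().sum()
print(missing)

# Full per-column counts; dropna=False keeps NaN as its own bucket
for col in df.columns:
    print(df[col].value_counts(dropna=False))
```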

Answer 3

Answer by Martín Fixman

A nice way to do this and return a nicely formatted Series is to combine pandas.Series.value_counts and pandas.DataFrame.stack.

For the DataFrame

import pandas

df = pandas.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3])

You can do something like

df.apply(lambda x: x.value_counts()).T.stack()

In this code, df.apply(lambda x: x.value_counts()) applies value_counts to every column and collects the results into a DataFrame, so you end up with a DataFrame that has the same columns and one row per distinct value found in any column (with NaN wherever a value doesn't appear in a given column).

After that, T transposes the DataFrame (so you end up with a DataFrame whose index is the original columns and whose columns are the possible values), and stack turns the columns of the DataFrame into a new level of a MultiIndex and drops all the NaN values, making the whole thing a Series. Because the NaN padding forces a float dtype, the counts come out as floats.

The result of this is

id    22      1
      34      2
temp  null    3
name  mark    3
dtype: float64
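An alternative that avoids both the NaN padding and the float dtype is to build the MultiIndex Series directly: pd.concat on a dict of Series uses the dict keys as the outer index level. This is a sketch of a different technique than the apply/T/stack chain, not a drop-in equivalent of it (the inner order follows each column's value_counts order, and nothing relies on stack() dropping NaN):

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

# Dict keys become the outer index level; each column's distinct values form the inner level.
# No NaN padding is created, so the counts keep value_counts' integer dtype.
counts = pd.concat({col: df[col].value_counts() for col in df.columns})
print(counts)
```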

Answer 4

Answer by Jagie

The following code

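import pandas as pd

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=["id", 'temp', 'name'], index=[1, 2, 3])
# The original used pd.value_counts; the top-level function is deprecated in recent pandas (2.1+),
# and pd.Series.value_counts does the same thing for each column
result2 = df.apply(pd.Series.value_counts)
print(result2)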

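will produce a DataFrame with one row per distinct value and one column per original column (id, temp, name), with NaN wherever a value does not appear in that column.
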
Answer 5

Answer by Igor Fobia

This is similar to @Jagie's reply, but in addition it:

Puts zero for values absent from a column
Converts the counts to integers

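import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=["id", 'temp', 'name'],
    index=[1, 2, 3]
)
# fillna(0) replaces the NaN padding, astype(int) turns the float counts back into integers
# (pd.Series.value_counts is used instead of the deprecated top-level pd.value_counts)
result2 = df.apply(pd.Series.value_counts).fillna(0).astype(int)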
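Going back to the goal in the question (spotting columns that are empty or constant), the zero-filled table already answers it: a column with at most one nonzero count holds a single value or none at all. A minimal sketch; uninformative is a hypothetical variable name, and the nunique line is a closely related shortcut that counts NaN as a value of its own (so a column mixing one value with NaN would not be flagged by it):

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=["id", 'temp', 'name'],
    index=[1, 2, 3],
)
result2 = df.apply(pd.Series.value_counts).fillna(0).astype(int)

# Number of distinct (non-missing) values per column = number of nonzero counts.
# An all-NaN column has no nonzero counts at all, because value_counts drops NaN by default.
distinct_per_column = (result2 > 0).sum()
uninformative = distinct_per_column[distinct_per_column <= 1].index.tolist()
print(uninformative)

# Related shortcut without the count table; dropna=False treats NaN as one more value
shortcut = df.nunique(dropna=False).loc[lambda s: s <= 1].index.tolist()
print(shortcut)
```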

Answer 6

Answer by Arnau Mercader

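You can replace

fillna(0).astype(int)

with

fillna(0, downcast='infer')

Note that the downcast argument of fillna is deprecated in pandas 2.x, so on recent versions fillna(0).astype(int) is the more portable choice.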
