pandas value_counts applied to each column
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/23197324/
Asked by Edouard
I have a dataframe with numerous columns (≈30) from an external source (a csv file), but several of them have no value or always contain the same value. I would therefore like to see the value_counts for each column quickly. How can I do that?
For example
Id, temp, name
1 34, null, mark
2 22, null, mark
3 34, null, mark
I would like it to return an object stating that
- Id: 34 -> 2, 22 -> 1
- temp: null -> 3
- name: mark -> 3
That way I would know that temp is irrelevant and name is not interesting (always the same).
Accepted answer by tanemaki
For the dataframe,
import pandas as pd
df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3])
the following code
for c in df.columns:
    # Print a header for the column, then its value counts
    print("---- %s ---" % c)
    print(df[c].value_counts())
will produce the following result:
---- id ---
34    2
22    1
dtype: int64
---- temp ---
null    3
dtype: int64
---- name ---
mark    3
dtype: int64
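To go from printed output to something that can be inspected programmatically, the same loop can be sketched as a dictionary of counts. This is only an illustration under assumptions: the names counts and constant_columns are hypothetical, and "at most one distinct value" is taken as the test for an uninteresting column.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, "null", "mark"], [22, "null", "mark"], [34, "null", "mark"]],
    columns=["id", "temp", "name"],
    index=[1, 2, 3],
)

# One value_counts Series per column, keyed by column name.
# dropna=False keeps real missing values (NaN) in the tally.
counts = {col: df[col].value_counts(dropna=False) for col in df.columns}

# Hypothetical helper list: columns that hold at most one distinct value.
constant_columns = [col for col, vc in counts.items() if len(vc) <= 1]
print(constant_columns)  # ['temp', 'name']
```

Columns listed in constant_columns are the ones the question calls irrelevant or not interesting.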
Answer by Napitupulu Jon
You can use df.apply, which applies the provided function to each column; in this case it counts missing values. This is what it looks like:
df.apply(lambda x: x.isnull().value_counts())
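Note that in the sample frame from the accepted answer, 'null' is an ordinary string, so isnull() would not flag it. The sketch below assumes instead that the temp column holds real missing values (NaN), as it typically would after reading an empty csv column; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumption: temp contains real NaN values rather than the string 'null'.
df = pd.DataFrame(
    {
        "id": [34, 22, 34],
        "temp": [np.nan, np.nan, np.nan],
        "name": ["mark", "mark", "mark"],
    },
    index=[1, 2, 3],
)

# Per column: how many entries are missing (True) and how many are present (False).
flags = df.apply(lambda col: col.isnull().value_counts())
print(flags)

# A shorter route when only the number of missing entries per column is needed.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

Here missing_per_column reports 3 for temp and 0 for the other two columns.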
Answer by Martín Fixman
A nice way to do this and return a nicely formatted Series is to combine pandas.Series.value_counts and pandas.DataFrame.stack.
For the DataFrame
import pandas
df = pandas.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3])
You can do something like
df.apply(lambda x: x.value_counts()).T.stack()
In this code, df.apply(lambda x: x.value_counts()) applies value_counts to every column and combines the results into a new DataFrame, so you end up with a DataFrame with the same columns and one row for every distinct value found in any column (and a NaN wherever a value does not appear in a given column).
After that, T transposes the DataFrame (so its index holds the original column names and its columns hold the possible values), and stack turns the columns of the DataFrame into a new level of a MultiIndex and "deletes" all the NaN values, making the whole thing a Series.
The result of this is
id    22      1
      34      2
temp  null    3
name  mark    3
dtype: float64
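A similar MultiIndex Series can also be built without the intermediate NaN-filled frame. The sketch below swaps in a different technique from the answer's: it uses pandas.concat on a dictionary of per-column counts instead of apply, T and stack. The name stacked is hypothetical. One difference is that the counts stay integers, because no NaN is ever introduced.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, "null", "mark"], [22, "null", "mark"], [34, "null", "mark"]],
    columns=["id", "temp", "name"],
    index=[1, 2, 3],
)

# The dict keys (column names) become the outer index level,
# and the counted values become the inner level.
stacked = pd.concat({col: df[col].value_counts() for col in df.columns})
print(stacked)

# Individual counts can be looked up as (column, value) pairs.
print(stacked.to_dict()[("id", 34)])  # 2
```

This also sidesteps any question of how stack treats missing values, which may differ between pandas versions.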
Answer by Jagie
Answer by Igor Fobia
This is similar to @Jagie's reply, but in addition:
- Put zero for values absent in a column
- Convert the counts to integer
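import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3]
)
# Series.value_counts is used here because the top-level pd.value_counts
# is deprecated in recent pandas releases.
result2 = df.apply(lambda col: col.value_counts()).fillna(0).astype(int)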
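To make the effect of the two extra steps concrete, here is a small check of the resulting table. It is a sketch only: the name table is hypothetical, and the call is written with Series.value_counts inside a lambda.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, "null", "mark"], [22, "null", "mark"], [34, "null", "mark"]],
    columns=["id", "temp", "name"],
    index=[1, 2, 3],
)

# Rows are every distinct value seen in any column; columns are the original columns.
table = df.apply(lambda col: col.value_counts()).fillna(0).astype(int)

print(table.loc[34, "id"])       # 2: the value 34 appears twice in id
print(table.loc[34, "temp"])     # 0: the value 34 never appears in temp
print(table.loc["mark", "name"]) # 3
```

Without fillna(0) the second lookup would be NaN, and the whole table would hold floats rather than integers.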
Answer by Arnau Mercader
You can replace:
fillna(0).astype(int)
with
fillna(0, downcast='infer')


