Original question: http://stackoverflow.com/questions/50540256/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
python plotting a histogram from dataframe column
Asked by matanster
I want to plot a histogram from a dataframe column, but I can't seem to convert the data into an object type that matplotlib can handle. Here are some failed attempts. How do I fix them?
And more generally, how do you typically salvage something like that?
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
filter(lambda v: v > 0, df['foo_col']).hist(bins=10)
---> 10 filter(lambda v: v > 0, df['foo_col']).hist(bins=100)
AttributeError: 'filter' object has no attribute 'hist'
hist(filter(lambda v: v > 0, df['foo_col']), bins=100)
---> 10 hist(filter(lambda v: v > 0, df['foo_col']), bins=100)
TypeError: 'Series' object is not callable
Answered by roganjosh
By all accounts, filter is lucky to be part of the standard library. IIUC, you just want to filter your dataframe to plot a histogram of values > 0. Pandas has its own syntax for that:
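import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Sample data: 10,000 random integers in the range [-50, 1000)
data = np.random.randint(-50, 1000, 10000)
df = pd.DataFrame({'some_data': data})
# Keep only the rows with values > 0, then plot their histogram
df[df['some_data'] > 0].hist(bins=100)
plt.show()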
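The same boolean-mask idea can be applied to a single column, which is closer to what the question attempted. In Python 3 the built-in filter returns a lazy iterator rather than a Series, so it has no .hist method; selecting with a mask keeps the result as a Series. The sketch below is only an illustration: the random data is a stand-in for the question's df['foo_col'], and the variable names are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data standing in for the question's df['foo_col'].
rng = np.random.default_rng(0)
df = pd.DataFrame({'foo_col': rng.integers(-50, 1000, size=10_000)})

# Boolean mask: True where the value is strictly positive.
mask = df['foo_col'] > 0

# Selecting one column with .loc returns a Series, which has its own .hist method.
positive = df.loc[mask, 'foo_col']
ax = positive.hist(bins=100)

# Alternative route through matplotlib: Axes.hist accepts the Series directly.
fig, ax2 = plt.subplots()
counts, edges, _ = ax2.hist(positive, bins=100)

# plt.show()  # uncomment to display the figures in an interactive session
```

Either route should give the same picture; Series.hist is the shorter spelling, while the matplotlib call also hands back the bin counts and edges if they are needed later.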
Note that this will run much faster than Python built-ins could ever hope to (it doesn't make much difference in my trivial example, but it will with bigger datasets). It's important to use pandas methods with dataframes wherever possible because, in many cases, the calculation will be vectorized and run in highly optimised C/C++ code.
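To see the speed difference on your own machine, a rough comparison along these lines can help. This is a sketch, not a rigorous benchmark: the data size, the function names with_builtin_filter and with_boolean_mask, and the repeat count are all assumptions chosen for illustration, and the actual timings will vary by hardware and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical dataset, larger than the answer's example so the gap is visible.
rng = np.random.default_rng(0)
df = pd.DataFrame({'some_data': rng.integers(-50, 1000, size=200_000)})


def with_builtin_filter(frame):
    # Python-level loop: the lambda is called once per element.
    return list(filter(lambda v: v > 0, frame['some_data']))


def with_boolean_mask(frame):
    # Vectorized comparison and selection handled inside pandas/NumPy.
    return frame.loc[frame['some_data'] > 0, 'some_data']


# Both approaches should select exactly the same values.
builtin_result = with_builtin_filter(df)
mask_result = with_boolean_mask(df)

# Time a few repetitions of each approach.
t_builtin = timeit.timeit(lambda: with_builtin_filter(df), number=3)
t_mask = timeit.timeit(lambda: with_boolean_mask(df), number=3)
print(f"builtin filter: {t_builtin:.3f}s, boolean mask: {t_mask:.3f}s")
```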