Python: How to plot two columns of a pandas data frame using points?
Original source: http://stackoverflow.com/questions/17812978/
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original source, and attribute it to the original authors (not me): StackOverflow
Asked by Roman
I have a pandas data frame and would like to plot the values from one column versus the values from another column. Fortunately, there is a plot method associated with data frames that seems to do what I need:
df.plot(x='col_name_1', y='col_name_2')
Unfortunately, it looks like the plot styles (listed here after the kind parameter) do not include points. I can use lines, bars, or even density, but not points. Is there a workaround that can help solve this problem?
Accepted answer by sodd
You can specify the style of the plotted line when calling df.plot:
df.plot(x='col_name_1', y='col_name_2', style='o')
The style argument can also be a dict or a list, e.g.:
import numpy as np
import pandas as pd
d = {'one' : np.random.rand(10),
'two' : np.random.rand(10)}
df = pd.DataFrame(d)
df.plot(style=['o','rx'])
All the accepted style formats are listed in the documentation of matplotlib.pyplot.plot.
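As a minimal sketch of the dict form mentioned above: each column name can be mapped to its own matplotlib format string. The column names and the particular format strings are illustrative choices, not part of the original answer, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window

import numpy as np
import pandas as pd

df = pd.DataFrame({"one": np.random.rand(10), "two": np.random.rand(10)})

# Map each column name to a matplotlib format string:
# 'o'  -> circle markers with no connecting line
# 'rx' -> red 'x' markers with no connecting line
ax = df.plot(style={"one": "o", "two": "rx"})

# Record which marker ended up on each plotted column.
markers = {line.get_label(): line.get_marker() for line in ax.get_lines()}
```

Because both format strings name a marker but no line style, the points are drawn without connecting lines, which is the effect the question asks for.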
Answer by ely
For this (and most plotting) I would not rely on the Pandas wrappers to matplotlib. Instead, just use matplotlib directly:
import matplotlib.pyplot as plt
plt.scatter(df['col_name_1'], df['col_name_2'])
plt.show() # Depending on whether you use IPython or interactive mode, etc.
Remember that you can access a NumPy array of a column's values with df.col_name_1.values, for example.
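A self-contained sketch of the direct matplotlib approach might look like the following. The sample data is made up for illustration, and the sketch uses Series.to_numpy(), which the pandas documentation recommends over .values for obtaining a NumPy array; either accessor works here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up sample data using the column names from the question.
df = pd.DataFrame({"col_name_1": np.random.rand(20),
                   "col_name_2": np.random.rand(20)})

# Pull each column out as a plain NumPy array.
x = df["col_name_1"].to_numpy()
y = df["col_name_2"].to_numpy()

fig, ax = plt.subplots()
points = ax.scatter(x, y)  # one marker per row, no connecting lines
ax.set_xlabel("col_name_1")
ax.set_ylabel("col_name_2")
```

With an interactive backend, calling plt.show() afterwards displays the figure.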
I ran into trouble using Pandas default plotting in the case of a column of Timestamp values with millisecond precision. In trying to convert the objects to datetime64 type, I also discovered a nasty issue: "Pandas gives incorrect result when asking if Timestamp column values have attr astype".
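For illustration only, here is a sketch of plotting a millisecond-spaced timestamp column as points by handing matplotlib a datetime64 array directly. It does not reproduce the issue described above; the column names "when" and "value" and the sample timestamps are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample: five timestamps spaced 250 ms apart.
df = pd.DataFrame({
    "when": pd.date_range("2021-01-01 12:00:00", periods=5, freq="250ms"),
    "value": np.arange(5.0),
})

# Convert the Timestamp column to a NumPy datetime64 array.
times = df["when"].to_numpy()

fig, ax = plt.subplots()
# The 'o' format string draws markers only, with no connecting line.
(line,) = ax.plot(times, df["value"].to_numpy(), "o")
```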
Answer by Dr. Arslan
Pandas uses matplotlib as its library for basic plots. The easiest way in your case is to use the following:
import pandas as pd
import numpy as np
#creating sample data
sample_data={'col_name_1':np.random.rand(20),
'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
df.plot(x='col_name_1', y='col_name_2', style='o')
However, I would recommend using seaborn as an alternative if you want more customized plots without going down to the basic level of matplotlib. In that case the solution would be the following:
import pandas as pd
import seaborn as sns
import numpy as np
#creating sample data
sample_data={'col_name_1':np.random.rand(20),
'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
sns.scatterplot(x="col_name_1", y="col_name_2", data=df)
Answer by shantanu pathak
In recent versions of pandas you can use the df.plot.scatter function directly:
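import pandas as pd
# Small sample frame: two measurements plus a species label per row
df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])
# Scatter plot of width against length, all points in one color
ax1 = df.plot.scatter(x='length',
                      y='width',
                      c='DarkBlue')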
https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.scatter.html
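As a sketch of a variation described in the linked pandas documentation, the c argument can also name a column, so that each point is colored by that column's value through a colormap. The choice of 'viridis' is only an example, and the Agg backend is selected so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window

import pandas as pd

df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=["length", "width", "species"])

# Color each point by its 'species' value using a colormap.
ax2 = df.plot.scatter(x="length", y="width", c="species", colormap="viridis")

# The scatter points live in the first collection attached to the axes.
n_points = len(ax2.collections[0].get_offsets())
```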