pandas: plotting a categorical variable against a numeric variable in matplotlib
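Notice: this page is a Chinese/English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, give the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/47269695/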
Question by Raul Feresteanu
My DataFrame's structure
trx.columns
Index(['dest', 'orig', 'timestamp', 'transcode', 'amount'], dtype='object')
I'm trying to plot transcode (transaction code) against amount to see how much money is spent per transaction. I made sure to convert transcode to a categorical type, as seen below.
trx['transcode']
...
Name: transcode, Length: 21893, dtype: category
Categories (3, int64): [1, 17, 99]
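The question does not show how the conversion was done. A minimal sketch, assuming it used astype("category"); the sample rows below are hypothetical stand-ins for the real 21893-row trx DataFrame:

```python
import pandas as pd

# Hypothetical sample rows standing in for the real trx DataFrame
trx = pd.DataFrame({
    "transcode": [1, 17, 99, 17, 1, 99],
    "amount": [12.5, 300.0, 8.0, 45.25, 99.9, 3.1],
})

# Assumed conversion: turn the integer transaction codes into a pandas categorical
trx["transcode"] = trx["transcode"].astype("category")

print(trx["transcode"].dtype)                 # category
print(list(trx["transcode"].cat.categories))  # [1, 17, 99]
```

The categories are still integers (int64), which matters for how matplotlib places them on the x axis.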
The result I get from plt.scatter(trx['transcode'], trx['amount']) is a scatter plot (image not preserved) whose x axis spans the whole numeric range.
While the above plot is not entirely wrong, I would like the x axis to contain just the three possible values of transcode, [1, 17, 99], instead of the entire [1, 100] range.
Thanks!
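A plausible explanation for the 1 to 100 axis, offered as a hedged sketch rather than a statement about matplotlib internals: a pandas categorical with int64 categories still converts to a plain integer array, so the plotting code receives numbers and places them on a numeric axis. Converting to strings instead yields string values, which matplotlib (2.1 and later) treats as categories. The series below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for trx['transcode']: integer codes stored as a categorical
transcode = pd.Series([1, 17, 99, 17, 1], dtype="category")

# Converting to a NumPy array gives back the underlying integers
as_ints = np.asarray(transcode)

# Converting to strings first gives string labels instead
as_strs = np.asarray(transcode.astype(str))

print(as_ints.dtype, as_ints.tolist())
print(as_strs.tolist())
```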
Accepted answer by ImportanceOfBeingErnest
In matplotlib 2.1 and later you can plot categorical variables by using strings, i.e. if you pass the column for the x values as strings, matplotlib will recognize them as categories.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
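df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
                   "y" : np.random.rand(100)*100})
# Passing the x values as strings makes matplotlib treat them as categories
plt.scatter(df["x"].astype(str), df["y"])
# Pad the x axis so the outermost categories are not on the plot edge
plt.margins(x=0.5)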
plt.show()
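A hedged variation on the snippet above, applied to a categorical column like the question's transcode. String categories are placed on the axis in order of first appearance, so this sketch sorts by the categorical column first to get 1, 17, 99 from left to right. The trx DataFrame is a hypothetical stand-in, and the Agg backend is used only so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's trx DataFrame
trx = pd.DataFrame({
    "transcode": pd.Categorical([99, 1, 17, 99, 1, 17]),
    "amount": [8.0, 12.5, 300.0, 3.1, 99.9, 45.25],
})

# Sorting a categorical column follows its category order (1, 17, 99),
# which then becomes the order of first appearance for the string labels
trx = trx.sort_values("transcode")

fig, ax = plt.subplots()
ax.scatter(trx["transcode"].astype(str), trx["amount"])
ax.margins(x=0.5)

fig.canvas.draw()  # tick label text is filled in during a draw
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

Sorting by the categorical column rather than by the string labels keeps numeric order even for codes such as 100, which a plain string sort would place before 17.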
To obtain the same result in matplotlib 2.0 or earlier, plot against an integer index instead and label the ticks with the original values.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
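df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
                   "y" : np.random.rand(100)*100})
# u: sorted unique x values; inv: position of each x value within u
u, inv = np.unique(df["x"], return_inverse=True)
plt.scatter(inv, df["y"])
# Put one tick at each index and label it with the original value
plt.xticks(range(len(u)), u)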
plt.margins(x=0.5)
plt.show()
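If the column is already a pandas categorical, as in the question, its .cat.codes and .cat.categories carry the same information as the inverse indices and unique values from np.unique, so either pair could feed plt.scatter and plt.xticks. A sketch with a hypothetical series; the equivalence assumes the categories are sorted and every category actually occurs (an unused category would still reserve a code slot, while np.unique would drop it):

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with the question's three transaction codes
transcode = pd.Series([99, 1, 17, 1, 99], dtype="category")

# np.unique: sorted distinct values plus each element's index into them
u, inv = np.unique(transcode, return_inverse=True)

# pandas categorical: integer codes plus the category labels they refer to
codes = transcode.cat.codes.to_numpy()
labels = transcode.cat.categories.to_numpy()

print(u, inv)
print(labels, codes)
```

Plotting would then follow the answer's pattern, e.g. plt.scatter(codes, amounts) and plt.xticks(range(len(labels)), labels), where amounts is whatever numeric column goes on the y axis.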
The same plot can be obtained using seaborn's stripplot:
import seaborn as sns
sns.stripplot(x="x", y="y", data=df)
A potentially nicer representation is seaborn's swarmplot:
sns.swarmplot(x="x", y="y", data=df)