pandas: plotting a categorical variable against a numeric variable in matplotlib

Question

Asked by Raul Feresteanu

My DataFrame's structure:

trx.columns
Index(['dest', 'orig', 'timestamp', 'transcode', 'amount'], dtype='object')

I'm trying to plot transcode (the transaction code) against amount to see how much money is spent per transaction. I made sure to convert transcode to a categorical type, as seen below.

trx['transcode']
...
Name: transcode, Length: 21893, dtype: category
Categories (3, int64): [1, 17, 99]

The result I get from plt.scatter(trx['transcode'], trx['amount']) is:

[Scatter plot]

While the above plot is not entirely wrong, I would like the x axis to contain just the three possible values of transcode, [1, 17, 99], instead of the entire [1, 100] range.

Thanks!


Answer 1

Accepted answer by ImportanceOfBeingErnest

Since matplotlib 2.1, you can plot categorical variables by passing strings. That is, if you provide the x values as strings, matplotlib treats them as categories.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
                   "y" : np.random.rand(100)*100})

plt.scatter(df["x"].astype(str), df["y"])
plt.margins(x=0.5)
plt.show()
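One detail worth keeping in mind: matplotlib places string categories along the axis in the order it first encounters them, not in sorted order. A minimal sketch of one way to get the categories in ascending order, assuming it is acceptable to sort the rows numerically before converting to strings (the names `rng`, `df_sorted`, `x_labels` and the output filename are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.choice([1, 17, 99], size=100),
                   "y": rng.random(100) * 100})

# Sort by the numeric value first, so the strings are first seen in ascending order
df_sorted = df.sort_values("x")
x_labels = df_sorted["x"].astype(str)

fig, ax = plt.subplots()
ax.scatter(x_labels, df_sorted["y"])
ax.margins(x=0.5)
fig.savefig("sorted_categories.png")  # or plt.show() in an interactive session
```

Sorting before the string conversion matters, because sorting the strings themselves would be lexicographic rather than numeric.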

To obtain the same result in matplotlib <= 2.0, one would plot against an integer index instead and relabel the ticks.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
                   "y" : np.random.rand(100)*100})

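# u holds the sorted unique values; inv maps each row to its position in u
u, inv = np.unique(df["x"], return_inverse=True)
plt.scatter(inv, df["y"])
# Label the integer positions 0..len(u)-1 with the original category values
plt.xticks(range(len(u)), u)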
plt.margins(x=0.5)
plt.show()
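When the column already has a pandas categorical dtype, as in the question, the same index-based idea can probably be expressed with the `.cat` accessor instead of `np.unique`. The following is only a sketch: the `trx` frame here is a randomly generated stand-in for the asker's data, and the output filename is arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical stand-in for the asker's trx DataFrame
trx = pd.DataFrame({
    "transcode": pd.Categorical(rng.choice([1, 17, 99], size=100),
                                categories=[1, 17, 99]),
    "amount": rng.random(100) * 100,
})

codes = trx["transcode"].cat.codes        # integer position of each row's category
labels = trx["transcode"].cat.categories  # category values, in category order

fig, ax = plt.subplots()
ax.scatter(codes, trx["amount"])
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.margins(x=0.5)
fig.savefig("transcode_vs_amount.png")  # or plt.show() in an interactive session
```

Here the x positions are 0, 1, 2 and only the tick labels show 1, 17, 99, so the points are evenly spaced regardless of the numeric gap between codes.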

The same plot can be obtained using seaborn's stripplot:

import seaborn as sns
sns.stripplot(x="x", y="y", data=df)

A potentially nicer representation is available via seaborn's swarmplot, which spreads the points out so they do not overlap:

sns.swarmplot(x="x", y="y", data=df)
