Scatter plots with string arrays in pandas/matplotlib

Question

Asked by elelias

This seems like it should be an easy one, but I can't figure it out. I have a pandas DataFrame and would like to make a 3D scatter plot from three of its columns. The X and Y columns are not numeric; they are strings, but I don't see why this should be a problem.

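import numpy as np
import matplotlib.pyplot as pl
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

X = myDataFrame.columnX.values  # string
Y = myDataFrame.columnY.values  # string
Z = myDataFrame.columnZ.values  # float

fig = pl.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
pl.show()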

Isn't there an easy way to do this? Thanks.

Answer 1

Answered by unutbu

You could use np.unique(..., return_inverse=True) to get representative ints for each string. For example,

In [117]: uniques, X = np.unique(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'], return_inverse=True)

In [118]: X
Out[118]: array([2, 1, 0, 2, 1, 0])

Note that X has dtype int32, as np.unique can handle at most 2**31 unique strings.
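
To make the relationship between the two return values concrete, here is a minimal sketch. The variable names labels, codes and recovered are illustrative and not from the answer. It shows that the integer codes index into the sorted unique labels, so the original strings can always be recovered for tick labels.

```python
import numpy as np

labels = np.array(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'])

# uniques holds the distinct labels in sorted order;
# codes[i] is the index of labels[i] within uniques.
uniques, codes = np.unique(labels, return_inverse=True)

# Indexing uniques with the codes reconstructs the original labels.
recovered = uniques[codes]

print(uniques)
print(codes)
print(recovered)
```

Because the codes are positions in uniques, range(len(uniques)) gives the tick positions and uniques gives the matching tick labels.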

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d

N = 12
arr = np.arange(N*2).reshape(N,2)
words = np.array(['foo', 'bar', 'baz', 'quux', 'corge'])
df = pd.DataFrame(words[arr % 5], columns=list('XY'))
df['Z'] = np.linspace(1, 1000, N)
Z = np.log10(df['Z'])
Xuniques, X = np.unique(df['X'], return_inverse=True)
Yuniques, Y = np.unique(df['Y'], return_inverse=True)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(X, Y, Z, s=20, c='b')
ax.set(xticks=range(len(Xuniques)), xticklabels=Xuniques,
       yticks=range(len(Yuniques)), yticklabels=Yuniques) 
plt.show()
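
Since the data already lives in a DataFrame, pandas' own pd.factorize might be a convenient alternative to np.unique here. This is a swapped-in technique, not what the answer above uses, and the small frame below is made-up illustration data. A minimal sketch:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({'X': ['foo', 'baz', 'bar', 'foo'],
                   'Z': [1.0, 10.0, 100.0, 1000.0]})

# sort=True orders the unique labels, which matches the ordering
# that np.unique(..., return_inverse=True) would produce.
codes, uniques = pd.factorize(df['X'], sort=True)

print(codes)          # integer position for each row
print(list(uniques))  # label for each integer position
```

The codes can be passed to ax.scatter in place of X, and list(uniques) can be used as the tick labels, in the same way as Xuniques above.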

Answer 2

Answered by jmetz

Try converting the characters to numbers for the plotting and then use the characters again for the axis labels.

Using hash

You could use the hash function for the conversion:

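import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

xlab = myDataFrame.columnX.values
ylab = myDataFrame.columnY.values

# hash() maps each string to an int; equal strings get equal values
X = [hash(l) for l in xlab]
Y = [hash(l) for l in ylab]

Z = myDataFrame.columnZ.values  # float

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
# put a tick at each hashed position and label it with the original string
ax.set_xticks(X)
ax.set_xticklabels(xlab)
ax.set_yticks(Y)
ax.set_yticklabels(ylab)
plt.show()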

As M4rtini has pointed out in the comments, it's not clear what the spacing/scaling of string coordinates should be; the hash function could give unexpected spacings.
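
The spacing issue can be seen without plotting anything. The following is a small sketch with made-up labels: it hashes a few strings and looks at the gaps between the distinct values. The exact numbers are not shown because Python randomizes string hashes per process unless PYTHONHASHSEED is set.

```python
labels = ['foo', 'bar', 'baz', 'foo']

# Equal strings hash to equal integers within one Python process...
positions = [hash(label) for label in labels]

# ...but the integers themselves are arbitrary, so the gaps between
# neighbouring categories on the axis are arbitrary as well.
distinct = sorted(set(positions))
gaps = [b - a for a, b in zip(distinct, distinct[1:])]

print(positions)
print(gaps)
```

Repeated labels land on the same coordinate, which is the useful part, but the gaps are typically very large and unequal, and they change from run to run.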

Nondegenerate uniform spacing

If you wanted to have the points uniformly spaced, then you would have to use a different conversion. For example, you could use

X = list(range(len(xlab)))

though that would cause each point to have a unique x-position even if the label is the same, and the x and y points would be correlated if you used the same approach for Y.

Degenerate uniform spacing

A third alternative is to first get the unique members of xlab (using e.g. set) and then map each entry of xlab to a position using that unique set; e.g.

xmap = {sn: i for i, sn in enumerate(set(xlab))}
X = [xmap[l] for l in xlab]
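
To compare the two uniform-spacing options side by side, here is a minimal sketch on made-up labels. The call to sorted() is an addition that is not in the snippet above; it is assumed here only to make the label order reproducible, since iteration order of a set is arbitrary.

```python
xlab = ['foo', 'bar', 'foo', 'baz', 'bar']

# Nondegenerate: one position per row, so repeated labels are spread out.
X_nondegenerate = list(range(len(xlab)))

# Degenerate: one position per distinct label, so repeated labels coincide.
# sorted() is an assumption added for a reproducible ordering.
xmap = {label: i for i, label in enumerate(sorted(set(xlab)))}
X_degenerate = [xmap[label] for label in xlab]

print(X_nondegenerate)
print(X_degenerate)
```

With the degenerate mapping, list(xmap.values()) and list(xmap.keys()) can be passed to ax.set_xticks and ax.set_xticklabels respectively, so each distinct label gets exactly one tick.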

Answer 3

Answered by naught101

Scatter does this automatically now (from at least matplotlib 2.1.0):

plt.scatter(['A', 'B', 'B', 'C'], [0, 1, 2, 1])
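
As a rough check of what this does, the sketch below inspects the positions matplotlib assigns to the strings. The Agg backend is an assumption made so that the snippet runs without a display, and the example covers only the 2D call shown above.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
sc = ax.scatter(['A', 'B', 'B', 'C'], [0, 1, 2, 1])

# Each distinct string is given an integer position in order of first
# appearance, so the two 'B' points share the same x coordinate.
x_positions = sc.get_offsets()[:, 0].tolist()
print(x_positions)
```

This is the same idea as the degenerate uniform spacing described in the earlier answer, done by matplotlib itself, with the tick labels set to the original strings.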
