Scatter plots with string arrays in matplotlib
Source: http://stackoverflow.com/questions/22095746/ (Stack Overflow). Content is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must attribute it to the original authors.
Asked by elelias
This seems like it should be an easy one, but I can't figure it out. I have a pandas DataFrame and would like to do a 3D scatter plot with three of the columns. The X and Y columns are not numeric, they are strings, but I don't see why this should be a problem.
X= myDataFrame.columnX.values #string
Y= myDataFrame.columnY.values #string
Z= myDataFrame.columnZ.values #float
fig = pl.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
pl.show()
Isn't there an easy way to do this? Thanks.
Answered by unutbu
You could use np.unique(..., return_inverse=True) to get a representative int for each string. For example,
In [117]: uniques, X = np.unique(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'], return_inverse=True)
In [118]: X
Out[118]: array([2, 1, 0, 2, 1, 0])
Note that X has dtype int32 in this session, as np.unique can handle at most 2**31 unique strings on such a build. The exact integer width is platform-dependent, so you may see int64 instead.
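As a small sketch of why this encoding is convenient (the variable names labels, codes, and restored are illustrative and not part of the original answer): the integer codes index back into the sorted array of unique strings, so the original labels can always be recovered, for instance to build tick labels.

```python
import numpy as np

labels = np.array(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'])
uniques, codes = np.unique(labels, return_inverse=True)

# uniques is sorted, and codes[i] is the position of labels[i] within it
restored = uniques[codes]

print(uniques)                           # ['bar' 'baz' 'foo']
print(codes)                             # [2 1 0 2 1 0]
print(np.array_equal(restored, labels))  # True
```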
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d
N = 12
arr = np.arange(N*2).reshape(N,2)
words = np.array(['foo', 'bar', 'baz', 'quux', 'corge'])
df = pd.DataFrame(words[arr % 5], columns=list('XY'))
df['Z'] = np.linspace(1, 1000, N)
Z = np.log10(df['Z'])
Xuniques, X = np.unique(df['X'], return_inverse=True)
Yuniques, Y = np.unique(df['Y'], return_inverse=True)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(X, Y, Z, s=20, c='b')
ax.set(xticks=range(len(Xuniques)), xticklabels=Xuniques,
       yticks=range(len(Yuniques)), yticklabels=Yuniques) 
plt.show()
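If the data already lives in a DataFrame, pandas offers a comparable encoding. The following is a sketch of an alternative and not part of the original answer: pd.factorize with sort=True returns integer codes together with the sorted unique values, much like np.unique(..., return_inverse=True). The column names and sample values here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'baz', 'bar', 'foo'],
                   'Z': [1.0, 10.0, 100.0, 1000.0]})

# codes: one integer per row; uniques: the sorted distinct labels
codes, uniques = pd.factorize(df['X'], sort=True)

print(codes.tolist())    # [2, 1, 0, 2]
print(uniques.tolist())  # ['bar', 'baz', 'foo']
```

The codes can then be passed to ax.scatter, and the unique labels to ax.set(xticks=range(len(uniques)), xticklabels=uniques), in the same way as in the example above.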


Answered by jmetz
Try converting the strings to numbers for plotting, and then use the strings again for the axis tick labels.
Using hash
You could use the hash function for the conversion:
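import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib versions
xlab = myDataFrame.columnX.values  # strings
ylab = myDataFrame.columnY.values  # strings
X = [hash(l) for l in xlab]
Y = [hash(l) for l in ylab]
Z = myDataFrame.columnZ.values  # float
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
# put one tick at each hashed position and label it with the original string
ax.set_xticks(X)
ax.set_xticklabels(xlab)
ax.set_yticks(Y)
ax.set_yticklabels(ylab)
plt.show()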
As M4rtini has pointed out in the comments, it's not clear what the spacing/scaling of string coordinates should be; the hash function could give unexpected spacings. In Python 3, string hashes are also randomized per process unless PYTHONHASHSEED is set, so the positions can change between runs.
Nondegenerate uniform spacing
If you wanted to have the points uniformly spaced, you would have to use a different conversion. For example, you could use
X = list(range(len(xlab)))
though that would cause each point to have a unique x-position even if the label is the same, and the x and y points would be correlated if you used the same approach for Y.
Degenerate uniform spacing
A third alternative is to first get the unique members of xlab (using e.g. set) and then map each label in xlab to a position using that unique set; e.g.
xmap = {sn: i for i, sn in enumerate(set(xlab))}
X = [xmap[l] for l in xlab]
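Because the iteration order of a set of strings is not guaranteed to be the same between runs, the positions produced by the mapping above can change from one run to the next. A minimal sketch that makes the mapping reproducible by sorting the unique labels first, and that keeps them around for the tick labels; the helper name encode_labels and the sample data are hypothetical:

```python
def encode_labels(labels):
    """Map each label to the index of its value among the sorted unique labels."""
    uniques = sorted(set(labels))
    position = {label: i for i, label in enumerate(uniques)}
    return [position[label] for label in labels], uniques


xlab = ['foo', 'baz', 'bar', 'foo', 'baz', 'bar']
X, xuniques = encode_labels(xlab)

print(X)         # [2, 1, 0, 2, 1, 0]
print(xuniques)  # ['bar', 'baz', 'foo']
```

With a 3D axes object ax, the original strings can then be shown on the axis with ax.set_xticks(range(len(xuniques))) and ax.set_xticklabels(xuniques). Identical labels share one position, which is what makes this spacing degenerate.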

