zip, sorted and pandas
Original URL: http://stackoverflow.com/questions/28341387/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Question by MJP
I have a pandas DataFrame with the following column names:
names = wine_df.columns
names
Index([u'fixed acidity', u'volatile acidity', u'citric acid', u'residual sugar', u'chlorides', u'free sulfur dioxide', u'total sulfur dioxide', u'density', u'pH', u'sulphates', u'alcohol'], dtype='object')
I also have a NumPy array named imp with the following values:
array([ 0.07640909, 0.11346059, 0.09160943, 0.06674312, 0.07203855,
0.06306923, 0.08272078, 0.0839144 , 0.05996705, 0.11833288,
0.17173489])
While working on a project, I came across this piece of code:
zip(*sorted(zip(imp, names)))
I don't understand why they are using *sorted inside the zip function. Also, why are they calling zip twice?
Accepted answer by Andy Hayden
The best way to see what this is doing is with a simple example:
In [11]: import numpy as np
In [12]: a = np.array([2, 1, 2, 3])
In [13]: b = np.array(['b', 'b', 'a', 'c'])
In [14]: sorted(zip(a, b))
Out[14]: [(1, 'b'), (2, 'a'), (2, 'b'), (3, 'c')]
In [15]: zip(*sorted(zip(a, b)))
Out[15]: [(1, 2, 2, 3), ('b', 'a', 'b', 'c')]
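It sorts both lists/arrays by the values in the first, using the values in the second to break ties. The inner zip pairs up corresponding elements, sorted orders those pairs, and the * unpacks the sorted list so that each pair becomes a separate argument to the outer zip, which regroups the pairs back into two sequences. (This session is Python 2; in Python 3, zip returns an iterator, so wrap the outer call in list() to see the same output.)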
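A minimal Python 3 sketch of the same idiom applied to the data from the question. Since wine_df itself is not available here, the column names are copied from the printed Index and imp is rebuilt from the printed array; the names pairs, ranked_imp and ranked_names are only illustrative.

```python
import numpy as np

# Column names copied from the printed wine_df.columns Index
names = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
         'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
         'pH', 'sulphates', 'alcohol']
# Importance values copied from the printed imp array
imp = np.array([0.07640909, 0.11346059, 0.09160943, 0.06674312, 0.07203855,
                0.06306923, 0.08272078, 0.0839144, 0.05996705, 0.11833288,
                0.17173489])

# Step 1: the inner zip pairs each importance with its column name
pairs = list(zip(imp, names))
# Step 2: sorted orders the (importance, name) tuples by importance
pairs = sorted(pairs)
# Step 3: *pairs passes each tuple as a separate argument, so zip(*pairs)
# is zip(pairs[0], pairs[1], ...), which regroups the tuples into one
# tuple of importances and one tuple of names
ranked_imp, ranked_names = zip(*pairs)

# Least and most important feature; prints: pH alcohol
print(ranked_names[0], ranked_names[-1])
```

Unpacking into two names works because the outer zip yields exactly one tuple per element position of the pairs, and each pair has two elements.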
A more "NumPy" way to do this is to use argsort, which should be much faster for large arrays:
In [21]: s = np.argsort(a)
In [22]: a[s], b[s]
Out[22]:
(array([1, 2, 2, 3]), array(['b', 'b', 'a', 'c'],
dtype='|S1'))
Note: this gives a slightly different result, because it doesn't use the values in b to break ties in a.
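If you want the NumPy version to break ties the way the sorted-tuple version does, one option (not part of the original answer) is np.lexsort, which sorts by the last key passed and uses earlier keys to break ties. A sketch with the same a and b:

```python
import numpy as np

a = np.array([2, 1, 2, 3])
b = np.array(['b', 'b', 'a', 'c'])

# argsort only looks at a; kind='stable' keeps tied elements in their
# original order, so the two 2s stay paired with 'b' then 'a'
s = np.argsort(a, kind='stable')
print(a[s], b[s])

# lexsort sorts by the LAST key (a) and breaks ties with the earlier key (b),
# which matches the ordering of sorted(zip(a, b))
t = np.lexsort((b, a))
print(a[t], b[t])
```

Passing kind='stable' makes the tie order of argsort predictable; the default sort kind does not guarantee any particular order for equal values.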

