pandas: zip, sorted and pandas

Question

Asked by MJP

I have a pandas DataFrame with the following columns:

names = wine_df.columns
names
Index([u'fixed acidity', u'volatile acidity', u'citric acid', u'residual sugar', u'chlorides', u'free sulfur dioxide', u'total sulfur dioxide', u'density', u'pH', u'sulphates', u'alcohol'], dtype='object')

I have a numpy array named imp with the following values:

array([ 0.07640909,  0.11346059,  0.09160943,  0.06674312,  0.07203855,
        0.06306923,  0.08272078,  0.0839144 ,  0.05996705,  0.11833288,
        0.17173489])

While working on a project, I came across the piece of code shown below:

zip(*sorted(zip(imp, names)))

I couldn't understand why *sorted is used inside the zip function. Also, why is the zip function called twice?

Answer 1

Accepted answer by Andy Hayden

The best way to see what this is doing is with a simple example:

In [11]: a = np.array([2, 1, 3])

In [12]: a = np.array([2, 1, 2, 3])

In [13]: b = np.array(['b', 'b', 'a', 'c'])

In [14]: sorted(zip(a, b))
Out[14]: [(1, 'b'), (2, 'a'), (2, 'b'), (3, 'c')]

In [15]: zip(*sorted(zip(a, b)))
Out[15]: [(1, 2, 2, 3), ('b', 'a', 'b', 'c')]

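It sorts both lists/arrays together: the inner zip pairs up the elements, sorted orders the pairs by the value from the first array (breaking ties by the value from the second), and zip(*...) splits the sorted pairs back into two sequences.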
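
The same steps can be spelled out one at a time. This is only a sketch for Python 3, where zip returns a lazy iterator rather than a list (the transcript above shows Python 2 output). The plain lists and the names pairs, sorted_a and sorted_b are just for illustration:

```python
# Python 3 sketch; plain lists stand in for the NumPy arrays used above.
a = [2, 1, 2, 3]
b = ['b', 'b', 'a', 'c']

# The inner zip pairs the elements up; sorted orders the pairs by the
# first item, then by the second item when the first items tie.
pairs = sorted(zip(a, b))
print(pairs)     # [(1, 'b'), (2, 'a'), (2, 'b'), (3, 'c')]

# The * unpacks the sorted pairs into separate arguments, so the outer
# zip regroups them by position: all first items, then all second items.
sorted_a, sorted_b = zip(*pairs)
print(sorted_a)  # (1, 2, 2, 3)
print(sorted_b)  # ('b', 'a', 'b', 'c')
```

Applied to the question, zip(*sorted(zip(imp, names))) would yield the values of imp in ascending order together with the column names rearranged to match.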

A more "numpy" way to do this would be to use argsort (which will be much more performant for larger arrays):

In [21]: s = np.argsort(a)

In [22]: a[s], b[s]
Out[22]:
(array([1, 2, 2, 3]), array(['b', 'b', 'a', 'c'],
       dtype='|S1'))

Note: this gives a slightly different result, because argsort sorts on a alone and does not use b to break ties in a.
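
If the tie-breaking matters, one possible way to reproduce the sorted(zip(a, b)) ordering while staying in NumPy is np.lexsort. This is a sketch rather than part of the original answer, and the name order is only for illustration:

```python
import numpy as np

a = np.array([2, 1, 2, 3])
b = np.array(['b', 'b', 'a', 'c'])

# np.lexsort treats the LAST key as the primary sort key,
# so (b, a) means "sort by a, then break ties using b".
order = np.lexsort((b, a))

print(a[order])  # [1 2 2 3]
print(b[order])  # ['b' 'a' 'b' 'c']
```

The keys are passed in reverse order of priority, which is easy to get backwards, so it is worth checking the result on a small array with known ties first.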
