Python: shuffle two lists at once in the same order

Question

Asked by Jaroslav Klimčík

I'm using the nltk library's movie_reviews corpus, which contains a large number of documents. My task is to measure predictive performance on these reviews both with and without pre-processing of the data. The problem is that the lists documents and documents2 contain the same documents, and I need to shuffle them so that both lists end up in the same order. I cannot shuffle them separately, because each shuffle gives a different order. I need to shuffle both at once with the same order because I compare them at the end (and the comparison depends on order). I'm using Python 2.7.

Example (in reality the strings are tokenized, but that is not relevant here):

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

And I need to get this result after shuffling both lists:

documents = [(['one of the guys dies'], 'neg'),
             (['they get into an accident . '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['plot : two teen couples go to a church party , '], 'neg')]

documents2 = [(['one guys dies'], 'neg'),
              (['they get accident . '], 'neg'),
              (['drink then drive . '], 'pos'),
              (['plot two teen couples church party'], 'neg')]

I have this code:

import random

import nltk
from nltk.corpus import movie_reviews, stopwords

def cleanDoc(doc):
    stopset = set(stopwords.words('english'))
    stemmer = nltk.PorterStemmer()
    clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
    final = [stemmer.stem(word) for word in clean]
    return final

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

# TODO: shuffle documents and documents2 here, using the same order for both

Answer 1

Accepted answer by sshashank124

You can zip the two lists together, shuffle the pairs, and unzip them again:

import random

a = ['a', 'b', 'c']
b = [1, 2, 3]

c = list(zip(a, b))

random.shuffle(c)

a, b = zip(*c)  # unzip; a and b are now tuples

print(a)
print(b)

[OUTPUT] (random; one possible result)
('a', 'c', 'b')
(1, 3, 2)

Of course, this was an example with simpler lists, but the adaptation will be the same for your case.
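
As a rough sketch of that adaptation, the same zip/shuffle/unzip steps can be applied to the sample data from the question. This uses the four example reviews rather than the real corpus, and the name paired is introduced here only for illustration:

```python
import random

# Sample data copied from the question (not the real corpus).
documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

# Pair each raw document with its pre-processed counterpart.
paired = list(zip(documents, documents2))

# Shuffle the pairs, so both halves of a pair move together.
random.shuffle(paired)

# Unzip and convert the resulting tuples back to lists.
documents, documents2 = (list(t) for t in zip(*paired))

print(documents)
print(documents2)
```

Converting back to lists keeps the variables usable anywhere the original code expected a list, for example when slicing into training and test sets.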

Hope it helps. Good Luck.

Answer 2

Answer by Kundan Kumar

You can use the second argument of random.shuffle to fix the shuffling order.

Specifically, you can pass a zero-argument function that returns a value in [0, 1) as the second argument. The return value of this function determines the shuffling order. (By default, i.e. if you do not pass a function as the second argument, random.random is used. You can see it at line 277 here.) Note that this second argument was deprecated in Python 3.9 and removed in Python 3.11, so this approach only works on older versions such as the Python 2.7 used in the question.

This example illustrates what I described:

import random

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

r = random.random()            # randomly generate a real number in [0, 1)
random.shuffle(a, lambda: r)   # lambda: r is a zero-argument function that returns r
random.shuffle(b, lambda: r)   # same function as in the previous line, so the shuffling order is the same

print(a)
print(b)

Output:

['e', 'c', 'd', 'a', 'b']
[5, 3, 4, 1, 2]
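
The second argument of random.shuffle is not available in Python 3.11 and later, so the idea of fixing one order and applying it to both lists needs a different form there. A minimal sketch, assuming it is acceptable to build new lists instead of shuffling in place, is to draw one permutation of the indices and reuse it. The names order, a_shuffled and b_shuffled are introduced here for illustration:

```python
import random

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

# Draw one random permutation of the indices 0..len(a)-1.
order = random.sample(range(len(a)), len(a))

# Apply the same permutation to both lists.
a_shuffled = [a[i] for i in order]
b_shuffled = [b[i] for i in order]

print(a_shuffled)
print(b_shuffled)
```

Unlike the constant-function trick, this leaves the original lists untouched, which may or may not be what you want.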

Answer 3

Answer by Lion Lai

Shuffle an arbitrary number of lists simultaneously.

from random import shuffle

def shuffle_list(*ls):
    # Zip the lists into rows, shuffle the rows, then unzip them again.
    l = list(zip(*ls))
    shuffle(l)
    return zip(*l)

a = [0,1,2,3,4]
b = [5,6,7,8,9]

a1,b1 = shuffle_list(a,b)
print(a1,b1)

a = [0,1,2,3,4]
b = [5,6,7,8,9]
c = [10,11,12,13,14]
a1,b1,c1 = shuffle_list(a,b,c)
print(a1,b1,c1)

Output:

$ (0, 2, 4, 3, 1) (5, 7, 9, 8, 6)
$ (4, 3, 0, 2, 1) (9, 8, 5, 7, 6) (14, 13, 10, 12, 11)

Note:
The objects returned by shuffle_list() are tuples.
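
If lists are needed instead of tuples, one possible way is to convert each returned tuple with map(list, ...). The sketch below repeats the helper so that it runs on its own:

```python
from random import shuffle

def shuffle_list(*ls):
    # Same helper as above: zip, shuffle the rows, unzip.
    l = list(zip(*ls))
    shuffle(l)
    return zip(*l)

a = [0, 1, 2, 3, 4]
b = [5, 6, 7, 8, 9]

# Convert each returned tuple back into a list.
a1, b1 = map(list, shuffle_list(a, b))
print(a1, b1)
```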

P.S. shuffle_list() can also be applied to arrays created with numpy.array():

import numpy as np

a = np.array([1,2,3])
b = np.array([4,5,6])

a1,b1 = shuffle_list(a,b)
print(a1,b1)

Output:

$ (3, 1, 2) (6, 4, 5)

Answer 4

Answer by hua wei

Here is an easy way to do this: shuffle an array of indices and use it to index both arrays.

import numpy as np
a = np.array([0,1,2,3,4])
b = np.array([5,6,7,8,9])

indices = np.arange(a.shape[0])
np.random.shuffle(indices)

a = a[indices]
b = b[indices]
# a, array([3, 4, 1, 2, 0])
# b, array([8, 9, 6, 7, 5])

Answer 5

Answer by YScharf

import numpy as np
from sklearn.utils import shuffle

a = ['a', 'b', 'c','d','e']
b = [1, 2, 3, 4, 5]

a_shuffled, b_shuffled = shuffle(np.array(a), np.array(b))
print(a_shuffled, b_shuffled)

#random output
#['e' 'c' 'b' 'd' 'a'] [5 3 2 4 1]

Answer 6

Answer by Boris

An easy and fast way to do this is to use random.seed() together with random.shuffle(). It lets you reproduce the same random order as many times as you want. It looks like this:

import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
seed = random.random()
random.seed(seed)
random.shuffle(a)  # shuffle a in place
random.seed(seed)  # reset to the same seed so b gets the same order
random.shuffle(b)
print(a)
print(b)

>>[3, 1, 4, 2, 5]
>>[8, 6, 9, 7, 10]

This also works when you can't work with both lists at the same time, because of memory problems.
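
A variation on the same idea, sketched here under the assumption that you would rather not reseed the global generator, is to create a separate random.Random instance from the same seed for each list. The seed value 12345 is an arbitrary placeholder:

```python
import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]

seed = 12345  # hypothetical fixed seed; any value works

# Each Random instance has its own state, so the global generator is untouched.
random.Random(seed).shuffle(a)

# ... later, possibly after `a` has been released from memory ...
random.Random(seed).shuffle(b)

print(a)
print(b)
```

Because both instances start from the same seed and the lists have the same length, both shuffles apply the same permutation, and the two calls do not need to happen at the same time.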
