pandas: Randomizing/shuffling rows in a DataFrame

Question

Asked by avidman

I am currently trying to find a way to randomize items in a dataframe row-wise. I found this thread on shuffling/permutation column-wise in pandas (shuffling/permutating a DataFrame in pandas), but for my purposes, is there a way to do something like

import pandas as pd

data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
       'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
       'Number': [11, 8, 10, 15, 11]}

dataframe = pd.DataFrame(data)
    Number   color    day
0      11    Blue    Mon
1       8     Red   Tues
2      10   Green    Wed
3      15  Yellow  Thurs
4      11   Black    Fri

and randomize the rows into something like

    Number   color    day
0      Mon    Blue    11
1      Red    Tues     8
2      10     Wed    Green
3      15    Yellow  Thurs
4      Black   11     Fri

If in order to do so, the column headers would have to go away or something of the like, I understand.

EDIT: So, in the thread I posted, part of the code refers to an "axis" parameter. I understand that axis = 0 refers to the columns and axis =1 refers to the rows. I tried taking the code and changing the axis to 1, and it seems to randomize my dataframe only if the table consists of all numbers (as opposed to a list of strings, or a combination of the two).

That said, should I consider not using dataframes? Is there a better 2D structure where I can randomize the rows and the columns if my data consists of only strings or a combination of ints and strings?

Answer 1

Answered by jrjc

Edit: I misunderstood the question, which was just to shuffle rows and not all the table (right?)

I don't think using a DataFrame makes much sense here, because the column names become meaningless once values move between columns. You can just use a 2D NumPy array:

In [1]: A
Out[1]: 
array([[11, 'Blue', 'Mon'],
       [8, 'Red', 'Tues'],
       [10, 'Green', 'Wed'],
       [15, 'Yellow', 'Thurs'],
       [11, 'Black', 'Fri']], dtype=object)

In [2]: _ = [np.random.shuffle(i) for i in A] # shuffles each row in place; np.random.shuffle returns None

In [3]: A
Out[3]: 
array([['Mon', 11, 'Blue'],
       [8, 'Tues', 'Red'],
       ['Wed', 10, 'Green'],
       ['Thurs', 15, 'Yellow'],
       [11, 'Black', 'Fri']], dtype=object)

And if you want to keep a DataFrame:

In [4]: pd.DataFrame(A, columns=data.columns)
Out[4]: 
  Number  color     day
0    Mon     11    Blue
1      8   Tues     Red
2    Wed     10   Green
3  Thurs     15  Yellow
4     11  Black     Fri
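
The snippets above assume that A and np already exist in the session. A minimal self-contained sketch of the same idea, starting from the dict in the question, might look like the following. The function name shuffle_within_rows and its seed parameter are illustrative choices, not part of the original answer, and it uses NumPy's Generator API instead of np.random.shuffle.

```python
import numpy as np
import pandas as pd

def shuffle_within_rows(df, seed=None):
    """Return a new DataFrame whose values are permuted independently within each row.

    Hypothetical helper for illustration; `seed` is only there to make runs repeatable.
    """
    rng = np.random.default_rng(seed)
    # object dtype keeps ints and strings unchanged; copy=True leaves `df` untouched
    values = df.to_numpy(dtype=object, copy=True)
    for row in values:
        rng.shuffle(row)  # each `row` is a view, so this shuffles `values` in place
    return pd.DataFrame(values, index=df.index, columns=df.columns)

data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
        'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
        'Number': [11, 8, 10, 15, 11]}
dataframe = pd.DataFrame(data)

print(shuffle_within_rows(dataframe, seed=0))
```

Each row keeps its own three values, only their positions change, so the column headers no longer describe what is underneath them.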

Here is a function that shuffles all values across both rows and columns:

import numpy as np
import pandas as pd

def shuffle(df):
    col = df.columns
    val = df.values
    shape = val.shape
    val_flat = val.flatten()
    np.random.shuffle(val_flat)
    return pd.DataFrame(val_flat.reshape(shape),columns=col)

In [2]: data
Out[2]: 
   Number   color    day
0      11    Blue    Mon
1       8     Red   Tues
2      10   Green    Wed
3      15  Yellow  Thurs
4      11   Black    Fri

In [3]: shuffle(data)
Out[3]: 
  Number  color     day
0    Fri    Wed  Yellow
1  Thurs  Black     Red
2  Green   Blue      11
3     11      8      10
4    Mon   Tues      15

Hope this helps

Answer 2

Answered by Happy001

Maybe flatten the 2d array and then shuffle?

In [21]: data2=dataframe.values.flatten()

In [22]: np.random.shuffle(data2)

In [23]: dataframe2 = pd.DataFrame(data2.reshape(dataframe.shape), columns=dataframe.columns)

In [24]: dataframe2
Out[24]: 
  Number   color    day
0   Tues  Yellow     11
1    Red   Green    Wed
2  Thurs     Mon   Blue
3     15       8  Black
4    Fri      11     10
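
If the shuffle needs to be repeatable, the flatten-and-shuffle approach can be driven by a seeded generator. This is only a sketch: shuffle_all_cells and seed are hypothetical names introduced here, and it swaps np.random.shuffle for Generator.permutation, which returns a shuffled copy instead of working in place.

```python
import numpy as np
import pandas as pd

def shuffle_all_cells(df, seed=None):
    """Return a new DataFrame with every cell moved to a random position.

    Hypothetical helper for illustration; the same `seed` gives the same layout.
    """
    rng = np.random.default_rng(seed)
    # Flatten to 1-D; object dtype keeps mixed ints and strings as they are
    flat = df.to_numpy(dtype=object, copy=True).ravel()
    shuffled = rng.permutation(flat)  # shuffled copy, `flat` itself is not modified
    return pd.DataFrame(shuffled.reshape(df.shape), index=df.index, columns=df.columns)

dataframe = pd.DataFrame({'Number': [11, 8, 10, 15, 11],
                          'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
                          'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri']})

first = shuffle_all_cells(dataframe, seed=42)
second = shuffle_all_cells(dataframe, seed=42)
print(first)
```

With the same seed, first and second hold identical layouts, and the original dataframe is left unchanged.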

Answer 3

Answered by Raphvanns

Building on @jrjc's answer, I have posted https://stackoverflow.com/a/44686455/5009287 which uses np.apply_along_axis()

a = np.array([[10, 11, 12], [20, 21, 22], [30, 31, 32],[40, 41, 42]])
print(a)
[[10 11 12]
 [20 21 22]
 [30 31 32]
 [40 41 42]]

print(np.apply_along_axis(np.random.permutation, 1, a))
[[11 12 10]
 [22 21 20]
 [31 30 32]
 [40 41 42]]

See the full answer for how that could be integrated with a pandas DataFrame.
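
As a rough illustration only (this is not the code from the linked answer), one way to combine np.apply_along_axis with a DataFrame is to permute the underlying object array and then wrap the result again. The variable names and the seeded generator below are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number': [11, 8, 10, 15, 11],
                   'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
                   'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri']})

rng = np.random.default_rng(0)  # seeded only so the run is repeatable

# object dtype lets ints and strings share one 2-D array
values = df.to_numpy(dtype=object)

# axis=1: hand each row to rng.permutation, which returns a shuffled copy of it
permuted = np.apply_along_axis(rng.permutation, 1, values)

result = pd.DataFrame(permuted, index=df.index, columns=df.columns)
print(result)
```

Because rng.permutation returns copies, df itself is not modified; result holds the row-wise shuffled values under the original column labels.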
