Randomizing/Shuffling rows in a dataframe in pandas
Original question: http://stackoverflow.com/questions/24701217/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by avidman
I am trying to find a way to randomize the items within each row of a DataFrame. I found a thread on shuffling/permuting column-wise in pandas (shuffling/permutating a DataFrame in pandas), but for my purposes, is there a way to take something like
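import pandas as pd
# Keys are listed in the column order shown in the output below
# (recent pandas versions keep dict insertion order instead of sorting the keys).
data = {'Number': [11, 8, 10, 15, 11],
        'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
        'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri']}
dataframe = pd.DataFrame(data)
print(dataframe)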
Number color day
0 11 Blue Mon
1 8 Red Tues
2 10 Green Wed
3 15 Yellow Thurs
4 11 Black Fri
and randomize the values in each row into something like
Number color day
0 Mon Blue 11
1 Red Tues 8
2 10 Wed Green
3 15 Yellow Thurs
4 Black 11 Fri
If the column headers have to go away (or something similar) to do this, I understand.
EDIT: In the thread I linked, part of the code uses an "axis" parameter. I understand that axis = 0 refers to the columns and axis = 1 refers to the rows. I tried taking that code and changing the axis to 1, and it seems to randomize my DataFrame only if the table consists entirely of numbers (as opposed to strings, or a mix of the two).
That said, should I consider not using DataFrames? Is there a better 2D structure in which I can randomize the rows and the columns if my data consists only of strings, or of a combination of ints and strings?
Answer by jrjc
Edit: I misunderstood the question, which was only to shuffle the values within each row, not the whole table (right?)
I don't think a DataFrame makes much sense here, because the column names become meaningless once the values are shuffled. You can just use a 2D NumPy array:
In [1]: A  # A = dataframe.values, a 2D object array because ints and strings are mixed
Out[1]:
array([[11, 'Blue', 'Mon'],
[8, 'Red', 'Tues'],
[10, 'Green', 'Wed'],
[15, 'Yellow', 'Thurs'],
[11, 'Black', 'Fri']], dtype=object)
In [2]: for row in A: np.random.shuffle(row)  # each row is a view, so A is shuffled in place
In [3]: A
Out[3]:
array([['Mon', 11, 'Blue'],
[8, 'Tues', 'Red'],
['Wed', 10, 'Green'],
['Thurs', 15, 'Yellow'],
[11, 'Black', 'Fri']], dtype=object)
And if you want to keep a DataFrame:
In [4]: pd.DataFrame(A, columns=dataframe.columns)
Out[4]:
Number color day
0 Mon 11 Blue
1 8 Tues Red
2 Wed 10 Green
3 Thurs 15 Yellow
4 11 Black Fri
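The per-row shuffle above can also be wrapped in a small reusable function. This is a minimal sketch, not part of the original answer: the name `shuffle_within_rows` and its `seed` parameter are hypothetical, and it assumes NumPy 1.17+ (for `np.random.default_rng`) and pandas 0.24+ (for `DataFrame.to_numpy`). It shuffles an object-dtype copy, so the input frame is left unchanged, and it reattaches the original index and column labels even though the labels no longer describe the values.

```python
import numpy as np
import pandas as pd


def shuffle_within_rows(df, seed=None):
    """Return a copy of df in which the values of each row are shuffled independently.

    Hypothetical helper: `seed` is passed to np.random.default_rng for reproducibility.
    """
    rng = np.random.default_rng(seed)
    # Copy into a 2D object array so ints and strings can share a row.
    values = df.to_numpy(dtype=object, copy=True)
    for row in values:
        # Each `row` is a view into `values`, so shuffling it modifies `values` in place.
        rng.shuffle(row)
    # Column labels are kept, but they no longer describe the values beneath them.
    return pd.DataFrame(values, index=df.index, columns=df.columns)


dataframe = pd.DataFrame({'Number': [11, 8, 10, 15, 11],
                          'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
                          'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri']})
shuffled = shuffle_within_rows(dataframe, seed=0)
print(shuffled)
```

Passing the same `seed` gives the same arrangement, which can help when a shuffled table has to be reproduced later.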
Here is a function that shuffles all the values across both rows and columns:
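import numpy as np
import pandas as pd

def shuffle(df):
    """Return a new DataFrame whose values are randomly permuted across all cells."""
    col = df.columns
    val = df.values
    shape = val.shape
    val_flat = val.flatten()     # flatten() returns a 1D copy, so df itself is not modified
    np.random.shuffle(val_flat)  # shuffle the copy in place
    # Restore the original shape; the index is reset to 0..n-1
    return pd.DataFrame(val_flat.reshape(shape), columns=col)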
In [2]: dataframe
Out[2]:
Number color day
0 11 Blue Mon
1 8 Red Tues
2 10 Green Wed
3 15 Yellow Thurs
4 11 Black Fri
In [3]: shuffle(dataframe)
Out[3]:
Number color day
0 Fri Wed Yellow
1 Thurs Black Red
2 Green Blue 11
3 11 8 10
4 Mon Tues 15
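For comparison, here is a minimal sketch of the same whole-table shuffle using NumPy's `Generator` API (assumes NumPy 1.17+ and pandas 0.24+). The helper name `shuffle_all_cells` and its `seed` parameter are hypothetical. Unlike `shuffle` in this answer, it keeps the original index instead of resetting it, and it uses `rng.permutation`, which returns a shuffled copy, so there is no separate shuffle-in-place step.

```python
import numpy as np
import pandas as pd


def shuffle_all_cells(df, seed=None):
    """Return a copy of df whose values are randomly permuted across all cells.

    Hypothetical helper: `seed` is passed to np.random.default_rng for reproducibility.
    """
    rng = np.random.default_rng(seed)
    # Object dtype lets ints and strings end up in any column.
    values = df.to_numpy(dtype=object)
    # rng.permutation returns a shuffled copy of the flattened values.
    flat = rng.permutation(values.ravel())
    # Restore the 2D shape and keep the original index and column labels.
    return pd.DataFrame(flat.reshape(values.shape), index=df.index, columns=df.columns)


dataframe = pd.DataFrame({'Number': [11, 8, 10, 15, 11],
                          'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
                          'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri']})
result = shuffle_all_cells(dataframe, seed=1)
print(result)
```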
Hope this helps
Answer by Happy001
Maybe flatten the 2D array and then shuffle it?
In [21]: data2 = dataframe.values.flatten()
In [22]: np.random.shuffle(data2)
In [23]: dataframe2 = pd.DataFrame(data2.reshape(dataframe.shape), columns=dataframe.columns)
In [24]: dataframe2
Out[24]:
Number color day
0 Tues Yellow 11
1 Red Green Wed
2 Thurs Mon Blue
3 15 8 Black
4 Fri 11 10
Answer by Raphvanns
Building on @jrjc's answer, I have posted https://stackoverflow.com/a/44686455/5009287, which uses np.apply_along_axis():
import numpy as np
a = np.array([[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]])
print(a)
[[10 11 12]
[20 21 22]
[30 31 32]
[40 41 42]]
print(np.apply_along_axis(np.random.permutation, 1, a))
[[11 12 10]
[22 21 20]
[31 30 32]
[40 41 42]]
See the full answer for how this can be integrated with a pandas DataFrame.
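One way to combine `np.apply_along_axis` with a DataFrame is sketched below; it is not necessarily what the linked answer does. It assumes NumPy 1.17+, with `np.random.default_rng(...).permutation` swapped in for the legacy `np.random.permutation`, and the seed `42` is arbitrary. Converting to an object array lets ints and strings share a row, and the original index and column labels are reattached afterwards.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'Number': [11, 8, 10, 15, 11],
                          'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
                          'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri']})

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for reproducibility
# apply_along_axis calls rng.permutation on each row (axis=1) of the object array;
# permutation returns a shuffled copy, and the copies are stacked into a new array.
permuted = np.apply_along_axis(rng.permutation, 1, dataframe.to_numpy(dtype=object))
result = pd.DataFrame(permuted, index=dataframe.index, columns=dataframe.columns)
print(result)
```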

