Source: http://stackoverflow.com/questions/23887881/
License: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep any reuse under the same license.
How to repeat Pandas data frame?
Asked by lsheng
This is my data frame, which should be repeated 5 times:
>>> x = pd.DataFrame({'a':1,'b':2},index = range(1))
>>> x
a b
0 1 2
I want a result like this:
>>> x.append(x).append(x).append(x)
a b
0 1 2
0 1 2
0 1 2
0 1 2
But there must be a smarter way than appending over and over. The data frame I'm actually working on needs to be repeated 50 times.
I haven't found anything practical, including functions like np.repeat; it just doesn't work on a data frame.
Could anyone help?
Accepted answer by joris
You can use the concat function:
In [13]: pd.concat([x]*5)
Out[13]:
a b
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
If you only want to repeat the values and not the index, you can do:
In [14]: pd.concat([x]*5, ignore_index=True)
Out[14]:
a b
0 1 2
1 1 2
2 1 2
3 1 2
4 1 2
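The concat approach above can be wrapped in a small helper. This is only a minimal sketch: repeat_frame is a hypothetical name chosen for illustration, not a pandas function.

```python
import pandas as pd

def repeat_frame(df, n, reset_index=True):
    """Stack n copies of df vertically (repeat_frame is a hypothetical helper name)."""
    # pd.concat accepts a list of frames; [df] * n lists the same frame n times
    return pd.concat([df] * n, ignore_index=reset_index)

x = pd.DataFrame({'a': 1, 'b': 2}, index=range(1))
repeated = repeat_frame(x, 50)
print(repeated.shape)
```

Passing reset_index=False keeps the original index label on every copy, matching the first concat call in the answer.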
Answer by FooBar
I would generally not repeat and/or append unless your problem really makes it necessary. It is highly inefficient and typically comes from not understanding the proper way to attack a problem.
I don't know your exact use case, but if you have your values stored in an array, you can fill a frame of the right size directly:
import numpy as np
import pandas as pd
values = np.array([1, 2])
# Create an empty 50-row frame, then assign one value per column
df2 = pd.DataFrame(index=np.arange(0, 50), columns=['a', 'b'])
df2[['a', 'b']] = values
This will do the job. Perhaps you could better explain what you're trying to achieve?
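The same idea can also be sketched in a single step by building all 50 rows with np.tile instead of assigning into an empty frame. Note that this swaps in np.tile and is not the answer author's code; the name df_tiled is made up for the example.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2])

# np.tile(values, (50, 1)) stacks the 1-D array into a (50, 2) array
df_tiled = pd.DataFrame(np.tile(values, (50, 1)), columns=['a', 'b'])
print(df_tiled.shape)
```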
Answer by Surya
Append should work too:
In [589]: x = pd.DataFrame({'a':1,'b':2},index = range(1))
In [590]: x
Out[590]:
a b
0 1 2
In [591]: x.append([x]*5, ignore_index=True) #Ignores the index as per your need
Out[591]:
a b
0 1 2
1 1 2
2 1 2
3 1 2
4 1 2
5 1 2
In [592]: x.append([x]*5)
Out[592]:
a b
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
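In pandas 2.0 and later, DataFrame.append is no longer available (it was deprecated in 1.4), so the calls above fail there. A sketch of the equivalent with pd.concat, producing the same six rows (the original plus five copies); the variable names are chosen for illustration:

```python
import pandas as pd

x = pd.DataFrame({'a': 1, 'b': 2}, index=range(1))

# Equivalent of x.append([x]*5, ignore_index=True): the original plus 5 copies
fresh_index = pd.concat([x] + [x] * 5, ignore_index=True)

# Equivalent of x.append([x]*5): every row keeps the original index label 0
kept_index = pd.concat([x] + [x] * 5)
```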
Answer by Andy Hayden
I think it's cleaner/faster to use iloc nowadays:
In [11]: np.full(3, 0)
Out[11]: array([0, 0, 0])
In [12]: x.iloc[np.full(3, 0)]
Out[12]:
a b
0 1 2
0 1 2
0 1 2
More generally, you can use tile or repeat with arange:
In [21]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
In [22]: df
Out[22]:
A B
0 1 2
1 3 4
In [23]: np.tile(np.arange(len(df)), 3)
Out[23]: array([0, 1, 0, 1, 0, 1])
In [24]: np.repeat(np.arange(len(df)), 3)
Out[24]: array([0, 0, 0, 1, 1, 1])
In [25]: df.iloc[np.tile(np.arange(len(df)), 3)]
Out[25]:
A B
0 1 2
1 3 4
0 1 2
1 3 4
0 1 2
1 3 4
In [26]: df.iloc[np.repeat(np.arange(len(df)), 3)]
Out[26]:
A B
0 1 2
0 1 2
0 1 2
1 3 4
1 3 4
1 3 4
Note: This will work with non-integer indexed DataFrames (and Series).
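To illustrate the note, here is a small sketch with a string-labelled frame. The labels "x" and "y" are example data, not taken from the answer.

```python
import numpy as np
import pandas as pd

# String-labelled frame (example data)
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"], index=["x", "y"])

# iloc selects by position, so the labels do not need to be integers
tiled = df.iloc[np.tile(np.arange(len(df)), 3)]       # x, y, x, y, x, y
repeated = df.iloc[np.repeat(np.arange(len(df)), 3)]  # x, x, x, y, y, y

# The same positional indexing works on a Series
s = df["A"]
tiled_series = s.iloc[np.tile(np.arange(len(s)), 2)]
```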
Answer by U10-Forward
Try using numpy.repeat:
>>> df = pd.DataFrame(np.repeat(x.values, 5, axis=0), columns=x.columns)
>>> df
a b
0 1 2
1 1 2
2 1 2
3 1 2
4 1 2
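One caveat with going through x.values: for a frame with mixed column types, .values returns a single object array, so the rebuilt frame loses the per-column dtypes. A sketch of an alternative that repeats index labels instead, assuming the index is unique (the mixed frame is example data, and this is a different technique from the answer's):

```python
import pandas as pd

mixed = pd.DataFrame({'a': [1, 2], 'b': ['u', 'v']})

# Index.repeat repeats each label 3 times: 0, 0, 0, 1, 1, 1
out = mixed.loc[mixed.index.repeat(3)].reset_index(drop=True)
print(out.dtypes)  # column 'a' remains an integer column
```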