Original URL: http://stackoverflow.com/questions/50788508/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Replicating rows in Pandas
Asked by DasVisual
My pandas dataframe looks like this:
   Person   ID  ZipCode  Gender
0   12345  882    38182  Female
1   32917  271    88172    Male
2   18273  552    90291  Female
I want to replicate every row 3 times, like this:
   Person   ID  ZipCode  Gender
0   12345  882    38182  Female
0   12345  882    38182  Female
0   12345  882    38182  Female
1   32917  271    88172    Male
1   32917  271    88172    Male
1   32917  271    88172    Male
2   18273  552    90291  Female
2   18273  552    90291  Female
2   18273  552    90291  Female
And, of course, reset the index so that it reads 0, 1, 2, and so on.
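For reference, the target output described above can be sketched with pandas' Index.repeat, a method none of the answers below use. This is a minimal sketch under assumptions: the frame is rebuilt from the printed data with integer Person/ID/ZipCode columns and a string Gender column, df has a unique index, and n is just an illustrative name for the repeat count.

```python
import pandas as pd

# Sample frame rebuilt from the question's printed data (dtypes are assumed).
df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

n = 3
# Index.repeat(n) turns the labels [0, 1, 2] into [0, 0, 0, 1, 1, 1, 2, 2, 2];
# .loc selects rows by those labels in order, so each row's copies stay together.
# reset_index(drop=True) then discards the repeated labels for a fresh 0..8 index.
out = df.loc[df.index.repeat(n)].reset_index(drop=True)
print(out)
```

Because the rows are selected rather than rebuilt from a raw array, the column dtypes are kept as they were.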
I tried solutions such as:
pd.concat([df[:5]]*3, ignore_index=True)
and:
df.reindex(np.repeat(df.index.values, df['ID']), method='ffill')
I have had no luck; if you can help, I'd appreciate it.
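A likely explanation for why the two attempts above do not give the requested output, sketched on the sample data (rebuilt from the printed table, dtypes assumed). The first attempt stacks whole copies of the frame end to end, and on a 3-row frame df[:5] is simply the whole frame. The second takes its repeat counts from df['ID'], so each row is repeated 882, 271 and 552 times. Swapping in the constant 3 is the only change made in the last step; method='ffill' is dropped because no labels are missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

# Attempt 1: concatenation stacks full copies, so rows come out
# in the order 0, 1, 2, 0, 1, 2, ... instead of 0, 0, 0, 1, ...
stacked = pd.concat([df[:5]] * 3, ignore_index=True)
print(stacked["Person"].tolist())

# Attempt 2: the repeat counts come from the ID column,
# giving 882 + 271 + 552 = 1705 labels instead of 9.
by_id = np.repeat(df.index.values, df["ID"].to_numpy())
print(len(by_id))

# Same reindex idea with a constant repeat count of 3.
fixed = df.reindex(np.repeat(df.index.values, 3)).reset_index(drop=True)
print(fixed["Person"].tolist())
```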
Answer by U10-Forward
Try this, using np.repeat:
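import numpy as np
import pandas as pd
# Repeat each row of the underlying array 3 times along axis 0 (the row axis).
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
newdf.columns = df.columns
print(newdf)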
Output:
   Person   ID  ZipCode  Gender
0   12345  882    38182  Female
1   12345  882    38182  Female
2   12345  882    38182  Female
3   32917  271    88172    Male
4   32917  271    88172    Male
5   32917  271    88172    Male
6   18273  552    90291  Female
7   18273  552    90291  Female
8   18273  552    90291  Female
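One side effect worth knowing with this approach: df.values on a frame that mixes integer and string columns is a single object-dtype NumPy array, so the rebuilt frame's numeric columns come back as object dtype. The sketch below (sample data rebuilt from the question, dtypes assumed) shows this and uses DataFrame.infer_objects, a standard pandas method, to re-infer the integer dtypes afterwards.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

# Mixed int/str columns collapse into one object ndarray here,
# so the numeric columns of `raw` have object dtype.
raw = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(raw.dtypes)

# infer_objects() soft-converts object columns that hold only ints back to an integer dtype.
fixed = raw.infer_objects()
print(fixed.dtypes)
```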
Answer by piRSquared
These repeat the index labels and preserve the columns, as the OP demonstrated.
iloc version 1
df.iloc[np.arange(len(df)).repeat(3)]
iloc version 2
df.iloc[np.arange(len(df) * 3) // 3]
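Both versions build the same array of row positions; the second gets there with integer division instead of repeat. A short sketch on the sample data (dtypes assumed; n is an illustrative name for the repeat count) confirms this, and adds reset_index(drop=True) for the 0, 1, 2, ... index the question asks for, since iloc keeps the original labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

n = 3
pos_v1 = np.arange(len(df)).repeat(n)   # [0 0 0 1 1 1 2 2 2]
pos_v2 = np.arange(len(df) * n) // n    # 0..8 floor-divided by 3: the same positions
print((pos_v1 == pos_v2).all())         # True

repeated = df.iloc[pos_v1]
print(repeated.index.tolist())          # original labels, repeated: [0, 0, 0, 1, 1, 1, 2, 2, 2]

out = repeated.reset_index(drop=True)
print(out.index.tolist())               # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```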
Answer by YOBEN_S
Maybe use concat:
pd.concat([df]*3).sort_index()
Output:
   Person   ID  ZipCode  Gender
0   12345  882    38182  Female
0   12345  882    38182  Female
0   12345  882    38182  Female
1   32917  271    88172    Male
1   32917  271    88172    Male
1   32917  271    88172    Male
2   18273  552    90291  Female
2   18273  552    90291  Female
2   18273  552    90291  Female
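The concat approach keeps the repeated labels shown above; appending reset_index(drop=True) gives the 0, 1, 2, ... index the question asks for. A minimal sketch on the sample data (dtypes assumed): kind="mergesort" requests a stable sort so copies keep their stacking order, and the approach assumes df's index is already sorted (as the default RangeIndex is), because sort_index otherwise returns rows in label order rather than their original order.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

n = 3
# Stack n full copies (labels 0, 1, 2, 0, 1, 2, ...), sort stably by label so
# each row's copies become adjacent, then replace the labels with 0..n*len(df)-1.
out = pd.concat([df] * n).sort_index(kind="mergesort").reset_index(drop=True)
print(out)
```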
Answer by IMCoins
You can do it like this.
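import numpy as np
import pandas as pd

def do_things(df, n_times):
    # Keep the original rows and add n_times copies of each name
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0).
    ndf = pd.concat([df, pd.DataFrame({'name' : np.repeat(df.name.values, n_times)})])
    # Sort so that copies of the same name sit together, then renumber 0..n-1.
    ndf = ndf.sort_values(by='name')
    ndf = ndf.reset_index(drop=True)
    return ndf

if __name__ == '__main__':
    df = pd.DataFrame({'name' : ['Peter', 'Quill', 'Jackson']})
    n_times = 3
    print(do_things(df, n_times))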
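Because this approach keeps the original rows and then adds n_times copies, each name ends up n_times + 1 times (four times here), as the walkthrough that follows also shows. If exactly n_times copies are wanted, building the frame from the repeated values alone is enough, and np.repeat already keeps copies adjacent in the original order, so no sort is needed. This is a sketch for the single-column example only; a frame with more columns would need a row-wise method such as Index.repeat.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Peter', 'Quill', 'Jackson']})
n_times = 3

# As in the answer: original rows plus n_times copies, so each name appears n_times + 1 times.
with_original = pd.concat([df, pd.DataFrame({'name': np.repeat(df['name'].values, n_times)})])
print(with_original['name'].value_counts())

# Repeated values only: exactly n_times copies, in the original order.
exact = pd.DataFrame({'name': np.repeat(df['name'].values, n_times)})
print(exact)
```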
And with an explanation:
import pandas as pd
import numpy as np
n_times = 3
df = pd.DataFrame({'name' : ['Peter', 'Quill', 'Jackson']})
# name
# 0 Peter
# 1 Quill
# 2 Jackson
# Duplicating data: keep the original rows and add n_times copies of each name.
df = pd.concat([df, pd.DataFrame({'name' : np.repeat(df.name.values, n_times)})])
# name
# 0 Peter
# 1 Quill
# 2 Jackson
# 0 Peter
# 1 Peter
# 2 Peter
# 3 Quill
# 4 Quill
# 5 Quill
# 6 Jackson
# 7 Jackson
# 8 Jackson
# The DataFrame is sorted by the 'name' column.
df = df.sort_values(by=['name'])
# name
# 2 Jackson
# 6 Jackson
# 7 Jackson
# 8 Jackson
# 0 Peter
# 0 Peter
# 1 Peter
# 2 Peter
# 1 Quill
# 3 Quill
# 4 Quill
# 5 Quill
# Resetting the index.
# You can try drop=True or drop=False as the parameter of `reset_index()`.
df = df.reset_index()
# index name
# 0 2 Jackson
# 1 6 Jackson
# 2 7 Jackson
# 3 8 Jackson
# 4 0 Peter
# 5 0 Peter
# 6 1 Peter
# 7 2 Peter
# 8 1 Quill
# 9 3 Quill
# 10 4 Quill
# 11 5 Quill
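The last step leaves the old labels behind in an index column because reset_index() defaults to drop=False. A small sketch of the two settings, on a hypothetical three-row frame with a shuffled index:

```python
import pandas as pd

# Hypothetical frame whose labels are out of order, as after sorting.
df = pd.DataFrame({'name': ['Quill', 'Peter', 'Peter']}, index=[5, 0, 1])

# drop=False (the default): the old labels move into a new 'index' column.
kept = df.reset_index()
print(kept.columns.tolist())   # ['index', 'name']

# drop=True: the old labels are discarded and only a fresh 0..n-1 index remains.
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())   # ['name']
```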