Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50788508/

Time: 2020-09-14 05:40:53  Source: igfitidea

Replicating rows in Pandas

python, pandas, dataframe, dataset

Asked by DasVisual

My pandas dataframe looks like this:

   Person  ID   ZipCode   Gender
0  12345   882  38182     Female
1  32917   271  88172     Male
2  18273   552  90291     Female

I want to replicate every row 3 times, like this:

   Person  ID   ZipCode   Gender
0  12345   882  38182     Female
0  12345   882  38182     Female
0  12345   882  38182     Female
1  32917   271  88172     Male
1  32917   271  88172     Male
1  32917   271  88172     Male
2  18273   552  90291     Female
2  18273   552  90291     Female
2  18273   552  90291     Female

And of course reset the index so it is:

0
1
2

I tried solutions such as:

pd.concat([df[:5]]*3, ignore_index=True)

and:

df.reindex(np.repeat(df.index.values, df['ID']), method='ffill')

I have had no luck; if you can help, I'd appreciate it.
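
A minimal sketch of why the two attempts above fall short, assuming the three-row frame from the question (df[:5] is simply the whole frame here). pd.concat with ignore_index=True stacks whole copies of the frame end to end (rows 0, 1, 2, 0, 1, 2, ...) instead of repeating each row in place, and passing df['ID'] to np.repeat repeats each row 882, 271 and 552 times. With a constant count of 3, and without method='ffill' (every target label already exists), the reindex attempt gives the requested layout:

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'Person': [12345, 32917, 18273],
    'ID': [882, 271, 552],
    'ZipCode': [38182, 88172, 90291],
    'Gender': ['Female', 'Male', 'Female'],
})

# Attempt 1 stacks whole copies of the frame: rows come out 0, 1, 2, 0, 1, 2, ...
stacked = pd.concat([df] * 3, ignore_index=True)

# Attempt 2 with a constant repeat count of 3 instead of df['ID']
# (df['ID'] would repeat each row 882, 271 and 552 times)
repeated = df.reindex(np.repeat(df.index.values, 3)).reset_index(drop=True)
print(repeated)
```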

Answer by U10-Forward

Try this, using np.repeat:

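import numpy as np
import pandas as pd

# Repeat each row of the underlying array 3 times along axis 0 (the rows)
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)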

Output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female
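
One side effect worth checking: df.values on a frame that mixes integer and string columns is a single object-dtype array, so the integer columns of newdf come back as object dtype instead of int64. A minimal sketch, assuming the question's frame, that casts each column back to its original dtype with astype:

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'Person': [12345, 32917, 18273],
    'ID': [882, 271, 552],
    'ZipCode': [38182, 88172, 90291],
    'Gender': ['Female', 'Male', 'Female'],
})

# df.values here is one object-dtype array, so the integer columns
# of the rebuilt frame are object dtype rather than int64
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf.dtypes)

# Cast each column back to the dtype it had in the original frame
newdf = newdf.astype(df.dtypes.to_dict())
print(newdf.dtypes)
```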

Answer by piRSquared

These repeat the index labels and keep the columns, as the OP demonstrated:

iloc version 1

df.iloc[np.arange(len(df)).repeat(3)]


iloc version 2

df.iloc[np.arange(len(df) * 3) // 3]
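
Both iloc versions build the same array of row positions, just computed differently. A small check, assuming the question's three-row frame; the trailing reset_index(drop=True) is an addition (not part of the answer) that turns the repeated labels into the 0..8 index the question asks for:

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'Person': [12345, 32917, 18273],
    'ID': [882, 271, 552],
    'ZipCode': [38182, 88172, 90291],
    'Gender': ['Female', 'Male', 'Female'],
})

# Version 1: positions 0..n-1, each repeated 3 times -> [0 0 0 1 1 1 2 2 2]
positions_1 = np.arange(len(df)).repeat(3)
# Version 2: integer division of 0..3n-1 by 3 yields the same positions
positions_2 = np.arange(len(df) * 3) // 3

v1 = df.iloc[positions_1]
v2 = df.iloc[positions_2]

# Both keep the repeated labels 0, 0, 0, 1, ...; drop them to get 0..8
result = v1.reset_index(drop=True)
print(result)
```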

Answer by YOBEN_S

Maybe using concat:

pd.concat([df]*3).sort_index()
Out[129]: 
   Person   ID  ZipCode  Gender
0   12345  882    38182  Female
0   12345  882    38182  Female
0   12345  882    38182  Female
1   32917  271    88172    Male
1   32917  271    88172    Male
1   32917  271    88172    Male
2   18273  552    90291  Female
2   18273  552    90291  Female
2   18273  552    90291  Female
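
This works because concat keeps each copy's original labels, so sort_index brings the three copies of every row together; it is also why the ignore_index=True attempt in the question could not interleave them. A sketch, assuming the question's frame, that appends reset_index(drop=True) to get the 0..8 index:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'Person': [12345, 32917, 18273],
    'ID': [882, 271, 552],
    'ZipCode': [38182, 88172, 90291],
    'Gender': ['Female', 'Male', 'Female'],
})

# Keep the original labels (no ignore_index) so sort_index can group
# the copies of each row, then renumber the index from 0
out = pd.concat([df] * 3).sort_index().reset_index(drop=True)
print(out)
```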

Answer by IMCoins

You can do it like this.

import numpy as np
import pandas as pd


def do_things(df, n_times):
    # Append n_times - 1 extra copies of each name, so every name appears n_times
    # in total (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    extra = pd.DataFrame({'name': np.repeat(df['name'].values, n_times - 1)})
    ndf = pd.concat([df, extra])
    # Group equal names together; a stable sort keeps ties in their current order
    ndf = ndf.sort_values(by='name', kind='stable')
    ndf = ndf.reset_index(drop=True)
    return ndf


if __name__ == '__main__':
    df = pd.DataFrame({'name': ['Peter', 'Quill', 'Jackson']})
    n_times = 3
    print(do_things(df, n_times))

And with an explanation:

import pandas as pd
import numpy as np

n_times = 3
df = pd.DataFrame({'name' : ['Peter', 'Quill', 'Jackson']})
#       name
# 0    Peter
# 1    Quill
# 2  Jackson

#   Duplicating data: append n_times - 1 extra copies of each name,
#   so that every name appears n_times times in total.
df = pd.concat([df, pd.DataFrame({'name' : np.repeat(df.name.values, n_times - 1)})])
#       name
# 0    Peter
# 1    Quill
# 2  Jackson
# 0    Peter
# 1    Peter
# 2    Quill
# 3    Quill
# 4  Jackson
# 5  Jackson

#   The DataFrame is sorted by the 'name' column. Note that this orders the rows
#   alphabetically, not by their original position.
df = df.sort_values(by=['name'], kind='stable')
#       name
# 2  Jackson
# 4  Jackson
# 5  Jackson
# 0    Peter
# 0    Peter
# 1    Peter
# 1    Quill
# 2    Quill
# 3    Quill

#   Resetting the index.
#   Pass drop=True to reset_index() to discard the old index instead of keeping it as a column.
df = df.reset_index()
#    index     name
# 0      2  Jackson
# 1      4  Jackson
# 2      5  Jackson
# 3      0    Peter
# 4      0    Peter
# 5      1    Peter
# 6      1    Quill
# 7      2    Quill
# 8      3    Quill
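
Sorting by name groups the copies, but it orders rows alphabetically (Jackson before Peter) and relies on a single key column. A sketch of an alternative that keeps the original row order for any set of columns, using pandas' Index.repeat with .loc; this is a swapped-in technique rather than IMCoins' method, and the helper name repeat_rows is hypothetical:

```python
import pandas as pd


def repeat_rows(df, n_times):
    # Hypothetical helper: repeat each row n_times in place, keeping the
    # original row order, then renumber the index from 0
    return df.loc[df.index.repeat(n_times)].reset_index(drop=True)


df = pd.DataFrame({'name': ['Peter', 'Quill', 'Jackson']})
print(repeat_rows(df, 3))
```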