pandas: for each row in df copy row N times with slight changes
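Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow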
Original URL: http://stackoverflow.com/questions/32038427/
Question by Philipp_Kats
So I have a DataFrame like this:
```
   N                start
1  1   08/01/2014 9:30:02
2  1  08/01/2014 10:30:02
3  2  08/01/2014 12:30:02
4  3   08/01/2014 4:30:02
```
I need to duplicate each row N times, adding one hour to the start value for each successive copy, like this:
```
   N                start
1  1   08/01/2014 9:30:02
2  1  08/01/2014 10:30:02
3  2  08/01/2014 12:30:02
3  2  08/01/2014 13:30:02
4  3   08/01/2014 4:30:02
4  3   08/01/2014 5:30:02
4  3   08/01/2014 6:30:02
```
How can I do this in pandas?
Accepted answer by unutbu
You could use reindex to expand the DataFrame, and TimedeltaIndex to add the hours:
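```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'N': [1, 1, 2, 3],
                   'start': ['08/01/2014 9:30:02',
                             '08/01/2014 10:30:02',
                             '08/01/2014 12:30:02',
                             '08/01/2014 4:30:02']})
# Parse the strings as month/day/year timestamps
df['start'] = pd.to_datetime(df['start'], format='%m/%d/%Y %H:%M:%S')
# Repeat each index label N times; reindex then duplicates the matching rows
df = df.reindex(np.repeat(df.index.values, df['N']), method='ffill')
# cumcount numbers the copies of each original row 0, 1, ..., N-1;
# pd.to_timedelta turns those counts into a TimedeltaIndex of hours
df['start'] += pd.to_timedelta(df.groupby(level=0).cumcount().to_numpy(), unit='h')
print(df)
```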
which yields
```
   N                start
0  1  2014-08-01 09:30:02
1  1  2014-08-01 10:30:02
2  2  2014-08-01 12:30:02
2  2  2014-08-01 13:30:02
3  3  2014-08-01 04:30:02
3  3  2014-08-01 05:30:02
3  3  2014-08-01 06:30:02
```
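As a minimal alternative sketch (a swapped-in technique, not unutbu's code), the same expansion can be done with Index.repeat and .loc, which select each row label N times directly instead of going through reindex. The names repeated, copy_number and out are illustrative. The cumcount step is computed before the index is reset, because it relies on the repeated labels to tell which copies belong to the same original row.

```python
import pandas as pd

# The question's data, parsed as month/day/year timestamps
df = pd.DataFrame({'N': [1, 1, 2, 3],
                   'start': pd.to_datetime(['08/01/2014 9:30:02',
                                            '08/01/2014 10:30:02',
                                            '08/01/2014 12:30:02',
                                            '08/01/2014 4:30:02'],
                                           format='%m/%d/%Y %H:%M:%S')})

# Index.repeat yields each row label N times (0, 1, 2, 2, 3, 3, 3);
# .loc then selects the matching rows, duplicates included
repeated = df.loc[df.index.repeat(df['N'].to_numpy())]

# Number the copies of each original row 0, 1, ..., N-1 (uses the repeated labels)
copy_number = repeated.groupby(level=0).cumcount().to_numpy()

# Give the result a fresh 0..n-1 index, then shift each copy by copy_number hours
out = (repeated.reset_index(drop=True)
               .assign(start=lambda d: d['start'] + pd.to_timedelta(copy_number, unit='h')))
print(out)
```

Resetting the index is optional; drop that step if you want to keep the original row labels as in the output above.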
Answer by Shahram
This may not be the most efficient way, but it will get you the result:
```python
import pandas as pd

# df is the DataFrame from the question, with columns 'N' and 'start'
l = []
for index, item in df.iterrows():
    start = pd.to_datetime(item['start'])
    l.append([item['N'], start])
    i = 1
    # It was not clear if you want to repeat based on N or the index;
    # if the index, replace item['N'] with index
    while i < item['N']:
        # The i-th extra copy is shifted by i hours
        l.append([item['N'], start + pd.Timedelta(hours=i)])
        i = i + 1
dfResult = pd.DataFrame(l, columns=['N', 'Start'])
```
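The loop above works row by row in Python. A vectorized sketch under the same assumptions (the question's N and start columns, one extra hour per copy) computes every copy's hour offset with NumPy alone: np.cumsum(n) - n gives the output position of each source row's first copy, and subtracting those positions (repeated N times) from a running np.arange yields 0, 1, ..., N-1 within each block. The variable names are hypothetical, and this is not code from either answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'N': [1, 1, 2, 3],
                   'start': pd.to_datetime(['08/01/2014 9:30:02',
                                            '08/01/2014 10:30:02',
                                            '08/01/2014 12:30:02',
                                            '08/01/2014 4:30:02'],
                                           format='%m/%d/%Y %H:%M:%S')})

n = df['N'].to_numpy()
# Output position of each source row's first copy
first_copy = np.cumsum(n) - n
# 0, 1, ..., N-1 inside each block of copies
offsets = np.arange(n.sum()) - np.repeat(first_copy, n)

result = pd.DataFrame({
    'N': np.repeat(n, n),
    # Adding a timedelta64[h] array to a datetime64 array stays datetime64
    'start': np.repeat(df['start'].to_numpy(), n) + offsets.astype('timedelta64[h]'),
})
print(result)
```

For N = [1, 1, 2, 3], first_copy is [0, 1, 2, 4] and offsets is [0, 0, 0, 1, 0, 1, 2], the same numbering that groupby(level=0).cumcount() produces in the accepted answer.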

