Python Pandas: duplicating rows in a DataFrame

Question

Asked by wuha

If the data look like:

Store,Dept,Date,Weekly_Sales,IsHoliday
1,1,2010-02-05,24924.5,FALSE
1,1,2010-02-12,46039.49,TRUE
1,1,2010-02-19,41595.55,FALSE
1,1,2010-02-26,19403.54,FALSE
1,1,2010-03-05,21827.9,FALSE
1,1,2010-03-12,21043.39,FALSE
1,1,2010-03-19,22136.64,FALSE
1,1,2010-03-26,26229.21,FALSE
1,1,2010-04-02,57258.43,FALSE

I want to duplicate the rows where IsHoliday equals TRUE. I can do:

is_hol = df['IsHoliday'] == True
df_try = df[is_hol]
df=df.append(df_try*10)

Is there a better way to do this? I need to duplicate the holiday rows 5 times, and with the approach above I would have to append 5 times.

Answer 1

Accepted answer by Karl D.

You can put df_try inside a list and then do what you have in mind:

>>> df.append([df_try]*5,ignore_index=True)

    Store  Dept       Date  Weekly_Sales IsHoliday
0       1     1 2010-02-05      24924.50     False
1       1     1 2010-02-12      46039.49      True
2       1     1 2010-02-19      41595.55     False
3       1     1 2010-02-26      19403.54     False
4       1     1 2010-03-05      21827.90     False
5       1     1 2010-03-12      21043.39     False
6       1     1 2010-03-19      22136.64     False
7       1     1 2010-03-26      26229.21     False
8       1     1 2010-04-02      57258.43     False
9       1     1 2010-02-12      46039.49      True
10      1     1 2010-02-12      46039.49      True
11      1     1 2010-02-12      46039.49      True
12      1     1 2010-02-12      46039.49      True
13      1     1 2010-02-12      46039.49      True
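
Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above fails on current versions. A minimal sketch of the same idea with `pd.concat`, assuming a shortened three-row version of the question's data (the name `result` is only illustrative):

```python
import pandas as pd

# Shortened version of the question's data (assumption: three rows are enough to illustrate)
df = pd.DataFrame({
    "Store": [1, 1, 1],
    "Dept": [1, 1, 1],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
    "Weekly_Sales": [24924.5, 46039.49, 41595.55],
    "IsHoliday": [False, True, False],
})

# Rows to duplicate
df_try = df[df["IsHoliday"]]

# One concat call: the original frame followed by 5 extra copies of the holiday rows
result = pd.concat([df] + [df_try] * 5, ignore_index=True)
print(result)
```

As with the `append` version, the holiday row ends up in the result 6 times in total: once from the original frame plus the 5 added copies.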

Answer 2

Answered by DavidK

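# Note: DataFrame.append was removed in pandas 2.0;
# pd.concat([df, df_try]) is the equivalent call there.
df = df_try
for i in range(4):
    df = df.append(df_try)

# df now holds 5 copies of df_try

df = df.append(df)

# df now holds 10 copies of df_try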

Answer 3

Answered by Surya

Another way is to use the concat() function:

import pandas as pd

In [603]: df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))

In [604]: df
Out[604]: 
  col1  col2
0    a     0
1    b     1
2    c     2

In [605]: pd.concat([df]*3, ignore_index=True) # Ignores the index
Out[605]: 
  col1  col2
0    a     0
1    b     1
2    c     2
3    a     0
4    b     1
5    c     2
6    a     0
7    b     1
8    c     2

In [606]: pd.concat([df]*3)
Out[606]: 
  col1  col2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2

Answer 4

Answered by snooze_bear

This is an old question, but since it still comes up at the top of my results in Google, here's another way.

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))

Say you want to replicate the rows where col1="b".

reps = [3 if val=="b" else 1 for val in df.col1]
df.loc[np.repeat(df.index.values, reps)]

You could replace the 3 if val=="b" else 1 in the list comprehension with another function that returns 3 if val=="b", 4 if val=="c", and so on, so it's pretty flexible.
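
One way to sketch that flexibility is a lookup table of repeat counts. The dictionary `counts` and the default of one copy for unlisted values are assumptions made for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("abc"), "col2": range(3)}, index=range(3))

# Hypothetical lookup table: how many copies each col1 value should get
counts = {"b": 3, "c": 4}

# Values missing from the table map to NaN, so they fall back to a single copy
reps = df["col1"].map(counts).fillna(1).astype(int).to_numpy()

# Repeat each index label according to its count, then select those rows
result = df.loc[np.repeat(df.index.values, reps)]
print(result)
```

Selecting with `df.loc` on repeated labels relies on the index being unique, as it is here. With a non-unique index, repeating row positions and selecting with `df.iloc` would be the safer variant.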

Answer 5

Answered by grofte

Appending and concatenating are usually slow in pandas, so unless you are appending a single row or concatenating a few DataFrames, I recommend building a new list of rows and turning that into a DataFrame.

import pandas as pd

df = pd.DataFrame([
[1,1,'2010-02-05',24924.5,False],
[1,1,'2010-02-12',46039.49,True],
[1,1,'2010-02-19',41595.55,False],
[1,1,'2010-02-26',19403.54,False],
[1,1,'2010-03-05',21827.9,False],
[1,1,'2010-03-12',21043.39,False],
[1,1,'2010-03-19',22136.64,False],
[1,1,'2010-03-26',26229.21,False],
[1,1,'2010-04-02',57258.43,False]
], columns=['Store','Dept','Date','Weekly_Sales','IsHoliday'])

temp_df = []
for row in df.itertuples(index=False):
    if row.IsHoliday:
        temp_df.extend([list(row)]*5)
    else:
        temp_df.append(list(row))

df = pd.DataFrame(temp_df, columns=df.columns)
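
For comparison, the same result (5 copies of each holiday row, kept in place) can probably be had without a Python-level loop by repeating row positions. This is a sketch on a shortened three-row version of the data, and the names `reps`, `positions`, and `result` are illustrative:

```python
import numpy as np
import pandas as pd

# Shortened version of the data above (assumption: three rows are enough to illustrate)
df = pd.DataFrame({
    "Store": [1, 1, 1],
    "Dept": [1, 1, 1],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
    "Weekly_Sales": [24924.5, 46039.49, 41595.55],
    "IsHoliday": [False, True, False],
})

# 5 copies of each holiday row, 1 copy of every other row
reps = np.where(df["IsHoliday"], 5, 1)

# Repeat row positions and select them; iloc works even if the index is not unique
positions = np.repeat(np.arange(len(df)), reps)
result = df.iloc[positions].reset_index(drop=True)
print(result)
```

Which approach is faster depends on the size of the frame, so it is worth timing both on real data before choosing one.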
