原文地址: http://stackoverflow.com/questions/40091963/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Pandas/Python adding row based on condition
Asked by Mike
I am looking to insert a row into a dataframe between two existing rows based on certain criteria.
For example, my data frame:
import pandas as pd
df = pd.DataFrame({'Col1':['A','B','D','E'],'Col2':['B', 'C', 'E', 'F'], 'Col3':['1', '1', '1', '1']})
Which looks like:
Col1 Col2 Col3
0 A B 1
1 B C 1
2 D E 1
3 E F 1
I want to be able to insert a new row between Index 1 and Index 2 given the condition:
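n = 0
while n < len(df) - 1:  # stop one row early so that n + 1 stays in range
    # a gap exists when this row's Col2 differs from the next row's Col1
    if df.loc[n, 'Col2'] != df.loc[n + 1, 'Col1']:
        pass  # something, something, insert row
    n += 1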
My desired output table would look like:
Col1 Col2 Col3
0 A B 1
1 B C 1
2 C D 1
3 D E 1
4 E F 1
I am struggling with conditionally inserting rows based on values in the previous and following records. I ultimately want to perform the above exercise on my real-world data, which would involve multiple conditions and preserving the values of more than one column (in this example it is only Col3, but in my real data it would be several columns).
Answered by MaxU
UPDATE: a memory-saving method. First, set a new index that leaves a gap for the new row:
In [30]: df
Out[30]:
Col1 Col2 Col3
0 A B 1
1 B C 1
2 D E 1
3 E F 1
If we want to insert a new row between indexes 1 and 2, we split the index at position 2:
In [31]: idxs = np.split(df.index, [2])  # [2] splits at position 2; a bare 2 means "two equal parts"
Set a new index (with a gap at position 2):
In [32]: df.set_index(idxs[0].union(idxs[1] + 1), inplace=True)
In [33]: df
Out[33]:
Col1 Col2 Col3
0 A B 1
1 B C 1
3 D E 1
4 E F 1
Insert a new row with index 2:
In [34]: df.loc[2] = ['X','X',2]
In [35]: df
Out[35]:
Col1 Col2 Col3
0 A B 1
1 B C 1
3 D E 1
4 E F 1
2 X X 2
Sort the index:
In [38]: df.sort_index(inplace=True)
In [39]: df
Out[39]:
Col1 Col2 Col3
0 A B 1
1 B C 1
2 X X 2
3 D E 1
4 E F 1
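The three steps above (open a gap in the index, assign the new row, sort) can be wrapped in a small helper. This is only a sketch: the function name insert_row is hypothetical, it assumes the frame has a default 0..n-1 integer index as in the example, and it builds the shifted index with a plain list comprehension rather than np.split.

```python
import pandas as pd

def insert_row(df, pos, values):
    """Return a copy of df with `values` inserted as a new row at position `pos`.

    Hypothetical helper; assumes df has a default 0..n-1 integer index.
    """
    out = df.copy()
    # Shift every label at or after `pos` up by one, leaving a gap at `pos`.
    out.index = [i if i < pos else i + 1 for i in out.index]
    # Assigning to a missing label appends the row at the end of the frame.
    out.loc[pos] = values
    # Sorting by label moves the new row into its intended position.
    return out.sort_index()

df = pd.DataFrame({'Col1': ['A', 'B', 'D', 'E'],
                   'Col2': ['B', 'C', 'E', 'F'],
                   'Col3': ['1', '1', '1', '1']})
result = insert_row(df, 2, ['C', 'D', '1'])
print(result)
```

Working on a copy keeps the original frame untouched, at the cost of the memory saving that the in-place version above is aiming for.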
PS: you can also insert a DataFrame instead of a single row using df.append(new_df) (df.append was removed in pandas 2.0; pd.concat([df, new_df]) does the same job there):
In [42]: df
Out[42]:
Col1 Col2 Col3
0 A B 1
1 B C 1
2 D E 1
3 E F 1
In [43]: idxs = np.split(df.index, [2])  # [2] splits at position 2; a bare 2 means "two equal parts"
In [45]: new_df = pd.DataFrame([['X', 'X', 10], ['Y','Y',11]], columns=df.columns)
In [49]: new_df.index += idxs[1].min()
In [51]: new_df
Out[51]:
Col1 Col2 Col3
2 X X 10
3 Y Y 11
In [52]: df = df.set_index(idxs[0].union(idxs[1]+len(new_df)))
In [53]: df
Out[53]:
Col1 Col2 Col3
0 A B 1
1 B C 1
4 D E 1
5 E F 1
In [54]: df = pd.concat([df, new_df])  # df.append(new_df) on pandas < 2.0
In [55]: df
Out[55]:
Col1 Col2 Col3
0 A B 1
1 B C 1
4 D E 1
5 E F 1
2 X X 10
3 Y Y 11
In [56]: df.sort_index(inplace=True)
In [57]: df
Out[57]:
Col1 Col2 Col3
0 A B 1
1 B C 1
2 X X 10
3 Y Y 11
4 D E 1
5 E F 1
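The same index-gap idea for a block of rows can be sketched as one function. The name insert_block is hypothetical, a default 0..n-1 integer index is assumed, and pd.concat is used in place of df.append so that the sketch also runs on pandas 2.x.

```python
import pandas as pd

def insert_block(df, pos, new_df):
    """Return df with the rows of new_df inserted starting at position `pos`.

    Hypothetical helper; assumes df has a default 0..n-1 integer index.
    """
    shifted = df.copy()
    # Open a gap of len(new_df) labels starting at `pos`.
    shifted.index = [i if i < pos else i + len(new_df) for i in shifted.index]
    block = new_df.copy()
    # Relabel the new rows so that they fall exactly into the gap.
    block.index = range(pos, pos + len(new_df))
    # Stack both frames, then sort by label to interleave them.
    return pd.concat([shifted, block]).sort_index()

df = pd.DataFrame({'Col1': ['A', 'B', 'D', 'E'],
                   'Col2': ['B', 'C', 'E', 'F'],
                   'Col3': ['1', '1', '1', '1']})
new_df = pd.DataFrame([['X', 'X', 10], ['Y', 'Y', 11]], columns=df.columns)
result = insert_block(df, 2, new_df)
print(result)
```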
OLD answer:
One way (among many) to achieve this is to split your DF and concatenate the pieces with the new DF in the desired order:
Original DF:
In [12]: df
Out[12]:
Col1 Col2 Col3
0 A B 1
1 B C 1
2 D E 1
3 E F 1
Let's split it into two parts ([0:1], [2:end]):
In [13]: dfs = np.split(df, [2])
In [14]: dfs
Out[14]:
[ Col1 Col2 Col3
0 A B 1
1 B C 1, Col1 Col2 Col3
2 D E 1
3 E F 1]
Now we can concatenate the two parts with a new DF in the desired order:
In [15]: pd.concat([dfs[0], pd.DataFrame([['C','D', 1]], columns=df.columns), dfs[1]], ignore_index=True)
Out[15]:
Col1 Col2 Col3
0 A B 1
1 B C 1
2 C D 1
3 D E 1
4 E F 1
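Neither variant above decides where a row is needed; the position is picked by hand. A minimal sketch of the conditional part from the question could look like the following. The function name fill_gaps and its parameters are hypothetical, and copying the remaining columns (here Col3) from the preceding row is an assumption about what "preserving the values" should mean.

```python
import pandas as pd

def fill_gaps(df, start='Col1', end='Col2'):
    """Insert a bridging row wherever a row's `end` differs from the next row's `start`.

    Hypothetical helper; other columns of the bridging row are copied from
    the preceding row, which is an assumption, not part of the original answer.
    """
    records = df.to_dict('records')
    out = []
    # Pair each row with its successor (None for the last row).
    for cur, nxt in zip(records, records[1:] + [None]):
        out.append(cur)
        if nxt is not None and cur[end] != nxt[start]:
            bridge = dict(cur)        # carry over the remaining columns
            bridge[start] = cur[end]  # the bridge starts where this row ends
            bridge[end] = nxt[start]  # and ends where the next row starts
            out.append(bridge)
    # Rebuild the frame; this also yields a fresh 0..n-1 index.
    return pd.DataFrame(out, columns=df.columns)

df = pd.DataFrame({'Col1': ['A', 'B', 'D', 'E'],
                   'Col2': ['B', 'C', 'E', 'F'],
                   'Col3': ['1', '1', '1', '1']})
result = fill_gaps(df)
print(result)
```

Collecting the rows in a list and building the frame once avoids re-indexing and re-sorting after every insertion, which matters when many gaps have to be filled. Additional conditions can be added to the if test.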