Python: insert a row into a Pandas dataframe

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original address. Original StackOverflow question: http://stackoverflow.com/questions/24284342/

Time: 2020-08-19 04:22:09  Source: igfitidea

Insert a row to pandas dataframe

python pandas dataframe insert

Asked by Meloun

I have a dataframe:

import pandas as pd

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)], columns=["A", "B", "C"])

   A  B  C
0  5  6  7
1  7  8  9

[2 rows x 3 columns]

and I need to add a first row [2, 3, 4] to get:

   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

I've tried the append() and concat() functions but can't find the right way to do that.

How to add/insert series to dataframe?

Answered by FooBar

One way to achieve this is

>>> pd.DataFrame(np.array([[2, 3, 4]]), columns=['A', 'B', 'C']).append(df, ignore_index=True)
Out[330]: 
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

Generally, it's easiest to append dataframes, not series. In your case, since you want the new row to be "on top" (with starting id), and there is no function pd.prepend(), I first create the new dataframe and then append your old one.

ignore_index will ignore the old index of your dataframe and ensure that the first row of the original dataframe actually gets index 1 instead of restarting with index 0.
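
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the one-liner above only runs on older versions. A minimal sketch of the same "new row first" idea with pd.concat, assuming the question's df (the name new_row is only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# The new row as a one-row dataframe with the same columns (illustrative name).
new_row = pd.DataFrame([[2, 3, 4]], columns=df.columns)

# Put the new row first; ignore_index=True renumbers the result 0, 1, 2.
result = pd.concat([new_row, df], ignore_index=True)
print(result)
```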

Typical disclaimer: Ceterum censeo ... appending rows is quite an inefficient operation. If you care about performance and can somehow first create a dataframe with the correct (longer) index and then just insert the additional row into the dataframe, you should definitely do that. See:

>>> index = np.array([0, 1, 2])
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[0:1] = [list(s1), list(s2)]
>>> df2
Out[336]: 
     A    B    C
0    5    6    7
1    7    8    9
2  NaN  NaN  NaN
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[1:] = [list(s1), list(s2)]

So far, we have what you had as df:

>>> df2
Out[339]: 
     A    B    C
0  NaN  NaN  NaN
1    5    6    7
2    7    8    9

But now you can easily insert the row as follows. Since the space was preallocated, this is more efficient.

>>> df2.loc[0] = np.array([2, 3, 4])
>>> df2
Out[341]: 
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9
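
A related sketch of the preallocation idea, done on a NumPy buffer rather than on an empty dataframe. This is a swapped-in variant, not the code from the answer; the names existing and buf are illustrative. It has the side effect of keeping an integer dtype throughout:

```python
import numpy as np
import pandas as pd

existing = np.array([[5, 6, 7], [7, 8, 9]])

# Allocate room for all rows up front, leaving slot 0 free for the new row.
buf = np.empty((len(existing) + 1, existing.shape[1]), dtype=existing.dtype)
buf[1:] = existing      # old rows go to positions 1 and 2
buf[0] = [2, 3, 4]      # the new first row fills the reserved slot

df2 = pd.DataFrame(buf, columns=["A", "B", "C"])
print(df2)
```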

Answered by Piotr Migdal

Just assign the row to a particular index, using loc:

 df.loc[-1] = [2, 3, 4]  # adding a row
 df.index = df.index + 1  # shifting index
 df = df.sort_index()  # sorting by index

And you get, as desired:

    A  B  C
 0  2  3  4
 1  5  6  7
 2  7  8  9

See "Setting with enlargement" in the Indexing section of the pandas documentation.
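
The three steps can be wrapped in a small helper. This is only a sketch: prepend_row is a hypothetical name, and it assumes the dataframe has the default integer index 0..n-1.

```python
import pandas as pd

def prepend_row(df, row):
    """Return a copy of df with row as its first row (hypothetical helper)."""
    out = df.copy()
    out.loc[-1] = row            # setting with enlargement: adds label -1
    out.index = out.index + 1    # labels 0, 1, -1 become 1, 2, 0
    return out.sort_index()      # sorting moves label 0 (the new row) to the top

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
result = prepend_row(df, [2, 3, 4])
print(result)
```

Working on a copy leaves the caller's dataframe untouched, at the cost of one extra copy.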

Answered by mgilbert

Not sure how you were calling concat(), but it should work as long as both objects are of the same type. Maybe the issue is that you need to cast your second vector to a dataframe? Using the df that you defined, the following works for me:

df2 = pd.DataFrame([[2,3,4]], columns=['A','B','C'])
pd.concat([df2, df])
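
Note that a plain concat keeps each frame's own index labels, so the result is labelled 0, 0, 1 rather than the 0, 1, 2 shown in the question. A short sketch of the difference (the names kept and renumbered are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
df2 = pd.DataFrame([[2, 3, 4]], columns=["A", "B", "C"])

kept = pd.concat([df2, df])                            # original labels are kept
renumbered = pd.concat([df2, df], ignore_index=True)   # labels are rebuilt

print(list(kept.index))
print(list(renumbered.index))
```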

Answered by elPastor

I put together a short function that allows for a little more flexibility when inserting a row:

def insert_row(idx, df, df_insert):
    dfA = df.iloc[:idx, ]
    dfB = df.iloc[idx:, ]

    df = dfA.append(df_insert).append(dfB).reset_index(drop = True)

    return df

which could be further shortened to:

def insert_row(idx, df, df_insert):
    return df.iloc[:idx, ].append(df_insert).append(df.iloc[idx:, ]).reset_index(drop = True)

Then you could use something like:

df = insert_row(2, df, df_new)

where 2 is the index position in df where you want to insert df_new.
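
On pandas 2.0 and later, where DataFrame.append is no longer available, the same helper might be written with pd.concat. A minimal sketch, assuming df_insert has the same columns as df; the sample values in df_new are made up for illustration:

```python
import pandas as pd

def insert_row(idx, df, df_insert):
    """Insert the rows of df_insert before position idx of df."""
    parts = [df.iloc[:idx], df_insert, df.iloc[idx:]]
    # ignore_index=True plays the role of reset_index(drop=True).
    return pd.concat(parts, ignore_index=True)

df = pd.DataFrame([[2, 3, 4], [5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
df_new = pd.DataFrame([[0, 0, 0]], columns=df.columns)  # illustrative values

result = insert_row(2, df, df_new)
print(result)
```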

Answered by Tai

We can use numpy.insert. This has the advantage of flexibility. You only need to specify the index you want to insert to.

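import numpy as np
import pandas as pd

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)], columns=["A", "B", "C"])

# Insert [2, 3, 4] before row position 0 of the underlying array
pd.DataFrame(np.insert(df.values, 0, values=[2, 3, 4], axis=0))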

    0   1   2
0   2   3   4
1   5   6   7
2   7   8   9

In np.insert(df.values, 0, values=[2, 3, 4], axis=0), the 0 tells the function the position/index at which you want to place the new values.
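
As the output above shows, going through the raw array drops the column names. A small sketch that passes them back in, assuming all columns share one dtype (with mixed dtypes, df.values would be an object array):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Insert the new row before position 0 of the underlying array.
values = np.insert(df.values, 0, values=[2, 3, 4], axis=0)

# Rebuild the dataframe, reusing the original column labels.
result = pd.DataFrame(values, columns=df.columns)
print(result)
```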

Answered by Sagar Rathod

Below would be the best way to insert a row into a pandas dataframe without sorting and resetting the index:

import pandas as pd

df = pd.DataFrame(columns=['a','b','c'])

def insert(df, row):
    insert_loc = df.index.max()

    if pd.isna(insert_loc):
        df.loc[0] = row
    else:
        df.loc[insert_loc + 1] = row

insert(df,[2,3,4])
insert(df,[8,9,0])
print(df)

Answered by Aaron Melgar

This might seem overly simple, but it's incredible that a simple "insert new row" function isn't built in. I've read a lot about appending a new df to the original, but I'm wondering if this would be faster.

df.loc[0] = [row1data, blah...]
i = len(df) + 1
df.loc[i] = [row2data, blah...]

Answered by Xinyi Li

You can simply append the row to the end of the DataFrame, and then adjust the index.

For instance:

df = df.append(pd.DataFrame([[2,3,4]],columns=df.columns),ignore_index=True)
df.index = (df.index + 1) % len(df)
df = df.sort_index()

Or use concat as:

df = pd.concat([pd.DataFrame([[2,3,4]],columns=df.columns),df],ignore_index=True)

Answered by Pepe

It is pretty simple to add a row into a pandas DataFrame:

  1. Create a regular Python dictionary with the same column names as your DataFrame;

  2. Use the DataFrame.append() method and pass in the name of your dictionary;

  3. Add ignore_index=True right after your dictionary name.
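
On older pandas the steps above amount to df.append(new_row, ignore_index=True). Since DataFrame.append was removed in pandas 2.0, here is a minimal sketch of the same dictionary-based idea with pd.concat; the name new_row and the sample data are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Step 1: a plain dictionary keyed by the dataframe's column names.
new_row = {"A": 2, "B": 3, "C": 4}

# Steps 2 and 3: wrap the dictionary in a one-row dataframe and
# concatenate, letting ignore_index=True renumber the rows.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```

Note that this adds the row at the end of the dataframe, not at the top.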

Answered by Pepe

The simplest way to add a row to a pandas data frame is:

DataFrame.loc[ location of insertion ]= list( )

Example:

DF.loc[9] = ['Pepe', 33, 'Japan']

NB: the length of your list should match the number of columns of the data frame.
