
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/39998262/

Date: 2020-08-19 23:02:55  Source: igfitidea

Append an empty row in dataframe using pandas

python, python-2.7, pandas

Asked by Mansoor Akram

I am trying to append an empty row at the end of a dataframe but am unable to do so. I have even tried to understand how pandas works with the append function, but I still don't get it.


Here's the code:


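import pandas as pd

excel_names = ["ARMANI+EMPORIO+AR0143-book.xlsx"]
excels = [pd.ExcelFile(name) for name in excel_names]
# Parse the first sheet of each workbook and drop rows that are entirely empty
frames = [x.parse(x.sheet_names[0], header=None, index_col=None).dropna(how='all') for x in excels]
for f in frames:
    # Neither call works: DataFrame.append expects the rows to add (a DataFrame,
    # Series, dict or list of these), not a position and a value, and it returns
    # a new DataFrame instead of changing f in place.
    f.append(0, float('NaN'))
    f.append(2, float('NaN'))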

There are two columns and a varying number of rows.


With "print f" in the for loop, I get this:


                             0                 1
0                   Brand Name    Emporio Armani
2                 Model number            AR0143
4                  Part Number            AR0143
6                   Item Shape       Rectangular
8   Dial Window Material Type           Mineral
10               Display Type          Analogue
12                 Clasp Type            Buckle
14               Case Material   Stainless steel
16              Case Diameter    31 millimetres
18               Band Material           Leather
20                 Band Length  Women's Standard
22                 Band Colour             Black
24                 Dial Colour             Black
26            Special Features       second-hand
28                    Movement            Quartz
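Here is a minimal sketch of the loop rewritten so it keeps its results. It assumes pandas 1.x or 2.x and uses pd.concat, because DataFrame.append was removed in pandas 2.0. A small hand-built frame stands in for one parsed sheet, and with_empty_row is a hypothetical helper name. Two points matter here. Adding a row returns a new DataFrame. Rebinding the loop variable f would not update the list, so the list is rebuilt instead.

```python
import numpy as np
import pandas as pd

# Stand-in for one parsed sheet: unnamed columns 0 and 1 (header=None) and the
# gappy index that dropna(how='all') leaves behind.
frames = [pd.DataFrame({0: ["Brand Name", "Model number"],
                        1: ["Emporio Armani", "AR0143"]},
                       index=[0, 2])]


def with_empty_row(f):
    """Hypothetical helper: return a copy of f with one all-NaN row at the end."""
    empty = pd.DataFrame(np.nan, index=[f.index.max() + 1], columns=f.columns)
    return pd.concat([f, empty])


# Rebinding f inside a for loop would not change the list, so rebuild it.
frames = [with_empty_row(f) for f in frames]
print(frames[0])
```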

Accepted answer by srcerer

Add a new pandas.Series using pandas.DataFrame.append().


If you wish to specify the name (AKA the "index") of the new row, use:


df = df.append(pandas.Series(name='NameOfNewRow'))

If you don't wish to name the new row, use:


df = df.append(pandas.Series(), ignore_index=True)

where df is your pandas.DataFrame. append returns a new DataFrame rather than modifying df, so assign the result back to df.

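DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. Below is a minimal sketch of the same two variants written with pd.concat, using a small assumed frame in place of your df. For a named row, a one-row all-NaN frame carries the row name as its index label. For an unnamed row, ignore_index=True renumbers the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})  # assumed example frame

# Named new row: a one-row all-NaN frame whose index label is the row name.
named = pd.concat(
    [df, pd.DataFrame(np.nan, index=["NameOfNewRow"], columns=df.columns)]
)

# Unnamed new row: ignore_index=True renumbers the result 0..n-1.
unnamed = pd.concat(
    [df, pd.DataFrame(np.nan, index=[0], columns=df.columns)],
    ignore_index=True,
)
print(named)
print(unnamed)
```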

Answer by silent_dev

You can add it by appending a Series to the dataframe, as follows. I am assuming that by blank you mean a row containing only NaN. First create a Series of NaN values. Make sure its index parameter lists the dataframe's columns so the values line up with them. Then append it to the DataFrame. Because the Series has no name, pass ignore_index=True; otherwise append raises a TypeError. Hope it helps!


>>> from numpy import nan as Nan
>>> import pandas as pd

>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3'],
...                     'D': ['D0', 'D1', 'D2', 'D3']},
...                     index=[0, 1, 2, 3])

>>> s2 = pd.Series([Nan, Nan, Nan, Nan], index=['A', 'B', 'C', 'D'])
>>> result = df1.append(s2, ignore_index=True)  # s2 is unnamed, so ignore_index=True is required
>>> result
     A    B    C    D
0   A0   B0   C0   D0
1   A1   B1   C1   D1
2   A2   B2   C2   D2
3   A3   B3   C3   D3
4  NaN  NaN  NaN  NaN
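DataFrame.append no longer exists in pandas 2.0 and later. Here is a sketch of the same idea for those versions. The NaN Series becomes a one-row DataFrame via to_frame().T and is concatenated. The Series' index, which is df1's column labels, is what lines its values up with df1's columns.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
                    "B": ["B0", "B1", "B2", "B3"],
                    "C": ["C0", "C1", "C2", "C3"],
                    "D": ["D0", "D1", "D2", "D3"]})

# One NaN per column, labelled by column name so the values line up.
s2 = pd.Series(np.nan, index=df1.columns)

# to_frame().T turns the Series into a one-row DataFrame with columns A..D.
result = pd.concat([df1, s2.to_frame().T], ignore_index=True)
print(result)
```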

Answer by pocketdora

You can add a new Series and name it at the same time. The name becomes the index label of the new row, and all of its values are automatically NaN.


df = df.append(pd.Series(name='Afterthought'))
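A different technique gives the same named empty row without append, so it also works on pandas 2.0+. This is setting with enlargement: assigning to a .loc label that does not exist yet adds that row in place. Here is a sketch on an assumed two-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})  # assumed example frame

# "Afterthought" is not an existing label, so this adds a new row in place;
# the scalar NaN is broadcast to every column of that row.
df.loc["Afterthought"] = np.nan
print(df)
```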

Answer by kamal tanwar

The code below worked for me.


df = df.append(pd.Series([np.nan]), ignore_index=True)
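One caveat applies here, shown below with pd.concat so the sketch also runs on pandas 2.0+. Rows are aligned on column labels, and pd.Series([np.nan]) has the index [0]. That matches column 0 of the question's frame, which was read with header=None, and is presumably why this worked there. On a frame with other column names, the NaN lands in a new column named 0. Indexing the Series by df.columns avoids this. The frame below is an assumed example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # assumed example frame

# pd.Series([np.nan]) is indexed [0]; rows are matched on column labels, so
# label 0 becomes a brand-new column instead of filling "a" and "b".
misaligned = pd.concat([df, pd.Series([np.nan]).to_frame().T],
                       ignore_index=True)

# Indexing the Series by df.columns makes its values line up with the columns.
aligned = pd.concat([df, pd.Series(np.nan, index=df.columns).to_frame().T],
                    ignore_index=True)
print(list(misaligned.columns))
print(list(aligned.columns))
```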

Answer by Dave Reikher

Assuming df is your dataframe,


df_prime = pd.concat([df, pd.DataFrame([[np.nan] * df.shape[1]], columns=df.columns)], ignore_index=True)

where df_prime equals df with an additional last row of NaNs.


Note that pd.concat is slow, so if you need this functionality in a loop, it's best to avoid it. In that case, assuming your index is an increasing integer index, you can use


df.loc[df.iloc[-1].name + 1,:] = np.nan
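Here is a short sketch of both forms from this answer on an assumed example frame: the single pd.concat call, and the .loc enlargement repeated in a loop. The loop version makes the same assumption as the answer, an increasing integer index, so "last label + 1" is always a new label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0], "y": ["p", "q"]})  # assumed example frame

# One-off: concatenate a single all-NaN row that has the same columns.
df_prime = pd.concat(
    [df, pd.DataFrame([[np.nan] * df.shape[1]], columns=df.columns)],
    ignore_index=True,
)

# In a loop: enlarge in place via .loc, using "last label + 1" as the new label
# (assumes an increasing integer index, as the answer does).
grown = df.copy()
for _ in range(3):
    grown.loc[grown.iloc[-1].name + 1, :] = np.nan
print(df_prime)
print(grown)
```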

Answer by Daniel R

Assuming your df.index is sorted and numeric, you can use:


df.loc[df.index.max() + 1] = None

It handles different index and column types well.

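Here is a sketch of the integer case on an assumed frame whose index is not a plain 0..n-1 range and whose columns have different types. Assigning None to the new label enlarges the frame, and the new row is missing (NaN or None) in every column.

```python
import pandas as pd

# Assumed example: integer labels that are not a plain 0..n-1 range, and
# columns of different types.
df = pd.DataFrame({"n": [10, 20], "s": ["a", "b"]}, index=[3, 7])

# The next label is the largest existing label plus 1; assigning None to a
# label that does not exist yet appends a row with every value missing.
df.loc[df.index.max() + 1] = None
print(df)
```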

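[EDIT] It works with a pd.DatetimeIndex only if the index has a constant frequency (on pandas 1.0 and later, add one step of that frequency, df.index.freq, instead of the integer 1). Otherwise you must specify the new index label exactly, e.g.: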


df.loc[df.index.max() + pd.Timedelta(milliseconds=1)] = None

Long example:


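import pandas as pd

# pd.date_range replaces the DatetimeIndex(start=..., freq=..., periods=...) form,
# which newer pandas no longer accepts
df = pd.DataFrame([[pd.Timestamp(12432423), 23, 'text_field']],
                  columns=["timestamp", "speed", "text"],
                  index=pd.date_range(start='2111-11-11', freq='ms', periods=1))
df.info()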

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1 entries, 2111-11-11 to 2111-11-11
Freq: L
Data columns (total 3 columns):
timestamp    1 non-null datetime64[ns]
speed        1 non-null int64
text         1 non-null object
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 32.0+ bytes


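# Constant frequency: step forward by one period of the index's freq
# (older pandas also accepted + 1 here; pandas 1.0 and later do not)
df.loc[df.index.max() + df.index.freq] = None
df.info()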

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2 entries, 2111-11-11 00:00:00 to 2111-11-11 00:00:00.001000
Data columns (total 3 columns):
timestamp    1 non-null datetime64[ns]
speed        1 non-null float64
text         1 non-null object
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 64.0+ bytes


df.head()

                                            timestamp  speed        text
2111-11-11 00:00:00.000 1970-01-01 00:00:00.012432423   23.0  text_field
2111-11-11 00:00:00.001                           NaT    NaN         NaN
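The [EDIT] above also describes an index without a constant frequency. Here is a sketch of that case with assumed irregular timestamps. index.freq is None there, so the new label is chosen explicitly with pd.Timedelta.

```python
import pandas as pd

# Assumed irregular timestamps: no constant frequency, so index.freq is None.
idx = pd.DatetimeIndex(["2111-11-11 00:00:00",
                        "2111-11-11 00:00:00.250",
                        "2111-11-11 00:00:00.400"])
df = pd.DataFrame({"speed": [23, 25, 24]}, index=idx)

# Choose the new label explicitly: the latest timestamp plus 1 millisecond.
new_label = df.index.max() + pd.Timedelta(milliseconds=1)
df.loc[new_label] = None
print(df)
```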

Answer by Alberto Garcia

You can also use:


your_dataframe.insert(loc=0, value=np.nan, column="")

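where loc is the column position. Note that DataFrame.insert adds a column, not a row. This call inserts an all-NaN column with an empty name at position loc, so it does not add an empty row.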

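This sketch shows what DataFrame.insert does on an assumed two-column frame. It then shows one way to place an empty row at a given position: split the frame with iloc and concatenate an all-NaN row in between. insert_empty_row is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # assumed example frame

# DataFrame.insert adds a column: an all-NaN column named "" at position 0.
with_col = df.copy()
with_col.insert(loc=0, column="", value=np.nan)
print(with_col.shape)  # (2, 3): same rows, one more column


def insert_empty_row(frame, pos):
    """Hypothetical helper: copy of frame with an all-NaN row at position pos."""
    empty = pd.DataFrame(np.nan, index=[0], columns=frame.columns)
    return pd.concat([frame.iloc[:pos], empty, frame.iloc[pos:]],
                     ignore_index=True)


with_row = insert_empty_row(df, 0)
print(with_row)
```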

Answer by Peter

Append "empty" row to data frame and fill selected cells:


Generate an empty data frame (no rows, just columns a and b):


import pandas as pd

col_names = ["a", "b"]
df = pd.DataFrame(columns=col_names)

Append an empty row at the end of the data frame:


df = df.append(pd.Series(), ignore_index = True)

Now fill the empty cell in column a of the last row (label len(df)-1):


df.loc[[len(df)-1],'a'] = 123

Result:


     a    b
0  123  NaN


And of course you can repeat this in a loop, appending a row and filling a cell each time:


col_names = ["a", "b"]
df = pd.DataFrame(columns=col_names)
for x in range(0, 5):
    df = df.append(pd.Series(), ignore_index=True)
    df.loc[[len(df)-1], 'a'] = 123

Result:


     a    b
0  123  NaN
1  123  NaN
2  123  NaN
3  123  NaN
4  123  NaN
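df.append is gone in pandas 2.0 and later. A sketch of the same loop for those versions uses .loc setting with enlargement. With the default 0..n-1 index, len(df) is the next unused label. Assigning to column a at that label appends the row, and column b of the new row is left as NaN.

```python
import pandas as pd

col_names = ["a", "b"]
df = pd.DataFrame(columns=col_names)

for _ in range(5):
    # len(df) is the next unused label of the default 0..n-1 index; setting a
    # cell there appends the row and leaves column "b" of that row as NaN.
    df.loc[len(df), "a"] = 123
print(df)
```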