Python - pandas - Append Series into Blank DataFrame

Note: This page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23974802/

Date: 2020-09-13 22:06:27  Source: igfitidea

python, matrix, pandas, dataframe

Question by bill999

Say I have two pandas Series in Python:

import pandas as pd
h = pd.Series(['g',4,2,1,1])
g = pd.Series([1,6,5,4,"abc"])

I can create a DataFrame with just h and then append g to it:

df = pd.DataFrame([h])
df1 = df.append(g, ignore_index=True)

I get:

>>> df1
   0  1  2  3    4
0  g  4  2  1    1
1  1  6  5  4  abc
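
Note for newer pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the snippet above fails there with an AttributeError. A minimal sketch of the same step with pd.concat, assuming pandas 1.x or 2.x, first turns g into a one-row DataFrame so its labels 0 to 4 line up with the columns of df:

```python
import pandas as pd

h = pd.Series(['g', 4, 2, 1, 1])
g = pd.Series([1, 6, 5, 4, "abc"])

# One row built from h; its columns are h's index labels 0..4.
df = pd.DataFrame([h])

# g.to_frame().T is a 1x5 DataFrame whose columns are g's index labels,
# so concat stacks it as a new row; ignore_index renumbers rows 0, 1.
df1 = pd.concat([df, g.to_frame().T], ignore_index=True)
print(df1)
```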

But now suppose that I have an empty DataFrame and I try to append h to it:

df2 = pd.DataFrame([])
df3 = df2.append(h, ignore_index=True)

This does not work. I think the problem is in the second-to-last line of code. I need to somehow define the blank DataFrame to have the proper number of columns.
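
One way to give the blank DataFrame the proper number of columns is to declare them up front and then add each row with .loc (pandas "setting with enlargement"). This is a sketch, not the method from the answer below; it assumes every row has exactly len(h) values and reuses the integer labels 0 to 4 that pd.DataFrame([h]) would produce:

```python
import pandas as pd

h = pd.Series(['g', 4, 2, 1, 1])
g = pd.Series([1, 6, 5, 4, "abc"])

# Blank DataFrame with one column per Series element (labels 0..4),
# declared before any row arrives.
df = pd.DataFrame(columns=range(len(h)))

# Assigning to a row label that does not exist yet appends a new row.
for s in (h, g):
    df.loc[len(df)] = s.tolist()

print(df)
```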

By the way, the reason I am trying to do this is that I am scraping text from the internet with requests + BeautifulSoup, processing it, and trying to write it to a DataFrame one row at a time.
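
For this scraping use case, the pandas documentation for DataFrame.append notes that appending rows one at a time can be more computationally intensive than a single concatenate, and suggests collecting the rows in a list first. A minimal sketch of that pattern, where scraped_rows is a hypothetical placeholder for the requests + BeautifulSoup step:

```python
import pandas as pd


def scraped_rows():
    # Hypothetical stand-in for the real scraping/parsing code.
    yield ['g', 4, 2, 1, 1]
    yield [1, 6, 5, 4, "abc"]


rows = []
for values in scraped_rows():
    rows.append(pd.Series(values))  # cheap list append, no DataFrame copy

# Build the DataFrame once; each Series in the list becomes one row.
df = pd.DataFrame(rows)
print(df)
```

Because the DataFrame is only created at the end, there is no need to define an empty one with the right columns beforehand.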

Answer by EdChum

If you don't pass an empty list to the DataFrame constructor, it works:

In [16]:

df = pd.DataFrame()
h = pd.Series(['g',4,2,1,1])
df = df.append(h,ignore_index=True)
df
Out[16]:
   0  1  2  3  4
0  g  4  2  1  1

[1 rows x 5 columns]

The difference between the two constructor approaches appears to be that the index dtypes are set differently: with an empty list it is int64, and with nothing it is object:

In [21]:

df = pd.DataFrame()
print(df.index.dtype)
df = pd.DataFrame([])
print(df.index.dtype)
object
int64

It is unclear to me why this should affect the behaviour (I'm guessing here).

UPDATE

After revisiting this, it looks to me like a bug in pandas version 0.12.0, as your original code works fine for me:

In [13]:

import pandas as pd
df = pd.DataFrame([])
h = pd.Series(['g',4,2,1,1])
df.append(h,ignore_index=True)

Out[13]:
   0  1  2  3  4
0  g  4  2  1  1

[1 rows x 5 columns]

I am running pandas 0.13.1 and numpy 1.8.1 64-bit with Python 3.3.5.0. I think the problem is in pandas, but to be safe I would upgrade both pandas and numpy. I don't think this is a 32- versus 64-bit Python issue.
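
To compare your environment with the one above, a short check like the following prints the Python, pandas and numpy versions and whether the interpreter is a 32- or 64-bit build (pointer size in bits). The has_append entry is an extra check not in the original answer: it is False on pandas 2.0 and later, where DataFrame.append no longer exists:

```python
import struct
import sys

import numpy as np
import pandas as pd

info = {
    "python": sys.version.split()[0],
    "pandas": pd.__version__,
    "numpy": np.__version__,
    # Size of a C pointer in bytes * 8: 32 or 64 for the running interpreter.
    "bits": struct.calcsize("P") * 8,
    # DataFrame.append was removed in pandas 2.0; check before relying on it.
    "has_append": hasattr(pd.DataFrame, "append"),
}

for key, value in info.items():
    print(f"{key}: {value}")
```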
