Python - pandas - Append Series into Blank DataFrame
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/23974802/
Asked by bill999
Say I have two pandas Series in Python:
import pandas as pd
h = pd.Series(['g',4,2,1,1])
g = pd.Series([1,6,5,4,"abc"])
I can create a DataFrame with just h and then append g to it:
df = pd.DataFrame([h])
df1 = df.append(g, ignore_index=True)
I get:
>>> df1
   0  1  2  3    4
0  g  4  2  1    1
1  1  6  5  4  abc
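Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the snippet above only runs on older versions. A minimal sketch of the same steps on a recent pandas, assuming `pd.concat` with the Series converted to a one-row frame is an acceptable substitute (this is not what the original post used):

```python
import pandas as pd

h = pd.Series(['g', 4, 2, 1, 1])
g = pd.Series([1, 6, 5, 4, "abc"])

# Start from a one-row frame built from h, as in the question.
df = pd.DataFrame([h])

# to_frame().T turns g into a one-row DataFrame so it can be stacked as a row.
df1 = pd.concat([df, g.to_frame().T], ignore_index=True)
print(df1)
```

With `ignore_index=True` the resulting rows are renumbered from 0, which mirrors the `ignore_index=True` argument in the original `append` call.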
But now suppose that I have an empty DataFrame and I try to append h to it:
df2 = pd.DataFrame([])
df3 = df2.append(h, ignore_index=True)
This does not work. I think the problem is in the second-to-last line of code. I need to somehow define the blank DataFrame to have the proper number of columns.
By the way, the reason I am trying to do this is that I am scraping text from the internet using requests+BeautifulSoup, processing it, and trying to write it to a DataFrame one row at a time.
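For the row-at-a-time scraping use case, a common alternative is to collect the rows in a plain Python list and build the DataFrame once at the end, which sidesteps the empty-frame question. A minimal sketch; `parse_row` and the sample strings are hypothetical stand-ins for the requests+BeautifulSoup processing, which the question does not show:

```python
import pandas as pd

def parse_row(text):
    # Hypothetical stand-in for the real scraping/processing step.
    return text.split(",")

# Hypothetical scraped records, one string per row.
scraped = ["g,4,2,1,1", "1,6,5,4,abc"]

rows = []
for text in scraped:
    rows.append(parse_row(text))  # accumulate plain lists, not DataFrames

# Build the DataFrame once, after all rows are collected.
df = pd.DataFrame(rows)
print(df)
```

Because `parse_row` here splits strings, every value in this sketch is a string; a real parser would decide the types.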
Answered by EdChum
If you don't pass an empty list to the DataFrame constructor, then it works:
In [16]:
df = pd.DataFrame()
h = pd.Series(['g',4,2,1,1])
df = df.append(h,ignore_index=True)
df
Out[16]:
   0  1  2  3  4
0  g  4  2  1  1
[1 rows x 5 columns]
The difference between the two constructor approaches appears to be that the index dtypes are set differently: with an empty list it is int64, with nothing it is object:
In [21]:
df = pd.DataFrame()
print(df.index.dtype)
df = pd.DataFrame([])
print(df.index.dtype)
object
int64
It is unclear to me why the above should affect the behaviour (I'm guessing here).
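The comparison can be reproduced on any installation with the sketch below. The result depends on the pandas version (recent releases may report the same dtype for both constructors), so no expected output is shown; the names `no_args` and `empty_list` are illustrative only:

```python
import pandas as pd

# Empty frame built with no arguments versus from an empty list.
no_args = pd.DataFrame()
empty_list = pd.DataFrame([])

for label, frame in [("DataFrame()", no_args), ("DataFrame([])", empty_list)]:
    # Report the index class and dtype; both vary across pandas versions.
    print(pd.__version__, label, type(frame.index).__name__, frame.index.dtype)
```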
UPDATE
After revisiting this, it looks to me like a bug in pandas version 0.12.0, as your original code works fine for me:
In [13]:
import pandas as pd
df = pd.DataFrame([])
h = pd.Series(['g',4,2,1,1])
df.append(h,ignore_index=True)
Out[13]:
   0  1  2  3  4
0  g  4  2  1  1
[1 rows x 5 columns]
I am running pandas 0.13.1 and numpy 1.8.1 64-bit using Python 3.3.5.0. I think the problem is in pandas, but I would upgrade both pandas and numpy to be safe. I don't think this is a 32-bit versus 64-bit Python issue.
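To compare an environment against the versions mentioned above, the installed versions and interpreter bitness can be printed with a short sketch like this one (standard attributes only; the dictionary name `versions` is just illustrative):

```python
import platform
import sys

import numpy as np
import pandas as pd

versions = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "python": platform.python_version(),
    # sys.maxsize exceeds 2**32 only on a 64-bit interpreter.
    "bits": "64-bit" if sys.maxsize > 2**32 else "32-bit",
}
for name, value in versions.items():
    print(name, value)
```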

