pandas: adding multiple rows to an existing dataframe

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/30081216/

Date: 2020-09-13 23:19:16  Source: igfitidea

Adding multiple rows in an existing dataframe

Tags: python, pandas, ipython

Question by pythonlearner

Hi, I'm learning data science and am trying to build a list of big data companies from a list of companies in various industries.

I have a list of row numbers for the big data companies, named comp_rows. Now I'm trying to make a new dataframe containing only the companies at those row numbers. To do this I need to add rows to an existing dataframe, but I get an error. Could someone help?

My dataframe looks like this:

    company_url company tag_line    product data
0   https://angel.co/billguard  BillGuard   The fastest smartest way to track your spendin...   BillGuard is a personal finance security app t...   New York City · Financial Services · Security ...
1   https://angel.co/tradesparq Tradesparq  The world's largest social network for global ...   Tradesparq is Alibaba.com meets LinkedIn. Trad...   Shanghai · B2B · Marketplaces · Big Data · Soc...
2   https://angel.co/sidewalk   Sidewalk    Hoovers (D&B) for the social era    Sidewalk helps companies close more sales to s...   New York City · Lead Generation · Big Data · S...
3   https://angel.co/pangia Pangia  The Internet of Things Platform: Big data mana...   We collect and manage data from sensors embedd...   San Francisco · SaaS · Clean Technology · Big ...
4   https://angel.co/thinknum   Thinknum    Financial Data Analysis Thinknum is a powerful web platform to value c...   New York City · Enterprise Software · Financia...

My code is below:

bigdata_comp = DataFrame(data=None,columns=['company_url','company','tag_line','product','data'])

for count, item in enumerate(data.iterrows()):
    for number in comp_rows:
        if int(count) == int(number):
            bigdata_comp.append(item)

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-234-1e4ea9bd9faa> in <module>()
      4     for number in comp_rows:
      5         if int(count) == int(number):
----> 6             bigdata_comp.append(item)
      7 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in append(self, other, ignore_index, verify_integrity)
   3814         from pandas.tools.merge import concat
   3815         if isinstance(other, (list, tuple)):
-> 3816             to_concat = [self] + other
   3817         else:
   3818             to_concat = [self, other]

TypeError: can only concatenate list (not "tuple") to list

Answer by fixxxer

It seems you are trying to filter an existing dataframe based on row indices (stored in your variable comp_rows). You can do this without a loop by using loc, as shown below:

In [1161]: df1.head()
Out[1161]: 
          A         B         C         D
a  1.935094 -0.160579 -0.173458  0.433267
b  1.669632 -1.130893 -1.210353  0.822138
c  0.494622  1.014013  0.215655  1.045139
d -0.628889  0.223170 -0.616019 -0.264982
e -0.823133  0.385790 -0.654533  0.582255

Here we select the rows with index labels 'a', 'b' and 'c', across all columns:

In [1162]: df1.loc[['a','b','c'],:]
Out[1162]: 
          A         B         C         D
a  1.935094 -0.160579 -0.173458  0.433267
b  1.669632 -1.130893 -1.210353  0.822138
c  0.494622  1.014013  0.215655  1.045139

You can read more about it here.
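Applied to the question's own setup, a minimal sketch might look like this. The small frame below and the contents of comp_rows are hypothetical stand-ins for the asker's data. Because comp_rows holds row positions, the sketch uses iloc, which selects by position; loc selects by index label and returns the same rows here only because the frame has the default 0-based integer index.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe (two of its columns).
data = pd.DataFrame({
    "company_url": [
        "https://angel.co/billguard",
        "https://angel.co/tradesparq",
        "https://angel.co/sidewalk",
        "https://angel.co/pangia",
        "https://angel.co/thinknum",
    ],
    "company": ["BillGuard", "Tradesparq", "Sidewalk", "Pangia", "Thinknum"],
})

# Hypothetical row positions of the big data companies; int() mirrors the
# question's conversion in case the entries arrive as strings.
comp_rows = [1, 2, 3]
positions = [int(n) for n in comp_rows]

# iloc selects by position, independent of the index labels.
bigdata_comp = data.iloc[positions]

# With the default RangeIndex, labels equal positions, so loc agrees here.
same = data.loc[positions]

print(bigdata_comp["company"].tolist())  # ['Tradesparq', 'Sidewalk', 'Pangia']
```

The selection returns a new DataFrame directly, so no loop or append is needed. If you want the result renumbered from 0, call bigdata_comp.reset_index(drop=True).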

About your code:

1. You do not need to iterate through a list to check whether an item is in it; use the in operator instead. For example:

In [1199]: 1 in [1,2,3,4,5]
Out[1199]: True

So, instead of

for number in comp_rows:
    if int(count) == int(number):

do this:

if count in comp_rows:

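2. pandas append does not modify the dataframe in place; it returns a new DataFrame, so you have to store the result in a variable. See here. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; in current pandas, use pd.concat instead.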

3. Appending one row at a time is a slow way to do what you want. Instead, collect each row you want to add in a list of lists, build a dataframe from it, and append that to the target dataframe in one go. Something like this:

temp = []
for count, item in enumerate(df1.loc[['a','b','c'],:].iterrows()):
    # if count in comp_rows:
    temp.append(list(item[1]))

In [1233]: temp
Out[1233]: 
[[1.9350940285526077,
  -0.16057932637141861,
  -0.17345827000000605,
  0.43326722021644282],
 [1.66963201034217,
  -1.1308932586268696,
  -1.2103527446031515,
  0.82213753819050794],
 [0.49462218161377397,
  1.0140133740187862,
  0.2156547595968879,
  1.0451391564351897]]

In [1236]: df2 = df1.append(pd.DataFrame(temp, columns=['A','B','C','D']))

In [1237]: df2
Out[1237]: 
          A         B         C         D
a  1.935094 -0.160579 -0.173458  0.433267
b  1.669632 -1.130893 -1.210353  0.822138
c  0.494622  1.014013  0.215655  1.045139
d -0.628889  0.223170 -0.616019 -0.264982
e -0.823133  0.385790 -0.654533  0.582255
f -0.872135  2.938475 -0.099367 -1.472519
0  1.935094 -0.160579 -0.173458  0.433267
1  1.669632 -1.130893 -1.210353  0.822138
2  0.494622  1.014013  0.215655  1.045139
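The three points above can be combined into a single pass over the question's data. The sketch below is one way to do it, not the answer's exact code: the frame, comp_rows and the existing target frame are hypothetical stand-ins, and it uses pd.concat instead of DataFrame.append because append was deprecated in pandas 1.4 and removed in pandas 2.0. Like append, concat returns a new object rather than changing anything in place, so its result is assigned.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's data and row positions.
data = pd.DataFrame({
    "company": ["BillGuard", "Tradesparq", "Sidewalk", "Pangia", "Thinknum"],
    "tag_line": ["finance", "trade", "sales", "iot", "analysis"],
})
comp_rows = [1, 3]

# Point 1: test membership with `in`; a set gives constant-time lookups on average.
wanted = set(comp_rows)

# Point 3: collect the matching rows in a list instead of appending one by one.
rows = []
for position, (label, row) in enumerate(data.iterrows()):
    if position in wanted:
        rows.append(row)

# Build a DataFrame once from the collected rows (each Series becomes one row).
bigdata_comp = pd.DataFrame(rows)

# Point 2: concat returns a new frame, so keep the result.
existing = pd.DataFrame({"company": ["Acme"], "tag_line": ["placeholder"]})  # hypothetical target
combined = pd.concat([existing, bigdata_comp], ignore_index=True)

print(combined["company"].tolist())  # ['Acme', 'Tradesparq', 'Pangia']
```

Collecting first and concatenating once avoids copying the growing frame on every iteration, which is what makes row-by-row appending slow.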

Answer by Kathirmani Sukumar

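The error happens because iterrows() yields (index, row) tuples, so item in your loop is a tuple rather than a row, and DataFrame.append cannot concatenate a tuple onto the dataframe. Replace the following line: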

for count, item in enumerate(data.iterrows()):

with

for count, (index, item) in enumerate(data.iterrows()):

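or, because your dataframe has the default 0-based integer index (so each row's index label equals its position), even simply with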

for count, item in data.iterrows():
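To make the cause of the original TypeError concrete, the sketch below inspects what the loops yield, using a small hypothetical frame. enumerate(data.iterrows()) produces (count, (label, row)) pairs, so item in the question's loop is a (label, row) tuple; DataFrame.append treats a tuple like a list and tries [self] + other, which is the failing line in the traceback. Unpacking the inner tuple, as suggested above, turns item into the row Series.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe.
data = pd.DataFrame({"company": ["BillGuard", "Tradesparq", "Sidewalk"]})

# Original loop shape: item is the whole (label, row) tuple.
count, item = next(enumerate(data.iterrows()))
print(type(item).__name__)  # tuple

# Unpacked loop shape: item is the row itself, a pandas Series.
row_types = [type(item).__name__ for count, (index, item) in enumerate(data.iterrows())]
print(row_types)  # ['Series', 'Series', 'Series']

# With the default RangeIndex each label equals the row position, which is why
# `for count, item in data.iterrows()` also gives the row number in `count`.
labels = [label for label, row in data.iterrows()]
positions = list(range(len(data)))
print(labels == positions)  # True
```

Even with the unpacking fix, the result of append (or pd.concat in current pandas) still has to be assigned back, since neither modifies the dataframe in place.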