pandas: add multiple rows to an existing dataframe

Question

Asked by pythonlearner

Hi I'm learning data science and am trying to make a big data company list from a list with companies in various industries.

I have a list of row numbers for big data companies, named comp_rows. Now, I'm trying to make a new dataframe with the filtered companies based on the row numbers. Here I need to add rows to an existing dataframe but I got an error. Could someone help?

My dataframe looks like this:

   company_url                  company     tag_line                                           product                                            data
0  https://angel.co/billguard   BillGuard   The fastest smartest way to track your spendin...  BillGuard is a personal finance security app t...  New York City · Financial Services · Security ...
1  https://angel.co/tradesparq  Tradesparq  The world's largest social network for global ...  Tradesparq is Alibaba.com meets LinkedIn. Trad...  Shanghai · B2B · Marketplaces · Big Data · Soc...
2  https://angel.co/sidewalk    Sidewalk    Hoovers (D&B) for the social era                   Sidewalk helps companies close more sales to s...  New York City · Lead Generation · Big Data · S...
3  https://angel.co/pangia      Pangia      The Internet of Things Platform: Big data mana...  We collect and manage data from sensors embedd...  San Francisco · SaaS · Clean Technology · Big ...
4  https://angel.co/thinknum    Thinknum    Financial Data Analysis                            Thinknum is a powerful web platform to value c...  New York City · Enterprise Software · Financia...

My code is below:

bigdata_comp = DataFrame(data=None,columns=['company_url','company','tag_line','product','data'])

for count, item in enumerate(data.iterrows()):
    for number in comp_rows:
        if int(count) == int(number):
            bigdata_comp.append(item)

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-234-1e4ea9bd9faa> in <module>()
      4     for number in comp_rows:
      5         if int(count) == int(number):
----> 6             bigdata_comp.append(item)
      7 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in append(self, other, ignore_index, verify_integrity)
   3814         from pandas.tools.merge import concat
   3815         if isinstance(other, (list, tuple)):
-> 3816             to_concat = [self] + other
   3817         else:
   3818             to_concat = [self, other]

TypeError: can only concatenate list (not "tuple") to list

Answer 1

Answered by fixxxer

It seems you are trying to filter an existing dataframe by index (the values stored in your variable comp_rows). You can do this without loops by using loc, as shown below:

In [1161]: df1.head()
Out[1161]: 
          A         B         C         D
a  1.935094 -0.160579 -0.173458  0.433267
b  1.669632 -1.130893 -1.210353  0.822138
c  0.494622  1.014013  0.215655  1.045139
d -0.628889  0.223170 -0.616019 -0.264982
e -0.823133  0.385790 -0.654533  0.582255

We will get the rows with indices 'a','b' and 'c', for all columns:

In [1162]: df1.loc[['a','b','c'],:]
Out[1162]: 
          A         B         C         D
a  1.935094 -0.160579 -0.173458  0.433267
b  1.669632 -1.130893 -1.210353  0.822138
c  0.494622  1.014013  0.215655  1.045139

You can read more about loc in the pandas indexing documentation.
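
Because comp_rows in the question holds row numbers rather than index labels, positional selection with iloc may be the closer fit. Here is a minimal sketch, assuming comp_rows is a list of integer positions; the small dataframe is made up for illustration and only stands in for the asker's data:

```python
import pandas as pd

# Hypothetical stand-in for the asker's `data` dataframe.
data = pd.DataFrame({
    "company": ["BillGuard", "Tradesparq", "Sidewalk", "Pangia", "Thinknum"],
    "data": ["Financial Services", "Big Data", "Big Data", "Big Data", "Enterprise Software"],
})

# Assumed: the row positions of the big data companies.
comp_rows = [1, 2, 3]

# iloc selects by position; loc would select by index label instead.
bigdata_comp = data.iloc[comp_rows]
print(bigdata_comp)
```

With the default integer index, loc and iloc pick the same rows here; they differ as soon as the index holds labels such as 'a', 'b', 'c'.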

About your code:

1. You do not need to loop over a list to check whether an item is in it; use the in operator. For example:

In [1199]: 1 in [1,2,3,4,5]
Out[1199]: True

So, instead of

for number in comp_rows:
    if int(count) == int(number):

do this:

if count in comp_rows:
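
The int() casts in the original loop suggest comp_rows might hold strings rather than integers. As a small sketch under that assumption (the values below are invented), the row numbers can be normalized once into a set and then tested with in:

```python
# Assumed: row numbers that may have been read in as strings.
comp_rows = ["1", "2", "3"]

# Convert once; a set also makes each membership test fast.
wanted = {int(n) for n in comp_rows}

# Keep only the counts that appear in comp_rows.
matches = [count for count in range(5) if count in wanted]
print(matches)
```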

2. pandas append does not operate in place; it returns a new dataframe. You have to assign the result to a variable.
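
The same rule applies to pd.concat, which this sketch uses in place of DataFrame.append because append was removed in pandas 2.0. The frames are made-up examples; the point is only that the result must be assigned:

```python
import pandas as pd

target = pd.DataFrame({"A": [1, 2]})   # illustrative data
extra = pd.DataFrame({"A": [3]})

# The result is discarded here, so target is unchanged.
pd.concat([target, extra])
print(len(target))

# Assigning the result keeps the new row.
target = pd.concat([target, extra], ignore_index=True)
print(len(target))
```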

3. Appending one row at a time is a slow way to do what you want. Instead, save each row you want to add into a list of lists, build a dataframe from it, and append it to the target dataframe in one go. Something like this:

temp = []
for count, item in enumerate(df1.loc[['a','b','c'],:].iterrows()):
    # if count in comp_rows:
    temp.append( list(item[1]))

In [1233]: temp
Out[1233]: 
[[1.9350940285526077,
  -0.16057932637141861,
  -0.17345827000000605,
  0.43326722021644282],
 [1.66963201034217,
  -1.1308932586268696,
  -1.2103527446031515,
  0.82213753819050794],
 [0.49462218161377397,
  1.0140133740187862,
  0.2156547595968879,
  1.0451391564351897]]

In [1236]: df2 = df1.append(pd.DataFrame(temp, columns=['A','B','C','D']))

In [1237]: df2
Out[1237]: 
          A         B         C         D
a  1.935094 -0.160579 -0.173458  0.433267
b  1.669632 -1.130893 -1.210353  0.822138
c  0.494622  1.014013  0.215655  1.045139
d -0.628889  0.223170 -0.616019 -0.264982
e -0.823133  0.385790 -0.654533  0.582255
f -0.872135  2.938475 -0.099367 -1.472519
0  1.935094 -0.160579 -0.173458  0.433267
1  1.669632 -1.130893 -1.210353  0.822138
2  0.494622  1.014013  0.215655  1.045139
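
On pandas 2.0 and later, DataFrame.append is no longer available, so the collect-then-combine step above would need pd.concat instead. A rough equivalent, using a small invented df1 rather than the random data shown in the answer:

```python
import pandas as pd

# Hypothetical data standing in for df1 from the answer.
df1 = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=["a", "b", "c", "d"],
)

# Collect the wanted rows as plain lists first...
temp = [list(row) for _, row in df1.loc[["a", "b", "c"], :].iterrows()]

# ...then build one dataframe and concatenate a single time.
new_rows = pd.DataFrame(temp, columns=df1.columns)
df2 = pd.concat([df1, new_rows])
print(df2)
```

As in the output above, the new rows get a fresh 0, 1, 2 index; passing ignore_index=True to pd.concat would renumber all rows instead.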

Answer 2

Answered by Kathirmani Sukumar

Replace the following line:

for count, item in enumerate(data.iterrows()):

with

for count, (index, item) in enumerate(data.iterrows()):

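or even simply as follows, where count is the index label (the same as the row number only when the dataframe has the default integer index):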

for count, item in data.iterrows():
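
The reason this matters is that iterrows yields (index, row) tuples, so in the question's loop item was a tuple, which is what the TypeError in the traceback complains about. A short sketch with a made-up two-row dataframe shows the shape of what iterrows returns:

```python
import pandas as pd

# Made-up sample standing in for the asker's `data`.
data = pd.DataFrame({"company": ["BillGuard", "Tradesparq"]})

# Each item from iterrows is a tuple: (index label, row as a Series).
first = next(data.iterrows())
print(type(first))

# Unpacking gives the row itself, which is what the loop wanted.
index, row = first
print(index, row["company"])
```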
