Adding a Pandas dataframe column to a new dataframe

Question

Asked by FooBar

Using Pandas, I have some data that I want to add to my "results" dataframe. That is, I have

naics = someData

which can look like this:

   indnaics  ind1990
89    81393      873

However, it can have more than one row. I want to add these rows to my results dataframe, together with a variable called year. If there is more than one row, all rows should get the same year value. This is what I have tried so far:

for job in jobs:
    df2 = iGetThisFromJob()
    years = df2.year.unique()
    naics = iGetThisFromJob()
    if len(naics) == 0:
        continue

    for year in years:
        wages = df2.incwage[df2.year == year]
        # Add all the data to results, this is how I try it
        rows = pd.DataFrame([dict(year=year, incwage=np.mean(wages))])
        # I also want to add the column indnaics from my naics
        rows['naics'] = naics.indnaics
        results = results.append(rows, ignore_index=True)

However, even though naics.indnaics contains data, I cannot add it to the rows object this way.

naics.indnaics

Out[1052]: 
89    81393

rows['naics'] = naics.indnaics
rows

Out[1051]: 
        incwage  year naics
0  45853.061224  2002   NaN

If there is anything else in my code that could be improved, please tell me. I'm only beginning to learn pandas.

Thanks!

/edit Expected output:

        incwage  year   naics
0  45853.061224  2002   81393
0  45853.061224  2002   12312

/edit Suggested solution:

index = np.arange(len(naics))
columns = ['year', 'incwage', 'naics']
rows = pd.DataFrame(index=index, columns=columns)
rows.year = year
rows.incwage = np.mean(wages)
# .values drops the index of naics.indnaics, so the assignment is positional
rows.naics = naics.indnaics.values

Answer 1

Answered by joris

The reason you get a NaN value is that the indices do not match: in rows['naics'] = naics.indnaics, rows has index 0 while naics.indnaics has index 89, and the assignment tries to align the two on their index labels.

You could, for example, solve that by assigning only the values (e.g. naics.indnaics.values). With a toy example:

In [30]: df = pd.DataFrame({'A':[0], 'B':[1]})
In [31]: df
Out[31]: 
   A  B
0  0  1


In [32]: s = pd.Series([2], index=[83])
In [33]: s
Out[33]: 
83    2
dtype: int64

In [35]: df['new_column'] = s
In [36]: df
Out[36]: 
   A  B  new_column
0  0  1         NaN

In [37]: df['new_column'] = s.values
In [38]: df
Out[38]: 
   A  B  new_column
0  0  1           2
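
The session above can also be reproduced as a standalone script. This is a minimal sketch, assuming a reasonably recent pandas version; the variable names aligned, by_position and renumbered are illustrative, and the reset_index(drop=True) variant is an alternative that is not part of the answer itself.

```python
import pandas as pd

df = pd.DataFrame({"A": [0], "B": [1]})
s = pd.Series([2], index=[83])

# Assigning a Series aligns on index labels: df has label 0, s has label 83,
# so no label matches and the new column is filled with NaN.
aligned = df.copy()
aligned["new_column"] = s

# Assigning the underlying array skips alignment and fills by position.
by_position = df.copy()
by_position["new_column"] = s.to_numpy()

# Alternative (an assumption, not from the answer): relabel the Series 0..n-1
# so that its labels match the default index of df.
renumbered = df.copy()
renumbered["new_column"] = s.reset_index(drop=True)

print(aligned)
print(by_position)
print(renumbered)
```

Both positional variants require the series and the dataframe to have the same length; only the label-aligned assignment silently produces NaN on a mismatch.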

If you want to add a series that may hold more than one value, there are a couple of options. Two that come to mind:

For example, reindex the dataframe first to the length of the series:

In [75]: s
Out[75]: 
83    2
84    4
dtype: int64

In [76]: df
Out[76]: 
   A  B
0  0  1

In [77]: df = df.reindex(np.zeros(len(s)))
In [78]: df
Out[78]: 
   A  B
0  0  1
0  0  1

In [79]: df['new_column'] = s.values

In [80]: df
Out[80]: 
   A  B  new_column
0  0  1           2
0  0  1           4

Or the other way around: add the dataframe's columns to the series (which you first convert to a dataframe):

In [90]: ss = s.to_frame().set_index(np.array([0,0]))
In [91]: ss[df.columns] = df
In [92]: ss
Out[92]: 
   0  A  B
0  2  0  1
0  4  0  1

[2 rows x 3 columns]
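
Putting the pieces together for the question's loop, one possible approach is to build each block of rows from the raw values and concatenate the blocks at the end. This is only a sketch under stated assumptions: df2 and naics below are made-up stand-ins for the question's data, and pd.concat is used in place of DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data (values are made up).
df2 = pd.DataFrame({"year": [2002, 2002, 2003],
                    "incwage": [40000.0, 50000.0, 60000.0]})
naics = pd.DataFrame({"indnaics": [81393, 12312], "ind1990": [873, 873]},
                     index=[89, 90])

# Mean wage per year, computed once instead of filtering inside the loop.
mean_wages = df2.groupby("year")["incwage"].mean()

pieces = []
for year, incwage in mean_wages.items():
    # One row per naics code; the scalar year and incwage are broadcast
    # to the length of the array, and to_numpy() avoids index alignment.
    rows = pd.DataFrame({
        "year": year,
        "incwage": incwage,
        "naics": naics["indnaics"].to_numpy(),
    })
    pieces.append(rows)

# Concatenate once at the end rather than growing results inside the loop.
results = pd.concat(pieces, ignore_index=True)
print(results)
```

Collecting the pieces in a list and concatenating once avoids copying the growing results frame on every iteration.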
