Adding two pandas Series with NaN

Question

Asked by BubbleGuppies

I'm working through "Python for Data Analysis" and I don't understand one behavior. Adding two pandas Series objects automatically aligns the data by index, but if one object does not contain a given index label, the result for that label is NaN. For example, from the book:

from numpy import nan as NaN
from pandas import Series

a = Series([35000, 71000, 16000, 5000], index=['Ohio', 'Texas', 'Oregon', 'Utah'])
b = Series([NaN, 71000, 16000, 35000], index=['California', 'Texas', 'Oregon', 'Ohio'])

Result:

    In [63]: a
    Out[63]: Ohio          35000
             Texas         71000
             Oregon        16000
             Utah           5000
    In [64]: b
    Out[64]: California      NaN
             Texas         71000
             Oregon        16000
             Ohio          35000

When I add them together I get this...

    In [65]: a+b
    Out[65]: California       NaN
             Ohio           70000
             Oregon         32000
             Texas         142000
             Utah             NaN

So why is the Utah value NaN and not 5000? I would have expected 5000 + NaN = 5000. What gives? I'm missing something, please explain.

Update:

    In [92]: # fill NaN with zero
             b = b.fillna(0)
             b
    Out[92]: California        0
             Texas         71000
             Oregon        16000
             Ohio          35000

    In [93]: a
    Out[93]: Ohio      35000
             Texas     71000
             Oregon    16000
             Utah       5000

    In [94]: # a is still good
             a+b
    Out[94]: California       NaN
             Ohio           70000
             Oregon         32000
             Texas         142000 
             Utah             NaN

Answer 1

Answered by Dan Allan

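pandas does not assume that 5000 + NaN = 5000, but it is easy to ask it to do that: a.add(b, fill_value=0). With fill_value=0, a label that is missing from only one Series is treated as 0 before the addition; a label that is missing from both still yields NaN.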
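
As a minimal sketch of that suggestion, the snippet below rebuilds the two Series from the question (using the pd/np import aliases, which are my choice rather than the asker's) and compares the plain + operator with Series.add and fill_value=0. The variable names plain and filled are only illustrative.

```python
import numpy as np
import pandas as pd

a = pd.Series([35000, 71000, 16000, 5000],
              index=["Ohio", "Texas", "Oregon", "Utah"])
b = pd.Series([np.nan, 71000, 16000, 35000],
              index=["California", "Texas", "Oregon", "Ohio"])

# Plain + aligns on the union of the labels; a label that is missing
# (or NaN) on either side produces NaN in the result.
plain = a + b

# fill_value=0 substitutes 0 when a value is missing on one side only.
# California is missing from a and NaN in b, so it remains NaN.
filled = a.add(b, fill_value=0)

print(filled)
```

Here Utah should come out as 5000.0, while California stays NaN because neither Series holds a usable value for it.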

Answer 2

Answered by BrenBarn

The default approach is to assume that any computation involving NaN gives NaN as the result. Anything plus NaN is NaN, anything divided by NaN is NaN, etc. If you want to fill the NaN with some value, you have to do that explicitly (as Dan Allan showed in his answer).
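
This also seems to explain why b.fillna(0) in the question's update did not help: the NaN for Utah is not stored in b at all, it appears during index alignment because b has no Utah label. The sketch below spells out that alignment step by hand with reindex; the names labels, a_aligned, and b_aligned are illustrative, and the reindex calls are my way of showing the idea, not a claim about pandas internals.

```python
import numpy as np
import pandas as pd

a = pd.Series([35000, 71000, 16000, 5000],
              index=["Ohio", "Texas", "Oregon", "Utah"])
b = pd.Series([np.nan, 71000, 16000, 35000],
              index=["California", "Texas", "Oregon", "Ohio"]).fillna(0)

# NaN propagates through ordinary floating-point arithmetic.
nan_sum = 5000 + np.nan

# Conceptually, a + b first aligns both operands to the union of labels.
labels = a.index.union(b.index)
a_aligned = a.reindex(labels)  # California is absent from a, so it becomes NaN
b_aligned = b.reindex(labels)  # Utah is absent from b, so it becomes NaN despite fillna(0)

total = a_aligned + b_aligned
print(total)
```

Filling after alignment instead, for example a.reindex(labels, fill_value=0), would avoid the NaN, which is effectively what a.add(b, fill_value=0) does for labels missing on one side.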

Answer 3

Answered by Anton vBR

It makes more sense to use pd.concat(), since it can combine more than two Series as columns.

import pandas as pd
import numpy as np

a = pd.Series([35000, 71000, 16000, 5000], index=['Ohio', 'Texas', 'Oregon', 'Utah'])
b = pd.Series([np.nan, 71000, 16000, 35000], index=['California', 'Texas', 'Oregon', 'Ohio'])

# Put the Series side by side as columns, then sum across each row.
pd.concat((a, b), axis=1).sum(axis=1, min_count=1)

Output:

California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah            5000.0
dtype: float64

Or with three Series:

import pandas as pd
import numpy as np

a = pd.Series([1, np.nan, 4, 5])
b = pd.Series([3, np.nan, 5, np.nan])
c = pd.Series([np.nan, np.nan, np.nan, np.nan])

print(pd.concat((a, b, c), axis=1).sum(axis=1, min_count=1))

#0    4.0
#1    NaN
#2    9.0
#3    5.0
#dtype: float64
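
The min_count=1 argument in the snippets above is what keeps all-NaN rows as NaN. A small sketch of the difference, reusing two of the Series from the last example (frame, default_sum, and strict_sum are illustrative names):

```python
import numpy as np
import pandas as pd

a = pd.Series([1, np.nan, 4, 5])
b = pd.Series([3, np.nan, 5, np.nan])

frame = pd.concat((a, b), axis=1)

# Default min_count=0: NaN values are skipped, so an all-NaN row sums to 0.0.
default_sum = frame.sum(axis=1)

# min_count=1: a row needs at least one non-NaN value, otherwise the sum is NaN.
strict_sum = frame.sum(axis=1, min_count=1)

print(default_sum)
print(strict_sum)
```

Row 1 is NaN in both Series, so it should be 0.0 in default_sum and NaN in strict_sum; the other rows are the same in both.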
