pandas: NaN result from the mean function

Question

Asked by abuteau

I am trying to compute the mean of a row in my pandas DataFrame, but I get NaN for every row. Why do I get this result, and how can I fix it?

GOOG key ratios: http://www.gogofile.com/Default.aspx?p=sc&ID=635118193040317500_6234

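import pandas as pd

path = 'GOOG Key Ratios.csv'
# print(open(path).read())
# Skip the first two lines and label the value columns Y0..Y10
data = pd.read_csv(path, skiprows=2, names=['Y0', 'Y1', 'Y2', 'Y3', 'Y4', 'Y5', 'Y6', 'Y7', 'Y8', 'Y9', 'Y10'], index_col=0)
noTTM = data.iloc[:, 0:10]  # keep Y0..Y9, dropping the last column (Y10)
print(data.mean(axis=1))  # row-wise mean
grossMargin = noTTM[2:3]  # row slice: the third row only
print(grossMargin.mean(axis=1))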

Output:

Gross Margin %   NaN
dtype: float64

Regards,

Answer 1

Answered by Phillip Cloud

The reason you get a bunch of NaN values is that your columns are not homogeneously typed. Averaging across the columns doesn't make sense here, because pandas.read_csv only converts a column to a numeric type when every value in it can be parsed as a number, e.g., there are no string dates or other text in the same column as the numbers.
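To illustrate the point about column types, here is a minimal sketch using a made-up two-column excerpt (the column names `mixed` and `clean` and the row label `Period` are assumptions for illustration, not names from the original file). A single date string in a column is enough to keep the whole column non-numeric (`object` dtype in most pandas versions):

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical excerpt: "mixed" has a date string above a number,
# while "clean" holds only numbers.
raw = io.StringIO(
    "label,mixed,clean\n"
    "Period,2003-12,1.0\n"
    "Gross Margin %,57.3,54.3\n"
)
sample = pd.read_csv(raw, index_col=0)

# Inspect the inferred type of each column.
print(sample.dtypes)

mixed_is_numeric = is_numeric_dtype(sample["mixed"])
clean_is_numeric = is_numeric_dtype(sample["clean"])
print(mixed_is_numeric, clean_is_numeric)
```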

I also recommend running a simple df.head() to check your data before doing even simple analyses. It will save you a lot of time in the future when you're wondering why your output is "weird".
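As a rough sketch of that kind of check, the snippet below looks at `head()` and `dtypes` and then lists the rows containing text that does not parse as a number. The inline CSV is a hypothetical excerpt modeled on values shown in this thread, and the variable names are illustrative only:

```python
import io

import pandas as pd

# Hypothetical excerpt modeled on the values shown in this thread.
csv_text = (
    "label,Y0,Y1\n"
    'Revenue USD Mil,"1,466","3,189"\n'
    "Gross Margin %,57.3,54.3\n"
)
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# First look: the values and the type pandas inferred for each column.
print(df.head())
print(df.dtypes)

# Try a numeric conversion; cells that fail become NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")

# A cell is a problem if it had a value but did not parse as a number.
unparsed = coerced.isna() & df.notna()
flags = unparsed.any(axis=1)
problem_rows = flags[flags].index.tolist()
print(problem_rows)
```

Here the comma-formatted revenue figures are the cells that fail to parse, which points directly at the formatting issue to fix.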

That said, you can do the following to convert things to numeric values, but the result isn't guaranteed to make sense:

In [35]: df = pd.read_csv('GOOG Key Ratios.csv', skiprows=2, index_col=0, names=['Y%d' % i for i in range(11)])

In [36]: df.head() # not homogeneously typed columns
Out[36]:
                               Y0       Y1       Y2       Y3       Y4  \
NaN                       2003-12  2004-12  2005-12  2006-12  2007-12
Revenue USD Mil             1,466    3,189    6,139   10,605   16,594
Gross Margin %               57.3     54.3     58.1     60.2     59.9
Operating Income USD Mil      342      640    2,017    3,550    5,084
Operating Margin %           23.4     20.1     32.9     33.5     30.6

                               Y5       Y6       Y7       Y8       Y9     Y10
NaN                       2008-12  2009-12  2010-12  2011-12  2012-12     TTM
Revenue USD Mil            21,796   23,651   29,321   37,905   50,175  55,797
Gross Margin %               60.4     62.6     64.5     65.2     58.9    56.7
Operating Income USD Mil    6,632    8,312   10,381   11,742   12,760  12,734
Operating Margin %           30.4     35.1     35.4     31.0     25.4    22.8

In [37]: df.convert_objects(convert_numeric=True).head()
Out[37]:
                             Y0     Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8    Y9   Y10
NaN                         NaN    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Revenue USD Mil             NaN    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Gross Margin %             57.3   54.3  58.1  60.2  59.9  60.4  62.6  64.5  65.2  58.9  56.7
Operating Income USD Mil  342.0  640.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Operating Margin %         23.4   20.1  32.9  33.5  30.6  30.4  35.1  35.4  31.0  25.4  22.8
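Two observations on the output above. The rows that stay NaN are the ones whose values contain a thousands separator (e.g. 2,017), which is why Operating Income converts only for 342 and 640. Also, `convert_objects` was deprecated and is not available in pandas 1.0 and later. A minimal sketch of two alternatives follows, assuming the comma is the only formatting problem; the inline CSV is a hypothetical excerpt modeled on the values shown here, not the real file, so `skiprows` and `names` are omitted:

```python
import io

import pandas as pd

# Hypothetical excerpt modeled on the table above (not the real file).
csv_text = (
    "label,2003-12,2004-12,2005-12\n"
    'Revenue USD Mil,"1,466","3,189","6,139"\n'
    "Gross Margin %,57.3,54.3,58.1\n"
    'Operating Income USD Mil,342,640,"2,017"\n'
)

# Option 1: tell read_csv that "," is a thousands separator.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, thousands=",")
row_means = df.mean(axis=1)
print(row_means)

# Option 2: for data already loaded as text, strip the commas and
# convert each column; anything still unparseable becomes NaN.
text_df = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype=str)
cleaned = text_df.apply(
    lambda col: pd.to_numeric(col.str.replace(",", "", regex=False), errors="coerce")
)
cleaned_means = cleaned.mean(axis=1)
print(cleaned_means)
```

Option 1 is the simpler choice when the file can be re-read. Option 2 makes the coercion explicit, so it is easier to see which cells ended up as NaN.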
