Original question: http://stackoverflow.com/questions/18173873/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
NaN results with pandas mean function
Asked by abuteau
I am trying to compute the mean of a row in my Python DataFrame, but I get NaN for every row. Why do I get this result, and how can I fix it?
GOOG key ratios: http://www.gogofile.com/Default.aspx?p=sc&ID=635118193040317500_6234
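import pandas as pd

path = 'GOOG Key Ratios.csv'
# print(open(path).read())
# Skip the first two lines of the file and name the data columns Y0..Y10
data = pd.read_csv(path, skiprows=2, names=['Y0', 'Y1', 'Y2', 'Y3', 'Y4', 'Y5', 'Y6', 'Y7', 'Y8', 'Y9', 'Y10'], index_col=0)
noTTM = data.iloc[:, 0:10]  # keep Y0..Y9, dropping the trailing TTM column
print(data.mean(1))  # row-wise mean
grossMargin = noTTM[2:3]  # the "Gross Margin %" row
print(grossMargin.mean(1))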
Return:
Gross Margin %   NaN
dtype: float64
Regards,
Answered by Phillip Cloud
The reason you have a bunch of NaN values is that your columns are not homogeneously typed. Averaging across the columns doesn't make sense here, because pandas.read_csv will only convert a column to a numeric type when every value in it can be parsed as a number, i.e., when the column does not mix string dates or other text with numbers.
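As a small illustration of that point, the sketch below reads two tiny in-memory CSVs. The data and the names `clean_csv` and `mixed_csv` are made up for this example; only the single date-like string mirrors the situation in the question.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data: the same column with and without a date-like string in it.
clean_csv = "label,value\na,1.5\nb,2.5\n"
mixed_csv = "label,value\nperiod,2003-12\na,1.5\nb,2.5\n"

clean = pd.read_csv(io.StringIO(clean_csv), index_col=0)
mixed = pd.read_csv(io.StringIO(mixed_csv), index_col=0)

# Every value parses as a number, so read_csv produces a float column.
clean_is_numeric = is_numeric_dtype(clean["value"])

# One non-numeric string is enough to keep the whole column as text.
mixed_is_numeric = is_numeric_dtype(mixed["value"])

print(clean_is_numeric, mixed_is_numeric)
```

In the question's file the row of dates plays the role of the `period` line, which is why none of the Y columns end up numeric.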
I also recommend that you run a simple df.head() to check your data before doing even simple analyses. It will save you a lot of time in the future when you're wondering why your output is "weird".
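A quick inspection along those lines might look like the sketch below. The helper `non_numeric_columns` and the sample frame are hypothetical, written only for this illustration; they are not part of pandas or of the original answer.

```python
import pandas as pd


def non_numeric_columns(frame):
    """Return the names of columns that pandas did not parse as numbers."""
    numeric = frame.select_dtypes(include="number").columns
    return [name for name in frame.columns if name not in numeric]


# Hypothetical frame: Y0 mixes a date, a comma-formatted number and a float.
df = pd.DataFrame({
    "Y0": ["2003-12", "1,466", "57.3"],
    "Y1": [1.0, 2.0, 3.0],
})

print(df.head())    # look at the first rows
print(df.dtypes)    # look at the column types

suspects = non_numeric_columns(df)
print(suspects)
```

Any column listed in `suspects` is one where a row-wise mean cannot be computed without converting the values first.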
That said, you can do the following to convert things to numeric values, but the result isn't guaranteed to make sense:
In [35]: df = read_csv('GOOG Key Ratios.csv', skiprows=2, index_col=0, names=['Y%d' % i for i in range(11)])
In [36]: df.head() # not homogeneously typed columns
Out[36]:
                               Y0       Y1       Y2       Y3       Y4  \
NaN                       2003-12  2004-12  2005-12  2006-12  2007-12
Revenue USD Mil             1,466    3,189    6,139   10,605   16,594
Gross Margin %               57.3     54.3     58.1     60.2     59.9
Operating Income USD Mil      342      640    2,017    3,550    5,084
Operating Margin %           23.4     20.1     32.9     33.5     30.6
                               Y5       Y6       Y7       Y8       Y9     Y10
NaN                       2008-12  2009-12  2010-12  2011-12  2012-12     TTM
Revenue USD Mil            21,796   23,651   29,321   37,905   50,175  55,797
Gross Margin %               60.4     62.6     64.5     65.2     58.9    56.7
Operating Income USD Mil    6,632    8,312   10,381   11,742   12,760  12,734
Operating Margin %           30.4     35.1     35.4     31.0     25.4    22.8
In [37]: df.convert_objects(convert_numeric=True).head()
Out[37]:
                             Y0     Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8    Y9   Y10
NaN                         NaN    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Revenue USD Mil             NaN    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Gross Margin %             57.3   54.3  58.1  60.2  59.9  60.4  62.6  64.5  65.2  58.9  56.7
Operating Income USD Mil  342.0  640.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
Operating Margin %         23.4   20.1  32.9  33.5  30.6  30.4  35.1  35.4  31.0  25.4  22.8
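Note that convert_objects was removed in later pandas releases. A rough modern equivalent, sketched here under assumptions, uses pd.to_numeric with errors="coerce". The inline CSV is a three-column excerpt of the values shown in the output above, the column name `metric` is made up for this example, and stripping the thousands separators is an extra step that the original answer does not perform.

```python
import io

import pandas as pd

# Excerpt of the answer's df.head() output, rebuilt as an in-memory CSV.
csv_text = (
    ',2003-12,2004-12,2005-12\n'
    'Revenue USD Mil,"1,466","3,189","6,139"\n'
    'Gross Margin %,57.3,54.3,58.1\n'
)
df = pd.read_csv(io.StringIO(csv_text), index_col=0,
                 names=["metric", "Y0", "Y1", "Y2"])

# Remove thousands separators so strings such as "1,466" can be parsed.
without_commas = df.replace(",", "", regex=True)

# Coerce each column to numbers; unparseable strings (the dates) become NaN.
numeric = without_commas.apply(pd.to_numeric, errors="coerce")

# Row-wise means now work for the metric rows; the date row stays NaN.
row_means = numeric.mean(axis=1)
print(row_means)
```

Without the comma-stripping step, values such as "2,017" would be coerced to NaN, which matches the gaps in the Operating Income row above.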

