pandas: How to process Excel file headers using pandas/Python
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/22010670/
How to process excel file headers using pandas/python
Asked by felix
I am trying to read https://www.whatdotheyknow.com/request/193811/response/480664/attach/3/GCSE%20IGCSE%20results%20v3.xlsx using pandas.
Having saved it, my script is:
import sys
import pandas as pd

inputfile = sys.argv[1]  # path to the saved .xlsx file
xl = pd.ExcelFile(inputfile)
# print(xl.sheet_names)
df = xl.parse(xl.sheet_names[0])  # first sheet, default header row (row 0)
print(df.head())
However, this does not seem to process the headers properly, as it gives:
  GCSE and IGCSE1 results2,3 in selected subjects4 of pupils at the end of key stage 4 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10
0                              Year: 2010/11 (Final)                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN
1                                  Coverage: England                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN
2                                                NaN                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN
3  1. Includes International GCSE, Cambridge Inte...                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN
4  2. Includes attempts and achievements by these...                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN
All of this should be treated as comments.
If you load the spreadsheet into LibreOffice, for example, you can see that the column headings are correctly parsed and appear in row 15, with drop-down menus that let you select the items you want.
How can you get pandas to automatically detect where the column headers are, just as LibreOffice does?
Answered by DSM
pandas is (are?) processing the file correctly, and exactly the way you asked it (them?) to. You didn't specify a header value, which means that it defaults to picking up the column names from the 0th row. The first few rows of cells aren't comments in any fundamental way; they're just not cells you're interested in.
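To illustrate the header parameter mentioned above, here is a minimal sketch, assuming a reasonably recent pandas in which pd.read_excel accepts sheet_name and header. The helper name read_sheet_with_header is hypothetical and not part of pandas. For the spreadsheet in the question, passing header_row=14 should select the same header row as skipping the first 14 rows, though this has not been checked against the file here.

```python
import pandas as pd


def read_sheet_with_header(path_or_buffer, header_row, sheet=0):
    """Read one sheet, taking column names from the 0-indexed row `header_row`.

    Hypothetical convenience wrapper around pd.read_excel. Rows above
    `header_row` are discarded and rows below it become the data.
    """
    return pd.read_excel(path_or_buffer, sheet_name=sheet, header=header_row)
```

Calling read_sheet_with_header("GCSE IGCSE results v3.xlsx", header_row=14) would be one way to use it on the file above.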
Simply tell parse you want to skip some rows:
>>> xl = pd.ExcelFile("GCSE IGCSE results v3.xlsx")
>>> df = xl.parse(xl.sheet_names[0], skiprows=14)
>>> df.columns
Index([u'Local Authority Number', u'Local Authority Name', u'Local Authority Establishment Number', u'Unique Reference Number', u'School Name', u'Town', u'Number of pupils at the end of key stage 4', u'Number of pupils attempting a GCSE or an IGCSE', u'Number of students achieving 8 or more GCSE or IGCSE passes at A*-G', u'Number of students achieving 8 or more GCSE or IGCSE passes at A*-A', u'Number of students achieving 5 A*-A grades or more at GCSE or IGCSE'], dtype='object')
>>> df.head()
   Local Authority Number Local Authority Name  \
0                     201       City of london   
1                     201       City of london   
2                     202               Camden   
3                     202               Camden   
4                     202               Camden   
   Local Authority Establishment Number  Unique Reference Number  \
0                               2016005                   100001   
1                               2016007                   100003   
2                               2024104                   100049   
3                               2024166                   100050   
4                               2024196                   100051   
                       School Name    Town  \
0  City of London School for Girls  London   
1            City of London School  London   
2                Haverstock School  London   
3           Parliament Hill School  London   
4               Regent High School  London   
  Number of pupils at the end of key stage 4  \
0                                        105   
1                                        140   
2                                        200   
3                                        172   
4                                        174   
  Number of pupils attempting a GCSE or an IGCSE  \
0                                            104   
1                                            140   
2                                            194   
3                                            169   
4                                            171   
  Number of students achieving 8 or more GCSE or IGCSE passes at A*-G  \
0                                                100                    
1                                                108                    
2                                               SUPP                    
3                                                 22                    
4                                                  0                    
  Number of students achieving 8 or more GCSE or IGCSE passes at A*-A  \
0                                                 87                    
1                                                 75                    
2                                                  0                    
3                                                  7                    
4                                                  0                    
  Number of students achieving 5 A*-A grades or more at GCSE or IGCSE  
0                                                100                   
1                                                123                   
2                                                  0                   
3                                                 34                   
4                                               SUPP                    
[5 rows x 11 columns]
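pandas does not guess the header row for you, but if you would rather not hard-code the number 14, one possible heuristic is to read the sheet with header=None and treat the first row in which nearly every cell is filled as the header. The sketch below is only an assumption about what "looks like a header"; the function names find_header_row and promote_header and the min_filled threshold are hypothetical, and the heuristic can fail on sheets whose preamble rows are also fully populated.

```python
import numpy as np
import pandas as pd


def find_header_row(raw, min_filled=0.9):
    """Return the position of the first row whose share of non-empty
    cells is at least `min_filled` (a heuristic, not a pandas feature)."""
    filled = raw.notna().mean(axis=1).to_numpy()
    positions = np.flatnonzero(filled >= min_filled)
    if positions.size == 0:
        raise ValueError("no row looks like a header")
    return int(positions[0])


def promote_header(raw, header_row):
    """Use row `header_row` as column names and keep only the rows below it."""
    body = raw.iloc[header_row + 1:].reset_index(drop=True)
    body.columns = raw.iloc[header_row].tolist()
    # Columns read without a header come back as generic objects;
    # let pandas re-infer better dtypes where it can.
    return body.infer_objects()
```

With the ExcelFile object from above, this could be used as raw = xl.parse(xl.sheet_names[0], header=None) followed by df = promote_header(raw, find_header_row(raw)).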

