Pandas Read Excel: how to access a given cell by column and row numbers

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/37236743/

Time: 2020-09-14 01:14:28  Source: igfitidea

python excel pandas

Asked by pHorseSpec

Using the Pandas module and the read_excel function, could I give each column I read in from an Excel file a number as its column header, so that instead of using g_int_c=str(df1['Unnamed: 1'][6]) to refer to a piece of the data in the Excel file, I could use g_int_c=str(df1[1][6])?

Example code is below:

import pandas as pd

with pd.ExcelFile(inputFile,
                  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
    df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])#assign column headers
    df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])

Answered by unutbu

To obtain nice column names instead of defaults like 'Unnamed: 1', use the names parameter of pd.read_excel. Mutatis mutandis, try replacing

with pd.ExcelFile(inputFile,
                  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
    df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])#assign column headers
    df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])

with

sheets = ['pnl1 Data','pnl2 Data','pnl3 Data','pnl4 Data']
df = pd.read_excel(inputFile, sheetname=sheets, skiprows=9, parse_cols="B:H", 
                   names=list('BCDEFG'))
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}

This will make df a dict whose keys are sheet numbers and whose values are DataFrames. The DataFrames will have column names B through G, roughly like the original Excel file.

Thus, instead of referring to numbered variables df1, ..., df4 (generally a bad idea), you'll have all the DataFrames in the dict df and will be able to access them by numeric indexing: df[1], ..., df[4]. Sheet pnl3 Data, for example, would be accessed as df[3].
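The re-keying step can be sketched without an Excel file. In the sketch below, the two small frames and their values are made up; they stand in for the dict of per-sheet DataFrames that read_excel returns when given a list of sheet names. Only the sheet names come from the answer.

```python
import pandas as pd

# Hypothetical stand-ins for the per-sheet DataFrames that read_excel
# returns (as a dict keyed by sheet name) when given a list of sheets.
sheets = ['pnl1 Data', 'pnl2 Data']
by_name = {
    'pnl1 Data': pd.DataFrame({'B': [10, 11], 'C': [20, 21]}),
    'pnl2 Data': pd.DataFrame({'B': [30, 31], 'C': [40, 41]}),
}

# Re-key the dict by sheet number (starting at 1) instead of sheet name.
df = {i: by_name[sheet] for i, sheet in enumerate(sheets, 1)}

print(sorted(df))         # [1, 2]
print(df[2].loc[0, 'B'])  # 30
```

Keeping the frames in one dict means a loop over df.items() can process every sheet, which numbered variables would not allow.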

To access the seventh row, B column value of sheet 'pnl1 Data', you could then use:

g_int_c = str(df[1].loc[6, 'B'])


For example,

import pandas as pd
try: from cStringIO import StringIO         # for Python2
except ImportError: from io import StringIO # for Python3
import textwrap
df1 = pd.read_csv(StringIO(textwrap.dedent("""
          ,,,
          0,1,2,3
          1,4,5,6
          7,8,9,10""")))
df2 = pd.read_csv(StringIO(textwrap.dedent("""
          ,,,
          0,NULL,2,3
          1,4,NULL,NULL""")), converters={i:str for i in range(4)})

sheets = ['pnl1 Data','pnl2 Data']

writer = pd.ExcelWriter('/tmp/output.xlsx')
for df, sheet in zip([df1, df2], sheets):
    print(df)
    #   Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3
    # 0          0       NULL          2          3
    # 1          1          4       NULL       NULL
    df.to_excel(writer, sheet)
writer.save()

df = pd.read_excel('/tmp/output.xlsx', sheetname=sheets, names=list('ABCD'), parse_cols="A:E")
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}

for key, dfi in df.items():
    print(dfi)
    #    A  B  C   D
    # 0  0  1  2   3
    # 1  1  4  5   6
    # 2  7  8  9  10
    #    A    B    C    D
    # 0  0  NaN  2.0  3.0
    # 1  1  4.0  NaN  NaN

print(df[1].loc[1, 'B'])
# 4

Answered by user6275647

From the looks of your question, this isn't about assigning number values to columns upon import, but instead about how to access a given cell of a table by column and row numbers, which is a question specifically about how to index or slice a dataframe by integer.

In your example, you mentioned wanting to refer to df1[1][6]. You can do this by using .iloc.

import numpy as np
import pandas as pd

# spin up a df
df = pd.DataFrame(np.random.randint(0,10,size=(7, 7)), columns=list('ABCDEFG'))
print(df)

Output:

   A  B  C  D  E  F  G
0  0  7  7  8  8  2  2
1  8  2  9  1  6  8  1
2  5  3  5  5  9  2  7
3  7  4  2  1  1  5  0
4  0  4  4  1  9  7  1
5  4  2  7  7  9  7  2
6  0  6  7  8  1  4  1

Now use .iloc to index by integer:

df.iloc[1,6] 

Output:

1

To return to your code above, you could most likely change it to the following:

g_int_c=str(df.iloc[6,1])  # .iloc takes [row, column]; df1[1][6] means column 1, row 6
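One detail worth checking when translating df1[1][6] into .iloc is the argument order: chained indexing picks the column first and then the row, while .iloc takes the row position first and the column position second. A minimal sketch, using a made-up frame with integer column labels (the values are illustrative only):

```python
import pandas as pd

# A small made-up frame with integer column labels 0..2 and 7 rows.
df1 = pd.DataFrame({0: range(0, 7), 1: range(10, 17), 2: range(20, 27)})

# Chained indexing: column label 1 first, then row label 6.
chained = df1[1][6]

# .iloc is purely positional and takes (row, column).
by_position = df1.iloc[6, 1]

# .iat is the scalar-only counterpart of .iloc.
scalar = df1.iat[6, 1]

print(chained, by_position, scalar)  # 16 16 16
```

Because .iloc works on positions, it gives the same cell whatever the column labels happen to be, including defaults such as 'Unnamed: 1'.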

For general references, here's the documentation on indexing and slicing dataframes: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer

And this Q&A might be helpful: How to get column by number in Pandas?

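As a rough illustration of selecting a whole column by number, the sketch below uses a made-up three-column frame. It shows two options that should be equivalent here: positional selection with .iloc, and looking up the label through df.columns.

```python
import pandas as pd

# Made-up data; the column labels are arbitrary strings.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

second_col = df.iloc[:, 1]    # Series for the column at position 1
label = df.columns[1]         # label of the column at position 1
same_col = df[label]          # the same column, selected by label

print(label)                  # B
print(second_col.tolist())    # [4, 5, 6]
```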

Answered by pHorseSpec

header=None, names=[0,1,2,3,4,5,6] worked.

with pd.ExcelFile(inputFile,
                  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
    df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])#assign column headers
    df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])
    df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])
    df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])
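The effect of header=None with integer names can be tried without an Excel file. The sketch below uses pd.read_csv on an in-memory string, on the assumption that header, names, keep_default_na and na_values behave the same way there as in pd.read_excel; the data is made up. It passes keep_default_na as the boolean False, since a non-empty string such as 'FALSE' is truthy in Python. In newer pandas releases, read_excel spells the sheet and column arguments sheet_name and usecols rather than sheetname and parse_cols.

```python
import io
import pandas as pd

# Made-up data with no header row; 'NULL' marks a missing value.
raw = "1,2,NULL\n4,NULL,6\n7,8,9\n"

df1 = pd.read_csv(
    io.StringIO(raw),
    header=None,            # the first line is data, not column names
    names=[0, 1, 2],        # integer column labels
    keep_default_na=False,  # boolean, not the string 'FALSE'
    na_values=['NULL'],     # only 'NULL' counts as missing
)

# Integer labels allow the df1[column][row] form from the question.
g_int_c = str(df1[1][2])
print(g_int_c)  # 8.0 (the column is float because it contains a NaN)
```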