Pandas Read Excel: how to access a given cell by column and row numbers
Original question: http://stackoverflow.com/questions/37236743/
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Asked by pHorseSpec
Using the Pandas module and the read_excel function, could I give each column I read in from an Excel file a number assignment as a column header, so that instead of using g_int_c=str(df1['Unnamed: 1'][6]) to refer to a piece of the data in the Excel file, I could use g_int_c=str(df1[1][6])?
Example code is below:
import pandas as pd

with pd.ExcelFile(inputFile,
                  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
    df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])#assign column headers
    df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
Answer by unutbu
To obtain nice column names instead of defaults like 'Unnamed: 1', use the names parameter of pd.read_excel. Mutatis mutandis, try replacing
with pd.ExcelFile(inputFile,
                  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
    df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])#assign column headers
    df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
with
# Note: newer pandas versions spell these parameters sheet_name and usecols.
sheets = ['pnl1 Data','pnl2 Data','pnl3 Data','pnl4 Data']
df = pd.read_excel(inputFile, sheetname=sheets, skiprows=9, parse_cols="B:H",
                   names=list('BCDEFG'))
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}
This will make df a dict whose keys are sheet numbers and whose values are DataFrames. The DataFrames will have column names B through G, roughly like the original Excel file.
Thus, instead of referring to numbered variables df1, ..., df4 (generally a bad idea), you'll have all the DataFrames in the dict df and will be able to access them by numeric indexing: df[1], ..., df[4]. Sheet pnl3 Data, for example, would be accessed as df[3].
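As a minimal sketch of the dict-of-DataFrames idea, the snippet below uses two small in-memory frames as hypothetical stand-ins for the sheets, so no Excel file is needed. The raw dict only imitates what pd.read_excel returns when it is given a list of sheet names, and the data values are made up for illustration.

```python
import pandas as pd

sheets = ['pnl1 Data', 'pnl2 Data']

# Hypothetical stand-in for the dict that pd.read_excel returns when it is
# given a list of sheet names: keys are sheet names, values are DataFrames.
raw = {
    'pnl1 Data': pd.DataFrame({'B': [10, 11], 'C': [20, 21]}),
    'pnl2 Data': pd.DataFrame({'B': [30, 31], 'C': [40, 41]}),
}

# Re-key the dict by sheet number, counting from 1.
df = {i: raw[sheet] for i, sheet in enumerate(sheets, 1)}

print(sorted(df))          # the keys are now sheet numbers
print(df[2].loc[0, 'B'])   # first row, column B of 'pnl2 Data'
```

Keeping the frames in one dict means the same code works for any number of sheets, which is the main reason numbered variables are discouraged here.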
To access the seventh row, B column value of sheet 'pnl1 Data', you could then use:
g_int_c = str(df[1].loc[6, 'B'])
For example,
import pandas as pd
try:
    from cStringIO import StringIO  # for Python2
except ImportError:
    from io import StringIO  # for Python3
import textwrap

df1 = pd.read_csv(StringIO(textwrap.dedent("""
    ,,,
    0,1,2,3
    1,4,5,6
    7,8,9,10""")))

df2 = pd.read_csv(StringIO(textwrap.dedent("""
    ,,,
    0,NULL,2,3
    1,4,NULL,NULL""")), converters={i:str for i in range(4)})

sheets = ['pnl1 Data','pnl2 Data']
writer = pd.ExcelWriter('/tmp/output.xlsx')
for df, sheet in zip([df1, df2], sheets):
    print(df)
    #   Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3
    # 0          0       NULL          2          3
    # 1          1          4       NULL       NULL
    df.to_excel(writer, sheet)
writer.save()

df = pd.read_excel('/tmp/output.xlsx', sheetname=sheets, names=list('ABCD'), parse_cols="A:E")
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}

for key, dfi in df.items():
    print(dfi)
    #    A  B  C   D
    # 0  0  1  2   3
    # 1  1  4  5   6
    # 2  7  8  9  10
    #    A    B    C    D
    # 0  0  NaN  2.0  3.0
    # 1  1  4.0  NaN  NaN

print(df[1].loc[1, 'B'])
# 4
Answer by user6275647
From the looks of your question, this isn't about assigning number values to columns upon import, but instead about how to access a given cell of a table by column and row numbers, which is a question specifically about how to index or slice a dataframe by integer.
In your example, you mentioned wanting to refer to df1[1][6]. You can do this by using .iloc.
import numpy as np
import pandas as pd

# spin up a df
df = pd.DataFrame(np.random.randint(0,10,size=(7, 7)), columns=list('ABCDEFG'))
print(df)
Output:
   A  B  C  D  E  F  G
0  0  7  7  8  8  2  2
1  8  2  9  1  6  8  1
2  5  3  5  5  9  2  7
3  7  4  2  1  1  5  0
4  0  4  4  1  9  7  1
5  4  2  7  7  9  7  2
6  0  6  7  8  1  4  1
Now use .iloc to index by integer:
df.iloc[1,6]
Output:
1
To return to your code above, you could most likely change it to the following:
g_int_c=str(df.iloc[1,6])
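One thing to keep in mind when translating between the two notations: chained indexing such as df1[1][6] selects the column first and the row second, whereas .iloc takes the row position first and the column position second. The sketch below illustrates this on a made-up 7x7 frame with integer column labels and the default integer row index; under those assumptions, df[1][6] and df.iloc[6, 1] point at the same cell.

```python
import numpy as np
import pandas as pd

# Deterministic 7x7 frame with integer column labels 0..6 (made-up data).
df = pd.DataFrame(np.arange(49).reshape(7, 7))

by_label = df[1][6]          # column labelled 1, then row labelled 6
by_position = df.iloc[6, 1]  # row position 6, column position 1

print(by_label, by_position)  # both refer to the same cell
print(df.iloc[1, 6])          # a different cell: row 1, column 6
```

So whether .iloc[1, 6] or .iloc[6, 1] is the right replacement depends on which of the two numbers was meant as the row.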
For general references, here's the documentation on indexing and slicing dataframes: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer
And this Q&A might be helpful: How to get column by number in Pandas?
Answer by pHorseSpec
Passing header=None,names=[0,1,2,3,4,5,6] to read_excel worked:
with pd.ExcelFile(inputFile,
                  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
    df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])#assign column headers
    df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])
    df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])
    df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])
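To see what header=None together with integer names does without needing an Excel file, here is a small sketch that uses pd.read_csv on an in-memory string as a stand-in; read_csv accepts the same header and names parameters. The data is made up, and only three columns are used to keep it short.

```python
import io
import pandas as pd

# Made-up data standing in for the spreadsheet contents; no header row.
text = "1,2,3\n4,5,6\n7,8,9\n"

# header=None: do not treat any row as a header.
# names=[0, 1, 2]: label the columns with integers instead.
df1 = pd.read_csv(io.StringIO(text), header=None, names=[0, 1, 2])

# Columns can now be addressed by number: column 1, then row 2.
g_int_c = str(df1[1][2])
print(g_int_c)
```

With integer column labels in place, the df1[1][6] style of access from the question works as written.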