Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/36370247/

Date: 2020-09-14 00:59:22  Source: igfitidea

How to convert Excel file data into a numpy array using pandas?

Tags: python, numpy, pandas, keras

Asked by Rian Zaman

I am really new to the keras library and also to Python. I am trying to import an Excel file using pandas and convert it to a numpy.ndarray using the as_matrix() function of pandas. But it seems to read my file wrong. I have a 90x1049 data set in the Excel file, but when I try to convert it into a numpy array, it reads my data as 89x1049. I am using the following code, which is not working:

import pandas as pd

training_data_x = pd.read_excel("/home/workstation/ANN/new_input.xlsx")
X_train = training_data_x.as_matrix()

Accepted answer by Ilja Everilä

Probably what happens is that your Excel file has no header row, so pandas.read_excel consumes your first data row as the header.

I tried creating an xlsx containing

1   2   3
2   3   4
3   4   5
4   5   6
5   6   7
6   7   8
7   8   9
8   9   10
9   10  11
10  11  12

Reading that resulted in

In [3]: df = pandas.read_excel('test.xlsx')

In [4]: df
Out[4]: 
    1   2   3
0   2   3   4
1   3   4   5
2   4   5   6
3   5   6   7
4   6   7   8
5   7   8   9
6   8   9  10
7   9  10  11
8  10  11  12

As can be seen, the first data row has been used as labels for columns.

To avoid consuming the first data row as headers, pass header=None to read_excel. Interestingly, the documentation did not mention this usage before, but it has since been fixed:

header: int, list of ints, default 0

Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there is no header.
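
The effect of the header argument can be reproduced without an Excel file. The sketch below is only an illustration and rests on one assumption: it uses pandas.read_csv on an in-memory string as a stand-in for read_excel, since both accept header=None in the same way. The variable names are made up for the example.

```python
import io

import pandas as pd

# Ten rows of data with no header row, mirroring the test sheet above.
raw = "\n".join(f"{i},{i + 1},{i + 2}" for i in range(1, 11))

# Default header=0: the first data row is consumed as the column labels.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every row is kept as data and columns are numbered from 0.
with_none = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)  # (9, 3)
print(with_none.shape)     # (10, 3)
```

The one-row difference matches the 89 versus 90 rows reported in the question.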

Answer by pylang

If you have no header, try the following:

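import pandas as pd

training_data_x = pd.read_excel("/home/workstation/ANN/new_input.xlsx", header=None)
# as_matrix() was removed in pandas 1.0; on newer versions use to_numpy() instead.
X_train = training_data_x.as_matrix()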
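
On current pandas versions the conversion step itself looks slightly different, because as_matrix() no longer exists. A minimal sketch, assuming pandas 0.24 or later and using a small hand-built DataFrame as a hypothetical stand-in for the frame returned by read_excel:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame returned by read_excel(..., header=None).
training_data_x = pd.DataFrame(np.arange(12).reshape(4, 3))

# to_numpy() needs pandas >= 0.24; the .values attribute also works on older versions.
X_train = training_data_x.to_numpy()

print(type(X_train).__name__, X_train.shape)  # ndarray (4, 3)
```

The resulting array keeps the same number of rows and columns as the DataFrame, so the shape can be checked right after loading.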

See also answers from a previous question.
