Python: How to import an xlsx file using numpy

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/29438631/

Time: 2020-08-19 04:33:33  Source: igfitidea

Tags: python, csv, numpy, xlsx

Asked by godofamerica

I have no trouble importing csv data using numpy, but I keep getting an error for my xlsx file. How do I convert the xlsx file to csv, or how do I import the xlsx file into the x2 variable?

from matplotlib import pyplot as pp
import numpy as np

#this creates a line graph comparing flight arrival time, arrival in queue, and processing time

x,y = np.loadtxt ('LAX_flights.csv',
                unpack = True,
                usecols = (1,2),
                delimiter = ',')

print("Imported data set arrival time")

x2 = np.loadtext ('First_Persons_PT.xlsx',
               unpack = True,
               usecols=(0))

print ("Imported start of processing time")


#y2=
#print ("Imported final time when processed")

pp.plot(x,y, 'g', linewidth = 1)
#pp.plot(x2,y, 'y', linewidth = 1)
pp.grid(b=True, which = 'major', color='0', linestyle='-')

pp.title('Comparing Time of Arrival vs. Queue Arrival Time, Queue Finish Time')
pp.ylabel('Arrival in queue (Green),Process Time (Yellow)')
pp.xlabel('Time of arrival')

pp.savefig('line_graph_comparison.png')

Here is the error:

Imported data set arrival time
Traceback (most recent call last):
  File "C:\Users\fkrueg1\Dropbox\forest_python_test\Graph_time_of_arrival.py", line 13, in <module>
    x2 = np.loadtext ('First_Persons_PT.xlsx',
AttributeError: 'module' object has no attribute 'loadtext'

The xlsx is just a single column of about 100 numbers.

Answered by David Heffernan

The function's name is loadtxt, not loadtext. That explains the error that you report.

However, loadtxt won't be able to read an Office Open XML .xlsx file. An .xlsx file is not plain text: it is a ZIP archive of XML parts, and a rather complex format at that. You will need a module dedicated to reading such files. For instance, openpyxl can read .xlsx files, as can xlrd in versions before 2.0 (later xlrd releases read only the older .xls format).

Depending on your requirements, it may be easier to supply a text file rather than a .xlsx file.
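
If the column can be exported from Excel as plain text or CSV, the call from the question should work once the function name is corrected. A minimal sketch, assuming a hypothetical export named First_Persons_PT.csv with one number per line and no header; the snippet writes a small stand-in file with made-up values so that it runs on its own:

```python
import os
import tempfile

import numpy as np

# Stand-in for a text/CSV export of the spreadsheet column (hypothetical
# file name and values; one number per line, no header row).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "First_Persons_PT.csv")
with open(csv_path, "w") as f:
    f.write("12.5\n13.0\n14.25\n")

# Note the spelling: loadtxt, not loadtext.
x2 = np.loadtxt(csv_path, usecols=0)

print(x2)
```

With a single column there is nothing to unpack, so the result is a one-dimensional array that can be passed straight to pp.plot.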

Answered by Mark Mikofski

NumPy does not have any commands to read Excel documents. Instead, use openpyxl for OpenXML (Excel >= 2007) or xlrd for xls (and xlsx in xlrd versions before 2.0), as @David Heffernan suggests. You can use pip to install either. From the openpyxl documentation example:

>>> import numpy as np
>>> from openpyxl import load_workbook
>>> wb = load_workbook('First_Persons_PT.xlsx', read_only=True)
>>> print(wb.sheetnames)
['Sheet1', 'Sheet2', 'Sheet3']
>>> ws = wb['Sheet1']  # get_sheet_by_name() is deprecated in newer openpyxl
>>> use_col = 0  # column index from each row to get value of
>>> x2 = np.array([r[use_col].value for r in ws.iter_rows()])

See my posts on reading Excel in Python.
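
A self-contained variant of the same idea, offered as a sketch rather than a definitive recipe. It first writes a small workbook as a stand-in for First_Persons_PT.xlsx (the values are made up), then reads the first column into a NumPy array and also saves it as CSV, which covers the "convert to csv" part of the question. It assumes openpyxl is installed and that the sheet has no header row.

```python
import os
import tempfile

import numpy as np
from openpyxl import Workbook, load_workbook

# Build a stand-in workbook with a single column of numbers
# (hypothetical data; the real file would already exist).
tmp_dir = tempfile.mkdtemp()
xlsx_path = os.path.join(tmp_dir, "First_Persons_PT.xlsx")
wb_out = Workbook()
ws_out = wb_out.active
for value in (1.5, 2.0, 3.25):
    ws_out.append([value])  # one value per row, column A
wb_out.save(xlsx_path)

# Read column A back; values_only=True yields plain cell values.
wb = load_workbook(xlsx_path, read_only=True)
ws = wb[wb.sheetnames[0]]
rows = ws.iter_rows(min_col=1, max_col=1, values_only=True)
x2 = np.array([row[0] for row in rows], dtype=float)
wb.close()

# Optional: write the column out as CSV so np.loadtxt can read it later.
csv_path = os.path.join(tmp_dir, "First_Persons_PT.csv")
np.savetxt(csv_path, x2, delimiter=",")

print(x2)
```

Restricting iter_rows to one column avoids loading cells that are never used, which matters more for wide sheets than for a single column of about 100 numbers.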

Answered by Ahmad Saadeddin

import numpy as np
import pandas as pd

WS = pd.read_excel('ur.xlsx')  # returns a DataFrame
WS_np = np.array(WS)  # convert the DataFrame to a NumPy array

Using pandas is simpler.
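
One caveat with read_excel: by default it treats the first row as column headers, so for a sheet that is just a column of numbers the first value would become a column name rather than data. A hedged sketch of one way around that, assuming pandas and openpyxl are installed and using a made-up stand-in file:

```python
import os
import tempfile

import pandas as pd

# Write a stand-in workbook: one column of numbers, no header row
# (hypothetical data and file name).
tmp_dir = tempfile.mkdtemp()
xlsx_path = os.path.join(tmp_dir, "First_Persons_PT.xlsx")
pd.DataFrame([1.5, 2.0, 3.25]).to_excel(xlsx_path, header=False, index=False)

# header=None keeps the first row as data; usecols=[0] selects column A.
df = pd.read_excel(xlsx_path, header=None, usecols=[0])

# Take the single column as a 1-D float array.
x2 = df.iloc[:, 0].to_numpy(dtype=float)

print(x2)
```

Selecting the column with iloc gives a one-dimensional array, whereas np.array on the whole DataFrame would give a two-dimensional one with a single column.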