Python Pandas - 读取 CSV 或 Excel

Question

提问by KingOfLeon

I'm allowing users to upload a CSV or Excel file. I'm using pandas to read the file and create a dataframe. Since I can't predict which filetype the user will upload, I wrapped pd.read_csv() and pd.read_excel() in a try/except block.

我允许用户上传 CSV 或 Excel 文件。我正在使用 Pandas 来读取文件并创建一个数据框。由于我无法预测用户将上传哪种文件类型，我将 pd.read_csv() 和 pd.read_excel() 包装在 try/except 块中。

if form.validate_on_submit():
    input_filename = secure_filename(form.file.data.filename)
    try:
        df = pd.read_csv(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')
    except:
        df = pd.read_excel(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')

If pd.read_csv() is first in the try/except block and I upload a .csv file it works. If I attempt to upload a .xlsx file, I get this error:

如果 pd.read_csv() 是 try/except 块中的第一个，并且我上传了一个 .csv 文件，它就可以工作。如果我尝试上传 .xlsx 文件，则会收到此错误：

TypeError: expected str, bytes or os.PathLike object, not NoneType

If pd.read_excel() is first in the try/except block and I upload an .xlsx file it works. If I attempt to upload a .csv file, I get this error:

如果 pd.read_excel() 是 try/except 块中的第一个，并且我上传了一个 .xlsx 文件，它就可以工作。如果我尝试上传 .csv 文件，则会收到此错误：

pandas.io.common.EmptyDataError: No columns to parse from file

Previously, I used mimetype to route the file to the correct pandas function, but I was hoping for a cleaner (and all encompassing) solution that didn't involve several if/elif statements. This is what I had:

以前，我使用 mimetype 将文件路由到正确的 Pandas 函数，但我希望有一个更清晰（并且包罗万象）的解决方案，它不涉及多个 if/elif 语句。这就是我所拥有的：

if form.file.data.mimetype == 'text/csv':
    df = pd.read_csv(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')
elif form.file.data.mimetype == 'application/octet-stream':
    df = pd.read_excel(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')
elif form.file.data.mimetype == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
    df = pd.read_excel(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')
else:
    flash('Error Uploading File. Invalid file type. Please use xls, xlsx or csv.', 'danger')
    return render_template('upload.html', current_user=current_user, form=form)

I'm using Flask, WTForms and Python 3. Thank you.

我正在使用 Flask、WTForms 和 Python 3。谢谢。

Answer 1

回答by Jonathan Eunice

You are calling read_excelwith keyword args that are useful for read_csvbut not supported by read_excel. Instead you might try:

您正在呼吁read_excel与关键字ARGS是有用的read_csv，但不支持read_excel。相反，您可以尝试：

if form.validate_on_submit():
    input_filename = secure_filename(form.file.data.filename)
    data = form.file.data
    try:
        df = pd.read_csv(data, header=0, skip_blank_lines=True, 
                         skipinitialspace=True, encoding='latin-1')
    except:
        df = pd.read_excel(data, header=0)

In addition to removing the extra args to read_excel, I've also hoisted extracting the data out of form.file.data; this is combat the possibility that that there could be some lazy-load behavior interacting poorly with the try/exceptblock.

除了删除额外的 args 之外read_excel，我还提升了从form.file.data; 中提取数据的方法。这是对抗可能存在与try/except块交互不佳的延迟加载行为的可能性。

In general it is hard to debug moderately complex I/O functions in the midst of web requests. When operations like this don't work, best approach is to split the problem into two parts: 1/ Get data from the web request, write it to a file. Then separately, 2/ try the I/O (in this case, Pandas dataframe load) from the resulting file. Doing this interactively or in a separate program will give you more debugging opportunities and clarity. Jupyter Notebookis excellent for such exploratory tests, though most IDEs or even the bare Python REPL will work. When part 2/ is clearly working, then you can patch it back in under the Flask / web app code.

一般来说，很难在 Web 请求中调试中等复杂的 I/O 功能。当这样的操作不起作用时，最好的方法是将问题分成两部分： 1/ 从 Web 请求中获取数据，将其写入文件。然后分别尝试 2/ 来自结果文件的 I/O（在本例中为 Pandas 数据帧加载）。以交互方式或在单独的程序中执行此操作将为您提供更多调试机会和清晰度。Jupyter Notebook非常适合此类探索性测试，尽管大多数 IDE 甚至裸 Python REPL 都可以使用。当第 2/ 部分明显工作时，您可以在 Flask / Web 应用程序代码下将其修补回去。

Python Pandas - 读取 CSV 或 Excel

提问by KingOfLeon

回答by Jonathan Eunice

相关推荐

最近更新

标签

Python Pandas - 读取 CSV 或 Excel

提问by KingOfLeon

回答by Jonathan Eunice

相关推荐

Python Pandas：每月或每周拆分一个 TimeSerie

pandas 如何避免输出到 jupyter notebook 中的可滚动帧？

pandas Python - 如何从 excel 列创建列表

在 Pandas 中，在 groupby 之后分组列消失了

相关推荐

最近更新

标签