Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36133716/

Date: 2020-09-14 00:54:37  Source: igfitidea

How to skip reading empty files with pandas in Python

Tags: python, pandas, dataframe

Asked by Leukonoe

I read all the files in one folder, one by one, into a DataFrame and then check them for some conditions. There are a few thousand files, and I would love to make pandas raise an exception when a file is empty, so that my reader function skips that file.

I have something like:

class StructureReader(FileList):
    def __init__(self, dirname, filename):
        self.dirname=dirname
        self.filename=str(self.dirname+"/"+filename)
    def read(self):
        self.data = pd.read_csv(self.filename, header=None, sep = ",")
        if len(self.data)==0:
           raise ValueError
class Run(object):
    def __init__(self, dirname):
        self.dirname=dirname
        self.file__list=FileList(dirname)
        self.result=Result()
    def run(self):
        for k in self.file__list.file_list[:]:
            self.b=StructureReader(self.dirname, k)
            try:
                self.b.read()
                self.b.find_interesting_bonds(self.result)
                self.b.find_same_direction_chain(self.result)
            except ValueError:
                pass

A regular file that I search for these conditions looks like:

"A/C/24","A/G/14","WW_cis",,
"B/C/24","A/G/15","WW_cis",,
"C/C/24","A/F/11","WW_cis",,
"d/C/24","A/G/12","WW_cis",,

But somehow the ValueError never gets raised, and my functions end up searching empty files, which gives me a lot of "Empty DataFrame ..." lines in my results file. How can I make the program skip empty files?

Answered by Yaron

I'd first check whether the file is empty, and only if it isn't would I try to read it with pandas. Following this link, https://stackoverflow.com/a/15924160/5088142, you can find a nice way to check if a file is empty:

import os
def is_non_zero_file(fpath):  
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0
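
As a rough sketch of how this check could be applied to a whole folder before anything is handed to pandas: the helper name `non_empty_files` below is hypothetical (it is not part of the original answer), and the sketch assumes that "empty" means a size of zero bytes.

```python
import os


def is_non_zero_file(fpath):
    # True only for regular files that contain at least one byte.
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0


def non_empty_files(dirname):
    """Return the sorted paths of the regular, non-empty files in dirname.

    Hypothetical helper for illustration; subdirectories are ignored.
    """
    paths = (os.path.join(dirname, name) for name in sorted(os.listdir(dirname)))
    return [p for p in paths if is_non_zero_file(p)]
```

A reader loop could then iterate over `non_empty_files(dirname)` instead of over every file name. Note that a file holding only blank lines has a non-zero size, so it would pass this check.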

Answered by DevShark

You should not use pandas for this; use the Python standard library directly. The answer is in this question: "python how to check file empty or not".

Answered by Ahmad M.

You can get your work done with the following code: just assign the path of your CSV folder to the path variable and run it. You should get an object raw_data, which is a pandas DataFrame.

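import glob
import os

import pandas as pd

path = "/home/username/data_folder"
files_list = glob.glob(os.path.join(path, "*.csv"))

for file_path in files_list:
    try:
        raw_data = pd.read_csv(file_path)
    except pd.errors.EmptyDataError:
        # read_csv raises this when the file has no data to parse.
        # pandas.errors is the public location of the exception (pandas >= 0.20);
        # older releases exposed it as pandas.io.common.EmptyDataError.
        print(file_path, "is empty and has been skipped.")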
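
The same idea can be wrapped in a small helper so that a reader method simply gets `None` for files to skip. This is only a sketch: the name `read_csv_or_none` is hypothetical, and it assumes a pandas version that exposes `pd.errors.EmptyDataError` (0.20 or newer). It also treats a file that parses to zero rows, such as one with a header line only, as empty, since that case returns an empty DataFrame instead of raising.

```python
import pandas as pd


def read_csv_or_none(path, **kwargs):
    """Read a CSV file; return None if it holds no data rows.

    Hypothetical helper: extra keyword arguments (for example header=None)
    are passed straight through to pd.read_csv.
    """
    try:
        data = pd.read_csv(path, **kwargs)
    except pd.errors.EmptyDataError:
        # Nothing to parse at all, e.g. a zero-byte file.
        return None
    # A file that parsed but has no rows gives an empty DataFrame.
    return None if data.empty else data
```

In a loop over files, a caller could then write `data = read_csv_or_none(file_path)` and `continue` whenever the result is `None`.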

Answered by Nick Mortimer

How about this:

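import glob
import os
import pandas as pd
files = glob.glob('*.csv')
files = list(filter(lambda file: os.stat(file).st_size > 0, files))  # keep non-empty files only
data = [pd.read_csv(file) for file in files]  # read_csv takes one path at a time, not a list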
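
If a single combined table is wanted, the non-empty files can be read one at a time and concatenated. The following is a minimal sketch under that assumption; `load_non_empty` is a hypothetical name, and it presumes all files share the same columns.

```python
import glob
import os

import pandas as pd


def load_non_empty(pattern, **kwargs):
    """Read every non-empty file matching pattern into one DataFrame.

    Hypothetical helper: keyword arguments are forwarded to pd.read_csv.
    Returns an empty DataFrame when no file qualifies.
    """
    # Skip zero-byte files before pandas ever sees them.
    files = sorted(f for f in glob.glob(pattern) if os.stat(f).st_size > 0)
    frames = [pd.read_csv(f, **kwargs) for f in files]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

For files like the one in the question, the call might look like `load_non_empty("*.csv", header=None)`.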