pandas: How to skip reading empty files with pandas in Python

Question

Question by Leukonoe

I read all the files in one folder, one by one, into a DataFrame and then check them for some conditions. There are a few thousand files, and I would love to make pandas raise an exception when a file is empty, so that my reader function would skip that file.

I have something like:

import pandas as pd

# FileList and Result are the asker's own classes (not shown in the question).

class StructureReader(FileList):
    def __init__(self, dirname, filename):
        self.dirname = dirname
        self.filename = self.dirname + "/" + filename

    def read(self):
        self.data = pd.read_csv(self.filename, header=None, sep=",")
        if len(self.data) == 0:
            raise ValueError


class Run(object):
    def __init__(self, dirname):
        self.dirname = dirname
        self.file__list = FileList(dirname)
        self.result = Result()

    def run(self):
        for k in self.file__list.file_list[:]:
            self.b = StructureReader(self.dirname, k)
            try:
                self.b.read()
                self.b.find_interesting_bonds(self.result)
                self.b.find_same_direction_chain(self.result)
            except ValueError:
                pass  # skip this file and move on to the next one

A regular file that I search for these conditions looks like this:

"A/C/24","A/G/14","WW_cis",,
"B/C/24","A/G/15","WW_cis",,
"C/C/24","A/F/11","WW_cis",,
"d/C/24","A/G/12","WW_cis",,

But somehow I never get a ValueError raised, and my functions end up searching empty files, which puts a lot of "Empty DataFrame ..." lines in my results file. How can I make the program skip empty files?

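For reference, a zero-byte file does not produce an empty DataFrame: `pd.read_csv` raises `pandas.errors.EmptyDataError`, and that class is a subclass of `ValueError`, so an `except ValueError` around `read()` already catches truly empty files. A minimal check, assuming pandas 0.20 or later (where `pandas.errors` exists); the temporary file is created only for this demonstration:

```python
import os
import tempfile

import pandas as pd

# Create a zero-byte file purely for demonstration.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    empty_path = tmp.name

raised = None
try:
    pd.read_csv(empty_path, header=None, sep=",")
except pd.errors.EmptyDataError as exc:
    # EmptyDataError subclasses ValueError, so `except ValueError` catches it too.
    raised = exc
finally:
    os.remove(empty_path)

print(type(raised).__name__)                             # EmptyDataError
print(issubclass(pd.errors.EmptyDataError, ValueError))  # True
```

If "Empty DataFrame" lines still appear, they presumably come from input files that are not zero bytes, or from a later processing step, rather than from zero-byte files.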

Answer 1

Answer by Yaron

I'd first check whether the file is empty, and only if it isn't would I read it with pandas. This answer, https://stackoverflow.com/a/15924160/5088142, shows a nice way to check whether a file is empty:

import os
def is_non_zero_file(fpath):  
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0
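The size check can be placed in a directory loop so that zero-byte files are skipped before pandas ever opens them. A minimal sketch, assuming every file of interest sits directly in one folder; `iter_non_empty_files` is a hypothetical helper name, and the `read_csv` options are copied from the question. A non-zero size only means the file has some bytes, so a file holding just a newline would still be handed to pandas.

```python
import os

import pandas as pd


def is_non_zero_file(fpath):
    """True if fpath is a regular file containing at least one byte."""
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0


def iter_non_empty_files(dirname):
    """Hypothetical helper: yield (name, DataFrame) for each non-empty file in dirname."""
    for name in sorted(os.listdir(dirname)):
        fpath = os.path.join(dirname, name)
        if not is_non_zero_file(fpath):
            continue  # skip zero-byte files (and subdirectories) before pandas opens them
        # Same read options as in the question.
        yield name, pd.read_csv(fpath, header=None, sep=",")
```

Because it is a generator, only one DataFrame is held at a time, which suits a folder with thousands of files.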

Answer 2

Answer by DevShark

You shouldn't use pandas for this check; use the Python standard library directly. The answer is here: "python how to check file empty or not".

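If "empty" should also cover files that contain only whitespace or blank lines, the standard library can look at the content instead of only the size. A sketch under that assumption; `is_effectively_empty` is a hypothetical name, and reading each file's text is cheap for small files like the ones in the question but does cost a full read per file.

```python
from pathlib import Path


def is_effectively_empty(path):
    """Hypothetical check: True for zero-byte files and files holding only whitespace."""
    p = Path(path)
    if p.stat().st_size == 0:
        return True  # cheap size check first, no read needed
    return p.read_text().strip() == ""
```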

Answer 3

Answer by Ahmad M.

You can get the job done with the following code: set the path variable to the folder that holds your CSVs and run it. On each pass through the loop, raw_data is a pandas DataFrame for the current non-empty file; empty files are reported and skipped.

import glob
import os

import pandas as pd

path = "/home/username/data_folder"
files_list = glob.glob(os.path.join(path, "*.csv"))

for fname in files_list:
    try:
        raw_data = pd.read_csv(fname)
    except pd.errors.EmptyDataError:  # older pandas exposed this as pandas.io.common.EmptyDataError
        print(fname, "is empty and has been skipped.")
        continue
    # raw_data holds only the current file; process it here before the next iteration.
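Because raw_data is overwritten on every iteration, one way to keep the results is to collect them per file. A minimal sketch, assuming default `read_csv` options (first line is a header); `load_csvs` is a hypothetical helper name. It also skips files that parse but have no data rows, such as a header-only file, since those yield an empty DataFrame rather than an exception.

```python
import glob
import os

import pandas as pd


def load_csvs(path):
    """Hypothetical helper: read every *.csv in path, skipping files with no data."""
    frames = {}
    skipped = []
    for fname in sorted(glob.glob(os.path.join(path, "*.csv"))):
        try:
            df = pd.read_csv(fname)
        except pd.errors.EmptyDataError:
            skipped.append(fname)  # zero-byte file: nothing for pandas to parse
            continue
        if df.empty:
            skipped.append(fname)  # parsed, but no data rows (e.g. header line only)
            continue
        frames[fname] = df
    return frames, skipped
```

Returning the skipped names instead of printing them makes it easy to log or count them afterwards.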

Answer 4

Answer by Nick Mortimer

How about this:

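import glob, os
import pandas as pd
files = glob.glob('*.csv')
files = list(filter(lambda file: os.stat(file).st_size > 0, files))  # keep only non-empty files
data = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)  # read_csv takes one file at a time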
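Since `pd.read_csv` reads a single file, combining several files into one DataFrame means concatenating the per-file results with `pd.concat`. One caveat: `pd.concat` raises `ValueError` when it receives no objects, which happens if every matched file is empty or nothing matches the glob. A small guard, with `concat_or_empty` as a hypothetical helper name:

```python
import pandas as pd


def concat_or_empty(frames):
    """Hypothetical helper: concatenate DataFrames, or return an empty one if there are none."""
    frames = list(frames)
    if not frames:
        # pd.concat raises ValueError for an empty sequence, so short-circuit here.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the filtered `files` list from above, usage would be `data = concat_or_empty(pd.read_csv(f) for f in files)`.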
