pandas - How to skip reading empty files with pandas in Python
Original URL: http://stackoverflow.com/questions/36133716/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
How to skip reading empty files with pandas in Python
Question by Leukonoe
I read all the files in one folder, one by one, into a DataFrame and then check them for some conditions. There are a few thousand files, and I would like pandas to raise an exception when a file is empty, so that my reader function skips that file.
I have something like:
```python
import pandas as pd

# FileList and Result are the asker's own classes (not shown in the question).


class StructureReader(FileList):
    def __init__(self, dirname, filename):
        self.dirname = dirname
        self.filename = str(self.dirname + "/" + filename)

    def read(self):
        self.data = pd.read_csv(self.filename, header=None, sep=",")
        # Intended to signal "empty file" so that Run.run() skips it
        if len(self.data) == 0:
            raise ValueError


class Run(object):
    def __init__(self, dirname):
        self.dirname = dirname
        self.file__list = FileList(dirname)
        self.result = Result()

    def run(self):
        for k in self.file__list.file_list[:]:
            self.b = StructureReader(self.dirname, k)
            try:
                self.b.read()
                self.b.find_interesting_bonds(self.result)
                self.b.find_same_direction_chain(self.result)
            except ValueError:
                pass  # skip files that raised during reading or processing
```
A regular file that I search for these conditions looks like this:
```
"A/C/24","A/G/14","WW_cis",,
"B/C/24","A/G/15","WW_cis",,
"C/C/24","A/F/11","WW_cis",,
"d/C/24","A/G/12","WW_cis",,
```
But somehow ValueError is never raised, and my functions end up searching the empty files, which leaves a lot of "Empty DataFrame ..." lines in my results file. How can I make the program skip empty files?
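A possible angle on the question, offered as a sketch rather than a diagnosis: in current pandas, `pd.read_csv` on a zero-byte file raises `pandas.errors.EmptyDataError`, and that class is a subclass of `ValueError`, so an `except ValueError` already catches zero-byte files. A file that parses without error but yields no rows (for example a header line only, with the default `header='infer'`) returns an empty DataFrame instead of raising, which is what an explicit emptiness check covers. The helper name `read_nonempty_csv` is hypothetical, not part of pandas.

```python
import pandas as pd


def read_nonempty_csv(path, **kwargs):
    """Hypothetical helper: read a CSV file, raising ValueError if it has no rows.

    A zero-byte file makes pd.read_csv raise pandas.errors.EmptyDataError,
    which is itself a ValueError subclass; a file that parses to zero rows
    is caught by the df.empty check below.
    """
    df = pd.read_csv(path, **kwargs)  # may raise EmptyDataError
    if df.empty:
        raise ValueError(f"{path} contains no data rows")
    return df


# EmptyDataError is a ValueError, so `except ValueError` catches both cases.
print(issubclass(pd.errors.EmptyDataError, ValueError))  # True
```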
Answer by Yaron
I'd first check whether the file is empty, and only if it isn't would I read it with pandas. This answer, https://stackoverflow.com/a/15924160/5088142, shows a nice way to check whether a file is empty:
```python
import os

def is_non_zero_file(fpath):
    # True only for an existing regular file whose size is greater than zero bytes
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0
```
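One way to apply `is_non_zero_file` is as a guard inside the reading loop, so pandas never opens zero-byte files. This is a minimal sketch assuming all files sit in one folder and use the `*.csv` extension; `read_all_csv` is a hypothetical name. The check only looks at file size, so a file containing nothing but a newline still passes it.

```python
import glob
import os

import pandas as pd


def is_non_zero_file(fpath):
    # True only for existing regular files with at least one byte.
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0


def read_all_csv(dirname):
    """Hypothetical helper: map each non-empty *.csv path in dirname to a DataFrame."""
    frames = {}
    for fpath in sorted(glob.glob(os.path.join(dirname, "*.csv"))):
        if not is_non_zero_file(fpath):
            continue  # skip zero-byte files before pandas touches them
        frames[fpath] = pd.read_csv(fpath, header=None)
    return frames
```

The per-file DataFrames can then be checked for conditions one at a time, as in the question.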
Answer by DevShark
You don't need pandas for this check; use the Python standard library directly. The answer is in the question "python how to check file empty or not".
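A standard-library-only sketch of that idea, assuming the files live in a single directory: `os.scandir` yields `DirEntry` objects whose `stat()` result is cached on the entry, so listing the non-empty files needs no pandas at all. The name `nonempty_files` and the `.csv` default suffix are illustrative assumptions.

```python
import os


def nonempty_files(dirname, suffix=".csv"):
    """Hypothetical helper: sorted paths of non-empty regular files ending in suffix."""
    paths = []
    with os.scandir(dirname) as entries:
        for entry in entries:
            # is_file() follows symlinks by default; st_size is in bytes.
            if (
                entry.is_file()
                and entry.name.endswith(suffix)
                and entry.stat().st_size > 0
            ):
                paths.append(entry.path)
    return sorted(paths)
```

Only the returned paths would then be handed to `pd.read_csv`.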
Answer by Ahmad M.
You can get this done with the following code: set the path variable to the folder containing your CSV files and run it. For each non-empty file, raw_data holds that file's contents as a pandas DataFrame; empty files are reported and skipped.
```python
import glob
import os

import pandas as pd

path = "/home/username/data_folder"
files_list = glob.glob(os.path.join(path, "*.csv"))
for fname in files_list:
    try:
        raw_data = pd.read_csv(fname)
    # older pandas versions exposed this as pandas.io.common.EmptyDataError
    except pd.errors.EmptyDataError:
        print(fname, "is empty and has been skipped.")
        continue
    # process raw_data (the DataFrame for the current file) here
```
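Because `raw_data` is overwritten on every iteration, a variant that keeps every non-empty file can collect the frames in a list and concatenate them once at the end. This sketch assumes all files share the same column layout and uses `header=None` as in the question; `load_folder` is a hypothetical name. `pd.concat` raises `ValueError` when given an empty list, so the all-empty case is handled explicitly.

```python
import glob
import os

import pandas as pd


def load_folder(path):
    """Hypothetical helper: concatenate all non-empty CSV files in path."""
    frames = []
    for fpath in sorted(glob.glob(os.path.join(path, "*.csv"))):
        try:
            frames.append(pd.read_csv(fpath, header=None))
        except pd.errors.EmptyDataError:
            print(fpath, "is empty and has been skipped.")
    if not frames:
        # pd.concat([]) raises ValueError, so return an empty DataFrame instead.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```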
Answer by Nick Mortimer
How about this:
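```python
import glob, os
import pandas as pd

files = [f for f in glob.glob('*.csv') if os.stat(f).st_size > 0]  # drop zero-byte files
data = pd.concat([pd.read_csv(f) for f in files])  # read_csv takes one file, so read each and combine
```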
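The size filter only removes zero-byte files; a file holding nothing but line breaks passes it yet contains no data. A sketch that combines the cheap size filter with pandas' own checks, assuming `header=None` as in the question; `read_nonempty` is a hypothetical name, and the glob pattern is a parameter so any folder can be scanned.

```python
import glob
import os

import pandas as pd


def read_nonempty(pattern="*.csv"):
    """Hypothetical helper: {path: DataFrame} for files that contain data rows."""
    result = {}
    # Cheap first pass: drop zero-byte files without parsing them.
    files = [f for f in glob.glob(pattern) if os.stat(f).st_size > 0]
    for f in sorted(files):
        try:
            df = pd.read_csv(f, header=None)
        except pd.errors.EmptyDataError:
            continue  # non-zero size but nothing parseable, e.g. only newlines
        if not df.empty:
            result[f] = df
    return result
```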