Way to convert dbf to csv in Python (pandas)?

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/32772447/

Way to convert dbf to csv in python?

Tags: python, csv, pandas, dbf

Asked by Stefano Potter

I have a folder with a bunch of dbf files I would like to convert to csv. I have tried using a code to just change the extension from .dbf to .csv, and these files open fine when I use Excel, but when I open them in pandas they look like this:

                                                s\t?
0                                                NaN
1            1       176 1.58400000000e+005-3.385...

This is not what I want, and those characters don't appear in the real file.
How should I read in the dbf file correctly?
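
For context, a .dbf file is a binary format (a fixed header, a list of field descriptors, then fixed-width records), so renaming it to .csv does not convert its contents, and a CSV parser will misread the bytes. The sketch below parses the header of a tiny in-memory sample using only the standard library. It assumes the commonly documented dBASE III layout; the function names and the sample data are made up for illustration, and for real files one of the libraries in the answers is the safer choice.

```python
import struct


def read_dbf_header(raw: bytes) -> dict:
    """Parse the 32-byte DBF file header and the field descriptors after it."""
    # Bytes 4-7: record count, 8-9: header length, 10-11: record length (little-endian).
    n_records, header_len, record_len = struct.unpack("<IHH", raw[4:12])
    fields = []
    pos = 32
    # Each field descriptor is 32 bytes; the list ends at a 0x0D terminator byte.
    while pos < header_len and raw[pos] != 0x0D:
        name = raw[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        field_type = chr(raw[pos + 11])
        length = raw[pos + 16]
        fields.append((name, field_type, length))
        pos += 32
    return {
        "records": n_records,
        "header_len": header_len,
        "record_len": record_len,
        "fields": fields,
    }


def build_sample_dbf() -> bytes:
    """Build a tiny dBASE III style file with one character field (illustrative data)."""
    rows = [b"ARGENTINA", b"BOLIVIA"]
    width = 10
    header_len = 32 + 32 + 1   # file header + one field descriptor + terminator
    record_len = 1 + width     # deletion flag + the single field
    header = struct.pack("<BBBBIHH20x", 0x03, 120, 9, 13,
                         len(rows), header_len, record_len)
    descriptor = struct.pack("<11sc4xBB14x", b"COUNTRY", b"C", width, 0)
    body = b"".join(b" " + row.ljust(width) for row in rows)
    return header + descriptor + b"\x0d" + body + b"\x1a"


info = read_dbf_header(build_sample_dbf())
print(info)
```

The header already carries the column names and widths, which is why a dedicated reader can rebuild the table while a plain text parser cannot.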

Accepted answer by Andy Hayden

Looking online, there are a few options:

With simpledbf:

from simpledbf import Dbf5

dbf = Dbf5('fake_file_name.dbf')
df = dbf.to_dataframe()


Tweaked from the gist:

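import pandas as pd
import pysal as ps  # pysal 1.x API (newer releases expose this reader as libpysal.io.open)

def dbf2DF(dbfile, upper=True):
    "Read dbf file and return pandas DataFrame"
    with ps.open(dbfile) as db:  # I suspect just using open will work too
        # Pass columns= so the column order matches the dbf header
        df = pd.DataFrame({col: db.by_col(col) for col in db.header}, columns=db.header)
        if upper:
            # Use a list: on Python 3, map() returns an iterator rather than a list
            df.columns = [col.upper() for col in db.header]
        return df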

Answer by Ethan Furman

Using my dbf library you could do something like:

import sys
import dbf
for arg in sys.argv[1:]:
    dbf.export(arg)

which will create a .csv file of the same name as each dbf file. If you put that code into a script named dbf2csv.py you could then call it as

python dbf2csv.py dbfname dbf2name dbf3name ...

Answer by Yang Qi

Here is my solution that I've been using for years. I have a solution for Python 2.7 and one for Python 3.5 (probably also 3.6).

Python 2.7:

import csv
from dbfpy import dbf

def dbf_to_csv(out_table):  # Input a dbf, output a csv
    csv_fn = out_table[:-4] + ".csv"  # Same path and name, with a .csv extension
    with open(csv_fn, 'wb') as csvfile:  # Python 2: the csv module expects binary mode
        in_db = dbf.Dbf(out_table)
        out_csv = csv.writer(csvfile)
        names = []
        for field in in_db.header.fields:  # Write headers
            names.append(field.name)
        out_csv.writerow(names)
        for rec in in_db:  # Write records
            out_csv.writerow(rec.fieldData)
        in_db.close()
    return csv_fn

Python 3.5:

import csv
from dbfread import DBF

def dbf_to_csv(dbf_table_pth):  # Input a dbf, output a csv: same name and path, different extension
    csv_fn = dbf_table_pth[:-4] + ".csv"  # Set the csv file name
    table = DBF(dbf_table_pth)  # table variable is a DBF object
    with open(csv_fn, 'w', newline='') as f:  # Create a csv file, fill it with dbf content
        writer = csv.writer(f)
        writer.writerow(table.field_names)  # Write the column names
        for record in table:  # Write the rows
            writer.writerow(list(record.values()))
    return csv_fn  # Return the csv name

Both dbfpy and dbfread can be installed with pip.
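
Since the question is about a whole folder of dbf files, a per-file converter such as dbf_to_csv above can be applied to each file in turn. The following is a minimal sketch for Python 3; convert_folder is a hypothetical helper name, and it only assumes that the converter takes a path string and returns something (here, the csv file name).

```python
from pathlib import Path


def convert_folder(folder, convert_one):
    """Call convert_one(path_string) for every .dbf file directly inside folder."""
    results = []
    for path in sorted(Path(folder).iterdir()):
        # Match the extension case-insensitively so both .dbf and .DBF are picked up.
        if path.is_file() and path.suffix.lower() == ".dbf":
            results.append(convert_one(str(path)))
    return results


# Example call (dbf_to_csv is the single-file function from this answer):
# csv_files = convert_folder("path/to/folder", dbf_to_csv)
```

Passing the converter in as an argument keeps the folder loop independent of which dbf library does the actual reading.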

Answer by Alessandro Trinca Tornidor

EDIT#2:

It's possible to read a dbf file line by line, without converting it to csv, using dbfread (install it with pip install dbfread):

>>> from dbfread import DBF
>>> for row in DBF('southamerica_adm0.dbf'):
...     print row
... 
OrderedDict([(u'COUNTRY', u'ARGENTINA')])
OrderedDict([(u'COUNTRY', u'BOLIVIA')])
OrderedDict([(u'COUNTRY', u'BRASIL')])
OrderedDict([(u'COUNTRY', u'CHILE')])
OrderedDict([(u'COUNTRY', u'COLOMBIA')])
OrderedDict([(u'COUNTRY', u'ECUADOR')])
OrderedDict([(u'COUNTRY', u'GUYANA')])
OrderedDict([(u'COUNTRY', u'GUYANE')])
OrderedDict([(u'COUNTRY', u'PARAGUAY')])
OrderedDict([(u'COUNTRY', u'PERU')])
OrderedDict([(u'COUNTRY', u'SURINAME')])
OrderedDict([(u'COUNTRY', u'U.K.')])
OrderedDict([(u'COUNTRY', u'URUGUAY')])
OrderedDict([(u'COUNTRY', u'VENEZUELA')])
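
Because each row arrives as a dict-like record, the rows can also be streamed straight into a csv file if a csv is still wanted. The sketch below is written for Python 3 and uses only the standard library; records_to_csv is a hypothetical helper name, and it assumes every record has the same keys as the first one.

```python
import csv


def records_to_csv(records, csv_path):
    """Write an iterable of dict-like records to csv_path; return the row count."""
    rows = iter(records)
    first = next(rows, None)
    if first is None:
        # No records, so no header can be inferred: leave an empty file.
        open(csv_path, "w", newline="").close()
        return 0
    with open(csv_path, "w", newline="") as f:
        # The keys of the first record become the header row.
        writer = csv.DictWriter(f, fieldnames=list(first.keys()))
        writer.writeheader()
        writer.writerow(first)
        count = 1
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


# Example call with dbfread (requires: from dbfread import DBF):
# records_to_csv(DBF('southamerica_adm0.dbf'), 'southamerica_adm0.csv')
```

Rows are written one at a time, so the whole table never has to be held in memory.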

My updated references:

official project site: http://pandas.pydata.org

official documentation: http://pandas-docs.github.io/pandas-docs-travis/

dbfread: https://pypi.python.org/pypi/dbfread/2.0.6

geopandas: http://geopandas.org/

shp and dbf with geopandas: https://gis.stackexchange.com/questions/129414/only-read-specific-attribute-columns-of-a-shapefile-with-geopandas-fiona