Pandas: transform a dbf Table into a dataframe

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/41898561/

Date: 2020-09-14 02:52:34  Source: igfitidea

Tags: python, pandas, dataframe, arcgis, dbf

Asked by FaCoffee

I want to read the dbf file of an ArcGIS shapefile and dump it into a pandas dataframe. I am currently using the dbf package.

I have apparently been able to load the dbf file as a Table, but I have not been able to figure out how to parse it and turn it into a pandas dataframe. How can I do this?

This is where I am stuck:

import dbf
# raw string, so the backslashes in the Windows path are not read as escape sequences
thisTable = dbf.Table(r'C:\Users\myfolder\project\myfile.dbf')
thisTable.open(mode='read-only')

Python returns this statement as output, which I frankly don't know what to make of:

dbf.ver_2.Table('C:\\Users\\myfolder\\project\\myfile.dbf', status='read-only')

EDIT

Sample of my original dbf:

FID   Shape    E              N
0     Point    90089.518711   -201738.245555
1     Point    93961.324059   -200676.766517
2     Point    97836.321204   -199614.270439
...   ...      ...            ...

Answered by Fabio Lamanna

You should have a look at simpledbf:

In [2]: import pandas as pd

In [3]: from simpledbf import Dbf5

In [4]: dbf = Dbf5('test.dbf')

In [5]: df = dbf.to_dataframe()

This works for me with a little sample .dbf file. Hope that helps.

Answered by Philipe Riskalla Leal

As mmann1123 stated, you can use geopandas to read your dbf file. Geopandas reads it whether or not it contains geospatial data.

Assuming your data is purely tabular (no geographical coordinates in it) and you wish to read it and convert it to a format that the pandas library can work with, I would suggest using geopandas.

Here is an example:

import geopandas as gpd
import pandas as pd

My_file_path_name = r'C:\Users\...file_dbf.dbf'

# read the file using the path defined above (the original passed an undefined name, Filename)
Table = gpd.read_file(My_file_path_name)

Pandas_Table = pd.DataFrame(Table)

Keys = list(Table.keys())
# list.remove() takes a single value, so remove the ID attributes one at a time
Keys.remove('ID_1')
Keys.remove('ID_2')
Keys.remove('Date') # you may also have a date attribute that you want to keep out of the melted columns

DS = pd.melt(Pandas_Table,
             id_vars=['ID_1', 'ID_2'],       # identifier columns, repeated on every output row
             var_name='class_fito',          # name of the new column that holds the former column names
             value_name='biomass (mg.L-1)',  # name of the new column that holds the values
             value_vars=Keys)                # columns of the Table to unpivot into rows

# checking your DataFrame:

type(DS)   # should show something like: pandas.core.frame.DataFrame
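
To make the reshaping step above more concrete, here is a minimal sketch in plain Python of what a wide-to-long "melt" does. It is not how pandas implements pd.melt. The helper name melt_rows, the columns class_A and class_B, and all sample values are made up for illustration; only ID_1, ID_2, class_fito and biomass (mg.L-1) come from the answer.

```python
def melt_rows(rows, id_vars, value_vars, var_name, value_name):
    """Unpivot wide rows (dicts) into long rows: one output row per
    (input row, value column) pair, ordered column by column."""
    long_rows = []
    for column in value_vars:
        for row in rows:
            long_row = {key: row[key] for key in id_vars}  # identifiers are repeated
            long_row[var_name] = column         # the former column name becomes a value
            long_row[value_name] = row[column]  # the cell that column held
            long_rows.append(long_row)
    return long_rows


# Hypothetical wide table: two ID columns plus one column per class.
wide = [
    {'ID_1': 1, 'ID_2': 'a', 'class_A': 0.5, 'class_B': 1.5},
    {'ID_1': 2, 'ID_2': 'b', 'class_A': 0.7, 'class_B': 2.0},
]

long_table = melt_rows(wide,
                       id_vars=['ID_1', 'ID_2'],
                       value_vars=['class_A', 'class_B'],
                       var_name='class_fito',
                       value_name='biomass (mg.L-1)')

for row in long_table:
    print(row)
```

Each input row appears once per value column, so two rows and two value columns give four long rows.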

Answered by mmann1123

You might want to look at geopandas. It will allow you to do the most important GIS operations.

http://geopandas.org/data_structures.html

Answered by Dobedani

How about using dbfpy? Here's an example that shows how to load a dbf with 3 columns into a dataframe:

from dbfpy import dbf
import pandas as pd

df = pd.DataFrame(columns=('tileno', 'grid_code', 'area'))
db = dbf.Dbf('test.dbf')
for rec in db:
    data = []
    for i in range(len(rec.fieldData)):
        data.append(rec[i])
    df.loc[len(df.index)] = data
db.close()

If necessary, you could find out the column names from db.fieldNames.
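
Appending to the dataframe row by row can get slow for larger tables. A possible alternative, sketched below, collects the values per column first, using the field names as keys; the resulting dict could then be passed to pd.DataFrame in one step. The function records_to_columns and the sample records are hypothetical, and plain tuples stand in for dbfpy records so that the sketch runs without a dbf file.

```python
def records_to_columns(field_names, records):
    """Collect an iterable of records into a dict of column lists."""
    columns = {name: [] for name in field_names}
    for rec in records:
        # rec[i] is the value of the i-th field, as in the loop above
        for i, name in enumerate(field_names):
            columns[name].append(rec[i])
    return columns


# Stand-ins for db.fieldNames and for iterating over db (made-up values).
field_names = ['tileno', 'grid_code', 'area']
records = [(1, 10, 2.5), (2, 20, 3.5)]

columns = records_to_columns(field_names, records)
print(columns)
# With pandas available, the dataframe is then built in one step:
# df = pd.DataFrame(columns)
```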

Answered by Dobedani

Performance can be an issue. I tested a few of the libraries suggested above and elsewhere. For my test, I used a small dbf file of 17 columns and 23 records (7 kb).

Package simpledbf has a straightforward method, to_dataframe(). The practical aspect of dbfread's DBF table object is that you can iterate over it by passing it to Python's builtin function iter(); the result can be used to initialise a dataframe directly. In the case of pysal, I used the function dbf2DF as described here. The data from the other libraries I added to the dataframe using the method shown above, but only after retrieving the field names so that I could first initialise the dataframe with the right column names: from fieldNames, _meta.keys and the function ListFields, for dbfpy, dbf and arcpy respectively.

Adding records one by one is probably not the fastest way to fill a dataframe, so the tests with dbfpy, dbf and arcpy would give more favourable figures if a smarter way were chosen to add the data to the dataframe. All the same, I hope the following table, with times in seconds, is useful:

Library     Time (s)
simpledbf   0.0030
dbfread     0.0060
dbfpy       0.0140
pysal       0.0160
dbf         0.0210
arcpy       2.7770
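
The answer does not show how the times were measured. One way to collect comparable figures is Python's timeit module; the sketch below assumes each library is wrapped in a zero-argument function that loads the table. The names time_loader and dummy_loader are placeholders and not part of any of the libraries above.

```python
import timeit


def time_loader(loader, repeat=5):
    """Return the best wall-clock time in seconds of `repeat` single runs."""
    # number=1: each measurement is one full load; min() reduces noise
    return min(timeit.repeat(loader, number=1, repeat=repeat))


def dummy_loader():
    # Placeholder: swap in a function that reads your .dbf file with one library
    return [(i, i * 2.0) for i in range(1000)]


loaders = {'dummy': dummy_loader}
timings = {name: time_loader(func) for name, func in loaders.items()}

# Print fastest first, in the same layout as the table above
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{name:<12}{seconds:.4f}')
```

With a file of only 23 records, a single run is very short, so taking the minimum over several repeats gives a steadier number than one measurement.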