Is there a Python module for opening SPSS files?

Notice: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/14647006/

Date: 2020-08-18 12:02:42  Source: igfitidea

Is there a Python module to open SPSS files?

Tags: python, dataset, statistics, python-module, spss

Asked by Lamps1829

Is there a module for Python to open IBM SPSS (i.e. .sav) files? It would be great if there's something up-to-date which doesn't require any additional dll files/libraries.

Answer by JKP

But the benefit of using the IBM libraries is that they get this rather complex binary file format right. They are free, relieve you of the burden of writing code for this format, and the license permits you to redistribute them. What more could you ask?

Answer by Jeromy Anglim

You could use a Python interface to R and then import the data using read.spss in library(foreign).

Answer by chl

Depending on what you want to do (process the data using R-related commands from rpy2, or switch to Python), the solution provided by @Spacedman on a related thread might easily be adapted to suit your needs.

Otherwise, Pandas includes a convenient wrapper for rpy2. Here is an example of use with Peat and Barton's weights.sav data set:

>>> import pandas.rpy.common as com
>>> filename = "weights.sav"
>>> w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
>>> w = com.convert_robj(w)
>>> w.head()
     ID  WEIGHT  LENGTH  HEADC  GENDER  EDUCATIO              PARITY
1  L001    3.95    55.5   37.5  Female  tertiary  3 or more siblings
2  L003    4.63    57.0   38.5  Female  tertiary           Singleton
3  L004    4.75    56.0   38.5    Male    year12          2 siblings
4  L005    3.92    56.0   39.0    Male  tertiary         One sibling
5  L006    4.56    55.0   39.5    Male    year10          2 siblings

Answer by Savage Henry

A note for people finding this later (like me): pandas.rpy has been deprecated in the newest versions of pandas (>0.16), as noted here. That page includes information on updating code to use the rpy2 interface.

Answer by 4ilin

Here are packages you are probably interested in

Answer by Courtney

I had the same question as @Pyderman about how to update this for pandas (>0.16). This is what I came up with:

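# Requires R with the "foreign" package, plus rpy2.
from rpy2.robjects import pandas2ri, r

filename = 'weights.sav'
# Run R's foreign::read.spss and get back an R data.frame.
w = r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
# Convert the R data.frame to a pandas DataFrame.
# ri2py is the rpy2 2.x name; rpy2 3.x renamed it to rpy2py.
df = pandas2ri.ri2py(w)
df.head()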
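
The `%s` substitution above pastes the file name straight into R source code, so a Windows path with backslashes, or a name containing a double quote, would produce a broken R expression. Below is a minimal sketch of one way to guard against that, assuming R's usual backslash escaping inside double-quoted strings. The helper names `r_string_literal`, `build_read_spss_call` and `read_spss_via_r` are made up for illustration and are not part of rpy2.

```python
def r_string_literal(text):
    """Quote text as a double-quoted R string literal."""
    # Escape backslashes first, then double quotes, so the escapes
    # added for quotes are not escaped a second time.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_read_spss_call(filename):
    """Build the R expression used in the answers above for one file."""
    return "foreign::read.spss(%s, to.data.frame=TRUE)" % r_string_literal(filename)


def read_spss_via_r(filename):
    """Evaluate the expression in R and return the resulting R data.frame."""
    # Imported lazily so the two helpers above can be used (and tested)
    # on a machine that has neither R nor rpy2 installed.
    from rpy2.robjects import r
    return r(build_read_spss_call(filename))


if __name__ == "__main__":
    print(build_read_spss_call("weights.sav"))
```

The object returned by `read_spss_via_r` is still an R data.frame; converting it to pandas is done with `pandas2ri` as in the answer above.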

Answer by Otto Fajardo

I have released a Python package, "pyreadstat", that reads SPSS (sav, zsav and por), Stata and SAS files. It is a wrapper around the C library ReadStat, so it is very fast. ReadStat is the library behind the R package haven, which is widely used and very robust.

The package is self-contained. It does not require R (no need to install an additional application) and it does not depend on IBM DLLs or other external libraries.

For example, to read an SPSS sav file you would do:

import pyreadstat

df, meta = pyreadstat.read_sav("/path/to/sav/file.sav")

df is a pandas DataFrame. meta contains metadata such as variable labels and value labels. read_sav reads both sav and zsav (compressed) files. There is also a function read_por for old por (portable) files.
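
To illustrate what the value labels in `meta` are for, here is a minimal sketch that turns the numeric codes of one column into their labels. It assumes that `meta.variable_value_labels` is a dict of the form `{variable_name: {code: label}}` and that `read_sav` accepts a `usecols` argument, which is how I understand pyreadstat's API; the helper names `label_values` and `read_sav_labelled` are hypothetical. pyreadstat also documents an `apply_value_formats` option that is meant to do this for you.

```python
def label_values(codes, value_labels):
    """Replace each code with its label; codes without a label are kept as-is."""
    return [value_labels.get(code, code) for code in codes]


def read_sav_labelled(path, column):
    """Read one column of a .sav file and return its values as labels."""
    # Imported lazily so label_values can be tried without pyreadstat installed.
    import pyreadstat

    df, meta = pyreadstat.read_sav(path, usecols=[column])
    # Assumed layout: {variable_name: {code: label}}.
    labels = meta.variable_value_labels.get(column, {})
    return label_values(df[column].tolist(), labels)


if __name__ == "__main__":
    # Made-up codes and labels, only to show the mapping step.
    print(label_values([1, 2, 1, 9], {1: "Female", 2: "Male"}))
```

Keeping the mapping in a separate function makes it easy to see that unlabelled codes (9 in the demo) pass through unchanged.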

You can find it here: https://github.com/Roche/pyreadstat

Answer by Sander van den Oord

If you have pandas >= 0.25.0, you can now simply use pd.read_spss():

# you need pandas >= 0.25.0 for this    
import pandas as pd
df = pd.read_spss('your_spss_file.sav')

This requires the pyreadstat library, so you might have to install it first:

pip install pyreadstat

Extra info on the parameters of pd.read_spss():

Parameters
----------
path: string or Path
File path

usecols: list-like, optional
Return a subset of the columns. If None, return all columns.

convert_categoricals: bool, default is True
Convert categorical columns into pd.Categorical.

Returns
-------
DataFrame
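
The parameters above can be combined with the version requirement in a small wrapper. This is only a sketch: `supports_read_spss` and `read_spss_subset` are hypothetical helper names, and the version check simply compares the leading "major.minor" part of the pandas version string against 0.25.

```python
import re

MIN_VERSION = (0, 25)  # first pandas release that ships pd.read_spss


def supports_read_spss(version):
    """Return True if a pandas version string such as '1.5.3' is at least 0.25."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return (int(match.group(1)), int(match.group(2))) >= MIN_VERSION


def read_spss_subset(path, columns=None, convert_categoricals=True):
    """Read an SPSS file with pd.read_spss, optionally keeping only some columns."""
    # Imported lazily so the version helper works without pandas installed.
    import pandas as pd

    if not supports_read_spss(pd.__version__):
        raise RuntimeError("pd.read_spss needs pandas >= 0.25.0")
    # usecols must be list-like (or None for all columns).
    return pd.read_spss(path, usecols=columns,
                        convert_categoricals=convert_categoricals)


if __name__ == "__main__":
    print(supports_read_spss("0.24.2"), supports_read_spss("0.25.0"))
```

Calling `read_spss_subset('your_spss_file.sav', columns=['ID', 'WEIGHT'], convert_categoricals=False)` would then return only those two columns with the raw codes instead of pd.Categorical values; the column names here are borrowed from the weights.sav example earlier on the page.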
