Python setuptools: package data folder location

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4519127/

Time: 2020-08-18 16:06:05  Source: igfitidea

python setuptools

Asked by phant0m

I use setuptools to distribute my Python package. Now I need to distribute additional data files.

From what I've gathered from the setuptools documentation, I need to have my data files inside the package directory. However, I would rather have my data files inside a subdirectory of the root directory.

What I would like to avoid:

/ #root
|- src/
|  |- mypackage/
|  |  |- data/
|  |  |  |- resource1
|  |  |  |- [...]
|  |  |- __init__.py
|  |  |- [...]
|- setup.py

What I would like to have instead:

/ #root
|- data/
|  |- resource1
|  |- [...]
|- src/
|  |- mypackage/
|  |  |- __init__.py
|  |  |- [...]
|- setup.py

I just don't feel comfortable having so many subdirectories if they aren't essential. I can't find a reason why I /have/ to put the files inside the package directory. Working with so many nested subdirectories is also cumbersome, IMHO. Or is there a good reason that justifies this restriction?

Accepted answer by samplebias

Option 1: Install as package data

The main advantage of placing data files inside the root of your Python package is that you don't have to worry about where the files will live on a user's system, which may be Windows, Mac, Linux, some mobile platform, or inside an Egg. You can always find the data directory relative to your Python package root, no matter where or how it is installed.

For example, if I have a project layout like so:

project/
    foo/
        __init__.py
        data/
            resource1/
                foo.txt

You can add a function to __init__.py to locate an absolute path to a data file:

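import os

# Directory that contains this __init__.py, i.e. the package root.
_ROOT = os.path.abspath(os.path.dirname(__file__))

def get_data(path):
    # Build an absolute path to a file below the package's data/ directory.
    return os.path.join(_ROOT, 'data', path)

print(get_data('resource1/foo.txt'))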

Outputs:

/Users/pat/project/foo/data/resource1/foo.txt

After the project is installed as an Egg, the path to data will change, but the code doesn't need to change:

/Users/pat/virtenv/foo/lib/python2.6/site-packages/foo-0.0.0-py2.6.egg/foo/data/resource1/foo.txt
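
To see that get_data() keeps working wherever the package ends up, here is a minimal, self-contained sketch. It assumes nothing beyond the standard library: it builds a throwaway copy of the foo layout shown above in a temporary directory, imports it, and reads foo.txt through get_data(). The helper names build_demo_package and demo are hypothetical and not part of the original answer.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of the demo package's __init__.py: the same get_data() helper
# as in the answer above.
INIT_SOURCE = '''\
import os

_ROOT = os.path.abspath(os.path.dirname(__file__))

def get_data(path):
    return os.path.join(_ROOT, 'data', path)
'''


def build_demo_package(base_dir):
    """Create foo/__init__.py and foo/data/resource1/foo.txt under base_dir."""
    pkg = Path(base_dir) / "foo"
    (pkg / "data" / "resource1").mkdir(parents=True)
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    (pkg / "data" / "resource1" / "foo.txt").write_text("hello\n")
    return pkg


def demo():
    """Import the throwaway package and read a data file via get_data()."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(tmp)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            foo = importlib.import_module("foo")
            path = foo.get_data("resource1/foo.txt")
            relative = Path(path).relative_to(tmp).as_posix()
            content = Path(path).read_text()
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("foo", None)
    return relative, content
```

Calling demo() returns the data file's path relative to the temporary install location together with the file's content; the absolute prefix differs on every run, which is exactly the part the calling code never has to know.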


Option 2: Install to fixed location

The alternative would be to place your data outside the Python package and then either:

  1. Have the location of data passed in via a configuration file or command-line arguments, or
  2. Embed the location into your Python code.

This is far less desirable if you plan to distribute your project. If you really want to do this, you can install your data wherever you like on the target system by passing in a list of tuples that specifies the destination for each group of files:

from setuptools import setup
setup(
    ...
    data_files=[
        ('/var/data1', ['data/foo.txt']),
        ('/var/data2', ['data/bar.txt'])
        ]
    )
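
For the first variant of Option 2 (passing the data location in at run time), a minimal sketch might look like the following. Everything named here is an assumption made for illustration: the --data-dir flag, the MYPACKAGE_DATA environment variable, and the resolve_data_dir helper are hypothetical, and the /var/data1 fallback simply reuses the path from the data_files example.

```python
import argparse
import os

# Hypothetical fallback, matching the first data_files destination above.
DEFAULT_DATA_DIR = "/var/data1"


def resolve_data_dir(argv=None, environ=None):
    """Pick the data directory: CLI flag first, then environment, then default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-dir", default=None,
                        help="directory that holds the data files")
    # parse_known_args ignores unrelated options the program may also accept.
    args, _unknown = parser.parse_known_args(argv)
    return args.data_dir or environ.get("MYPACKAGE_DATA") or DEFAULT_DATA_DIR
```

With no arguments the function reads sys.argv and os.environ; passing explicit values, as in resolve_data_dir(["--data-dir", "/tmp/x"], {}), makes the lookup order easy to check.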

Updated: Example of a shell function to recursively grep Python files:

atlas% function grep_py { find . -name '*.py' -exec grep -Hn $* {} \; }
atlas% grep_py ": \["
./setup.py:9:    package_data={'foo': ['data/resource1/foo.txt']}

Answer by lgautier

I think that you can pass basically anything as the data_files argument to setup().

Answer by polvoazul

I think I found a good compromise that lets you maintain the following structure:

/ #root
|- data/
|  |- resource1
|  |- [...]
|- src/
|  |- mypackage/
|  |  |- __init__.py
|  |  |- [...]
|- setup.py

You should install the data as package_data, to avoid the problems described in samplebias's answer, but in order to maintain the file structure you should add this to your setup.py:

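import os
from setuptools import setup

# Temporarily link the top-level data/ directory into the package so that
# setuptools picks it up as package data.
os.symlink('../../data', 'src/mypackage/data')
try:
    setup(
        ...
        package_data = {'mypackage': ['data/*']}
        ...
    )
finally:
    # Always remove the temporary link, even if setup() fails.
    os.unlink('src/mypackage/data')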

This way we create the appropriate structure "just in time" and keep our source tree organized.
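
The create-then-clean-up pattern can also be wrapped in a small context manager. This is only a sketch of the same idea, not part of the original answer: temporary_symlink and demo are hypothetical names, and the demo runs against a throwaway directory tree rather than a real setup.py. It assumes a platform where the current user is allowed to create symlinks.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_symlink(target, link_path):
    """Create link_path pointing at target; remove it again on exit."""
    os.symlink(target, link_path)
    try:
        yield link_path
    finally:
        os.unlink(link_path)


def demo():
    """Rebuild the data/ + src/mypackage/ layout and link data/ in temporarily."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "data"))
        os.makedirs(os.path.join(root, "src", "mypackage"))
        with open(os.path.join(root, "data", "resource1"), "w") as fh:
            fh.write("hello")

        link = os.path.join(root, "src", "mypackage", "data")
        # A relative target is resolved against the directory holding the link.
        with temporary_symlink(os.path.join("..", "..", "data"), link):
            visible_inside = os.path.isfile(os.path.join(link, "resource1"))
        gone_afterwards = not os.path.lexists(link)
    return visible_inside, gone_afterwards
```

In a setup.py, the setup(...) call would go inside the with block, so the link exists only while setuptools collects the package data.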

To access such data files within your code, you 'simply' use:

from pkg_resources import Requirement, resource_filename

data = resource_filename(Requirement.parse("main_package"), 'mypackage/data')

I still don't like having to specify 'mypackage' in the code, as the data might not necessarily have anything to do with this module, but I guess it's a good compromise.
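
As a possible alternative to pkg_resources, Python 3.9 and later ship importlib.resources.files() in the standard library, which can locate data inside an importable package. The sketch below is an assumption-laden illustration rather than part of the original answer: it creates a throwaway mypackage with a data/resource1 file in a temporary directory, and demo is a hypothetical helper name.

```python
import importlib
import sys
import tempfile
from importlib.resources import files  # available since Python 3.9
from pathlib import Path


def demo():
    """Create a throwaway mypackage and read data/resource1 via files()."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "mypackage" / "data"
        data_dir.mkdir(parents=True)
        (Path(tmp) / "mypackage" / "__init__.py").write_text("")
        (data_dir / "resource1").write_text("hello")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # files() imports the package and returns a path-like object
            # rooted at the package directory.
            resource = files("mypackage") / "data" / "resource1"
            return resource.read_text()
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("mypackage", None)
```

As with resource_filename, the package name still has to be spelled out in the code, so this does not remove the compromise described above.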