Python: Cannot find col function in pyspark
Note: this page reproduces a popular StackOverflow question and its answers. The content is provided under the CC BY-SA 4.0 license; you are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/40163106/
Cannot find col function in pyspark
Asked by Bamqf
In pyspark 1.6.2, I can import the col function with
from pyspark.sql.functions import col
but when I try to look it up in the GitHub source code I find no col function in the functions.py file. How can Python import a function that doesn't exist?
Accepted answer by zero323
It exists. It just isn't explicitly defined. Functions exported from pyspark.sql.functions are thin wrappers around JVM code and, with a few exceptions which require special treatment, are generated automatically using helper methods.
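Assuming pyspark itself is installed, a quick runtime check (a sketch, not part of the original answer) confirms that the attribute really exists, even though it is never written out literally in functions.py:

from pyspark.sql import functions as F

print(hasattr(F, "col"))       # True - the wrapper is created when the module is imported
print(callable(F.col))         # True
print("col" in F.__all__)      # True - it is exported along with the other generated functions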
If you carefully check the source you'll find col listed among the other _functions. This dictionary is further iterated and _create_function is used to generate wrappers. Each generated function is directly assigned to a corresponding name in globals.
Finally, __all__, which defines the list of items exported from the module, simply exports all globals excluding the ones contained in the blacklist.
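Putting those pieces together, here is a minimal sketch of that generation pattern. It is simplified and is not pyspark's actual source; the names _functions and _create_function are only borrowed from it for illustration:

# Simplified sketch of the pattern, not the real pyspark source
_functions = {
    "col": "Returns a Column based on the given column name.",
    "lit": "Creates a Column of literal value.",
}

def _create_function(name, doc=""):
    # The real wrapper delegates to the JVM; here we only illustrate the shape
    def _(col):
        return "{0}({1})".format(name, col)
    _.__name__ = name
    _.__doc__ = doc
    return _

# Assign each generated wrapper to its name in the module globals
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)

# Export every public name created above
__all__ = sorted(k for k in globals() if not k.startswith("_"))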
If this mechanism is still not clear, you can create a toy example:
1. Create a Python module called foo.py with the following content:

# Creates a function assigned to the name foo
globals()["foo"] = lambda x: "foo {0}".format(x)

# Exports all entries from globals which start with foo
__all__ = [x for x in globals() if x.startswith("foo")]

2. Place it somewhere on the Python path (for example in the working directory).

3. Import foo:

from foo import foo
foo(1)
An undesired side effect of such a metaprogramming approach is that the defined functions might not be recognized by tools that depend purely on static code analysis. This is not a critical issue and can be safely ignored during development.
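If the noise comes from a static checker such as pylint (an assumption; the exact tool and message depend on your setup), one common stopgap is to silence the check at the import site:

from pyspark.sql.functions import col  # pylint: disable=no-name-in-module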
Depending on the IDE, installing type annotations might resolve the problem (see for example zero323/pyspark-stubs#172).
Answered by Dmytro
As of VS Code 1.26.1, this can be solved by modifying the python.linting.pylintArgs setting:
"python.linting.pylintArgs": [
"--generated-members=pyspark.*",
"--extension-pkg-whitelist=pyspark",
"--ignored-modules=pyspark.sql.functions"
]
That issue was explained on github: https://github.com/DonJayamanne/pythonVSCode/issues/1418#issuecomment-411506443
Answered by Vincent Claes
Answered by Thomas
As explained above, pyspark generates some of its functions on the fly, which means that most IDEs cannot detect them properly. However, there is a Python package, pyspark-stubs, that includes a collection of stub files so that type hints, static error detection, code completion, and so on are improved. By just installing it with
pip install pyspark-stubs==x.x.x
(where x.x.x has to be replaced with your pyspark version (2.3.0 in my case, for instance)), col and other functions will be detected, without changing anything in your code, for most IDEs (PyCharm, Visual Studio Code, Atom, Jupyter Notebook, ...)
Answered by AEDWIP
I ran into a similar problem trying to set up a PySpark development environment with Eclipse and PyDev. PySpark uses a dynamic namespace. To get it to work I needed to add PySpark to "force Builtins" as below.