Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/34616676/
Should I call ugi.checkTGTAndReloginFromKeytab() before every action on hadoop?
Asked by Jan Zyka
In my server application I'm connecting to a Kerberos-secured Hadoop cluster from my Java application. I'm using various components like the HDFS file system, Oozie, Hive etc. On application startup I call
UserGroupInformation.loginUserFromKeytabAndReturnUGI( ... );
This returns me a UserGroupInformation instance and I keep it for the application lifetime. When doing privileged actions I launch them with ugi.doAs(action).
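For illustration, a minimal sketch of this pattern (the principal, keytab path, and HDFS path below are placeholders, not my real values):

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// One-time login at startup; the returned UGI is kept for the application lifetime.
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
    "myservice@EXAMPLE.COM", "/etc/security/keytabs/myservice.keytab");

// Privileged actions are launched through ugi.doAs(...).
FileStatus[] statuses = ugi.doAs((PrivilegedExceptionAction<FileStatus[]>) () -> {
  FileSystem fs = FileSystem.get(conf);
  return fs.listStatus(new Path("/user/myservice"));
});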
This works fine but I wonder if and when I should renew the Kerberos ticket in the UserGroupInformation? I found a method UserGroupInformation.checkTGTAndReloginFromKeytab() which seems to do the ticket renewal whenever it's close to expiry. I also found that this method is being called by various Hadoop tools, like WebHdfsFileSystem for example.
Now if I want my server application (possibly running for months or even years) to never experience ticket expiry, what is the best approach? To provide concrete questions:
- Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?
- Should I ever call checkTGTAndReloginFromKeytab myself in my code?
- If so, should I do that before every single call to ugi.doAs(...), or rather set up a timer and call it periodically (how often)?
Answered by Chris Nauroth
Hadoop committer here! This is an excellent question.
Unfortunately, it's difficult to give a definitive answer to this without a deep dive into the particular usage patterns of the application. Instead, I can offer general guidelines and describe when Hadoop would handle ticket renewal or re-login from a keytab automatically for you, and when it wouldn't.
The primary use case for Kerberos authentication in the Hadoop ecosystem is Hadoop's RPC framework, which uses SASL for authentication. Most of the daemon processes in the Hadoop ecosystem handle this by doing a single one-time call to UserGroupInformation#loginUserFromKeytab at process startup. Examples of this include the HDFS DataNode, which must authenticate its RPC calls to the NameNode, and the YARN NodeManager, which must authenticate its calls to the ResourceManager. How is it that daemons like the DataNode can do a one-time login at process startup and then keep on running for months, long past typical ticket expiration times?
Since this is such a common use case, Hadoop implements an automatic re-login mechanism directly inside the RPC client layer. The code for this is visible in the RPC Client#handleSaslConnectionFailure method:
// try re-login
if (UserGroupInformation.isLoginKeytabBased()) {
  UserGroupInformation.getLoginUser().reloginFromKeytab();
} else if (UserGroupInformation.isLoginTicketBased()) {
  UserGroupInformation.getLoginUser().reloginFromTicketCache();
}
You can think of this as "lazy evaluation" of re-login. It only re-executes login in response to an authentication failure on an attempted RPC connection.
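Putting the two pieces together, the daemon-style pattern amounts to a single static login at startup, after which the RPC client layer handles re-login on its own. A sketch (the principal and keytab path are made-up placeholders, not values from any real daemon configuration):

// One-time login at process startup. After this, the static "login user" is
// keytab-based, so Hadoop's RPC client layer can re-login from the keytab
// automatically whenever an authentication failure occurs.
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab(
    "dn/host1.example.com@EXAMPLE.COM", "/etc/security/keytabs/dn.keytab");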
Knowing this, we can give a partial answer. If your application's usage pattern is to login from a keytab and then perform typical Hadoop RPC calls, then you likely do not need to roll your own re-login code. The RPC client layer will do it for you. "Typical Hadoop RPC" means the vast majority of Java APIs for interacting with Hadoop, including the HDFS FileSystem API, the YarnClient, and MapReduce Job submissions.
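For example, a long-running keytab-based process that sticks to these RPC-backed APIs needs no explicit re-login code at all. A hedged sketch, assuming the Configuration and keytab login from the earlier snippet, and where the running flag and processFiles are hypothetical application details:

// Uses only Hadoop RPC (the FileSystem API). If the ticket expires between
// iterations, the RPC client layer re-logins from the keytab on the resulting
// authentication failure; no manual checkTGTAndReloginFromKeytab() is needed.
FileSystem fs = FileSystem.get(conf);
while (running) {
  FileStatus[] statuses = fs.listStatus(new Path("/data/incoming"));
  processFiles(statuses);  // hypothetical application logic
  Thread.sleep(60_000L);   // poll once a minute
}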
However, some application usage patterns do not involve Hadoop RPC at all. An example of this would be applications that interact solely with Hadoop's REST APIs, such as WebHDFS or the YARN REST APIs. In that case, the authentication model uses Kerberos via SPNEGO, as described in the Hadoop HTTP Authentication documentation.
Knowing this, we can add more to our answer. If your application's usage pattern does not utilize Hadoop RPC at all, and instead sticks solely to the REST APIs, then you must roll your own re-login logic. This is exactly why WebHdfsFileSystem calls UserGroupInformation#checkTGTAndReloginFromKeytab, just like you noticed. WebHdfsFileSystem chooses to make the call right before every operation. This is a fine strategy, because UserGroupInformation#checkTGTAndReloginFromKeytab only renews the ticket if it's "close" to expiration. Otherwise, the call is a no-op.
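If you roll your own for REST-only usage, the same strategy works in application code. A hedged sketch, where callYarnRestApi is a hypothetical SPNEGO-authenticated HTTP call of your own, not a real Hadoop API:

// Mimic WebHdfsFileSystem: check the TGT right before each operation.
// checkTGTAndReloginFromKeytab() is a no-op unless the ticket is close to
// expiration, so calling it every time is cheap.
UserGroupInformation ugi = UserGroupInformation.getLoginUser();
ugi.checkTGTAndReloginFromKeytab();
callYarnRestApi();  // hypothetical REST call authenticated via SPNEGO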
As a final use case, let's consider an interactive process, not logging in from a keytab, but rather requiring the user to run kinit externally before launching the application. In the vast majority of cases, these are going to be short-running applications, such as Hadoop CLI commands. However, in some cases these can be longer-running processes. To support longer-running processes, Hadoop starts a background thread to renew the Kerberos ticket "close" to expiration. This logic is visible in UserGroupInformation#spawnAutoRenewalThreadForUserCreds.

There is an important distinction here though compared to the automatic re-login logic provided in the RPC layer. In this case, Hadoop only has the capability to renew the ticket and extend its lifetime. Tickets have a maximum renewable lifetime, as dictated by the Kerberos infrastructure. After that, the ticket won't be usable anymore. Re-login in this case is practically impossible, because it would imply re-prompting the user for a password, and they likely walked away from the terminal. This means that if the process keeps running beyond expiration of the ticket, it won't be able to authenticate anymore.
Again, we can use this information to inform our overall answer. If you rely on a user to login interactively via kinit before launching the application, and if you're confident the application won't run longer than the Kerberos ticket's maximum renewable lifetime, then you can rely on Hadoop internals to cover periodic renewal for you.
If you're using keytab-based login, and you're just not sure if your application's usage pattern can rely on the Hadoop RPC layer's automatic re-login, then the conservative approach is to roll your own. @SamsonScharfrichter gave an excellent answer here about rolling your own.
HBase Kerberos connection renewal strategy
Finally, I should add a note about API stability. The Apache Hadoop Compatibility guidelines discuss the Hadoop development community's commitment to backwards-compatibility in full detail. The interface of UserGroupInformation is annotated LimitedPrivate and Evolving. Technically, this means the API of UserGroupInformation is not considered public, and it could evolve in backwards-incompatible ways. As a practical matter, there is a lot of code already depending on the interface of UserGroupInformation, so it's simply not feasible for us to make a breaking change. Certainly within the current 2.x release line, I would not have any fear about method signatures changing out from under you and breaking your code.
Now that we have all of this background information, let's revisit your concrete questions.
Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?
You can rely on this if your application's usage pattern is to call the Hadoop clients, which in turn utilize Hadoop's RPC framework. You cannot rely on this if your application's usage pattern only calls the Hadoop REST APIs.
Should I ever call checkTGTAndReloginFromKeytab myself in my code?
You'll likely need to do this if your application's usage pattern is solely to call the Hadoop REST APIs instead of Hadoop RPC calls. You would not get the benefit of the automatic re-login implemented inside Hadoop's RPC client.
If so, should I do that before every single call to ugi.doAs(...), or rather set up a timer and call it periodically (how often)?
It's fine to call UserGroupInformation#checkTGTAndReloginFromKeytab right before every action that needs to be authenticated. If the ticket is not close to expiration, then the method will be a no-op. If you're suspicious that your Kerberos infrastructure is sluggish, and you don't want client operations to pay the latency cost of re-login, then that would be a reason to do it in a separate background thread. Just be sure to stay a little bit ahead of the ticket's actual expiration time. You might borrow the logic inside UserGroupInformation for determining if a ticket is "close" to expiration. In practice, I've never personally seen the latency of re-login be problematic.
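If you do choose the background-thread variant, a hedged sketch might look like this; the one-minute period is an arbitrary placeholder rather than anything recommended by Hadoop itself, and since the method returns immediately when the ticket isn't near expiration, a short period is harmless:

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.security.UserGroupInformation;

// Check the TGT off the request path so client operations never pay the
// re-login latency. checkTGTAndReloginFromKeytab() only re-logins when the
// ticket is close to expiration; otherwise it is a no-op.
ScheduledExecutorService renewer = Executors.newSingleThreadScheduledExecutor();
renewer.scheduleAtFixedRate(() -> {
  try {
    UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
  } catch (IOException e) {
    // Log and continue; the next scheduled run will retry.
  }
}, 1, 1, TimeUnit.MINUTES);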