How to setup a HTTP Source for testing Flume setup?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/18657548/
Asked by Himanshu
I am a newbie to Flume and Hadoop. We are developing a BI module where we can store all the logs from different servers in HDFS.
For this I am using Flume. I just started trying it out. Successfully created a node, but now I want to set up an HTTP source and a sink that will write incoming requests over HTTP to a local file.
Any suggestions?
Thanks in advance.
Accepted answer by BeanBagKing
Hopefully this helps you get started. I'm having some problems testing this on my machine and don't have time to fully troubleshoot it right now, but I'll get to that...
Assuming you have Flume up and running right now, this should be what your flume.conf file needs to look like to use an HTTP POST source and a local file sink (note: this goes to a local file, not HDFS):
########## NEW AGENT ##########
# flume-ng agent -f /etc/flume/conf/flume.httptest.conf -n httpagent
#
# slagent = SysLogAgent
###############################
httpagent.sources = http-source
httpagent.sinks = local-file-sink
httpagent.channels = ch3
# Define / Configure Source (multiport seems to support newer "stuff")
###############################
httpagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
httpagent.sources.http-source.channels = ch3
httpagent.sources.http-source.port = 81
# Local File Sink
###############################
httpagent.sinks.local-file-sink.type = file_roll
httpagent.sinks.local-file-sink.channel = ch3
httpagent.sinks.local-file-sink.sink.directory = /root/Desktop/http_test
httpagent.sinks.local-file-sink.sink.rollInterval = 5
# Channels
###############################
httpagent.channels.ch3.type = memory
httpagent.channels.ch3.capacity = 1000
Start Flume with the command on the second line. Tweak it for your needs (port, sink.directory, and rollInterval especially). This is a pretty bare-minimum config file; there are more options available, so check out the Flume User Guide. Now, as far as this goes, the agent starts and runs fine for me.
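That is, something like this (the config path and agent name come from the comment at the top of the config above):

$ flume-ng agent -f /etc/flume/conf/flume.httptest.conf -n httpagent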
Here's what I don't have time to test. The HTTP agent, by default, accepts data in JSON format. You *should* be able to test this agent by sending a cURL request of a form something like this:
curl -X POST -H 'Content-Type: application/json; charset=UTF-8' -d '{"username":"xyz","password":"123"}' http://yourdomain.com:81/
-X sets the request to POST, -H sends headers, -d sends data (valid JSON), and then the host:port. The problem for me is that I get an error:
WARN http.HTTPSource: Received bad request from client. org.apache.flume.source.http.HTTPBadRequestException: Request has invalid JSON Syntax.
in my Flume client. Invalid JSON? So something is being sent wrong. The fact that an error is popping up, though, shows the Flume source is receiving data. Whatever you have that's POSTing should work, as long as it's in a valid format.
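As the answers below explain, the default JSONHandler on the HTTP source parses the request body as a JSON array of events, each with a "headers" map and a string "body", so the bare JSON object above gets rejected. A request shaped like this sketch (assuming the httpagent config above is listening on localhost:81) should be accepted:

curl -X POST -H 'Content-Type: application/json; charset=UTF-8' -d '[{"headers": {"timestamp": "434324343", "host": "localhost"}, "body": "test event 1"}]' http://localhost:81/

If it is, rolled files should start appearing under /root/Desktop/http_test (the sink.directory configured above).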
Answered by sandyyyy
Try this:
curl -X POST -H 'Content-Type: application/json; charset=UTF-8' -d '[{"username":"xrqwrqwryzas","password":"12124sfsfsfas123"}]' http://yourdomain.com:81/
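The difference from the request in the accepted answer is the square brackets: the default JSONHandler parses the request body as a JSON array of events, so a bare JSON object is rejected even when it is valid JSON.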
Answered by josiah
It's a bit hard to tell exactly what you want from the way the Question is worded, but I'm operating on the assumption that you want to send JSON to Flume using HTTP POST requests and then have Flume dump those JSON events to HDFS (Not the Local File System). If that's what you want to do, this is what you need to do.
1. Make sure you create a directory in HDFS for Flume to send the events to, first. For example, if you want to send events to /user/flume/events in HDFS, you'll probably have to run the following commands:

$ su - hdfs
$ hdfs dfs -mkdir /user/flume
$ hdfs dfs -mkdir /user/flume/events
$ hdfs dfs -chmod -R 777 /user/flume
$ hdfs dfs -chown -R flume /user/flume
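A quick, optional sanity check that the directory exists with the expected owner and permissions:

$ hdfs dfs -ls /user/flume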
2. Configure Flume to use an HTTP Source and an HDFS Sink. You'll want to make sure to add in the interceptors for Host and Timestamp; otherwise your events will cause exceptions in the HDFS Sink, because that sink expects a Host and a Timestamp in the event headers. Also make sure to expose the port on the server that the Flume HTTPSource is listening on.

Here's a sample Flume config that works for the Cloudera Quickstart Docker container for CDH 5.7.0:
# Please paste flume.conf here. Example:

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1

tier1.sources.source1.interceptors = i1 i2
tier1.sources.source1.interceptors.i1.type = host
tier1.sources.source1.interceptors.i1.preserveExisting = false
tier1.sources.source1.interceptors.i1.hostHeader = host
tier1.sources.source1.interceptors.i2.type = timestamp

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type = http
tier1.sources.source1.bind = 0.0.0.0
tier1.sources.source1.port = 5140
# JSONHandler is the default for the httpsource
# tier1.sources.source1.handler = org.apache.flume.source.http.JSONHandler
tier1.sources.source1.channels = channel1

tier1.channels.channel1.type = memory

tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /user/flume/events/%y-%m-%d/%H%M/%S
tier1.sinks.sink1.hdfs.filePrefix = event-file-prefix-
tier1.sinks.sink1.hdfs.round = false
tier1.sinks.sink1.channel = channel1

# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 1000
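One thing worth noting about this config: the hdfs.path uses time escapes (%y-%m-%d/%H%M/%S), which the HDFS sink resolves from each event's timestamp header. That is why the timestamp interceptor is attached to the source - without a timestamp header (or hdfs.useLocalTimeStamp = true on the sink), every event would fail in the sink.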
3. It's necessary to create a Flume client that can send the JSON events to the Flume HTTP source in the format that it expects (this client could be as simple as a curl request). The most important thing about the format is that the JSON "body": key must have a value that is a String. "body": cannot be a JSON object - if it is, the Gson library that the Flume JSONHandler is using to parse the JSON events will throw exceptions, because it won't be able to parse the JSON - it is expecting a String. This is the JSON format you need:
[ { "headers": { "timestamp": "434324343", "host": "localhost", }, "body": "No matter what, this must be a String, not a list or a JSON object", }, { ... following events take the same format as the one above ...} ]
[ { "headers": { "timestamp": "434324343", "host": "localhost", }, "body": "No matter what, this must be a String, not a list or a JSON object", }, { ... following events take the same format as the one above ...} ]
Troubleshooting
- If Flume is sending your client (such as curl) 200 OK Success messages, but you don't see any files on HDFS, check the Flume logs. An issue I ran into early on was that my Flume channel didn't have enough capacity and couldn't receive any events as a result. If that happens, the channel or the HTTPSource will throw exceptions that you will be able to see in the Flume logs (probably in /var/log/flume-ng/; see the example after this list). To fix this problem, increase tier1.channels.channel1.capacity.
- If you see exceptions in the Flume logs indicating either that Flume couldn't write to HDFS because of permissions, or because the destination directory couldn't be found, check to make sure you created the destination directory in HDFS and opened up its permissions as detailed in Step 1, above.
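For the first point, a simple way to catch these exceptions (the directory comes from the note above; exact log file names vary by distribution) is to tail the agent logs while sending a test request:

$ tail -f /var/log/flume-ng/*.log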