Member since: 10-30-2016
Posts: 20
Kudos Received: 15
Solutions: 3
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1561 | 07-09-2017 06:56 PM |
 | 3420 | 02-08-2017 03:54 PM |
 | 923 | 01-04-2017 04:05 PM |
07-13-2017 09:20 AM
It is possible if the website publishes its streaming data via a public API and you implement a custom Flume source to ingest it. Twitter has such an API, but you have to pay to use it. I am not sure whether Quora or Blogger offer one. An alternative is to write code that reads RSS feeds and writes the entries to disk or HDFS, but for that you do not need Flume.
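A minimal sketch of that RSS approach, assuming the third-party feedparser library and a hypothetical feed URL (both are illustrative, not from the original question):

import feedparser  # third-party RSS/Atom parser (pip install feedparser)

FEED_URL = 'https://example.com/feed.rss'  # hypothetical feed URL

# Parse the feed once and append the title and link of each entry to a
# local file; the same lines could be written to HDFS with an HDFS client.
feed = feedparser.parse(FEED_URL)
with open('/tmp/feed_entries.txt', 'a') as out:
    for entry in feed.entries:
        out.write('%s\t%s\n' % (entry.title, entry.link))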
07-09-2017 06:56 PM
Flume does not have website scraping capabilities. One might guess that HTTPSource can be used for tasks like this, but HTTPSource is just an HTTP server running inside Flume: you push data to it, not the other way around. As for the IMDB site, you can download its datasets from Amazon S3, but you have to pay the data transfer fee: http://www.imdb.com/interfaces
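To illustrate the push model, here is a minimal agent configuration sketch; the agent and component names (a1, r1, c1, k1) and the port are placeholders I chose, not from the original thread. The HTTP source starts a server inside the agent, and clients must POST events to it:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# The HTTP source runs an HTTP server inside the agent and waits for
# events to be pushed to it; it never fetches anything itself.
a1.sources.r1.type = http
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Buffer events in memory and simply log them.
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1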
04-19-2017 12:40 PM
You can edit flume.conf directly and the running agent will reconfigure itself without a restart. The default location of the configuration file is /etc/flume/conf/{agent_name}/flume.conf. However, these changes will not be visible in Ambari, and the next time you restart Flume from Ambari it will overwrite your manual edits with the stale config.
02-08-2017 03:54 PM
5 Kudos
Execute the import command from bash; it looks like you ran it from the HBase shell instead.
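For illustration, assuming the command in question was the HBase Import MapReduce utility (the table name and input path here are placeholders), the difference looks like this:

# wrong: typed at the hbase shell prompt, which only understands HBase shell commands
hbase(main):001:0> hbase org.apache.hadoop.hbase.mapreduce.Import mytable /user/me/export

# right: run from the bash prompt
$ hbase org.apache.hadoop.hbase.mapreduce.Import mytable /user/me/export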
01-04-2017 04:05 PM
2 Kudos
You have to send an array of JSON events, otherwise the handler will fail to deserialize them. An event must have at least a body, and the body must be a string; you can also add optional headers. See the event specification in the user guide.

import requests
import json

# The handler expects a JSON array of events, each with a string 'body'
# and, optionally, a 'headers' map.
events = [{'body': 'my 1st event data'}, {'body': 'my 2nd event data'}]
requests.post('http://localhost:44444', data=json.dumps(events))

You can also use the GET method, but you still have to specify the data to send.
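A variant showing the optional headers on an event (the header name and value here are just illustrative):

events = [{'headers': {'host': 'host1'}, 'body': 'event with headers'}]
requests.post('http://localhost:44444', data=json.dumps(events))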