Member since: 10-30-2016
Posts: 20
Kudos Received: 15
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 725 | 07-09-2017 06:56 PM |
| | 1806 | 02-08-2017 03:54 PM |
| | 491 | 01-04-2017 04:05 PM |
07-13-2017
09:20 AM
It is possible if the website publishes its streaming data via a public API and you implement a custom Flume source to ingest it. In the case of Twitter such an API exists, but you have to pay to use it; for Quora or Blogger I am not sure one exists. Another option is to write code that reads RSS feeds and writes the entries to disk or HDFS, but for that you do not need Flume at all (see the sketch below).
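As an illustration only, here is a minimal Python sketch of the RSS approach using just the standard library; the feed URL and output path are placeholders, not something taken from the question.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = 'https://example.com/feed.rss'  # placeholder feed URL
OUT_PATH = '/tmp/feed_items.txt'           # placeholder local output file

# Fetch the RSS feed and parse the XML document
with urllib.request.urlopen(FEED_URL) as resp:
    root = ET.fromstring(resp.read())

# Append the title and link of every <item> to the output file
with open(OUT_PATH, 'a') as out:
    for item in root.iter('item'):
        title = item.findtext('title', default='')
        link = item.findtext('link', default='')
        out.write('%s\t%s\n' % (title, link))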
07-09-2017
06:56 PM
Flume does not have website scraping capabilities. One might guess that HTTPSource can be used for a task like this, but HTTPSource is just an HTTP server running inside Flume: you push data to it, not the other way around (a minimal configuration is sketched below). As for the IMDb site, you can download its data from Amazon S3, but you have to pay the data transfer fee: http://www.imdb.com/interfaces
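For reference, a minimal HTTPSource setup in flume.conf could look like the following; the agent, source, channel, and sink names (a1, r1, c1, k1) and the port are placeholders.
# HTTPSource starts an HTTP server inside the agent; clients POST events to it
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = http
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1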
06-15-2017
12:15 AM
No. Consumed files are either deleted or renamed to "originalname.COMPLETED", depending on the source's deletePolicy (see the configuration sketch below).
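For context, this is how the relevant spooling directory source properties might look in flume.conf; the agent, source, and channel names and the spool directory are placeholders.
a1.sources = s1
a1.channels = c1
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /var/spool/flume/in
a1.sources.s1.channels = c1
# "never" (the default) renames consumed files, "immediate" deletes them
a1.sources.s1.deletePolicy = never
# suffix appended to consumed files when deletePolicy is "never" (.COMPLETED is the default)
a1.sources.s1.fileSuffix = .COMPLETED
a1.channels.c1.type = memory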
04-19-2017
12:40 PM
You can edit flume.conf directly and the running agent will reconfigure itself without a restart (the agent polls the file periodically, every 30 seconds by default if I remember correctly). The default location of the configuration file is /etc/flume/conf/{agent_name}/flume.conf. However, these changes will not be visible in Ambari, and the next time you restart Flume from Ambari it will overwrite your manual changes with the stale config.
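As a purely illustrative example (the agent and channel names below are placeholders): open that file, change a single property such as a memory channel's capacity, save it, and the agent picks up the new value on its next configuration poll.
# before
a1.channels.c1.capacity = 1000
# after editing and saving flume.conf
a1.channels.c1.capacity = 10000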
04-03-2017
10:35 AM
Please attach your Flume configuration (but don't share your Twitter API key).
02-15-2017
03:19 PM
1 Kudo
There is no retry option in the distributed shell client. There is an open ticket for it in the Apache JIRA:
YARN-815 - Add container failure handling to distributed-shell
02-11-2017
10:18 PM
@Wael Horchani That is strange. Maybe I was wrong and the table is there. Is security enabled? Are you using HDP, HDP Sandbox, CDH or vanilla Hadoop? Please edit the question to add details about your setup.
02-11-2017
07:38 PM
Link to the tutorial, for the record: http://hortonworks.com/hadoop-tutorial/introduction-apache-hbase-concepts-apache-phoenix-new-backup-restore-utility-hbase/#section_4
02-11-2017
07:30 PM
1 Kudo
You get this error if you drop the backup table in the hbase namespace. Check whether you have the namespace and the table:
hbase(main):001:0> list_namespace
NAMESPACE
default
hbase
3 row(s) in 0.0220 seconds
hbase(main):002:0> list_namespace_tables 'hbase'
TABLE
acl
backup
meta
namespace
4 row(s) in 0.0280 seconds
If you have a backup of any table, you can run a restore of that table and HBase will recreate the backup table:
[hbase@sandbox ~]$ hadoop dfs -ls /user/hbase/backup
Found 2 items
drwxr-xr-x - hbase hdfs 0 2017-02-11 17:43 /user/hbase/backup/backup_1486835033442
drwxr-xr-x - hbase hdfs 0 2017-02-11 18:09 /user/hbase/backup/backup_1486836579046
[hbase@sandbox ~]$ hbase restore hdfs://sandbox.hortonworks.com:8020/user/hbase/backup/backup_1486836579046 iemployee -overwrite
[...]2017-02-11 18:44:50,038 INFO [main] impl.RestoreClientImpl: Restore for [iemployee] are successful!
Or you can explicitly issue the create command (describe 'hbase:backup' gives this definition, but you have to change the TTL from 'FOREVER' to '2147483647'):
create 'hbase:backup',
  {NAME => 'meta', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
  {NAME => 'session', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
02-08-2017
03:54 PM
5 Kudos
Execute the import command from bash. It looks like you were in the HBase shell.
02-08-2017
12:40 PM
Do you get any exceptions when you run the above client? Try running the Flume agent with the extra option -Dflume.root.logger=DEBUG,console, for example as shown below.
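The agent name, config directory, and file path in this example are placeholders for whatever your setup uses.
flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/agent/flume.conf --name agent -Dflume.root.logger=DEBUG,console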
02-06-2017
08:36 PM
Do you want to write a program which continuously polls a web server and writes that data to HDFS? If so, you could add a dependency on groupId org.apache.hadoop, artifactId hadoop-client and call append on the HDFS FileSystem API directly, without using Flume (see the dependency sketch below). A different approach would be to start an embedded Flume agent inside your application. That way you do not have to set up a Flume source, but can put events directly onto the Flume channel.
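If you build with Maven, the dependency would look roughly like this; the version is only an example and should match your cluster's Hadoop version.
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <!-- example version only; match the Hadoop version of your cluster -->
  <version>2.7.3</version>
</dependency>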
01-30-2017
01:50 PM
Please give some background, such as which tutorial, user guide, or GitHub repo you are following. Attaching your Flume conf file might also help in answering your question.
01-04-2017
04:05 PM
2 Kudos
You have to send an array of JSON events, otherwise the handler will fail to deserialize them. An event must have at least a body, and the body must be a string. You can also add optional headers. See the event specification in the user guide.
import requests
import json
a = [{'body': 'my 1st event data'}, {'body': 'my 2nd event data'}]
requests.post('http://localhost:44444', data=json.dumps(a))
You can also use the GET method, but you still have to specify the data to send.