Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3558 | 05-03-2017 05:13 PM |
| | 2933 | 05-02-2017 08:38 AM |
| | 3183 | 05-02-2017 08:13 AM |
| | 3146 | 04-10-2017 10:51 PM |
| | 1622 | 03-28-2017 02:27 AM |
09-11-2016
09:56 PM
2 Kudos
@Brenden Cobb you're missing the HCatalog-specific libraries when you invoke Sqoop from the CLI. If it works through Oozie, that means the libraries are being supplied by your sharelib; in CLI mode, you need to provide them yourself. Can you confirm whether the HCat client is installed on the node? You may have to check all nodes. Additionally, you're using the --skip-dist-cache parameter, thereby forcing local libs over the sharelib, so you need HCatalog on the classpath, either in your Sqoop lib directory or passed to the CLI command explicitly.
09-11-2016
09:30 PM
@Mohan V though there are efforts to make it work, there is no supported way to do this directly with Kafka and Pig. You can leverage something like Apache NiFi to read from Kafka, dump the messages to HDFS, and then consume them with Pig. Since Kafka produces messages continuously and a Pig job has a start and an end, it really isn't a good fit. All that said, here's an attempt to make it work: http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3C-3358174115189989131@unknownmsgid%3E
09-09-2016
09:48 PM
I might try writing a UDF with custom counters; it sounds like an interesting challenge.
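A minimal sketch of what such a UDF could look like, assuming a plain Java EvalFunc that passes tuples through and bumps a Hadoop counter via PigStatusReporter; the class name and the FILTER_STATS/RECORDS_SEEN counter names are illustrative, not from the thread:

```java
import java.io.IOException;

import org.apache.hadoop.mapreduce.Counter;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.tools.pigstats.PigStatusReporter;

// Pass-through UDF that increments a custom counter for every tuple
// it sees, so per-branch tallies show up in the job's counter output.
public class CountingIdentity extends EvalFunc<Tuple> {
    @Override
    public Tuple exec(Tuple input) throws IOException {
        PigStatusReporter reporter = PigStatusReporter.getInstance();
        if (reporter != null) {
            // Counter can be null when running outside a Hadoop task context.
            Counter counter = reporter.getCounter("FILTER_STATS", "RECORDS_SEEN");
            if (counter != null) {
                counter.increment(1);
            }
        }
        return input;
    }
}
```

Applying the UDF to each FILTER branch in the Pig script would then make the per-condition counts appear in the job stats.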
09-09-2016
09:45 PM
Yup, it's a choice of coding a few lines in Pig vs spending a couple of hours with Java.
09-09-2016
09:07 PM
Not the point; you execute a COUNT on each filter condition. It's not efficient, but it does answer his question.
09-09-2016
08:50 PM
🙂 what if your filter statement is a combination of multiple ORs and ANDs?
09-09-2016
08:36 PM
Go to the /var/log/hadoop directory and navigate to the SecondaryNameNode log directory; start reviewing the logs there. If you have a custom location for your logs, you can find it in the Ambari configs section of the HDFS service. Once you find the log, feel free to post your errors.
09-09-2016
08:29 PM
I can't think of a way to do it in one shot in Pig. If I were to write a MapReduce job for the task, I'd implement custom counters so that a counter gets updated with every filter: https://diveintodata.org/2011/03/15/an-example-of-hadoop-mapreduce-counter/ You can also write a UDF and update custom counters; I haven't tried it, but it's worth a shot: http://stackoverflow.com/questions/14748120/how-to-increment-hadoop-counters-in-jython-udfs-in-pig
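For illustration, a minimal sketch of the MapReduce-counter approach; the FilterCounters enum and the conditionA/conditionB checks are placeholder stand-ins for the real filter logic:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper that tallies how many records match each filter condition
// via custom counters, in a single pass over the data.
public class FilterCounterMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    // One counter per filter branch; names are placeholders.
    enum FilterCounters { MATCHED_A, MATCHED_B }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.contains("conditionA")) {           // placeholder filter
            context.getCounter(FilterCounters.MATCHED_A).increment(1);
        }
        if (line.contains("conditionB")) {           // placeholder filter
            context.getCounter(FilterCounters.MATCHED_B).increment(1);
        }
        context.write(value, NullWritable.get());
    }
}
```

The counter totals are reported in the job's counter summary when it finishes, so all the per-condition counts come out of one pass over the data.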
09-09-2016
05:25 PM
@Giuseppe Maldarizzi just heard back from engineering; please look at the ignoreZkOffsets parameter in place of forceFromStart. The documentation will be updated. https://github.com/apache/storm/tree/master/external/storm-kafka#how-kafkaspout-stores-offsets-of-a-kafka-topic-and-recovers-in-case-of-failures
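For illustration, a hedged sketch of setting that flag on a storm-kafka SpoutConfig (Storm 1.x package names assumed; the ZooKeeper quorum, topic, zkRoot, and spout id are placeholders):

```java
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;

public class OffsetResetExample {
    public static KafkaSpout buildSpout() {
        // Placeholder ZooKeeper quorum, topic, zkRoot, and consumer id.
        BrokerHosts hosts = new ZkHosts("zk1:2181");
        SpoutConfig cfg = new SpoutConfig(hosts, "my-topic",
                "/kafka-offsets", "my-spout-id");
        cfg.scheme = new SchemeAsMultiScheme(new StringScheme());
        // ignoreZkOffsets replaces the deprecated forceFromStart flag:
        // when true, the spout ignores offsets stored in ZooKeeper and
        // starts from startOffsetTime instead.
        cfg.ignoreZkOffsets = true;
        return new KafkaSpout(cfg);
    }
}
```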