Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3558 | 05-03-2017 05:13 PM |
| | 2933 | 05-02-2017 08:38 AM |
| | 3183 | 05-02-2017 08:13 AM |
| | 3146 | 04-10-2017 10:51 PM |
| | 1622 | 03-28-2017 02:27 AM |
09-11-2016
09:56 PM
2 Kudos
@Brenden Cobb you're missing the HCatalog-specific libraries when you invoke Sqoop from the CLI. If it works through Oozie, that means the libraries are being supplied by your sharelib; in CLI mode, you need to provide them yourself. Can you confirm whether the HCat client is installed on the node? You may have to check all nodes. Additionally, you're using the --skip-dist-cache parameter, thereby forcing local libs over the sharelib, so you need HCatalog on the classpath, either in your Sqoop lib directory or passed to the CLI command explicitly.
09-11-2016
09:30 PM
@Mohan V though there are efforts to make it work, there is no supported way to do this directly with Kafka and Pig. You can leverage something like Apache NiFi to read from Kafka, dump the messages to HDFS, and then consume them with Pig. Since Kafka produces messages continuously and a Pig job has a start and an end, it really isn't a good fit. All that said, here's an attempt to make it work: http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3C-3358174115189989131@unknownmsgid%3E
09-09-2016
09:48 PM
I might try writing a UDF with custom counters; it sounds like an interesting challenge.
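A minimal sketch of what such a UDF could look like, assuming a plain Java EvalFunc that passes tuples through and bumps a Hadoop counter via PigStatusReporter; the class name and the FILTER_STATS/RECORDS_SEEN counter names are illustrative, not from the thread:

```java
import java.io.IOException;

import org.apache.hadoop.mapreduce.Counter;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.tools.pigstats.PigStatusReporter;

// Pass-through UDF that increments a custom counter for every tuple
// it sees, so per-branch tallies show up in the job's counter output.
public class CountingIdentity extends EvalFunc<Tuple> {
    @Override
    public Tuple exec(Tuple input) throws IOException {
        PigStatusReporter reporter = PigStatusReporter.getInstance();
        if (reporter != null) {
            // Counter can be null when running outside a Hadoop task context.
            Counter counter = reporter.getCounter("FILTER_STATS", "RECORDS_SEEN");
            if (counter != null) {
                counter.increment(1);
            }
        }
        return input;
    }
}
```

Applying the UDF to each FILTER branch in the Pig script would then make the per-condition counts appear in the job stats.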
09-09-2016
09:45 PM
Yup, it's a choice of coding a few lines in Pig vs spending a couple of hours with Java.
09-09-2016
09:07 PM
Not the point; you execute a COUNT on each filter condition. It's not efficient, but it does answer his question.
09-09-2016
08:50 PM
🙂 what if your filter statement is a combination of multiple ORs and ANDs?
09-09-2016
08:36 PM
Go to the /var/log/hadoop directory and navigate to the SecondaryNameNode log directory; start reviewing the logs there. If you have a custom location for your logs, you can find it in the Ambari configs section of the HDFS service. Once you find the log, feel free to post your errors.
09-09-2016
08:29 PM
I can't think of a way to do it in one shot in Pig. If I were to write a MapReduce job for the task, I'd implement custom counters so that a counter gets updated with every filter: https://diveintodata.org/2011/03/15/an-example-of-hadoop-mapreduce-counter/ You can also write a UDF and update custom counters; I haven't tried it, but it's worth a shot: http://stackoverflow.com/questions/14748120/how-to-increment-hadoop-counters-in-jython-udfs-in-pig
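For illustration, a minimal sketch of the MapReduce-counter approach; the FilterCounters enum and the conditionA/conditionB checks are placeholder stand-ins for the real filter logic:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper that tallies how many records match each filter condition
// via custom counters, in a single pass over the data.
public class FilterCounterMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    // One counter per filter branch; names are placeholders.
    enum FilterCounters { MATCHED_A, MATCHED_B }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.contains("conditionA")) {           // placeholder filter
            context.getCounter(FilterCounters.MATCHED_A).increment(1);
        }
        if (line.contains("conditionB")) {           // placeholder filter
            context.getCounter(FilterCounters.MATCHED_B).increment(1);
        }
        context.write(value, NullWritable.get());
    }
}
```

The counter totals are reported in the job's counter summary when it finishes, so all the per-condition counts come out of one pass over the data.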
09-09-2016
05:25 PM
@Giuseppe Maldarizzi just heard back from engineering; please look at the ignoreZkOffsets parameter in place of forceFromStart. The documentation will be updated. https://github.com/apache/storm/tree/master/external/storm-kafka#how-kafkaspout-stores-offsets-of-a-kafka-topic-and-recovers-in-case-of-failures
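For illustration, a hedged sketch of setting that flag on a storm-kafka SpoutConfig (Storm 1.x package names assumed; the ZooKeeper quorum, topic, zkRoot, and spout id are placeholders):

```java
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;

public class OffsetResetExample {
    public static KafkaSpout buildSpout() {
        // Placeholder ZooKeeper quorum, topic, zkRoot, and consumer id.
        BrokerHosts hosts = new ZkHosts("zk1:2181");
        SpoutConfig cfg = new SpoutConfig(hosts, "my-topic",
                "/kafka-offsets", "my-spout-id");
        cfg.scheme = new SchemeAsMultiScheme(new StringScheme());
        // ignoreZkOffsets replaces the deprecated forceFromStart flag:
        // when true, the spout ignores offsets stored in ZooKeeper and
        // starts from startOffsetTime instead.
        cfg.ignoreZkOffsets = true;
        return new KafkaSpout(cfg);
    }
}
```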