Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 3367 | 05-03-2017 05:13 PM |
| | 2798 | 05-02-2017 08:38 AM |
| | 3076 | 05-02-2017 08:13 AM |
| | 3006 | 04-10-2017 10:51 PM |
| | 1518 | 03-28-2017 02:27 AM |
12-21-2016
07:58 PM
The Simba driver is available as of HDP 2.5, with additional certification from Simba that it works with Kerberos. Here's the official documentation guide for the ODBC driver: https://hortonworks.com/wp-content/uploads/2016/08/phoenix-ODBC-guide.pdf
12-21-2016
01:17 PM
To drive my point home, here are more resources: http://blog.mortardata.com/post/60274287605/pig-vs-mapreduce and http://blog.mortardata.com/post/33711299619/8-reasons-you-should-be-using-apache-pig
12-21-2016
01:06 PM
You can try your use case using Pig and its built-in SPLIT operator, as you'll benefit from the underlying query-plan optimizations and the Tez execution engine compared to a pure MapReduce implementation: http://pig.apache.org/docs/r0.16.0/basic.html#SPLIT It might be a much more worthwhile investment in your case.
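A minimal Pig Latin sketch of SPLIT; the dataset, field names, and thresholds here are hypothetical, just to show the shape of the operator:

```pig
-- Load a hypothetical tab-separated dataset of (id, score) records
records = LOAD 'input/data.tsv' USING PigStorage('\t') AS (id:chararray, score:int);

-- SPLIT routes each record into one or more relations based on conditions
SPLIT records INTO high IF score >= 50, low IF score < 50;

STORE high INTO 'output/high';
STORE low  INTO 'output/low';
```

Pig compiles this into an optimized plan, so you get the partitioning behavior without hand-writing a MapReduce job.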
12-21-2016
12:57 PM
1 Kudo
Generally, to control output from a reducer you'd use the MultipleOutputs class: https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html You get the best results by writing larger files; I'm not sure what you gain from splitting a dataset that fits your criteria into smaller chunks. The job won't complete until all of the criteria are addressed, and in fact I think splitting will hurt performance compared to writing larger files, which is by design the better approach.
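A sketch of the reducer side with MultipleOutputs, assuming a Text key/value job; the class name, the "large"/"small" output names, and the length threshold are made up for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Reducer that routes records to named outputs instead of the single default file.
public class BinningReducer extends Reducer<Text, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // "large" / "small" are hypothetical named outputs that would be
            // registered in the driver via MultipleOutputs.addNamedOutput(...)
            String name = value.getLength() > 100 ? "large" : "small";
            mos.write(name, key, value);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // flush and close all named output files
    }
}
```

Note that each named output still produces one file per reducer, so this controls *which* files records land in, not how many small files you end up with.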
12-21-2016
12:51 PM
If Constantin's awesome answer helped you, please accept it to close this thread; otherwise, provide your own solution or follow-up questions for more clarity.
12-21-2016
03:40 AM
1 Kudo
@Edgar Daeds In Apache Zeppelin 0.7 there will be an Apache Beam interpreter that by default provides a Java REPL. You can use the Beam API to work with Spark, Flink, MapReduce, and Google Dataflow: https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/beam.html
12-20-2016
11:02 PM
Can someone explain the decision to include the following JARs in the HDP distribution for HBase? Is this for compatibility? Please provide some technical background on the decision.
-rw-r--r-- 1 user user 790250 Nov 15 19:41 netty-3.2.4.Final.jar
-rw-r--r-- 1 user user 1779991 Nov 15 18:11 netty-all-4.0.23.Final.jar
-rw-r--r-- 1 user user 132368 Nov 15 18:12 servlet-api-2.5-6.1.14.jar
-rw-r--r-- 1 user user 105112 Nov 15 18:05 servlet-api-2.5.jar
Labels:
- Apache HBase
- Apache Phoenix
12-20-2016
07:46 PM
@Rishit shah Can you follow my suggestion and install the Hive and HCat client tools on the Flume nodes? It will dynamically link the JARs to the proper locations. I just need you to confirm.