Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2129 | 07-09-2019 12:53 AM |
| | 12449 | 06-23-2019 08:37 PM |
| | 9561 | 06-18-2019 11:28 PM |
| | 10526 | 05-23-2019 08:46 PM |
| | 4895 | 05-20-2019 01:14 AM |
09-14-2017
10:43 AM
I followed this article (http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html) and changed the parameters below:
- Maximum Number of HStoreFiles Compaction: 20
- HStore Blocking Store Files: 200
- HBase Memstore Block Multiplier: 4
- HBase Memstore Flush Size: 256

Hope this helps. Thanks, Chathuri
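For clusters configured outside Cloudera Manager, the same settings correspond to hbase-site.xml properties. A sketch, assuming the standard property names behind these CM labels and that the flush size is given in bytes (256 MB = 268435456):

```xml
<!-- Sketch only: verify property names against your HBase version's hbase-default.xml -->
<property>
  <name>hbase.hstore.compaction.max</name>
  <value>20</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>200</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>268435456</value>
</property>
```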
09-08-2017
01:16 AM
I was able to make the job run by adding the hive-exec jar to HADOOP_CLASSPATH as well as adding the jar to the distributed cache. Can you shed some light on why we need to export the jar to the classpath and also add it to the distributed cache?
09-06-2017
12:10 AM
How are you invoking your job? Do you use 'hadoop jar …' to invoke your jar, or are you triggering it with a more raw 'java -cp …' style CLI? If the latter, ensure you also pass the directory '/etc/hadoop/conf/' as an early element of your -cp argument (or CLASSPATH environment variable), so the client picks up the cluster configuration. Also ensure your submitting host has a YARN + MR2 gateway deployed on it: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_primer.html#concept_fgj_tny_jk__section_zjt_fwz_xk
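A minimal sketch of the two launch styles; the jar path `/opt/myapp/myjob.jar` and main class `com.example.MyJob` are placeholders, not anything from this thread:

```shell
# Managed launch: 'hadoop jar' builds the classpath (including /etc/hadoop/conf) for you:
#   hadoop jar /opt/myapp/myjob.jar com.example.MyJob
#
# Raw launch: you must place the config directory early on the classpath yourself,
# so the client reads core-site.xml / yarn-site.xml instead of built-in defaults.
CP="/etc/hadoop/conf:/opt/myapp/myjob.jar"
echo "java -cp $CP com.example.MyJob"
```

The key point is only the ordering: '/etc/hadoop/conf' before the application jars, so cluster configuration wins over any bundled defaults.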
09-05-2017
11:59 PM
Yes. Use of the YARN APIs will allow you to distribute and run any arbitrary command. Spark and MR2 are apps that leverage this to run Java commands with wrapper classes that drive their logic and flow, but there's nothing preventing you from writing your own. Take a look at the DistributedShell application implementation to understand the raw YARN APIs used to run arbitrary commands via YARN-allocated resource containers: https://github.com/cloudera/hadoop-common/blob/cdh5.12.0-release/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java#L201

If you're asking about a built-in way of running programs over YARN without writing any code, then aside from the DistributedShell there's no other included implementation, and even with the DistributedShell you may not get the tight integration (result extraction, status viewing, etc.) you require. There are also a few higher-level frameworks that can make developing custom YARN apps easier, such as Spring (https://spring.io/guides/gs/yarn-basic/), Kitten (https://github.com/cloudera/kitten), and Cask's CDAP (https://docs.cask.co/cdap/current/en/developers-manual/getting-started/index.html).
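For a quick experiment, the DistributedShell client ships with Hadoop and can be invoked from the CLI. An invocation sketch, assuming the usual jar location under a CDH parcel (the exact path and version suffix vary by distribution):

```shell
# Runs "uptime" in 2 YARN containers via the DistributedShell example app.
# The jar path below is an assumption; locate the distributedshell jar on your cluster first.
yarn jar /opt/cloudera/parcels/CDH/jars/hadoop-yarn-applications-distributedshell-*.jar \
  org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar /opt/cloudera/parcels/CDH/jars/hadoop-yarn-applications-distributedshell-*.jar \
  -shell_command "uptime" \
  -num_containers 2
```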
08-26-2017
09:46 PM
1 Kudo
You may only use the -Dname=value form if your main class implements the Tool interface and gets invoked via the ToolRunner utility. Check the Tool javadoc example and model your implementation around it: http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/util/Tool.html
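To see why plain main() methods miss the -D pairs: ToolRunner passes your arguments through GenericOptionsParser, which strips -Dname=value arguments out and loads them into the Configuration before run() is called. A simplified, self-contained sketch of that stripping step (this is an illustration, not the Hadoop source):

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of what ToolRunner/GenericOptionsParser do with -Dname=value
// arguments before your Tool's run() method sees them. Without ToolRunner,
// nothing performs this step, so the pairs are never applied to the job conf.
public class DashDSketch {
    // Collects configuration overrides of the form -Dname=value from the args,
    // the way GenericOptionsParser does; other args would be passed to run().
    public static Map<String, String> extractDashD(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (String a : args) {
            int eq = a.indexOf('=');
            if (a.startsWith("-D") && eq > 2) {
                conf.put(a.substring(2, eq), a.substring(eq + 1));
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = extractDashD(
            new String[]{"-Dmapreduce.job.queuename=etl", "input", "output"});
        System.out.println(conf.get("mapreduce.job.queuename")); // prints "etl"
    }
}
```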
08-16-2017
03:07 AM
Great suggestion, thanks! One question: if I want struct or array fields in the target, how should I transform the MySQL data so that it fits the HCatalog schema? The need here is just to have nested data from the other collection instead of a foreign-key representation. Currently we are using sqoop import only, and we are trying to modify the query so that it will be accepted by the HCatalog schema. Thanks & Regards, Mahendra
08-14-2017
10:58 AM
@Harsh J For my special case, where the Hadoop nodes' uptime was 1200 days and the servers ran old CentOS versions, restarting the servers took the inode usage down from 88% to 10%.
08-10-2017
03:35 AM
Hi, you must place the connector jar inside the Oozie shared library folder, which lives on HDFS: hdfs://user/oozie/sqoop/lib/lib<instance>/
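A sketch of copying the jar in, keeping the path placeholder from above; the connector jar name is an example and <instance> must be replaced with the actual directory name on your cluster:

```shell
# <instance> is a placeholder: substitute the real sharelib directory name first.
hdfs dfs -put /path/to/mysql-connector-java.jar /user/oozie/sqoop/lib/lib<instance>/
```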
08-01-2017
11:05 PM
Hi Harsh, I am using Cloudera Express 5.5. Is it possible to restore an HDFS backup created manually (using crontab)?
07-31-2017
09:34 AM
Should I pass the /etc/hosts file (covering all nodes on the cluster, including the edge node, name node, and data nodes) in the Java code, instead of getting it from the host I am connecting to (the edge node)?