08-29-2016
02:17 AM
1 Kudo
HBase replication is an integral part of HBase itself (rather than a feature provided by CM BDR), and can be used directly by following http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_bdr_hbase_replication.html. So yes, you can use it on your Cloudera Express installation. HDFS and Hive replication, on the other hand, are BDR-provided features and cannot be configured with a source running Cloudera Express. Does this clarification help?
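As a rough starting point, enabling it boils down to two hbase shell steps on the source cluster. The peer ID, ZooKeeper quorum and table/column-family names below are placeholders, and the exact add_peer syntax can vary by HBase version:

```
# On the source cluster's hbase shell: register the destination cluster as a peer
hbase> add_peer '1', CLUSTER_KEY => 'dest-zk1,dest-zk2,dest-zk3:2181:/hbase'

# Mark the column families that should be shipped to the peer
hbase> alter 'my_table', {NAME => 'cf1', REPLICATION_SCOPE => '1'}
```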
08-24-2016
09:04 PM
1 Kudo
Adding onto @dice's post, this WARN does not impair any functionality your HDFS is currently performing. It can be ignored until you are able to pick up the bug fix by updating to 5.7.2 or higher. See also this past community topic on the same question: http://community.cloudera.com/t5/Storage-Random-Access-HDFS/quot-Report-from-the-DataNode-datanodeUuid-is-unsorted-quot/m-p/41943#M2188
08-23-2016
10:41 PM
A snapshot in HBase is a function of some captured metadata plus the associated archived HFiles covering the data. If either of these were to be permanently lost, the snapshot would be inconsistent. There is a snapshot info utility you can invoke with a snapshot name to have it examine the contents:

~> hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot "NAME OF SNAPSHOT"

(For more options, run with -help.)
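As an aside, on the HBase versions I have seen, the same utility also accepts -files and -stats switches, which list the referenced HFiles and flag any missing ones - a quick way to check a snapshot's integrity (flag availability may vary by version):

```
~> hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot "NAME OF SNAPSHOT" -files -stats
```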
08-22-2016
09:26 AM
> 1) Does spark read data from Oracle in driver or executors?

It reads from the executor-run tasks; the read is done via an RDD.

> 2) If Spark reads Oracle in executors, how is the source data import split among different executors? That is, how does each executor know which part of the data it should read?

There's a partitioning mechanism: if left unspecified, the whole query/table is read inside a single task, which is what would happen in the trivial example provided above. If you do specify the full partition attributes in the read function, as documented at http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases, Spark generates appropriate WHERE clauses and reads from multiple tasks in parallel, with the partition column and its bounds used to divide the work among the tasks.

Agreed that Sqoop specialises DB reads much more than Spark does at this point (especially when specialised connectors are involved), but for very simple ad-hoc usage where a code tie-in is required, Spark JDBC may be used. In all other cases it seems like the wrong tool to use for the sole purpose of a DB import or export, given the lack of specialisation.
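To make the partition attributes concrete, here is a minimal Scala sketch of the documented API; the JDBC URL, table and column names are illustrative placeholders, not values from this thread:

```scala
// Spark 1.x-era DataFrame read: with these four options set, Spark issues
// numPartitions parallel queries, each constrained by a generated WHERE
// clause over the partition column's value range.
val df = sqlContext.read.format("jdbc").options(Map(
  "url"             -> "jdbc:oracle:thin:@dbhost:1521:ORCL",
  "dbtable"         -> "SCHEMA.MY_TABLE",
  "partitionColumn" -> "ID",      // must be a numeric column
  "lowerBound"      -> "1",
  "upperBound"      -> "1000000",
  "numPartitions"   -> "10"       // 10 tasks, each reading its own ID range
)).load()
```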
08-22-2016
07:30 AM
1 Kudo
Can you re-check what JDK/JRE version gets picked up when you run it on the cluster, vs. when you're using your developer machine's IDE? Some older Linux distributions ship a GNU Java (gcj) that is very outdated and may not carry this specific class among others; if your app runs with such a version, this error may be seen. If you are using 'hadoop jar' to run the application, try exporting an explicit JAVA_HOME that points to an Oracle JDK7 or JDK8 install path before running the command, so that it picks up the intended JVM. Alternatively, you may be missing the right crypto extensions (JCE) on your JDK.
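For illustration (the JDK path and jar/class names below are placeholders for wherever your Oracle JDK and application actually live):

```
~> export JAVA_HOME=/usr/java/jdk1.8.0_60
~> $JAVA_HOME/bin/java -version        # confirm the intended JVM is found
~> hadoop jar your-app.jar your.main.ClassName
```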
08-17-2016
07:37 PM
1 Kudo
Just a correction to my typo above: the right config field for the skipACL switch is not the "Environment Advanced Configuration Snippet" but rather "Java Configuration Options for Zookeeper Server", on the CM -> ZooKeeper -> Configuration page.
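For completeness, the switch itself is the standard ZooKeeper system property, passed as a JVM argument in that field:

```
-Dzookeeper.skipACL=yes
```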
08-16-2016
12:51 AM
1 Kudo
Thanks. Could you also illustrate what you provide as env-vars, and why they are required? The standard Kerberos setup documentation for Ubuntu does not require any preset env-vars. The ldd difference is what is driving the problem; from the looks of it, there may be multiple Kerberos libraries installed on the system, though I am uncertain how it ended up that way. If you resolve the library trouble, CM should be able to run the command normally.
08-15-2016
10:59 PM
What do you specifically mean by "setting all the kerberos environment" - do you mean you need to set some environment variables before invoking commands? Yes, CM invokes the script as the cloudera-scm user, but that shouldn't matter in terms of running the command. Somehow the right libs are not being used when CM runs kadmin, yet when you run it directly it does appear to load the right ones. Perhaps you can run ldd on the kadmin binary both from within the script and from a regular shell, and compare the two outputs.
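Something along these lines (the kadmin path is illustrative; check the output of 'which kadmin' on your host):

```
~> ldd /usr/sbin/kadmin                        # from your regular shell
~> sudo -u cloudera-scm ldd /usr/sbin/kadmin   # as the user CM runs under
```

Any difference in the krb5 library paths between the two outputs would point at the culprit.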
08-15-2016
10:42 AM
1 Kudo
The cause of the crashes would be unrelated to this observation. I'd recommend starting a new topic and posting the logs of the service that crashes for you, specifically the earliest FATAL message it produces before it aborts, if there is one.
08-15-2016
09:59 AM
2 Kudos
The HMaster has two RPC handler pools: one for priority tasks, the other for regular ones. The regular handlers see activity constantly and are represented by the "B.defaultRpcServer" section; you'll find that these continually process client or RegionServer calls made to them. The priority handlers, on the other hand, only see limited activity relating to certain specific work, and will mostly be idle depending on your cluster's situation. It is normal to see a very large "Waiting since" value on the priority handlers, as they are used only in special situations.

Some more bits for your information:

- "Waiting for a call" means that the thread is idle and is waiting for some work to flow in.
- The "queue=X" value here is a queue identifier associated with the specific handler thread ID (there are multiple parallel queues that the handlers feed from), not the size of a queue.

These waiting statuses appear on the HMaster UI only if you switch the Tasks tab to show all monitored tasks or all RPC tasks regardless of state. If you're interested in finding actual calls, use the "Show Active RPC Calls" tab under it instead; you may be able to catch some of the calls that regularly come in.