07-27-2016
07:49 AM
2 Kudos
Thanks. I'm certain you're hitting the same error as HADOOP-12559, given that the AuthenticationException comes at write time, and from the client package that's used for HTTP work - indicating that the NN is unable to contact the KMS. You'll also likely observe this error only a while after a NameNode restart (it works immediately after the NN restart), and it may go away after a day or so only to return again, which is in line with HADOOP-12559's behaviour within the NameNode. A bug-fix update within the 5.5.x line, or any minor upgrade to a newer release, should resolve this.
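If you'd like to confirm the exact release you're on before planning that upgrade, the Hadoop CLI prints the full build string, which on CDH carries the release in the version suffix (e.g. "2.6.0-cdh5.5.1"):

~> hadoop version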
07-27-2016
06:37 AM
1 Kudo
Here's one example that uses the native hbase-spark module via DataFrames in PySpark: http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Include-latest-hbase-spark-in-CDH/m-p/43236/highlight/true#M2280
07-27-2016
05:58 AM
Retry your command this way:

~> HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -put testkb.txt /data/fi

Among other output, it should produce the full exception trace before it aborts with the same message.
07-27-2016
12:12 AM
4 Kudos
You should be able to read HBase Spark connector data via DataFrames in PySpark, via the sqlContext, already today:

~> hbase shell
> create 't', 'c'
> put 't', '1', 'c:a', 'a column data'
> put 't', '1', 'c:b', 'b column data'
> exit
~> export SPARK_CLASSPATH=$(hbase classpath)
~> pyspark
> hTbl = sqlContext.read.format('org.apache.hadoop.hbase.spark') \
    .option('hbase.table', 't') \
    .option('hbase.columns.mapping', 'KEY_FIELD STRING :key, A STRING c:a, B STRING c:b') \
    .option('hbase.use.hbase.context', False) \
    .option('hbase.config.resources', 'file:///etc/hbase/conf/hbase-site.xml') \
    .load()
> hTbl.show()
+---------+-------------+-------------+
|KEY_FIELD|            A|            B|
+---------+-------------+-------------+
|        1|a column data|b column data|
+---------+-------------+-------------+

There are some limitations, as the JIRA notes, of course. Which specific missing feature are you looking for, just so we know the scope of the request?
07-26-2016
05:48 PM
What version of CDH do you use? Can you share the full stack trace around the exception? Depending on your version and the stack trace, you're most likely hitting the failure described in https://issues.apache.org/jira/browse/HADOOP-12559. This has been addressed in CDH 5.5.4 onwards for the 5.5.x line, and the fix is also in all 5.6.x, 5.7.x, and later releases. An ACL setting failure would give you a different error, such as a 403 from the KMS.
07-24-2016
09:57 PM
Using single quotes around the value will help it get evaluated properly in the shell; the & is otherwise taken by the shell as a token to fork the process. An example of quoting in shell:

… --connect 'jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull&useTimezone=true&serverTimezone=GMT' \ …
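As a fuller sketch of the quoted invocation (the host, database, table, and user below are placeholders, not taken from your actual command):

~> sqoop import \
     --connect 'jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull&useTimezone=true&serverTimezone=GMT' \
     --username someuser -P \
     --table sometable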
07-22-2016
06:43 AM
1 Kudo
The POST command on the API you're using [1] requires passing the username and password as query parameters, not as a JSON object in the request body. Try something like this instead:

~> curl -X POST -u "admin:admin" -i 'http://localhost:7180/api/v11/cm/commands/importAdminCredentials?username=user/admin@REALM&password=your-password'

The expected type of each parameter is noted in the table at the link above. You need to use JSON request bodies if and only if the POST description requires such a structure, for example for this request [2] (notice the two query parameters, plus an additional request-body data structure).
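For contrast, a JSON-body POST is generally shaped like the sketch below. The endpoint path and body field here are placeholders for illustration only, not a real CM API call; take the actual path and structure from the API docs:

~> curl -X POST -u "admin:admin" -H "Content-Type: application/json" \
     -d '{"someField": "someValue"}' \
     'http://localhost:7180/api/v11/path/from/the/docs'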
07-07-2016
11:42 PM
1 Kudo
The copy is done to a temporary file, which is then moved to the actual destination upon completion. There's no "merge", only a move. This procedure ensures partial file copies aren't left behind if the job fails or gets killed.
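As an aside, the HDFS shell's put follows the same write-then-rename pattern, which you can observe mid-copy; the temporary suffix below is what the shell uses, though DistCp's own temporary names differ:

~> hadoop fs -put largefile.bin /data/ &
~> hadoop fs -ls /data/
   # while the copy is in flight, the listing shows /data/largefile.bin._COPYING_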
07-07-2016
07:31 PM
LazyOutputFormat is available for both APIs. Here's the one for the older API: http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/mapred/lib/LazyOutputFormat.html
07-06-2016
11:59 PM
Block-level copies (with file merges) are not supported as a DistCp feature yet. However, you can use the -update option to do progressive copies, resuming after the last failure, as sketched below.
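A minimal sketch of a resumed copy (the source and destination paths are placeholders): run the same command again after an interruption, and -update skips files that already match at the target:

~> hadoop distcp -update hdfs://nn1:8020/src hdfs://nn2:8020/dest
   ... job fails or is killed partway ...
~> hadoop distcp -update hdfs://nn1:8020/src hdfs://nn2:8020/dest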