Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1963 | 07-09-2019 12:53 AM |
| | 11819 | 06-23-2019 08:37 PM |
| | 9103 | 06-18-2019 11:28 PM |
| | 10061 | 05-23-2019 08:46 PM |
| | 4480 | 05-20-2019 01:14 AM |
02-28-2016
07:10 AM
There's no built-in way to do this today, aside from scripting it: run the regular SHOW GRANT commands, parse the output into a file, and then load that file into a table.
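A rough sketch of that scripting approach (the role list, connection URL, paths, and table layout below are assumptions for illustration, not a supported interface):

# Dump grants for each role into a flat TSV file (role names hypothetical).
for role in admin analyst etl; do
  beeline -u jdbc:hive2://localhost:10000 --outputformat=tsv2 --silent=true \
    -e "SHOW GRANT ROLE $role;" >> /tmp/grants.tsv
done

# Push the file to HDFS and expose it as an external Hive table.
hdfs dfs -mkdir -p /tmp/grants
hdfs dfs -put -f /tmp/grants.tsv /tmp/grants/
beeline -u jdbc:hive2://localhost:10000 -e "
  CREATE EXTERNAL TABLE grants_dump (
    db STRING, tbl STRING, part STRING, col STRING,
    principal_name STRING, principal_type STRING, privilege STRING,
    grant_option STRING, grant_time BIGINT, grantor STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/tmp/grants';"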
02-28-2016
01:58 AM
1 Kudo
You can run EXPLAIN on a query to see how Hive plans to execute it (how many stages), which gives you a sense of 'how many jobs' or something close to it. Your query as written is invalid HiveQL, but with a GROUP BY clause added for col1 and col2 to make it legal, it would take a single job.
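For instance, a minimal sketch (the table and column names are hypothetical):

-- A single GROUP BY aggregation like this compiles into one MapReduce job;
-- the STAGE PLANS section of the EXPLAIN output lists each stage Hive will run.
EXPLAIN
SELECT col1, col2, COUNT(*)
FROM sample_table
GROUP BY col1, col2;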
02-28-2016
01:29 AM
The Hive "Streaming" feature is built upon its unsupported [1] transactional features: https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

This feature (the ACID one) uses the tables you've mentioned when DbTxnManager is in use, as per the suggested configs. Cloudera does not currently recommend using the ACID features, because they are experimental in stability/quality upstream [1].

That said, going by the code [2], if all data in your table has been compacted, the entries under COMPLETED_TXN_COMPONENTS should get deleted. Do you see any messages such as "Unable to delete compaction record" in your HMS log, or any WARN-or-higher logging from the CompactionTxnHandler class in general? Finding that and then working through the error should help you solve this.

[1] - http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_rn_hive_ki.html, specific quote: """ Hive ACID is not supported. Hive ACID is an experimental feature and Cloudera does not currently support it. """

[2] - https://github.com/cloudera/hive/blob/cdh5.5.2-release/metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java#L320, etc.
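A quick way to check for those messages (a sketch; the Metastore log path is an assumption and varies by install):

# Adjust this path to wherever your Hive Metastore writes its logs.
HMS_LOG=/var/log/hive/hive-metastore.log

# The specific cleanup-failure message:
grep "Unable to delete compaction record" "$HMS_LOG"

# Any warnings or errors from the compaction handler in general:
grep -E "WARN|ERROR" "$HMS_LOG" | grep CompactionTxnHandler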
02-27-2016
09:35 PM
1 Kudo
Hive provides a skip header/footer feature when creating your table (as part of the table properties). See the release notes on https://issues.apache.org/jira/browse/HIVE-5795:

CREATE TABLE testtable (name STRING, message STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.header.line.count"="1");

LOAD DATA LOCAL INPATH '/tmp/header-inclusive-file.csv' INTO TABLE testtable;
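For the footer side, the same JIRA adds an analogous property; a minimal sketch (table name hypothetical):

-- Skips the last line of each data file, e.g. a totals/trailer row.
CREATE TABLE testtable2 (name STRING, message STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.footer.line.count"="1");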
02-22-2016
02:29 AM
There was a difference in the amount of average load. Since it's computed when we run the command, it may vary. I forgot to ask one question: what is the difference between the information I get from the status command and status 'replication'?
02-18-2016
01:03 AM
1 Kudo
You can get your live RegionServer IDs with startcodes included via the HBase Shell command:

status 'simple'

An output line from this, such as the below:

host.cloudera.com:60020 1455726247381

can then be converted into the right format:

host.cloudera.com,60020,1455726247381
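A one-liner sketch of that conversion (assuming the plain "host:port startcode" line format shown above):

# Rewrite "host:port startcode" lines into the host,port,startcode form.
echo "status 'simple'" | hbase shell 2>/dev/null \
  | awk '/:[0-9]+ [0-9]+/ { sub(":", ",", $1); print $1 "," $2 }'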
02-17-2016
09:12 PM
CDH 5.4 had Spark 1.3.0 plus patches, which per the blog post seems like it would not work either (it quotes a "strong dependency", which I take to mean ONLY 1.4.1 will do). CDH 5.5.x onwards carries Spark 1.5.x with patches. There has been no CDH5 release with Spark 1.4.x in it. You could use an Apache Spark 1.4.1 release from upstream, manually rebuilt against your CDH5 version of Apache Hadoop, and use the tarball paths for all Spark operations; this should work. However, such a Spark deployment would not be officially supported by Cloudera Support (if you have a subscription).
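A sketch of that rebuild (the CDH Hadoop version string below is an assumption; substitute the one your cluster actually runs):

# Spark 1.4.1 ships a make-distribution.sh for building a deployable tarball.
wget https://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1.tgz
tar xzf spark-1.4.1.tgz && cd spark-1.4.1
./make-distribution.sh --tgz -Phadoop-2.6 -Phive -Dhadoop.version=2.6.0-cdh5.4.8
# Then run all jobs from the resulting tarball's own paths (bin/spark-submit, etc.)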
02-05-2016
12:10 AM
Thanks for the answer. We are going to look at increasing the block size to 1 GB+ to reduce the NN heap size. All data will reside in HBase; are there any other solutions besides separating the cluster into physical partitions?
01-28-2016
05:39 PM
Thank you all for your time; the logical workaround sounds good to me.