Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1963 | 07-09-2019 12:53 AM |
| | 11819 | 06-23-2019 08:37 PM |
| | 9103 | 06-18-2019 11:28 PM |
| | 10061 | 05-23-2019 08:46 PM |
| | 4480 | 05-20-2019 01:14 AM |
02-28-2016
07:10 AM
There's no built-in way to do this today, aside from scripting it: run the regular SHOW GRANT commands, parse the output into a file, and then load that file into a table.
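A rough sketch of that scripting approach (the role list, connection URL, paths, and table layout below are assumptions for illustration, not a supported interface):

# Dump grants for each role into a flat TSV file (role names hypothetical).
for role in admin analyst etl; do
  beeline -u jdbc:hive2://localhost:10000 --outputformat=tsv2 --silent=true \
    -e "SHOW GRANT ROLE $role;" >> /tmp/grants.tsv
done

# Push the file to HDFS and expose it as an external Hive table.
hdfs dfs -mkdir -p /tmp/grants
hdfs dfs -put -f /tmp/grants.tsv /tmp/grants/
beeline -u jdbc:hive2://localhost:10000 -e "
  CREATE EXTERNAL TABLE grants_dump (
    db STRING, tbl STRING, part STRING, col STRING,
    principal_name STRING, principal_type STRING, privilege STRING,
    grant_option STRING, grant_time BIGINT, grantor STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/tmp/grants';"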
02-28-2016
01:58 AM
1 Kudo
You can run EXPLAIN on a query to see how Hive plans to execute it (how many stages), which gives you a sense of 'how many jobs' or something close to it. Your query as written is invalid HiveQL, but with a GROUP BY clause added for col1 and col2 to make it legal, it would take a single job.
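For instance, a minimal sketch (the table and column names are hypothetical):

-- A single GROUP BY aggregation like this compiles into one MapReduce job;
-- the STAGE PLANS section of the EXPLAIN output lists each stage Hive will run.
EXPLAIN
SELECT col1, col2, COUNT(*)
FROM sample_table
GROUP BY col1, col2;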
02-28-2016
01:29 AM
The Hive "Streaming" feature is built upon its unsupported [1] transactional features: https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

This feature (the ACID one) uses the tables you've mentioned when DbTxnManager is in use, as per the suggested configs. Cloudera does not currently recommend using the ACID features, because they are experimental in stability/quality upstream [1].

That said, going by the code [2], if all data in your table has been compacted, the entries under COMPLETED_TXN_COMPONENTS should get deleted. Do you see any messages such as "Unable to delete compaction record" in your HMS log, or any WARN-or-higher logging from the CompactionTxnHandler class in general? Finding that and then working through the error should help you solve this.

[1] - http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_rn_hive_ki.html, specific quote: """ Hive ACID is not supported. Hive ACID is an experimental feature and Cloudera does not currently support it. """

[2] - https://github.com/cloudera/hive/blob/cdh5.5.2-release/metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java#L320, etc.
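A quick way to check for those messages (a sketch; the Metastore log path is an assumption and varies by install):

# Adjust this path to wherever your Hive Metastore writes its logs.
HMS_LOG=/var/log/hive/hive-metastore.log

# The specific cleanup-failure message:
grep "Unable to delete compaction record" "$HMS_LOG"

# Any warnings or errors from the compaction handler in general:
grep -E "WARN|ERROR" "$HMS_LOG" | grep CompactionTxnHandler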
02-27-2016
09:35 PM
1 Kudo
Hive provides a skip header/footer feature when creating your table (as part of the table properties). See the release notes on https://issues.apache.org/jira/browse/HIVE-5795:

CREATE TABLE testtable (name STRING, message STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.header.line.count"="1");

LOAD DATA LOCAL INPATH '/tmp/header-inclusive-file.csv' INTO TABLE testtable;
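For the footer side, the same JIRA adds an analogous property; a minimal sketch (table name hypothetical):

-- Skips the last line of each data file, e.g. a totals/trailer row.
CREATE TABLE testtable2 (name STRING, message STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.footer.line.count"="1");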
02-22-2016
02:29 AM
There was a difference in the amount of average load. Since it's computed when we run the command, it may vary. I forgot to ask one question: what is the difference between the information I get from the status command and status 'replication'?
02-18-2016
01:03 AM
1 Kudo
You can get your live RegionServer IDs with startcodes included via the HBase Shell command:

status 'simple'

An output line from this, such as the below:

host.cloudera.com:60020 1455726247381

can then be converted into the right format:

host.cloudera.com,60020,1455726247381
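A one-liner sketch of that conversion (assuming the plain "host:port startcode" line format shown above):

# Rewrite "host:port startcode" lines into the host,port,startcode form.
echo "status 'simple'" | hbase shell 2>/dev/null \
  | awk '/:[0-9]+ [0-9]+/ { sub(":", ",", $1); print $1 "," $2 }'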
02-17-2016
09:12 PM
CDH 5.4 had Spark 1.3.0 plus patches, which per the blog post seems like it would not work either (it quotes a "strong dependency", which I take to mean ONLY 1.4.1 will do). CDH 5.5.x onwards carries Spark 1.5.x with patches. There has been no CDH5 release with Spark 1.4.x in it. You could use an Apache Spark 1.4.1 release from upstream, manually rebuilt against your CDH5 version of Apache Hadoop, and use the tarball paths for all Spark operations; this should work. However, such a Spark deployment would not be officially supported by Cloudera Support (if you have a subscription).
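A sketch of that rebuild (the CDH Hadoop version string below is an assumption; substitute the one your cluster actually runs):

# Spark 1.4.1 ships a make-distribution.sh for building a deployable tarball.
wget https://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1.tgz
tar xzf spark-1.4.1.tgz && cd spark-1.4.1
./make-distribution.sh --tgz -Phadoop-2.6 -Phive -Dhadoop.version=2.6.0-cdh5.4.8
# Then run all jobs from the resulting tarball's own paths (bin/spark-submit, etc.)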
02-05-2016
12:10 AM
Thanks for the answer. We are going to look at increasing the block size to 1 GB+ to reduce the NN heap size. All data will reside in HBase; are there any other solutions besides separating the cluster into physical partitions?
01-28-2016
05:39 PM
Thank you all for your time; the logical workaround sounds good to me.