Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1984 | 07-09-2019 12:53 AM |
| | 11940 | 06-23-2019 08:37 PM |
| | 9196 | 06-18-2019 11:28 PM |
| | 10189 | 05-23-2019 08:46 PM |
| | 4610 | 05-20-2019 01:14 AM |
02-28-2016
10:55 PM
1 Kudo
Let's say you want to execute "script.sh":

1. If you have script.sh inside your WF/lib/ path on HDFS, you only need:
   <exec>script.sh</exec>
2. If you have script.sh at an arbitrary path on HDFS, you need:
   <exec>script.sh</exec>
   <file>/path/to/script.sh#script.sh</file>
3. Using the form below together with (1) is redundant, but the second form is useful when you want to invoke the script under a different name:
   <exec>script.sh</exec>
   <file>script.sh#script.sh</file>

   <exec>linked-script-name.sh</exec>
   <file>original-script-name.sh#linked-script-name.sh</file>
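As a hedged sketch of case (2) above, the shell action might look like the following inside a workflow (the action name and the `${jobTracker}`/`${nameNode}` property names are placeholders, not from the original post):

```xml
<action name="shell-node">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- exec names the script to run; file localizes it from HDFS
         under the name after the '#' -->
    <exec>script.sh</exec>
    <file>/path/to/script.sh#script.sh</file>
    <capture-output/>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```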
02-28-2016
08:53 AM
1 Kudo
Note: CDH3 is long past its supported lifetime. That said, the Netezza JDBC driver is worth trying with the CDH3 Sqoop version. I don't recall whether it worked without a specialised connector, but the generic JDBC connector should likely get it through.
02-28-2016
08:19 AM
3 Kudos
What's the DESCRIBE output of your avro_test table? If it includes a VOID column type, HCatalog currently does not support that.
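As a hedged illustration (the table and column names here are hypothetical), a VOID column typically appears when a table is created from a bare NULL, and an explicit cast avoids it:

```sql
-- A bare NULL in a CTAS gives the column type "void",
-- which HCatalog cannot handle
CREATE TABLE avro_bad AS SELECT name, NULL AS extra FROM src;

-- Casting the NULL gives the column a concrete type instead
CREATE TABLE avro_ok AS SELECT name, CAST(NULL AS STRING) AS extra FROM src;
```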
02-28-2016
07:10 AM
There's no way to do this today, aside from scripting it: run the regular SHOW GRANT commands, parse the output into a file, and then load that into a table.
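As a minimal sketch of the parsing step, assuming tab-separated SHOW GRANT output captured to a string (the exact column layout varies by Hive version and client, so verify the field positions against your own output before relying on this):

```python
def parse_show_grant(raw):
    """Parse tab-separated SHOW GRANT output lines into grant dicts.

    Assumed column order (verify for your Hive version):
    database, table, partition, column, principal_name,
    principal_type, privilege, grant_option, grant_time, grantor
    """
    grants = []
    for line in raw.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # skip headers or malformed lines
        grants.append({
            "database": fields[0],
            "table": fields[1],
            "principal": fields[4],
            "privilege": fields[6],
        })
    return grants

# Fabricated sample line for illustration
sample = "default\tweb_logs\t\t\tanalyst\tROLE\tSELECT\tfalse\t0\thive"
print(parse_show_grant(sample))
```

From there the dicts can be written out as a delimited file and loaded into a table with a plain LOAD DATA statement.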
02-28-2016
05:47 AM
I don't see the tab character. There are, however, lots of interleaved null characters in the file, and the closest I can guess your delimiter to be is a double null-byte sequence: \0\0. If so, you may want to reformat the file before using it with Hive. Something like the following in Python can do the cleanup, assuming your delimiter really is a double sequence of null bytes:

data = data.replace('\0\0', '\t').replace('\0', '')
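A slightly fuller sketch of the same cleanup, working on raw bytes (the double-null delimiter is still a guess, and the sample row below is fabricated for illustration):

```python
def clean_null_delimited(data: bytes) -> bytes:
    """Replace the assumed \x00\x00 field delimiter with tabs,
    then drop any stray single null bytes."""
    return data.replace(b"\x00\x00", b"\t").replace(b"\x00", b"")

# Fabricated example row: two fields separated by a double null byte
row = b"alice\x00\x00accounting\n"
print(clean_null_delimited(row))
```

Run over the whole file opened in binary mode, this yields a tab-delimited file that matches a FIELDS TERMINATED BY '\t' table definition.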
02-28-2016
02:15 AM
1 Kudo
CDH Hive sources are available either via GitHub at https://github.com/cloudera/hive/tree/cdh5.4.5-release/, or in tarball form under http://archive.cloudera.com/cdh5/cdh/5/. You can use the "patch" command to apply the latest patch from the JIRA, and then use "mvn" to build the updated jars. If you are a Cloudera Enterprise subscriber, please log a case with Support for any patch requests instead: custom-patching a component will render it unsupported. P.S. ACID features are currently not supported in CDH Hive: http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_rn_hive_ki.html
02-28-2016
01:58 AM
1 Kudo
You can run an EXPLAIN on a query to see how Hive plans to run it (how many phases), which gives you a sense of "how many jobs" or something close to it. Your query as written is invalid HiveQL, but with GROUP BY clauses added for col1 and col2 to make it legal, it would take a single job.
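As a hedged illustration (the table and column names are hypothetical stand-ins for those in the original question), the corrected query and its plan check would look like:

```sql
-- GROUP BY on the non-aggregated columns makes the query legal HiveQL;
-- EXPLAIN shows the stages Hive plans without executing anything
EXPLAIN
SELECT col1, col2, COUNT(*)
FROM some_table
GROUP BY col1, col2;
```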
02-28-2016
01:29 AM
The Hive "Streaming" feature is built upon its unsupported [1] transactional features: https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

This feature (the ACID one) uses the tables you've mentioned when DbTxnManager is in use, as per the suggested configs. Cloudera does not recommend the use of ACID features currently, because they are experimental in stability/quality upstream [1].

In any case, looking at the code [2], if all data in your table has been compacted, the entries under COMPLETED_TXN_COMPONENTS should be deleted away. Do you see any messages such as "Unable to delete compaction record" in your HMS log? Or any WARN+ log from the CompactionTxnHandler class in general? Finding that and then working through the error should help you solve this.

[1] - http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_rn_hive_ki.html, specific quote:
"""
Hive ACID is not supported
Hive ACID is an experimental feature and Cloudera does not currently support it.
"""
[2] - https://github.com/cloudera/hive/blob/cdh5.5.2-release/metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java#L320, etc.
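If you want to eyeball the backlog directly, a hedged sketch of a query against the metastore's backing database might look like the following (the table and column names follow the upstream metastore transaction schema; verify them against your metastore version before running anything):

```sql
-- Count per-table entries still sitting in COMPLETED_TXN_COMPONENTS
-- in the metastore's backing RDBMS (not in Hive itself)
SELECT ctc_database, ctc_table, COUNT(*) AS pending
FROM COMPLETED_TXN_COMPONENTS
GROUP BY ctc_database, ctc_table;
```

A table that keeps growing here after compactions complete would corroborate a cleanup failure in the HMS log.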
02-28-2016
01:24 AM
The first CREATE TABLE specification looks correct to me for your described file. Can you also double-check your /path/file.csv with the command "head -n1 /path/file.csv | od -c" to ensure it really does have an actual \t character between each field (rather than relying on a visual editor)?
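If you'd rather check programmatically, a small Python sketch (the sample rows below are fabricated for illustration; feed it your file's raw bytes) that looks for a real tab byte in the first line:

```python
def first_line_has_tab(data: bytes) -> bool:
    """Check whether the first line of raw file bytes contains a \t byte."""
    first_line = data.split(b"\n", 1)[0]
    return b"\t" in first_line

# Fabricated samples: one genuinely tab-delimited, one comma-delimited
print(first_line_has_tab(b"a\tb\tc\nrow2"))  # tab-delimited
print(first_line_has_tab(b"a,b,c\nrow2"))    # no tab present
```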
02-27-2016
09:35 PM
1 Kudo
Hive provides a skip-header/footer feature when creating your table (as part of its table properties). See the release notes on https://issues.apache.org/jira/browse/HIVE-5795

"""
CREATE TABLE testtable (name STRING, message STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.header.line.count"="1");

LOAD DATA LOCAL INPATH '/tmp/header-inclusive-file.csv' INTO TABLE testtable;
"""