About David M.

David M. · ‎02-11-2018

Is there text data in this result set, and does that text contain newlines? Embedded newlines can trip up Hive and Hue. Try changing the query result file format to a binary format before running the query:

set hive.query.result.fileformat=SequenceFile;
<query>

David M. · ‎02-11-2018

I am sorry, but I do not understand the requirement. However, perhaps you are looking for the 'explode' UDF: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode(array)
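
If explode is indeed what is needed, a minimal sketch of its usual form follows. The table orders and its columns order_id and items are hypothetical names chosen for illustration; they do not come from the original question.

```sql
-- Hypothetical table: orders(order_id INT, items ARRAY<STRING>)
-- explode() turns each array element into its own row;
-- LATERAL VIEW joins those rows back to the originating record.
SELECT o.order_id,
       t.item
FROM orders o
LATERAL VIEW explode(o.items) t AS item;
```

Each row of orders is repeated once per element of items, so an order with three items yields three output rows.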

David M. · ‎02-11-2018

Also, we at Cloudera are partial to the Apache Parquet format: https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cdh_ig_parquet.html

David M. · ‎02-11-2018

It could be a few things. However, a detailed log message should be available in the Spark History Server / YARN Resource Manager UI when you click on the failed job. The error will be in one of the Executor logs.

1// Invalid JSON

You could have some invalid JSON that is failing to parse. Hive will not skip erroneous records; it will simply fail the entire job.

2// Not Installing the SerDe

This can be confusing for users, but have you installed the JSON SerDe into the Hive auxiliary directory? The file that contains this JSON SerDe class is hive-hcatalog-core.jar. It can be found in several places in the CDH distribution. It needs to be installed into the Hive auxiliary directory, and the HiveServer2 instances subsequently need to be restarted.

https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_mc_hive_udf.html
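
For reference, a table that relies on the SerDe shipped in hive-hcatalog-core.jar is typically declared along these lines. This is only a sketch: the table name, columns, and HDFS location are hypothetical, and the statement will fail with a class-not-found error if the JAR is not on the HiveServer2 classpath.

```sql
-- Hypothetical table, columns, and location, for illustration only.
-- org.apache.hive.hcatalog.data.JsonSerDe is the class packaged in
-- hive-hcatalog-core.jar.
CREATE EXTERNAL TABLE events_json (
  id   BIGINT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/events_json';
```

Each line of the underlying files is expected to hold one complete JSON object; a single malformed line is enough to fail a query over the table.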

David M. · ‎02-11-2018

You may set the character set used by Hive, for a given table, with the table SerDe property "serialization.encoding". Take a look at the following JIRA for an example of how to use it:

https://issues.apache.org/jira/browse/HIVE-12653

If you would like to use the MultiDelimitSerDe class referenced in HIVE-12653, this serialization feature is available starting in CDH 5.10.

https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed_in_510.html

The valid character sets are discussed in the following link. In particular, take a look at the "Standard charsets."

https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
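
As a rough illustration, the property can be set on an existing table as shown here. The table name customer_names is hypothetical, and ISO-8859-1 is just one of the Java standard charsets picked as an example; substitute the encoding your files actually use.

```sql
-- Hypothetical table name; ISO-8859-1 is an example charset only.
ALTER TABLE customer_names
  SET SERDEPROPERTIES ('serialization.encoding' = 'ISO-8859-1');
```

The setting changes how the SerDe decodes the bytes on read; it does not rewrite the files that are already stored.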

David M. · ‎01-09-2018

This is an external table (EXTERNAL_TABLE), so Hive will not keep any stats on it: it is assumed that another application is changing the underlying data at will. Why keep stats if we can't trust that the data will be the same in another 5 minutes? For a managed (non-external) table, data is manipulated through Hive SQL statements (LOAD DATA, INSERT, etc.), so the Hive system will know about any changes to the underlying data and can update the stats accordingly. Using the HDFS utilities to check the directory file sizes will give you the most accurate answer.

David M. · ‎01-09-2018

The CDH distribution of Hive does not support transactions (HIVE-5317). Currently, transaction support in Hive is an experimental feature that only works with the ORC file format. Cloudera recommends using the Parquet file format, which works across many tools. Merge updates in Hive tables using existing functionality, including statements such as INSERT, INSERT OVERWRITE, and CREATE TABLE AS SELECT. https://www.cloudera.com/documentation/enterprise/latest/topics/hive_ingesting_and_querying_data.html#hive_transaction_support If you require these features, please inquire about Apache Kudu. Kudu is storage for fast analytics on fast data—providing a combination of fast inserts and updates alongside efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer. https://www.cloudera.com/products/open-source/apache-hadoop/apache-kudu.html
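
One way to merge updates with existing functionality is to rebuild the table from the changed rows plus the unchanged rows. The sketch below assumes two hypothetical tables with the same schema, customers (current data) and customers_updates (new versions of some rows), keyed on a hypothetical id column. It is an illustration of the INSERT OVERWRITE approach, not a Cloudera-supplied recipe.

```sql
-- Hypothetical tables: customers and customers_updates,
-- both with columns (id, name, email), keyed on id.
INSERT OVERWRITE TABLE customers
SELECT merged.id, merged.name, merged.email
FROM (
  -- Every row that arrived in the update set wins.
  SELECT u.id, u.name, u.email
  FROM customers_updates u
  UNION ALL
  -- Keep existing rows that have no newer version.
  SELECT c.id, c.name, c.email
  FROM customers c
  LEFT OUTER JOIN customers_updates u2 ON c.id = u2.id
  WHERE u2.id IS NULL
) merged;
```

Deletes can be handled the same way by filtering the unwanted keys out of the second branch. For large tables, writing to a new table with CREATE TABLE AS SELECT and swapping it in afterwards avoids overwriting the source while it is being read.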

David M. · ‎01-09-2018

If the HQL file performs an ADD JAR statement, please reconsider and install the JAR into HiveServer2 as a permanent UDF.

https://www.cloudera.com/documentation/enterprise/5-12-x/topics/cm_mc_hive_udf.html
https://issues.apache.org/jira/browse/HADOOP-13809
https://issues.apache.org/jira/browse/HIVE-11681
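
Once the JAR is in the Hive auxiliary directory and HiveServer2 has been restarted, registering the function might look like the sketch below. The function name my_upper and the class com.example.hive.udf.MyUpper are hypothetical placeholders for your own UDF; the exact steps also depend on whether Sentry is enabled, so check the Cloudera page linked above.

```sql
-- Hypothetical function name and class, for illustration only.
-- Assumes the JAR containing the class is already on the
-- HiveServer2 classpath via the auxiliary JARs directory.
CREATE FUNCTION my_upper AS 'com.example.hive.udf.MyUpper';

-- Later sessions can call it without any ADD JAR statement:
SELECT my_upper(name) FROM customers LIMIT 10;
```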

David M. · ‎01-09-2018

Foolbear,

Since this is a Hive2 Action, and the job is connecting through JDBC, the following configuration is probably superfluous and should be removed. All UDF interactions are done through HiveServer2 and are hidden from the client.

<param>hiveUDFJarPath=${ciUDFJarPath}</param>

Also remove any references to UDFs in

<file>${hiveConfDir}/hive-site.xml#hive-site.xml</file>

Thanks.

David M. · ‎12-19-2017

The way things are implemented, a MapJoin optimization will always use a local task. If you would like to remove all instances of local tasks, you will have to disable MapJoins. Please examine these two explain plans (the first with MapJoin enabled, the second with it disabled):

STAGE PLANS:
  Stage: Stage-5
    Map Reduce Local Work
      Alias -> Map Local Tables:
        s07
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        s07
          TableScan
            alias: s07
            filterExpr: code is not null (type: boolean)
            Statistics: Num rows: 225 Data size: 46055 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: code is not null (type: boolean)
              Statistics: Num rows: 113 Data size: 23129 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 code (type: string)
                  1 code (type: string)

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: s07
            filterExpr: code is not null (type: boolean)
            Statistics: Num rows: 225 Data size: 46055 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: code is not null (type: boolean)
              Statistics: Num rows: 113 Data size: 23129 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: code (type: string)
                sort order: +
                Map-reduce partition columns: code (type: string)
                Statistics: Num rows: 113 Data size: 23129 Basic stats: COMPLETE Column stats: NONE
                value expressions: description (type: string), salary (type: int)
          TableScan
            alias: s08
            filterExpr: code is not null (type: boolean)
            Statistics: Num rows: 442 Data size: 46069 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: code is not null (type: boolean)
              Statistics: Num rows: 221 Data size: 23034 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: code (type: string)
                sort order: +
                Map-reduce partition columns: code (type: string)
                Statistics: Num rows: 221 Data size: 23034 Basic stats: COMPLETE Column stats: NONE
                value expressions: salary (type: int)

We can see that the first one uses "Map Reduce Local Work" and the second one does not. To disable the optimization:

set hive.auto.convert.join=false;

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

This can be important because I'm seeing a case where the Local Job Runners leak their log file output into the HS2's /tmp directory in the following format:

/tmp/hive_20171219184242_3ecaf468-51c7-4ced-99b3-6bd9eaaa980a.log

Disable the MapJoin optimization and these log files are not generated.
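
To reproduce the comparison yourself, something along these lines should work. The table names sample_07 and sample_08 are an assumption based on the s07/s08 aliases in the plans above; the original query was not posted, so treat the SELECT as a stand-in for your own join.

```sql
-- Assumed tables sample_07 / sample_08 with columns code, description, salary.

-- Plan with the MapJoin optimization enabled: look for
-- "Map Reduce Local Work" in the output.
SET hive.auto.convert.join=true;
EXPLAIN
SELECT s07.description, s07.salary, s08.salary
FROM sample_07 s07
JOIN sample_08 s08 ON s07.code = s08.code;

-- Plan with the optimization disabled: the join is done as a
-- regular reduce-side join and no local work stage appears.
SET hive.auto.convert.join=false;
EXPLAIN
SELECT s07.description, s07.salary, s08.salary
FROM sample_07 s07
JOIN sample_08 s08 ON s07.code = s08.code;
```

Keep in mind that disabling MapJoins trades the local task for an extra shuffle, so joins against small tables will generally run slower.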

Last Visited	‎12-24-2021 10:57 AM

Member Since	‎11-20-2015 11:40 AM
Posts	226
Kudos received	9

Cloudera Community

Re: Compiling statement: FAILED: ParseException : ...

Re: How to use Binary Data Type in Hive

Re: HUE - HIVE question

Re: array?

Re: return code 3 from org.apache.hadoop.hive.ql.e...

Re: return code 3 from org.apache.hadoop.hive.ql.e...

Re: Case and accent insensitive ?

Re: Can we check size of Hive tables? If so - how?

Re: Update and Delete are not working in Hive ?

Re: java.lang.IllegalStateException(zip file close...

Re: java.lang.IllegalStateException(zip file close...

Re: HiveServer2 - disable local task execution