Member since: 11-20-2015. Posts: 226. Kudos received: 9. Solutions: 2.
02-11-2018
06:43 PM
Is there text data in this result set? Does it contain newline characters? These can trip up Hive and Hue. Try changing the result file format to a binary format by running the following before your query: set hive.query.result.fileformat=SequenceFile;
<query>
02-11-2018
06:38 PM
I am sorry, but I do not understand the requirement. However, perhaps you are looking for the 'explode' UDF: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode(array)
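If explode is indeed what is needed, a minimal sketch might look like the following. The table orders and its columns order_id and items (an ARRAY<STRING>) are hypothetical names used only for illustration.

```sql
-- Hypothetical table: orders(order_id INT, items ARRAY<STRING>)
-- explode() emits one row per array element; LATERAL VIEW joins each
-- element back to the row it came from.
SELECT o.order_id, item
FROM orders o
LATERAL VIEW explode(o.items) exploded AS item;
```

Note that rows whose array is empty or NULL produce no output with a plain LATERAL VIEW; LATERAL VIEW OUTER keeps them with a NULL element.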
02-11-2018
06:27 PM
Also, we at Cloudera are partial to the Apache Parquet format: https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cdh_ig_parquet.html
02-11-2018
06:23 PM
It could be a few things. However, a detailed log message should be available in the Spark History Server / YARN Resource Manager UI when you click on the failed job. The error will be in one of the Executor logs.

1. Invalid JSON
You could have some invalid JSON that is failing to parse. Hive will not skip erroneous records; it will simply fail the entire job.

2. SerDe not installed
This can be confusing for users, but have you installed the JSON SerDe into the Hive auxiliary directory? The file that contains this JSON SerDe class is hive-hcatalog-core.jar. It can be found in several places in the CDH distribution. It needs to be installed into the Hive auxiliary directory, and the HiveServer2 instances subsequently need to be restarted. https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_mc_hive_udf.html
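For reference, a table backed by the JSON SerDe from hive-hcatalog-core.jar might be declared roughly as below. The table name, columns, and HDFS location are hypothetical; only the SerDe class name comes from the JAR itself.

```sql
-- Hypothetical table, columns, and location, for illustration only.
-- The SerDe class below ships in hive-hcatalog-core.jar.
CREATE EXTERNAL TABLE events_json (
  id      BIGINT,
  payload STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/user/hive/example/events_json';

-- A small read exercises the SerDe before a larger job is run.
SELECT id, payload FROM events_json LIMIT 10;
```

If the JAR is not installed, a query like this would be expected to fail with a class-loading error rather than a parse error, which helps tell the two causes apart.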
02-11-2018
06:08 PM
You may set the character set used by Hive for a given table with the table SerDe property "serialization.encoding". Take a look at the following JIRA for an example of how to use it: https://issues.apache.org/jira/browse/HIVE-12653

If you would like to use the MultiDelimitSerDe class referenced in HIVE-12653, this serialization feature is available starting in CDH 5.10. https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed_in_510.html

The valid character sets are discussed in the following link. In particular, take a look at the "Standard charsets" section. https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
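As a rough sketch, the property can be set on an existing table like this. The table name customer_text is hypothetical and 'GBK' is only an example charset; substitute whichever charset matches your data files.

```sql
-- Hypothetical table name; 'GBK' is only an example charset.
ALTER TABLE customer_text
  SET SERDEPROPERTIES ('serialization.encoding' = 'GBK');
```

The same key can also be supplied in WITH SERDEPROPERTIES when the table is created.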
01-09-2018
12:00 PM
Since this is an external table (EXTERNAL_TABLE), Hive will not keep any stats on the table since it is assumed that another application is changing the underlying data at will. Why keep stats if we can't trust that the data will be the same in another 5 minutes? For a managed (non-external) table, data is manipulated through Hive SQL statements (LOAD DATA, INSERT, etc.) so the Hive system will know about any changes to the underlying data and can update the stats accordingly. Using the HDFS utilities to check the directory file sizes will give you the most accurate answer.
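A sketch of that check from a Hive session is shown below. The table name and HDFS path are hypothetical, and the dfs command assumes your client and security settings permit it; from a shell, the equivalent is hdfs dfs -du -s -h followed by the path.

```sql
-- Hypothetical table name and path.
-- Step 1: find the table's HDFS directory (see the "Location:" row).
DESCRIBE FORMATTED web_logs_ext;

-- Step 2: sum the file sizes under that directory.
dfs -du -s -h /data/external/web_logs;
```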
01-09-2018
09:29 AM
1 Kudo
The CDH distribution of Hive does not support transactions (HIVE-5317). Currently, transaction support in Hive is an experimental feature that only works with the ORC file format. Cloudera recommends using the Parquet file format, which works across many tools. Merge updates in Hive tables using existing functionality, including statements such as INSERT, INSERT OVERWRITE, and CREATE TABLE AS SELECT. https://www.cloudera.com/documentation/enterprise/latest/topics/hive_ingesting_and_querying_data.html#hive_transaction_support If you require these features, please inquire about Apache Kudu. Kudu is storage for fast analytics on fast data—providing a combination of fast inserts and updates alongside efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer. https://www.cloudera.com/products/open-source/apache-hadoop/apache-kudu.html
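One way to merge updates with existing functionality is sketched below. It is an illustration, not a Cloudera-prescribed recipe: the three tables (customers_base, customers_updates, customers_merged) and their columns are hypothetical, and it assumes id is unique within each input table.

```sql
-- Hypothetical tables: customers_base (current data), customers_updates
-- (new and changed rows), customers_merged (result). All share the
-- columns (id, name, email). Where an id exists in the updates table,
-- the updated row wins; otherwise the base row is carried over.
INSERT OVERWRITE TABLE customers_merged
SELECT
  COALESCE(u.id, b.id) AS id,
  CASE WHEN u.id IS NOT NULL THEN u.name  ELSE b.name  END AS name,
  CASE WHEN u.id IS NOT NULL THEN u.email ELSE b.email END AS email
FROM customers_base b
FULL OUTER JOIN customers_updates u
  ON b.id = u.id;
```

Writing the result to a separate table keeps the base data intact until the merged output has been verified.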
01-09-2018
09:21 AM
If performing an ADD JAR statement in the HQL file, please reconsider and install the JAR into HiveServer2 as a permanent UDF. https://www.cloudera.com/documentation/enterprise/5-12-x/topics/cm_mc_hive_udf.html https://issues.apache.org/jira/browse/HADOOP-13809 https://issues.apache.org/jira/browse/HIVE-11681
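Once the JAR has been installed as described in the Cloudera documentation linked above, registering the function permanently might look like the sketch below. The function name, class name, JAR path, and the customers table are all hypothetical.

```sql
-- Hypothetical function name, class name, and JAR path.
-- Registers the UDF in the metastore so later sessions can call it
-- without running ADD JAR.
CREATE FUNCTION normalize_phone
  AS 'com.example.hive.udf.NormalizePhone'
  USING JAR 'hdfs:///user/hive/udfs/example-udfs.jar';

SELECT normalize_phone(phone) FROM customers LIMIT 10;
```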
01-09-2018
09:13 AM
Foolbear, since this is a Hive2 Action and the job connects through JDBC, the following configuration is probably superfluous and should be removed. All UDF interactions are done through HiveServer2 and are hidden from the client. <param>hiveUDFJarPath=${ciUDFJarPath}</param>
Also remove any references to UDFs in <file>${hiveConfDir}/hive-site.xml#hive-site.xml</file> Thanks.
12-19-2017
06:43 PM
The way things are implemented, a MapJoin optimization will always use a local task. If you would like to remove all instances of local tasks, you will have to disable MapJoins. Please examine these two explain plans (the first with MapJoin enabled, the second with it disabled).

With MapJoin enabled:
| STAGE PLANS: |
| Stage: Stage-5 |
| Map Reduce Local Work |
| Alias -> Map Local Tables: |
| s07 |
| Fetch Operator |
| limit: -1 |
| Alias -> Map Local Operator Tree: |
| s07 |
| TableScan |
| alias: s07 |
| filterExpr: code is not null (type: boolean) |
| Statistics: Num rows: 225 Data size: 46055 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: code is not null (type: boolean) |
| Statistics: Num rows: 113 Data size: 23129 Basic stats: COMPLETE Column stats: NONE |
| HashTable Sink Operator |
| keys: |
| 0 code (type: string) |
| 1 code (type: string) |

With MapJoin disabled:
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: s07 |
| filterExpr: code is not null (type: boolean) |
| Statistics: Num rows: 225 Data size: 46055 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: code is not null (type: boolean) |
| Statistics: Num rows: 113 Data size: 23129 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: code (type: string) |
| sort order: + |
| Map-reduce partition columns: code (type: string) |
| Statistics: Num rows: 113 Data size: 23129 Basic stats: COMPLETE Column stats: NONE |
| value expressions: description (type: string), salary (type: int) |
| TableScan |
| alias: s08 |
| filterExpr: code is not null (type: boolean) |
| Statistics: Num rows: 442 Data size: 46069 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: code is not null (type: boolean) |
| Statistics: Num rows: 221 Data size: 23034 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: code (type: string) |
| sort order: + |
| Map-reduce partition columns: code (type: string) |
| Statistics: Num rows: 221 Data size: 23034 Basic stats: COMPLETE Column stats: NONE |
| value expressions: salary (type: int) |

We can see that the first one uses "Map Reduce Local Work" and the second one does not. To disable the MapJoin optimization: set hive.auto.convert.join=false; https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

This can be important because I'm seeing a case where the Local Job Runners leak their log file output into the HS2's /tmp directory in the following format: /tmp/hive_20171219184242_3ecaf468-51c7-4ced-99b3-6bd9eaaa980a.log

Disable the MapJoin optimization and these log files are not generated.
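To compare the two plans yourself, something along these lines should work. The table names sample_07 and sample_08 are assumptions; only the aliases s07 and s08 and the column names code, description, and salary appear in the plans above.

```sql
-- Assumed table names; the aliases and columns match the plans above.
-- First plan: MapJoin conversion allowed.
SET hive.auto.convert.join=true;
EXPLAIN
SELECT s07.description, s07.salary, s08.salary
FROM sample_07 s07
JOIN sample_08 s08 ON s07.code = s08.code;

-- Second plan: MapJoin conversion disabled for this session.
SET hive.auto.convert.join=false;
EXPLAIN
SELECT s07.description, s07.salary, s08.salary
FROM sample_07 s07
JOIN sample_08 s08 ON s07.code = s08.code;
```

Setting the property in the session only affects that session; a cluster-wide change would have to go into the HiveServer2 configuration.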