Member since: 11-20-2015
Posts: 226
Kudos Received: 9
Solutions: 2

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 87976 | 05-11-2018 12:26 PM |
|  | 43830 | 08-26-2016 08:52 AM |
11-09-2020
09:02 AM
CDH does not support the keyring credential cache. https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s4_kerb_wizard.html#concept_irl_x5y_l4
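For reference, a quick way to check which credential cache type a client is using — a minimal sketch (the principal, realm, uid, and paths shown are placeholders):

```
# If klist reports a keyring cache, CDH components cannot read it:
$ klist
Ticket cache: KEYRING:persistent:1000:1000    # unsupported by CDH

# Switch to a file-based cache (path is an example) and re-authenticate:
$ export KRB5CCNAME=FILE:/tmp/krb5cc_$(id -u)
$ kinit user@EXAMPLE.COM
$ klist
Ticket cache: FILE:/tmp/krb5cc_1000           # supported
```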
11-09-2020
08:56 AM
To create a table in this way, there are two steps:

1. CREATE TABLE ...
2. LOAD DATA INPATH ...

The first statement creates the table schema within Hive, and the second tells Hive to move the data from the source HDFS directory into the Hive HDFS table directory:

/user/joe/sales.csv => /user/hive/warehouse/sales/sales.csv

The move operation occurs as the 'hive' user, so for it to complete, the 'hive' user must have access to perform this move operation in HDFS. Ensure that the 'hive' user has the correct permissions to move this file into the final location; see the sketch below.

(Impala docs, but with a lot of overlap with Hive:)
https://docs.cloudera.com/documentation/enterprise/6/latest/topics/impala_load_data.html

Also please note that the latest version is 6.3.4, which has lots of benefits over 6.0:
https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_63_packaging.html
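A minimal sketch of the two steps plus the permission checks (the table name, schema, connection string, and paths are examples only):

```
# 1) Create the schema, 2) move the file into the warehouse directory:
$ beeline -u jdbc:hive2://hs2-host:10000 -e "
    CREATE TABLE sales (id INT, amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    LOAD DATA INPATH '/user/joe/sales.csv' INTO TABLE sales;"

# LOAD DATA is a move performed as the 'hive' user, so verify it can read
# the source file and write to the table directory:
$ hdfs dfs -ls /user/joe/sales.csv
$ hdfs dfs -ls -d /user/hive/warehouse/sales
```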
04-11-2020
09:16 PM
I was working on something unrelated, but I hit this same error, detailed the issue in Jira, and have proposed a workaround.

The issue is that there is a feature in Hive called the REGEX Column Specification. IMHO this feature was ill conceived and is not standard SQL; it should be removed from Hive, and this issue is yet another reason why. That's what I was working on when I hit this issue.

When Hive sees a table name surrounded by back ticks, it determines that the string is a regex; when it sees the table name surrounded by quotes, it determines that the string is a table name. The basic rule it uses is "most anything ASCII surrounded by back ticks is a regex." (See the sketch below.)

Using quotes (and technically back ticks too, but that's clearly broken) around table names can be allowed/disallowed with a feature in Hive called "hive.support.quoted.identifiers". This feature is enabled in the user's HS2 session by default. However, masking is a multi-step process:

1. The query is parsed by HS2
2. The masking is applied
3. The query is parsed again by HS2

The first parsing attempt respects the hive.support.quoted.identifiers configuration and allows a query with quotes to be parsed. However, the masking code does not pass this configuration to the parser on the second attempt, and oddly enough, if the configuration is not passed along, the parser considers the feature disabled. So it's actually on the second pass that the query fails, because the parser rejects the quotes.

For the record, I hit this issue when I removed the regex feature, because doing so forced all quoted strings to be considered table names (and subject to this feature being enabled/disabled) instead of sneaking by as regexes. All the masking unit tests failed.

https://issues.apache.org/jira/browse/HIVE-23182
https://issues.apache.org/jira/browse/HIVE-23176
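A sketch of how the setting changes the parse (the table/column names are placeholders; the regex is the classic "all columns except id" example):

```
# With the regex column specification enabled, back-ticked strings are
# treated as regexes over column names:
$ beeline -e "SET hive.support.quoted.identifiers=none;
              SELECT \`(id)?+.+\` FROM sales;"    # every column except id

# With the default setting, back-ticked strings are plain identifiers:
$ beeline -e "SET hive.support.quoted.identifiers=column;
              SELECT \`select\` FROM sales;"      # a column literally named 'select'
```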
10-22-2019
01:42 PM
Another option for large data sets, if ordering doesn't matter, is to create an EXTERNAL table with the necessary delimiters and issue an INSERT statement into that table instead of a SELECT statement in beeline. To copy the data locally, issue:

hdfs dfs -cat /my/table/*

"Order doesn't matter" because the cat application will not necessarily read the files in the proper order. If an ORDER BY is included in the query, the contents of each file will be in order, but the files may be read out of order by the 'cat' application. A sketch of the whole flow follows below.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/FileSystemShell.html#cat
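A minimal sketch of that approach (the table names, schema, delimiter, and paths are examples):

```
# Write the result set out through an external table backed by a plain
# delimited directory:
$ beeline -e "
    CREATE EXTERNAL TABLE export_sales (id INT, amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/my/table';
    INSERT INTO TABLE export_sales SELECT id, amount FROM sales;"

# Concatenate the result files locally; the files may be read in any order:
$ hdfs dfs -cat /my/table/* > sales_local.csv
```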
10-22-2019
01:21 PM
Also, just wanted to point out that, depending on the version of Hive being used, beeline may default to buffering result sets within the client. Be sure to enable 'incremental' fetches of data from the Hive server when dealing with large result sets:

--incremental=[true/false]

Defaults to true from Hive 2.3 onwards; before that it defaulted to false. When set to false, the entire result set is fetched and buffered before being displayed, yielding optimal display column sizing. When set to true, result rows are displayed immediately as they are fetched, yielding lower latency and memory usage at the price of extra display column padding. Setting --incremental=true is recommended if you encounter an OutOfMemoryException.

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
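For example (the connection string and query are placeholders):

```
# Stream rows as they are fetched instead of buffering the full result set:
$ beeline -u jdbc:hive2://hs2-host:10000 --incremental=true \
    -e "SELECT * FROM big_table"
```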
05-10-2019
02:02 PM
By default, in Hive, Parquet files are not written with compression enabled. https://issues.apache.org/jira/browse/HIVE-11912 However, writing files with Impala into a Parquet table will create files with internal Snappy compression (by default).
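If you want Hive to write compressed Parquet, one way is to set it per table; a sketch (the table name and schema are examples):

```
# Files written into this table get internal Snappy compression:
$ beeline -e "
    CREATE TABLE sales_parquet (id INT, amount DOUBLE)
      STORED AS PARQUET
      TBLPROPERTIES ('parquet.compression'='SNAPPY');"
```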
04-15-2019
10:36 AM
I also just ran into this issue. The way I solved it was to install the latest MySQL JDBC Driver as described here: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_mysql.html#cmig_topic_5_5_3 This action must be performed on the node(s) with the Hive Metastore role installed.
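The gist of the linked steps, as a sketch (the version number and paths are examples; check the doc for your release):

```
# Extract the MySQL Connector/J tarball and place the jar where Cloudera
# expects it, with the version dropped from the filename:
$ tar xzf mysql-connector-java-5.1.46.tar.gz
$ sudo cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar \
    /usr/share/java/mysql-connector-java.jar
```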
08-02-2018
06:49 AM
This process will become easier in a future version of CDH. https://issues.apache.org/jira/browse/HIVE-19899
05-11-2018
12:26 PM
This regression was introduced into the product in CDH 5.9.2 [HIVE-13864], and it was addressed in CDH 5.11.2, CDH 5.12.1, and CDH 5.13.0 and higher [HIVE-17050].
02-11-2018
06:50 PM
We recommend using the JsonSerDe that comes with Hive:

https://github.com/apache/hive/blob/3972bf05159581d6aa515ba5dd9e75d59ac62a45/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/JsonSerDe.java
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe

You will have to install the JAR file into the Hive auxiliary directory. The JAR file is hive-hcatalog-core.jar and can be found in several places within the CDH distribution; a sketch follows below.

https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_mc_hive_udf.html
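A sketch of the installation and a minimal usage example (the jar location, aux directory, and table schema are assumptions; configure the auxiliary directory via Cloudera Manager as described in the last link above):

```
# Locate one of the copies of the jar shipped with CDH and copy it into
# the configured Hive auxiliary JARs directory (paths are examples):
$ find /opt/cloudera -name 'hive-hcatalog-core*.jar' 2>/dev/null | head -1
$ sudo cp /opt/cloudera/parcels/CDH/jars/hive-hcatalog-core-*.jar \
    /usr/share/java/hive-aux/

# Then reference the SerDe when creating a table over JSON data:
$ beeline -e "
    CREATE TABLE events (id INT, name STRING)
      ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';"
```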