Member since: 11-20-2015
Posts: 226
Kudos Received: 9
Solutions: 2

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 88291 | 05-11-2018 12:26 PM
 | 44010 | 08-26-2016 08:52 AM
11-09-2020 09:02 AM
CDH does not support the keyring credential cache. https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s4_kerb_wizard.html#concept_irl_x5y_l4
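As a quick check, the credential cache type is set in krb5.conf. A minimal sketch, assuming the default /etc/krb5.conf location:

```sh
# CDH requires a FILE: credential cache; KEYRING: caches are not supported.
grep default_ccache_name /etc/krb5.conf
# If it shows something like KEYRING:persistent:%{uid}, switch to a file cache:
#   default_ccache_name = FILE:/tmp/krb5cc_%{uid}
```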
11-09-2020 08:56 AM
To create a table in this way, there are two steps:

1. CREATE TABLE ...
2. LOAD DATA INPATH ...

The first statement creates the table schema within Hive, and the second tells Hive to move the data from the source HDFS directory into the table's HDFS directory under the Hive warehouse:

/user/joe/sales.csv => /user/hive/warehouse/sales/sales.csv

The move operation occurs as the 'hive' user, so for it to complete, the 'hive' user must have the HDFS permissions needed to move the file into its final location. Ensure those permissions are in place; a sketch follows below.

(The following documentation is for Impala, but there is a lot of overlap with Hive.)
https://docs.cloudera.com/documentation/enterprise/6/latest/topics/impala_load_data.html

Also please note that the latest version is CDH 6.3.4, which has lots of benefits over 6.0.
https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_63_packaging.html
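For illustration, a minimal sketch of the two-step flow; the connection URL, table definition, and paths are hypothetical:

```sh
# Step 1 creates the schema; step 2 moves the file into the warehouse directory.
beeline -u "jdbc:hive2://hs2-host:10000" -e "
CREATE TABLE sales (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/user/joe/sales.csv' INTO TABLE sales;
"
# LOAD DATA moves (not copies) the file, so the 'hive' user needs permission
# to read the source location and write the table directory. Verify with:
hdfs dfs -ls /user/joe/sales.csv /user/hive/warehouse/sales
```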
04-11-2020 09:16 PM
I was working on something unrelated, but I hit this same error, detailed the issue in Jira, and have proposed a workaround.

The root cause is a feature in Hive called the REGEX column specification. IMHO this feature was ill conceived and is not standard SQL; it should be removed from Hive, and this issue is yet another reason why. (That removal is what I was working on when I hit this issue.)

When Hive sees a table name surrounded by back ticks, it decides the string is a regex; when it sees the name surrounded by quotes, it decides the string is a table name. The basic rule it uses is "most anything ASCII surrounded by back ticks is a regex." A sketch of the feature follows below.

Using quotes (and technically back ticks too, but that path is clearly broken) around table names can be allowed or disallowed with a Hive setting called hive.support.quoted.identifiers, which is enabled in the user's HS2 session by default. However, masking is a multi-step process:

1. The query is parsed by HS2.
2. The masking is applied.
3. The query is parsed again by HS2.

The first parsing attempt respects the hive.support.quoted.identifiers configuration and allows a query with quotes to be parsed. However, the masking code does not pass this configuration to the parser on the second attempt, and oddly enough, when the configuration is not passed along, the parser considers the feature disabled. So it is actually the second pass that fails, because the parser rejects the quotes.

For the record, I hit this issue when I removed the regex feature: doing so forced all quoted strings to be considered table names (and therefore subject to this setting being enabled or disabled) instead of sneaking by as regexes, and all the masking unit tests failed.

https://issues.apache.org/jira/browse/HIVE-23182
https://issues.apache.org/jira/browse/HIVE-23176
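For illustration, a minimal sketch of the regex column specification; the connection URL, table, and column names are hypothetical:

```sh
beeline -u "jdbc:hive2://hs2-host:10000" -e "
SET hive.support.quoted.identifiers=none;
-- With quoted identifiers disabled, a back-ticked string in the SELECT list
-- is treated as a regex: this selects every column of 'sales' except 'id'.
SELECT \`(id)?+.+\` FROM sales;
"
```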
10-22-2019 01:42 PM
Another option for large data sets, if ordering doesn't matter, is to create an EXTERNAL table with the necessary delimiters and issue an INSERT statement into that table instead of a SELECT statement in beeline. To copy the data locally, issue:

hdfs dfs -cat /my/table/*

"Order doesn't matter" because the cat application will not necessarily read the files in the proper order. If an ORDER BY is included in the query, the contents of each file will be in order, but the files themselves may be read out of order by 'cat'. A sketch follows below.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/FileSystemShell.html#cat
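A minimal sketch of the export flow; the connection URL, table names, delimiter, and paths are hypothetical:

```sh
# Create a delimited external table and fill it from the source table.
beeline -u "jdbc:hive2://hs2-host:10000" -e "
CREATE EXTERNAL TABLE sales_export (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/sales_export';
INSERT OVERWRITE TABLE sales_export SELECT id, amount FROM sales;
"
# Concatenate the result files to a local file; 'cat' does not guarantee
# that it reads the files in any particular order.
hdfs dfs -cat /tmp/sales_export/* > sales_export.csv
```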
10-22-2019 01:21 PM
Also, just wanted to point out that, depending on the version of Hive being used, beeline may default to buffering the entire result set within the client. Be sure to enable 'incremental' fetches of data from the Hive server when dealing with large result sets:

--incremental=[true/false]

This defaults to true from Hive 2.3 onwards; before that it defaulted to false. When set to false, the entire result set is fetched and buffered before being displayed, yielding optimal display column sizing. When set to true, result rows are displayed immediately as they are fetched, yielding lower latency and memory usage at the price of extra display column padding. Setting --incremental=true is recommended if you encounter an OutOfMemoryException.

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
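A minimal sketch; the connection URL and query are hypothetical:

```sh
# Stream rows as they arrive instead of buffering the full result set client-side.
beeline -u "jdbc:hive2://hs2-host:10000" --incremental=true \
    -e "SELECT * FROM big_table;"
```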
05-10-2019 02:02 PM
By default, Hive does not write Parquet files with compression enabled. https://issues.apache.org/jira/browse/HIVE-11912 However, writing into a Parquet table with Impala will, by default, create files with internal Snappy compression.
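For Hive, compression can be requested explicitly at write time. A minimal sketch, assuming a hypothetical connection URL and table names:

```sh
# Ask Hive's Parquet writer for Snappy compression in this session.
beeline -u "jdbc:hive2://hs2-host:10000" -e "
SET parquet.compression=SNAPPY;
INSERT OVERWRITE TABLE sales_parquet SELECT * FROM sales;
"
```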
04-15-2019 10:36 AM
I also just ran into this issue. The way I solved it was to install the latest MySQL JDBC Driver as described here: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_mysql.html#cmig_topic_5_5_3 This action must be performed on the node(s) with the Hive Metastore role installed.
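Roughly, the install looks like the sketch below; the connector version and download URL are hypothetical, so check the linked documentation for the current ones. Run this on each Hive Metastore host:

```sh
# Download and unpack the MySQL Connector/J driver (version is an assumption).
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
tar xzf mysql-connector-java-5.1.46.tar.gz
# Place the jar where CDH expects it, under the documented generic name.
sudo mkdir -p /usr/share/java
sudo cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar \
    /usr/share/java/mysql-connector-java.jar
```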
08-02-2018 06:49 AM
This process will become easier in a future version of CDH. https://issues.apache.org/jira/browse/HIVE-19899
05-11-2018 12:26 PM
This regression was introduced into the product in CDH 5.9.2 [HIVE-13864] and addressed in CDH 5.11.2, CDH 5.12.1, and CDH 5.13.0 and higher [HIVE-17050].
02-11-2018 06:50 PM
We recommend using the JsonSerDe that ships with Hive:

https://github.com/apache/hive/blob/3972bf05159581d6aa515ba5dd9e75d59ac62a45/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/JsonSerDe.java
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe

You will have to install the JAR file into the Hive auxiliary directory. The JAR file is hive-hcatalog-core.jar and can be found in several places within the CDH distribution.

https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_mc_hive_udf.html
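A minimal sketch of the install and usage; the parcel path, auxiliary directory, connection URL, and table definition are hypothetical and should match your CDH installation:

```sh
# Copy the HCatalog jar that contains the JsonSerDe into the Hive aux dir.
sudo mkdir -p /usr/local/hive-aux
sudo cp /opt/cloudera/parcels/CDH/jars/hive-hcatalog-core-*.jar /usr/local/hive-aux/
# Then reference the SerDe class when creating the table.
beeline -u "jdbc:hive2://hs2-host:10000" -e "
CREATE TABLE json_events (id INT, payload STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
"
```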