Member since: 11-04-2015
Posts: 220
Kudos Received: 31
Solutions: 28
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 343 | 09-07-2023 01:08 AM |
| | 818 | 04-17-2023 02:41 AM |
| | 538 | 04-03-2023 02:42 AM |
| | 852 | 03-28-2023 06:43 AM |
| | 801 | 03-06-2023 04:30 AM |
11-08-2023
01:27 AM
Hi @HadoopHero , For Hive, if a single reduce task writes the output, it will not split that output into smaller files; that is expected and cannot be configured to behave differently. With DISTRIBUTE BY you should be able to get multiple reducers (if you have a column by which you can reasonably "split" your data into smaller subsets), see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy Best regards Miklos
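To illustrate the idea (in Python rather than HiveQL): DISTRIBUTE BY routes each row to a reducer by hashing the chosen column, so with N reducers you can get up to N output files. The `distribute_by` helper and sample rows below are a hypothetical sketch of that routing, not Hive's actual implementation.

```python
from collections import defaultdict

def distribute_by(rows, key, num_reducers):
    """Route rows to reducer buckets the way DISTRIBUTE BY <key> does,
    assuming simple hash(key) % num_reducers routing: rows with the same
    key always land in the same bucket, i.e. the same output file."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % num_reducers].append(row)
    return buckets

rows = [{"country": c, "amount": i}
        for i, c in enumerate(["US", "DE", "US", "FR", "DE", "JP"])]
buckets = distribute_by(rows, "country", num_reducers=3)
# Each non-empty bucket corresponds to one reducer -> one output file.
print({k: len(v) for k, v in buckets.items()})
```

Note that a skewed distribute key (one value holding most rows) still funnels most data through one reducer, so pick a column that splits the data reasonably evenly.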
11-08-2023
01:17 AM
1 Kudo
To add to @ggangadharan's point, there are lots of good articles/posts on why the float and even the double datatype have these problems. Note that this is not Hive/Hadoop or Java specific. https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency https://dzone.com/articles/never-use-float-and-double-for-monetary-calculatio https://www.red-gate.com/hub/product-learning/sql-prompt/the-dangers-of-using-float-or-real-datatypes Miklos
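A minimal demonstration of the problem in Python (the same applies to FLOAT/DOUBLE in Hive or any IEEE-754 implementation):

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly:
subtotal = 0.1 + 0.2
print(subtotal)            # 0.30000000000000004 -- not 0.3
print(subtotal == 0.3)     # False

# An exact decimal type (Python's Decimal, Hive's DECIMAL) avoids this:
exact = Decimal("0.1") + Decimal("0.2")
print(exact == Decimal("0.3"))  # True
```

This is why monetary columns should use DECIMAL with an explicit precision and scale rather than FLOAT or DOUBLE.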
10-30-2023
02:40 AM
Hi @cl99 , yes, it seems there is a 50 MB limit for the maximum RPC message size in CDH 6.3.2: https://github.com/apache/impala/blob/branch-3.2.0/be/src/kudu/rpc/transfer.cc#L39 This error is likely the result of the unsafe flag you have turned on. Best regards Miklos
10-11-2023
08:42 AM
Hi @JKarount , To close the loop, this has been resolved in the latest Cloudera Impala ODBC driver, 2.7.0; see the Resolved Issues section in the Release Notes: https://docs.cloudera.com/documentation/other/connectors/impala-odbc/2-7-0/Release-Notes-Impala-ODBC.pdf "[IMP-946][02795738] The connector does not generate the last COALESCE parameter from the ELSE expression in the CASE statement." You can download it from: https://www.cloudera.com/downloads/connectors/impala/odbc/2-7-0.html I hope this helps you implement your use cases as expected. Best regards Miklos Szurap Customer Operations Engineer, Cloudera
10-02-2023
01:56 AM
Hi @andrea_pretotto , Additionally, you can choose to use PyODBC: https://pypi.org/project/pyodbc/ together with the Cloudera Hive ODBC drivers: https://cloudera.com/downloads/connectors/hive/odbc This should give the best compatibility with Hive. Best regards Miklos
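A minimal sketch of what such a connection might look like with pyodbc. The driver name, hostname, and AuthMech value below are assumptions; verify them against the Cloudera Hive ODBC driver install guide and your odbc.ini/odbcinst.ini.

```python
def hive_conn_str(host, port=10000, user="", password=""):
    """Build a DSN-less ODBC connection string for HiveServer2.
    The driver name and AuthMech=3 (username/password) are assumptions --
    check the Cloudera Hive ODBC driver documentation for your setup."""
    return ("Driver=Cloudera ODBC Driver for Apache Hive;"
            f"Host={host};Port={port};AuthMech=3;UID={user};PWD={password}")

def fetch_current_db(conn_str):
    import pyodbc  # imported lazily; requires the ODBC driver installed
    with pyodbc.connect(conn_str, autocommit=True) as conn:
        return conn.cursor().execute("SELECT current_database()").fetchone()[0]

# "hs2-host.example.com" is a placeholder for your HiveServer2 host.
print(hive_conn_str("hs2-host.example.com", user="alice", password="secret"))
```

Using a DSN configured in odbc.ini (`pyodbc.connect("DSN=HiveDSN")`) works just as well and keeps credentials out of the code.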
09-07-2023
01:08 AM
1 Kudo
Hi @wcg_hdp_manager , Please review the Impala partitioning best practices guide: https://docs.cloudera.com/best-practices/latest/impala-partitioning/topics/bp-impala-partitioning-considerations.html and the CDP 7.1.8 Impala partitioning guide: https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/impala-reference/topics/impala-partition.html Do not partition your table unless you have a good reason to do so. The number of records (100m) is not necessarily a reason by itself. You need to know what kind of queries you will run on the table (will the WHERE clause always filter on one or more of the partition columns, so Impala can take advantage of partition pruning? If not, the whole dataset may be scanned anyway) and how you ingest the data (do you load new partitions each day, or are there other factors?). Creating too many partitions will likely create too many small files instead of fewer but bigger ones. Processing data spread across many small files is less efficient, and if that becomes a general trend it also puts stress on the HDFS NameNode, which must keep track of every datafile. Hope this helps, Miklos
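A quick back-of-the-envelope check can show whether a proposed partitioning scheme produces sensible file sizes. The 200 bytes/row figure and the by-day scheme below are hypothetical; plug in your own numbers.

```python
def partition_stats(total_rows, num_partitions, bytes_per_row=200):
    """Rough rows-per-partition and bytes-per-partition estimate,
    assuming an even row distribution and a hypothetical 200 bytes/row."""
    rows = total_rows // num_partitions
    return rows, rows * bytes_per_row

# 100M rows partitioned by day over ~3 years (~1095 partitions):
rows, size = partition_stats(100_000_000, 3 * 365)
print(f"~{rows} rows, ~{size / 1024 / 1024:.1f} MB per partition")
# Far below a typical 128 MB HDFS block -> a likely small-files problem,
# especially if each partition is further split into several datafiles.
```

If the estimate comes out well under the HDFS block size, a coarser scheme (e.g. by month) or no partitioning at all is usually the better choice.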
08-29-2023
06:11 AM
Based on the above, the HiveServer2 is not running. Please verify that it's running before trying to use it through Hue. How did you verify that the HS2 is running? How do you start it? Have you looked into the HiveServer2 logs?
08-28-2023
05:53 AM
Check if the HS2 runs on port 10000 or on 10001 ("ps -ef | grep HiveServer2" and then "netstat -tanp | grep <hs2pid>"). If it is running in HTTP transport mode only, then port 10001 might be the only open port. In that case you need hive_server_http_port=10001
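The same check can be done programmatically; this small sketch simply attempts a TCP connection to the candidate port (the example hostname is a placeholder):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("hs2-host.example.com", 10000)  # binary transport
#      port_open("hs2-host.example.com", 10001)  # HTTP transport
```

Note this only proves something is listening on the port; it does not confirm which transport mode HS2 uses, so still check the HiveServer2 configuration.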
08-28-2023
01:02 AM
Is Kerberos authentication enabled on the cluster? As the above comment suggests, you should use the fully-qualified domain name (FQDN) instead of the IP address when Kerberos is enabled. Also note that port 10001 is the HTTP transport mode port, while "hive_server_port" is for binary transport, which is port 10000 on the HS2. This is also explained in the hue.ini reference quoted before:

# Binary thrift port for HiveServer2.
## hive_server_port=10000
# Http thrift port for HiveServer2.
## hive_server_http_port=10001

So please try the following instead: hive_server_port=10000
08-25-2023
05:52 AM
Hi @nysq_sq , The above is not enough to understand what went wrong. Can you check the Hue logs (runcpserver.log) and the Hive (HiveServer2) logs? What configuration have you enabled in Hue to connect to Hive? (I assume the "beeswax" section was configured.) Please see the hue.ini reference for the meaning of each config entry: https://github.com/cloudera/hue/blob/master/desktop/conf.dist/hue.ini Best regards Miklos