10-02-2023
01:56 AM
Hi @andrea_pretotto You can also use PyODBC (https://pypi.org/project/pyodbc/) together with the Cloudera Hive ODBC driver (https://cloudera.com/downloads/connectors/hive/odbc). This should give the best compatibility with Hive. Best regards, Miklos
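A minimal sketch of the PyODBC approach, assuming the Cloudera Hive ODBC driver is already installed; the driver name, host, port, and authentication settings below are placeholders to adapt to your environment:

import pyodbc

# All connection details are placeholders; adjust them to your cluster.
conn = pyodbc.connect(
    "Driver=Cloudera ODBC Driver for Apache Hive;"
    "Host=hs2-hostname.example.com;"
    "Port=10000;"
    "AuthMech=3;"  # 3 = username/password; see the driver install guide
    "UID=hive_user;"
    "PWD=secret;",
    autocommit=True,
)
cursor = conn.cursor()
cursor.execute("SELECT current_database()")
print(cursor.fetchone())
conn.close()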
09-07-2023
01:08 AM
Hi @wcg_hdp_manager ,

Please review the Impala partitioning best practices guide: https://docs.cloudera.com/best-practices/latest/impala-partitioning/topics/bp-impala-partitioning-considerations.html and the CDP 7.1.8 Impala partitioning guide: https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/impala-reference/topics/impala-partition.html

Do not partition your table unless you have a good reason to do so. The number of records (100m) is not necessarily a reason by itself. You need to know what kind of queries will run against the table (do they always filter on one or more specific columns in the WHERE clause, so that Impala can take advantage of partition pruning? If not, the whole dataset might be scanned anyway) and how you ingest the data (do you load new partitions each day, or does some other pattern apply?).

Creating too many partitions will likely result in too many small files instead of fewer but bigger files. Processing data spread across many datafiles is less efficient, and if that becomes a general trend it also puts stress on the HDFS NameNode, which needs to keep track of every datafile.

Hope this helps,
Miklos
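To make partition pruning concrete, here is a minimal sketch using the impyla client (my choice for illustration, not something mandated above); the host, table, and column names are hypothetical:

# Illustration only: impyla client with hypothetical host/table/column names.
from impala.dbapi import connect

conn = connect(host="impalad-hostname.example.com", port=21050)
cursor = conn.cursor()

# One partition per day; keep partition counts modest to avoid many small files.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS sales (
        id BIGINT,
        amount DECIMAL(10,2)
    )
    PARTITIONED BY (sale_date STRING)
    STORED AS PARQUET
""")

# The equality filter on the partition column lets Impala prune partitions,
# scanning only the files under sale_date=2023-09-01.
cursor.execute("SELECT count(*) FROM sales WHERE sale_date = '2023-09-01'")
print(cursor.fetchone())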
08-29-2023
06:11 AM
Based on the above, the HiveServer2 is not running. Please verify that it's running before trying to use it through Hue. How did you verify that HS2 is running? How do you start it? Have you looked into the HiveServer2 logs?
08-28-2023
05:53 AM
Check if HS2 runs on port 10000 or on 10001 (use "ps -ef | grep HiveServer2" to find the process id, then "netstat -tanp | grep <hs2pid>"). If it is running only in HTTP transport mode, then port 10001 might be the only open port. In that case you need: hive_server_http_port=10001
08-28-2023
01:02 AM
Is Kerberos authentication enabled on the cluster? As the above comment suggests, you should use the fully-qualified domain name (FQDN) instead of the IP address when Kerberos is enabled. Also note that port 10001 is the HTTP transport mode port. "hive_server_port" on the other hand is for binary transport, and that should be port 10000 on the HS2. This is also explained in the hue.ini reference quoted before:

# Binary thrift port for HiveServer2.
## hive_server_port=10000

# Http thrift port for HiveServer2.
## hive_server_http_port=10001

So please try the following instead:

hive_server_port=10000
08-25-2023
05:52 AM
Hi @nysq_sq The above is not enough to understand what went wrong. Can you check the Hue logs (runcpserver.log) and the Hive (HiveServer2) logs? What configuration have you enabled in Hue to connect to Hive? (I assume the "beeswax" section was configured.) Please see the hue.ini reference for the meaning of each config entry: https://github.com/cloudera/hue/blob/master/desktop/conf.dist/hue.ini Best regards, Miklos
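For reference, a minimal sketch of what such a "beeswax" section can look like in hue.ini, using the binary transport port discussed in the posts above; the hostname is a placeholder:

[beeswax]
  # FQDN of the host running HiveServer2 (placeholder value).
  hive_server_host=hs2-hostname.example.com
  # Binary thrift port for HiveServer2.
  hive_server_port=10000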
08-24-2023
06:41 AM
Hi @nysq_sq , Sqoop 1 is a client-only solution: it submits Sqoop jobs (MR jobs) to the YARN cluster in order to import/export data from/to RDBMS databases. See the docs: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html Sqoop 2 was an initiative to make the same functionality work as a client-server architecture, to be able to define "stored jobs", etc. Sqoop 2 has not been recommended for a long time and was discontinued. Sqoop 1 was maintained for longer; although it is now also a retired Apache project, Cloudera still supports it in CDP/CDH 7.x versions. We would advise using only Sqoop 1. From Hue you can use Sqoop 1 actions (without Sqoop 2), see https://gethue.com/importing-data-from-traditional-databases-into-hdfshive-in-just-a-few-clicks/ Cheers, Miklos
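As an illustration of the client-only model, a minimal Sqoop 1 import sketch; the JDBC URL, credentials, table, and target directory are all placeholders:

sqoop import \
  --connect jdbc:mysql://db.example.com/salesdb \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --table orders \
  --target-dir /data/orders \
  -m 4

This submits a MapReduce job (here with 4 mappers) to the YARN cluster, as described above.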
07-18-2023
09:47 AM
Hi @pragz Not sure how you usually start HMS, but I would start with something like:

./bin/start-metastore >/var/tmp/hms-stdout.log 2>/var/tmp/hms-stderr.log

or, to start it in the background:

nohup ./bin/start-metastore >/var/tmp/hms-stdout.log 2>/var/tmp/hms-stderr.log &

With that you should have separate stdout and stderr logfiles. If a process suddenly exits, it is usually because the OS killed it, so look in dmesg or /var/log/messages for signs of the OOM killer. Also check that your $HIVE_CONF_DIR/hive-log4j.properties is there and verify where the ordinary log4j-based HMS logging messages are going.

Hope this helps,
Miklos
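One quick way to check for the OOM killer (the grep pattern is only a suggestion; exact kernel messages vary by distribution):

dmesg -T | grep -Ei 'killed process|out of memory'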
06-23-2023
08:22 AM
Hi @Choolake , sorry, maybe it's just not clear to me: have you executed the beeline command on its own? What exactly do you get when you do so?

beeline -u "jdbc:hive2://<lb_or_hs2_hostname>:10000/default;principal=hive/<lbhostname_if_lb_is_enabled_otherwise_hs2_hostname>@REALM.COM;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks" -e 'SELECT count(*) FROM table_name;'

just to understand if you get any errors. Then repeat with stderr redirected to /dev/null:

beeline -u "jdbc:hive2://<lb_or_hs2_hostname>:10000/default;principal=hive/<lbhostname_if_lb_is_enabled_otherwise_hs2_hostname>@REALM.COM;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks" -e 'SELECT count(*) FROM table_name;' 2>/dev/null

Thanks,
Miklos
04-20-2023
07:44 AM
Hi Jeremy, Sure, you're welcome. We also thank you for the report; I've forwarded it to our driver team. Best regards, Miklos