Member since: 11-04-2015
Posts: 260
Kudos Received: 44
Solutions: 33
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2508 | 05-16-2024 03:10 AM |
| | 1531 | 01-17-2024 01:07 AM |
| | 1551 | 12-11-2023 02:10 AM |
| | 2289 | 10-11-2023 08:42 AM |
| | 1572 | 09-07-2023 01:08 AM |
10-02-2023
01:56 AM
Hi @andrea_pretotto,

Additionally, you can choose to use PyODBC (https://pypi.org/project/pyodbc/) together with the Cloudera Hive ODBC driver (https://cloudera.com/downloads/connectors/hive/odbc). This should give the best compatibility with Hive.

Best regards,
Miklos
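To illustrate the suggestion above, here is a minimal PyODBC sketch, assuming the Cloudera Hive ODBC driver is installed and registered; the host, port, credentials, and table name are placeholders, and the AuthMech value must match your cluster's security setup (see the driver's install guide):

# Sketch only: query Hive through the Cloudera Hive ODBC driver via PyODBC.
# Host, port, credentials, and table name below are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={Cloudera ODBC Driver for Apache Hive};"
    "HOST=hs2.example.com;"
    "PORT=10000;"
    "AuthMech=3;"  # 3 = username/password; pick the mechanism your cluster uses
    "UID=myuser;"
    "PWD=mypassword;",
    autocommit=True,
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM default.my_table LIMIT 10")
for row in cursor.fetchall():
    print(row)
conn.close()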
09-07-2023
01:08 AM
1 Kudo
Hi @wcg_hdp_manager,

Please review the Impala partitioning best practices guide: https://docs.cloudera.com/best-practices/latest/impala-partitioning/topics/bp-impala-partitioning-considerations.html and the CDP 7.1.8 Impala partitioning guide: https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/impala-reference/topics/impala-partition.html

Do not partition your table unless you have a good reason to do so. The number of records (100m) is not necessarily a reason by itself. You need to know what kinds of queries will run against the table (do they always filter on one or more partition columns in the WHERE clause, so Impala can take advantage of partition pruning? If not, the whole dataset might be scanned anyway) and how you ingest the data (do you load new partitions each day, or do other factors apply?). A short sketch of a pruning-friendly layout follows below.

Creating too many partitions will likely produce too many small files instead of fewer but bigger ones. Processing data spread across more data files is less efficient, and if that becomes a general trend it puts stress on the HDFS NameNode, which has to keep track of all those files.

Hope this helps,
Miklos
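As a hedged illustration of the pruning point above (the Impala daemon host, table, and column names are all made up), this sketch uses the impyla client to create a date-partitioned table and run a query that lets Impala skip every partition except the one named in the WHERE clause:

# Sketch only: partition DDL plus a pruning-friendly query via impyla.
# Host, port, and table/column names are hypothetical; impyla must be installed.
from impala.dbapi import connect

conn = connect(host="impalad.example.com", port=21050)
cur = conn.cursor()

# Partition on a column that queries routinely filter on (here: event_date).
cur.execute("""
    CREATE TABLE IF NOT EXISTS sales (
        id BIGINT,
        amount DECIMAL(10,2)
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# Because event_date appears in the WHERE clause, Impala can prune all
# other partitions instead of scanning the whole dataset.
cur.execute("SELECT count(*) FROM sales WHERE event_date = '2023-09-01'")
print(cur.fetchone())
conn.close()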
08-29-2023
06:11 AM
Based on the above, HiveServer2 is not running. Please verify that it's running before trying to use it through Hue. How did you verify that HS2 is running? How do you start it? Have you looked into the HiveServer2 logs?
08-28-2023
05:53 AM
Check whether HS2 runs on port 10000 or on 10001:

ps -ef | grep HiveServer2
netstat -tanp | grep <hs2pid>

If it is running only in HTTP transport mode, then port 10001 might be the only open port. In that case you need:

hive_server_http_port=10001
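If you would rather probe from Python than from netstat (the hostname here is a placeholder), a quick socket check can tell you which of the two ports is accepting connections:

# Sketch: probe which HiveServer2 port is open. Hostname is a placeholder.
import socket

for port in (10000, 10001):
    with socket.socket() as s:
        s.settimeout(2)
        status = "open" if s.connect_ex(("hs2.example.com", port)) == 0 else "closed"
        print(f"port {port}: {status}")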
08-28-2023
01:02 AM
Is Kerberos authentication enabled on the cluster? As the above comment suggests, you should use the fully-qualified domain name (FQDN) instead of the IP address when Kerberos is enabled.

Also note that port 10001 is the HTTP transport mode port. The "hive_server_port", on the other hand, is for binary transport, which should be port 10000 on the HS2. This is also explained in the hue.ini reference quoted before:

# Binary thrift port for HiveServer2.
## hive_server_port=10000
# Http thrift port for HiveServer2.
## hive_server_http_port=10001

So please try the following instead:

hive_server_port=10000
08-25-2023
05:52 AM
Hi @nysq_sq,

The above is not enough to understand what went wrong. Can you check the Hue logs (runcpserver.log) and the Hive (HiveServer2) logs? What configurations have you enabled in Hue to connect to Hive? (I assume the "beeswax" section was configured.)

Please see the hue.ini reference for the meaning of each config entry: https://github.com/cloudera/hue/blob/master/desktop/conf.dist/hue.ini

Best regards,
Miklos
08-24-2023
06:41 AM
Hi @nysq_sq,

Sqoop 1 is a client-only solution: it submits Sqoop jobs (MR jobs) to the YARN cluster in order to import/export data from/to RDBMS databases. See the docs: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html

Sqoop 2 was an initiative to make the same functionality work as a client-server architecture, to be able to define "stored jobs", etc. Sqoop 2 has not been recommended for a long time and was discontinued long ago. Sqoop 1 was maintained for longer; although it is now also a retired Apache project, Cloudera still supports it in CDP/CDH 7.x versions. We advise using only Sqoop 1 (see the import sketch below).

From Hue you can use Sqoop 1 actions (without Sqoop 2), see https://gethue.com/importing-data-from-traditional-databases-into-hdfshive-in-just-a-few-clicks/

Cheers,
Miklos
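To make the client-only point concrete, here is a rough sketch of a typical Sqoop 1 import, wrapped in Python's subprocess purely for illustration; the JDBC URL, credentials, table, and paths are all hypothetical placeholders. The sqoop client itself moves no data: it only submits an MR job to YARN, which does the transfer.

# Sketch: launch a Sqoop 1 import. All connection details and paths are
# placeholders; the sqoop binary must be on the PATH of a cluster client host.
import subprocess

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/shop",  # source RDBMS
    "--username", "sqoopuser",
    "--password-file", "/user/sqoopuser/.pw",         # HDFS path to password file
    "--table", "orders",                              # RDBMS table to import
    "--target-dir", "/data/orders",                   # HDFS destination
    "--num-mappers", "4",                             # parallel map tasks
], check=True)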
07-18-2023
09:47 AM
Hi @pragz,

Not sure how you usually start HMS, but I would start with something like:

./bin/start-metastore >/var/tmp/hms-stdout.log 2>/var/tmp/hms-stderr.log

or, to start it in the background:

nohup ./bin/start-metastore >/var/tmp/hms-stdout.log 2>/var/tmp/hms-stderr.log &

With that you should have separate stdout and stderr logfiles. If a process suddenly exits, it is usually because the OS killed it, so check dmesg or /var/log/messages for signs of the OOM killer. Also check that $HIVE_CONF_DIR/hive-log4j.properties is present and verify where the ordinary log4j-based HMS logging messages are going.

Hope this helps,
Miklos
06-23-2023
08:22 AM
Hi @Choolake,

Sorry, maybe it is only unclear to me: have you executed the beeline command alone? What exactly do you get when you do so?

beeline -u "jdbc:hive2://<lb_or_hs2_hostname>:10000/default;principal=hive/<lbhostname_if_lb_is_enabled_otherwise_hs2_hostname>@REALM.COM;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks" -e 'SELECT count(*) FROM table_name;'

This is just to understand whether you get any errors. Then repeat with stderr redirected to /dev/null:

beeline -u "jdbc:hive2://<lb_or_hs2_hostname>:10000/default;principal=hive/<lbhostname_if_lb_is_enabled_otherwise_hs2_hostname>@REALM.COM;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks" -e 'SELECT count(*) FROM table_name;' 2>/dev/null

Thanks,
Miklos
04-20-2023
07:44 AM
Hi Jeremy,

Sure, you're welcome. We also thank you for the report; I've forwarded it to our driver team.

Best regards,
Miklos