Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3976 | 10-13-2017 09:42 PM |
| | 7474 | 09-14-2017 11:15 AM |
| | 3798 | 09-13-2017 10:35 PM |
| | 6034 | 09-13-2017 10:25 PM |
| | 6601 | 09-13-2017 10:05 PM |
06-12-2017
10:20 AM
It is by design. The impala-shell script calls out that it will look in the current working directory. Either work around it, or submit a request (or your change) back to the Apache Impala project.
06-08-2017
07:44 PM
Better question: why are you putting Python UDFs in /usr/bin or /opt/cloudera/parcels/CDH/lib/impala-shell/?
06-08-2017
07:43 PM
I am not a Python expert, but I believe this is intended behavior: an import looks in the working directory (the script's directory) before checking the rest of the module search path. This is confirmed in the Python docs. The link below is for Python 3, but the behavior is the same in Python 2.6 and 2.7. https://docs.python.org/3/tutorial/modules.html#the-module-search-path
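A minimal sketch of that search order: a module placed in an earlier sys.path entry (like the script's own directory) wins over a same-named module found later on the path. The module name `shadowdemo` here is purely illustrative.

```python
import os
import sys
import tempfile

# Create a throwaway directory containing a module, then put that
# directory at the front of sys.path -- mimicking the implicit
# script-directory entry that Python adds for you.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "shadowdemo.py"), "w") as f:
        f.write("WHO = 'local directory copy'\n")
    sys.path.insert(0, d)
    import shadowdemo  # found in d before anywhere else on the path
    print(shadowdemo.WHO)
    sys.path.remove(d)
```

This is why a local file named, say, `types.py` next to your script can shadow the standard-library module of the same name.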
06-07-2017
08:54 PM
The Cloudera ODBC (Hive or Impala) drivers let you connect to those services to run queries; they are not meant to transfer data between an RDBMS and Hadoop/Hive. For that you will want Sqoop. https://sqoop.apache.org/docs/1.4.6/ The error itself just states that the service at uslv-sdbx-ora02 on port 10000 (the default HiveServer2 port) refused the connection. The cause could be anything from Hive not running on that host, Hive listening on a different port, a firewall blocking access, or a problem with HS2 that prevents clients from connecting. Please verify that the HiveServer2 process is running on that host and listening on that port.
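A quick way to check the "refused connection" part independently of any driver is a raw TCP probe. A minimal sketch, using the hostname from the error message and the default HS2 port:

```python
import socket

def port_open(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, timeout, and 'connection refused' alike.
        return False

# Hostname and port taken from the error message in the post.
print(port_open("uslv-sdbx-ora02", 10000, timeout=2.0))
```

If this returns False while HiveServer2 is supposedly up, suspect the port number, a firewall, or the hostname before digging into the driver.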
06-07-2017
12:39 PM
Is that the size after the merge? What was the average size before, and how long did it take to run?
06-06-2017
03:39 PM
2 Kudos
If CM doesn't expose a setting, you have to use an Advanced Configuration Snippet (ACS), and it isn't always easy to figure out which one to put the settings in. The first step is to search by the file the settings go in, which I believe is hdfs-site.xml. For the two client settings, my guess is that you will want the Gateway ACS (there may not be one specifically for core-site.xml). The block report setting is specific to the DataNodes, so look for a DataNode role ACS for hdfs-site.xml. If you use the service-level ACS, it will apply to all roles in the service. http://www.cloudera.com/documentation/manager/5-1-x/Cloudera-Manager-Managing-Clusters/cm5mc_config_snippet.html
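An ACS entry is pasted as raw property XML. A sketch, assuming the DataNode-side setting in question is the block report interval (the property name here is for illustration; substitute whichever settings you actually need):

```xml
<!-- Pasted into the DataNode Advanced Configuration Snippet for hdfs-site.xml -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value> <!-- 6 hours, the HDFS default -->
</property>
```

After saving the snippet, CM will flag the affected roles as having a stale configuration and prompt for a restart.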
06-06-2017
03:15 PM
Do you mean that Hue is installed on a master node? The Hue configs contain an [impala] section that names the Impala daemon Hue uses; Hue will only connect to that impalad.
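For reference, the relevant fragment of hue.ini looks roughly like this (the hostname is hypothetical; 21050 is the impalad's default HiveServer2-protocol port):

```ini
# hue.ini -- the Impala app points at exactly one Impala daemon.
[impala]
  server_host=impalad01.example.com
  server_port=21050
```

In a CM-managed cluster this is set through the Hue service configuration rather than by editing the file directly.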
06-06-2017
11:16 AM
1. It needs to be one of the Impala daemons. They act as the connection manager and coordinator, and still execute queries.
2. The Impala daemon has its own ports, including one specifically for clients such as impala-shell and those coming over ODBC and JDBC. The latter should be 21050 by default.
3. CM and Hue have their own backend authentication, which can be separate from CDH's. Use the authentication mechanism that matches CDH.
4. SSL is only needed if Impala is configured to use SSL. Since you have CM access, you should be able to review the CM configs. I don't have the specific settings on hand, but search for anything related to SSL, TLS, or Kerberos. You can also verify the port and the hostnames of the Impala daemons.
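Putting points 1-3 together, a JDBC URL for the Cloudera Impala driver can be sketched like this (host is hypothetical; the AuthMech values are the Cloudera driver's documented codes, to my knowledge):

```python
def impala_jdbc_url(host, port=21050, auth_mech=0):
    """Build a Cloudera Impala JDBC URL.

    21050 is the impalad's default port for JDBC/ODBC clients
    (HiveServer2 protocol).  AuthMech for the Cloudera driver:
    0 = no auth, 1 = Kerberos, 2 = user name, 3 = user name + password.
    """
    return f"jdbc:impala://{host}:{port};AuthMech={auth_mech}"

# Hypothetical daemon host, username/password auth:
print(impala_jdbc_url("impalad01.example.com", auth_mech=3))
```

Match the AuthMech to whatever CDH itself is configured for, per point 3 above.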
06-06-2017
11:03 AM
Can you run 'SHOW CURRENT ROLES;' and 'SHOW GRANT ROLE <role_name>;' and provide the output for the user creating the function? Are /data (in HDFS) and /tmp (on the local filesystem) in the Hive Aux or Reloadable Aux paths? If yes, did you restart HS2 (for the Aux path) or run the reload command in beeline (for the Reloadable Aux path)? You ran the grant statement on the HDFS path but not the local path; the UDF doc states that you must do both. The CREATE FUNCTION statement is also missing the USING JAR clause; you need to specify the jar path in it. https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_mc_hive_udf.html
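A sketch of the full sequence in beeline, with a hypothetical role name, jar paths, and UDF class standing in for yours:

```sql
-- Diagnostics for the user creating the function:
SHOW CURRENT ROLES;
SHOW GRANT ROLE analyst_role;

-- Grant on BOTH the HDFS jar location and the local filesystem copy:
GRANT ALL ON URI 'hdfs:///data/udfs/my_udf.jar' TO ROLE analyst_role;
GRANT ALL ON URI 'file:///tmp/my_udf.jar' TO ROLE analyst_role;

-- CREATE FUNCTION must name the jar with USING JAR:
CREATE FUNCTION my_func AS 'com.example.MyUDF'
  USING JAR 'hdfs:///data/udfs/my_udf.jar';
```

See the Cloudera UDF doc linked above for the exact privileges your CDH version expects.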
06-06-2017
10:55 AM
1 Kudo
The parquet-tools ship with CDH, and I recommend using that copy since it is built for your version of CDH; check under /opt/cloudera/parcels/CDH/lib/parquet/parquet-tools.jar. The warning seems to indicate that parquet-tools tried to use the short-circuit read feature to bypass the NameNode and DataNode, but it does fall back to normal block access after that fails. The actual error is "too many open files". Try ulimit -a, ulimit -Hn, or ulimit -Sn; these show the limits on the number of open files the logged-in user can have. The default on RHEL/CentOS has been 1024 for some time, and you are trying to open 2500 files at once. Increase the limit to 2500+ or reduce the number of files you are trying to merge together at once.
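The same check can be done from inside a process with the standard-library resource module (Unix only); this is the programmatic equivalent of `ulimit -Sn` / `ulimit -Hn`:

```python
import resource

# Soft and hard limits on open file descriptors for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")

# The merge in the post opens ~2500 files at once; a soft limit at the
# old RHEL/CentOS default of 1024 will fail with 'too many open files'.
if soft < 2500:
    print("raise the limit (e.g. via /etc/security/limits.conf) "
          "or merge fewer files at a time")
```

A process can also raise its own soft limit up to the hard limit with `resource.setrlimit`, which avoids a shell-level change for one-off jobs.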