Created on 01-29-2018 06:47 AM - edited 09-16-2022 05:47 AM
Hi,
When I run Impala shell commands from an Oozie workflow, following the steps recommended here:
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-schedule-with-oozie-tutorial/td-...
https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/How-to-Schedule-Impala-Jobs-with-Oozie...
I get the following error:
Traceback (most recent call last):
  File "/app/localstorage/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/bin/../lib/impala-shell/impala_shell.py", line 38, in <module>
    from impala_client import (ImpalaClient, DisconnectedException, QueryStateException,
  File "/app/localstorage/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/impala-shell/lib/impala_client.py", line 20, in <module>
    import sasl
  File "/app/localstorage/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/impala-shell/ext-py/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.py", line 1, in <module>
  File "/app/localstorage/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/impala-shell/ext-py/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/saslwrapper.py", line 7, in <module>
  File "/app/localstorage/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/impala-shell/ext-py/sasl-0.1.1-py2.7-linux-x86_64.egg/_saslwrapper.py", line 7, in <module>
  File "/app/localstorage/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/impala-shell/ext-py/sasl-0.1.1-py2.7-linux-x86_64.egg/_saslwrapper.py", line 6, in __bootstrap__
ImportError: /tmp/impala-shell-python-egg-cache-subuser/sasl-0.1.1-py2.7-linux-x86_64.egg-tmp/_saslwrapper.so: failed to map segment from shared object: Operation not permitted
We are using Kerberos, and this folder has 777 permissions, but the error is still thrown.
How do we resolve this?
Created 02-01-2018 07:35 AM
Problem Solved
The issue was that the /bin/impala_shell script shipped by Cloudera on our nodes hardcodes PYTHON_EGG_CACHE, so the variable was always reset to a directory under /tmp.
Below is a snippet of the code in /bin/impala_shell:
# We should set the EGG_CACHE to a per-user temporary location.
# This follows what hue does.
PYTHON_EGG_CACHE=/tmp/impala-shell-python-egg-cache-${USER}
if [ ! -d ${PYTHON_EGG_CACHE} ]; then
  mkdir ${PYTHON_EGG_CACHE}
fi
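Additionally, /tmp was not mounted with exec on all nodes. On a noexec mount, the shared object _saslwrapper.so that is extracted into the egg cache cannot be mapped as executable, which matches the "Operation not permitted" error above even though the folder itself had 777 permissions.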
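To confirm whether /tmp is mounted noexec on a given node, a check along these lines may help. This is only a sketch: has_noexec is a hypothetical helper name, and it assumes findmnt from util-linux is installed on the node.

```shell
#!/bin/sh
# has_noexec is a hypothetical helper, not part of impala-shell.
# It succeeds when a comma-separated mount option string contains "noexec".
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask findmnt for the mount options of the filesystem that holds /tmp.
# If findmnt is missing or fails, opts stays empty.
opts=$(findmnt -n -o OPTIONS --target /tmp 2>/dev/null || true)

if [ -z "$opts" ]; then
  echo "could not determine mount options for /tmp"
elif has_noexec "$opts"; then
  echo "/tmp is mounted noexec: shared objects cannot be loaded from it"
else
  echo "/tmp is not mounted noexec"
fi
```

Because an Oozie shell action can land on any node, the check would need to be run on every node that might execute the action.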
To work around this, we added the following code to the Impala shell script that we run with Oozie:
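# Base directory on storage that is reachable from all nodes.
export PYTHON_EGG_CACHE=/app/bds
# Path that /bin/impala_shell will use as its egg cache.
export link_folder=/tmp/impala-shell-python-egg-cache-$(whoami)
# If the cache path is not already a symlink, replace it with one
# that points into the shared directory.
if ! [ -L "$link_folder" ]
then
  rm -Rf "$link_folder"
  ln -sfn "${PYTHON_EGG_CACHE}${link_folder}" "${link_folder}"
fi
# Make sure the symlink target exists.
mkdir -p "${PYTHON_EGG_CACHE}${link_folder}"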
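This replaces the per-user egg cache directory under /tmp with a symlink to a directory under /app/bds, a shared folder that can be accessed by all nodes. When /bin/impala_shell then resets PYTHON_EGG_CACHE to the /tmp path, the eggs are actually extracted into the shared folder.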
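The same idea can be wrapped in a small function so it can be tried out safely before pointing it at the real paths. This is a sketch rather than the exact code we run: redirect_egg_cache is a hypothetical name, and the demo at the bottom uses throwaway directories from mktemp instead of /app/bds and /tmp.

```shell
#!/bin/sh
# redirect_egg_cache SHARED_BASE LINK_PATH  (hypothetical helper)
# Makes LINK_PATH a symlink to SHARED_BASE + LINK_PATH, creating the target
# directory first. An existing symlink at LINK_PATH is left untouched; a real
# file or directory at LINK_PATH is removed and replaced by the symlink.
redirect_egg_cache() {
  rec_target="${1}${2}"
  mkdir -p "$rec_target" || return 1
  if [ ! -L "$2" ]; then
    rm -rf "$2"
    ln -sfn "$rec_target" "$2" || return 1
  fi
}

# Demo in a scratch directory so nothing under /tmp/impala-shell-* is touched.
demo_root=$(mktemp -d)
redirect_egg_cache "$demo_root/shared" "$demo_root/egg-cache"
echo "egg-cache now points to: $(readlink "$demo_root/egg-cache")"
rm -rf "$demo_root"

# In the Oozie shell script the call would look like this (paths from the post):
#   redirect_egg_cache /app/bds "/tmp/impala-shell-python-egg-cache-$(whoami)"
```

Creating the target before the link means the cache path never dangles, which is the only ordering difference from the snippet above.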