Member since: 05-10-2016
Posts: 184
Kudos Received: 60
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4105 | 05-06-2017 10:21 PM
 | 4109 | 05-04-2017 08:02 PM
 | 5024 | 12-28-2016 04:49 PM
 | 1243 | 11-11-2016 08:09 PM
 | 3338 | 10-22-2016 03:03 AM
09-30-2016
03:07 PM
Since you are running Pig while hive.execution.engine is set to tez, you can tune these parameters or set an upper limit in hive-env; either way, you should be able to control how much memory is allocated to your job. This community article explains the ideal values in detail: https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html In short: set tez.am.resource.memory.mb equal to yarn.scheduler.minimum-allocation-mb. Try that and see if it helps.
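For reference, these properties can also be set at the top of the Pig script itself via Pig's set statement; a rough sketch (the 2048 MB values are placeholders; substitute your cluster's yarn.scheduler.minimum-allocation-mb value):

```
set tez.am.resource.memory.mb 2048;
set tez.task.resource.memory.mb 2048;
```

Whether these take effect per-script depends on your Pig/Tez versions, so verify against the article linked above.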
09-28-2016
04:45 PM
@R M What are the values of the following properties: tez.am.resource.memory.mb and tez.task.resource.memory.mb? Have you tried tuning them, since you are using Tez mode?
09-24-2016
12:20 PM
2 Kudos
Goal
Restricting HiveCLI access to specific users
Prerequisite
Access to an admin account for Ambari
Procedure
Open the Hive config from Ambari; you can use the following link to do so: http://<ambari-server-hostname>:8080/#/main/services/HIVE/configs
Locate the following code within "Advanced hive-env":
if [ "$SERVICE" = "cli" ]; then
if [ -z "$DEBUG" ]; then
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseNUMA -XX:+UseParallelGC -XX:-UseGCOverheadLimit"
else
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
fi
fi
Replace the above code with the following:
declare -a users=(hdfs centos)
if [ "$SERVICE" = "cli" ]; then
for auser in ${users[@]}; do
if [ "$auser" = "$USER" ]; then
echo "User $USER is not authorized to use Hive"
exit 1
fi
done
if [ -z "$DEBUG" ]; then
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseNUMA -XX:+UseParallelGC -XX:-UseGCOverheadLimit"
else
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
fi
fi
NOTE: Here, the array named "users" contains the list of users who should not be authorized to use HiveCLI. Every time this list is modified, a restart of the Hive services via Ambari is required for the settings to take effect.
Testing the configuration
Open a terminal on the client and try the following:
[centos@master ~]$ id
uid=500(centos) gid=500(centos) groups=500(centos),4(adm),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[centos@master ~]$ hive
User centos is not authorized to use Hive
[centos@master ~]$ sudo su - hdfs
[hdfs@master ~]$ id
uid=505(hdfs) gid=501(hadoop) groups=501(hadoop),502(hdfs) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[hdfs@master ~]$ hive
User hdfs is not authorized to use Hive
[hdfs@master ~]$ exit
logout
[centos@master ~]$ sudo su - hive
[hive@master ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
hive>
NOTE: Please run a series of tests to verify that this configuration works for all (or the majority) of your jobs and automations before relying on this approach.
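The gate added to hive-env above can also be exercised in isolation. Here is a minimal stand-alone sketch of the same check, with the real $USER replaced by an explicit argument so it can be tried from any account (the deny list here mirrors the example above):

```shell
#!/bin/sh
# Deny-list of accounts that should NOT be allowed to use HiveCLI,
# mirroring the "users" array added to hive-env above.
users="hdfs centos"

# check_user prints a denial for listed accounts and returns 1;
# any other account is allowed and gets return code 0.
check_user() {
  for auser in $users; do
    if [ "$auser" = "$1" ]; then
      echo "User $1 is not authorized to use Hive"
      return 1
    fi
  done
  echo "User $1 is allowed"
  return 0
}

check_user centos || true   # listed account: prints the denial message
check_user hive             # unlisted account: prints that it is allowed
```

In the real hive-env snippet the argument is $USER and the denial path runs `exit 1`, which is what actually blocks the CLI.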
08-31-2016
04:51 PM
@Michael Young I am not sure that specifying "ROW FORMAT SERDE" helps, as once we create the table, the formatted description for this table shows the following:
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
It is already using LazySimpleSerDe. @jk, can you query the same data via a different tool (Excel, for example) and check whether you still see the same junk character?
08-31-2016
04:15 PM
@Jon Roberts Can you try something like this:
[hive@node1 ~]$ cat TestJdbcClient.java
import java.sql.*;
import org.apache.hadoop.security.UserGroupInformation;
public class TestJdbcClient {
public static void main (String args[]) {
try {
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
conf.set("hadoop.security.authentication", "Kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab("hive/node1.hortonworks.com@HWX.COM", "/etc/security/keytabs/hive.service.keytab");
Class.forName("org.apache.hive.jdbc.HiveDriver");
System.out.println("getting connection");
Connection con = DriverManager.getConnection("jdbc:hive2://node1.hortonworks.com:10000/default;hive.root.logger=ERROR,DFRA;principal=hive/node1.hortonworks.com@HWX.COM");
System.out.println("got connection");
Statement stmt = con.createStatement();
String sql = "show tables";
System.out.println("Running: " + sql);
ResultSet res = stmt.executeQuery(sql);
if (res.next()) {
System.out.println(res.getString(1));
while (res.next()) {
System.out.println(res.getString(1));
}
}
con.close();
}
catch (Exception e) {
e.printStackTrace();
}
}
}
This is my result:
[hive@node1 ~]$ java -cp /usr/hdp/2.3.4.0-3485/hive/lib/hive-jdbc-1.2.1.2.3.4.0-3485-standalone.jar:/usr/hdp/2.3.4.0-3485/hadoop/client/commons-configuration-1.6.jar:/etc/hive/conf/hive-site.xml:/usr/hdp/2.3.4.0-3485/hadoop/client/hadoop-common-2.7.1.2.3.4.0-3485.jar:/usr/hdp/2.3.4.0-3485/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/hdp/2.3.4.0-3485/hadoop/client/slf4j-log4j12.jar:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-auth-2.7.1.2.3.4.0-3485.jar:/usr/hdp/2.3.4.0-3485/hive-hcatalog/share/webhcat/svr/lib/xercesImpl-2.9.1.jar:. TestJdbcClient
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
getting connection
got connection
Running: show tables
csvinternal
customera
cvsexternal
mytemp
sample_07
src
temp_source
test
testabc
testnormal
testnormal1
tgt
08-30-2016
10:19 PM
You should be able to specify "hiveConfs" and set "hive.root.logger" to something like "FATAL" or "ERROR"; that way you should be able to suppress some of the INFO messages. The values are read from log4j.properties, so another option is to set the level there. Reference: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/hive-jdbc-odbc-drivers.html
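As a sketch (hostname and database here are placeholders), the logger level can be passed either with beeline's --hiveconf flag or through the hiveConfs section of the JDBC URL, i.e. the part after "?":

```
beeline --hiveconf hive.root.logger=ERROR,console \
  -u "jdbc:hive2://node1.example.com:10000/default?hive.root.logger=ERROR,console"
```

Whether this suppresses all client-side chatter depends on the driver version, so also check the log4j.properties route mentioned above.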
08-25-2016
09:50 PM
Even though the main requirement is addressed by the selected answer, I thought I should log this for future reference:
0: jdbc:hive2://node1.hortonworks.com:10000/d> select * from src;
+----------+------------+--+
| src.key | src.value |
+----------+------------+--+
| 1 | Value1 |
| 2 | Value2 |
+----------+------------+--+
2 rows selected (0.187 seconds)
0: jdbc:hive2://node1.hortonworks.com:10000/d> select * from tgt;
+----------+------------+--+
| tgt.key | tgt.value |
+----------+------------+--+
+----------+------------+--+
No rows selected (0.154 seconds)
0: jdbc:hive2://node1.hortonworks.com:10000/d> from (from src select transform(src.key,src.value) using '/bin/cat' as (tkey,tvalue) )tmap insert overwrite table tgt select tkey,tvalue;
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: from (from src select transfor...tkey,tvalue(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1471888656011_0009)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table default.tgt from hdfs://node1.hortonworks.com:8020/apps/hive/warehouse/tgt/.hive-staging_hive_2016-08-25_21-51-10_715_1000932141605500109-1/-ext-10000
INFO : Table default.tgt stats: [numFiles=1, numRows=2, totalSize=18, rawDataSize=16]
No rows affected (19.992 seconds)
0: jdbc:hive2://node1.hortonworks.com:10000/d> select * from tgt;
+----------+------------+--+
| tgt.key | tgt.value |
+----------+------------+--+
| 1 | Value1 |
| 2 | Value2 |
+----------+------------+--+
2 rows selected (0.197 seconds)
08-25-2016
09:12 PM
1 Kudo
Article
Goals
Setup Hive ODBC on Ubuntu 14.x.x with/without Kerberos
Setup Hive ODBC on CentOS 6.x with/without Kerberos
Enable debug logging, useful for diagnosing connectivity issues
Notes/Requirements
This article assumes that you have a working Ubuntu/CentOS server and access to its package repository.
Have libsasl2-modules-gssapi-mit installed.
Have "gcc" installed; it is required for compilation when building from source.
Have "make" installed; it is required for compilation when building from source.
Verify that you have a working Ubuntu repository by running "sudo apt-get update" and checking that it completes successfully.
On CentOS, as a non-superuser, run "sudo yum repolist"; the "base" and "epel" repositories should be listed with values in the thousands in the "status" field.
This node should be able to reach HiveServer2 via its hostname/IP address.
Ensure that the firewall is disabled ("ufw status" on Ubuntu; "service iptables/ip6tables status" on CentOS).
Copy the Hive keytab file to the client machine for obtaining the Kerberos ticket.
Here are examples of a working repository on either system:
Ubuntu
root@srai-odbc-ubuntu:~# apt-get update
Get:1 http://security.ubuntu.com trusty-security InRelease [65.9 kB]
....
Ign http://nova.clouds.archive.ubuntu.com trusty/universe Translation-en_US
Fetched 11.2 MB in 6s (1,623 kB/s)
Reading package lists... Done
CentOS
[centos@node1 ~]$ sudo yum repolist
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
repo id repo name status
base CentOS-6 - Base 6,696
epel Extra Packages for Enterprise Linux 6 - x86_64 12,192
extras CentOS-6 - Extras 62
updates CentOS-6 - Updates 457
Installing UnixODBC on Ubuntu
Using existing repository
root@srai-odbc-ubuntu:~# apt-get install unixodbc
Reading package lists... Done
Building dependency tree
Reading state information... Done
.....
Setting up unixodbc (2.2.14p2-5ubuntu5) ...
Setting up odbcinst (2.2.14p2-5ubuntu5) ...
Processing triggers for libc-bin (2.19-0ubuntu6.4) ...
root@srai-odbc-ubuntu:~#
Using the source file, which can be downloaded HERE. This is useful if you want a custom version of the driver and a custom location for the binaries and libraries to be installed.
Make a directory where you want the files downloaded, e.g., "/opt/odbc/unixodbc".
Change to this directory: "cd /opt/odbc"
Download the file into this location:
root@srai-odbc-ubuntu:~/ODBC# wget http://www.unixodbc.org/unixODBC-2.3.4.tar.gz
--2016-08-25 16:34:30-- http://www.unixodbc.org/unixODBC-2.3.4.tar.gz
Resolving www.unixodbc.org (www.unixodbc.org)... 87.106.19.214
Connecting to www.unixodbc.org (www.unixodbc.org)|87.106.19.214|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1830660 (1.7M) [application/x-gzip]
Saving to: ‘unixODBC-2.3.4.tar.gz’
100%[========================================================================================================================================>] 1,830,660 390KB/s in 5.2s
2016-08-25 16:34:35 (346 KB/s) - ‘unixODBC-2.3.4.tar.gz’ saved [1830660/1830660]
Extract the files from the tarball:
root@srai-odbc-ubuntu:~/ODBC# tar -xzf unixODBC-2.3.4.tar.gz
After the tarball has been extracted, change directory into that folder:
cd unixODBC-2.3.4
By default, when you build from source, files are installed under /usr/local/bin, /usr/local and /usr/local/man. Run the following, assuming you want the installation performed in a different location, for instance "/opt/odbc/unixodbc":
./configure --prefix=/opt/odbc/unixodbc --exec-prefix=/opt/odbc/unixodbc/oth --bindir=/opt/odbc/unixodbc/bin --sbindir=/opt/odbc/unixodbc/sbin --libexecdir=/opt/odbc/unixodbc/libexec --sysconfdir=/opt/odbc/unixodbc/etc --libdir=/opt/odbc/unixodbc/lib --includedir=/lib --includedir=/lib64
make
make install
Once the installation is complete, you should be able to find the "isql" binary:
# find /usr/local -name isql
/usr/local/bin/isql
Execute the command "odbcinst -j"; this should return the location of the configuration files:
# cd /opt/odbc/unixodbc/bin
# ./odbcinst -j
unixODBC 2.3.4
DRIVERS............: /usr/local/etc/odbcinst.ini
SYSTEM DATA SOURCES: /usr/local/etc/odbc.ini
FILE DATA SOURCES..: /usr/local/etc/ODBCDataSources
USER DATA SOURCES..: /root/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8
NOTE: If you see this error:
/usr/local/bin/odbcinst: error while loading shared libraries: libodbcinst.so.2: cannot open shared object file: No such file or directory
...this can be because the libraries are not being found, even though they exist on the system. Perform the following steps to verify and fix this:
# ldd /opt/odbc/unixodbc/odbcinst
linux-vdso.so.1 => (0x00007fff33dfe000)
libodbcinst.so.2 => not found
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7f739e2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7f7361c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7f73418000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7f73e23000)
# find /opt -name libodbcinst.so.2
/opt/odbc/unixodbc/lib/libodbcinst.so.2
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/odbc/unixodbc/lib
# ldd /opt/odbc/unixodbc/odbcinst
linux-vdso.so.1 => (0x00007fff33dfe000)
libodbcinst.so.2 => /opt/odbc/unixodbc/lib/libodbcinst.so.2 (0x00007f7f73c06000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7f739e2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7f7361c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7f73418000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7f73e23000)
Installing & Configuring Hive ODBC on Ubuntu
Download the driver HERE.
Copy the driver over to any temporary/staging directory, or to a directory where you keep all the binaries, e.g., "/opt/odbc".
Install the driver package using the following command:
# dpkg -i hive-odbc-native_2.1.2.1002-2_amd64.deb
(Reading database ... 54835 files and directories currently installed.)
Preparing to unpack hive-odbc-native_2.1.2.1002-2_amd64.deb ...
Unpacking hive-odbc-native (2.1.2.1002-2) over (2.1.2.1002-2) ...
Setting up hive-odbc-native (2.1.2.1002-2) ...
#
Ensure that the libraries installed (if UnixODBC was built from source) are included in LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib:/lib64:/opt/odbc/unixodbc/lib
Locate the files "odbc.ini" and "odbcinst.ini"; their location depends on your method of installation. By default the files are available under /etc/, but they can also be present under /usr/local/etc. In our case, assuming the custom installation, the files will be under /opt/odbc/unixodbc/etc:
root@srai-odbc-ubuntu:/opt/odbc/unixodbc/etc# ls -lrth /opt/odbc/unixodbc/etc
total 4.0K
-rw-r--r-- 1 root root 0 Aug 25 18:00 odbcinst.ini
-rw-r--r-- 1 root root 0 Aug 25 18:00 odbc.ini
drwxr-xr-x 2 root root 4.0K Aug 25 18:00 ODBCDataSources
Copy the files from the Hive installation directory into this location:
# cp /usr/lib/hive/lib/native/hiveodbc/Setup/odbc.ini /opt/odbc/unixodbc/etc/odbc.ini
# cp /usr/lib/hive/lib/native/hiveodbc/Setup/odbcinst.ini /opt/odbc/unixodbc/etc/odbcinst.ini
Edit the file using vi or another editor and configure the following:
Locate the string "[Sample Hortonworks Hive DSN 64]". This is the DSN, short for Data Source Name; you can change it to anything you like, for instance "hivedb".
Ensure that the variable HOST has the hostname of HiveServer2.
PORT by default should be set to 10000, or as configured otherwise.
AuthMech should be set to 0 for no authentication OR 1 for Kerberos.
UID can be set to hive, or as needed.
KrbHostFQDN should be the FQDN of the Kerberos server.
KrbServiceName should be the principal name configured for hive OR the principal created.
KrbRealm should be the name of the realm, for instance EXAMPLE.COM.
Here is the sample configuration output:
[hivedb]
Driver=/usr/lib/hive/lib/native/Linux-amd64-64/libhortonworkshiveodbc64.so
HOST=node1.hortonworks.com
PORT=10000
HiveServerType=2
AuthMech=1
UID=hive
KrbHostFQDN=kerb.hortonworks.com
KrbServiceName=hive
KrbRealm=HWX.COM
Export the variables to ensure that only our configured files are used:
export ODBCINI=/opt/odbc/unixodbc/odbc.ini
export ODBCSYSINI=/opt/odbc/unixodbc
export HORTONWORKSHIVEINI=/usr/lib/hive/lib/native/Linux-amd64-64/hortonworks.hiveodbc.ini
Ensure that hortonworks.hiveodbc.ini (as stated above) points to the correct ODBC library. Here, since we are using the UnixODBC driver manager with the Hive ODBC driver (Simba), we need to comment out the default "ODBCInstLib=libiodbcinst.so" and uncomment "ODBCInstLib=libodbcinst.so":
# Generic ODBCInstLib
# iODBC
#ODBCInstLib=libiodbcinst.so <<<<<<< Comment This
# SimbaDM / unixODBC
ODBCInstLib=libodbcinst.so <<<<<<< Uncomment
Testing the connectivity
Get the Kerberos ticket for the hive principal using the keytab file. In this case, node1.hortonworks.com is the host which serves HiveServer2:
# kinit -k -t /tmp/hive.service.keytab hive/node1.hortonworks.com@HWX.COM
# klist
Once the ticket is granted, execute the following command to test connectivity:
# /opt/odbc/unixodbc/bin/isql hivedb -v
You can run some commands to verify that you are able to list the databases and access table data:
# /opt/odbc/unixodbc/bin/isql hivedb
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL> show tables;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tab_name |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| csvinternal |
| cvsexternal |
| sample_07 |
| src |
| test |
| testabc |
| testnormal |
| tgt |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
SQLRowCount returns -1
8 rows fetched
SQL> select * from test limit 5;
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| col1 | col2 |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | one |
| 1 | one |
| 10 | ten |
| 11 | eleven |
| 10 | ten |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
SQLRowCount returns -1
5 rows fetched
SQL>
The process of installing UnixODBC on CentOS is precisely the same, with the exception of how the firewall is turned off; everything else is virtually identical.
Enabling Debug Logging
Open the file "hortonworks.hiveodbc.ini"; you can use either of the following methods:
# vi $HORTONWORKSHIVEINI
# vi /usr/lib/hive/lib/native/Linux-amd64-64/hortonworks.hiveodbc.ini
The only two parameters that we need to modify are "LogLevel" and "LogPath"; here are example values. Zero means no logging.
LogLevel=6
LogPath=/opt/odbc/unixodbc/logs
Once the variables are set, run a few commands; two log files will be generated within the LogPath specified above. These files can be reviewed for debugging information:
# ls -lrth /opt/odbc/unixodbc/logs/
total 796K
-rw-r--r-- 1 root root 9.5K Aug 25 20:51 HortonworksHiveODBCDriver.log
-rw-r--r-- 1 root root 784K Aug 25 20:51 HortonworksHiveODBCDriver_connection_0.log
It is advisable to return the value to its default, "LogLevel=0", when no debugging is required, as it may generate a lot of data.
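Resetting the level can be scripted as well; a small sketch that rewrites the LogLevel line in place (the path below is a throwaway stand-in for the real hortonworks.hiveodbc.ini, and GNU sed is assumed for the -i flag):

```shell
# Create a stand-in ini file with debug logging enabled...
ini=/tmp/hiveodbc.sample.ini
printf 'LogLevel=6\nLogPath=/opt/odbc/unixodbc/logs\n' > "$ini"

# ...then flip LogLevel back to its default of 0, leaving LogPath untouched.
sed -i 's/^LogLevel=.*/LogLevel=0/' "$ini"
grep '^LogLevel=' "$ini"
```

Point $ini at the real file once you have verified the substitution does what you expect.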
Hope this helps!!
08-24-2016
09:19 PM
Your statement might look something like:
insert into csvinternal (col2) select col1 from cvsexternal;
08-24-2016
09:09 PM
1 Kudo
You can consider using a Hive external table with the same column names and data types, and then map the column names while loading from the source (the Hive external table) to the target (the Hive managed table). You can find examples of external table creation here: https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
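A rough sketch of the idea (the table names, columns, delimiter and HDFS location below are made up for illustration):

```sql
-- Hypothetical external table over CSV data already sitting in HDFS
CREATE EXTERNAL TABLE src_ext (col1 INT, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/staging/src_ext';

-- Map the columns explicitly while loading into the managed target table
INSERT INTO TABLE tgt_managed SELECT col1, col2 FROM src_ext;
```

The SELECT list is where the column mapping happens, so columns can be reordered or renamed relative to the source layout.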