Member since: 05-10-2016
Posts: 184
Kudos Received: 60
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4105 | 05-06-2017 10:21 PM
 | 4109 | 05-04-2017 08:02 PM
 | 5024 | 12-28-2016 04:49 PM
 | 1243 | 11-11-2016 08:09 PM
 | 3338 | 10-22-2016 03:03 AM
09-30-2016
03:07 PM
Since you are running Pig while hive.execution.engine is set to tez, you can tune these parameters or set an upper limit in hive-env; either way, you should be able to control how much memory is allocated to your job. This community article explains the ideal values in detail: https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html In short: set tez.am.resource.memory.mb equal to yarn.scheduler.minimum-allocation-mb. Try that and see if it helps.
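For reference, these properties can also be set at the top of the Pig script itself via Pig's set statement; a rough sketch (the 2048 MB values are placeholders; substitute your cluster's yarn.scheduler.minimum-allocation-mb value):

```
set tez.am.resource.memory.mb 2048;
set tez.task.resource.memory.mb 2048;
```

Whether these take effect per-script depends on your Pig/Tez versions, so verify against the article linked above.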
09-28-2016
04:45 PM
@R M What are the values of the following properties: tez.am.resource.memory.mb and tez.task.resource.memory.mb? Have you tried tuning them, since you are using Tez mode?
09-24-2016
12:20 PM
2 Kudos
Goal
Restricting HiveCLI access to specific users
Prerequisite
Access to an admin account for Ambari
Procedure
Open the Hive config from Ambari; you can use the following link to do so: http://<ambari-server-hostname>:8080/#/main/services/HIVE/configs
Locate the following code within "Advanced hive-env":
if [ "$SERVICE" = "cli" ]; then
if [ -z "$DEBUG" ]; then
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseNUMA -XX:+UseParallelGC -XX:-UseGCOverheadLimit"
else
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
fi
fi
Replace the above code with the following:
declare -a users=(hdfs centos)
if [ "$SERVICE" = "cli" ]; then
for auser in ${users[@]}; do
if [ "$auser" = "$USER" ]; then
echo "User $USER is not authorized to use Hive"
exit 1
fi
done
if [ -z "$DEBUG" ]; then
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseNUMA -XX:+UseParallelGC -XX:-UseGCOverheadLimit"
else
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
fi
fi
NOTE: Here, the array named "users" contains the list of users who should not be authorized to use HiveCLI. Every time this list is modified, a restart of the Hive services via Ambari is required for the settings to take effect.
Testing the configuration
Open a terminal on the client and try the following:
[centos@master ~]$ id
uid=500(centos) gid=500(centos) groups=500(centos),4(adm),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[centos@master ~]$ hive
User centos is not authorized to use Hive
[centos@master ~]$ sudo su - hdfs
[hdfs@master ~]$ id
uid=505(hdfs) gid=501(hadoop) groups=501(hadoop),502(hdfs) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[hdfs@master ~]$ hive
User hdfs is not authorized to use Hive
[hdfs@master ~]$ exit
logout
[centos@master ~]$ sudo su - hive
[hive@master ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
hive>
NOTE: Please run a series of tests to verify that this configuration works for all (or the majority) of your jobs and automations before relying on this approach.
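The gate added to hive-env above can also be exercised in isolation. Here is a minimal stand-alone sketch of the same check, with the real $USER replaced by an explicit argument so it can be tried from any account (the deny list here mirrors the example above):

```shell
#!/bin/sh
# Deny-list of accounts that should NOT be allowed to use HiveCLI,
# mirroring the "users" array added to hive-env above.
users="hdfs centos"

# check_user prints a denial for listed accounts and returns 1;
# any other account is allowed and gets return code 0.
check_user() {
  for auser in $users; do
    if [ "$auser" = "$1" ]; then
      echo "User $1 is not authorized to use Hive"
      return 1
    fi
  done
  echo "User $1 is allowed"
  return 0
}

check_user centos || true   # listed account: prints the denial message
check_user hive             # unlisted account: prints that it is allowed
```

In the real hive-env snippet the argument is $USER and the denial path runs `exit 1`, which is what actually blocks the CLI.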
08-31-2016
04:51 PM
@Michael Young I am not sure that specifying "ROW FORMAT SERDE" helps, as once we create the table, the formatted description for this table shows the following:
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
It is already using LazySimpleSerDe. @jk, can you query the same data via a different tool (Excel, for example) and check whether you still see the same junk character?
08-31-2016
04:15 PM
@Jon Roberts Can you try something like this:
[hive@node1 ~]$ cat TestJdbcClient.java
import java.sql.*;
import org.apache.hadoop.security.UserGroupInformation;
public class TestJdbcClient {
public static void main (String args[]) {
try {
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
conf.set("hadoop.security.authentication", "Kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab("hive/node1.hortonworks.com@HWX.COM", "/etc/security/keytabs/hive.service.keytab");
Class.forName("org.apache.hive.jdbc.HiveDriver");
System.out.println("getting connection");
Connection con = DriverManager.getConnection("jdbc:hive2://node1.hortonworks.com:10000/default;hive.root.logger=ERROR,DFRA;principal=hive/node1.hortonworks.com@HWX.COM");
System.out.println("got connection");
Statement stmt = con.createStatement();
String sql = "show tables";
System.out.println("Running: " + sql);
ResultSet res = stmt.executeQuery(sql);
if (res.next()) {
System.out.println(res.getString(1));
while (res.next()) {
System.out.println(res.getString(1));
}
}
con.close();
}
catch (Exception e) {
e.printStackTrace();
}
}
}
This is my result:
[hive@node1 ~]$ java -cp /usr/hdp/2.3.4.0-3485/hive/lib/hive-jdbc-1.2.1.2.3.4.0-3485-standalone.jar:/usr/hdp/2.3.4.0-3485/hadoop/client/commons-configuration-1.6.jar:/etc/hive/conf/hive-site.xml:/usr/hdp/2.3.4.0-3485/hadoop/client/hadoop-common-2.7.1.2.3.4.0-3485.jar:/usr/hdp/2.3.4.0-3485/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/hdp/2.3.4.0-3485/hadoop/client/slf4j-log4j12.jar:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-auth-2.7.1.2.3.4.0-3485.jar:/usr/hdp/2.3.4.0-3485/hive-hcatalog/share/webhcat/svr/lib/xercesImpl-2.9.1.jar:. TestJdbcClient
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
getting connection
got connection
Running: show tables
csvinternal
customera
cvsexternal
mytemp
sample_07
src
temp_source
test
testabc
testnormal
testnormal1
tgt
08-30-2016
10:19 PM
You should be able to specify "hiveConfs" and set "hive.root.logger" to something like "FATAL" or "ERROR"; that way you should be able to suppress some of the INFO messages. The values are read from log4j.properties, so another option is to set the level there. Reference: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/hive-jdbc-odbc-drivers.html
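As a sketch (hostname and database here are placeholders), the logger level can be passed either with beeline's --hiveconf flag or through the hiveConfs section of the JDBC URL, i.e. the part after "?":

```
beeline --hiveconf hive.root.logger=ERROR,console \
  -u "jdbc:hive2://node1.example.com:10000/default?hive.root.logger=ERROR,console"
```

Whether this suppresses all client-side chatter depends on the driver version, so also check the log4j.properties route mentioned above.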
08-25-2016
09:50 PM
Even though the main requirement is addressed by the selected answer, I thought I should log this for future reference:
0: jdbc:hive2://node1.hortonworks.com:10000/d> select * from src;
+----------+------------+--+
| src.key | src.value |
+----------+------------+--+
| 1 | Value1 |
| 2 | Value2 |
+----------+------------+--+
2 rows selected (0.187 seconds)
0: jdbc:hive2://node1.hortonworks.com:10000/d> select * from tgt;
+----------+------------+--+
| tgt.key | tgt.value |
+----------+------------+--+
+----------+------------+--+
No rows selected (0.154 seconds)
0: jdbc:hive2://node1.hortonworks.com:10000/d> from (from src select transform(src.key,src.value) using '/bin/cat' as (tkey,tvalue) )tmap insert overwrite table tgt select tkey,tvalue;
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: from (from src select transfor...tkey,tvalue(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1471888656011_0009)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table default.tgt from hdfs://node1.hortonworks.com:8020/apps/hive/warehouse/tgt/.hive-staging_hive_2016-08-25_21-51-10_715_1000932141605500109-1/-ext-10000
INFO : Table default.tgt stats: [numFiles=1, numRows=2, totalSize=18, rawDataSize=16]
No rows affected (19.992 seconds)
0: jdbc:hive2://node1.hortonworks.com:10000/d> select * from tgt;
+----------+------------+--+
| tgt.key | tgt.value |
+----------+------------+--+
| 1 | Value1 |
| 2 | Value2 |
+----------+------------+--+
2 rows selected (0.197 seconds)
08-25-2016
09:12 PM
1 Kudo
Article
Goals
Setup Hive ODBC on Ubuntu 14.x.x with/without Kerberos
Setup Hive ODBC on CentOS 6.x with/without Kerberos
Enable debug logging, useful for diagnosing connectivity issues
Notes/Requirements
This article assumes that you have a working Ubuntu/CentOS server and access to its package repository.
Have libsasl2-modules-gssapi-mit installed.
Have "gcc" installed; it is required for compilation when building from source.
Have "make" installed; it is required for compilation when building from source.
Verify that you have a working Ubuntu repository by running "sudo apt-get update" and checking that it completes successfully.
On CentOS, as a non-superuser, run "sudo yum repolist"; the "base" and "epel" repositories should be listed with values in the thousands in the "status" field.
This node should be able to reach HiveServer2 via its hostname/IP address.
Ensure that the firewall is disabled ("ufw status" on Ubuntu; "service iptables/ip6tables status" on CentOS).
Copy the Hive keytab file to the client machine for obtaining the Kerberos ticket.
Here are examples of a working repository on either system:
Ubuntu
root@srai-odbc-ubuntu:~# apt-get update
Get:1 http://security.ubuntu.com trusty-security InRelease [65.9 kB]
....
Ign http://nova.clouds.archive.ubuntu.com trusty/universe Translation-en_US
Fetched 11.2 MB in 6s (1,623 kB/s)
Reading package lists... Done
CentOS
[centos@node1 ~]$ sudo yum repolist
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
repo id repo name status
base CentOS-6 - Base 6,696
epel Extra Packages for Enterprise Linux 6 - x86_64 12,192
extras CentOS-6 - Extras 62
updates CentOS-6 - Updates 457
Installing UnixODBC on Ubuntu
Using existing repository
root@srai-odbc-ubuntu:~# apt-get install unixodbc
Reading package lists... Done
Building dependency tree
Reading state information... Done
.....
Setting up unixodbc (2.2.14p2-5ubuntu5) ...
Setting up odbcinst (2.2.14p2-5ubuntu5) ...
Processing triggers for libc-bin (2.19-0ubuntu6.4) ...
root@srai-odbc-ubuntu:~#
Using the source file, which can be downloaded HERE. This is useful if you want a custom version of the driver and a custom location for the binaries and libraries to be installed.
Make a directory where you want the files downloaded, e.g., "/opt/odbc/unixodbc".
Change to this directory: "cd /opt/odbc"
Download the file into this location:
root@srai-odbc-ubuntu:~/ODBC# wget http://www.unixodbc.org/unixODBC-2.3.4.tar.gz
--2016-08-25 16:34:30-- http://www.unixodbc.org/unixODBC-2.3.4.tar.gz
Resolving www.unixodbc.org (www.unixodbc.org)... 87.106.19.214
Connecting to www.unixodbc.org (www.unixodbc.org)|87.106.19.214|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1830660 (1.7M) [application/x-gzip]
Saving to: ‘unixODBC-2.3.4.tar.gz’
100%[========================================================================================================================================>] 1,830,660 390KB/s in 5.2s
2016-08-25 16:34:35 (346 KB/s) - ‘unixODBC-2.3.4.tar.gz’ saved [1830660/1830660]
Extract the files from the tarball:
root@srai-odbc-ubuntu:~/ODBC# tar -xzf unixODBC-2.3.4.tar.gz
After the tarball has been extracted, change directory into that folder:
cd unixODBC-2.3.4
By default, when you build from source, files are installed under /usr/local/bin, /usr/local and /usr/local/man. Run the following, assuming you want the installation performed in a different location, for instance "/opt/odbc/unixodbc":
./configure --prefix=/opt/odbc/unixodbc --exec-prefix=/opt/odbc/unixodbc/oth --bindir=/opt/odbc/unixodbc/bin --sbindir=/opt/odbc/unixodbc/sbin --libexecdir=/opt/odbc/unixodbc/libexec --sysconfdir=/opt/odbc/unixodbc/etc --libdir=/opt/odbc/unixodbc/lib --includedir=/lib --includedir=/lib64
make
make install
Once the installation is complete, you should be able to find the "isql" binary:
# find /usr/local -name isql
/usr/local/bin/isql
Execute the command "odbcinst -j"; this should return the location of the configuration files:
# cd /opt/odbc/unixodbc/bin
# ./odbcinst -j
unixODBC 2.3.4
DRIVERS............: /usr/local/etc/odbcinst.ini
SYSTEM DATA SOURCES: /usr/local/etc/odbc.ini
FILE DATA SOURCES..: /usr/local/etc/ODBCDataSources
USER DATA SOURCES..: /root/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8
NOTE: If you see this error:
/usr/local/bin/odbcinst: error while loading shared libraries: libodbcinst.so.2: cannot open shared object file: No such file or directory
...this can be because the libraries are not being found, even though they exist on the system. Perform the following steps to verify and fix this:
# ldd /opt/odbc/unixodbc/odbcinst
linux-vdso.so.1 => (0x00007fff33dfe000)
libodbcinst.so.2 => not found
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7f739e2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7f7361c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7f73418000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7f73e23000)
# find /opt -name libodbcinst.so.2
/opt/odbc/unixodbc/lib/libodbcinst.so.2
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/odbc/unixodbc/lib
# ldd /opt/odbc/unixodbc/odbcinst
linux-vdso.so.1 => (0x00007fff33dfe000)
libodbcinst.so.2 => /opt/odbc/unixodbc/lib/libodbcinst.so.2 (0x00007f7f73c06000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7f739e2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7f7361c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7f73418000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7f73e23000)
Installing & Configuring Hive ODBC on Ubuntu
Download the driver HERE.
Copy the driver over to any temporary/staging directory, or to a directory where you keep all the binaries, e.g., "/opt/odbc".
Install the driver package using the following command:
# dpkg -i hive-odbc-native_2.1.2.1002-2_amd64.deb
(Reading database ... 54835 files and directories currently installed.)
Preparing to unpack hive-odbc-native_2.1.2.1002-2_amd64.deb ...
Unpacking hive-odbc-native (2.1.2.1002-2) over (2.1.2.1002-2) ...
Setting up hive-odbc-native (2.1.2.1002-2) ...
#
Ensure that the libraries installed (if UnixODBC was built from source) are included in LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib:/lib64:/opt/odbc/unixodbc/lib
Locate the files "odbc.ini" and "odbcinst.ini"; their location depends on your method of installation. By default the files are available under /etc/, but they can also be present under /usr/local/etc. In our case, assuming the custom installation, the files will be under /opt/odbc/unixodbc/etc:
root@srai-odbc-ubuntu:/opt/odbc/unixodbc/etc# ls -lrth /opt/odbc/unixodbc/etc
total 4.0K
-rw-r--r-- 1 root root 0 Aug 25 18:00 odbcinst.ini
-rw-r--r-- 1 root root 0 Aug 25 18:00 odbc.ini
drwxr-xr-x 2 root root 4.0K Aug 25 18:00 ODBCDataSources
Copy the files from the Hive installation directory into this location:
# cp /usr/lib/hive/lib/native/hiveodbc/Setup/odbc.ini /opt/odbc/unixodbc/etc/odbc.ini
# cp /usr/lib/hive/lib/native/hiveodbc/Setup/odbcinst.ini /opt/odbc/unixodbc/etc/odbcinst.ini
Edit the file using vi or another editor and configure the following:
Locate the string "[Sample Hortonworks Hive DSN 64]". This is the DSN, short for Data Source Name; you can change it to anything you like, for instance "hivedb".
Ensure that the variable HOST has the hostname of HiveServer2.
PORT by default should be set to 10000, or as configured otherwise.
AuthMech should be set to 0 for no authentication OR 1 for Kerberos.
UID can be set to hive, or as needed.
KrbHostFQDN should be the FQDN of the Kerberos server.
KrbServiceName should be the principal name configured for hive OR the principal created.
KrbRealm should be the name of the realm, for instance EXAMPLE.COM.
Here is the sample configuration output:
[hivedb]
Driver=/usr/lib/hive/lib/native/Linux-amd64-64/libhortonworkshiveodbc64.so
HOST=node1.hortonworks.com
PORT=10000
HiveServerType=2
AuthMech=1
UID=hive
KrbHostFQDN=kerb.hortonworks.com
KrbServiceName=hive
KrbRealm=HWX.COM
Export the variables to ensure that only our configured files are used:
export ODBCINI=/opt/odbc/unixodbc/odbc.ini
export ODBCSYSINI=/opt/odbc/unixodbc
export HORTONWORKSHIVEINI=/usr/lib/hive/lib/native/Linux-amd64-64/hortonworks.hiveodbc.ini
Ensure that hortonworks.hiveodbc.ini (as stated above) points to the correct ODBC library. Here, since we are using the UnixODBC driver manager with the Hive ODBC driver (Simba), we need to comment out the default "ODBCInstLib=libiodbcinst.so" and uncomment "ODBCInstLib=libodbcinst.so":
# Generic ODBCInstLib
# iODBC
#ODBCInstLib=libiodbcinst.so <<<<<<< Comment This
# SimbaDM / unixODBC
ODBCInstLib=libodbcinst.so <<<<<<< Uncomment
Testing the connectivity
Get the Kerberos ticket for the hive principal using the keytab file. In this case, node1.hortonworks.com is the host which serves HiveServer2:
# kinit -k -t /tmp/hive.service.keytab hive/node1.hortonworks.com@HWX.COM
# klist
Once the ticket is granted, execute the following command to test connectivity:
# /opt/odbc/unixodbc/bin/isql hivedb -v
You can run some commands to verify that you are able to list the databases and access table data:
# /opt/odbc/unixodbc/bin/isql hivedb
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL> show tables;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tab_name |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| csvinternal |
| cvsexternal |
| sample_07 |
| src |
| test |
| testabc |
| testnormal |
| tgt |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
SQLRowCount returns -1
8 rows fetched
SQL> select * from test limit 5;
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| col1 | col2 |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | one |
| 1 | one |
| 10 | ten |
| 11 | eleven |
| 10 | ten |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
SQLRowCount returns -1
5 rows fetched
SQL>
The process of installing UnixODBC on CentOS is precisely the same, with the exception of how the firewall is turned off; everything else is virtually identical.
Enabling Debug Logging
Open the file "hortonworks.hiveodbc.ini"; you can use either of the following methods:
# vi $HORTONWORKSHIVEINI
# vi /usr/lib/hive/lib/native/Linux-amd64-64/hortonworks.hiveodbc.ini
The only two parameters that we need to modify are "LogLevel" and "LogPath"; here are example values. Zero means no logging.
LogLevel=6
LogPath=/opt/odbc/unixodbc/logs
Once the variables are set, run a few commands; two log files will be generated within the LogPath specified above. These files can be reviewed for debugging information:
# ls -lrth /opt/odbc/unixodbc/logs/
total 796K
-rw-r--r-- 1 root root 9.5K Aug 25 20:51 HortonworksHiveODBCDriver.log
-rw-r--r-- 1 root root 784K Aug 25 20:51 HortonworksHiveODBCDriver_connection_0.log
It is advisable to return the value to its default, "LogLevel=0", when no debugging is required, as it may generate a lot of data.
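Resetting the level can be scripted as well; a small sketch that rewrites the LogLevel line in place (the path below is a throwaway stand-in for the real hortonworks.hiveodbc.ini, and GNU sed is assumed for the -i flag):

```shell
# Create a stand-in ini file with debug logging enabled...
ini=/tmp/hiveodbc.sample.ini
printf 'LogLevel=6\nLogPath=/opt/odbc/unixodbc/logs\n' > "$ini"

# ...then flip LogLevel back to its default of 0, leaving LogPath untouched.
sed -i 's/^LogLevel=.*/LogLevel=0/' "$ini"
grep '^LogLevel=' "$ini"
```

Point $ini at the real file once you have verified the substitution does what you expect.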
Hope this helps!!
08-24-2016
09:19 PM
Your statement might look something like:
insert into csvinternal (col2) select col1 from cvsexternal;
08-24-2016
09:09 PM
1 Kudo
You can consider using a Hive external table with the same column names and data types, and then map the column names while loading from the source (the Hive external table) to the target (the Hive managed table). You can find examples of external table creation here: https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
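A rough sketch of the idea (the table names, columns, delimiter and HDFS location below are made up for illustration):

```sql
-- Hypothetical external table over CSV data already sitting in HDFS
CREATE EXTERNAL TABLE src_ext (col1 INT, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/staging/src_ext';

-- Map the columns explicitly while loading into the managed target table
INSERT INTO TABLE tgt_managed SELECT col1, col2 FROM src_ext;
```

The SELECT list is where the column mapping happens, so columns can be reordered or renamed relative to the source layout.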