Member since: 10-07-2015
Posts: 21
Kudos Received: 32
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3885 | 03-16-2017 02:27 PM |
| | 3603 | 12-22-2016 10:41 PM |
| | 747 | 10-29-2015 01:39 AM |
10-19-2017
06:34 PM
1 Kudo
Bucketing is a tricky feature. Users sometimes bucket their tables believing it will solve all performance problems, and are subsequently disappointed. Be confident you understand all the implications of bucketing before you use it, and only use it if other approaches do not work. There are two major reasons to use bucketing:

1) The bucket join. If two large tables are "compatible" in terms of identical bucketing keys and compatible numbers of buckets, an optimization can kick in to accelerate the join. All other joins proceed as normal, and since bucketing creates additional files it can harm their performance. Consider this only when you have an extremely expensive join and the problem cannot be addressed any other way.

2) Bucket pruning. Bucket pruning was added in Hive 2.X; if you are using Hive 1.X there is no bucket pruning. First, understand that ORCFile has inline indexes that allow large portions of a file to be skipped if it is known the records cannot match (e.g. in a point lookup). If your table is sorted by customer ID and your point lookup is based on customer ID, minimal I/O is needed to find matching records in ORC and bucketing is not helpful. Bucketing is appropriate when you bucket by one key and sort by another, commonly a timestamp. If your lookup key and sort key do not match, bucket pruning (in Hive 2.X) can help point lookup queries.

Generally speaking, I tell users that bucketing is an optimization of last resort. Usually there are better ways to get what you want.
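To make the terminology concrete, here is a minimal sketch of bucketed-table DDL in the spirit of the customer ID / timestamp example above; the table and column names are invented for illustration:
-- hypothetical table: bucketed by the lookup key, sorted by a timestamp
create table web_events (
  event_time timestamp,
  customer_id bigint,
  url string
)
clustered by (customer_id) sorted by (event_time) into 32 buckets
stored as orc;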
07-11-2017
10:26 PM
1 Kudo
If you are using Hive 2 or later (including Hive LLAP), you no longer need the dummy table; statements like INSERT INTO TABLE test_array SELECT 1, array('a','b'); will work.
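A minimal end-to-end sketch, assuming a table definition that matches the insert above (the column names are my own):
create table test_array (id int, vals array<string>);
insert into table test_array select 1, array('a','b');
select * from test_array;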
07-07-2017
03:36 PM
5 Kudos
Hybrid Procedural SQL On Hadoop (HPL/SQL) is a tool that implements procedural SQL for Hive. Lately, many people have investigated running HPL/SQL on HDP 2.6 and have encountered problems. These instructions tell you how to work around these problems so that you can experiment with HPL/SQL. In HDP 2.6, HPL/SQL is considered a "technical preview" and is not supported by Hortonworks support subscriptions. There are known limitations in HPL/SQL that may make it unsuitable for your needs, so test thoroughly before you decide HPL/SQL is right for you. These instructions require cluster changes which are not appropriate for production clusters and should only be done on development clusters or sandboxes. We'll cover 2 ways of using HPL/SQL:
1. Using HiveServer Interactive (preferred)
2. Using an embedded metastore

With either approach, you need to edit /etc/hive2/conf/hive-env.sh and change line 30 from:
export HIVE_CONF_DIR=/usr/hdp/current/hive-server2-hive2/conf/conf.server
to:
export HIVE_CONF_DIR=${HIVE_CONF_DIR:-/usr/hdp/current/hive-server2-hive2/conf/conf.server}
Again, do not do this on a production cluster. Note that hive-env.sh is overwritten every time you restart HiveServer Interactive, so this modification will need to be repeated before HPL/SQL can be used again.

Option 1 (Preferred): Using HPL/SQL with HiveServer Interactive
First, start HiveServer Interactive through Ambari and edit hive-env.sh as described above. Then place this configuration into /usr/hdp/current/hive-server2-hive2/conf/hplsql-site.xml:
<configuration>
<property>
<name>hplsql.conn.default</name>
<value>hiveconn</value>
</property>
<property>
<name>hplsql.conn.hiveconn</name>
<value>org.apache.hive.jdbc.HiveDriver;jdbc:hive2://ambari.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2</value>
</property>
<property>
<name>hplsql.conn.convert.hiveconn</name>
<value>true</value>
</property>
</configuration>
You will need to replace the value of hplsql.conn.hiveconn with the HiveServer2 Interactive JDBC URL shown on the Hive service page in Ambari. Proceed to the Validation Phase below.

Option 2: Using an Embedded Metastore
To use an embedded metastore, HPL/SQL clients need access to the database backing the metastore (e.g. MySQL), so they need a hive-site.xml that contains credentials for that database. Ambari sets up two hive-site.xml files: one without passwords in /etc/hive2/conf, and one with passwords in /etc/hive2/conf/conf.server that is only visible to certain users. You will need the one with credentials.
Because of this security problem, use this approach only if you can't use HiveServer for some reason.
Run these commands to clone the Hive configuration, including passwords:
sudo cp -r /etc/hive2/conf/conf.server conf
sudo chmod -R 755 conf
sudo cp /etc/hive2/2.6.1.0-129/0/hive-env.sh conf
Edit conf/hive-site.xml and change the value of hadoop.security.credential.provider.path to jceks://file/home/vagrant/conf/hive-site.jceks, then point your environment at the cloned configuration:
export HIVE_CONF_DIR=/home/vagrant/conf
(You will need to substitute your actual path here.) Finally, place this configuration in /home/vagrant/conf/hplsql-site.xml (again, substituting your actual path):
<configuration>
<property>
<name>hplsql.conn.default</name>
<value>hiveconn</value>
</property>
<property>
<name>hplsql.conn.hiveconn</name>
<value>org.apache.hive.jdbc.HiveDriver;jdbc:hive2://</value>
</property>
<property>
<name>hplsql.conn.convert.hiveconn</name>
<value>true</value>
</property>
</configuration>
If you decided to look at the embedded metastore route, hopefully you read these instructions and decided the HiveServer Interactive route is a better choice.

Validation Phase:
To confirm your setup, run:
/usr/hdp/current/hive-server2-hive2/bin/hplsql -e 'select "hello world";'
If your setup is correct you will see hello world printed to your console. For more information, HPL/SQL includes excellent documentation (http://www.hplsql.org/doc); consult it for most questions.
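Once hello world works, you can try a small procedural script to see what HPL/SQL adds over plain HiveQL. This is only an illustrative sketch (the file name is arbitrary); see the documentation above for the full language reference:
-- hello_loop.sql
FOR i IN 1..3 LOOP
  PRINT 'iteration ' || i;
END LOOP;
Run it with:
/usr/hdp/current/hive-server2-hive2/bin/hplsql -f hello_loop.sql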
07-01-2017
03:53 PM
2 Kudos
This failure to start HiveServer Interactive / Hive LLAP is due to a known problem in Ambari 2.5 where certain keytab files are not generated if you enable LLAP after your cluster is Kerberized. The Ambari Kerberos wizard generates keytabs for all services that are present at the time the wizard is run. If HiveServer Interactive is not enabled when the wizard runs, certain essential keytabs will not be present when you later try to enable HiveServer Interactive / LLAP, nor are they generated at that time. There are two options for resolving this problem:
1. Regenerate keytabs using the Ambari Kerberos wizard; refer to the Ambari documentation for this process.
2. On all cluster nodes, copy hive.service.keytab to hive.llap.zk.sm.keytab. If your keytabs are stored in the default location:
cp /etc/security/keytabs/hive.service.keytab /etc/security/keytabs/hive.llap.zk.sm.keytab
07-01-2017
03:44 PM
The maximum number of mappers is bounded by the number of splits calculated at split generation time. These settings impact split calculation:
mapreduce.input.fileinputformat.split.minsize
mapreduce.input.fileinputformat.split.maxsize
Splits are then grouped at the Tez layer based on these settings:
tez.grouping.min-size
tez.grouping.max-size
If you want more mappers you can tune all of these settings down. Note that this will not guarantee lower latency, especially on small clusters.
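For example, to push the split and grouping sizes down you could set something like the following in your Hive session before running the query; the 64 MB / 16 MB values are purely illustrative, not recommendations:
set mapreduce.input.fileinputformat.split.maxsize=67108864;
set tez.grouping.min-size=16777216;
set tez.grouping.max-size=67108864;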
07-01-2017
03:39 PM
Frédéric, can you create a new Apache JIRA ticket with your DDL and view definition? HIVE-15970 looks unrelated, given that it applies to multi-insert, which is only used for MERGE.
07-01-2017
03:17 PM
The right way to think about LATERAL VIEW is that it allows a table-generating function (UDTF) to be treated as a table source, so that it can be used like any other table, including selects, joins and more.
LATERAL VIEW is often used with explode, but explode is just one UDTF of many; a full list is available in the documentation.
To take an example:
select tf1.*, tf2.*
from (select 0) t
lateral view explode(map('A',10,'B',20,'C',30)) tf1
lateral view explode(map('A',10,'B',20,'C',30)) tf2;
This results in:
| tf1.key | tf1.value | tf2.key | tf2.value |
|---|---|---|---|
| A | 10 | A | 10 |
| A | 10 | B | 20 |
| A | 10 | C | 30 |
| B | 20 | A | 10 |
(5 rows were truncated)
The thing to see here is that this query is a cross product join between the tables tf1 and tf2. The LATERAL VIEW syntax allowed me to treat them as tables. The original question used "AS" syntax, which automatically maps the generated table's columns to column aliases. In my view it is much more powerful to leave them as tables and use their fully qualified table correlation identifiers.
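For comparison, here is a sketch of the AS form, which maps the generated columns to explicit aliases (k and v are names I chose for illustration):
select tf1.k, tf1.v
from (select 0) t
lateral view explode(map('A',10,'B',20,'C',30)) tf1 as k, v;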
These tables can be used in joins as well:
select tf1.*, tf2.*
from (select 0) t
lateral view explode(map('A',10,'B',20,'C',30)) tf1
lateral view explode(map('A',10,'B',20,'C',30)) tf2 where tf1.key = tf2.key;
Now we get:
| tf1.key | tf1.value | tf2.key | tf2.value |
|---|---|---|---|
| A | 10 | A | 10 |
| B | 20 | B | 20 |
| C | 30 | C | 30 |
06-30-2017
01:29 PM
For those looking for an easy graphical tool, Hive View 2.0 (included with Ambari 2.5 and up) can display table- and column-level stats, and compute them if they are missing. For more info see https://hortonworks.com/blog/3-great-reasons-to-try-hive-view-2-0/ Note that column stats are listed under table stats, and you can see each individual column's statistics there.
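If you would rather compute the same statistics from SQL, a minimal sketch (the table and column names are made up):
analyze table web_events compute statistics;
analyze table web_events compute statistics for columns customer_id, event_time;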
05-15-2017
02:36 PM
If you're looking for a standalone tool to convert CSV to ORC, have a look at https://github.com/cartershanklin/csv-to-orc It's a standalone Java tool that can run anywhere, including off of your Hadoop cluster. It supports custom null strings, row skipping, and basic Hive types (no complex types currently).
03-16-2017
02:27 PM
1 Kudo
Add these repositories to your pom.xml or to your Maven settings to resolve HDP dependencies:
<repository>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
<id>hortonworks.extrepo</id>
<name>Hortonworks HDP</name>
<url>http://repo.hortonworks.com/content/repositories/releases</url>
</repository>
<repository>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
<id>hortonworks.other</id>
<name>Hortonworks Other Dependencies</name>
<url>http://repo.hortonworks.com/content/groups/public</url>
</repository>
The first of these is for the main HDP artifacts. The second is needed for dependencies like jetty-sslengine.jar:6.1.26.hwx.
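You normally pick up jetty-sslengine transitively from the HDP Hadoop artifacts, but as an illustration a direct declaration would look roughly like this; the groupId is my assumption based on the standard Jetty 6 coordinates, so verify it against the repository:
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-sslengine</artifactId>
  <version>6.1.26.hwx</version>
</dependency>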
12-22-2016
10:41 PM
1 Kudo
With HDP 2.5 this is supported. This tutorial shows connecting Tableau to Phoenix end-to-end: http://hortonworks.com/hadoop-tutorial/bi-apache-phoenix-odbc/
12-22-2016
10:17 PM
2 Kudos
This error usually indicates you have defined aux jars in hive-site.xml. For now (HDP 2.5 and below), aux jars need to be set in the client rather than as a server-side property when using Hive on Tez. There is an improvement request tracking this.
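For example, one client-side option is to add the jar in your Beeline or Hive CLI session before running the query that needs it (the jar path here is made up):
add jar /tmp/my-custom-udfs.jar;
list jars;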
09-13-2016
08:50 AM
4 Kudos
If you're not using Tez UI to debug your Hive queries, you need to start today. Ambari installs Tez UI by default, but some people need a standalone option. Tez UI is easy to install and can go anywhere, meaning there's no reason for you to wonder why that Hive query won't run faster.

Why You Should Use Tez UI:
Whether in Ambari or standalone, here are some of the things you're missing if you don't use Tez UI:
1. An inventory of all queries executed, with search and filter.
2. The ability to drill into a query and see the actual query text. Great if a BI tool is generating long-running queries.
3. The vertex swimlane. This one is new in HDP 2.5 and helps you quickly pinpoint the long-running parts of your query and optimize them.

How To Install Tez UI Standalone:
The Tez UI installation instructions give the details on installing Tez UI, but it can be a bit hard to put the pieces together. In this how-to we will cover installing Tez UI into Apache httpd on an HDP 2.5 installation, assuming a CentOS base OS. The installation instructions don't change much for different web servers or for app servers like Tomcat.
1. Install Tez and httpd: yum install httpd tez
2. Create a directory for the UI: mkdir /var/www/html/tez-ui
3. Change into this new directory: cd /var/www/html/tez-ui
4. Extract the Tez UI WAR into it: unzip /usr/hdp/current/tez-client/ui/tez-ui-0.7.0.2.5.0.0-1245.war
5. Open /var/www/html/tez-ui/scripts/configs.js in a text editor and modify these values (in the build shown in the example below, the file is config/configs.env and the keys are timeline and rm). Set timelineBaseUrl to your Application Timeline Server (ATS) endpoint, usually something like http://ats.example.com:8188. Set RMWebUrl to your Resource Manager (RM) endpoint, usually something like http://rm.example.com:8088.

Configuring Tez:
You need to tell Tez to log its history to ATS using the logging service class. You should also tell Tez where the UI is located, so it can cross-link to the Tez UI from within the Resource Manager UI. All together, these are needed in your tez-site.xml:
<property>
<description>Enable Tez to use the Timeline Server for History Logging</description>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<description>URL for where the Tez UI is hosted</description>
<name>tez.tez-ui.history-url.base</name>
<value>http://httpd.example.com:80/tez-ui/</value>
</property>
That's it. Now you can load Tez UI in your browser.

Example:
Here's an example configs.env. Notice that timeline and rm have both been modified from the original:
[vagrant@hdp250 scripts]$ pwd
/var/www/html/tez-ui/config
[vagrant@hdp250 scripts]$ cat configs.env
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
ENV = {
hosts: {
/*
* Timeline Server Address:
* By default TEZ UI looks for timeline server at http://localhost:8188, uncomment and change
* the following value for pointing to a different address.
*/
timeline: "http://hdp250.example.com:8188",
/*
* Resource Manager Address:
* By default RM REST APIs are expected to be at http://localhost:8088, uncomment and change
* the following value to point to a different address.
*/
rm: "http://hdp250.example.com:8088",
/*
* Resource Manager Web Proxy Address:
* Optional - By default, value configured as RM host will be taken as proxy address
* Use this configuration when RM web proxy is configured at a different address than RM.
*/
//rmProxy: "http://localhost:8088",
},
/*
* Time Zone in which dates are displayed in the UI:
* If not set, local time zone will be used.
* Refer http://momentjs.com/timezone/docs/ for valid entries.
*/
//timeZone: "UTC",
/*
* yarnProtocol:
* If specified, this protocol would be used to construct node manager log links.
* Possible values: http, https
* Default value: If not specified, protocol of hosts.rm will be used
*/
//yarnProtocol: "<value>",
};

CORS:
On a multi-node cluster, chances are you will need to enable CORS in ATS to properly view data. If you connect and see an error about CORS not being enabled, see the Apache documentation for the relevant configurations you will need to set; most importantly, you will need to enable yarn.timeline-service.http-cross-origin.enabled (see the sketch at the end of this post).

Note On Secure Clusters:
On secure clusters your client web browser will need to be configured to talk to the Kerberized cluster. Depending on your browser you may need to kinit on your client or use a plugin. A good way to confirm connectivity is to first see if your browser can connect to the Resource Manager UI. If it can, you should also be able to connect to the standalone Tez UI.
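For reference, enabling that property amounts to a yarn-site.xml change along these lines; treat this as a minimal sketch and check the Apache Timeline Server documentation for the other CORS-related properties:
<property>
  <name>yarn.timeline-service.http-cross-origin.enabled</name>
  <value>true</value>
</property>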
09-10-2016
02:49 AM
Here is a sample Maven project that handles all the dependencies and includes instructions on using it and adding your own UDFs: https://github.com/cartershanklin/sample-hiveudf
10-29-2015
01:39 AM
1 Kudo
I've never tried this approach, so think of it as a science experiment. Set up a node label, label the 20 existing hosts, create a queue that defaults to that node label, and submit Sqoop jobs to that queue alone. Your Sqoop jobs will then only run on the existing 20 nodes. You could also go narrower and have only one host do the imports. Be careful, because HDFS usage on that node will become much higher if you don't balance.
10-27-2015
04:52 PM
@amcbarnett@hortonworks.com Can you confirm you really needed the -D settings after you imported your cert into the truststore? The arguments you added are the defaults.