Member since: 09-24-2015
Posts: 27
Kudos Received: 69
Solutions: 3
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 3073 | 12-04-2015 03:40 PM
 | 18648 | 10-19-2015 01:56 PM
 | 2613 | 09-29-2015 11:38 AM
03-31-2016
11:20 PM
5 Kudos
Steps to connect from a client machine (a Mac in this case) to a Hadoop cluster using Hive JDBC.
Here are a couple of links that I used to build this out:
Hive drivers and jar files: https://streever.atlassian.net/wiki/pages/viewpage.action?pageId=4390924
HiveServer2 JDBC client setup: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
1. Here are the jar files you need to connect to HiveServer2 (HS2). For HDP 2.3 you only need two jar files for the JDBC client:
# From /usr/hdp/current/hive-client
hive-jdbc.jar (should be a symlink to the hive-jdbc-xxx-standalone.jar)
# From /usr/hdp/current/hadoop-client
hadoop-common.jar (hadoop-common-....jar)
2. Make sure JAVA_HOME is set on your machine. This is the value I have on mine:
echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home
3. Move these jar files to the Java library directory on your machine:
/Library/Java/Extensions
4. Set the Java classpath for the Hive jar file:
export CLASSPATH=$CLASSPATH:/Library/Java/Extensions/hive-jdbc.jar
5. Use any Java-based IDE (I used Eclipse) to write a simple Java class that connects to HiveServer2. In the JDBC string, specify the Hive server you are using along with the corresponding user id and password.
Here is the code:
package hive_test;
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveJdbcClientv1 {

    // HiveServer2 JDBC driver class
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    /**
     * @param args
     * @throws SQLException
     */
    public static void main(String[] args) throws SQLException {
        // Load the Hive JDBC driver
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        // Replace "hive" here with the name of the user the queries should run as
        Connection con = DriverManager.getConnection("jdbc:hive2://172.16.149.158:10000/default", "hive", "");
        Statement stmt = con.createStatement();
        String tableName = "testHiveDriverTable";
        // Recreate the test table
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create table " + tableName + " (key int, value string)");
        // Show tables
        // String sql = "show tables '" + tableName + "'";
        String sql = "show tables";
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
            System.out.println(res.getString(1));
        }
    }
}
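If you prefer not to rely on the JVM to clean up the connection, here is a minimal variant of the same client using try-with-resources so the connection, statement, and result set are closed automatically. The host, port, and user are placeholders; adjust them for your environment.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcClientv2 {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 driver (older hive-jdbc jars may not auto-register)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // try-with-resources closes the connection, statement, and result set automatically
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://<hive-server-host>:10000/default", "hive", "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("show tables")) {
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}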
Tags: client, Data Processing, Hive, How-ToTutorial, jdbc
02-08-2016
04:20 PM
I am not having any activity on HBase. I wanted to make sure it is stable before I build a demo on top of it. Should I add a custom hbase-site property named zookeeper.session.timeout and set it to 60000?
02-08-2016
11:44 AM
I have an HDP 2.3.2 sandbox up and running on VMware Fusion. I am able to start both the HBase master and the region server, but they fail after a few hours. It seems like the region server loses its connection to ZooKeeper for some reason. Here is the error that I get:
2016-02-07 22:32:46,693 FATAL [main-EventThread] regionserver.HRegionServer: ABORTING region server sandbox.hortonworks.com,16020,1454868273283: regionserver:16020-0x150af1fbaec0164, quorum=sandbox.hortonworks.com:2181, baseZNode=/hbase-unsecure regionserver:16020-0x150af1fbaec0164 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:606)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Please let me know if I need to make any setting changes in the sandbox.
Labels: Apache HBase
01-14-2016
04:20 PM
5 Kudos
1 Overview
Traditionally, enterprises have been dealing with data flows or data movement within their own data centers. But as the world has become flatter and a global presence has become the norm, enterprises face the challenge of collecting and connecting data from their global footprint. This problem was daunting for the NSA a decade ago, and they came up with a solution using a product that was later named Apache NiFi. Apache NiFi is an easy-to-use, powerful, and reliable system to process and distribute data. Within NiFi, as you will see, I will be able to build a global data flow with minimal to no coding. You can learn the details about NiFi from the Apache NiFi website; it is one of the most well-documented Apache projects. The focus of this article is one specific feature within NiFi that I believe no other software product does as well as NiFi: "site-to-site" protocol data transfer.
2 Business use case
One of the classic business problems is to push data from a location that has a small IT footprint to the main data center, where all the data is collected and connected. This small IT footprint could be an oil rig in the middle of the ocean, a small bank branch in a remote mountain town, a sensor on a vehicle, and so on. So, your business wants a mechanism to push the data generated at various locations to, say, headquarters in a reliable fashion, with all the bells and whistles of an enterprise data flow: lineage, provenance, security, audit, ease of operations, etc. The data generated at my sources comes in various formats such as txt, csv, json, xml, audio, and image, and the sizes range from a few MBs to GBs. I want to break these files into smaller chunks, since I have low bandwidth at my source data centers, stitch them back together at the destination, and load the result into my centralized Hadoop data lake.
3 Solution architecture
Apache NiFi (aka Hortonworks DataFlow) is a perfect tool to solve this problem. The overall architecture looks something like Fig 1. We have Australian and Russian data centers from which we want to move data to US headquarters. We will have what we call an edge instance of NiFi sitting in the Australian and Russian data centers, acting as a data acquisition point. We will then have a NiFi processing cluster in the US where we receive and process all the data coming from the global locations. We will build this end-to-end flow without any coding, using just the drag-and-drop GUI.
4 Build the data flow
Here are the high-level steps to build the overall data flow:
Step 1) Set up a NiFi instance at the Australian data center that will act as the data acquisition instance. I will create a local instance of NiFi that will act as my Australian data center.
Step 2) Set up a NiFi instance on a CentOS-based virtual machine that will act as my NiFi data processing instance. This could be a NiFi cluster as well, but in my case it will be just a single instance.
Step 3) Build the NiFi data flow for the processing instance. This will have an input port, which indicates that this instance can accept data from other NiFi instances.
Step 4) Build the NiFi data flow for the data acquisition instance. This will have a "remote process group" that talks to the NiFi data processing instance via the site-to-site protocol.
Step 5) Test out the overall flow.
Attached is a document that provides detailed step-by-step instructions on how to set this up: data-flow-across-data-centers-v5.zip
Tags: Data Ingestion & Streaming, FAQ, hdf, how-to-tutorial, NiFi
12-10-2015
10:44 PM
I have a Solr Banana dashboard that shows some panels with charts and tables. Is there a way to export a dashboard with its data so that a user can play with it offline, without being connected to the Solr server?
Labels: Apache Solr
12-04-2015
03:40 PM
2 Kudos
Here are some solution options I received from Ryan Merriman, Benjamin Leonhardi & Peter Coates.
Option 1: Use split -l to break the big file into smaller ones before running iconv.
Option 2: If iconv fails, it would be a good idea to write a little program using ICU: http://userguide.icu-project.org/conversion/converters
Option 3: You can do it in Java. Here is one example: https://docs.oracle.com/javase/tutorial/i18n/text/stream.html You can use the File(Input|Output)Stream and String classes. You can specify the character encoding when reading (converting byte[] to String): String s = new String(byte[] bytes, Charset charset); and when writing it back out (String to byte[]): s.getBytes(Charset charset). This approach should solve your size limit problem.
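Building on Option 3, a streaming approach avoids holding a multi-GB file in memory as a single String. Here is a minimal sketch (file paths are placeholders) that re-encodes UTF-16LE to UTF-8 one buffer at a time, which sidesteps both the iconv size limit and Java heap pressure.

import java.io.*;
import java.nio.charset.StandardCharsets;

public class Utf16ToUtf8 {
    public static void main(String[] args) throws IOException {
        // Stream through a reader/writer pair so only a small buffer is in memory at once.
        // The source was reported as little-endian UTF-16; if the file carries a BOM,
        // StandardCharsets.UTF_16 can be used instead to auto-detect byte order.
        try (Reader in = new InputStreamReader(
                 new FileInputStream("input_utf16.csv"), StandardCharsets.UTF_16LE);
             Writer out = new OutputStreamWriter(
                 new FileOutputStream("output_utf8.csv"), StandardCharsets.UTF_8)) {
            char[] buffer = new char[64 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}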
12-04-2015
12:58 PM
1 Kudo
One of my clients is trying to create an external Hive table in HDP from CSV files (about 30 files, 2.5 TB in total), but the files are formatted as "Little-endian, UTF-16 Unicode text, with CRLF, CR line terminators". Here are a couple of questions: Is there an easy way to convert CSV/TXT files from Unicode (UTF-16 / UCS-2) to ASCII (UTF-8)? Is there a way for Hive to recognize this format? He tried to use iconv to convert the UTF-16 format to ASCII, but it fails when the source file is larger than about 15 GB: iconv -c -f utf-16 -t us-ascii Any suggestions?
Labels: Apache Hive
11-21-2015
03:53 PM
As of HDF 1.0, we can write custom processors for HDF using Java. Is there a plan to support other programming languages?
Labels: Apache NiFi, Cloudera DataFlow (CDF)
11-20-2015
01:09 AM
Thanks @bbende!! So we do not recommend scaling NiFi vertically by increasing the JVM heap size to a really large value?
11-19-2015
10:55 PM
Can we have a file-watcher kind of mechanism in NiFi, where the data flow gets triggered whenever a file shows up at the source? Is it the same as scheduling a GetFile processor to run always?
Labels: Apache NiFi, Cloudera DataFlow (CDF)
11-19-2015
10:46 PM
When we run HDF on a single machine, do all the data flows built on that machine run under a single JVM? I did see NiFi documentation that talks about how you can control spilling data from the JVM to hard disk, but is there an option to run multiple JVMs, say one for each flow? Also, how big of a JVM heap do you usually configure for an edge node?
Labels: Apache NiFi, Cloudera DataFlow (CDF)
11-19-2015
01:29 PM
9 Kudos
1 NiFi Custom Processor Overview
Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. It is a very simple product to use, with which you can build a "data flow" very easily. As of version 0.3.0 it has 90 prebuilt processors, but you can also extend them by adding your own custom processors. In this article I am going to talk about how you can build a custom NiFi processor on your local machine and then move the finished processor, which is a NAR file, to NiFi. This article is based on a video from YouTube: https://www.youtube.com/watch?v=3ldmNFlelhw
2 Steps to build a custom processor
Here are the steps involved to build the custom processor for NiFi. I used my Mac to build this processor.
2.1 Required software
The two pieces of software you need on your local machine are:
1. Maven
2. Java
Here is how you can quickly check if you have them installed:
mvn -version
java -version
Here are the results from my machine:
$ mvn -version
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-14T11:29:23-06:00)
Maven home: /usr/local/Cellar/maven/3.2.5/libexec
Java version: 1.8.0_65, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.10.4", arch: "x86_64", family: "mac"
$ java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
2.2 Create a directory where you want to build the processor
I created the directory under the following location:
cd <Home Dir>/Documents/nifi/ChakraProcessor
mkdir ChakraProcessor
2.3 Create the NiFi processor with default values
Go to the new directory you just created and use the Maven archetype command to generate the required Java files:
cd <Home Dir>/Documents/nifi/ChakraProcessor
mvn archetype:generate
You will be asked for a bunch of parameters; here are the values I chose:
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): 690: nifi
Choose archetype:
1: remote -> org.apache.nifi:nifi-processor-bundle-archetype (-)
2: remote -> org.apache.nifi:nifi-service-bundle-archetype (-)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : 1
Choose org.apache.nifi:nifi-processor-bundle-archetype version:
1: 0.0.2-incubating
2: 0.1.0-incubating
3: 0.2.0-incubating
4: 0.2.1
5: 0.3.0
Choose a number: 5: 4
Downloading: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.jar
Downloaded: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.jar (12 KB at 8.0 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.pom
Downloaded: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.pom (2 KB at 9.4 KB/sec)
Define value for property 'groupId': : hwx
Define value for property 'artifactId': : HWX
Define value for property 'version': 1.0-SNAPSHOT: : 1.0
Define value for property 'artifactBaseName': : demo
Define value for property 'package': hwx.processors.demo: :
[INFO] Using property: nifiVersion = 0.1.0-incubating-SNAPSHOT
Confirm properties configuration:
groupId: hwx
artifactId: HWX
version: 1.0
artifactBaseName: demo
package: hwx.processors.demo
nifiVersion: 0.1.0-incubating-SNAPSHOT
Y: : Y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: nifi-processor-bundle-archetype:0.2.1
[INFO] ----------------------------------------------------------------------------
2.4 Modify the processor
The above command generates a MyProcessor.java file, which is where you put the code for your custom processor. Open MyProcessor.java under the following location:
<Home Dir>/Documents/nifi/ChakraProcessor/HWX/nifi-demo-processors/src/main/java/hwx/processors
Add the following lines at the end, after the "// TODO implement" section (see the sketch below for where they land):
// TODO implement
System.out.println("This is a custom processor that will receive flow file");
session.transfer(flowFile, MY_RELATIONSHIP);
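For context, here is a minimal sketch of what the onTrigger method in the generated MyProcessor class looks like with those two lines added. The surrounding names (MY_RELATIONSHIP, the null-check on the flow file) come from the NiFi processor archetype; treat this as an illustration rather than the exact generated file, which may differ slightly between archetype versions.

// These imports are already present in the generated MyProcessor.java
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

// Inside the generated MyProcessor class (which extends AbstractProcessor):
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    // TODO implement
    // Lines added for this demo: log a message and route the flow file to our relationship
    System.out.println("This is a custom processor that will receive flow file");
    session.transfer(flowFile, MY_RELATIONSHIP);
}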
2.5 Change the POM
There is one change you need to make to the POM file before you can create the package: remove "-SNAPSHOT" from the version in the pom.xml under the following location:
<Home Dir>/Documents/nifi/ChakraProcessor/HWX
2.6 Create the NAR file for your processor
cd <Home Dir>/Documents/nifi/ChakraProcessor/HWX
mvn install
Once the Maven install is done, you will have the NAR file in the target directory with the name nifi-demo-nar-1.0.nar:
cd <Home Dir>/Documents/nifi/ChakraProcessor/HWX/nifi-demo-nar/target
$ ls
classes maven-archiver maven-shared-archive-resources nifi-demo-nar-1.0.nar
2.7 Copy the NAR file to NiFi
Copy the NAR file to the bin directory of the NiFi installation. Here is a sample command to copy the file:
scp nifi-demo-nar-1.0.nar root@172.16.149.157:/opt/nifi-1.0.0.0-7/bin/
Once the NAR file is copied, you need to restart NiFi. Once restarted, you should be able to add the custom processor you built, which will show up with the name "MyProcessor".
2.8 Build a NiFi data flow
You can build a new data flow using this custom processor:
GenerateFlowFile --> MyProcessor --> LogAttribute
For "MyProcessor", you can enter some random value under the property section to make it valid.
Tags: Data Ingestion & Streaming, hdf, how-to-tutorial, NiFi, nifi-processor, tutorial
10-27-2015
12:11 PM
4 Kudos
In terms of an Azure HDInsight environment, here are a few things to be aware of on the infrastructure side:
- You have the option to install HDInsight on Windows or HDInsight on Linux (Ubuntu 12 LTS only). Apache Ambari only comes with the Linux-based install.
- The machine types available for the Linux-based install were limited to D3, D4 & D12. Not sure if this is because of my Azure account limitations.
- The HDInsight version is 3.2.1, which comes with certain HDP 2.2 components. Separate clusters are required for Hadoop, HBase, and Storm, and Spark is available as a technical preview.
- It uses Blob storage as the default for HDFS. Not sure if there is an option to add VHD or SSD.
- HDInsight 3.2 does not contain Falcon, Flume, Accumulo, Ambari Metrics, Atlas, Kafka, Knox, Ranger, Ranger KMS & Slider. It also has somewhat older versions of the Hadoop components.
Attached is a file that compares HDInsight 3.2.1 components to HDP 2.3.2: hdinsight-and-hdp-component-comparison.zip
Update by @Ancil McBarnett HDInsight Component Versioning: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/
10-22-2015
07:01 PM
Right Ancil, these are jar files specifically for Oracle SQL Developer connectivity. I thought this article would be useful for folks who have SQL Developer as a standard SQL client tool in their company and have no other workaround 🙂
10-22-2015
03:02 PM
7 Kudos
Oracle SQL Developer is one of the most common SQL client tools used by developers, data analysts, data architects, etc. for interacting with Oracle and other relational systems. So extending SQL Developer to connect to Hive is very useful for Oracle users. I found the original article on Oracle's website and made some additions based on the issues that I ran into. Here is the original link: Oracle SQL Developer support for Hive
Here are the steps that I followed.
Step 1) For the latest version of SQL Developer you need JDK 1.8, so install that on your Mac and also change the JAVA_HOME path so that it points to JDK 1.8. Download JDK 1.8
Step 2) Download the latest version of Oracle SQL Developer for Mac from Oracle and unzip it: Oracle SQL Developer Download. Move the SQL Developer file to your Applications folder so that it is available to you. When you try to open Oracle SQL Developer on the Mac, it may not open; for me it showed up in the tray, blinked for a while, and then was gone, so I had to follow these instructions to fix it: Fix for Mac SQL Developer setup. Once you have this fix in place, you should be able to open Oracle SQL Developer.
Step 3) Download a JDBC driver for Hive that can work with Oracle SQL Developer. Cloudera has one available, and here is the link for it: Link for Hive JDBC Driver for Oracle SQL Developer
Step 4) Unzip the file downloaded in step 3. Inside you will find another zip file called "Cloudera_HiveJDBC4_2.5.15.1040.zip". Unzip that file as well and move all the jars to <your home directory>/.sqldeveloper/ :
ql.jar
hive_service.jar
hive_metastore.jar
TCLIServiceClient.jar
zookeeper-3.4.6.jar
slf4j-log4j12-1.5.11.jar
slf4j-api-1.5.11.jar
log4j-1.2.14.jar
libthrift-0.9.0.jar
libfb303-0.9.0.jar
HiveJDBC4.jar
Step 5) Add these jars to SQL Developer. Go to "Oracle SQL Developer" --> Preferences, select Database and then "Third Party JDBC Drivers", and use the add entry option to add the jar files listed above. Restart SQL Developer for the change to take effect.
Step 6) Open SQL Developer and right-click on Connections to add a connection. Select the Hive tab, enter your Hive server details, and add the connection. You are all set to browse Hive tables via SQL Developer.
Tags: Data Processing, Hive, oracle sql developer
10-20-2015
11:49 AM
One of my clients is using Azure-based IaaS for their HDP cluster. They are open to using more expensive storage to get better performance. Is it recommended to use SSDs for some of the data in Hive tables to get that boost in performance? Also, what are the steps to point the temporary storage used by Tez/MR jobs to SSD?
Labels: Apache Hive
10-19-2015
01:56 PM
2 Kudos
Thanks guys for the response. I was able to modify the configuration for MS SQL Server:
Database Connection URL --> jdbc:sqlserver://a5d3iwbrq1.database.windows.net:1433;databaseName=chakra
Database Driver Class Name --> com.microsoft.sqlserver.jdbc.SQLServerDriver
Database Driver Jar Url --> file:///usr/share/java/sqljdbc4.jar
Database User --> chakra
Password --> ******
Once you have the configuration set, you also need to use GenerateFlowFile or something similar to trigger ExecuteSQL, as the Timer driven schedule does not work on the version of NiFi that I was using. Once this was done, I ran into a bug where ExecuteSQL is not able to read the source table structure and gives an Avro schema error: https://issues.apache.org/jira/browse/NIFI-1010 I am assuming that once the above bug is fixed we should be able to use ExecuteSQL for an MS SQL Server DB.
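If the controller service still will not enable, it can help to rule out the driver jar and connection string outside NiFi first. Here is a minimal, hypothetical JDBC check using the same sqljdbc4 driver class and URL format as the configuration above; the host, database name, and credentials are placeholders you would replace with your own, and the sqljdbc4.jar must be on the classpath when you run it.

import java.sql.Connection;
import java.sql.DriverManager;

public class SqlServerConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Same driver class the DBCPConnectionPool expects
        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
        String url = "jdbc:sqlserver://<your-server>.database.windows.net:1433;databaseName=<your-db>";
        try (Connection con = DriverManager.getConnection(url, "<user>", "<password>")) {
            System.out.println("Connected: " + !con.isClosed());
        }
    }
}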
10-17-2015
03:50 PM
2 Kudos
I am trying to build a DBCPConnectionPool that can connect to MS SQL Server. I downloaded the jar file and gave its path in the DBCPConnectionPool. Here is my configuration:
Database Connection URL --> jdbc:mysql://a5d3iwbrq1.cloudapp.net:3306/chakra
Database Driver Class Name --> /root/sqljdbc_4.0/enu/sqljdbc4.jar
Database Driver Jar Url --> No value set
Database User --> chakra
Password --> ******
However, I get an error when I enable this:
2015-10-17 14:55:54,352 ERROR [pool-28-thread-5] o.a.n.c.s.StandardControllerServiceNode [DBCPConnectionPool[id=ee00cbf3-7dd3-4c32-93a6-9a06a8e5e6a7]] Failed to invoke @OnEnabled method due to {}
org.apache.nifi.reporting.InitializationException: org.apache.commons.dbcp.SQLNestedException: Cannot load JDBC driver class '/root/sqljdbc_4.0/enu/sqljdbc4.jar
Please let me know how we can resolve this.
Labels: Apache NiFi, Cloudera DataFlow (CDF)
10-15-2015
12:33 PM
In terms of security around the SmartSense file transfer, it is mentioned that we can use regex to replace some of the pieces within the bundle. Is there a configuration file for SmartSense where this option is controlled and that clients can change? Also, can you please let me know where I can find the instructions on how a client can upload the SmartSense bundle to Hortonworks support?
Labels: Hortonworks SmartSense
10-08-2015
07:06 PM
23 Kudos
1 Overview
In this article I am going to talk about three main integration pieces between SAP and Hadoop. First, I will cover the SAP integration products and how they can be used to ingest data into Hadoop systems. Second, I am going to talk about SAP HANA and how it integrates with Hadoop; I will provide an overview of the various methods that can be used to push/pull data from SAP HANA, as well as aspects around data federation and data offload architecture. Third, I am going to touch on how the SAP BI tools integrate with Hadoop and talk briefly about SAP HANA Vora.
2 Hadoop using SAP Data Integration components
SAP has three primary products around data integration. Grouped by integration pattern, here is a brief overview of these products:
Batch processing
- SAP Data Services, aka BODS (Business Objects Data Services), is the batch ETL tool from SAP. SAP acquired this product as part of its Business Objects acquisition.
Real time ingestion
- SLT (SAP Landscape Transformation) Replication Server can be used for batch or real-time data integration and uses database triggers for replication.
- SAP Replication Server, which has much broader replication capabilities and truly is a CDC (Change Data Capture) product.
The rest of this section provides a bit more detail on how these products talk to Hadoop.
2.1 SAP Data Services
Data Services can be used to create Hive, Pig or MR (MapReduce) jobs. You can also preview files sitting in HDFS.
2.1.1 Hive
Data Services (DS) uses the Hive Adapter, which SAP provides to connect to Hive tables. This Hive Adapter can act as a data store or as a source/target for DS jobs. There are various operations you can perform using a Hive Adapter based data store. You can perform some of the "push down" functionality, where you push the entire DS job to the Hadoop cluster: DS converts the job into Hive SQL commands and runs it against the Hadoop cluster as a MapReduce job. You can also use the data store to create a SQL transform and the sql() function within DS.
2.1.2 HDFS & Pig
In DS, you can create an HDFS-based file format, where basically you are saying you have a file as a source and that file is sitting in HDFS. Now, when you apply certain transformations to that HDFS file format, such as aggregation or filtering, DS can convert the job into a Pig job. Whether a DS job is converted into a Pig job depends on the kind of transformations applied and whether DS knows how to convert those transformations into a Pig job.
2.1.3 MapReduce
The Hive, HDFS & Pig integrations all eventually get converted into MapReduce jobs that get executed in the cluster. In addition, if you use the "Text data processing" transformation within Data Services, it may also get converted into MapReduce jobs, depending on whether the source and target for the job are in Hadoop.
2.2 SAP SLT (SAP Landscape Transformation)
SAP LT Replication Server is one of the two real-time replication solution options from SAP. SAP LT can be used to load data on a batch or real-time basis from SAP as well as some non-SAP systems. This replication mechanism is trigger based, so it can replicate incremental data very effectively. SAP LT is primarily used for loading data from SAP systems into SAP HANA. However, it is bi-directional, which means you can use SAP HANA as a source and replicate to other systems as well. SAP LT does not support Hadoop integration straight out of the box, but there are a few workarounds you can use to load Hadoop via SLT:
Option 1: Build custom ABAP code within SAP LT Replication Server so that it can read the incremental data from the trigger and load it into HDFS via an API call.
Option 2: SAP LT can write its data into SAP Data Services, which does have a mechanism to load that data into Hadoop via the Hive Adapter or an HDFS file format.
Option 3: Most SAP LT jobs are designed to load data into SAP HANA. You can let SAP LT write data to SAP HANA and then push data from SAP HANA to Hadoop using the Sqoop JDBC connector.
2.3 SAP Replication Server
This is truly SAP's CDC product, which you can use to replicate data between SAP and/or non-SAP systems. It has many more features than SAP LT and it also has support for Hive on Linux. When performing Hive-based SAP replication, you need to be cautious about certain limitations:
- The list of columns between the Hive table and the table replication definition should match.
- You can only load into static partition columns, which means you should know the value of the partition column before you load.
- You have to perform some level of workaround to achieve insert/update kinds of operations on the table.
- You may have scalability challenges using this integration technique.
3 SAP HANA and Hadoop
SAP HANA combines database, data processing, and application platform capabilities in a single in-memory platform. The platform also provides libraries for predictive, planning, text processing, spatial, and business analytics. As your HANA system grows, for cost or scalability reasons you may want to offload some of the cold data to more cost-effective storage such as Hadoop. When you talk about offloading from SAP HANA to Hadoop, there are two primary ways to approach it:
Option 1) Move only the cold data from SAP HANA to Hadoop.
Option 2) Move all data to Hadoop and let your user/query engine decide to go against SAP for hot/warm data or go against Hadoop for all data.
3.1 Offload cold data to Hadoop
There are instances when you want to offload cold data (data that is not used very often for reporting) from SAP HANA to much more cost-effective storage like Hadoop. Here are some high-level integration options for offloading data to Hadoop:
- Use Hive ODBC to connect to HDP. You can install the Hive ODBC driver on the SAP HANA server and make Hive ODBC calls to the Hadoop system.
- Use Smart Data Access to provide query federation capabilities. HANA has features such as virtual tables (aka SDA) and virtual functions (aka vUDF) to access Hive, Spark and MR.
- You can also leverage the Sqoop JDBC connector to pull data from HANA into Hadoop (a minimal JDBC sketch follows at the end of this section). This integration pattern depends on you being able to read and make sense of the tables stored in HANA. However, you need to be aware that using Sqoop to extract data from HANA is not officially supported yet, and there are a few complications when you try to extract HANA tables with special characters in the table name. Here is a high-level architectural diagram that depicts this option.
SAP also has certain products that you can leverage to pull data from HANA as a file. Here are a couple of such products:
- SAP Open Hub: Using SAP Open Hub, you can get a data extract from SAP in the form of a file, which can then be inserted into Hadoop via NFS, WebHDFS, etc.
- SAP BEx: BEx is more of a query tool that can be used to pull data out of SAP cubes; those files can then be used to load Hadoop.
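As referenced above, the Sqoop/JDBC route ultimately just reads HANA tables over JDBC. Here is a minimal, hypothetical sketch of that underlying read in plain JDBC; the driver class (com.sap.db.jdbc.Driver from SAP's ngdbc.jar) and the jdbc:sap:// URL format are assumptions based on SAP's HANA JDBC driver, and the host, port, schema, table, and credentials are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HanaJdbcRead {
    public static void main(String[] args) throws Exception {
        // Assumed HANA JDBC driver class, shipped in SAP's ngdbc.jar (must be on the classpath)
        Class.forName("com.sap.db.jdbc.Driver");
        String url = "jdbc:sap://<hana-host>:30015/";
        try (Connection con = DriverManager.getConnection(url, "<user>", "<password>");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM <schema>.<table> LIMIT 10")) {
            while (rs.next()) {
                // Print the first column of each row as a quick sanity check
                System.out.println(rs.getString(1));
            }
        }
    }
}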
3.2 Move all data to Hadoop
Move all data to Hadoop and let your user/query engine decide to go against SAP for hot/warm data or go against Hadoop for all data. This option is more compelling in situations where you do not want the performance overhead that comes with query federation, and also where you do not want to build a complex process within HANA to identify the cold data that needs to be pushed to Hadoop. With this option you should be able to leverage many of your existing integration jobs and point them to Hadoop as a target besides the existing SAP HANA target. Here is a high-level architectural diagram that depicts this option.
4 Hadoop using SAP BI and Analytics tools
There are three SAP BI tools that are very commonly used within the enterprise. Here are those products and the way they integrate with Hadoop:
- SAP Lumira: uses the Hive JDBC connector to read data from Hadoop.
- SAP BO (Business Objects) & Crystal Reports: both of these products use the Hive ODBC connector to read data from Hadoop.
Besides these three products, there has been a recent announcement from SAP of a new product called SAP HANA Vora. Vora primarily uses Apache Spark at its core, but also has some additional features that SAP introduced to enhance the query execution technique as well as improve the SQL capabilities beyond what Spark SQL provides. Vora needs to be installed on all the nodes in the Hadoop cluster where you want to run Vora. It provides a local cache capability using native C code for map functions, along with Apache Spark's core capabilities. On the SQL front, it provides much richer features around OLAP (Online Analytical Processing), such as hierarchical drill-down capabilities. Vora also works with SAP HANA, but HANA is not a must-have. You can use Vora to move data between SAP HANA and Hadoop using some of SAP's proprietary integration techniques rather than the ODBC technique that Smart Data Access uses. You also have the capability to write federated queries that use both SAP HANA and the Hadoop cluster using Vora.
I hope you find this article useful! Please provide feedback to make this article more accurate and complete.
Tags: sap, sap data services, sap-hana, solutions
09-29-2015
11:38 AM
8 Kudos
The two answers above are great for Hive Metastore backup. For the Hive data itself, here are a few options:
Option 1) Hive data is stored in HDFS (Hadoop Distributed File System), so any backup or DR (Disaster Recovery) strategy you have for HDFS can be used for Hive as well. You can use the snapshot feature in HDFS to take a point-in-time image. A snapshot can cover the entire file system, a sub-tree of the file system, or just a file. You can also take incremental backups by doing a diff between two snapshots.
Option 2) You can write your own distcp code and make it part of a Falcon data pipeline. Using Distcp to copy files
Option 3) You can use the Falcon data mirroring capability to mirror the data in HDFS or Hive tables. Here is a link on that: Falcon Data Mirroring
Option 4) You can have an active-active data load to both your primary cluster and your DR cluster. For example, if you are using a Sqoop job to pull data from a particular RDBMS and load it into a Hive table, you can create two Sqoop jobs: one to load the primary cluster's Hive table and the other to load the DR cluster's Hive table.
Your choice of which option to pick depends on the SLAs (service level agreements) around DR/backup, budget, skill level, etc.
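For Option 1, snapshots can also be driven programmatically once the directory has been made snapshottable (hdfs dfsadmin -allowSnapshot). Here is a minimal sketch using the Hadoop FileSystem API; the warehouse path and snapshot name are placeholders, and it assumes the client picks up the cluster configuration (core-site.xml/hdfs-site.xml) from the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HiveWarehouseSnapshot {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and other settings from the Hadoop config on the classpath
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            Path warehouseDir = new Path("/apps/hive/warehouse");  // placeholder Hive warehouse path
            // The directory must already be snapshottable (hdfs dfsadmin -allowSnapshot <dir>)
            Path snapshot = fs.createSnapshot(warehouseDir, "hive-backup-" + System.currentTimeMillis());
            System.out.println("Created snapshot at: " + snapshot);
        }
    }
}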