Member since: 09-11-2018
Posts: 39
Kudos Received: 1
Solutions: 0
01-06-2019
01:05 PM
Friends, is there any update on these two questions? Sadly, after so many days there is still no reply. Regards
01-06-2019
05:07 AM
Friends, is there any update on these two questions? Sadly, after many days there is still no reply. Regards
12-21-2018
07:57 PM
Hello Friends, we have an upcoming project, and for it I am learning Spark Streaming (with a focus on PySpark). So far I have completed a few simple case studies from online material, but I am stuck on two scenarios, described below. I would be highly obliged if you could share your thoughts or point me to any web page that helps with a solution.

1. Writing a Streaming Aggregation to File
# spark-submit --master local[*] /home/training/santanu/ssfs_2.py
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType,StructField,StringType,IntegerType
spark = SparkSession.builder.appName("FileStream_Sink_Group").getOrCreate()
source = "file:///home/training/santanu/logs"
tgt = "file:///home/training/santanu/hdfs"
chk = "file:///home/training/santanu/checkpoint"
schema1 = StructType([StructField('agent',StringType()),StructField('code',IntegerType())])
df1 = spark.readStream.csv(source,schema=schema1,sep=",")
df2 = df1.filter("code > 300").select("agent").groupBy("agent").count()
df3 = df2.select("agent","count").withColumnRenamed("count","group_count")
query = df3.writeStream.format("csv").option("path",tgt).option("checkpointLocation",chk).start() # Error
query.awaitTermination()
spark.stop()

Error message (a watermark-based sketch is included at the end of this post):
# AnalysisException: Append output mode not supported when there are streaming aggregations on DataFrames without watermark;

2. Reading from Kafka (Consumer) using Streaming
# spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 /home/training/santanu/sskr.py
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Kafka_Consumer").getOrCreate()
df = spark.readStream.format("kafka").option("kafka.bootstrap.servers","localhost:9092").option("subscribe","MyTopic_1").load()
print(type(df)) # <class 'pyspark.sql.dataframe.DataFrame'>
df.printSchema() # printing schema hierarchy
query = df.selectExpr("CAST(value AS STRING)").writeStream.format("console").start() # Error
query.awaitTermination()
spark.stop()

Error message:
# NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)

Please help. Thanking you, Santanu Ghosh
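For the first error (the streaming aggregation without a watermark), here is a minimal, untested sketch of the watermark-based variant I am considering, not a confirmed solution. The event_time column is hypothetical (my current files only have agent and code), and I am assuming that a watermark plus an event-time window is what lets the file sink accept append mode.

# Untested sketch: windowed aggregation with a watermark so append mode is accepted by the file sink.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

spark = SparkSession.builder.appName("FileStream_Sink_Group_Watermark").getOrCreate()

schema = StructType([StructField("agent", StringType()),
                     StructField("code", IntegerType()),
                     StructField("event_time", TimestampType())])   # hypothetical event-time column

df = spark.readStream.csv("file:///home/training/santanu/logs", schema=schema, sep=",")

agg = (df.filter("code > 300")
         .withWatermark("event_time", "10 minutes")
         .groupBy(window("event_time", "5 minutes"), "agent")
         .count()
         .withColumnRenamed("count", "group_count")
         .selectExpr("window.start AS window_start", "window.end AS window_end", "agent", "group_count"))

query = (agg.writeStream.format("csv")
            .outputMode("append")                                   # the file sink only supports append
            .option("path", "file:///home/training/santanu/hdfs")
            .option("checkpointLocation", "file:///home/training/santanu/checkpoint")
            .start())
query.awaitTermination()
spark.stop()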
Labels:
- Apache Kafka
- Apache Spark
12-21-2018
07:28 AM
Hello Friends, we have an upcoming project, and for it I am learning Spark Streaming (with a focus on Structured Streaming). So far I have completed a few simple case studies from online material, but I am stuck on two scenarios, described below. I would be highly obliged if you could share your thoughts or point me to any web page that helps with a solution.

1. Writing a Streaming Aggregation to File
# spark-submit --master local[*] /home/training/santanu/ssfs_2.py
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType,StructField,StringType,IntegerType
spark = SparkSession.builder.appName("FileStream_Sink_Group").getOrCreate()
source = "file:///home/training/santanu/logs"
tgt = "file:///home/training/santanu/hdfs"
chk = "file:///home/training/santanu/checkpoint"
schema1 = StructType([StructField('agent',StringType()),StructField('code',IntegerType())])
df1 = spark.readStream.csv(source,schema=schema1,sep=",")
df2 = df1.filter("code > 300").select("agent").groupBy("agent").count()
df3 = df2.select("agent","count").withColumnRenamed("count","group_count")
query = df3.writeStream.format("csv").option("path",tgt).option("checkpointLocation",chk).start() # Error
query.awaitTermination()
spark.stop()

Error I am getting:
# Append output mode not supported when there are streaming aggregations on DataFrames without watermark;

2. Reading from Kafka (Consumer) using Streaming
# spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 /home/training/santanu/sskr.py
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Kafka_Consumer").getOrCreate()
df = spark.readStream.format("kafka").option("kafka.bootstrap.servers","localhost:9092").option("subscribe","MyTopic_1").load()
print(type(df)) # <class 'pyspark.sql.dataframe.DataFrame'>
df.printSchema() # printing schema hierarchy
query = df.selectExpr("CAST(value AS STRING)").writeStream.format("console").start() # Error
query.awaitTermination()
spark.stop()

Error I am getting:
# NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)

Please help. Thanking you, Santanu Ghosh
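Regarding the NoSuchMethodError, my assumption (not a confirmed diagnosis) is that an older kafka-clients jar on the classpath is shadowing the 0.10.x client that spark-sql-kafka-0-10 expects, so the fix is more about dependencies than about the code. Below is an untested sketch of the consumer flow I am aiming for once that is sorted out; the explicit kafka-clients pin in the comment is only something I intend to try.

# Untested sketch, assuming a single 0.10.x-or-newer kafka-clients on the classpath,
# e.g. by pinning it explicitly:
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.kafka:kafka-clients:0.10.0.1 sskr.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("Kafka_Consumer_Sketch").getOrCreate()

df = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "MyTopic_1")
          .option("startingOffsets", "earliest")    # optional: read the topic from the beginning
          .load())

# The Kafka source always exposes the same columns:
# key, value (binary), topic, partition, offset, timestamp, timestampType.
messages = df.select(col("value").cast("string").alias("value"))

query = messages.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
spark.stop()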
Labels:
- Apache Spark
10-04-2018
04:48 PM
Hello Friends, I have just read the news that Cloudera and Hortonworks have merged into one company. I was both shocked and surprised. Now I am eager to know: what will be the future of Hadoop, which is still open-source, free software under Apache? Also, will the Hortonworks certifications such as HDPCD and others still hold any value at the industry level? Please do let me know your thoughts and ideas. Thanking you, Santanu
Labels:
- Apache Hadoop
09-30-2018
10:01 AM
@Rahul Soni, I am creating one Avro file using a Flume regex interceptor and multiplexing, but that file contains content like the sample below, and when I try to generate a schema with the avro-tools getschema option it gives only "headers" and "body" as the two fields. Please advise how to resolve this.

Objavro.codenullavro.schema▒{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}▒LA▒▒;ڍ▒(▒▒▒=▒YBigDatJava▒Y{"created_at":"Thu Sep 27 11:40:44 +0000 2018","id":1045277052822269952,"id_str":"1045277052822269952","text":"RT @SebasthSeppel: #Jugh \ud83d\udce3 heute ist wieder JUGH !\nHeute haben wir @gschmutz bei uns mit dem spannenden Thema: Streaming Data Ingestion in\u2026","source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":716199874157547520,"id_str":"716199874157547520","name":"Alexander Gomes","screen_name":"nEinsKnull","location":"Kassel, Hessen","url":null,"description":"CTO of family, work at @Micromata, loves #tec #rc #mountainbikes and #paragliding","trans.........

Thanking you, Santanu
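For reference, here is the small, untested sketch I intend to use to inspect such a file locally. It assumes the fastavro package is installed (pip install fastavro) and that the HDFS file has been copied to the local filesystem; the filename below is only a placeholder.

# Untested sketch: read a Flume-written Avro container and decode its body.
# Flume's avro_event serializer wraps every event as {headers, body}, which is
# why getschema only ever reports those two fields.
import json
from fastavro import reader

with open("flume_event_sample.avro", "rb") as fh:    # hypothetical local copy of the HDFS file
    for record in reader(fh):
        headers = record["headers"]                  # e.g. {"BigData": "Java"} from the interceptor
        body = record["body"]                        # raw bytes of the event body
        # Assuming each body really is one complete JSON line from the source file:
        tweet = json.loads(body.decode("utf-8"))
        print(headers, tweet.get("text"))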
09-28-2018
04:21 AM
Hi Friends, I need some help with Avro file processing using Flume and Kafka. In short, I am reading a JSON file, using an interceptor and a selector to split specific values into an Avro sink, and then reading that Avro source to write to HDFS as an Avro file. The Flume configurations I am using are given below. The problem I am facing is that the Avro file gets written to HDFS with only a header and a body, so with the "java -jar avro-tools-1.8.2.jar getschema" option I do not get the desired schema of the Avro file; every time it shows only "headers" and "body" as the two fields. Please suggest how to resolve this problem. It is very urgent.

1. flume-ng agent --name ta1 --conf conf --conf-file /home/cloudera/santanu/flume_interceptor_multiplexing.conf -Dflume.root.logger=DEBUG,console
## Flume Agent with Json Source , Kafka and Memory Channels , Avro Sink
ta1.sources = twitter
ta1.sinks = avrofile
ta1.channels = memchannel kafkachannel
## Properties
## Sources
ta1.sources.twitter.type = exec
ta1.sources.twitter.command = tail -F /home/cloudera/workspace/Cloudera_Share/Bigdata.json
## Sinks
ta1.sinks.avrofile.type = avro
ta1.sinks.avrofile.hostname = 192.XXX.x.x
ta1.sinks.avrofile.port = 4141
## Channel 1
ta1.channels.memchannel.type = memory
ta1.channels.memchannel.capacity = 5000
ta1.channels.memchannel.transactionCapacity = 500
## Channel 2
ta1.channels.kafkachannel.type = org.apache.flume.channel.kafka.KafkaChannel
ta1.channels.kafkachannel.kafka.bootstrap.servers = 192.XXX.x.x:9092
ta1.channels.kafkachannel.kafka.topic = MyTopic_1
ta1.channels.kafkachannel.kafka.consumer.group.id = my_group
## Interceptor
ta1.sources.twitter.interceptors = i1
ta1.sources.twitter.interceptors.i1.type = regex_extractor
ta1.sources.twitter.interceptors.i1.regex = (?i)(Python|Java|Scala|Perl|Sqoop|Flume|Kafka|Hive|Spark|Jethro|NoSQL)
ta1.sources.twitter.interceptors.i1.serializers = s1
ta1.sources.twitter.interceptors.i1.serializers.s1.name = BigData
## Source Selector
ta1.sources.twitter.selector.type = multiplexing
ta1.sources.twitter.selector.header = BigData
ta1.sources.twitter.selector.mapping.Python = kafkachannel
ta1.sources.twitter.selector.mapping.Java = kafkachannel
ta1.sources.twitter.selector.mapping.Scala = kafkachannel
ta1.sources.twitter.selector.mapping.Perl = kafkachannel
ta1.sources.twitter.selector.mapping.Sqoop = memchannel
ta1.sources.twitter.selector.mapping.Flume = memchannel
ta1.sources.twitter.selector.mapping.Kafka = memchannel
ta1.sources.twitter.selector.mapping.Hive = memchannel
ta1.sources.twitter.selector.mapping.Spark = memchannel
ta1.sources.twitter.selector.mapping.Jethro = memchannel
ta1.sources.twitter.selector.mapping.NoSQL = memchannel
## Mapping
ta1.sources.twitter.channels = kafkachannel memchannel
ta1.sinks.avrofile.channel = memchannel

2. flume-ng agent --name ta2 --conf conf --conf-file /home/cloudera/santanu/flume_avro_hdfs.conf -Dflume.root.logger=DEBUG,console
## Flume Agent with Avro Source and HDFS Sink
ta2.sources = avrofile
ta2.sinks = hdfsfile
ta2.channels = memchannel
## Properties
## Source
ta2.sources.avrofile.type = avro
ta2.sources.avrofile.bind = 192.XXX.x.x
ta2.sources.avrofile.port = 4141
## Sink
ta2.sinks.hdfsfile.type = hdfs
ta2.sinks.hdfsfile.hdfs.path = /user/cloudera/flume_avro
ta2.sinks.hdfsfile.hdfs.filePrefix = Hadoop
ta2.sinks.hdfsfile.hdfs.fileSuffix = .avro
ta2.sinks.hdfsfile.hdfs.fileType = DataStream
ta2.sinks.hdfsfile.hdfs.writeFormat = Text
ta2.sinks.hdfsfile.hdfs.rollInterval = 5
ta2.sinks.hdfsfile.serializer = avro_event
ta2.sinks.hdfsfile.compressionCodec = snappy
## Channel
ta2.channels.memchannel.type = memory
ta2.channels.memchannel.capacity = 5000
ta2.channels.memchannel.transactionCapacity = 500
## Interceptor
ta2.sources.avrofile.interceptors = i2
ta2.sources.avrofile.interceptors.i2.type = remove_header
ta2.sources.avrofile.interceptors.i2.withName = BigData
## Mapping
ta2.sources.avrofile.channels = memchannel
ta2.sinks.hdfsfile.channel = memchannel

3. The Avro file in HDFS ends up with "headers" and "body" as its only two columns.

Thanking you, Santanu
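For completeness, here is a small, untested check I plan to run to confirm that the multiplexing selector really routes the Python/Java/Scala/Perl events into MyTopic_1. It assumes the kafka-python package is installed (pip install kafka-python); the broker address is the masked one from the configuration above, and the group id is a hypothetical one so it does not interfere with the Flume channel's consumer group.

# Untested sketch: consume a few events from the Kafka channel's topic to verify routing.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "MyTopic_1",
    bootstrap_servers="192.XXX.x.x:9092",   # masked broker address from the config above
    group_id="my_group_check",              # hypothetical group id, separate from Flume's my_group
    auto_offset_reset="earliest",           # start from the beginning if no committed offset
    consumer_timeout_ms=10000,              # stop iterating after 10 seconds without messages
)

for msg in consumer:
    # Each value is whatever the Flume Kafka channel stored; by default the channel
    # Avro-wraps the whole Flume event, so it may not be the raw JSON line itself.
    print(msg.partition, msg.offset, msg.value[:120])

consumer.close()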
Labels:
- Apache Flume
- Apache Kafka
- HDFS
09-20-2018
01:43 PM
Hello Friends, I have recently started working with Hadoop, and only from version 2.x onwards. I understand the concept of NameNode (NN) High Availability in Hadoop 2.x, but recently someone told me that NN HA was possible even in Hadoop 1.x, because Hadoop 1.x had ZooKeeper (ZK) and the quorum journal nodes (QJN). If so, why do most articles on the web say the NN was a single point of failure (SPoF) in Hadoop 1.x? Please let me know the answer. Thanking you, Santanu
Labels:
- Apache Hadoop
- Apache Zookeeper
07-02-2018
07:42 PM
Thanks, Vini. Based on your suggestion I am able to connect. Thanking you, Santanu