Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1172 | 01-16-2018 03:38 PM |
 | 6139 | 11-13-2017 05:45 PM |
 | 3032 | 11-13-2017 12:30 AM |
 | 1518 | 10-27-2017 03:58 AM |
 | 28427 | 10-19-2017 03:17 AM |
02-09-2017
06:39 PM
@ed day If you do not specify a file system scheme in the jar path, Hive assumes the local file system. So this:

hive.aux.jars.path=/home/ed/Downloads/serde/json-serde-1.3.7-jar-with-dependencies.jar

gets translated to:

hive.aux.jars.path=file:///home/ed/Downloads/serde/json-serde-1.3.7-jar-with-dependencies.jar

In your case, try explicitly adding file:// before the jar path. If your jar is in HDFS, then use:

hive.aux.jars.path=hdfs://master.royble.co.uk/jars/json-serde-1.3.7-jar-with-dependencies.jar

P.S. Please verify that the Hadoop user you are using to execute these has read privileges on your local jar path.
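If restarting HiveServer2 to pick up hive.aux.jars.path is not convenient, a session-level ADD JAR is usually enough for testing. A minimal sketch, assuming the OpenX SerDe class bundled in that jar; the table name and columns are hypothetical:

-- Register the jar for the current session only
ADD JAR /home/ed/Downloads/serde/json-serde-1.3.7-jar-with-dependencies.jar;

-- Use the SerDe shipped in that jar (table and columns are illustrative)
CREATE TABLE json_events (id INT, payload STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';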
02-09-2017
02:09 PM
One of my Talend packages is failing when it tries to close the Hive connection.
Here is the log snapshot:
[FATAL]: alpha.talendPackage - tHiveClose_1 Error while cleaning up the server resources
Exception in component tHiveClose_1
java.sql.SQLException: Error while cleaning up the server resources
at org.apache.hive.jdbc.HiveConnection.close(HiveConnection.java:729)
at alpha.talendPackage.tHiveClose_1Process(TalendPackage.java:3274)
at alpha.talendPackage$1tRunJob_1Thread.run(TalendPackage.java:2983)
at routines.system.ThreadPoolWorker.runIt(TalendThreadPool.java:159)
at routines.system.ThreadPoolWorker.runWork(TalendThreadPool.java:150)
at routines.system.ThreadPoolWorker.access$0(TalendThreadPool.java:145)
at routines.system.ThreadPoolWorker$1.run(TalendThreadPool.java:122)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException: org.apache.http.NoHttpResponseException: abc.com.net:10001 failed to respond
at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:297)
at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:313)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at org.apache.hive.service.cli.thrift.TCLIService$Client.send_CloseSession(TCLIService.java:173)
at org.apache.hive.service.cli.thrift.TCLIService$Client.CloseSession(TCLIService.java:165)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1388)
at com.sun.proxy.$Proxy9.CloseSession(Unknown Source)
at org.apache.hive.jdbc.HiveConnection.close(HiveConnection.java:727)
... 7 more
I verified the connectivity between the Talend server and the Hive server (abc.com.net:10001), and also verified connectivity on the cluster via Knox. What really puzzles me is that it only fails for this particular Talend job while the rest of the jobs work absolutely fine. Thanks in advance.
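Since port 10001 behind Knox suggests HiveServer2 in HTTP transport mode, one way to reproduce the close outside Talend is a quick Beeline session. A hedged sketch, where httpPath=cliservice is an assumption; a Knox topology may use a different path (and typically ssl=true):

# Open a session over the HTTP transport, run a trivial query, and let
# Beeline close the session; if the close hangs here too, the problem is
# on the server/Knox side rather than in the Talend job.
beeline -u "jdbc:hive2://abc.com.net:10001/default;transportMode=http;httpPath=cliservice" -e "SELECT 1"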
Labels:
- Apache Hive
- Apache Tez
02-07-2017
02:04 PM
@William Gonzalez - Thank you for the details. For now I will just stick with the exam objectives, but I am definitely interested in doing things from scratch to get an in-depth understanding.
02-07-2017
01:14 PM
For HDPCD, we could use the Hortonworks Sandbox and work through the exam objectives to prepare ourselves. For HDPCA, can we use the same Sandbox and try to implement the tasks in the exam objectives? Since I am not planning on taking the official Hortonworks training course, I would like to know:
1. What basic setup (Sandbox, or a VM with only a Linux flavor and no Hadoop) do I need to start with?
2. How should I go about preparing for HDPCA, especially what do I need to start with (other than referring to the exam objectives)?
Labels:
- Apache Hadoop
- Training
01-12-2017
10:00 PM
2 Kudos
It is confirmed that nested objects are not supported in JSON via the Upload Table function. Here is an excerpt from the official documentation:

The following JSON format is supported:

[ { "col1Name" : "value-1-1", "col2Name" : "value-1-2" },
  { "col1Name" : "value-2-1", "col2Name" : "value-2-2" } ]

The file should contain a valid JSON array containing any number of JSON objects. Each JSON object should contain column names as properties and column values as property values. The names, number, and order of the columns in the table are decided from the first object of the JSON file. The names and datatypes of columns can be edited during the preview step. If some JSON objects have extra properties, they are ignored; if they are missing some of the properties, null values are assumed. Note that the extension of the file cannot be ".json".
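As a workaround for nested documents, you can bypass Upload Table and define the table yourself over a JSON SerDe with complex types. A minimal sketch, assuming the OpenX JSON SerDe is available on the aux-jars path; the document shape, table name, and location are hypothetical:

-- Hypothetical input line: {"id": 1, "tags": ["a", "b"], "owner": {"name": "x"}}
CREATE EXTERNAL TABLE nested_events (
  id    INT,
  tags  ARRAY<STRING>,          -- a JSON array maps to a Hive ARRAY
  owner STRUCT<name:STRING>     -- a nested object maps to a Hive STRUCT
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/tmp/nested_json';    -- illustrative HDFS directory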
01-10-2017
04:24 PM
@Pardeep @vsithannan - I would appreciate any input from you guys. I have tried to find the answer on multiple forums to no avail, so I am finally posting it here.
01-10-2017
04:22 PM
1 Kudo
When I try to upload a simple JSON file using Upload Table in Ambari > Hive View, it works. When I try to upload a nested JSON file (containing one or more arrays), I get "E090 Row data cannot have an array. [IllegalArgumentException]". I am beginning to wonder whether Upload Table supports loading complex nested JSON at all. I have attached the complex.txt file that I am trying to load; please rename it to .json if you want to replicate the issue. Thank you.
Labels:
- Apache Ambari
- Apache Hive
01-10-2017
07:18 AM
"perform a join on two or more datasets" - implies that there are more than 2 data sets involved and thus you may ave to write a solution which can comprise only a Map Join or only a Reduce Join or a combination of both. In essence, if the data sets are too large and could result in memory issues, then bloom filter is the route to take. So from a conceptual perspective, it is good to know Bloom Filter even if it is not specifically mentioned in Exam Objectives.
01-10-2017
07:13 AM
2 Kudos
@Ramesh Raja In the exam you may or may not be required to remove the header; it is better to know how to do it and feel more comfortable. To remove the header in Hive, use tblproperties:

CREATE TABLE test (
  name  STRING,
  email STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.header.line.count"="1");
-- now load the data into the table (see the sketch below)

To remove the header in Pig, filter out the row whose first field holds the header value:

A = LOAD 'data.csv' USING PigStorage(',');
B = FILTER A BY $0 != 'name';  -- assumes the header row starts with the column name 'name'
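And the load step referenced in the Hive comment above, as a minimal sketch (the local path is illustrative):

-- Load the CSV from the local file system; the first line is skipped
-- because of skip.header.line.count.
LOAD DATA LOCAL INPATH '/tmp/data.csv' INTO TABLE test;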