Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1185 | 01-16-2018 03:38 PM
 | 6166 | 11-13-2017 05:45 PM
 | 3064 | 11-13-2017 12:30 AM
 | 1527 | 10-27-2017 03:58 AM
 | 28469 | 10-19-2017 03:17 AM
08-17-2017 07:36 PM
Awesome. I faced this same problem in HDP 2.5 and was able to resolve it using the instructions mentioned in this post!
08-15-2017 11:26 PM
2 Kudos
@Ramya Please follow the official Sqoop documentation. --target-dir is not a valid option when using import-all-tables, so you cannot use it here.
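For reference, import-all-tables does accept --warehouse-dir, which sets the parent HDFS directory under which each table gets its own subdirectory. A sketch with placeholder connection details:
# The JDBC URL, username, and HDFS path below are examples only;
# -P prompts for the database password interactively.
sqoop import-all-tables \
  --connect jdbc:mysql://db-host.example.com/salesdb \
  --username sqoop_user -P \
  --warehouse-dir /user/hive/warehouse/salesdb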
08-15-2017 07:12 PM
Thanks for sharing the data. The issue has been resolved; see the new answer, and please consider accepting and upvoting it.
08-15-2017 07:11 PM
1 Kudo
@Andres Urrego There were multiple issues in the script you posted. Thank you also for sharing the test data; it helped me spot a few more problems.
1. Your input file is a CSV, but you are loading it with HCatLoader. That is incorrect: HCatLoader is only used to load data into Pig from a Hive table. In your case, put the file in an HDFS directory and load it from that path with PigStorage.
2. Since you load the data into Pig without specifying a schema, you must refer to the columns by position ($0, $1, and so on) instead of by name (start_date, duration, etc.). Refer to the Pig documentation on the LOAD statement.
3. When using the ToDate function in Pig, either your data must match the default date format or you must specify the format that your data actually uses; the script below shows how. See the Pig documentation for the ToDate() syntax and for building date format strings.
4. When storing data from Pig into Hive with HCatStorer, make sure the Pig alias has field names and data types matching the Hive table, to avoid mismatch or type-casting issues.
After going through the sample data and replicating the issue, I was able to load the data correctly. Here is the Pig script:
july = LOAD '/hdfspath/july.csv' USING PigStorage(',');
july = FILTER july BY $0 != 'start_date'; -- remove the header line
july_cl = FOREACH july GENERATE
    GetDay(ToDate((chararray)$0, 'yyyy-MM-dd HH:mm')) AS day:int,
    $1 AS start_station:chararray,
    (int)$4 AS duration:int;
jul_cl = FILTER july_cl BY day == 31;
july_gr = GROUP jul_cl BY (day, start_station);
july_result = FOREACH july_gr {
    total_dura = SUM(jul_cl.duration);
    avg_dura = AVG(jul_cl.duration);
    qty_trips = COUNT(jul_cl);
    GENERATE FLATTEN(group) AS (day, code_station),
             (double)total_dura AS total_dura:double,
             (float)avg_dura AS avg_dura:float,
             (int)qty_trips AS qty_trips:int;
}
STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();
In the script above, the results are stored INTO 'poc.july_analysis'. Make sure a database named 'poc' exists in Hive and that it contains the 'july_analysis' table. If the poc database does not exist, either create it or skip it and create july_analysis in the default Hive database. Here is the Hive table DDL I used:
CREATE TABLE july_analysis (
day int,
code_station string,
total_dura double,
avg_dura float,
qty_trips int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
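If you need to create the poc database and sanity-check the load afterwards, something like this works from the command line (the HiveServer2 URL is a placeholder for your host):
beeline -u "jdbc:hive2://localhost:10000" -e "CREATE DATABASE IF NOT EXISTS poc;"
# after the Pig script has run:
beeline -u "jdbc:hive2://localhost:10000" -e "SELECT * FROM poc.july_analysis LIMIT 5;"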
08-14-2017 05:21 PM
@Ihor Dziuba Are you using VNC on your office computer? Office computers are usually on a VPN and therefore have Internet proxy settings. You must change the VNC settings to use the same proxy.
08-14-2017 03:02 PM
@Andres Urrego - Sounds good. I would love to replicate this scenario and help you. Can you share the POC.July file that you are loading in Pig? Even a sample, such as the first few lines of that input file, will help.
08-11-2017 03:35 AM
1 Kudo
@Andres Urrego I have faced a similar issue. It happens when the implicit cast does not work correctly, so you must cast explicitly. I also recommend making avg_dura a float to avoid loss of precision. Replace the GENERATE FLATTEN statement in your query with the explicitly cast version below.
GENERATE FLATTEN((int)group) AS day:int,
         (int)total_dura AS total_dura:int,
         (float)avg_dura AS avg_dura:float,
         (int)qty_trips AS qty_trips:int;
If this does not help, then please share sample data and the schema of the existing Hive table, and I will solve it.
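Incidentally, a quick way to spot such type mismatches up front is to compare the Pig relation's schema (DESCRIBE july_result; in the grunt shell) against the Hive table's schema before storing. A hedged example for the Hive side, where the HiveServer2 URL is a placeholder for your host:
beeline -u "jdbc:hive2://localhost:10000" -e "DESCRIBE poc.july_analysis;"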
08-08-2017 08:04 PM
1 Kudo
@pv poreddy Is SSL enabled? Is this a Kerberized cluster? If the answer to either question is yes, then the connection URL will change. Also, make sure you are prepending '!connect' to your connection URL in order to connect using beeline.
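For example, on a Kerberized cluster with SSL enabled, the URL typically carries the HiveServer2 Kerberos principal and the truststore parameters. A sketch, where the host, realm, and truststore path are placeholders for your environment:
beeline
!connect jdbc:hive2://hs2-host.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM;ssl=true;sslTrustStore=/etc/security/truststore.jks;trustStorePassword=changeit
On a non-secured cluster, a plain !connect jdbc:hive2://hs2-host.example.com:10000/default is usually enough.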
08-07-2017 07:49 PM
3 Kudos
@Roberto Sancho
1. If you need any detail from the backup, then you MUST migrate the Hive metastore backup to the new Postgres instance.
2. To ensure the backup is applied correctly and that there are no inconsistencies, shut down the Hive instance/metastore, apply the backup, run quick consistency checks, and then restart the Hive metastore and instance.
3. You only need to shut down the Hive metastore/instance; no other component needs to be shut down.
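A rough sketch of step 2, assuming a plain-SQL pg_dump backup (the database name, user, and file path below are examples, not your actual values):
# Stop the Hive Metastore first (e.g., via Ambari), then load the dump into
# the metastore database on the new Postgres instance:
psql -U hive -d hive -f /path/to/hive_metastore_backup.sql
# For a custom-format dump, pg_restore is the equivalent:
# pg_restore -U hive -d hive /path/to/hive_metastore_backup.dump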
08-07-2017 07:29 PM
1 Kudo
@Roberto Sancho
1. If you need any detail from the backup, then you MUST migrate the Hive metastore backup to the new Postgres instance.
2. To ensure the backup is applied correctly and that there are no inconsistencies, shut down the Hive instance/metastore, apply the backup, run quick consistency checks, and then restart the Hive metastore and instance.