Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1185 | 01-16-2018 03:38 PM
 | 6166 | 11-13-2017 05:45 PM
 | 3064 | 11-13-2017 12:30 AM
 | 1527 | 10-27-2017 03:58 AM
 | 28469 | 10-19-2017 03:17 AM
08-17-2017 07:36 PM
Awesome. I faced this same problem in HDP 2.5 and was able to resolve it using the instructions mentioned in this post!
08-15-2017 11:26 PM
2 Kudos
@Ramya Please follow the official Sqoop documentation. --target-dir is not a valid option when using import-all-tables, so you cannot use it here.
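For reference, import-all-tables does accept --warehouse-dir, which sets the parent HDFS directory under which each table gets its own subdirectory. A sketch with placeholder connection details:
# The JDBC URL, username, and HDFS path below are examples only;
# -P prompts for the database password interactively.
sqoop import-all-tables \
  --connect jdbc:mysql://db-host.example.com/salesdb \
  --username sqoop_user -P \
  --warehouse-dir /user/hive/warehouse/salesdb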
08-15-2017 07:12 PM
Thanks for sharing the data. The issue has been resolved; see the new answer, and please consider accepting and upvoting it.
08-15-2017 07:11 PM
1 Kudo
@Andres Urrego There were multiple issues in the script you posted. Thank you also for sharing the test data; it helped me spot a few more problems.
1. Your input file is a CSV, but you are loading it with HCatLoader. That is incorrect: HCatLoader is only used to load data into Pig from a Hive table. In your case, put the file in an HDFS directory and load it from that path with PigStorage.
2. Since you load the data into Pig without specifying a schema, you must refer to the columns by position ($0, $1, and so on) instead of by name (start_date, duration, etc.). Refer to the Pig documentation on the LOAD statement.
3. When using the ToDate function in Pig, either your data must match the default date format or you must specify the format that your data actually uses; the script below shows how. See the Pig documentation for the ToDate() syntax and for building date format strings.
4. When storing data from Pig into Hive with HCatStorer, make sure the Pig alias has field names and data types matching the Hive table, to avoid mismatch or type-casting issues.
After going through the sample data and replicating the issue, I was able to load the data correctly. Here is the Pig script:
july = LOAD '/hdfspath/july.csv' USING PigStorage(',');
july = FILTER july BY $0 != 'start_date'; -- remove the header line
july_cl = FOREACH july GENERATE
    GetDay(ToDate((chararray)$0, 'yyyy-MM-dd HH:mm')) AS day:int,
    $1 AS start_station:chararray,
    (int)$4 AS duration:int;
jul_cl = FILTER july_cl BY day == 31;
july_gr = GROUP jul_cl BY (day, start_station);
july_result = FOREACH july_gr {
    total_dura = SUM(jul_cl.duration);
    avg_dura = AVG(jul_cl.duration);
    qty_trips = COUNT(jul_cl);
    GENERATE FLATTEN(group) AS (day, code_station),
             (double)total_dura AS total_dura:double,
             (float)avg_dura AS avg_dura:float,
             (int)qty_trips AS qty_trips:int;
}
STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();
In the script above, the results are stored INTO 'poc.july_analysis'. Make sure a database named 'poc' exists in Hive and that it contains the 'july_analysis' table. If the poc database does not exist, either create it or skip it and create july_analysis in the default Hive database. Here is the Hive table DDL I used:
CREATE TABLE july_analysis (
day int,
code_station string,
total_dura double,
avg_dura float,
qty_trips int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
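If you need to create the poc database and sanity-check the load afterwards, something like this works from the command line (the HiveServer2 URL is a placeholder for your host):
beeline -u "jdbc:hive2://localhost:10000" -e "CREATE DATABASE IF NOT EXISTS poc;"
# after the Pig script has run:
beeline -u "jdbc:hive2://localhost:10000" -e "SELECT * FROM poc.july_analysis LIMIT 5;"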
08-14-2017 05:21 PM
@Ihor Dziuba Are you using VNC on your office computer? Office computers are usually on a VPN and therefore have Internet proxy settings. You must change the VNC settings to use the same proxy.
08-14-2017 03:02 PM
@Andres Urrego - Sounds good. I would love to replicate this scenario and help you. Can you share the POC.July file that you are loading in Pig? Even a sample, such as the first few lines of that input file, will help.
08-11-2017 03:35 AM
1 Kudo
@Andres Urrego I have faced a similar issue. It happens when the implicit cast does not work correctly, so you must cast explicitly. I also recommend making avg_dura a float to avoid loss of precision. Replace the GENERATE FLATTEN statement in your query with the explicitly cast version below.
GENERATE FLATTEN((int)group) AS day:int,
         (int)total_dura AS total_dura:int,
         (float)avg_dura AS avg_dura:float,
         (int)qty_trips AS qty_trips:int;
If this does not help, then please share sample data and the schema of the existing Hive table, and I will solve it.
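Incidentally, a quick way to spot such type mismatches up front is to compare the Pig relation's schema (DESCRIBE july_result; in the grunt shell) against the Hive table's schema before storing. A hedged example for the Hive side, where the HiveServer2 URL is a placeholder for your host:
beeline -u "jdbc:hive2://localhost:10000" -e "DESCRIBE poc.july_analysis;"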
08-08-2017 08:04 PM
1 Kudo
@pv poreddy Is SSL enabled? Is this a Kerberized cluster? If the answer to either question is yes, then the connection URL will change. Also, make sure you are prepending '!connect' to your connection URL in order to connect using beeline.
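For example, on a Kerberized cluster with SSL enabled, the URL typically carries the HiveServer2 Kerberos principal and the truststore parameters. A sketch, where the host, realm, and truststore path are placeholders for your environment:
beeline
!connect jdbc:hive2://hs2-host.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM;ssl=true;sslTrustStore=/etc/security/truststore.jks;trustStorePassword=changeit
On a non-secured cluster, a plain !connect jdbc:hive2://hs2-host.example.com:10000/default is usually enough.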
08-07-2017 07:49 PM
3 Kudos
@Roberto Sancho
1. If you need any detail from the backup, then you MUST migrate the Hive metastore backup to the new Postgres instance.
2. To ensure the backup is applied correctly and that there are no inconsistencies, shut down the Hive instance/metastore, apply the backup, run quick consistency checks, and then restart the Hive metastore and instance.
3. You only need to shut down the Hive metastore/instance; no other component needs to be shut down.
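A rough sketch of step 2, assuming a plain-SQL pg_dump backup (the database name, user, and file path below are examples, not your actual values):
# Stop the Hive Metastore first (e.g., via Ambari), then load the dump into
# the metastore database on the new Postgres instance:
psql -U hive -d hive -f /path/to/hive_metastore_backup.sql
# For a custom-format dump, pg_restore is the equivalent:
# pg_restore -U hive -d hive /path/to/hive_metastore_backup.dump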
08-07-2017 07:29 PM
1 Kudo
@Roberto Sancho
1. If you need any detail from the backup, then you MUST migrate the Hive metastore backup to the new Postgres instance.
2. To ensure the backup is applied correctly and that there are no inconsistencies, shut down the Hive instance/metastore, apply the backup, run quick consistency checks, and then restart the Hive metastore and instance.