Member since: 01-21-2018
Posts: 58
Kudos Received: 4
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3302 | 09-23-2017 03:05 AM
 | 1566 | 08-31-2017 08:20 PM
 | 6192 | 05-15-2017 06:06 PM
09-08-2017
01:18 PM
Both VMware and VirtualBox, with 13 GB RAM and a bridged network.
09-08-2017
12:36 PM
I moved on to VirtualBox and got the same result: I can't start up my VM. I don't get any error; the VM just stays stuck loading the OS the whole time.
09-08-2017
01:14 AM
Hey everyone, I also have a problem setting up my sandbox. I'm on Windows 10 and I gave my VM 13 GB RAM and 2 processors, and I have enough storage, but every time I start it up I get this message, and then the machine gets stuck loading the CentOS OS for a long time. Could someone please let me know how I can sort this out? Thanks.
08-31-2017
08:20 PM
Hi guys, I want to post the solution. I finally added the options below to my Flume configuration file:
TwitterAgent.sources.Twitter.maxBatchSize = 50000
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 100000
Thanks.
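For context, this is roughly where those two options sit in the agent configuration; it's a minimal sketch only, where the TwitterAgent/Twitter source names come from my setup but the channel and sink wiring below is simplified for illustration:

# Hypothetical flume.conf fragment -- only the two maxBatch* lines are the actual fix;
# the rest of the wiring is assumed for illustration
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel

# Larger batches: buffer up to 50,000 events, or flush after 100 seconds
TwitterAgent.sources.Twitter.maxBatchSize = 50000
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 100000

TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = /user/flume/tweets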
08-31-2017
06:46 PM
Hello guys, I've used Flume to capture a few tweets. My agent and the whole Flume configuration run pretty well; my sticking point is when I need to read the output file with Hive. I created an Avro schema file to reuse in Hive when creating the table that stores the Flume data (the Flume output file comes in Avro format). Once the table in Hive was ready, I checked it to confirm the format is right, and that looks good, as you can see in the attached file tweettable.jpg. Then I ran the command to load the Flume data into this table, and according to the result message in Hive that also completed as expected, even though numRows is reported as 0 (attached file load.jpg). Finally, when I try to read the data, I get an error message saying it is not possible to read this data, and unfortunately I don't understand why. Please, if someone can give me a hand with this, I'd really appreciate it (attached file result.jpg). If you need more details about the scripts and everything I used in this test, I've posted all the info in a GitHub repository: https://github.com/AndresUrregoAngel/Flume_Twitter Thanks so much.
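For reference, the general shape of the Avro-backed table declaration looks roughly like this; the table name, location, and schema path below are placeholders rather than the exact ones from my repo:

-- Hypothetical sketch: names and paths are placeholders; the real scripts are in the GitHub repo above
CREATE EXTERNAL TABLE tweets
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/flume/tweets'
TBLPROPERTIES ('avro.schema.url' = 'hdfs:///user/flume/schemas/twitter.avsc');

The column list comes from the .avsc file referenced by avro.schema.url, so the table definition itself doesn't repeat the fields.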
Labels:
- Apache Flume
- Apache Hive
08-16-2017
06:48 PM
You are amazing, I really appreciate each of your comments and the time you have put in. Thanks so much. Just to let you know, buddy, the part I forgot to tell you is that before going to Pig I load the file into a Hive table within the DB POC. That's why I used:
july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
The data coming from Hive already has a schema, so the relation in Pig will match that same schema. The problem is that even after setting a schema for the output I'm not able to store the outcome in a Hive table 😞 . So to reproduce my real scenario you should:
1. Load the CSV file into HDFS without headers (I delete them beforehand to avoid having to filter them out). Run:
tail -n +2 OD_XXX.csv >> july.csv
2. Create the table in Hive and load the file:
create table july (
start_date string,
start_station int,
end_date string,
end_station int,
duration int,
member_s int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
LOAD DATA INPATH '/user/andresangel/datasets/july.CSV'
OVERWRITE INTO TABLE july;
3. Follow my script posted above through to the end, to try to store the final outcome in a Hive table (see the sketch just below) 🙂 Thanks buddy @Dinesh Chitlangia
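In case it helps, here is the rough shape of what step 3 tries to do, loading through HCatalog and storing back into Hive; the output table name, field list, and the transformation below are placeholders for illustration, not my exact script:

-- run with: pig -useHCatalog  (so the HCatalog jars are on the classpath)
july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- whatever transformation produces the final relation; a simple projection as an example
result = FOREACH july GENERATE start_date, start_station, end_date, end_station, duration, member_s;
-- the target Hive table (here POC.july_summary, a made-up name) must already exist
-- with a schema matching the relation being stored
STORE result INTO 'POC.july_summary' USING org.apache.hive.hcatalog.pig.HCatStorer();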
08-16-2017
05:39 PM
1 Kudo
Yes, exactly, you're right, but that applies only to imports; for an export operation the equivalent is --export-dir 🙂 That's how it works.
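To make that concrete, a hedged example of the export side; the connection string, credentials, table, and directory below are made up:

$ sqoop export --connect jdbc:mysql://db.example.com/corp --username someuser -P \
    --table employees --export-dir /user/someuser/EMPLOYEES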
08-16-2017
04:00 PM
Checking the official documentation (link here), it says that each table automatically gets its own output folder under the default HDFS path of the user who performs the operation:
$ sqoop import-all-tables --connect jdbc:mysql://db.foo.com/corp
$ hadoop fs -ls
Found 4 items
drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/EMPLOYEES
drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/PAYCHECKS
drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/DEPARTMENTS
drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/OFFICE_SUPPLIES
08-14-2017
07:54 PM
Hi everyone, I already have a Hive table called roles, and I need to update it with info coming from MySQL. So I used this script, thinking it would both add new data and update existing data in my Hive table:
sqoop import --connect jdbc:mysql://xxxx/retail_export --username xxxx --password xxx \
--table roles --split-by id_emp --check-column id_emp --last-value 5 --incremental append \
--target-dir /user/ingenieroandresangel/hive/roles --hive-import --hive-database poc --hive-table roles
Unfortunately, that only inserts the new data; it can't update the records that already exist. Before you ask, a couple of notes:
* the table doesn't have a PK
* if I don't specify --last-value as a parameter, I get duplicated records for those that already exist.
How could I handle this without truncating the table or recreating it with a PK? Is there a way? Thanks guys.
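For comparison, one pattern I've seen suggested is Sqoop's lastmodified incremental mode combined with --merge-key, so updated rows replace their old versions in the target directory. It's only a sketch, and it assumes the MySQL table has a timestamp column (called modified_date below, which mine may not have) plus a column unique enough to merge on:

# hypothetical sketch: modified_date and the last-value timestamp are assumptions
sqoop import --connect jdbc:mysql://xxxx/retail_export --username xxxx --password xxx \
  --table roles --check-column modified_date --incremental lastmodified \
  --last-value '2017-08-01 00:00:00' --merge-key id_emp \
  --target-dir /user/ingenieroandresangel/hive/roles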
Labels:
- Apache Hive
- Apache Sqoop
08-14-2017
04:07 PM
I couldn't upload it here because it's a bit too big, so please download it from my OneDrive by clicking here. Thanks buddy!