Member since: 01-21-2018
Posts: 58
Kudos Received: 4
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3302 | 09-23-2017 03:05 AM
 | 1566 | 08-31-2017 08:20 PM
 | 6192 | 05-15-2017 06:06 PM
09-08-2017
01:18 PM
Both VMware and VirtualBox, with 13 GB RAM and a bridged network.
09-08-2017
12:36 PM
I moved on to VirtualBox and got the same result: I can't start up my VM. I don't get any error; the VM just stays stuck loading the OS the whole time.
09-08-2017
01:14 AM
Hey everyone, I also have a problem setting up my sandbox. I'm on Windows 10 and I gave my VM 13 GB RAM and 2 processors, and I have enough storage, but every time I start it up I get this message, and then the machine gets stuck loading the CentOS OS for a long time. Could someone please let me know how I can sort this out? Thanks.
08-31-2017
08:20 PM
Hi guys, I want to post the solution. I finally added the options below to my Flume configuration file:
TwitterAgent.sources.Twitter.maxBatchSize = 50000
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 100000
Thanks.
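For context, this is roughly where those two options sit in the agent configuration; it's a minimal sketch only, where the TwitterAgent/Twitter source names come from my setup but the channel and sink wiring below is simplified for illustration:

# Hypothetical flume.conf fragment -- only the two maxBatch* lines are the actual fix;
# the rest of the wiring is assumed for illustration
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel

# Larger batches: buffer up to 50,000 events, or flush after 100 seconds
TwitterAgent.sources.Twitter.maxBatchSize = 50000
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 100000

TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = /user/flume/tweets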
08-31-2017
06:46 PM
Hello guys, I've used Flume to capture a few tweets. My agent and the whole Flume configuration run pretty well; my sticking point is when I need to read the output file with Hive. I created an Avro schema file to reuse in Hive when creating the table that stores the Flume data (the Flume output file comes in Avro format). Once the table in Hive was ready, I checked it to confirm the format is right, and that looks good, as you can see in the attached file tweettable.jpg. Then I ran the command to load the Flume data into this table, and according to the result message in Hive that also completed as expected, even though numRows is reported as 0 (attached file load.jpg). Finally, when I try to read the data, I get an error message saying it is not possible to read this data, and unfortunately I don't understand why. Please, if someone can give me a hand with this, I'd really appreciate it (attached file result.jpg). If you need more details about the scripts and everything I used in this test, I've posted all the info in a GitHub repository: https://github.com/AndresUrregoAngel/Flume_Twitter Thanks so much.
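For reference, the general shape of the Avro-backed table declaration looks roughly like this; the table name, location, and schema path below are placeholders rather than the exact ones from my repo:

-- Hypothetical sketch: names and paths are placeholders; the real scripts are in the GitHub repo above
CREATE EXTERNAL TABLE tweets
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/flume/tweets'
TBLPROPERTIES ('avro.schema.url' = 'hdfs:///user/flume/schemas/twitter.avsc');

The column list comes from the .avsc file referenced by avro.schema.url, so the table definition itself doesn't repeat the fields.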
Labels:
- Apache Flume
- Apache Hive
08-16-2017
06:48 PM
You are amazing, I really appreciate each of your comments and the time you have put in. Thanks so much. Just to let you know, buddy, the part I forgot to tell you is that before going to Pig I load the file into a Hive table within the DB POC. That's why I used:
july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
The data coming from Hive already has a schema, so the relation in Pig will match that same schema. The problem is that even after setting a schema for the output I'm not able to store the outcome in a Hive table 😞 . So to reproduce my real scenario you should:
1. Load the CSV file into HDFS without headers (I delete them beforehand to avoid having to filter them out). Run:
tail -n +2 OD_XXX.csv >> july.csv
2. Create the table in Hive and load the file:
create table july (
start_date string,
start_station int,
end_date string,
end_station int,
duration int,
member_s int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
LOAD DATA INPATH '/user/andresangel/datasets/july.CSV'
OVERWRITE INTO TABLE july;
3. Follow my script posted above through to the end, to try to store the final outcome in a Hive table (see the sketch just below) 🙂 Thanks buddy @Dinesh Chitlangia
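In case it helps, here is the rough shape of what step 3 tries to do, loading through HCatalog and storing back into Hive; the output table name, field list, and the transformation below are placeholders for illustration, not my exact script:

-- run with: pig -useHCatalog  (so the HCatalog jars are on the classpath)
july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- whatever transformation produces the final relation; a simple projection as an example
result = FOREACH july GENERATE start_date, start_station, end_date, end_station, duration, member_s;
-- the target Hive table (here POC.july_summary, a made-up name) must already exist
-- with a schema matching the relation being stored
STORE result INTO 'POC.july_summary' USING org.apache.hive.hcatalog.pig.HCatStorer();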
08-16-2017
05:39 PM
1 Kudo
Yes, exactly, you're right, but that applies only to imports; for an export operation the equivalent is --export-dir 🙂 That's how it works.
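To make that concrete, a hedged example of the export side; the connection string, credentials, table, and directory below are made up:

$ sqoop export --connect jdbc:mysql://db.example.com/corp --username someuser -P \
    --table employees --export-dir /user/someuser/EMPLOYEES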
08-16-2017
04:00 PM
Checking the official documentation (link here), it says that each table automatically gets its own output folder under the default HDFS path of the user who performs the operation:
$ sqoop import-all-tables --connect jdbc:mysql://db.foo.com/corp
$ hadoop fs -ls
Found 4 items
drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/EMPLOYEES
drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/PAYCHECKS
drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/DEPARTMENTS
drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/OFFICE_SUPPLIES
08-14-2017
07:54 PM
Hi everyone, I already have a Hive table called roles, and I need to update it with info coming from MySQL. So I used this script, thinking it would both add new data and update existing data in my Hive table:
sqoop import --connect jdbc:mysql://xxxx/retail_export --username xxxx --password xxx \
--table roles --split-by id_emp --check-column id_emp --last-value 5 --incremental append \
--target-dir /user/ingenieroandresangel/hive/roles --hive-import --hive-database poc --hive-table roles
Unfortunately, that only inserts the new data; it can't update the records that already exist. Before you ask, a couple of notes:
* the table doesn't have a PK
* if I don't specify --last-value as a parameter, I get duplicated records for those that already exist.
How could I handle this without truncating the table or recreating it with a PK? Is there a way? Thanks guys.
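For comparison, one pattern I've seen suggested is Sqoop's lastmodified incremental mode combined with --merge-key, so updated rows replace their old versions in the target directory. It's only a sketch, and it assumes the MySQL table has a timestamp column (called modified_date below, which mine may not have) plus a column unique enough to merge on:

# hypothetical sketch: modified_date and the last-value timestamp are assumptions
sqoop import --connect jdbc:mysql://xxxx/retail_export --username xxxx --password xxx \
  --table roles --check-column modified_date --incremental lastmodified \
  --last-value '2017-08-01 00:00:00' --merge-key id_emp \
  --target-dir /user/ingenieroandresangel/hive/roles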
Labels:
- Apache Hive
- Apache Sqoop
08-14-2017
04:07 PM
I couldn't upload it here because it's a bit too big, so please download it from my OneDrive by clicking here. Thanks buddy!