Member since: 01-21-2018
Posts: 58
Kudos Received: 4
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4072 | 09-23-2017 03:05 AM |
| | 2072 | 08-31-2017 08:20 PM |
| | 7397 | 05-15-2017 06:06 PM |
08-13-2020
12:42 AM
While starting the Hortonworks Sandbox, it gets stuck on "Extracting and loading the Hortonworks Sandbox..." After some time it shows a critical error message, or sometimes it says "Your system has run into an error, we'll restart it."
08-22-2019
10:43 AM
Did you find a solution to this?
02-25-2018
09:14 PM
Sorry, sometimes I don't read completely and come up with an issue 😞 It works seamlessly!
01-12-2018
06:53 PM
@Andres Urrego Regarding the VM failing, is it the services shutting down on their own and not staying up? One common cause of this is not enough memory. To reduce resource usage, try turning off all services and starting only HDFS, ZooKeeper, YARN, and Spark. Also make sure that you give your VM at least 8 GB of RAM (https://hortonworks.com/tutorial/sandbox-deployment-and-install-guide shows how); see the sketch below for one way to resize it. As for Spark2/HDFS documentation, here is a good Spark2 starter tutorial followed by a Spark2/HDFS project walkthrough: https://hortonworks.com/tutorial/hands-on-tour-of-apache-spark-in-5-minutes/#option-2-download-and-setup-hortonworks-data-platform-hdp-sandbox https://hortonworks.com/tutorial/sentiment-analysis-with-apache-spark/
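If the sandbox runs under VirtualBox, the memory can be raised from the host's command line; a minimal sketch, assuming the VM is named "Hortonworks Sandbox" (check the exact name with VBoxManage list vms):

    # Power the VM off first; memory can only be changed while it is stopped
    VBoxManage controlvm "Hortonworks Sandbox" poweroff
    # Allocate 8 GB of RAM (the value is in MB)
    VBoxManage modifyvm "Hortonworks Sandbox" --memory 8192
    # Boot it back up without a GUI window
    VBoxManage startvm "Hortonworks Sandbox" --type headless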
09-23-2017
03:05 AM
Hi guys, I'm so sorry. Well, I just remembered that you can create an external table stored over the folder where all the files with the same structure are located. That way I can load all the records in one shot:

    CREATE EXTERNAL TABLE bixi_his (
      stations ARRAY<STRUCT<id:INT, s:STRING, n:STRING, st:STRING, b:STRING,
                            su:STRING, m:STRING, lu:STRING, lc:STRING, bk:STRING,
                            bl:STRING, la:FLOAT, lo:FLOAT, da:INT, dx:INT,
                            ba:INT, bx:INT>>,
      schemesuspended STRING,
      timeload BIGINT
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/user/ingenieroandresangel/datasets/bixi2017/';

thanks
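Once that table exists, the nested stations array can be flattened at query time; a minimal sketch (which short field names map to station name and coordinates is my guess):

    -- One output row per station element in the array
    SELECT s.id, s.s, s.la, s.lo
    FROM bixi_his
    LATERAL VIEW explode(stations) station_tbl AS s;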
08-31-2017
08:20 PM
Hi guys, I want to post the solution. Finally, I added the options below to my Flume file:

    TwitterAgent.sources.Twitter.maxBatchSize = 50000
    TwitterAgent.sources.Twitter.maxBatchDurationMillis = 100000

thanks
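For anyone hitting the same issue, those two properties belong in the source section of the agent's configuration; a minimal sketch, assuming the stock Apache Flume TwitterSource and placeholder credentials:

    TwitterAgent.sources = Twitter
    TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
    TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
    TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
    TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
    TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
    # Larger, less frequent batches keep the channel from backing up
    TwitterAgent.sources.Twitter.maxBatchSize = 50000
    TwitterAgent.sources.Twitter.maxBatchDurationMillis = 100000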
08-28-2017
07:49 PM
Thank you @Nandish B Naidu! The solution worked.
08-15-2017
10:54 PM
1 Kudo
@Andres Urrego, what you are looking for (UPSERTs) isn't available in Sqoop import. There are several approaches to actually updating data in Hive. One of them is described here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-access/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html Other approaches use a side load and merge as post-Sqoop or scheduled jobs/processes; a merge sketch follows below. You can also check Hive ACID transactions, or the Hive-HBase integration package. Choosing the right approach is not trivial and depends on: initial volume, incremental volumes, frequency of incremental jobs, probability of updates, ability to identify uniqueness of records, acceptable latency, etc.
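As an illustration of the side-load-and-merge approach from the linked article, the base table and the newly sqooped increments can be reconciled through a view; a minimal sketch, assuming placeholder tables base_table and incremental_table with a unique id and a modified_date column:

    -- Keep, for each id, only the most recently modified row across both tables
    CREATE VIEW reconcile_view AS
    SELECT t.*
    FROM (SELECT * FROM base_table
          UNION ALL
          SELECT * FROM incremental_table) t
    JOIN (SELECT id, MAX(modified_date) AS max_modified
          FROM (SELECT * FROM base_table
                UNION ALL
                SELECT * FROM incremental_table) u
          GROUP BY id) m
      ON t.id = m.id AND t.modified_date = m.max_modified;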
08-16-2017
06:48 PM
You are so amazing, I really appreciate each of your comments and the time you have put in. Thanks so much. Just to let you know, buddy, the part that I forgot to tell you is that before going to Pig I load the file's information into a Hive table within the POC database. That is why I used:

    july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();

Then the data coming from Hive already has a format, and the relation in Pig will match the same schema. The problem is that even after setting a schema for the output, I'm still not able to store the outcome in a Hive table 😞. So to reproduce my real scenario you should:

1. Load the CSV file into HDFS without headers (I delete them beforehand to avoid filters):

    tail -n +2 OD_XXX.csv >> july.csv

2. Create the table and load the file in Hive:

    CREATE TABLE july (
      start_date string,
      start_station int,
      end_date string,
      end_station int,
      duration int,
      member_s int)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

    LOAD DATA INPATH '/user/andresangel/datasets/july.CSV'
    OVERWRITE INTO TABLE july;

3. Follow my script posted above to the end to try to store the final outcome in a Hive table 🙂. Thanks, buddy @Dinesh Chitlangia
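For step 3, writing a Pig relation back into Hive goes through HCatStorer, and the target table must already exist with a matching schema; a minimal sketch, assuming a final relation july_out, a pre-created Hive table POC.july_out, and Pig launched with -useHCatalog:

    -- Field names and types in july_out must line up with the Hive table's columns
    STORE july_out INTO 'POC.july_out' USING org.apache.hive.hcatalog.pig.HCatStorer();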
06-23-2017
07:18 PM
Thanks so much @Lester Martin, I appreciate your help. It worked now; I replaced my statement with yours:

    salaries_cl = FOREACH salaries_fl GENERATE (int)year AS year:int, $1, $2, $3, (long)salary AS salary:long;

Weird that the other one didn't work, but thanks so much.