Member since: 04-04-2016
Posts: 147
Kudos Received: 40
Solutions: 16
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1173 | 07-22-2016 12:37 AM |
 | 4234 | 07-21-2016 11:48 PM |
 | 1605 | 07-21-2016 11:28 PM |
 | 2243 | 07-21-2016 09:53 PM |
 | 3322 | 07-08-2016 07:56 PM |
07-28-2016
10:54 PM
Hi, I am looking for references and demos that show the text and data mining capabilities of our platform. I am trying to answer one of the questions in an RFP. Any help is highly appreciated. Thanks, Sujitha
07-22-2016
07:38 PM
Hi @srinivasa rao, glad that you are satisfied with the answer provided by Benjamin Leonhardi. Please let me know in case of any issues.
07-22-2016
12:37 AM
Hi @Juan Manuel Nieto, The /tmp directory is mainly used as temporary storage during the MapReduce phases: MapReduce keeps its intermediate data under /tmp, and those files are cleared out automatically when the job execution completes. Pig also creates temporary files there, since it runs on top of MapReduce, and deletes them at the end of the script. However, Pig does not clean up its temporary files if the script fails or is killed; in that case we have to handle the cleanup ourselves, which is best done by adding the cleanup lines to the script itself (a rough sketch is included below). For further details I found an article here: Hope that helps. Thanks, Sujitha
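As a minimal sketch of that kind of cleanup (the per-job temp location and output path below are hypothetical; the point is just that a known location can be force-removed at the top of the script before anything new is written):

-- keep this job's intermediate data in a known location instead of the default /tmp/temp-* directories
SET pig.temp.dir '/tmp/pig_tmp/myjob';

-- force-remove anything left behind by an earlier run that failed or was killed;
-- rmf does not raise an error when the path does not exist
rmf /tmp/pig_tmp/myjob;
rmf /tmp/myjob_output;

a = LOAD '/tmp/dataset.csv' USING PigStorage(',') AS (id:chararray, at:chararray);
STORE a INTO '/tmp/myjob_output';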
07-22-2016
12:18 AM
Hi @srinivasa rao,
Answer 1: The ApplicationMaster negotiates with the ResourceManager for resources, not for containers directly. A container can be thought of as a box of resources granted for running part of an application. The ApplicationMaster requests those resources from the ResourceManager, over the ResourceManager protocol, based on the needs of the user code. Since the ApplicationMaster is essentially user code, it is not a privileged service and should not be trusted: the YARN system (ResourceManager and NodeManager) has to protect itself, and the resources granted, from faulty or malicious ApplicationMasters at all costs.
Answer 2: The work of a job is performed inside containers. Whether a container runs one piece of work or several depends on the resources granted by the ResourceManager through the ApplicationMaster.
Answer 3: The internals of how resources are allocated and scheduled are always handled by the ResourceManager. Whether it is 20% or the remaining 80%, it is always the ResourceManager's job to allocate resources to the ApplicationMaster, working together with the NodeManager on that particular node. The NodeManager and ResourceManager are also responsible for tracking the status of the allocated resources.
Hope that helps. For more information, here is an article which explains these concepts in simple terms: http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/ A small illustration of the request side is also sketched below. Thanks, Sujitha
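As a small illustration of the request side (using a Pig-on-MapReduce script only because that is what the other threads here use; the property values are made up), the client can state how much memory each container should ask for, but the ResourceManager still decides what is actually granted and where:

-- hypothetical values: these only shape the resource requests that the MapReduce
-- ApplicationMaster sends to the ResourceManager; the actual grant is still the RM's decision
SET yarn.app.mapreduce.am.resource.mb 1024;   -- memory requested for the ApplicationMaster's own container
SET mapreduce.map.memory.mb 2048;             -- memory requested per map-task container
SET mapreduce.reduce.memory.mb 4096;          -- memory requested per reduce-task container

a = LOAD '/tmp/dataset.csv' USING PigStorage(',') AS (id:chararray, at:chararray);
b = GROUP a BY id;
DUMP b;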
07-21-2016
11:48 PM
Hi @Johnny Fugers and @Suyog Nagaokar, I tried to provide an answer here: https://community.hortonworks.com/questions/46444/convert-millseconds-into-unix-timestamp.html#answer-46590 Please let me know if you still have issues. Thanks, Sujitha
07-21-2016
11:28 PM
1 Kudo
Hi @Johnny Fugers,
Input file data (dataset.csv):
563355,1388481000000
563355,1388481000000
563355,1388481000000
563356,1388481000000
This gives the result in CET (the cluster's local time zone):
a = LOAD '/tmp/dataset.csv' USING PigStorage(',') AS (id:chararray, at:chararray);
b = FOREACH a GENERATE id, ToString(ToDate((long) at), 'yyyy-MM-dd HH:mm:ss');
c = GROUP b BY id;
DUMP c;
This is how it works in GMT:
a = LOAD '/tmp/dataset.csv' USING PigStorage(',') AS (id:chararray, at:chararray);
b = FOREACH a GENERATE id, ToDate(ToString(ToDate((long) at), 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd HH:mm:ss', 'GMT');
c = GROUP b BY id;
DUMP c;
Hope that helps, Thanks, Sujitha
07-21-2016
09:53 PM
Hi @Rajinder Kaur, Step 8 should not take more than 3 seconds. Can you make sure you followed all the steps as instructed? Also, I wanted to check whether you created the sandbox in Azure; if so, there are certain configurations that need to be changed, which is also covered in the lab instructions. Please rerun the steps and let me know if that works. I have attached the output screenshots for reference. Thanks, Sujitha
07-19-2016
08:40 PM
Hi, I am working on an RFP and looking for an answer to this requirement: "Ability to recalculate and alert when there are changes to historical data within a time period within your solution." What I don't understand is that we cannot modify data in HDFS; it is immutable. So does the idea of changing historical data even apply? Any help is highly appreciated. Thanks, Sujitha
Labels:
- HDFS
07-18-2016
07:44 PM
Hi, I am working on an RFP and looking for an answer to this requirement: "Specify the recommended administration tools in HDP and their functionality, in short." I am looking for a way to keep this simple. Any help is most appreciated. Thanks, Sujitha
Labels:
- Hortonworks Data Platform (HDP)
07-14-2016
06:52 PM
Hi @ghost k, if this resolved your problem, can you please vote for the best answer? Thanks, Sujitha