Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1171 | 01-16-2018 03:38 PM |
| | 6139 | 11-13-2017 05:45 PM |
| | 3032 | 11-13-2017 12:30 AM |
| | 1518 | 10-27-2017 03:58 AM |
| | 28426 | 10-19-2017 03:17 AM |
01-10-2017
06:08 AM
3 Kudos
@Sridevi Kaup Ambari 2.4.2 is the most stable release as of today.
This is a maintenance release of Ambari 2.4.0. Ambari 2.4.0 added the following features:
- Version Definition File (AMBARI-15636)
- Audit Logging (AMBARI-15241)
- Ambari Management Packs (AMBARI-14682)
- Dynamic Stack Extensions (AMBARI-12885)
- Stack Featurization (AMBARI-13363)
- Spark2 Service (AMBARI-16753)
- Ambari Infra Service (AMBARI-17822)
- Zeppelin Service (AMBARI-15265)
- Logsearch Service - Beta Release (AMBARI-15139)
01-10-2017
06:05 AM
1 Kudo
@Ramya Grandhi Yes, you can; however, this will spike your CPU and RAM usage, affecting the overall response time of your system. You might notice sluggish performance, and the system might occasionally hang. To minimize these effects, ensure all other unnecessary applications and browsers are closed while you are using the Sandbox.
01-10-2017
01:32 AM
You're welcome. Could you kindly accept my answer if I have answered your question adequately?
01-10-2017
12:42 AM
@bhaskaran periasamy You must use the correct package in your Pig script:
data = LOAD 'db.table' USING org.apache.hive.hcatalog.pig.HCatLoader();
01-10-2017
12:21 AM
4 Kudos
@Adnan Alvee You could use wholeTextFiles() on the SparkContext in the Scala API. Here is a simple outline that avoids a separate spark-submit per file, saving you the 15-30 seconds of startup overhead per file by iterating over multiple files within the same job.

val data = sc.wholeTextFiles("HDFS_PATH")
val files = data.map { case (filename, content) => filename }

def doSomething(file: String) = {
  println(file)
  // your logic for processing a single file goes here
  val logData = sc.textFile(file)
  val numAs = logData.filter(line => line.contains("a")).count()
  println("Lines with a: %s".format(numAs))
  // save the RDD of processed data for this file to HDFS here
}

files.collect.foreach(filename => doSomething(filename))

where:
- data - org.apache.spark.rdd.RDD[(String, String)]
- files - org.apache.spark.rdd.RDD[String] (the filenames)
- doSomething(filename) - your requirement/logic
- HDFS_PATH - HDFS path to your source directory (you can restrict the import to certain kinds of files by specifying the path as "/hdfspath/*.csv")
- sc - the SparkContext instance
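As a variation, here is a minimal sketch that works on the content string wholeTextFiles() already returns (reusing the data RDD from above), so each file is not read a second time with sc.textFile(). It assumes each file's full contents fit comfortably in executor memory, which wholeTextFiles() already requires.

// Sketch: reuse the (filename, content) pairs from wholeTextFiles above,
// counting lines that contain "a" without re-reading each file from HDFS.
val counts = data.map { case (filename, content) =>
  val numAs = content.split("\n").count(line => line.contains("a"))
  (filename, numAs)
}
counts.collect.foreach { case (filename, numAs) =>
  println(s"$filename: lines with a = $numAs")
}

The trade-off is that each whole file must fit in memory as a single record, whereas sc.textFile() processes the file line by line.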
01-09-2017
11:58 PM
@Rkg Grg
'last_ddltime' gets updated every time a table is modified. How are you replacing the data file of the table? I have always used LOAD DATA to replace the data from a new file, and it updates last_ddltime on every occasion:
LOAD DATA [LOCAL] INPATH '/pathToNewFile' OVERWRITE INTO TABLE tablename;
To convert a unixtime value to a timestamp:
select cast(from_unixtime(1483631785) AS timestamp);
01-09-2017
11:28 PM
2 Kudos
@Neeraj Joshi Currently, the HDPCD is a completely exercise-based exam in which you are given tasks covering Hive, Pig, Sqoop, and Flume, with clear instructions on what is expected from each task, where the input is located, where you must direct the output, and so on. Refer to the HDPCD Exam Objectives to aid your preparation. I took the exam yesterday, and I found the practice exam on AWS a great help in getting a feel for the actual exam pattern and environment. Best of luck.
01-09-2017
11:09 PM
1 Kudo
@Neeraj Joshi You will have access to both the vi and gedit editors. You may choose either one to write your script. Due to the time constraint and the poor exam interface, I also chose to write in gedit instead of vi. I took the exam yesterday, and I do not think this will change anytime soon 🙂 You may want to get your feet wet by taking the practice exam provided on AWS. This will give you a feel for the actual exam environment. Best of luck with your exam.