Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1171 | 01-16-2018 03:38 PM |
| | 6139 | 11-13-2017 05:45 PM |
| | 3032 | 11-13-2017 12:30 AM |
| | 1518 | 10-27-2017 03:58 AM |
| | 28426 | 10-19-2017 03:17 AM |
01-10-2017
06:08 AM
3 Kudos
@Sridevi Kaup Ambari 2.4.2 is the most stable release as of today.
This is a maintenance release of Ambari 2.4.0. Ambari 2.4.0 added the following features:
- Version Definition File (AMBARI-15636)
- Audit Logging (AMBARI-15241)
- Ambari Management Packs (AMBARI-14682)
- Dynamic Stack Extensions (AMBARI-12885)
- Stack Featurization (AMBARI-13363)
- Spark2 Service (AMBARI-16753)
- Ambari Infra Service (AMBARI-17822)
- Zeppelin Service (AMBARI-15265)
- Logsearch Service - Beta Release (AMBARI-15139)
01-10-2017
06:05 AM
1 Kudo
@Ramya Grandhi Yes, you can; however, this will spike your CPU and RAM usage, affecting the overall response time of your system. You might notice sluggish performance, and the system might occasionally hang. To minimize these effects, ensure all other unnecessary applications and browsers are closed while you are using the Sandbox.
01-10-2017
01:32 AM
You're welcome. Could you kindly accept my answer if I have answered your question adequately?
01-10-2017
12:42 AM
@bhaskaran periasamy You must use the correct package in your Pig script:
data = LOAD 'db.table' USING org.apache.hive.hcatalog.pig.HCatLoader();
01-10-2017
12:21 AM
4 Kudos
@Adnan Alvee You could use wholeTextFiles() on the SparkContext in the Scala API. Here is a simple outline that avoids a separate spark-submit per file, saving you the 15-30 seconds of startup overhead per file by iterating over multiple files within the same job.

val data = sc.wholeTextFiles("HDFS_PATH")
val files = data.map { case (filename, content) => filename }

def doSomething(file: String) = {
  println(file)
  // your logic for processing a single file goes here
  val logData = sc.textFile(file)
  val numAs = logData.filter(line => line.contains("a")).count()
  println("Lines with a: %s".format(numAs))
  // save the RDD of processed data for this file to HDFS here
}

files.collect.foreach(filename => doSomething(filename))

where:
- data - org.apache.spark.rdd.RDD[(String, String)]
- files - org.apache.spark.rdd.RDD[String] (the filenames)
- doSomething(filename) - your requirement/logic
- HDFS_PATH - HDFS path to your source directory (you can restrict the import to certain kinds of files by specifying the path as "/hdfspath/*.csv")
- sc - the SparkContext instance
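As a variation, here is a minimal sketch that works on the content string wholeTextFiles() already returns (reusing the data RDD from above), so each file is not read a second time with sc.textFile(). It assumes each file's full contents fit comfortably in executor memory, which wholeTextFiles() already requires.

// Sketch: reuse the (filename, content) pairs from wholeTextFiles above,
// counting lines that contain "a" without re-reading each file from HDFS.
val counts = data.map { case (filename, content) =>
  val numAs = content.split("\n").count(line => line.contains("a"))
  (filename, numAs)
}
counts.collect.foreach { case (filename, numAs) =>
  println(s"$filename: lines with a = $numAs")
}

The trade-off is that each whole file must fit in memory as a single record, whereas sc.textFile() processes the file line by line.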
01-09-2017
11:58 PM
@Rkg Grg
'last_ddltime' gets updated every time a table is modified. How are you replacing the data file of the table? I have always used LOAD DATA to replace the data from a new file, and it updates last_ddltime on every occasion:
LOAD DATA [LOCAL] INPATH '/pathToNewFile' OVERWRITE INTO TABLE tablename;
To convert a unixtime value to a timestamp:
select cast(from_unixtime(1483631785) AS timestamp);
01-09-2017
11:28 PM
2 Kudos
@Neeraj Joshi Currently, the HDPCD is a completely exercise-based exam in which you are given tasks covering Hive, Pig, Sqoop, and Flume, with clear instructions on what is expected from each task, where the input is located, where you must direct the output, and so on. Refer to the HDPCD Exam Objectives to aid your preparation. I took the exam yesterday, and I found the practice exam on AWS a great help in getting a feel for the actual exam pattern and environment. Best of luck.
01-09-2017
11:09 PM
1 Kudo
@Neeraj Joshi You will have access to both the vi and gedit editors. You may choose either one to write your script. Due to the time constraint and the poor exam interface, I also chose to write in gedit instead of vi. I took the exam yesterday, and I do not think this will change anytime soon 🙂 You may want to get your feet wet by taking the practice exam provided on AWS. This will give you a feel for the actual exam environment. Best of luck with your exam.