About mkumar13

mkumar13 · 07-27-2016

Thanks @Arun A K, I'll verify your suggestions on my test case and let you know how it progresses.

mkumar13 · 07-26-2016

I just received feedback from the developers that, using the above approach, they are able to utilize 61 of the 64 virtual cores. But performance is still the bottleneck: the file still takes the same time to process. Does anybody have an idea what is going wrong?

mkumar13 · 07-26-2016

I think tuning the memory parameters relative to the file size is the best we can do to optimize Spark performance, unless the underlying program has already been tuned. Since I don't know which operations my team performs in the program, I have suggested verifying the following:
- We can set parallelism on the RDD like this (the second argument is the minimum number of partitions): val rdd = sc.textFile("somefile", 8)
- The second major factor in performance is security: wire encryption can have 2x overhead, and data encryption (Ranger KMS) could cause 15 to 20% overhead. Note: Kerberos has no impact.
- Another parameter to look at is the default queue for your spark-submit job. If the job goes to the default queue, override it with a more specialized queue using: --queue <queue name, if you have queues set up>
Please let me know if we should check anything else to gain performance.
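The 8 in sc.textFile("somefile", 8) is a minimum number of partitions, not an exact count. For a text file, Spark asks Hadoop's old-API (mapred) FileInputFormat for input splits, and the split size is capped by the block size. The Python sketch below mimics that split arithmetic for a single splittable file. It assumes a 128 MB block size and the default minimum split size of 1 byte; split_sizes is a hypothetical helper, not a Spark or Hadoop API. It hints at why raising the number may change little for a large file that already spans many blocks (and a gzip-compressed file, which is not splittable, still ends up in one partition).

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size
MB = 1024 * 1024


def split_sizes(file_size, min_partitions, block_size=128 * MB, min_size=1):
    """Approximate the split sizes FileInputFormat would produce for one file.

    Hypothetical helper: models only a single uncompressed, splittable file.
    """
    # Target size if the file were divided into exactly min_partitions pieces.
    goal_size = file_size // max(min_partitions, 1)
    # Splits never exceed the block size and never go below min_size.
    split_size = max(min_size, min(goal_size, block_size))
    splits = []
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining:
        splits.append(remaining)  # trailing (possibly short) split
    return splits


for parts in (2, 8, 16):
    print(parts, "->", len(split_sizes(1024 * MB, parts)), "splits for a 1 GB file")
```

Under these assumptions, asking for 2 or 8 partitions on a 1 GB file both yield eight 128 MB splits; only asking for more than eight (for example 16) makes the splits smaller than a block.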

mkumar13 · 07-26-2016

I have an 8-node Amazon cluster and I am trying to optimize my Spark job, but I am unable to bring the program execution time below 15 minutes. I have tried executing my Spark job with different memory parameters, but it does not accept them and always executes with 16 executors, even when I supply 21 or 33. Please help me understand the possible reasons. Below is my command:
nohup hadoop jar /var/lib/aws/emr/myjar.jar spark-submit --deploy-mode cluster --num-executors 17 --executor-cores 5 --driver-cores 2 --driver-memory 4g --class class_name s3:validator.jar -e runtime -v true -t true -r true &
Observation: When I pass 3 executors, it takes 4 by default and execution takes longer, but the other parameters have no effect.
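One possible reason a larger --num-executors has no effect is that YARN can only start as many executor containers as fit in each NodeManager, so extra executors are never granted. The node sizes are not given in the post, so the Python sketch below works through the container arithmetic with hypothetical numbers (12 GB per node and 4 GB executors are made up; 8 vcores per node matches the 64 virtual cores mentioned in a later post). It assumes Spark's default YARN memory overhead of max(384 MB, 10% of executor memory) and YARN's default yarn.scheduler.minimum-allocation-mb of 1024, and it ignores the ApplicationMaster container that hosts the driver in cluster mode.

```python
import math


def executors_per_node(node_mem_mb, node_vcores, executor_mem_mb, executor_cores,
                       min_alloc_mb=1024, check_vcores=True):
    """How many executor containers one NodeManager can host (simplified model)."""
    # Assumed Spark-on-YARN default overhead: max(384 MB, 10% of executor memory).
    overhead_mb = max(384, int(0.10 * executor_mem_mb))
    # YARN rounds each container request up to a multiple of the minimum allocation.
    container_mb = math.ceil((executor_mem_mb + overhead_mb) / min_alloc_mb) * min_alloc_mb
    by_memory = node_mem_mb // container_mb
    if not check_vcores:
        # A memory-only calculator (e.g. DefaultResourceCalculator) ignores vcores.
        return by_memory
    return min(by_memory, node_vcores // executor_cores)


def max_executors(nodes, requested, **sizing):
    """Executors YARN could start, ignoring the ApplicationMaster/driver container."""
    return min(requested, nodes * executors_per_node(**sizing))


# Hypothetical sizing: 12 GB and 8 vcores per node, 4 GB / 5-core executors.
sizing = dict(node_mem_mb=12288, node_vcores=8, executor_mem_mb=4096, executor_cores=5)
for requested in (3, 17, 21, 33):
    # Memory-only scheduling, as with the Capacity Scheduler's default calculator.
    print(requested, "->", max_executors(8, requested, check_vcores=False, **sizing))
```

With these made-up sizes, each 4 GB executor needs a 5 GB container, two fit per 12 GB node, and any request above 16 on 8 nodes is capped at 16; if vcores were also enforced, only one 5-core executor would fit per 8-vcore node. Whether this matches the cluster in the post depends on its actual yarn.nodemanager.resource.memory-mb and vcore settings.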

mkumar13 · 07-22-2016

Could someone please point me to a repository where I can find a ready-made OS image for Hadoop installation? I don't want to spend time on other configuration such as Java, Python, rpm, yum, network issues, etc.; I'm looking for an OS image that I can just download and use to start testing a few Hadoop components. I do have a few images, but they cause problems here and there before I reach the point where I can start the actual Hadoop installation. Lesser-known FTP URLs hosting OS images are also welcome.

mkumar13 · 07-20-2016

I am getting an unhandled error on the search page of examslocal (https://www.examslocal.com/). Below are the error details, and a screenshot of the error is attached.
Unhandled Error
You are signed in as mkumar13@xavient.com sign off
An error has occured
A run time error was generated while rendering . The exception message is: ID4223: The SamlSecurityToken is rejected because the SamlAssertion.NotOnOrAfter condition is not satisfied. NotOnOrAfter: '7/18/2016 1:52:35 PM' Current time: '7/20/2016 2:42:19 PM'

mkumar13 · 07-20-2016

Heterogeneous Storage in HDFS

Hadoop 2.6.0 introduced a new feature: heterogeneous storage. Heterogeneous storage lets each type of storage medium be used where its read and write characteristics give it an advantage, which makes it very suitable for storing cold data. Cold data needs large capacity but not high read/write performance, so it can be kept on the most common, inexpensive disks, while hot data can be stored on SSDs. On the other hand, when we need very efficient reads, ten or even a hundred times faster than an ordinary disk, data can even be stored directly in memory and lazily persisted to HDFS. With heterogeneous storage we do not need to build two separate clusters for cold and hot data: both classes of data can be stored within one cluster, so this feature has great practical significance. Here I introduce the heterogeneous storage types and how flexibly heterogeneous storage can be configured.

Typical use cases:
ARCHIVE - Ultra-cold data on very inexpensive disk storage, for example a bank's bill-imaging archive.
DISK - The default storage type, for large-scale deployments with ordinary I/O read/write workloads.
SSD - Efficient data queries, visualization and external data sharing where performance matters.
RAM_DISK - For extreme performance.
Hybrid disks - An SSD or HDD combined with SATA or SAS disks.

HDFS Storage Types
ARCHIVE - Archival storage is for very dense storage and is useful for rarely accessed data. This storage type is typically cheaper per TB than normal hard disks.
DISK - Hard disk drives are relatively inexpensive and provide sequential I/O performance. This is the default storage type.
SSD - Solid state drives are useful for storing hot data and I/O-intensive applications.
RAM_DISK - This special in-memory storage type is used to accelerate low-durability, single-replica writes.
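A DataNode learns which storage type each of its directories has from tags in dfs.datanode.data.dir, such as [SSD] or [ARCHIVE]; an untagged directory defaults to DISK. The Python sketch below, parse_data_dirs, is a hypothetical helper (not part of Hadoop) that reads such a value into (storage type, location) pairs, only to make the tagging convention concrete; the paths used are made up.

```python
import re

STORAGE_TYPES = {"ARCHIVE", "DISK", "SSD", "RAM_DISK"}
# "[TYPE]location": \w+ also matches the underscore in RAM_DISK.
_TAGGED = re.compile(r"^\[(\w+)\](.+)$")


def parse_data_dirs(value):
    """Split a dfs.datanode.data.dir value into (storage_type, location) pairs."""
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _TAGGED.match(entry)
        if match:
            storage_type, location = match.group(1), match.group(2)
            if storage_type not in STORAGE_TYPES:
                raise ValueError(f"unknown storage type: {storage_type}")
        else:
            # Untagged locations are treated as DISK, the default storage type.
            storage_type, location = "DISK", entry
        result.append((storage_type, location))
    return result


print(parse_data_dirs("[SSD]file:///grid/ssd0,[ARCHIVE]file:///grid/arc0,file:///grid/disk0"))
```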
HDFS has six preconfigured storage policies:
Hot - All replicas are stored on DISK.
Cold - All replicas are stored on ARCHIVE.
Warm - One replica is stored on DISK and the others are stored on ARCHIVE.
All_SSD - All replicas are stored on SSD.
One_SSD - One replica is stored on SSD and the others are stored on DISK.
Lazy_Persist - The replica is written to RAM_DISK and then lazily persisted to DISK.
In the next article I'll show practical usage: configuring HDFS storage settings and a storage policy for HDFS using Ambari. To be continued...
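The six policies above differ only in which storage type receives the first replica and which receives the remaining ones. The Python sketch below models that rule as a lookup table, assuming the block-placement table from the Apache Hadoop archival storage documentation (Lazy_Persist: one replica on RAM_DISK, the rest on DISK). replica_placement is a hypothetical helper: it ignores the fallback storage types HDFS uses when the preferred type is unavailable, and it does not model Lazy_Persist's asynchronous write to DISK.

```python
# policy -> (storage type of the first replica, storage type of the other replicas)
POLICIES = {
    "HOT": ("DISK", "DISK"),
    "WARM": ("DISK", "ARCHIVE"),
    "COLD": ("ARCHIVE", "ARCHIVE"),
    "ALL_SSD": ("SSD", "SSD"),
    "ONE_SSD": ("SSD", "DISK"),
    "LAZY_PERSIST": ("RAM_DISK", "DISK"),
}


def replica_placement(policy, replication=3):
    """List the storage type chosen for each replica under a storage policy."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    first, rest = POLICIES[policy.upper()]
    return [first] + [rest] * (replication - 1)


for name in ("Hot", "Warm", "Cold", "All_SSD", "One_SSD", "Lazy_Persist"):
    print(name, replica_placement(name))
```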

mkumar13 · 07-20-2016

To my knowledge, the NiFi service isn't officially supported yet.

mkumar13 · 07-20-2016

I want to start a Hadoop installation and test a few components without Ambari, but I'm confused about which of the OS images available to me to use:
CentOS 6
CentOS 5
Ubuntu 12
Ubuntu 14 (already tried; having some issues)
Red Hat 5
Solaris 10
Please advise which OS I should choose so that the other tools, like SSH, Python, Java, etc., need minimal configuration. Any link/URL pointing to a ready-to-use OS image download would be a great help!

mkumar13 · 07-19-2016

Check the Hive service start command at the link below:
https://community.hortonworks.com/questions/35195/how-can-i-check-ambari-start-services-commands.html#answer-35200

Last Visited	08-15-2019 08:33 PM

Member Since	05-05-2016 12:35 PM
Posts	147
Kudos received	222

Cloudera Community

Re: HDP3.0.1 Ambari unable to stop all services...

Re: Do we need to create a normal managed table be...

Re: Where can I find list of enhancements (Release...

Re: Spark performance parameter num-executors has ...

Re: How we can connect an external Hive table to a...

Re: Spark performance parameter num-executors has ...

Re: Spark performance parameter num-executors has ...

Re: Spark performance parameter num-executors has ...

Spark performance parameter num-executors has no e...

URL for OS image which is just ready for Hadoop in...

examslocal:Getting error while login...

Heterogeneous Storage in HDFS(Part-1)...

Re: Is NiFi Ambari integration is officially suppo...

advice what OS should i choose with minimum config...

Re: Cannot start Hiveserver2 - Python script has b...