Member since 05-05-2016
147 Posts
223 Kudos Received
18 Solutions
My Accepted Solutions
Views | Posted
---|---
3649 | 12-28-2018 08:05 AM
3610 | 07-29-2016 08:01 AM
2968 | 07-29-2016 07:45 AM
6882 | 07-26-2016 11:25 AM
1362 | 07-18-2016 06:29 AM
07-27-2016
10:18 AM
Thanks @Arun A K, I'll verify the suggestions on my test case and let you know how it goes.
07-26-2016
02:36 PM
1 Kudo
I just received feedback from the developers: with the above approach they are able to utilize 61 of the 64 virtual cores. But performance is still the bottleneck, meaning the file still takes the same time to process. Does anybody have an idea what is going wrong?
07-26-2016
11:25 AM
1 Kudo
I think tuning the memory parameters to match the file size is the best we can do to optimize Spark performance, unless the underlying program has already been tuned. I don't know which operations my team performs in the program, but I have suggested verifying the following:

We can set parallelism at the RDD level, for example: val rdd = sc.textFile("somefile", 8)

The second major factor in performance is security: wire encryption can add about 2x overhead, and data encryption (Ranger KMS) could cause 15 to 20% overhead. Note: Kerberos has no impact.

Another parameter to look at is the queue your spark-submit job goes to. If it goes to the default queue, override it with a more specialized queue using the parameter --queue <if you have queues set up>

Please let me know if there is anything else we should check to gain performance.
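As a rough illustration of the parallelism point above, the sketch below estimates how many partitions are needed to keep a cluster's cores busy. It is plain Python arithmetic, not Spark code. The function names are hypothetical, the 64-core figure is only an example value, and the default of 2 tasks per core follows the general guidance in the Spark tuning guide (2 to 3 tasks per CPU core). Keep in mind that the second argument of sc.textFile is a minimum number of partitions, so Spark may create more for a large file.

```python
import math


def busy_core_fraction(num_partitions: int, total_cores: int) -> float:
    """Upper bound on the fraction of cores that can work at the same time.

    Each partition is processed by one task, and a task occupies one core
    (assuming one CPU per task), so at most min(partitions, cores) cores
    are busy within a single stage.
    """
    if num_partitions <= 0 or total_cores <= 0:
        raise ValueError("partitions and cores must be positive")
    return min(num_partitions, total_cores) / total_cores


def suggested_min_partitions(total_cores: int, tasks_per_core: int = 2) -> int:
    """Rule-of-thumb partition count: a few tasks for every core."""
    if total_cores <= 0 or tasks_per_core <= 0:
        raise ValueError("cores and tasks_per_core must be positive")
    return total_cores * tasks_per_core


def task_waves(num_partitions: int, total_cores: int) -> int:
    """Number of rounds of tasks needed to process every partition once."""
    if num_partitions <= 0 or total_cores <= 0:
        raise ValueError("partitions and cores must be positive")
    return math.ceil(num_partitions / total_cores)


if __name__ == "__main__":
    cores = 64  # example value, not a measurement
    for partitions in (8, 64, suggested_min_partitions(cores)):
        print(
            partitions,
            "partitions:",
            busy_core_fraction(partitions, cores),
            "of cores busy,",
            task_waves(partitions, cores),
            "wave(s)",
        )
```

If the input really ends up with only 8 partitions, at most 8 tasks run at once no matter how many executors are requested, so the partition count is worth checking before tuning memory.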
07-26-2016
08:58 AM
2 Kudos
I have an 8-node Amazon cluster and I am trying to optimize my Spark job, but I am unable to bring the program's execution time below 15 minutes. I have tried running the job with different memory parameters, but they are not accepted: it always runs with 16 executors, even when I supply 21 or 33. Please help me understand the possible reasons. My command is below:

nohup hadoop jar /var/lib/aws/emr/myjar.jar spark-submit --deploy-mode cluster --num-executors 17 --executor-cores 5 --driver-cores 2 --driver-memory 4g --class class_name s3:validator.jar -e runtime -v true -t true -r true &

Observation: when I pass 3 executors, it takes 4 by default and execution takes longer, but the other parameters have no effect.
Labels: Apache Spark
07-22-2016
07:48 AM
1 Kudo
Could someone please point me to a repository where I can find a ready-made OS image for Hadoop installation? I don't want to spend time on other configuration such as Java, Python, rpm, yum, or network issues; I am looking for an OS image that I can just download and use to start testing a few Hadoop components. I do have a few images, but they cause problems here and there before I reach the point where I can start the actual Hadoop installation. Unknown FTP URLs where OS images are hosted are also welcome.
Labels: Apache Hadoop
07-20-2016
02:48 PM
I am getting an Unhandled Error on the search page of examslocal (https://www.examslocal.com/). The error details are below, and a screenshot of the error is attached.

Unhandled Error
You are signed in as mkumar13@xavient.com sign off
An error has occured
A run time error was generated while rendering . The exception message is: ID4223: The SamlSecurityToken is rejected because the SamlAssertion.NotOnOrAfter condition is not satisfied. NotOnOrAfter: '7/18/2016 1:52:35 PM' Current time: '7/20/2016 2:42:19 PM'
07-20-2016
02:24 PM
2 Kudos
Heterogeneous Storage in HDFS

Hadoop version 2.6.0 introduced a new feature: heterogeneous storage. Heterogeneous storage lets each storage medium play to its own strengths in read and write characteristics. This is very suitable for cold data: cold data needs large-capacity storage where high read and write performance is not required, so ordinary disks, the most common medium, can be used for it. Hot data, on the other hand, can be stored on SSD. When efficient reads are required, storage that is ten or even a hundred times faster than an ordinary disk can be used, and data can even be kept directly in memory and lazily written to HDFS. With HDFS heterogeneous storage we do not need to build two separate clusters for cold and hot data; both classes can be stored within one cluster, so this feature has great practical significance. Here I introduce the heterogeneous storage types and how to configure heterogeneous storage flexibly.
ARCHIVE - Ultra-cold data storage on very inexpensive hard disks, for scenarios such as bank notes and video systems.
DISK - Large-scale deployments with sequential read and write I/O; the default storage type.
SSD - Efficient data queries, visualization, and external data sharing, where performance needs to improve.
RAM_DISK - For extreme performance.
Hybrid disk - An SSD or an HDD plus SATA or SAS.
HDFS Storage Types
ARCHIVE - Archival storage is for very dense storage and is useful for rarely accessed data. This storage type is typically cheaper per TB than normal hard disks.
DISK - Hard disk drives are relatively inexpensive and provide sequential I/O performance. This is the default storage type.
SSD - Solid state drives are useful for storing hot data and I/O-intensive applications.
RAM_DISK - This special in-memory storage type is used to accelerate low-durability, single-replica writes.

HDFS Storage Policies
HDFS has six preconfigured storage policies:
Hot - All replicas are stored on DISK.
Cold - All replicas are stored on ARCHIVE.
Warm - One replica is stored on DISK and the others are stored on ARCHIVE.
All_SSD - All replicas are stored on SSD.
One_SSD - One replica is stored on SSD and the others are stored on DISK.
Lazy_Persist - The replica is written to RAM_DISK and then lazily persisted to DISK.
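
To make the policy descriptions above concrete, here is a minimal Python sketch that maps each preconfigured policy to the storage type of every replica for a given replication factor. It is only a model of the descriptions listed here, not the HDFS API: `POLICIES` and `replica_placement` are hypothetical names, and the Lazy_Persist row assumes that the first replica goes to RAM_DISK and any further replicas go to DISK.

```python
from typing import Dict, List, Tuple

# Policy name -> (storage type of the first replica, storage type of the rest).
# This table is an illustrative model of the six policies described above.
POLICIES: Dict[str, Tuple[str, str]] = {
    "Hot": ("DISK", "DISK"),
    "Cold": ("ARCHIVE", "ARCHIVE"),
    "Warm": ("DISK", "ARCHIVE"),
    "All_SSD": ("SSD", "SSD"),
    "One_SSD": ("SSD", "DISK"),
    "Lazy_Persist": ("RAM_DISK", "DISK"),  # assumption: extra replicas on DISK
}


def replica_placement(policy: str, replication: int = 3) -> List[str]:
    """Return the storage type used for each replica under a policy."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if policy not in POLICIES:
        raise ValueError(f"unknown storage policy: {policy}")
    first, rest = POLICIES[policy]
    return [first] + [rest] * (replication - 1)


if __name__ == "__main__":
    for name in POLICIES:
        print(name, replica_placement(name))
```

For example, with a replication factor of 3, `replica_placement("Warm")` gives one DISK replica and two ARCHIVE replicas, which matches the Warm description above.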
In the next article I'll show practical usage of the HDFS storage settings and how to set a Storage Policy for HDFS using Ambari. To be continued.
07-20-2016
11:17 AM
To my knowledge, the NiFi service isn't official yet.
07-20-2016
07:32 AM
2 Kudos
I want to start a Hadoop installation and test a few components without Ambari, but I am confused about which of the OS images available to me I should use:

- CentOS 6
- CentOS 5
- Ubuntu 12
- Ubuntu 14 (already tried, having some issues)
- Red Hat 5
- Solaris 10

Please advise which OS I should choose, with the minimum configuration required for other tools such as SSH, Python, and Java. If you have any link or URL to an OS image download that is ready for use, that would be a great help!
Labels: Apache Hadoop
07-19-2016
10:06 AM
Check the Hive service start command in the answer below: https://community.hortonworks.com/questions/35195/how-can-i-check-ambari-start-services-commands.html#answer-35200