Member since: 02-10-2015
Posts: 84
Kudos Received: 2
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 13444 | 06-04-2015 06:09 PM |
| | 7349 | 05-22-2015 06:59 AM |
| | 5997 | 05-13-2015 03:19 PM |
| | 2434 | 05-11-2015 05:22 AM |
05-22-2015
01:11 PM
Also, what should the Spark user's HDFS folder structure look like? So far I have only one HDFS folder: /user/spark/applicationHistory
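For comparison, this is how I'd dump what actually exists under the Spark user's home; the "expected" layout in the comments is an assumption based on a typical CDH 5 setup, not a guarantee:

sudo -u hdfs hdfs dfs -ls -R /user/spark
# Typically expected on CDH 5 (assumption):
#   /user/spark/applicationHistory              <- Spark event logs
#   /user/spark/share/lib/spark-assembly.jar    <- present only after 'Upload Spark Jar'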
05-22-2015
12:46 PM
Should I unset it? CM keeps complaining...
05-22-2015
10:53 AM
Interesting... Somehow, the Spark parameter spark_jar_hdfs_path is set to the HDFS value '/user/spark/share/lib/spark-assmbly.jar' and CM complains about 'Failed parameter validation'! Should I unset it?
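Before unsetting anything, it may be worth checking whether that exact path exists. Note the parameter value above spells 'spark-assmbly.jar'; if that spelling is literal, it would never match an uploaded 'spark-assembly.jar'. A simple check:

hdfs dfs -ls /user/spark/share/lib
# An empty/missing folder, or a filename that doesn't match the parameter
# value byte-for-byte, could explain the validation failure.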
05-22-2015
09:43 AM
Cool! I'll do the same for the SNN's HDFS disk.

<Q1> How does Hadoop know which HDFS folder/file to use: the one(s) on the MASTER or the one(s) on the DATA nodes? Is it the HDFS parameter 'dfs.namenode.edits.dir' that will be set to the directory created on the MASTER? (I guess, based on the RF (Replication Factor), files could be anywhere...) (It will definitely be faster for the MASTER if it has to write to its own local disks...)

<Q2> Should I use RAID-1 for the 2nd 300GB disk (the one that will hold CM's logs) on the MASTER? (I guess I should!)
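On <Q1>, a quick sanity check is to dump the NameNode metadata directories; they resolve to local file:// paths on the master host, not HDFS paths (the example output below is illustrative, not taken from this cluster):

hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.namenode.edits.dir
# Example output (illustrative): file:///dfs/nn
# i.e. the fsimage/edits live on the NameNode's own local disks, while the
# DataNode disks (dfs.datanode.data.dir) hold the replicated blocks.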
05-22-2015
09:14 AM
The Spark Jar Location (HDFS) (spark_jar_hdfs_path) parameter is set to /user/spark/share/lib/spark-assembly.jar. However, the HDFS file /user/spark/share/lib/spark-assembly.jar is NOT there! The only Spark HDFS folder/file that exists is /user/spark/applicationHistory. Although I have run 'Upload Spark Jar' via CM (from the Actions drop-down) successfully (at least that's what CM tells me), when I check the Spark HDFS folders/files, the jar (spark-assembly.jar) is not there!
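If the CM action keeps reporting success without producing the file, here is a manual upload sketch; the local parcel path is an assumption based on a standard CDH parcel layout, so adjust it to your install:

sudo -u spark hdfs dfs -mkdir -p /user/spark/share/lib
# The glob below should match exactly one jar in a parcel install (assumption).
sudo -u spark hdfs dfs -put \
    /opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly*.jar \
    /user/spark/share/lib/spark-assembly.jar
hdfs dfs -ls /user/spark/share/lib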
Labels:
- Apache Spark
05-22-2015
07:49 AM
Wilfred thank you! Some clarifications.

MASTER Node Disk Layout (Total of 4x300GB HDs)
================
-- 2 disks for OS (RAID-1)
-- 1 disk for apps & logs (CM's logs etc...)
-- 1 disk (JBOD) for HDFS (what will be stored here?)

DATA Nodes Disk Layout (Total of 25x300GB HDs)
===============
-- 2 disks for OS (RAID-1)
-- 23 disks (JBODs) for HDFS (see the sketch after this post)
   1. Does it make a difference if the # of disks is even or odd?
   2. Should I go for higher-capacity disks and fewer of them, i.e. 6x1.2TB HDs?

DEFINITELY SPARK ON YARN!!!! The link for the YARN tuning configuration is great!!! Please provide a link for tuning network traffic within the cluster (data movement among nodes in the cluster vs. data ingestion from sources).
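A minimal sketch of how the JBOD mounts end up in HDFS config, assuming mount points named /data/1 ... /data/23 (those names are an assumption):

hdfs getconf -confKey dfs.datanode.data.dir
# Example (illustrative):
#   file:///data/1/dfs/dn,file:///data/2/dfs/dn,...,file:///data/23/dfs/dn
# With the default round-robin volume-choosing policy, HDFS spreads block
# writes across all listed directories, so an odd vs. even disk count makes
# no functional difference.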
05-22-2015
06:59 AM
Thank you for your comment! The issue has been resolved; it had to do with permissions! I had to reset the mode of the /user/history/done folder; /user/history/done_intermediate was fine!
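For anyone landing here later, the fix amounted to something along these lines; the exact mode bits are an assumption on my part, mirroring the sticky-bit pattern already on done_intermediate, so pick what matches your security posture:

# Re-open /user/history/done so the JobHistoryServer can serve the new
# user's jobs; 1777 mirrors done_intermediate's drwxrwxrwt mode (assumption).
sudo -u hdfs hdfs dfs -chmod -R 1777 /user/history/done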
05-20-2015
11:13 AM
That's a good start 🙂 For argument's sake, I am planning on provisioning 1 MASTER node w/ 2 CPUs (Intel E5-2690 v2 @ 3.00GHz, 10 cores each) and 256GB of RAM. Do I turn CPU multi-threading on? (It is actually on by default, which means I get 40 CPU threads.) I will configure 4x300GB disks:
-- 2 disks for OS (RAID-1)
-- 2 disks for apps & logs (RAID-1)
DO I NEED TO CONFIGURE ANY DISKS FOR HDFS ON THE MASTER?
---
For the DATA nodes (3 of them), I plan to have the same CPU/RAM setup as the MASTER. I will configure 25x300GB disks:
-- 2 disks for OS (RAID-1)
-- 2 disks for apps & logs (RAID-1)
-- 21 disks for HDFS (JBODs)
===
Based on the above settings, and the fact that CM and almost all CDH services will be running on the MASTER while DataNodes, Spark Workers, and RegionServers will be running on the DATA nodes, how does it look? Do you have any links/docs to share about the ratio of cores to memory to disks to workload? Also, some useful documentation about configuring YARN's containers would be great! Cheers!
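To make the container question concrete, here is a back-of-the-envelope sizing sketch; the reservation and container sizes below are illustrative assumptions, not CDH defaults:

# Rough YARN container count per DATA node (illustrative numbers only).
TOTAL_MB=$((256 * 1024))     # physical RAM per node
RESERVED_MB=$((32 * 1024))   # OS + DataNode/RegionServer daemons (assumption)
CONTAINER_MB=4096            # per-container allocation (assumption)
echo $(( (TOTAL_MB - RESERVED_MB) / CONTAINER_MB ))   # -> 56 containers/node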
05-19-2015
07:57 AM
I have created a new user (user2), and when running any MapReduce jobs the JobHistoryServer portal (port 19888) does NOT show any! I can see the job from within the ResourceManager Web UI (port 8088), but when I click on the 'History' link (under the Tracking UI column) I get the error: "Not Found: job_xxxxxxxxxxxxx_xxxx". This happens for MapReduce jobs only! When I run Spark jobs (under the same userid, user2) I am able to see the 'History' logs! An existing userid (user1) works fine!

It seems to me that the issue is permissions-related. Here are the HDFS permissions of the userids:

[root@master ~]# hdfs dfs -ls /tmp/logs
Found 5 items
drwxrwxrwt - user1 hadoop 0 2015-04-20 08:48 /tmp/logs/user1
drwxrwxrwt - hdfs supergroup 0 2015-04-09 14:59 /tmp/logs/hdfs
drwxrwxrwx - user2 hadoop 0 2015-05-06 16:23 /tmp/logs/user2
drwxrwxrwt - root hadoop 0 2015-04-09 16:46 /tmp/logs/root

Also, here are the permissions of the HDFS /user/history/done and /user/history/done_intermediate folders:

[root@master ~]# hdfs dfs -ls /user/history
Found 2 items
drwxrwx--- - mapred hadoop 0 2015-05-11 10:32 /user/history/done
drwxrwxrwt - mapred hadoop 0 2015-05-18 14:39 /user/history/done_intermediate
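A diagnostic that helped narrow this down (user2 is the userid from the post above; run as a user that can read the done tree):

# Check whether user2's finished jobs ever landed under /user/history/done,
# and with what ownership/mode.
sudo -u mapred hdfs dfs -ls -R /user/history/done | grep user2
# No matching entries could point at the done folder's restrictive
# drwxrwx--- mode blocking history from being served for the new user.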
05-15-2015
12:16 PM
Looking for Best Practices! With almost all CDH services (and CM) on the Master node, and YARN's NodeManagers, Spark's Workers, HDFS's DataNodes, and HBase's RegionServers on the Data nodes, what type of CPU configuration would be suitable? For instance, should I provision the Master host with 20 cores at 3.00GHz (Intel Xeon CPU E5-2690 v2, Ivy Bridge: 2 CPUs with 10 cores per socket at 3.00GHz)? Should I provision the Data hosts with 24 cores at 2.70GHz (Intel Xeon CPU E5-2697 v2, Ivy Bridge: 2 CPUs with 12 cores per socket, but at 2.70GHz)? Again, looking for the ultimate configuration, optimizing both core count and clock speed...