Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2128 | 07-09-2019 12:53 AM |
| | 12446 | 06-23-2019 08:37 PM |
| | 9560 | 06-18-2019 11:28 PM |
| | 10523 | 05-23-2019 08:46 PM |
| | 4894 | 05-20-2019 01:14 AM |
09-10-2018
09:55 PM
Please open a new topic, as your issue is unrelated to this one; keeping issues separate also improves the search experience. Effectively, your YARN ResourceManager is either (1) down due to a crash, the reason for which should be visible in the /var/log/hadoop-yarn/*.out files, or (2) not serving on the external address that quickstart.cloudera resolves to, in which case you need to ensure that 'nslookup $(hostname -f)' returns the VM's external address and not localhost/127.0.0.1.
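As a quick way to tell the two cases apart, a minimal sketch (the service name and log path follow the QuickStart VM defaults and may differ on your setup):
~> sudo service hadoop-yarn-resourcemanager status   # case (1): check whether the ResourceManager process is up
~> tail -n 50 /var/log/hadoop-yarn/*.out             # case (1): look for the crash reason in the logs mentioned above
~> nslookup $(hostname -f)                           # case (2): must return the VM's external address, not 127.0.0.1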
09-10-2018
09:36 PM
@Harsh J Yep, with

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>

it is working perfectly now. Thank you very much.
09-10-2018
08:00 PM
Here's what I do to build Apache Oozie 5.x from the CDH6 (6.0.0) sources:
~> git clone https://github.com/cloudera/oozie.git
~> cd oozie/ && git checkout cdh6.0.0
~> bin/mkdistro.sh -DskipTests -Puber
… (takes ~15+ minutes if building for the first time) …
~> ls -lh distro/target/
# Look for oozie-5.0.0-cdh6.0.0-distro.tar.gz
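Once the build finishes, the distro tarball can be unpacked wherever you intend to run or inspect it; a minimal sketch, assuming the tarball name from the listing above and /opt as an example target directory:
~> tar -xzf distro/target/oozie-5.0.0-cdh6.0.0-distro.tar.gz -C /opt   # unpack the built distribution (path taken from the listing above)
~> ls /opt   # inspect the extracted contents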
09-09-2018
07:48 PM
1 Kudo
Postgres is sensitive to how you connect to it. You should use the exact address it listens on, as only that will be allowed by the default configuration. Your command carries the IP 0.0.0.0. I am uncertain whether you have masked it or are genuinely passing a wildcard IP as the server address, but you should ideally use the exact hostname/IP that the Postgres service is listening on. Your Postgres service's configuration carries a listen_addresses entry that designates this. Take a look at this thread over at Stack Exchange: https://dba.stackexchange.com/a/84002
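As a quick check, a minimal sketch (the postgresql.conf path and the placeholders are assumptions; adjust them to your installation):
~> grep listen_addresses /var/lib/pgsql/data/postgresql.conf   # shows the address(es) Postgres is configured to bind to
~> psql -h <hostname-or-ip-from-listen_addresses> -p 5432 -U <user> -d <database>   # connect using exactly that address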
09-09-2018
07:32 PM
As the error notes, support for writing from a stream to a JDBC sink is not yet present in Spark: https://issues.apache.org/jira/browse/SPARK-19478 Take a look at this past thread, where an alternative, more direct approach is discussed: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Streaming-save-output-to-mysql-DB/td-p/25607
09-06-2018
08:00 PM
Thank you for following up here! [1] Glad to hear you were able to chase down the cause. [1] - https://xkcd.com/979/
09-06-2018
07:57 PM
1 Kudo
There are a few cons to raising your block size:

- Increased cost of recovery during write failures: When a client is writing a new block into the DataNode pipeline and one of the DataNodes fails, an enabled-by-default recovery feature will attempt to refill the gap in the replicated pipeline by transferring the partially written block from one of the remaining good DataNodes to a new DataNode. While this happens, the client is blocked (the outstream.write(…) caller is blocked inside the API code). With an increased block size, the time waited also increases greatly, depending on how much of the partial block data was written before the failure occurred. A worst-case example would be waiting for a network copy of 1.99 GiB of a 2 GiB block, because the involved DN may have failed at exactly that point.

- Cost of replication caused by DataNode loss or decommission: When a DataNode is lost or is being decommissioned, the system has to react by re-filling the gaps in replica counts that this creates. With smaller block sizes this activity is easy to spread randomly across the cluster, as several different nodes can take part in the re-replication process. With larger blocks only a few DNs can participate, and another consequence can be more lopsided space usage across DNs.

That said, using 1-2 GiB blocks is not unheard of, and I've seen a few large clusters apply that as their default block size. It's just worth being aware of the cons, looking out for such impact, and tuning accordingly as you go. HDFS certainly functions at its best with large files, and your usage seems in accordance with that.
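If you do experiment with larger blocks, the size can also be set per write rather than cluster-wide, and the resulting block placement can be checked afterwards; a minimal sketch (the file paths and the 2 GiB value are only examples):
~> hadoop fs -D dfs.blocksize=2147483648 -put big-file.dat /data/big-file.dat   # write a single file with 2 GiB blocks
~> hdfs fsck /data/big-file.dat -files -blocks -locations                       # show the blocks created and which DataNodes hold them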
09-06-2018
01:53 AM
This is related to the JobHistoryServer log reported earlier. Please ensure/perform the following items for the JHS and job completions to work properly.

First, ensure that 'mapred' and 'yarn' are both part of the common 'hadoop' group:

~> hdfs groups mapred
~> hdfs groups yarn

Both commands must include 'hadoop' in their outputs. If not, ensure the users are added to that group.

Second, ensure that all files and directories under the HDFS /tmp/logs aggregation dir (or whatever you've reconfigured it to use) and under /user/history/* have their group set to 'hadoop' and nothing else:

~> hadoop fs -chgrp -R hadoop /user/history /tmp/logs
~> hadoop fs -chmod -R g+rwx /user/history /tmp/logs

Note: the ACLs suggested earlier are not required to resolve this problem. The group used on these dirs is what matters in the default state, and the group setup described above is how the YARN and JHS daemon users share information and responsibilities with each other. You may remove any ACLs set, or leave them be, as they are still permissive.
09-06-2018
12:56 AM
1 Kudo
> I am little bit confused, so the WebHDFS REST API is listening on the same port as the NameNode's UI?

Yes, this is correct. The HTTP(S) serving port of the NameNode does multiple things: it serves the UI for browsers on / and a few other paths, serves the REST API on /webhdfs/*, etc. WebHDFS on the HDFS service is used by contacting the currently configured web port of the NameNode and DataNode(s) (the latter by following redirects, not directly). In your case the cluster is set to use HTTPS (TLS security), so you need to use port 50470, swebhdfs:// (note the s-prefix for security) in place of webhdfs://, and https:// in place of http:// when following any WebHDFS tutorial.
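For example, a minimal sketch of a WebHDFS call against the secure port (the hostname is a placeholder, and -k only skips certificate verification for the sake of the example):
~> curl -k "https://namenode.example.com:50470/webhdfs/v1/tmp?op=LISTSTATUS"   # list /tmp through the NameNode's HTTPS web port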
09-05-2018
05:14 AM
1 Kudo
This is shell behaviour, not Sqoop's. "When referencing a variable, it is generally advisable to enclose its name in double quotes." "Use double quotes to prevent word splitting. An argument enclosed in double quotes presents itself as a single word, even if it contains whitespace separators." - http://tldp.org/LDP/abs/html/quotingvar.html
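A minimal sketch of the difference, using a throwaway variable purely for illustration:
~> ARG='a b c'
~> printf '%s\n' $ARG     # unquoted: word splitting turns this into three separate arguments
~> printf '%s\n' "$ARG"   # quoted: passed through as one argument, whitespace intact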