Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2128 | 07-09-2019 12:53 AM |
| | 12446 | 06-23-2019 08:37 PM |
| | 9560 | 06-18-2019 11:28 PM |
| | 10523 | 05-23-2019 08:46 PM |
| | 4894 | 05-20-2019 01:14 AM |
09-10-2018
09:55 PM
Please open a new topic, as your issue is unrelated to this one; keeping issues separate also improves the search experience. Effectively, your YARN ResourceManager is either (1) down due to a crash, the reason for which should be visible in the /var/log/hadoop-yarn/*.out files, or (2) not serving on the external address that quickstart.cloudera resolves to, in which case you need to ensure that 'nslookup $(hostname -f)' returns the VM's external address and not localhost/127.0.0.1.
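As a quick way to tell the two cases apart, a minimal sketch (the service name and log path follow the QuickStart VM defaults and may differ on your setup):
~> sudo service hadoop-yarn-resourcemanager status   # case (1): check whether the ResourceManager process is up
~> tail -n 50 /var/log/hadoop-yarn/*.out             # case (1): look for the crash reason in the logs mentioned above
~> nslookup $(hostname -f)                           # case (2): must return the VM's external address, not 127.0.0.1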
09-10-2018
09:36 PM
@Harsh J Yep, with

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>

it is working perfectly now. Thank you very much.
09-10-2018
08:00 PM
Here's what I do to build Apache Oozie 5.x from the CDH6 (6.0.0) sources:
~> git clone https://github.com/cloudera/oozie.git
~> cd oozie/ && git checkout cdh6.0.0
~> bin/mkdistro.sh -DskipTests -Puber
… (takes ~15+ minutes if building for the first time) …
~> ls -lh distro/target/
# Look for oozie-5.0.0-cdh6.0.0-distro.tar.gz
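Once the build finishes, the distro tarball can be unpacked wherever you intend to run or inspect it; a minimal sketch, assuming the tarball name from the listing above and /opt as an example target directory:
~> tar -xzf distro/target/oozie-5.0.0-cdh6.0.0-distro.tar.gz -C /opt   # unpack the built distribution (path taken from the listing above)
~> ls /opt   # inspect the extracted contents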
09-09-2018
07:48 PM
1 Kudo
Postgres is sensitive to how you connect to it. You should use the exact address it listens on, as only that will be allowed by the default configuration. Your command carries the IP 0.0.0.0. I am uncertain whether you have masked it or are genuinely passing a wildcard IP as the server address, but you should ideally use the exact hostname/IP that the Postgres service is listening on. Your Postgres service's configuration carries a listen_addresses entry that designates this. Take a look at this thread over at Stack Exchange: https://dba.stackexchange.com/a/84002
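As a quick check, a minimal sketch (the postgresql.conf path and the placeholders are assumptions; adjust them to your installation):
~> grep listen_addresses /var/lib/pgsql/data/postgresql.conf   # shows the address(es) Postgres is configured to bind to
~> psql -h <hostname-or-ip-from-listen_addresses> -p 5432 -U <user> -d <database>   # connect using exactly that address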
09-09-2018
07:32 PM
As the error notes, support for writing from a stream to a JDBC sink is not yet present in Spark: https://issues.apache.org/jira/browse/SPARK-19478 Take a look at this past thread, where an alternative, more direct approach is discussed: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Streaming-save-output-to-mysql-DB/td-p/25607
09-06-2018
08:00 PM
Thank you for following up here! [1] Glad to hear you were able to chase down the cause. [1] - https://xkcd.com/979/
09-06-2018
07:57 PM
1 Kudo
There are a few cons to raising your block size:

- Increased cost of recovery during write failures: When a client is writing a new block into the DataNode pipeline and one of the DataNodes fails, an enabled-by-default recovery feature will attempt to refill the gap in the replicated pipeline by transferring the partially written block from one of the remaining good DataNodes to a new DataNode. While this happens, the client is blocked (the outstream.write(…) caller is blocked inside the API code). With an increased block size, the time waited also increases greatly, depending on how much of the partial block data was written before the failure occurred. A worst-case example would be waiting for a network copy of 1.99 GiB of a 2 GiB block, because the involved DN may have failed at exactly that point.

- Cost of replication caused by DataNode loss or decommission: When a DataNode is lost or is being decommissioned, the system has to react by re-filling the gaps in replica counts that this creates. With smaller block sizes this activity is easy to spread randomly across the cluster, as several different nodes can take part in the re-replication process. With larger blocks only a few DNs can participate, and another consequence can be more lopsided space usage across DNs.

That said, using 1-2 GiB blocks is not unheard of, and I've seen a few large clusters apply that as their default block size. It's just worth being aware of the cons, looking out for such impact, and tuning accordingly as you go. HDFS certainly functions at its best with large files, and your usage seems in accordance with that.
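If you do experiment with larger blocks, the size can also be set per write rather than cluster-wide, and the resulting block placement can be checked afterwards; a minimal sketch (the file paths and the 2 GiB value are only examples):
~> hadoop fs -D dfs.blocksize=2147483648 -put big-file.dat /data/big-file.dat   # write a single file with 2 GiB blocks
~> hdfs fsck /data/big-file.dat -files -blocks -locations                       # show the blocks created and which DataNodes hold them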
09-06-2018
01:53 AM
This is related to the JobHistoryServer log reported earlier. Please ensure/perform the following items for the JHS and job completions to work properly.

First, ensure that 'mapred' and 'yarn' are both part of the common 'hadoop' group:

~> hdfs groups mapred
~> hdfs groups yarn

Both commands must include 'hadoop' in their outputs. If not, ensure the users are added to that group.

Second, ensure that all files and directories under the HDFS /tmp/logs aggregation dir (or whatever you've reconfigured it to use) and under /user/history/* have their group set to 'hadoop' and nothing else:

~> hadoop fs -chgrp -R hadoop /user/history /tmp/logs
~> hadoop fs -chmod -R g+rwx /user/history /tmp/logs

Note: the ACLs suggested earlier are not required to resolve this problem. The group used on these dirs is what matters in the default state, and the group setup described above is how the YARN and JHS daemon users share information and responsibilities with each other. You may remove any ACLs set, or leave them be, as they are still permissive.
09-06-2018
12:56 AM
1 Kudo
> I am little bit confused, so the WebHDFS REST API is listening on the same port as the NameNode's UI?

Yes, this is correct. The HTTP(S) serving port of the NameNode does multiple things: it serves the UI for browsers on / and a few other paths, serves the REST API on /webhdfs/*, etc. WebHDFS on the HDFS service is used by contacting the currently configured web port of the NameNode and DataNode(s) (the latter by following redirects, not directly). In your case the cluster is set to use HTTPS (TLS security), so you need to use port 50470, swebhdfs:// (note the s-prefix for security) in place of webhdfs://, and https:// in place of http:// when following any WebHDFS tutorial.
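For example, a minimal sketch of a WebHDFS call against the secure port (the hostname is a placeholder, and -k only skips certificate verification for the sake of the example):
~> curl -k "https://namenode.example.com:50470/webhdfs/v1/tmp?op=LISTSTATUS"   # list /tmp through the NameNode's HTTPS web port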
09-05-2018
05:14 AM
1 Kudo
This is shell behaviour, not Sqoop's. "When referencing a variable, it is generally advisable to enclose its name in double quotes." "Use double quotes to prevent word splitting. An argument enclosed in double quotes presents itself as a single word, even if it contains whitespace separators." - http://tldp.org/LDP/abs/html/quotingvar.html
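A minimal sketch of the difference, using a throwaway variable purely for illustration:
~> ARG='a b c'
~> printf '%s\n' $ARG     # unquoted: word splitting turns this into three separate arguments
~> printf '%s\n' "$ARG"   # quoted: passed through as one argument, whitespace intact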