Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2244 | 07-09-2019 12:53 AM |
|  | 12768 | 06-23-2019 08:37 PM |
|  | 9841 | 06-18-2019 11:28 PM |
|  | 10828 | 05-23-2019 08:46 PM |
|  | 5096 | 05-20-2019 01:14 AM |
04-09-2019
10:00 PM
1 Kudo
To add on: if you will not require audits or lineage at all for your cluster, you can also choose to disable their creation:

- Impala → Configuration → "Enable Impala Lineage Generation" (uncheck)
- Impala → Configuration → "Enable Impala Audit Event Generation" (uncheck)

If you are using Navigator with Cloudera Enterprise, then these audit and lineage files should be sent automatically to the Navigator services. If they are not coming through, it may indicate a problem in the pipeline - please raise a support case if so.
04-03-2019
06:53 PM
Is the job submitted to the source cluster, or the destination? A DistCp job should only need to contact the NodeManagers of the cluster it runs on, but if the job is submitted from a remote cluster, then those ports may need to be opened. The HDFS transfer part does not involve YARN service communication at all, so it is not expected to contact a NodeManager. It would be helpful if you could share some more logs leading up to the observed failure.
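As a concrete sketch (the host names and paths below are placeholders, not values from this thread), a pull-style DistCp run from an edge node of the destination cluster keeps the YARN-side traffic local to that cluster:

```shell
# Placeholders - substitute your own NameNode addresses and paths.
SRC="hdfs://nn-source.example.com:8020/data/events"
DST="hdfs://nn-dest.example.com:8020/backups/events"

# -update copies only files that changed; -p preserves file attributes.
CMD="hadoop distcp -update -p $SRC $DST"
echo "$CMD"   # run this on an edge node of the destination cluster
```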
04-01-2019
06:55 PM
1 Kudo
Could you share the full log from this failure, both from the Oozie server for the action ID and from the action's launcher-job map task? Port 8042 is the NodeManager HTTP port, used to serve logs of live containers, among other status details, over REST. It is not directly used by DistCp, but MapReduce and Oozie diagnostics may invoke it in response to a failure, so it is a secondary symptom. Note, though, that running DistCp via Oozie requires you to provide configs that ensure delegation tokens for both kerberized clusters are acquired. Use "mapreduce.job.hdfs-servers" with a value such as "hdfs://namenode-cluster-1,hdfs://namenode-cluster-2" to influence the Oozie server's delegation token acquisition phase. This is only relevant if you use Kerberos on both clusters.
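For illustration only - this is a minimal sketch, and the cluster names simply reuse the placeholder values above - the property can be set directly inside an Oozie distcp action:

```xml
<!-- Sketch of an Oozie distcp action (workflow.xml fragment). -->
<action name="copy-data">
  <distcp xmlns="uri:oozie:distcp-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <property>
        <!-- Makes the Oozie server acquire HDFS delegation tokens
             for BOTH kerberized clusters before launching the job. -->
        <name>mapreduce.job.hdfs-servers</name>
        <value>hdfs://namenode-cluster-1,hdfs://namenode-cluster-2</value>
      </property>
    </configuration>
    <arg>hdfs://namenode-cluster-1/source/path</arg>
    <arg>hdfs://namenode-cluster-2/target/path</arg>
  </distcp>
</action>
```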
03-07-2019
06:23 PM
1 Kudo
You'll need to use lsof with a PID specifier (lsof -p PID). The PID must be your target RegionServer's Java process (find it via 'ps aux | grep regionserver' or similar). In the output, you should be able to classify the items as network (sockets), filesystem (files), etc., and the interest would be in whichever holds the highest share. For example, if you see a lot more sockets hanging around, check their state (CLOSE_WAIT, etc.); if it is local filesystem files, investigate whether those files appear relevant. If you can pastebin your lsof result somewhere, I can take a look.
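A minimal sketch of that classification step (the pgrep pattern is an assumption about a typical HBase deployment - adjust it for yours):

```shell
# Find the RegionServer's Java process (pattern is an assumption).
RS_PID=$(pgrep -f HRegionServer | head -n1)

# Summarize open descriptors by lsof's TYPE column (5th field):
# IPv4/IPv6 = sockets, REG = regular files, DIR = directories, etc.
lsof -p "$RS_PID" \
  | awk 'NR > 1 {print $5}' \
  | sort | uniq -c | sort -rn
# A dominant IPv4/IPv6 count points at sockets (inspect their state);
# a dominant REG count points at local files such as store files.
```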
03-06-2019
11:42 PM
1 Kudo
MapReduce jobs can be submitted with ease, since mostly all they require is the correct configuration on the classpath (such as under src/main/resources for Maven projects). Spark/PySpark relies heavily on its script tooling to submit to a remote cluster, so achieving this is a little more involved. IntelliJ IDEA has a remote execution option in its run targets that can be configured to copy over the built jar and invoke an arbitrary command on an edge host; this can perhaps be combined with remote debugging to get an experience equal to MapReduce. Another option is to use a web-based editor such as CDSW.
03-06-2019
07:30 PM
1 Kudo
> Can we deploy the HttpFS role on more than one node? Is it best practice?

Yes. The HttpFS service is an endpoint for REST API access to HDFS, so you can deploy multiple instances and also consider load balancing them (you might need sticky sessions for paged data reads).

> We can see that new logs are created under /opt/hadoop/dfs/nn/current on the active NameNode on node01, but no new files on the standby NameNode on node02 - is it OK?

Yes, this is normal. That redundant local copy of the edit logs is written only by the active NameNode; at all times, the edits are primarily written into the JournalNode directories.
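For illustration, a sketch of a WebHDFS call going through HttpFS (the host and user are placeholders; 14000 is the usual HttpFS port):

```shell
# With a load balancer in front of several HttpFS instances, clients
# keep one stable URL regardless of which instance serves the request.
HTTPFS="http://httpfs-lb.example.com:14000"
HDFS_USER="hdfs"

URL="$HTTPFS/webhdfs/v1/tmp?op=LISTSTATUS&user.name=$HDFS_USER"
echo "$URL"
# curl -s "$URL"   # returns a JSON FileStatuses listing of /tmp
```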
03-06-2019
06:33 PM
It is not normal to see the file descriptor limit run out, or come close to the limit, unless you have an overload problem of some form. I'd recommend checking via 'lsof' what the major contributor to the FD count of your RegionServer process is - chances are it is avoidable (a bug, a flawed client, etc.). The number should be proportional to your total region store file count and the number of connecting clients. While the article at https://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/ focuses on DataNode data transceiver threads in particular, the formulae at the end can be applied similarly to file descriptors in general too.
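A back-of-envelope sketch in the same spirit as the article's formulae - every input number below is an illustrative assumption, not a measurement from any real cluster:

```shell
# Rough FD estimate for a RegionServer; all inputs are assumptions.
REGIONS=200
STOREFILES_PER_REGION=5   # each open store file holds a descriptor
CLIENTS=100               # connected client sockets
OVERHEAD=300              # jars, log files, internal sockets, etc.

FDS=$(( REGIONS * STOREFILES_PER_REGION + CLIENTS + OVERHEAD ))
echo "$FDS"   # ~1400 here, far below a typical raised ulimit of 32768
```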
03-06-2019
05:20 PM
The issue appears to crop up when distributing certain configuration files in preparation for installing packages. Could you check, or share, what the failure is via the log files present under /tmp/scm_prepare_node.*/*?
03-06-2019
04:30 PM
Currently, Hive's connections to LDAP do not support the StartTLS extension [1]. This does make sense as a feature request, however - could you please log your request over at https://issues.apache.org/jira/projects/HIVE?

[1] - https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/auth/ldap/LdapSearchFactory.java#L52-L62
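In the meantime, a common workaround is LDAPS (TLS negotiated from the first byte, usually on port 636) rather than StartTLS. A hive-site.xml sketch, with a placeholder host and base DN:

```xml
<!-- Sketch: point HiveServer2's LDAP authentication at an ldaps:// URL.
     Host name and base DN below are placeholders. -->
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldaps://ldap.example.com:636</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>ou=people,dc=example,dc=com</value>
</property>
```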
03-05-2019
09:46 PM
1 Kudo
> Clear Cache
> This is the one I am not too sure what happens

It appears to clear the cached entries within the Hue frontend, so the metadata for the assist panel and views is loaded again from its source (Impala, etc.). I don't see it calling a refresh on the tables, but it is possible I missed some implicit action.

> Perform Incremental Metadata Update
> I assume this issues a refresh command for all tables within the current database that is being viewed? If no database is viewed, does it do it for everything?

This compares the HMS listing against Impala's for the database in context, and runs a targeted "INVALIDATE METADATA [db.]table;" for each one missing in Impala. Yes, if no database is in context, it equates to running a global "INVALIDATE METADATA;".

> Invalidate All Metadata and Rebuild Index

This runs a plain "INVALIDATE METADATA;".
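For reference, hedged sketches of the statements involved (the database and table names are placeholders):

```sql
-- Scoped invalidation of a single table Impala has not yet seen:
INVALIDATE METADATA my_db.new_table;

-- Global invalidation, as run by "Invalidate All Metadata and Rebuild
-- Index"; expensive on large catalogs, since all metadata is reloaded
-- lazily on next access:
INVALIDATE METADATA;

-- REFRESH is the lighter-weight alternative when a known table's
-- files changed (e.g. partitions or files added outside Impala):
REFRESH my_db.existing_table;
```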