Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Views | Posted |
|---|---|
| 1969 | 07-09-2019 12:53 AM |
| 11879 | 06-23-2019 08:37 PM |
| 9144 | 06-18-2019 11:28 PM |
| 10132 | 05-23-2019 08:46 PM |
| 4578 | 05-20-2019 01:14 AM |
05-16-2018
07:36 PM
Also, if you'd like to check whether the JCE files you have are of the unlimited policy type, you can follow this: http://harshj.com/checking-if-your-jre-has-the-unlimited-strength-policy-files-in-place/

Also, a note: In the latest JDK8 updates and in all JDK9+ releases, the unlimited cryptography policies are shipped and active by default. Manually replacing the JCE jars is only required for JDK7 and early JDK8 releases. The QuickStart VM currently uses JDK7.
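For a quick check from a shell, something along these lines may work. It is only a sketch: it assumes your JDK ships the `jrunscript` tool (not every JDK does), and both function names are made up for illustration. It relies on `javax.crypto.Cipher.getMaxAllowedKeyLength`, which reports 128 for AES when only the limited policy is in place.

```shell
# Hypothetical helper: interpret the maximum AES key length reported by the JRE.
# Anything above 128 bits means the unlimited-strength policy is in effect.
is_unlimited_policy() {
  [ "$1" -gt 128 ]
}

# Hypothetical helper: ask the JRE for its maximum allowed AES key length.
# Assumes jrunscript is on the PATH; it is not shipped with every JDK.
max_aes_key_length() {
  jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'
}

# Usage (commented out so the snippet is safe to source without a JDK):
# if is_unlimited_policy "$(max_aes_key_length)"; then
#   echo "unlimited policy active"
# else
#   echo "limited policy: replace the JCE jars"
# fi
```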
05-15-2018
04:47 PM
The DB is throwing back an error this time. The message is self-explanatory though: Access denied for user 'retail_dba'@'%' to database 'myfirsttutorial'. Follow https://dev.mysql.com/doc/refman/5.5/en/grant.html to grant that user access to the database.
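As a rough sketch of what that grant could look like, using the user and database names from the error message: the `grant_sql` helper is hypothetical, and `ALL PRIVILEGES` is my assumption, so narrow it to what your import really needs.

```shell
# Hypothetical helper: build a GRANT statement for a user on one database.
# ALL PRIVILEGES is an assumption; grant only what the import really needs.
grant_sql() {
  db="$1"; user="$2"; host="$3"
  printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%s';\n" "$db" "$user" "$host"
}

grant_sql myfirsttutorial retail_dba %

# Run the statement as a MySQL user that is allowed to grant (assumed: root):
# grant_sql myfirsttutorial retail_dba % | mysql -u root -p
```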
05-15-2018
02:05 PM
Your target directory path has a space character in it, which makes Sqoop think two different values are being passed. Remove the space inside the 'im port' part of the path.
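You can see the splitting for yourself with a tiny shell function. The path here is hypothetical and `count_args` is just an illustration, not anything Sqoop provides:

```shell
# count_args prints how many separate arguments the shell passed to it.
count_args() {
  echo "$#"
}

# The path below is hypothetical. Unquoted, the space splits it in two.
count_args --target-dir /user/example/im port   # prints 3
count_args --target-dir /user/example/import    # prints 2
```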
05-14-2018
06:43 PM
The quoted documentation also indicates that the specific issue that required that format was resolved in CDH 5.2.1 onwards, so you shouldn't necessarily be running that as a fix for your problem.

The RMs in HA mode run an election after they are both up, and the logs from the classes org.apache.hadoop.ha.ActiveStandbyElector, org.apache.hadoop.yarn.server.resourcemanager.ResourceManager and org.apache.zookeeper.ZooKeeper detail the process. I'd advise checking the logs from these classes and trying to spot what the failure is. It may be ZK related or some other configuration issue. Alternatively, share the RM logs via pastebin, etc.
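One way to narrow the RM log down to those classes is a simple grep. This is only a sketch: `election_log_lines` is a hypothetical helper, and the log location in the usage comment is an assumption that varies by installation.

```shell
# Hypothetical helper: print the log lines written by the classes involved
# in the RM election. Pass the ResourceManager log file as the argument.
election_log_lines() {
  grep -E 'ActiveStandbyElector|ResourceManager|ZooKeeper' "$1"
}

# Usage (the log path is an assumption and varies by installation):
# election_log_lines /var/log/hadoop-yarn/<rm-log-file> | tail -n 100
```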
05-14-2018
09:12 AM
1 Kudo
The requirement for Oozie is no different from the general requirement that, after you enable HDFS HA (or YARN HA, etc.), you always use the logical URI everywhere and never directly place or hardcode a NameNode hostname in any manual configuration.

Oozie as a service carries HDFS client configs that are maintained for it by CM. These become HA-aware when you complete the HDFS HA wizard. All that remains is that you submit new jobs to Oozie with the nameNode and jobTracker URIs pointing to the logical name (such as hdfs://nameservice1) instead of the previous single-host/port value.
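As an illustration, a job.properties fragment using the logical URI might be written like this. It is a fragment rather than a complete properties file, `nameservice1` is just the example name from above, and the Oozie server URL in the comment is a placeholder.

```shell
# Write a job.properties fragment that uses the logical HDFS URI.
# nameservice1 is only the example name; use your own nameservice.
cat > job.properties <<'EOF'
nameNode=hdfs://nameservice1
# jobTracker: set this to your cluster's logical value as well (not shown here)
EOF

# Submit with the Oozie CLI (the server URL is a placeholder):
# oozie job -oozie http://oozie-host.example.com:11000/oozie -config job.properties -run
```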
05-06-2018
11:18 PM
1 Kudo
One common misconception with YARN is that it 'preallocates' resources. It does not. The memory requests and the CPU requests are not 'limited' in any pre-reserved manner.

For memory checks, the NodeManagers run a simple monitor that periodically checks if the container (child process) it spawned is exceeding the request granted to it. If the pmem usage is higher, the limit is enforced with a kill sent to the process. The same is not done for other resources yet, only memory. You're correct that CGroups can help in enforcing CPUs, etc. See https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_service_pools.html

> However despite that configuration one can see more than 20 threads open for every container, whereas I'd expect no more than 2

The thread count depends on what the code runs. For example, HDFS client work typically runs 3-4 threads by itself, because it is architected that way. However, this does not indicate load by itself; that is driven by how the code uses the HDFS client. Thread count alone is not a good indicator, but active thread count over time may be a better metric.

> Does anyone know why a single container uses so much CPU time?

Difficult to say. For straightforward record-by-record work (such as map tasks) I'd expect ~100% CPU for most of the task's life, except for parts that benefit from parallelism (sort, etc.) within the framework. Outside of this, the user code is also typically free to run however it wants in the container JVM, and it may decide to run some work concurrently depending on what its goals are (once the data is read, or when writing the data).

> Is it possible to control number of threads per container with some MapRed configuration or java settings?

Some of the thread counts, such as a Reducer's parallel fetcher thread pool, can be configured. Others (such as the Responder/Reader/Caller thread architecture of an HDFS client) are not configurable since they are designed to work that way.

> Or maybe this problem comes from flaws in the MapRed program itself?

This could be very possible if it's only affecting a partial set of your overall container workload. A jstack can help you see which threads are working on code from your own organization vs. the framework itself.

> Would cgroups be a good approach to solve this issue?

Yes, Linux can help 'enforce' the CPU shares. Following https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_service_pools.html may help with that. However, it's worth negotiating with the developers of the jobs on requesting higher vcore counts for their CPU-heavy work before you take this step, as a visibly large slowdown may not be very favorable.
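For the jstack suggestion, a rough way to compare your own code against the framework is to count stack frames by package prefix in a saved dump. This is a sketch only: `count_frames` is a hypothetical helper, and the PID and the `com.example.` prefix are placeholders for your own values.

```shell
# Hypothetical helper: count stack frames in a jstack dump whose class
# names start with the given package prefix.
count_frames() {
  prefix="$1"; dump="$2"
  grep -c -F "at ${prefix}" "$dump"
}

# Usage (PID and the com.example. prefix are placeholders):
# jstack <container-jvm-pid> > dump.txt
# count_frames com.example. dump.txt
# count_frames org.apache.hadoop. dump.txt
```

A single dump is only a snapshot, so taking a few dumps some seconds apart gives a better picture of which threads stay busy.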
04-17-2018
05:14 AM
Apologies for the lack of details. The role logs typically lie under the component-named directories under /var/log. For Oozie this would therefore be /var/log/oozie/ on the Oozie server host, and /var/log/hive/ for Hive on the HMS host.
04-16-2018
06:44 PM
Can you check your Oozie server log around the time of the START_RETRY failure? The HCat (HMS) credentials are obtained by the Oozie server communicating directly with the configured HMS URI before the jobs are submitted to the cluster - so the Oozie server log and the HMS log would have more details behind the generic 'TTransportException' message that appears in the frontend.
04-16-2018
07:47 AM
For the trivial shell example you could just make echo print both with an inlined sub-shell that does the counting:

    for file in $(FILE_LIST_SUBCOMMAND)
    do
      echo ${file} $(hadoop fs -text ${file} | wc -l)
    done
04-16-2018
07:18 AM
1 Kudo
> Do I need to contact an administrator or are other ways to get this keytab file?

A keytab stores a form of your password in it. When you already have the password on hand, you may place it into a keytab as such, without requiring any further rights:

    ~> ktutil
    > addent -password -p your-principal@REALM -k 1 -e aes256-cts
    Password for your-principal@REALM: [enter your password when prompted]
    > wkt your-principal.keytab
    > quit
    ~> ls
    your-principal.keytab
    ~> klist -ekt your-principal.keytab
    …

As to your original issue stated in this thread, see http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Hue-Oozie-Spark-Kerberos-Delegation-Token-Error/m-p/35163/highlight/true#M1317