Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1969 | 07-09-2019 12:53 AM |
|  | 11881 | 06-23-2019 08:37 PM |
|  | 9147 | 06-18-2019 11:28 PM |
|  | 10134 | 05-23-2019 08:46 PM |
|  | 4580 | 05-20-2019 01:14 AM |
09-21-2015
10:36 AM
1 Kudo
Using sudo will not pass your local environment forward. Log in as the target user directly and try again instead.
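A minimal illustration of the difference, assuming a bash login shell (the variable and user names are only examples):

```bash
# sudo resets the environment by default (env_reset in sudoers), so variables
# exported in your current shell do not reach the command it runs.
export HADOOP_CONF_DIR=/etc/hadoop/conf        # example variable only
sudo -u someuser env | grep HADOOP_CONF_DIR    # typically prints nothing
su - someuser                                  # logging in as the user loads
                                               # that user's own environment
```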
09-20-2015
07:20 PM
1 Kudo
Looks like the jar is also required on the front end. In addition to the ADD JAR within the prompt, please also launch the CLI this way:

~> export HADOOP_CLASSPATH=$(hbase classpath)
~> hive
09-20-2015
07:56 AM
1 Kudo
The snapshot read path uses a few more jars than the default table read path does, and the error suggests that at least one such extra jar is not in the default set of aux jars pre-added for Hive-HBase integration in CDH. You will need to run "ADD JAR /opt/cloudera/parcels/CDH/lib/hbase/lib/metrics-core-2.2.0.jar;" to get this required class onto the Hive CLI classpath.
09-20-2015
07:50 AM
> Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: GSS initiate failed

The error above likely suggests your HMS is configured for security (Kerberos) but that your login lacks a valid TGT (such as one obtained via kinit). Could you post the output of klist, and confirm whether a 'hadoop fs' test already works?
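A quick way to check, assuming a kerberized cluster (the principal below is only a placeholder):

```bash
klist                        # is there a valid, unexpired TGT in the cache?
kinit someuser@EXAMPLE.COM   # placeholder principal; obtains a fresh TGT
hadoop fs -ls /              # a simple HDFS test that also needs a valid TGT
```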
09-20-2015
07:37 AM
Should be a simple extension in bash, if that is what you're looking for:

./count_row.sh tables.txt | paste -s -d+ - | bc

Ref: http://stackoverflow.com/questions/450799/shell-command-to-sum-integers-one-per-line#comment12469220_451204

P.S. It may be more efficient to generate a list of queries and run them via a single hive command, because each separate invocation spins up a whole new JVM.
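A rough sketch of that single-invocation idea, assuming tables.txt holds one bare table name per line (the file names are only examples):

```bash
# Build one query file and run it in a single Hive CLI session,
# instead of launching a new JVM per table.
awk '{ printf "SELECT COUNT(*) FROM %s;\n", $1 }' tables.txt > count_all.sql
hive -S -f count_all.sql | paste -s -d+ - | bc
```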
09-20-2015
06:30 AM
1 Kudo
Sentry, as a service, has its own service-side config maintained and generated by CM within its special process directory. To view any service-side generated configs, visit the role's instance page and then the Processes tab under it. In your case: CM -> Sentry -> Instances -> Sentry Server -> Processes -> sentry-site.xml. Please also consider reading http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_intro_primer.html to better understand the Cloudera Manager architecture.
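Equivalently, on the Sentry Server host itself you can locate the generated file under the agent's process directory - a sketch, assuming default agent paths and a service named "sentry":

```bash
# The newest matching directory belongs to the currently running role and
# contains sentry-site.xml; adjust the glob if your service is named differently.
ls -dt /var/run/cloudera-scm-agent/process/*sentry* 2>/dev/null | head -1
```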
09-20-2015
06:28 AM
Given you want 'engineering' group members to have access to the role 'developer', your grant should be:

GRANT ROLE developer TO GROUP engineering

not:

GRANT ROLE developer TO GROUP hadoop

Or was this already done? Your reply is unclear on this point.
09-20-2015
06:18 AM
Have you or your AD admins also attempted to profile what specific AD operation(s) are pouring in? Are they group lookups, or actual authentication requests? The latter would normally be unexpected, given that the use of tokens avoids repeated re-authentication.

Group lookups are indeed done for every HDFS operation when permissions are in use. However, the groups are also cached internally by HDFS for 5 minutes by default (configurable), and often also by a NameNode-local NSCD or equivalent service. These caches help reduce the backend load, but the lookups are still needed and the cache timeouts are finite, so it would not be too odd to see a lot of group-related requests fired at whatever user directory backend is in use.

Are you already using NSCD? It may help if you aren't, or you can consider raising HDFS's cache timeout: http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-common/core-default.xml#hadoop.security.groups.cache.secs
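If you do raise that timeout, the property from the linked page belongs in core-site.xml (in CM, typically via a core-site.xml safety valve); a sketch with an example value of 15 minutes:

```xml
<!-- Example only: raise the HDFS group-lookup cache TTL from its 300s default -->
<property>
  <name>hadoop.security.groups.cache.secs</name>
  <value>900</value>
</property>
```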
09-20-2015
06:08 AM
Thank you for expanding on the process - it was unclear from the word "last". What you meant was "largest", given the sorting involved. What you are looking to do is emit only the largest (by value) key, i.e. a MAX(…) behaviour in SQL terms. This is simple to perform:

1. In the Mapper's setup(…) call, initialise a zero-valued string (lowest ASCII value) as the base key, along with a zeroed counter.
2. Across all map(…) calls, keep track of whether the current key is greater than the previously encountered key (beginning with the base key set above). Don't emit anything just yet - just keep reassigning the base key whenever the current key is greater than the existing one (and reset the counter to 1). If it is found equal, increment the counter.
3. In the cleanup(…) method, emit just the base key and its counter.
4. Given a MAX-like operation, configure a single reducer, and perform the very same max-tracking/final-emit within the setup(…), reduce(…) and cleanup(…) of the Reducer implementation, but take care to do the count aggregation before the compare, so you get the real count.
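A rough Java sketch of those steps, purely as an illustration - the input parsing (one key per line) and the Text/LongWritable types here are assumptions, not taken from your job:

```java
// Sketch only: emits the single largest key and its count, following the
// setup()/map()/cleanup() pattern above. In the driver, also set
// job.setNumReduceTasks(1) so one reducer sees every mapper's candidate.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxKeySketch {

  public static class MaxKeyMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private String baseKey;   // null stands in for the "lowest possible" key
    private long count;

    @Override
    protected void setup(Context ctx) {
      baseKey = null;         // step 1: base key plus a zeroed counter
      count = 0;
    }

    @Override
    protected void map(LongWritable offset, Text line, Context ctx) {
      String key = line.toString().trim();             // placeholder parsing
      if (baseKey == null || key.compareTo(baseKey) > 0) {
        baseKey = key;                                  // step 2: larger key found
        count = 1;
      } else if (key.equals(baseKey)) {
        count++;                                        // equal key: bump counter
      }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException, InterruptedException {
      if (baseKey != null) {
        ctx.write(new Text(baseKey), new LongWritable(count));  // step 3: emit once
      }
    }
  }

  public static class MaxKeyReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    private String baseKey;
    private long count;

    @Override
    protected void setup(Context ctx) {
      baseKey = null;
      count = 0;
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> partialCounts, Context ctx) {
      long sum = 0;                                     // step 4: aggregate counts first...
      for (LongWritable c : partialCounts) {
        sum += c.get();
      }
      String k = key.toString();
      if (baseKey == null || k.compareTo(baseKey) > 0) {
        baseKey = k;                                    // ...then compare
        count = sum;
      }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException, InterruptedException {
      if (baseKey != null) {
        ctx.write(new Text(baseKey), new LongWritable(count));
      }
    }
  }
}
```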
09-18-2015
04:59 AM
The value on the doc page is picked as roughly 20% of the RAM reserved for overhead, but you could set it lower. Our past overcommit testing does show that usage can reach close to an extra 20% for some tested workloads, but that is not always the case - and this may also have changed more recently. We're reworking the docs for these recommendations as developments happen. For now, please rely on the XLSX file for a closer guideline on the recommended calculated values.