Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1984 | 07-09-2019 12:53 AM |
| | 11940 | 06-23-2019 08:37 PM |
| | 9196 | 06-18-2019 11:28 PM |
| | 10189 | 05-23-2019 08:46 PM |
| | 4610 | 05-20-2019 01:14 AM |
02-27-2016
09:27 PM
We do not currently recommend the use of StorageBasedAuthorizationProvider. While Sentry's initial setup (especially with HDFS ACL sync enabled) may seem a little involved, it's much simpler than ending up in the longer-term situation of managing several HDFS paths and keeping them controlled manually. That fix is currently not in scope for a backport, since this plugin is not supported for use in a CDH environment, but it may be added in the future (for instance, if/when a rebase occurs).
02-27-2016
09:23 PM
Are you having issues creating a table backed by the Avro format? What is your CREATE TABLE statement, and how are you loading the file into the table? For more on creating and using Avro tables in Hive, see http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive.html
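For illustration, a minimal sketch of one way to do this (the table, schema, and file paths are hypothetical; "STORED AS AVRO" requires a Hive version that supports it, while older releases must name the AvroSerDe classes explicitly):

```
# Create a Hive table backed by Avro, with its schema kept on HDFS
# (table and path names are illustrative):
hive -e "CREATE TABLE my_avro_table
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/my_avro_table.avsc');"

# Move an existing Avro data file into the table's directory:
hive -e "LOAD DATA INPATH '/user/me/data.avro' INTO TABLE my_avro_table;"
```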
02-27-2016
09:21 PM
What is your CREATE TABLE statement? Are you specifying the right field delimiter character (\t if your data is tab-separated)? The default delimiter otherwise is ^A (\001), which your data likely does not carry.
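As a minimal sketch (the table name and columns are hypothetical), declaring the delimiter explicitly looks like this:

```
# Declare the tab delimiter at table creation; without this, Hive
# expects the default ^A (\001) separator:
hive -e "CREATE TABLE my_tsv_table (id INT, name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;"
```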
02-25-2016
05:06 PM
The listener feature exists to make Kafka Brokers listen on multiple ports: http://kafka.apache.org/documentation.html#security_configbroker. Specifying hostnames along with it is just a way of binding to a specific interface (i.e. when it is not to be wild-carded). The validation that each element use a different port when multiple listeners are specified is therefore the right thing for it to do. If you want a single port listened to globally, simply use PLAINTEXT://0.0.0.0:9092 instead of a multi-host list.
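To illustrate both patterns (broker hostname and ports below are hypothetical):

```
# Sketch of the two server.properties forms discussed above:
#
#   # multiple listeners -- each entry must use a distinct port:
#   listeners=PLAINTEXT://broker1.example.com:9092,SSL://broker1.example.com:9093
#
#   # a single port bound on all interfaces:
#   listeners=PLAINTEXT://0.0.0.0:9092

# Quick check that the broker ended up listening where expected:
netstat -lnt | grep -E ':(9092|9093)'
```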
02-25-2016
03:23 PM
1 Kudo
Unless "hdfs groups x" returns both "A" and "B" in its results, HDFS/HBase/etc. would not be aware of such a relationship. If you use SSSD+LDAP to resolve groups for your OS, you can request it to query a certain level of nesting. See ldap_group_nesting_level under http://linux.die.net/man/5/sssd-ldap.
02-25-2016
03:00 PM
Sure - Spark is a pure YARN app for the most part, with few to no server-side components. As long as you submit your application with the right Spark tarball/binary, that Spark version will be used to run that particular application. Multiple Spark History Servers, if needed, can also be run with separate configs and ports. Note that CDH-wise, we ship only one Spark version, bound to its CDH version by build. Formal support of versions other than the CDH-provided one is not covered (if you have a subscription).
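A minimal sketch of submitting from an alternative tarball (all paths and the class name are hypothetical):

```
# Point the unpacked tarball's spark-submit at the cluster's Hadoop
# configs and run the job on YARN:
export HADOOP_CONF_DIR=/etc/hadoop/conf
/opt/spark-alt/bin/spark-submit \
  --master yarn-cluster \
  --class com.example.MyApp \
  /opt/jobs/myapp.jar
```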
02-22-2016
02:20 AM
1 Kudo
> - is the average load in bytes/KB?
> - is the average done per day/week?

It is the average number of regions hosted by each RS. It's computed when you run the status command.

> - why is the average different when I do only status, versus status 'summary' and status 'replication'?

I'm not sure I follow. Could you post the observed difference?

> - what is the meaning of the 'aggregate load' indicator?

The load is aggregated from the number of requests per second (requestsPerSecond).

> - does the compactionProgressPct correspond to major_compact?

Yes, and it measures the progress of a specific major-compacting region at the point in time the status command is run.

> - what is the meaning of totalCompactingKVs / currentCompactedKVs?

Compactions rewrite the KV pairs inside HFiles into new (fewer, or a single) HFiles. These numbers track the total KVs participating in the tracked compaction, and the count of how many have been persisted so far.
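For reference, the status variants mentioned above can be run non-interactively like so:

```
# Each variant prints a different level of cluster detail:
echo "status" | hbase shell
echo "status 'summary'" | hbase shell
echo "status 'replication'" | hbase shell
```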
02-18-2016
01:03 AM
1 Kudo
You can get your live RegionServer IDs with startcodes included via the HBase Shell command:

status 'simple'

An output line from this, such as the below:

host.cloudera.com:60020 1455726247381

can then be converted into the right format:

host.cloudera.com,60020,1455726247381
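If you have many lines to convert, a one-liner along these lines can do it (a sketch; the input is the example line from above):

```
# Turn "host:port startcode" from status 'simple' into "host,port,startcode";
# sub() rewrites the first ":" on the line, after which the fields re-split:
echo "host.cloudera.com:60020 1455726247381" | awk '{sub(":", ","); print $1","$2}'
# -> host.cloudera.com,60020,1455726247381
```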
02-18-2016
12:59 AM
1 Kudo
The "admin" user is something usually used within Hue (it could of course be a valid user in your environment, but this is the only assumption I can draw). HS2 cleans the temporary elements if the session holding the query that created it, has terminated. With Hue, especially on versions prior to CDH 5.2.0, you may have a situation where the admin user's sessions have never been closed/terminated, and the HS2 continues to hold references of the queries that user ran in past, whereas the other usernames are likely ending their Hue backed sessions correctly (depends on how they're working over Hue). If you have CDH 5.2.0 or above, consider setting the various idle server-side timeouts under CM -> Hive -> Configuration (search "idle").
02-17-2016
09:12 PM
CDH 5.4 had Spark 1.3.0 plus patches, which per the blog post seems like it would not work either (it quotes a "strong dependency", which I take to mean ONLY 1.4.1?). CDH 5.5.x onwards carries Spark 1.5.x with patches. There has been no CDH5 release with Spark 1.4.x in it. You could use an Apache Spark 1.4.1 release from upstream, manually rebuilt against your CDH5 version of Apache Hadoop, and use the tarball's paths for all Spark operations; this should work. However, such a Spark deployment would not be officially supported by Cloudera Support (if you have a subscription).
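A sketch of such a rebuild (the -Dhadoop.version string is illustrative; match it to your cluster's CDH release, and note you may need Cloudera's Maven repository configured for the cdh-suffixed Hadoop artifacts):

```
# Fetch the upstream source and build a distribution against CDH's Hadoop:
wget https://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1.tgz
tar xzf spark-1.4.1.tgz && cd spark-1.4.1
./make-distribution.sh --tgz \
  -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.1 \
  -Pyarn -Phive -Phive-thriftserver
```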