Member since: 04-16-2019
Posts: 373
Kudos Received: 7
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 23934 | 10-16-2018 11:27 AM
 | 7988 | 09-29-2018 06:59 AM
 | 1224 | 07-17-2018 08:44 AM
 | 6800 | 04-18-2018 08:59 AM
02-07-2022
04:12 AM
To fetch the policy details in JSON format, use the command below:

curl -v -k -u {username} -H "Content-Type: application/json" -H "Accept: application/json" -X GET https://{Ranger_Host}:6182/service/public/v2/api/service/cm_hive/policy/ | python -m json.tool
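For reference, here is a minimal sketch of the same call made from Python with the requests library. The host, credentials, and service name (cm_hive) are placeholders mirroring the curl example above, and certificate verification is disabled only to match curl's -k flag; this is an illustrative sketch, not part of the original reply.

```python
import json

import requests  # third-party: pip install requests

# Placeholders mirroring the curl example above; adjust for your cluster.
RANGER_HOST = "ranger.example.com"   # hypothetical host
USERNAME = "admin"                    # hypothetical credentials
PASSWORD = "admin-password"

url = f"https://{RANGER_HOST}:6182/service/public/v2/api/service/cm_hive/policy/"

resp = requests.get(
    url,
    auth=(USERNAME, PASSWORD),
    headers={"Accept": "application/json"},
    verify=False,  # equivalent of curl -k; enable verification in production
)
resp.raise_for_status()

# Pretty-print the policy list, like piping the curl output to `python -m json.tool`.
print(json.dumps(resp.json(), indent=2))
```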
07-05-2021
05:04 AM
Hi, I have a requirement to create a Hive policy with two groups: one group with "ALL" permissions for user "x", and a second group with "select" permission for user "y". I have created a policy through the REST API with one group and "all" permissions, but how do I specify the second group with "select" permission in the same create-policy call? Thanks in advance! Srini Podili
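To illustrate what the question is asking for, below is a hedged sketch of a create-policy request body with two policyItems, one per group. The field names follow the Ranger public v2 policy API as I understand it, but the host, credentials, service, database, group, and user names are hypothetical placeholders, not values from the original thread.

```python
import requests  # third-party: pip install requests

# Hypothetical placeholders; substitute your Ranger host, credentials, and resource names.
RANGER_URL = "https://ranger.example.com:6182/service/public/v2/api/policy"

policy = {
    "service": "cm_hive",                   # Hive service name as registered in Ranger
    "name": "example_two_group_policy",     # hypothetical policy name
    "resources": {
        "database": {"values": ["example_db"]},
        "table": {"values": ["*"]},
        "column": {"values": ["*"]},
    },
    # Two policy items: one group gets all permissions, the other only select.
    "policyItems": [
        {
            "groups": ["group_x"],
            "users": ["x"],
            "accesses": [{"type": "all", "isAllowed": True}],
        },
        {
            "groups": ["group_y"],
            "users": ["y"],
            "accesses": [{"type": "select", "isAllowed": True}],
        },
    ],
}

resp = requests.post(RANGER_URL, json=policy, auth=("admin", "admin-password"), verify=False)
print(resp.status_code, resp.text)
```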
07-13-2020
09:41 PM
Hi, there is no package called spark.implicits.

Spark 1.x: with Spark 1.x you create a SQLContext, and the implicits live on that object, so you import sqlContext.implicits._ Example:

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

Spark 2.x: with Spark 2.x you create a SparkSession, and the implicits live on the session object, so you import spark.implicits._ Example:

val spark: SparkSession = SparkSession.builder.appName(appName).config("spark.master", "local[*]").getOrCreate
import spark.implicits._

Note: if you created the session object under a different name, you need to import the implicits through that reference. For example:

val rangaSpark: SparkSession = SparkSession.builder.appName(appName).config("spark.master", "local[*]").getOrCreate
import rangaSpark.implicits._
06-21-2020
11:29 PM
@trent_larson As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to include details specific to your environment, which could help others provide a more accurate answer to your question.
04-20-2020
07:37 AM
1 Kudo
Hello @amol_08, thank you for raising your question about why a Hive SELECT query with LIMIT fails while the same query without LIMIT succeeds. Can you please specify the Hadoop distribution and version you are using (e.g. CDH 5.16, HDP 3.1)? Also, which platform are you on: Hive, HiveServer2, or Hive LLAP? I am asking these clarification questions to rule out any known issue you might be hitting. For this general problem statement, I would like to draw your attention to our Cloudera documentation [1], which describes how a query of the form "SELECT * FROM <table_name> LIMIT 10;" against a partitioned table can cause all partitions of the target table to be loaded into memory, resulting in memory pressure, and how to tackle this issue. Please let us know if the referenced documentation addresses your enquiry by accepting this post as a solution. Thank you: Ferenc [1] https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_hive_tuning.html#hs2_identify_workload_characteristics
06-04-2019
02:24 AM
The above question and the entire reply thread below were originally posted in the Community Help track. On Tue Jun 4 01:52 UTC 2019, a member of the HCC moderation staff moved it to the Cloud & Operations track. The Community Help track is intended for questions about using the HCC site itself.
04-22-2019
02:13 AM
The error message shows you don't have a valid leader for the partition you are accessing. In Kafka, all reads and writes go through the leader of a partition, so you should first make sure the topic's partitions have a healthy leader. Run:

kafka-topics --describe --zookeeper <zk_url, add /chroot if you have any> --topic <topic_name>
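If you prefer to check this programmatically, below is a minimal sketch using the confluent-kafka Python client to flag partitions that currently have no leader. The bootstrap server and topic name are placeholders, and the choice of client library is my assumption, not something mentioned in the original reply.

```python
from confluent_kafka.admin import AdminClient  # third-party: pip install confluent-kafka

# Hypothetical placeholders; point these at your cluster and topic.
BOOTSTRAP_SERVERS = "broker1:9092"
TOPIC = "my_topic"

admin = AdminClient({"bootstrap.servers": BOOTSTRAP_SERVERS})

# Fetch cluster metadata for the topic; a leader of -1 means the partition has no leader.
metadata = admin.list_topics(topic=TOPIC, timeout=10)
topic_md = metadata.topics[TOPIC]

for pid, pmeta in sorted(topic_md.partitions.items()):
    status = "OK" if pmeta.leader != -1 else "NO LEADER"
    print(f"partition {pid}: leader={pmeta.leader} "
          f"replicas={pmeta.replicas} isrs={pmeta.isrs} -> {status}")
```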
04-12-2019
04:36 PM
@Anurag Mishra Having used both Ranger and Sentry to build security on clusters, I can tell you Sentry was the weak link in Cloudera's offering.

Apache Ranger
Ranger is a framework to enable, monitor, and manage data security across the Hadoop platform. It provides centralized security administration, access control, and detailed auditing of user access within Hadoop, Hive, HBase, and other Apache components, with the vision of providing comprehensive security across the Apache Hadoop ecosystem. Because of Apache YARN, the Hadoop platform can now support a true data lake architecture, so data security within Hadoop needs to evolve to support multiple use cases for data access while providing a framework for central administration of security policies and monitoring of user access.

I can't enumerate all the advantages of Ranger over Sentry, but here are a few:
- The latest version has plugins for most of the components in the Hadoop ecosystem (Hive, HDFS, YARN, Kafka, etc.).
- You can extend the functionality by writing your own UDFs, e.g. geolocation-based policies.
- It has time-based rules.
- Data masking (PII, HIPAA, GDPR compliance).
Ref: https://hortonworks.com/apache/ranger/

Sentry
Personally, I find it rudimentary, much like Oracle role-based access control, where you create a role, grant that role some privileges, and assign the role to a user. This is quite cumbersome and a security management nightmare.
Ref: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/sg_sentry_overview.html#concept_bp4_tjw_jr__section_qrt_c54_js

You will need to read extensively about the two solutions. One of the reasons for the merger was the solid security Hortonworks provided, combined with governance through Atlas, which Cloudera was lacking.
10-30-2018
05:38 PM
Hi, You can find all the options for distcp here: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html To improve throughput, I used -strategy dynamic and increased the number of mappers (-m) as well as the bandwidth per mapper (-bandwidth), and of course the size of your containers if you want to customize it. So you finally have:

hadoop distcp -Dmapreduce.map.memory.mb=4096 -Dyarn.app.mapreduce.am.resource.mb=4096 -Dmapred.job.queue.name=DISTCP_exec -prb -bandwidth 50 -m 16 -update -delete -strategy dynamic hdfs://source/path/.snapshot/20181030-170124.063 swebhdfs://target/path
10-03-2018
06:46 AM
Hi @Anurag Mishra Spark keeps intermediate files in /tmp, where it likely ran out of space. You can either adjust spark.local.dir permanently or set it at submission time to a different directory with more space. Try the same job while adding this to spark-submit:

--conf "spark.local.dir=/directory/with/space"

If that works well, you can make the change permanent by adding this property to the custom Spark defaults in Ambari:

spark.local.dir=/directory/with/space

See also: https://spark.apache.org/docs/latest/configuration.html#application-properties