Member since: 04-16-2019
Posts: 373
Kudos Received: 7
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 23934 | 10-16-2018 11:27 AM
 | 7988 | 09-29-2018 06:59 AM
 | 1224 | 07-17-2018 08:44 AM
 | 6800 | 04-18-2018 08:59 AM
02-07-2022
04:12 AM
To fetch the policy details in JSON format, use the command below:

curl -v -k -u {username} -H "Content-Type: application/json" -H "Accept: application/json" -X GET https://{Ranger_Host}:6182/service/public/v2/api/service/cm_hive/policy/ | python -m json.tool
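For reference, here is a minimal sketch of the same call made from Python with the requests library. The host, credentials, and service name (cm_hive) are placeholders mirroring the curl example above, and certificate verification is disabled only to match curl's -k flag; this is an illustrative sketch, not part of the original reply.

```python
import json

import requests  # third-party: pip install requests

# Placeholders mirroring the curl example above; adjust for your cluster.
RANGER_HOST = "ranger.example.com"   # hypothetical host
USERNAME = "admin"                    # hypothetical credentials
PASSWORD = "admin-password"

url = f"https://{RANGER_HOST}:6182/service/public/v2/api/service/cm_hive/policy/"

resp = requests.get(
    url,
    auth=(USERNAME, PASSWORD),
    headers={"Accept": "application/json"},
    verify=False,  # equivalent of curl -k; enable verification in production
)
resp.raise_for_status()

# Pretty-print the policy list, like piping the curl output to `python -m json.tool`.
print(json.dumps(resp.json(), indent=2))
```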
07-05-2021
05:04 AM
Hi, I have a requirement to create a Hive policy with two groups: one group with "ALL" permissions for user "x", and a second group with "select" permission for user "y". I have created a policy through the REST API with one group and "all" permissions, but how do I specify the second group with "select" permission in the same create-policy call? Thanks in advance! Srini Podili
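To illustrate what the question is asking for, below is a hedged sketch of a create-policy request body with two policyItems, one per group. The field names follow the Ranger public v2 policy API as I understand it, but the host, credentials, service, database, group, and user names are hypothetical placeholders, not values from the original thread.

```python
import requests  # third-party: pip install requests

# Hypothetical placeholders; substitute your Ranger host, credentials, and resource names.
RANGER_URL = "https://ranger.example.com:6182/service/public/v2/api/policy"

policy = {
    "service": "cm_hive",                   # Hive service name as registered in Ranger
    "name": "example_two_group_policy",     # hypothetical policy name
    "resources": {
        "database": {"values": ["example_db"]},
        "table": {"values": ["*"]},
        "column": {"values": ["*"]},
    },
    # Two policy items: one group gets all permissions, the other only select.
    "policyItems": [
        {
            "groups": ["group_x"],
            "users": ["x"],
            "accesses": [{"type": "all", "isAllowed": True}],
        },
        {
            "groups": ["group_y"],
            "users": ["y"],
            "accesses": [{"type": "select", "isAllowed": True}],
        },
    ],
}

resp = requests.post(RANGER_URL, json=policy, auth=("admin", "admin-password"), verify=False)
print(resp.status_code, resp.text)
```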
07-13-2020
09:41 PM
Hi, there is no package called spark.implicits.

Spark 1.x: with Spark 1.x you create a SQLContext, and the implicits live on that object, so you import sqlContext.implicits._ Example:

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

Spark 2.x: with Spark 2.x you create a SparkSession, and the implicits live on the session object, so you import spark.implicits._ Example:

val spark: SparkSession = SparkSession.builder.appName(appName).config("spark.master", "local[*]").getOrCreate
import spark.implicits._

Note: if you created the session object under a different name, you need to import the implicits through that reference. For example:

val rangaSpark: SparkSession = SparkSession.builder.appName(appName).config("spark.master", "local[*]").getOrCreate
import rangaSpark.implicits._
06-21-2020
11:29 PM
@trent_larson As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to include details specific to your environment, which could help others provide a more accurate answer to your question.
04-20-2020
07:37 AM
1 Kudo
Hello @amol_08, thank you for raising your question about why a Hive SELECT query with LIMIT fails while the same query without LIMIT succeeds. Can you please specify the Hadoop distribution and version you are using (e.g. CDH 5.16, HDP 3.1)? Also, which platform are you on: Hive, HiveServer2, or Hive LLAP? I am asking these clarification questions to rule out any known issue you might be hitting. For this general problem statement, I would like to draw your attention to our Cloudera documentation [1], which describes how a query of the form "SELECT * FROM <table_name> LIMIT 10;" against a partitioned table can cause all partitions of the target table to be loaded into memory, resulting in memory pressure, and how to tackle this issue. Please let us know if the referenced documentation addresses your enquiry by accepting this post as a solution. Thank you: Ferenc [1] https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_hive_tuning.html#hs2_identify_workload_characteristics
06-04-2019
02:24 AM
The above question and the entire reply thread below were originally posted in the Community Help track. On Tue Jun 4 01:52 UTC 2019, a member of the HCC moderation staff moved it to the Cloud & Operations track. The Community Help track is intended for questions about using the HCC site itself.
04-22-2019
02:13 AM
The error message shows you don't have a valid leader for the partition you are accessing. In Kafka, all reads and writes go through the leader of a partition, so you should first make sure the topic's partitions have a healthy leader. Run:

kafka-topics --describe --zookeeper <zk_url, add /chroot if you have any> --topic <topic_name>
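If you prefer to check this programmatically, below is a minimal sketch using the confluent-kafka Python client to flag partitions that currently have no leader. The bootstrap server and topic name are placeholders, and the choice of client library is my assumption, not something mentioned in the original reply.

```python
from confluent_kafka.admin import AdminClient  # third-party: pip install confluent-kafka

# Hypothetical placeholders; point these at your cluster and topic.
BOOTSTRAP_SERVERS = "broker1:9092"
TOPIC = "my_topic"

admin = AdminClient({"bootstrap.servers": BOOTSTRAP_SERVERS})

# Fetch cluster metadata for the topic; a leader of -1 means the partition has no leader.
metadata = admin.list_topics(topic=TOPIC, timeout=10)
topic_md = metadata.topics[TOPIC]

for pid, pmeta in sorted(topic_md.partitions.items()):
    status = "OK" if pmeta.leader != -1 else "NO LEADER"
    print(f"partition {pid}: leader={pmeta.leader} "
          f"replicas={pmeta.replicas} isrs={pmeta.isrs} -> {status}")
```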
04-12-2019
04:36 PM
@Anurag Mishra Having used both Ranger and Sentry to build security on clusters, I can tell you Sentry was the weak link in Cloudera's offering.

Apache Ranger
Ranger is a framework to enable, monitor, and manage data security across the Hadoop platform. It provides centralized security administration, access control, and detailed auditing of user access within Hadoop, Hive, HBase, and other Apache components, with the vision of providing comprehensive security across the Apache Hadoop ecosystem. Because of Apache YARN, the Hadoop platform can now support a true data lake architecture, so data security within Hadoop needs to evolve to support multiple use cases for data access while providing a framework for central administration of security policies and monitoring of user access.

I can't enumerate all the advantages of Ranger over Sentry, but here are a few:
- The latest version has plugins for most of the components in the Hadoop ecosystem (Hive, HDFS, YARN, Kafka, etc.).
- You can extend the functionality by writing your own UDFs, e.g. geolocation-based policies.
- It has time-based rules.
- Data masking (PII, HIPAA, GDPR compliance).
Ref: https://hortonworks.com/apache/ranger/

Sentry
Personally, I find it rudimentary, much like Oracle role-based access control, where you create a role, grant that role some privileges, and assign the role to a user. This is quite cumbersome and a security management nightmare.
Ref: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/sg_sentry_overview.html#concept_bp4_tjw_jr__section_qrt_c54_js

You will need to read extensively about the two solutions. One of the reasons for the merger was the solid security Hortonworks provided, combined with governance through Atlas, which Cloudera was lacking.
10-30-2018
05:38 PM
Hi, You can find all the options for distcp here: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html To improve throughput, I used -strategy dynamic and increased the number of mappers (-m) as well as the bandwidth per mapper (-bandwidth), and of course the size of your containers if you want to customize it. So you finally have:

hadoop distcp -Dmapreduce.map.memory.mb=4096 -Dyarn.app.mapreduce.am.resource.mb=4096 -Dmapred.job.queue.name=DISTCP_exec -prb -bandwidth 50 -m 16 -update -delete -strategy dynamic hdfs://source/path/.snapshot/20181030-170124.063 swebhdfs://target/path
10-03-2018
06:46 AM
Hi @Anurag Mishra Spark keeps intermediate files in /tmp, where it likely ran out of space. You can either adjust spark.local.dir permanently or set it at submission time to a different directory with more space. Try the same job while adding this to spark-submit:

--conf "spark.local.dir=/directory/with/space"

If that works well, you can make the change permanent by adding this property to the custom Spark defaults in Ambari:

spark.local.dir=/directory/with/space

See also: https://spark.apache.org/docs/latest/configuration.html#application-properties