Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4025 | 08-20-2018 08:26 PM |
| | 1930 | 08-15-2018 01:59 PM |
| | 2361 | 08-13-2018 02:20 PM |
| | 4077 | 07-23-2018 04:37 PM |
| | 4993 | 07-19-2018 12:52 PM |
07-14-2016
03:57 PM
You can use ExecuteSQL to run any query that the target DB supports. See docs here.
07-15-2016
09:04 PM
@Randy Gelhausen that is awesome feedback, thanks for the insights. I would more than likely use the API for most use cases; for this specific use case a JDBC connector is required.
07-14-2016
03:12 AM
@Saravanan Ramaraj have you looked into Apache Knox? The Knox Gateway is designed as a reverse proxy, with pluggability in the areas of policy enforcement (through providers) and the backend services for which it proxies requests. The Apache Knox Gateway is a REST API gateway for interacting with Apache Hadoop clusters: it provides a single access point for all REST interactions with the cluster. In this capacity, the Knox Gateway can provide valuable functionality to aid in the control, integration, monitoring and automation of critical administrative and analytical needs of the enterprise:
- Authentication (LDAP and Active Directory authentication provider)
- Federation/SSO (HTTP-header-based identity federation)
- Authorization (service-level authorization)
- Auditing

For fine-grained authorization you can use Apache Ranger, which offers a centralized security framework to manage access control over the Hadoop data access components. Coupled with Kerberos, your cluster will be secured: users will be authenticated with Kerberos, Ranger will provide authorization on which services each user has access to, and Knox will be your perimeter security.
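To make "single access point" concrete, here is a minimal sketch of calling WebHDFS through the Knox gateway rather than hitting the NameNode directly. The hostname, port, topology name (default) and the guest credentials are assumptions; substitute the values from your own Knox deployment.

```bash
# Minimal sketch, assuming a topology named "default" that exposes WebHDFS;
# host, port and the guest credentials are placeholders for your own setup.
curl -iku guest:guest-password \
  'https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS'
```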
07-18-2016
03:42 PM
Try this; note it needs Spark 1.5 or later:
data.write.format('com.databricks.spark.csv').options(delimiter="\t", codec="org.apache.hadoop.io.compress.GzipCodec").save('s3a://myBucket/myPath')
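On Spark 1.x, com.databricks.spark.csv is an external package, so it has to be on the classpath when the job is launched. Here is a hedged sketch of one way to do that; the package coordinate (Scala 2.10 build) and the script name are assumptions, adjust them for your environment:

```bash
# Sketch only: pull in the external spark-csv package at submit time.
# The coordinate and script name are illustrative, not from the original post.
spark-submit --packages com.databricks:spark-csv_2.10:1.5.0 my_export_job.py
```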
03-19-2017
06:47 AM
When I try the TCP port I am getting connection refused; for UDP it is OK. What could be the reason?
07-11-2016
05:23 PM
@Sunile Manjee Until this is available as a feature, your best answer is the one that provides a workaround using -f InitFile: put whatever you need to initialize into your InitFile and pass it on the command line, e.g. beeline -u jdbc:hive2://WHATEVER:10000 -f InitFile
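As an illustration of what that init file might contain, a hedged sketch; the settings and the database name are placeholders, not from the original thread:

```bash
# Sketch only: create a hypothetical InitFile with the statements you want
# executed up front (the property and database name below are placeholders),
# then pass it to beeline together with the JDBC URL.
cat > InitFile <<'EOF'
set hive.execution.engine=tez;
use mydb;
EOF
beeline -u jdbc:hive2://WHATEVER:10000 -f InitFile
```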
12-01-2017
11:39 AM
Docker image of the Hortonworks Schema Registry: https://hub.docker.com/r/thebookpeople/hortonworks-registry/
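A hedged sketch of pulling and starting that image; the published port (9090, the Schema Registry's usual default) and the absence of further options are assumptions, so check the image's page on Docker Hub for what it actually expects:

```bash
# Sketch only: the port mapping and run options are assumptions; consult the
# image documentation on Docker Hub before relying on this.
docker pull thebookpeople/hortonworks-registry
docker run -d -p 9090:9090 thebookpeople/hortonworks-registry
```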
07-11-2016
10:31 AM
Regarding the "how", refer to Sunile. Pig is nice and flexible, Hive is good if you know SQL and your RFID data is already basically in a flat table format, and Spark also works well. But the question is whether you really want to process 100 GB of data on the sandbox: the memory settings are tiny, there is a single drive, and data is not replicated. If you do it like this, you could just use Python on a local machine. If you want a decent environment, you might want to set up 3-4 nodes on a VMware server, perhaps with 32 GB of RAM each. That would give you a nice little environment and you could actually do some fast processing.
07-09-2016
01:30 AM
6 Kudos
Short Description: TeraGen and TeraSort performance testing on AWS.

Article: This article should be used with extreme care; do not use it as a benchmark. I performed this test simply to run a quick 1-terabyte TeraGen on AWS and see what kind of performance I could get from MapReduce with very little configuration tweaking/tuning. On my github page here you will find the teragen script and the hadoop, yarn, mapred and capacity-scheduler configurations used during testing.

Hardware (master and datanodes): 1 master, 3 data nodes; d2.4xlarge, 16 vCPU, 122 GB RAM, 12 x 2000 GB storage (max).

Results:
- TeraGen: 1 hr, 6 min, 38 sec
- TeraSort: 1 hr, 34 min, 20 sec
- TeraValidate: 25 min, 27 sec
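For reference, a hedged reconstruction of the kind of commands behind these numbers; the actual script and tuned configs are on the GitHub page mentioned above, and the jar path and HDFS directories below are assumptions:

```bash
# Sketch only: ~1 TB TeraGen/TeraSort/TeraValidate using the stock MapReduce
# examples jar. 10,000,000,000 rows x 100 bytes is roughly 1 TB; the jar path
# (typical HDP layout) and the output directories are placeholders.
EXAMPLES_JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
hadoop jar "$EXAMPLES_JAR" teragen 10000000000 /benchmarks/teragen
hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/teragen /benchmarks/terasort
hadoop jar "$EXAMPLES_JAR" teravalidate /benchmarks/terasort /benchmarks/teravalidate
```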
07-08-2016
05:33 PM
Hi, it seems this is an issue in the documentation. On every cloud provider you can use the 'cloudbreak' user. If you are looking for the public IP addresses of the VMs started by Cloudbreak, you can find them under the Nodes section of the UI. Attila