Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4025 | 08-20-2018 08:26 PM |
| | 1930 | 08-15-2018 01:59 PM |
| | 2361 | 08-13-2018 02:20 PM |
| | 4077 | 07-23-2018 04:37 PM |
| | 4993 | 07-19-2018 12:52 PM |
07-14-2016
03:57 PM
You can use ExecuteSQL to run any query that the target DB supports. See docs here.
07-15-2016
09:04 PM
@Randy Gelhausen that is awesome feedback, thanks for the insights. I would more than likely use the API for most use cases; for this specific use case a JDBC connector is required.
07-14-2016
03:12 AM
@Saravanan Ramaraj have you looked into Apache Knox? The Knox Gateway is designed as a reverse proxy, with pluggability in the areas of policy enforcement (through providers) and the backend services for which it proxies requests. The Apache Knox Gateway is a REST API gateway for interacting with Apache Hadoop clusters: it provides a single access point for all REST interactions with the cluster. In this capacity, the Knox Gateway can provide valuable functionality to aid in the control, integration, monitoring and automation of critical administrative and analytical needs of the enterprise:
- Authentication (LDAP and Active Directory authentication provider)
- Federation/SSO (HTTP-header-based identity federation)
- Authorization (service-level authorization)
- Auditing

For fine-grained authorization you can use Apache Ranger, which offers a centralized security framework to manage access control over the Hadoop data access components. Coupled with Kerberos, your cluster will be secured: users will be authenticated with Kerberos, Ranger will provide authorization on which services each user has access to, and Knox will be your perimeter security.
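To make "single access point" concrete, here is a minimal sketch of calling WebHDFS through the Knox gateway rather than hitting the NameNode directly. The hostname, port, topology name (default) and the guest credentials are assumptions; substitute the values from your own Knox deployment.

```bash
# Minimal sketch, assuming a topology named "default" that exposes WebHDFS;
# host, port and the guest credentials are placeholders for your own setup.
curl -iku guest:guest-password \
  'https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS'
```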
07-18-2016
03:42 PM
Try this; note it needs Spark 1.5 or later:
data.write.format('com.databricks.spark.csv').options(delimiter="\t", codec="org.apache.hadoop.io.compress.GzipCodec").save('s3a://myBucket/myPath')
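On Spark 1.x, com.databricks.spark.csv is an external package, so it has to be on the classpath when the job is launched. Here is a hedged sketch of one way to do that; the package coordinate (Scala 2.10 build) and the script name are assumptions, adjust them for your environment:

```bash
# Sketch only: pull in the external spark-csv package at submit time.
# The coordinate and script name are illustrative, not from the original post.
spark-submit --packages com.databricks:spark-csv_2.10:1.5.0 my_export_job.py
```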
03-19-2017
06:47 AM
When I try the TCP port I am getting connection refused; for UDP it is OK. What could be the reason?
07-11-2016
05:23 PM
@Sunile Manjee Until this is available as a feature, your best answer is the one that provides a workaround using -f InitFile: put whatever you need to initialize into your InitFile and pass it on the command line, e.g. beeline -u jdbc:hive2://WHATEVER:10000 -f InitFile
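As an illustration of what that init file might contain, a hedged sketch; the settings and the database name are placeholders, not from the original thread:

```bash
# Sketch only: create a hypothetical InitFile with the statements you want
# executed up front (the property and database name below are placeholders),
# then pass it to beeline together with the JDBC URL.
cat > InitFile <<'EOF'
set hive.execution.engine=tez;
use mydb;
EOF
beeline -u jdbc:hive2://WHATEVER:10000 -f InitFile
```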
12-01-2017
11:39 AM
Docker image of the Hortonworks Schema Registry: https://hub.docker.com/r/thebookpeople/hortonworks-registry/
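A hedged sketch of pulling and starting that image; the published port (9090, the Schema Registry's usual default) and the absence of further options are assumptions, so check the image's page on Docker Hub for what it actually expects:

```bash
# Sketch only: the port mapping and run options are assumptions; consult the
# image documentation on Docker Hub before relying on this.
docker pull thebookpeople/hortonworks-registry
docker run -d -p 9090:9090 thebookpeople/hortonworks-registry
```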
07-11-2016
10:31 AM
Regarding the "how", refer to Sunile. Pig is nice and flexible, Hive is good if you know SQL and your RFID data is already basically in a flat table format, and Spark also works well. But the question is whether you really want to process 100 GB of data on the sandbox: the memory settings are tiny, there is a single drive, and data is not replicated. If you do it like this, you could just use Python on a local machine. If you want a decent environment, you might want to set up 3-4 nodes on a VMware server, perhaps with 32 GB of RAM each. That would give you a nice little environment and you could actually do some fast processing.
07-09-2016
01:30 AM
6 Kudos
Short Description: TeraGen and TeraSort performance testing on AWS.

Article: This article should be used with extreme care; do not use it as a benchmark. I performed this test simply to run a quick 1-terabyte TeraGen on AWS and see what kind of performance I could get from MapReduce with very little configuration tweaking/tuning. On my github page here you will find the teragen script and the hadoop, yarn, mapred and capacity-scheduler configurations used during testing.

Hardware (master and datanodes): 1 master, 3 data nodes; d2.4xlarge, 16 vCPU, 122 GB RAM, 12 x 2000 GB storage (max).

Results:
- TeraGen: 1 hr, 6 min, 38 sec
- TeraSort: 1 hr, 34 min, 20 sec
- TeraValidate: 25 min, 27 sec
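For reference, a hedged reconstruction of the kind of commands behind these numbers; the actual script and tuned configs are on the GitHub page mentioned above, and the jar path and HDFS directories below are assumptions:

```bash
# Sketch only: ~1 TB TeraGen/TeraSort/TeraValidate using the stock MapReduce
# examples jar. 10,000,000,000 rows x 100 bytes is roughly 1 TB; the jar path
# (typical HDP layout) and the output directories are placeholders.
EXAMPLES_JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
hadoop jar "$EXAMPLES_JAR" teragen 10000000000 /benchmarks/teragen
hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/teragen /benchmarks/terasort
hadoop jar "$EXAMPLES_JAR" teravalidate /benchmarks/terasort /benchmarks/teravalidate
```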
07-08-2016
05:33 PM
Hi, it seems this is an issue in the documentation. On every cloud provider you can use the 'cloudbreak' user. If you are looking for the public IP addresses of the VMs started by Cloudbreak, you can find them under the Nodes section of the UI. Attila