Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4045 | 08-20-2018 08:26 PM |
|  | 1943 | 08-15-2018 01:59 PM |
|  | 2372 | 08-13-2018 02:20 PM |
|  | 4104 | 07-23-2018 04:37 PM |
|  | 5010 | 07-19-2018 12:52 PM |
07-14-2016 03:56 PM
Can NiFi execute MDX queries? Which processor would I use?
Labels: Apache NiFi
07-14-2016 03:51 PM
Can NiFi connect to SAP BW? I want to move data from SAP BW/HANA to HDP. Can NiFi do this? Which processor would I use?
Labels: Apache NiFi
07-14-2016 01:56 PM
@Raghu Udiyar you can manage your cluster through Puppet/Chef, which would make calls to the Ambari REST API for changes; this is used a ton in the field with success. I believe you are suggesting simply uploading a new blueprint and having the cluster detect the changes and apply them. It's an interesting idea; however, the current methods work very well.
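Purely as an illustration of the kind of call a Puppet/Chef run can make against the Ambari REST API: a minimal sketch that starts a service by setting its desired state. The host, cluster name, credentials and the STORM service name below are assumptions for the sketch, not anything from this thread.

```scala
// Minimal sketch: PUT the desired state of a service through the Ambari REST API.
// Host, cluster, credentials and service name are assumptions.
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import java.util.Base64

object StartServiceViaAmbari {
  def main(args: Array[String]): Unit = {
    val endpoint = "http://ambari-host.example.com:8080/api/v1/clusters/mycluster/services/STORM"
    val body =
      """{"RequestInfo":{"context":"Start STORM via REST"},"Body":{"ServiceInfo":{"state":"STARTED"}}}"""
    val auth = Base64.getEncoder.encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8))

    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("PUT")
    conn.setDoOutput(true)
    conn.setRequestProperty("Authorization", s"Basic $auth")
    conn.setRequestProperty("X-Requested-By", "ambari") // Ambari rejects state-changing calls without this header
    conn.getOutputStream.write(body.getBytes(StandardCharsets.UTF_8))

    // 202 Accepted means Ambari queued a request that can be polled under /api/v1/clusters/mycluster/requests
    println(s"Ambari responded with HTTP ${conn.getResponseCode}")
  }
}
```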
07-14-2016 04:11 AM
@Emily Sharpe thanks for the insights.
07-14-2016 03:29 AM
@Raghu Udiyar the blueprint is managed by Ambari, so if you are changing your cluster, the blueprint should reflect the change. For example, I just added Storm to my cluster; I did a fetch on the blueprint and Storm now exists in it. Does that help, or did I misunderstand your question? Cheers.
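For reference, the fetch itself is just a GET on the cluster resource with `format=blueprint`. A minimal sketch, assuming an Ambari server at ambari-host.example.com, a cluster named mycluster and default admin credentials:

```scala
// Minimal sketch: export the blueprint of a running, Ambari-managed cluster.
// Host name, cluster name and admin:admin credentials are assumptions.
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import java.util.Base64
import scala.io.Source

object FetchBlueprint {
  def main(args: Array[String]): Unit = {
    // ?format=blueprint asks Ambari to render the live cluster as a blueprint document
    val endpoint = "http://ambari-host.example.com:8080/api/v1/clusters/mycluster?format=blueprint"
    val auth = Base64.getEncoder.encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8))

    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestProperty("Authorization", s"Basic $auth")

    // Newly added services (e.g. STORM) show up in the host_groups/components of this JSON
    println(Source.fromInputStream(conn.getInputStream, "UTF-8").mkString)
  }
}
```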
07-14-2016 03:24 AM
@Binu Mathew do you have any thoughts?
07-14-2016 03:16 AM
1 Kudo
@ANSARI FAHEEM AHMED I am not sure I follow your question. You should be able to increase YARN memory independently of the NameNode heap size. Ambari may make recommendations on what the NameNode heap size should be based on the memory available to the cluster, so if YARN memory is increased, Ambari now believes more memory is available to the cluster and may make an increase/decrease recommendation. Run the YARN utility script available here to play with different configurations.
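To see why the two settings are independent: the utility script only derives the YARN container settings from a node's cores, disks and available RAM; the NameNode heap is not part of that calculation. A rough sketch of the arithmetic it applies, with made-up node specs:

```scala
// Rough sketch of the YARN container-sizing arithmetic.
// The node specs (cores, disks, RAM, reserved memory) below are made up for illustration.
object YarnMemorySketch {
  def main(args: Array[String]): Unit = {
    val cores          = 16
    val disks          = 8
    val totalRamMb     = 64 * 1024
    val reservedMb     = 8 * 1024   // assumed OS (and HBase, if co-located) reservation
    val minContainerMb = 2048       // typical minimum container size for nodes this large

    val availableRamMb  = totalRamMb - reservedMb
    val containers      = Seq(2 * cores, (1.8 * disks).toInt, availableRamMb / minContainerMb).min
    val ramPerContainer = math.max(minContainerMb, availableRamMb / containers)

    // Note: none of this involves the NameNode heap -- the two are tuned independently.
    println(s"yarn.nodemanager.resource.memory-mb  = ${containers * ramPerContainer}")
    println(s"yarn.scheduler.minimum-allocation-mb = $ramPerContainer")
    println(s"yarn.scheduler.maximum-allocation-mb = ${containers * ramPerContainer}")
  }
}
```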
07-14-2016 03:12 AM
@Saravanan Ramaraj have you looked into Apache Knox? The Knox API Gateway is designed as a reverse proxy, with consideration for pluggability both in the areas of policy enforcement (through providers) and in the backend services for which it proxies requests. The Apache Knox Gateway is a REST API gateway for interacting with Apache Hadoop clusters: it provides a single access point for all REST interactions with the cluster. In this capacity, the Knox Gateway provides valuable functionality to aid in the control, integration, monitoring and automation of critical administrative and analytical needs of the enterprise:
- Authentication (LDAP and Active Directory Authentication Provider)
- Federation/SSO (HTTP Header Based Identity Federation)
- Authorization (Service Level Authorization)
- Auditing

For authorization you can use Apache Ranger, which offers a centralized security framework to manage fine-grained access control over the Hadoop data access components. Coupled with Kerberos, your cluster will be secured: the services will be authenticated using Kerberos, Ranger will provide authorization on what services a user has access to, and Knox will be your perimeter security. A quick example of a call routed through the gateway is sketched below.
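A minimal sketch of what "single access point" means in practice: one WebHDFS listing routed through the Knox gateway. The gateway host, the "default" topology and the guest:guest-password credentials are the Knox demo defaults and are assumptions here; the gateway's TLS certificate must already be trusted by the JVM.

```scala
// Minimal sketch: call WebHDFS through the Knox gateway instead of hitting the NameNode directly.
// Gateway host, topology name and demo credentials are assumptions.
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import java.util.Base64
import scala.io.Source

object WebHdfsViaKnox {
  def main(args: Array[String]): Unit = {
    val url  = "https://knox-host.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
    val auth = Base64.getEncoder.encodeToString("guest:guest-password".getBytes(StandardCharsets.UTF_8))

    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestProperty("Authorization", s"Basic $auth") // Knox authenticates this against LDAP/AD

    // The JSON FileStatuses listing comes back from HDFS without the client ever
    // talking to a NameNode or DataNode directly -- Knox proxies the request.
    println(Source.fromInputStream(conn.getInputStream, "UTF-8").mkString)
  }
}
```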
07-12-2016 04:38 AM
@Kit Menke can you verify your KDC is using UDP and not TCP?
07-12-2016 04:09 AM
@jestin ma found a similar solution here. "You can use the date processing functions introduced in Spark 1.5. Assuming you have the following data:

val df = Seq((1L, "05/26/2016 01:01:01"), (2L, "#$@#@#")).toDF("id", "dts")

you can use unix_timestamp to parse the strings and cast the result to a timestamp:

import org.apache.spark.sql.functions.unix_timestamp

val ts = unix_timestamp($"dts", "MM/dd/yyyy HH:mm:ss").cast("timestamp")

df.withColumn("ts", ts).show(2, false)
// +---+-------------------+---------------------+
// |id |dts                |ts                   |
// +---+-------------------+---------------------+
// |1  |05/26/2016 01:01:01|2016-05-26 01:01:01.0|
// |2  |#$@#@#             |null                 |
// +---+-------------------+---------------------+

As you can see, it covers both parsing and error handling. In Spark < 1.6 you'll have to use something like

unix_timestamp($"dts", "MM/dd/yyyy HH:mm:ss").cast("double").cast("timestamp")

or

(unix_timestamp($"dts", "MM/dd/yyyy HH:mm:ss") * 1000).cast("timestamp")

due to SPARK-11724. In Spark < 1.5 you should be able to use these with expr and HiveContext."