Member since: 09-24-2015
Posts: 27
Kudos Received: 69
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5088 | 12-04-2015 03:40 PM
 | 26957 | 10-19-2015 01:56 PM
 | 4017 | 09-29-2015 11:38 AM
11-20-2015
01:09 AM
Thanks @bbende!! So we do not recommend scaling NiFi vertically by increasing the JVM heap size to a really large value?
11-19-2015
10:55 PM
Can we have a file watcher kind of mechanism in NiFi, where the data flow gets triggered whenever a file shows up at the source? Is that the same as scheduling a GetFile processor to run always?
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
11-19-2015
10:46 PM
When we run HDF on a single machine, do all the data flows built on that machine run under a single JVM? I did see NiFi documentation that talks about how you can control spilling data from the JVM to the hard disk. But is there an option to run multiple JVMs, say one per flow? Also, how big a JVM do you usually run on an edge node?
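(For context on the heap side: the single NiFi JVM takes its memory settings from conf/bootstrap.conf. A sketch of the relevant default entries; the exact java.arg numbering may vary by release:)

# conf/bootstrap.conf -- memory settings for the NiFi JVM
java.arg.2=-Xms512m   # initial heap
java.arg.3=-Xmx512m   # maximum heap; raise this to scale the single JVM vertically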
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
11-19-2015
01:29 PM
9 Kudos
1. NiFi Custom Processor Overview

Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. It is a very easy product to use, with which you can build a "data flow" quickly. As of version 0.3.0 it has around 90 prebuilt processors, but you can also extend them by adding your own custom processors. In this article I am going to talk about how you can build a custom NiFi processor on your local machine and then move the finished processor, which is a nar file, to NiFi. This article is based on a video from YouTube; here is the link: https://www.youtube.com/watch?v=3ldmNFlelhw

2. Steps to Build a Custom Processor

Here are the steps involved in building the custom processor for NiFi. I used my Mac to build this processor.

2.1 Required Software

The two pieces of software you need on your local machine are:
1. Maven
2. Java

Here is how you can quickly check whether you have them installed:
mvn -version
java -version

Here are the results from my machine:

$ mvn -version
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-14T11:29:23-06:00)
Maven home: /usr/local/Cellar/maven/3.2.5/libexec
Java version: 1.8.0_65, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.10.4", arch: "x86_64", family: "mac"

$ java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

2.2 Create a directory where you want to build the processor

I created the directory under the following location:

mkdir -p <Home Dir>/Documents/nifi/ChakraProcessor
2.3 Create the NiFi processor with default values

Go to the new directory you just created and use the mvn command to generate the required Java files:

cd <Home Dir>/Documents/nifi/ChakraProcessor
mvn archetype:generate

You will be asked for a bunch of parameters; my responses are shown at the end of each prompt below:

Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): 690: nifi
Choose archetype:
1: remote -> org.apache.nifi:nifi-processor-bundle-archetype (-)
2: remote -> org.apache.nifi:nifi-service-bundle-archetype (-)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : 1
Choose org.apache.nifi:nifi-processor-bundle-archetype version:
1: 0.0.2-incubating
2: 0.1.0-incubating
3: 0.2.0-incubating
4: 0.2.1
5: 0.3.0
Choose a number: 5: 4
Downloading: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.jar
Downloaded: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.jar (12 KB at 8.0 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.pom
Downloaded: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.pom (2 KB at 9.4 KB/sec)
Define value for property 'groupId': : hwx
Define value for property 'artifactId': : HWX
Define value for property 'version': 1.0-SNAPSHOT: : 1.0
Define value for property 'artifactBaseName': : demo
Define value for property 'package': hwx.processors.demo: :
[INFO] Using property: nifiVersion = 0.1.0-incubating-SNAPSHOT
Confirm properties configuration:
groupId: hwx
artifactId: HWX
version: 1.0
artifactBaseName: demo
package: hwx.processors.demo
nifiVersion: 0.1.0-incubating-SNAPSHOT
Y: : Y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: nifi-processor-bundle-archetype:0.2.1
[INFO] ----------------------------------------------------------------------------
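As an aside, the same generation can be scripted without the prompts by passing the answers as -D properties. A sketch using the values I chose above:

mvn archetype:generate \
  -DarchetypeGroupId=org.apache.nifi \
  -DarchetypeArtifactId=nifi-processor-bundle-archetype \
  -DarchetypeVersion=0.2.1 \
  -DgroupId=hwx -DartifactId=HWX -Dversion=1.0 \
  -DartifactBaseName=demo -Dpackage=hwx.processors.demo \
  -DinteractiveMode=false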
2.4 Modify the processor

The command above generates a MyProcessor.java file, which is where you put the code for your custom processor. Open MyProcessor.java under the following location:

<Home Dir>/Documents/nifi/ChakraProcessor/HWX/nifi-demo-processors/src/main/java/hwx/processors

Add the following lines at the end of the // TODO implement section:

// TODO implement
System.out.println("This is a custom processor that will receive flow file");
session.transfer(flowFile, MY_RELATIONSHIP);
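For context, here is roughly what the onTrigger method in MyProcessor.java looks like once those lines are in place. This is a sketch based on the skeleton the archetype generates; your generated file may differ slightly:

@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    // Pull the next flow file queued on the processor's input connection
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    // TODO implement
    System.out.println("This is a custom processor that will receive flow file");
    // Route the flow file, unchanged, to the processor's relationship
    session.transfer(flowFile, MY_RELATIONSHIP);
}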
2.5 Change the POM

There is one change that you need to make to the POM file before you can create the package: remove the -SNAPSHOT suffix from the pom.xml under the following location:

<Home Dir>/Documents/nifi/ChakraProcessor/HWX
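In my run the offending value was the nifiVersion the archetype stamped in (0.1.0-incubating-SNAPSHOT, as shown in the confirmation output above). The element below is illustrative; look for whichever version element carries the -SNAPSHOT suffix in your pom.xml:

<!-- before: snapshot builds are not resolvable from Maven Central -->
<version>0.1.0-incubating-SNAPSHOT</version>
<!-- after -->
<version>0.1.0-incubating</version>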
2.6 Create the nar file for your processor

cd <Home Dir>/Documents/nifi/ChakraProcessor/HWX
mvn install

Once the Maven install is done, you will find the nar file named nifi-demo-nar-1.0.nar in the target directory:

cd <Home Dir>/Documents/nifi/ChakraProcessor/HWX/nifi-demo-nar/target
$ ls
classes  maven-archiver  maven-shared-archive-resources  nifi-demo-nar-1.0.nar
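A nar is just a jar with extra packaging, so you can sanity-check its contents before deploying (assuming jar is on your PATH); the processor jar ends up under META-INF/bundled-dependencies/:

jar tf nifi-demo-nar-1.0.nar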
2.7 Copy the nar file to NiFi

Copy the nar file to the bin directory of your NiFi installation. Here is a sample command to copy the file:

scp nifi-demo-nar-1.0.nar root@172.16.149.157:/opt/nifi-1.0.0.0-7/bin/

Once the nar file is copied, you need to restart NiFi. Once restarted, you should be able to add the custom processor you built, which will show up with the name "MyProcessor".

2.8 Build a NiFi data flow

You can build a new data flow using this custom processor:

GenerateFlowFile --> MyProcessor --> LogAttribute

For "MyProcessor" you can enter some random value under the property section to make it valid.
10-27-2015
12:11 PM
4 Kudos
In terms of the Azure HDInsight environment, here are a few things to be aware of regarding infrastructure:

- You have the option to install HDInsight on Windows or HDInsight on Linux (only on Ubuntu 12 LTS). Apache Ambari comes only with the Linux-based install.
- The machine types available for the Linux-based install were limited to D3, D4 & D12. Not sure if this is because of my Azure account limitations.
- The HDInsight version is 3.2.1, which comes with certain HDP 2.2 components.
- Separate clusters are required for Hadoop, HBase, and Storm. Spark is available as a technical preview.
- Blob storage is used as the default for HDFS. Not sure if there is an option to add VHD or SSD.
- HDInsight 3.2 does not contain Falcon, Flume, Accumulo, Ambari Metrics, Atlas, Kafka, Knox, Ranger, Ranger KMS & Slider. It also has somewhat older versions of the Hadoop components.

Attached is a file comparing HDInsight 3.2.1 components to those of HDP 2.3.2: hdinsight-and-hdp-component-comparison.zip

Update by @Ancil McBarnett — HDInsight Component Versioning: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/
10-22-2015
07:01 PM
Right Ancil, these are jar files specifically for Oracle SQL Developer connectivity. I thought this article would be useful for folks who have SQL Developer as a standard SQL client tool in their company and have no other workaround 🙂
10-22-2015
03:02 PM
7 Kudos
Oracle SQL Developer is one of the most common SQL client tools used by developers, data analysts, data architects, etc. for interacting with Oracle and other relational systems. So extending the functionality of SQL Developer to connect to Hive is very useful for Oracle users. I found the original article on Oracle's website and made some additions based upon the issues that I ran into. Here is the original link: Oracle SQL Developer support for Hive

Here are the steps that I followed.

Step 1) The latest version of SQL Developer needs JDK 1.8, so install that on your Mac and change your JAVA_HOME path so that it points to JDK 1.8. Download JDK 1.8.
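On a Mac, one way to point JAVA_HOME at the 1.8 JDK for your current shell (a sketch; add the export line to your shell profile to make it permanent):

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)   # resolves the installed 1.8 JDK home
java -version                                       # should now report 1.8.0_xx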
Step 2) Download the latest version of Oracle SQL Developer for Mac from Oracle and unzip it: Oracle SQL Developer Download. Move the SQL Developer file to your Applications folder so that it is available to you. When you try to open Oracle SQL Developer on the Mac, it may not open; for me it showed up in the tray, blinked for a while, and then was gone. I had to follow this instruction to fix it: Fix for Mac SQL Developer setup. Once you have this fix, you should be able to open Oracle SQL Developer.

Step 3) Download a JDBC driver for Hive that can work with Oracle SQL Developer. Cloudera has one available, and here is the link for it: Link for Hive JDBC Driver for Oracle SQL Developer

Step 4) Unzip the file downloaded in Step 3. Inside you will find another zip file called “Cloudera_HiveJDBC4_2.5.15.1040.zip”. Unzip that file as well and move all the jars to <your home directory>/.sqldeveloper/:

ql.jar
hive_service.jar
hive_metastore.jar
TCLIServiceClient.jar
zookeeper-3.4.6.jar
slf4j-log4j12-1.5.11.jar
slf4j-api-1.5.11.jar
log4j-1.2.14.jar
libthrift-0.9.0.jar
libfb303-0.9.0.jar
HiveJDBC4.jar

Step 5) Add these jars to SQL Developer. Go to “Oracle SQL Developer” --> Preferences, select Database and then “Third Party JDBC Drivers”, and use the add entry option to add the jar files listed above. Restart SQL Developer to pick up the change.

Step 6) Open SQL Developer and right-click on Connections to add a connection. Select the Hive tab, enter your Hive server details, and add the connection. You are all set to browse Hive tables via SQL Developer.
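If you are unsure what "Hive server details" means here, it boils down to the HiveServer2 host and port; HiveServer2 listens on port 10000 by default, so the equivalent JDBC URL looks like this (host and database are placeholders):

jdbc:hive2://<hiveserver2-host>:10000/default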
10-20-2015
11:49 AM
One of my clients is using Azure-based IaaS for their HDP cluster. They are open to using more expensive storage to get better performance. Is it recommended to use SSD for some of the data in Hive tables to get that performance boost? Also, what are the steps to point the temporary storage used by Tez/MR jobs at SSD?
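(To make the question concrete: by "temporary storage" I mean the NodeManager local directories that Tez/MR jobs spill intermediate data to. A sketch of pointing them at an SSD in yarn-site.xml, assuming the SSD is mounted at /mnt/ssd:)

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <!-- comma-separated list of local dirs used for shuffle/spill; /mnt/ssd is a placeholder mount -->
  <value>/mnt/ssd/hadoop/yarn/local</value>
</property>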
Labels:
- Apache Hive
10-19-2015
01:56 PM
2 Kudos
Thanks guys for the response. I was able to modify the configuration for MS SQL Server:

Database Connection URL: jdbc:sqlserver://a5d3iwbrq1.database.windows.net:1433;databaseName=chakra
Database Driver Class Name: com.microsoft.sqlserver.jdbc.SQLServerDriver
Database Driver Jar Url: file:///usr/share/java/sqljdbc4.jar
Database User: chakra
Password: ******

Once you have the configuration set, you also need to use GenerateFlowFile or something similar to trigger ExecuteSQL, as the Timer Driven schedule does not work on the version of NiFi I was using. Once this was done, I ran into a bug where ExecuteSQL is not able to read the source table structure and gives an Avro schema error: https://issues.apache.org/jira/browse/NIFI-1010 I am assuming that once the above bug is fixed we should be able to use ExecuteSQL for a MS SQL Server DB.