Member since: 09-24-2015
Posts: 27
Kudos Received: 69
Solutions: 3

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5110 | 12-04-2015 03:40 PM
 | 27023 | 10-19-2015 01:56 PM
 | 4039 | 09-29-2015 11:38 AM
11-20-2015
01:09 AM
Thanks @bbende!! So we do not recommend scaling NiFi vertically by increasing the JVM heap size to a really large value?
11-19-2015
10:55 PM
Can we have a file-watcher kind of mechanism in NiFi, where the data flow gets triggered whenever a file shows up at the source? Is that the same as scheduling a GetFile processor to run always?
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
11-19-2015
10:46 PM
When we run HDF on a single machine, do all the data flows built on that machine run under a single JVM? I did see NiFi documentation that talks about how you can control spilling data from the JVM to the hard disk. But is there an option to run multiple JVMs, say one for each flow? Also, how big of a JVM do you usually have for an edge node?
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
11-19-2015
01:29 PM
9 Kudos
1. NiFi Custom Processor Overview

Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. It is a very simple product to use, with which you can build a "data flow" very easily. As of version 0.3.0 it has 90 prebuilt processors, but you can also extend them by adding your own custom processors. In this article I am going to talk about how you can build a custom NiFi processor on your local machine and then move the finished processor, which is a NAR file, to NiFi. This article is based on a video from YouTube, and here is the link for it: https://www.youtube.com/watch?v=3ldmNFlelhw

2. Steps to Build a Custom Processor

Here are the steps involved in building a custom processor for NiFi. I used my Mac to build this processor.
2.1 Required Software

The two pieces of software that you need on your local machine are:

1. Maven
2. Java

Here is how you can quickly check whether you have them installed:

```
mvn -version
java -version
```

Here are the results from my machine:

```
$ mvn -version
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-14T11:29:23-06:00)
Maven home: /usr/local/Cellar/maven/3.2.5/libexec
Java version: 1.8.0_65, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.10.4", arch: "x86_64", family: "mac"

$ java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
```
2.2 Create a directory where you want to build the processor

I created the directory under the following location:

```
cd <Home Dir>/Documents/nifi/
mkdir ChakraProcessor
```
2.3 Create the NiFi processor with default values

Go to the new directory you just created and use the mvn command to generate the required Java files:

```
cd <Home Dir>/Documents/nifi/ChakraProcessor
mvn archetype:generate
```

You will be asked for a bunch of parameters. The values I entered follow each prompt below:

```
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): 690: nifi
Choose archetype:
1: remote -> org.apache.nifi:nifi-processor-bundle-archetype (-)
2: remote -> org.apache.nifi:nifi-service-bundle-archetype (-)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : 1
Choose org.apache.nifi:nifi-processor-bundle-archetype version:
1: 0.0.2-incubating
2: 0.1.0-incubating
3: 0.2.0-incubating
4: 0.2.1
5: 0.3.0
Choose a number: 5: 4
Downloading: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.jar
Downloaded: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.jar (12 KB at 8.0 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.pom
Downloaded: https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-processor-bundle-archetype/0.2.1/nifi-processor-bundle-archetype-0.2.1.pom (2 KB at 9.4 KB/sec)
Define value for property 'groupId': : hwx
Define value for property 'artifactId': : HWX
Define value for property 'version': 1.0-SNAPSHOT: : 1.0
Define value for property 'artifactBaseName': : demo
Define value for property 'package': hwx.processors.demo: :
[INFO] Using property: nifiVersion = 0.1.0-incubating-SNAPSHOT
Confirm properties configuration:
groupId: hwx
artifactId: HWX
version: 1.0
artifactBaseName: demo
package: hwx.processors.demo
nifiVersion: 0.1.0-incubating-SNAPSHOT
Y: : Y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: nifi-processor-bundle-archetype:0.2.1
[INFO] ----------------------------------------------------------------------------
```
2.4 Modify the processor

The above command results in a MyProcessor.java file, which is where you will put the code for your custom processor. Open MyProcessor.java under the following location:

<Home Dir>/Documents/nifi/ChakraProcessor/HWX/nifi-demo-processors/src/main/java/hwx/processors

Add the following lines at the end, after the // TODO implement section:

```
// TODO implement
System.out.println("This is a custom processor that will receive flow file");
session.transfer(flowFile, MY_RELATIONSHIP);
```
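For context, here is roughly how the edited onTrigger method ends up looking inside the generated MyProcessor class. This is a sketch based on the archetype's template; names such as MY_RELATIONSHIP come from the generated code, and the exact generated file may differ slightly:

```java
// Inside the archetype-generated MyProcessor class (sketch, not verbatim).
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        // No flow file queued for this processor yet; nothing to do.
        return;
    }
    // TODO implement
    System.out.println("This is a custom processor that will receive flow file");
    // Route the unchanged flow file to the processor's single relationship.
    session.transfer(flowFile, MY_RELATIONSHIP);
}
```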
2.5 Change the POM

There is one change that you need to make to the POM file before you can create the package. Remove the -SNAPSHOT from the version in the pom.xml file under the following location:

<Home Dir>/Documents/nifi/ChakraProcessor/HWX
2.6 Create the NAR file for your processor

```
cd <Home Dir>/Documents/nifi/ChakraProcessor/HWX
mvn install
```

Once the Maven install is done, you will have the NAR file in the target directory, with the name nifi-demo-nar-1.0.nar:

```
cd <Home Dir>/Documents/nifi/ChakraProcessor/HWX/nifi-demo-nar/target
$ ls
classes  maven-archiver  maven-shared-archive-resources  nifi-demo-nar-1.0.nar
```
2.7 Copy the NAR file to NiFi

Copy the NAR file to the bin directory of the machine where NiFi is installed. Here is a sample command to copy the file:

```
scp nifi-demo-nar-1.0.nar root@172.16.149.157:/opt/nifi-1.0.0.0-7/bin/
```

Once the NAR file is copied, you need to restart NiFi. Once restarted, you should be able to add the custom processor that you built, which will show up with the name "MyProcessor".
2.8 Build a NiFi data flow

You can build a new data flow using this custom processor:

GenerateFlowFile --> MyProcessor --> LogAttribute

For "MyProcessor", you can enter some random value under the property section to make it valid.
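The reason a value is needed before the processor becomes valid: the archetype template declares a required property. Here is a sketch of what that declaration looks like in the generated MyProcessor.java (names taken from the archetype template; treat this as illustrative rather than verbatim):

```java
// Archetype-generated property descriptor (sketch). Because it is required
// and validated as non-empty, the processor remains invalid on the canvas
// until some value is entered for "My Property".
public static final PropertyDescriptor MY_PROPERTY = new PropertyDescriptor.Builder()
        .name("My Property")
        .description("Example Property")
        .required(true)
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        .build();
```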
10-27-2015
12:11 PM
4 Kudos
For an Azure HDInsight environment, here are a few infrastructure points to be aware of:

- You have the option to install HDInsight on Windows or HDInsight on Linux (only on Ubuntu 12 LTS). Apache Ambari only comes with the Linux-based install.
- The machine types available for the Linux-based install were limited to D3, D4 & D12. Not sure if this is because of my Azure account limitations.
- The HDInsight version is 3.2.1, which comes with certain HDP 2.2 components.
- Separate clusters are required for Hadoop, HBase, and Storm. Spark is available as a technical preview.
- Blob storage is used as the default for HDFS. Not sure if there is an option to add VHD or SSD.
- HDInsight 3.2 does not contain Falcon, Flume, Accumulo, Ambari Metrics, Atlas, Kafka, Knox, Ranger, Ranger KMS & Slider. It also has somewhat older versions of the Hadoop components.

Attached is a file comparing HDInsight 3.2.1 components to those of HDP 2.3.2: hdinsight-and-hdp-component-comparison.zip

Update by @Ancil McBarnett: HDInsight Component Versioning: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/
10-22-2015
07:01 PM
Right Ancil, these are jar files specifically for Oracle SQL Developer connectivity. I thought this article would be useful for folks who have SQL Developer as a standard SQL client tool in their company and have no other workaround 🙂
10-22-2015
03:02 PM
7 Kudos
Oracle SQL Developer is one of the most common SQL client tools used by developers, data analysts, data architects, etc. for interacting with Oracle and other relational systems. So extending SQL Developer to connect to Hive is very useful for Oracle users. I found the original article on Oracle's website and made some additions based on the issues that I ran into. Here is the original link: Oracle SQL Developer support for Hive

Here are the steps that I followed.

Step 1) For the latest version of SQL Developer you need JDK 1.8, so install that on your Mac and change the JAVA_HOME path so that it points to JDK 1.8. Download JDK 1.8

Step 2) Download the latest version of Oracle SQL Developer for Mac from Oracle and unzip it: Oracle SQL Developer Download

Move the SQL Developer file to your Applications folder so that it is available to you. When you try to open Oracle SQL Developer on the Mac, it may not open. For me it showed up in the tray, blinked for a while, and then was gone, so I had to follow these instructions to fix it: Fix for Mac SQL Developer setup

Once you have this fix, you should be able to open Oracle SQL Developer.

Step 3) Download a JDBC driver for Hive that can work with Oracle SQL Developer. Cloudera has one available, and here is the link for it: Link for Hive JDBC Driver for Oracle SQL Developer

Step 4) Unzip the file downloaded in Step 3. Inside you will find another zip file called "Cloudera_HiveJDBC4_2.5.15.1040.zip". Unzip that file as well and move all of these jars to <your home directory>/.sqldeveloper/:

- ql.jar
- hive_service.jar
- hive_metastore.jar
- TCLIServiceClient.jar
- zookeeper-3.4.6.jar
- slf4j-log4j12-1.5.11.jar
- slf4j-api-1.5.11.jar
- log4j-1.2.14.jar
- libthrift-0.9.0.jar
- libfb303-0.9.0.jar
- HiveJDBC4.jar

Step 5) Add these jars to SQL Developer. Go to "Oracle SQL Developer" --> Preferences, select Database and then "Third Party JDBC Drivers", and use the Add Entry option to add the jar files listed above. Restart SQL Developer for the change to take effect.

Step 6) Open SQL Developer and right-click on Connections to add a connection. Select the Hive tab, enter your Hive server details, and add the connection. You are all set to browse Hive tables via SQL Developer.
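Not part of the original article, but a quick way to confirm the Cloudera driver jars work before wiring up SQL Developer is a small JDBC smoke test. The driver class name (com.cloudera.hive.jdbc4.HS2Driver from HiveJDBC4.jar) and the URL form are assumptions based on Cloudera's JDBC driver documentation; the host, port, and user below are placeholders for your cluster:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Compile and run with the jars from Step 4 on the classpath.
public class HiveJdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // Assumed HiveServer2 driver class from HiveJDBC4.jar.
        Class.forName("com.cloudera.hive.jdbc4.HS2Driver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://your-hive-server:10000/default", "your-user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // list Hive tables
            }
        }
    }
}
```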
10-20-2015
11:49 AM
One of my clients is using Azure-based IaaS for their HDP cluster. They are open to using more expensive storage to get better performance. Is it recommended to use SSD for some of the data in Hive tables to get that boost in performance? Also, what are the steps to point the temporary storage used by Tez/MR jobs to SSD?
Labels:
- Apache Hive
10-19-2015
01:56 PM
2 Kudos
Thanks guys for the response. I was able to modify the configuration for MS SQL Server:

- Database Connection URL --> jdbc:sqlserver://a5d3iwbrq1.database.windows.net:1433;databaseName=chakra
- Database Driver Class Name --> com.microsoft.sqlserver.jdbc.SQLServerDriver
- Database Driver Jar Url --> file:///usr/share/java/sqljdbc4.jar
- Database User --> chakra
- Password --> ******

Once you have the configuration set, you also need to use GenerateFlowFile or something similar to trigger ExecuteSQL, as the Timer Driven schedule does not work on the version of NiFi that I was using. Once this was done, I ran into a bug where ExecuteSQL is not able to get the source table structure and gives an Avro schema error: https://issues.apache.org/jira/browse/NIFI-1010 I am assuming that once the above bug is fixed, we should be able to use ExecuteSQL for a MS SQL Server DB.
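As a side note, the same settings can be sanity-checked outside NiFi with a tiny JDBC program using the exact driver class and URL configured above. This is a minimal sketch; the password placeholder and the table name are hypothetical, and sqljdbc4.jar must be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqlServerSmokeTest {
    public static void main(String[] args) throws Exception {
        // Same driver class configured for the NiFi ExecuteSQL connection pool.
        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://a5d3iwbrq1.database.windows.net:1433;databaseName=chakra",
                "chakra", "<password>");
             Statement stmt = conn.createStatement();
             // Hypothetical table; substitute one that exists in the database.
             ResultSet rs = stmt.executeQuery("SELECT TOP 5 * FROM some_table")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```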