Member since
03-16-2016
707
Posts
1753
Kudos Received
203
Solutions
04-14-2020
12:10 AM
You can use .repartition(1), for example: DF.repartition(1) .....
08-30-2019
11:19 AM
Hi,

Instead of creating a separate CGROUP for each broker node in the Kafka cluster, we can use the Kafka env to make it work.

To configure Kafka to advertise the FQDN and listen on all IP addresses, add the following text to the bottom of the kafka-env-template:

# Configure Kafka to advertise the FQDN
HOST_FQDN=$(hostname -f)
echo advertised.listeners=$HOST_FQDN
sed -i.bak -e '/advertised/{/advertised@/!d;}' /usr/hdp/current/kafka-broker/conf/server.properties
echo "advertised.listeners=SASL_PLAINTEXT://$HOST_FQDN:6667" >> /usr/hdp/current/kafka-broker/conf/server.properties

To configure Kafka to listen on all network interfaces, change the value in the listeners field to SASL_PLAINTEXT://0.0.0.0:6667

Reference: https://docs.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-connect-vpn-gateway

Thanks,
Saravana
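For anyone who wants to dry-run the edit before touching a broker, here is a minimal Python sketch of what the sed and echo lines in the kafka-env snippet do to server.properties. The function name, the sample properties text and the host name broker1.example.com are hypothetical; only the filter rule, the SASL_PLAINTEXT protocol and port 6667 come from the snippet itself.

```python
def set_advertised_listener(properties_text, host_fqdn, port=6667,
                            protocol="SASL_PLAINTEXT"):
    """Return server.properties text with one advertised.listeners entry.

    Hypothetical helper that mirrors the sed + echo pair from the post.
    """
    kept = [
        line
        for line in properties_text.splitlines()
        # Same filter as the sed expression: drop every line that mentions
        # "advertised" unless it contains the literal "advertised@".
        if "advertised" not in line or "advertised@" in line
    ]
    # Same effect as the final echo: append the new listener line.
    kept.append(f"advertised.listeners={protocol}://{host_fqdn}:{port}")
    return "\n".join(kept) + "\n"


# Made-up sample input, not taken from a real cluster.
sample = (
    "broker.id=1\n"
    "advertised.listeners=PLAINTEXT://old-host:9092\n"
    "listeners=SASL_PLAINTEXT://0.0.0.0:6667\n"
)
updated = set_advertised_listener(sample, "broker1.example.com")
print(updated)
```

Running this against a copy of the real file shows which lines would be dropped and what would be appended, without modifying anything on disk.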
11-13-2018
03:15 AM
It seems that the template is malformed and it has nothing to do with NiFi 1.8. The same issue with NiFi 1.3 which was the version used in the demo. I'll close this question.
09-23-2018
02:49 AM
I just noticed a small note at the top saying: "Note: This procedure requires change data capture from the operational database that has a primary key and modified date field where you pulled the records from since the last update." We don't have CDC on our database, so does that mean we can't do incremental imports? It should be possible by looking at the date field, since that is constantly increasing.
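To make the idea concrete, here is a minimal sketch of a date-based incremental pull, using an in-memory SQLite table as a stand-in for the operational database. The table name source_table, its columns and the sample rows are all hypothetical. Note that a watermark like this only sees rows whose modified date was bumped; deleted rows are not detected.

```python
import sqlite3


def fetch_changed_rows(conn, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, payload, modified_at FROM source_table "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the newest row seen; keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Hypothetical source table; timestamps are ISO-style strings so that
# string comparison matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_table "
    "(id INTEGER PRIMARY KEY, payload TEXT, modified_at TEXT)"
)
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [
        (1, "a", "2018-09-20 10:00:00"),
        (2, "b", "2018-09-22 08:30:00"),
        (3, "c", "2018-09-23 01:15:00"),
    ],
)

rows, watermark = fetch_changed_rows(conn, "2018-09-21 00:00:00")
print([r[0] for r in rows], watermark)
```

Each run would persist the returned watermark and pass it to the next run, so only rows changed since the previous import are pulled.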
04-11-2019
01:18 PM
A very informative post, thanks for sharing! From what I have observed, Kafka is a fairly network-intensive application, so I have a question about using an active-active network bond configuration with Kafka. Is this recommended, and what should I consider if I decide to go for it? Thanks again!
04-01-2018
06:21 PM
@Rahul Soni Yes, sir. That's what I see in the ambari-server logs with regard to the error I posted above.
03-06-2018
03:58 AM
1 Kudo
@Constantin Stanca Could you please explain the approach in detail?
02-05-2018
04:12 AM
14 Kudos
Apache NiFi has evolved significantly between version 1.2, included in HDF 3.0, and version 1.5, included in HDF. I often find myself puzzled when asked to list the differences between releases. Reading the release notes history at https://cwiki.apache.org/confluence/display/NIFI/Release+Notes and looking at the latest list of NiFi processors does not make it easy to determine which new processors were added. I put together a matrix which I hope will help developers take advantage of the new processors to improve old flows and develop new ones.

In a nutshell, the main functionality added is around:

- AzureEventHub
- Kafka 0.11 and 1.0 processors
- Record processors
- RethinkDB
- Flatten Json
- Execute Spark Interactive
- Execute Groovy Script

My favorite improvements are around record processors, flattening JSON and executing Spark interactively.

The following table shows the matrix, arranged alphabetically from A-D. A blank cell means the processor is not listed for that release.

See here for the Matrix Table from E-J
See here for the Matrix Table from K-Z

NiFi 1.5 | NiFi 1.4 | NiFi 1.3 | NiFi 1.2 |
---|---|---|---|
AttributeRollingWindow | AttributeRollingWindow | AttributeRollingWindow | AttributeRollingWindow |
AttributesToJSON | AttributesToJSON | AttributesToJSON | AttributesToJSON |
Base64EncodeContent | Base64EncodeContent | Base64EncodeContent | Base64EncodeContent |
CaptureChangeMySQL | CaptureChangeMySQL | CaptureChangeMySQL | CaptureChangeMySQL |
CompareFuzzyHash | CompareFuzzyHash | CompareFuzzyHash | CompareFuzzyHash |
CompressContent | CompressContent | CompressContent | CompressContent |
ConnectWebSocket | ConnectWebSocket | ConnectWebSocket | ConnectWebSocket |
ConsumeAMQP | ConsumeAMQP | ConsumeAMQP | ConsumeAMQP |
ConsumeAzureEventHub | | | |
ConsumeEWS | ConsumeEWS | ConsumeEWS | ConsumeEWS |
ConsumeIMAP | ConsumeIMAP | ConsumeIMAP | ConsumeIMAP |
ConsumeJMS | ConsumeJMS | ConsumeJMS | ConsumeJMS |
ConsumeKafka | ConsumeKafka | ConsumeKafka | ConsumeKafka |
ConsumeKafka_0_10 | ConsumeKafka_0_10 | ConsumeKafka_0_10 | ConsumeKafka_0_10 |
ConsumeKafka_0_11 | ConsumeKafka_0_11 | | |
ConsumeKafkaRecord_0_10 | ConsumeKafkaRecord_0_10 | ConsumeKafkaRecord_0_10 | ConsumeKafkaRecord_0_10 |
ConsumeKafkaRecord_0_11 | ConsumeKafkaRecord_0_11 | | |
ConsumeKafka_1_0 | | | |
ConsumeKafkaRecord_1_0 | | | |
ConsumeMQTT | ConsumeMQTT | ConsumeMQTT | ConsumeMQTT |
ConsumePOP3 | ConsumePOP3 | ConsumePOP3 | ConsumePOP3 |
ConsumeWindowsEventLog | ConsumeWindowsEventLog | ConsumeWindowsEventLog | ConsumeWindowsEventLog |
ControlRate | ControlRate | ControlRate | ControlRate |
ConvertAvroSchema | ConvertAvroSchema | ConvertAvroSchema | ConvertAvroSchema |
ConvertAvroToJSON | ConvertAvroToJSON | ConvertAvroToJSON | ConvertAvroToJSON |
ConvertAvroToORC | ConvertAvroToORC | ConvertAvroToORC | ConvertAvroToORC |
ConvertCharacterSet | ConvertCharacterSet | ConvertCharacterSet | ConvertCharacterSet |
ConvertCSVToAvro | ConvertCSVToAvro | ConvertCSVToAvro | ConvertCSVToAvro |
ConvertExcelToCSVProcessor | ConvertExcelToCSVProcessor | ConvertExcelToCSVProcessor | ConvertExcelToCSVProcessor |
ConvertJSONToAvro | ConvertJSONToAvro | ConvertJSONToAvro | ConvertJSONToAvro |
ConvertJSONToSQL | ConvertJSONToSQL | ConvertJSONToSQL | ConvertJSONToSQL |
ConvertRecord | ConvertRecord | ConvertRecord | ConvertRecord |
CreateHadoopSequenceFile | CreateHadoopSequenceFile | CreateHadoopSequenceFile | CreateHadoopSequenceFile |
CountText | | | |
DebugFlow | DebugFlow | DebugFlow | DebugFlow |
DeleteDynamoDB | DeleteDynamoDB | DeleteDynamoDB | DeleteDynamoDB |
DeleteGCSObject | DeleteGCSObject | DeleteGCSObject | DeleteGCSObject |
DeleteHDFS | DeleteHDFS | DeleteHDFS | DeleteHDFS |
DeleteElasticsearch5 | DeleteElasticsearch5 | | |
DeleteRethinkDB | DeleteRethinkDB | | |
DeleteS3Object | DeleteS3Object | DeleteS3Object | DeleteS3Object |
DeleteMongo | | | |
DeleteSQS | DeleteSQS | DeleteSQS | DeleteSQS |
DetectDuplicate | DetectDuplicate | DetectDuplicate | DetectDuplicate |
DistributeLoad | DistributeLoad | DistributeLoad | DistributeLoad |
DuplicateFlowFile | DuplicateFlowFile | DuplicateFlowFile | DuplicateFlowFile |
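If you keep the matrix as data, the "what was added" question becomes a set difference. The sketch below uses only a handful of processor names from the A-D matrix, and it assumes that a processor listed fewer than four times is present only in the newest releases (for example, that CountText appears only under NiFi 1.5). The dictionary layout and the function name are hypothetical.

```python
# Small excerpt of the matrix. Release membership is an assumption:
# names listed once are treated as 1.5 only, names listed twice as 1.4 and 1.5.
processors_by_release = {
    "1.2": {"ConsumeKafka", "ConsumeKafka_0_10", "DeleteS3Object"},
    "1.3": {"ConsumeKafka", "ConsumeKafka_0_10", "DeleteS3Object"},
    "1.4": {"ConsumeKafka", "ConsumeKafka_0_10", "DeleteS3Object",
            "ConsumeKafka_0_11", "DeleteRethinkDB"},
    "1.5": {"ConsumeKafka", "ConsumeKafka_0_10", "DeleteS3Object",
            "ConsumeKafka_0_11", "DeleteRethinkDB",
            "ConsumeKafka_1_0", "CountText", "DeleteMongo"},
}


def added_between(older, newer):
    """Return the processors present in `newer` but not in `older`, sorted."""
    return sorted(processors_by_release[newer] - processors_by_release[older])


for older, newer in [("1.2", "1.3"), ("1.3", "1.4"), ("1.4", "1.5")]:
    print(f"{older} -> {newer}: {added_between(older, newer)}")
```

The same function works for non-adjacent releases, e.g. added_between("1.2", "1.5") lists everything in this excerpt that is new since HDF 3.0.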
12-15-2017
09:49 PM
1 Kudo
Airflow maintainer here. I know this question is a bit dated, but it still turns up in searches. Airflow and NiFi both have their strengths and weaknesses. Let me list some of the great things about Airflow that set it apart.

1. Configuration as code. Airflow uses Python for the definitions of DAGs (i.e. workflows). This gives you the full power and flexibility of a programming language with a wealth of modules.
2. DAGs are testable and versionable. As they are in code, you can integrate your workflow definitions into your CI/CD pipeline.
3. Ease of setup, local development. While Airflow gives you horizontal and vertical scalability, it also allows your developers to test and run locally, all from a single pip install apache-airflow. This greatly enhances productivity and reproducibility.
4. Real data sucks. Airflow knows that, so we have features for retrying and SLAs.
5. Changing history. After a year you find out that you need to put a task into a DAG, but it needs to run ‘in the past’. Airflow allows you to do backfills, giving you the opportunity to rewrite history. And guess what, you need it more often than you think.
6. Great debuggability. There are logs for everything, nicely tied to the unit of work they belong to: scheduler logs, DAG parsing/processing logs, task logs. Being in Python, the hurdle is quite low to jump in and do a fix yourself if needed.
7. A wealth of connectors that allow you to run tasks on Kubernetes, Docker, Spark, Hive, Presto, Druid, etc.
8. A very active community.
10-06-2017
07:20 PM
6 Kudos
Introduction

This is a continuation of an article I wrote about 1 year ago: https://community.hortonworks.com/articles/60580/jmeter-setup-for-hive-load-testing-draft.html
https://www.blazemeter.com/blog/windows-authentication-apache-jmeter

Steps

1) Enable Kerberos on your cluster

Perform all steps specified here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_security/content/configuring_amb_hdp_for_kerberos.html and connect successfully to the Hive service via the command line using your user keytab. That implies a valid ticket.

2) Install JMeter

See the previous article mentioned in the Introduction.

3) Set Hive User keytab in jaas.conf

JMETER_HOME/bin/jaas.conf

Your jaas.conf should look something like this:

JMeter {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=false
    doNotPrompt=true
    useKeyTab=true
    keyTab="/etc/security/keytabs/hive.service.keytab"
    principal="hive/server.example.com@EXAMPLE.COM"
    debug=true;
};

4) JMeter Setup

There are 2 files under the /bin folder of the JMeter installation which are used for Kerberos configuration:

- krb5.conf - a file in .ini format which contains Kerberos configuration details
- jaas.conf - a file which holds configuration details of the Java Authentication and Authorization Service

These files aren't used by default, so you have to tell JMeter where they are via system properties such as:

-Djava.security.krb5.conf=krb5.conf
-Djava.security.auth.login.config=jaas.conf

Alternatively, you can add the next two lines to the system.properties file, which is located in the same /bin folder:

java.security.krb5.conf=krb5.conf
java.security.auth.login.config=jaas.conf

I suggest using full paths to the files.

5) Manage Issues

If you encounter any issues:

- enable debug by adding the following to your command:

-Dsun.security.krb5.debug=true
-Djava.security.debug=gssloginconfig,configfile,configparser,logincontext

- check jmeter.log to see whether all properties are set as expected and map to existing file paths.

6) Turn off Subject Credentials

-Djavax.security.auth.useSubjectCredsOnly=false

7) Example of JMeter Command

JVM_ARGS="-Xms1024m -Xmx1024m" bin/jmeter \
-Dsun.security.krb5.debug=true \
-Djavax.security.auth.useSubjectCredsOnly=false \
-Djava.security.debug=gssloginconfig,configfile,configparser,logincontext \
-Djava.security.krb5.conf=/path/to/krb5.conf \
-Djava.security.auth.login.config=/path/to/jaas.conf \
-n -t t1.jmx -l results -e -o output

This command can be shortened if you add the two lines mentioned earlier to system.properties.
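Since a mistyped path is a common cause of trouble here, it can help to assemble the command in a small script that checks the files first. The following is only a sketch: the function name build_jmeter_command and its parameters are hypothetical, while the -D properties and the -n -t -l -e -o options are the ones used in the command in step 7. The demo uses empty throwaway files in place of a real krb5.conf and jaas.conf.

```python
import os
import tempfile


def build_jmeter_command(krb5_conf, jaas_conf, test_plan,
                         results_file="results", report_dir="output",
                         debug=False):
    """Build the JMeter argument list, failing early on missing config files."""
    for path in (krb5_conf, jaas_conf):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Kerberos config file not found: {path}")
    command = [
        "bin/jmeter",
        "-Djavax.security.auth.useSubjectCredsOnly=false",
        # Full paths, as suggested in step 4.
        f"-Djava.security.krb5.conf={os.path.abspath(krb5_conf)}",
        f"-Djava.security.auth.login.config={os.path.abspath(jaas_conf)}",
    ]
    if debug:
        # Debug switches from step 5.
        command += [
            "-Dsun.security.krb5.debug=true",
            "-Djava.security.debug=gssloginconfig,configfile,configparser,logincontext",
        ]
    command += ["-n", "-t", test_plan, "-l", results_file, "-e", "-o", report_dir]
    return command


# Demo with empty placeholder files standing in for the real configs.
with tempfile.TemporaryDirectory() as tmp:
    krb5 = os.path.join(tmp, "krb5.conf")
    jaas = os.path.join(tmp, "jaas.conf")
    for path in (krb5, jaas):
        open(path, "w").close()
    command = build_jmeter_command(krb5, jaas, "t1.jmx", debug=True)
    print(" ".join(command))
```

The returned list can be passed to subprocess.run from the JMeter home directory, with JVM_ARGS set in the environment as in step 7.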