Member since 03-16-2016 · 707 Posts · 1753 Kudos Received · 203 Solutions
12-26-2016
09:02 PM
2 Kudos
Introduction
h2o is a package for running H2O via its REST API from within R. This package allows the user to run basic H2O commands using R commands. No actual data is stored in the R workspace, and no actual work is carried out by R. R only saves the named objects, which uniquely identify the data set, model, etc. on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON response with the relevant information that R then displays in the console.
Scope
I tested this installation guide on CentOS 7.2, but it
should work on similar RedHat/Fedora/CentOS releases.
Steps
1. Install R
sudo yum install R
2. Install Java
https://www.java.com/en/download/help/linux_x64rpm_install.xml
3. Start R and install dependencies
install.packages("RCurl")
install.packages("bitops")
install.packages("rjson")
install.packages("statmod")
install.packages("tools")
4. Install the h2o package and load the library for use
install.packages("h2o")
library(h2o)
If this is your first time using CRAN, it will ask for a
mirror to use. If you want H2O installed site-wide (i.e., usable by all users
on that machine), run R as root (sudo R), then type
install.packages("h2o")
5. Test H2O installation
Type:
library(h2o)
If nothing complains, launch h2o:
h2o.init()
If all went well then you’ll see lots of output about how it
is starting up H2O on your behalf, and then it should tell you all about your
cluster. If not, the error message should be telling you what dependency is
missing, or what the problem is. Post a note to this article and I will get
back to you.
Tips
#1 - The version of H2O on CRAN might be up to a month or two
behind the latest and greatest. Unless you are affected by a bug that you know
has been fixed, don’t worry about it.
#2 - h2o.init() will only use two cores on your machine and maybe
a quarter of your system memory by default. To resize resources, use h2o.shutdown() and start it again:
a) using all your cores:
h2o.init(nthreads = -1)
b) using all your cores and 4 GB:
h2o.init(nthreads = -1, max_mem_size = "4g")
#3 - To run H2O on your local machine, you could call h2o.init without any
arguments, and H2O will be automatically launched at localhost:54321, where the
IP is "127.0.0.1" and the port is 54321.
#4 - If H2O is running on a
cluster, you must provide the IP and port of the remote machine as arguments to
the h2o.init() call. The operation will be done on the server associated with
the data object where H2O is running, not within the R environment. Tutorials
H2O Tutorial on the Hortonworks Data Platform Sandbox:
http://hortonworks.com/blog/oxdata-h2o-tutorial-hortonworks-sandbox/
Walk-Through Tutorials for Web UI:
http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/tutorial/top.html
12-23-2016
02:59 AM
12 Kudos
Introduction
The producer sends data directly to the broker that is the leader for the partition, without any intervening routing tier.
Optimization Approach
Batching is one of the big drivers of efficiency. To enable batching, the Kafka producer will attempt to accumulate data in memory and send out larger batches in a single request. Batching can be configured to accumulate no more than a fixed number of messages and to wait no longer than some fixed latency bound (say 64 KB or 10 ms). This allows the accumulation of more bytes to send, and fewer, larger I/O operations on the servers. This buffering is configurable and gives a mechanism to trade a small amount of additional latency for better throughput. To find the optimal batch size and latency, iterative testing supported by producer statistics monitoring is needed.
Enable Monitoring
Start the producer with the JMX parameters enabled:
JMX_PORT=10102 bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic
Producer Metrics
Use the jconsole application via JMX at port 10102. Tip: run jconsole remotely to avoid impact on the broker machine. See the metrics in the MBeans tab. The clientId parameter is the producer client ID for which you want the statistics.
kafka.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=console-producer
This MBean gives the rate of producer requests as well as the latencies involved. It gives latencies as a mean and as the 50th, 75th, 95th, 98th, 99th, and 99.9th percentiles. It also gives the time taken to produce the data as a mean, one-minute average, five-minute average, and fifteen-minute average, along with the count.
kafka.producer:type=ProducerRequestMetrics,name=ProducerRequestSize,clientId=console-producer
This MBean gives the request size for the producer.
It gives the count, mean, max, min, standard deviation, and the 50th, 75th, 95th, 98th, 99th, and 99.9th percentiles of request sizes.
kafka.producer:type=ProducerStats,name=FailedSendsPerSec,clientId=console-producer
This gives the number of failed sends per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average.
kafka.producer:type=ProducerStats,name=SerializationErrorsPerSec,clientId=console-producer
This gives the number of serialization errors per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average.
kafka.producer:type=ProducerTopicMetrics,name=MessagesPerSec,clientId=console-producer
This gives the number of messages produced per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average.
References
https://kafka.apache.org/documentation.html#monitoring
Apache Kafka Cookbook by Saurabh Minni, 2015
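The accumulate-until-size-or-deadline trade-off described under Optimization Approach can be sketched with a toy model. This is a Python illustration for intuition only; the class and thresholds are my own assumptions, not Kafka's implementation (Kafka's equivalent knobs are the producer batch size and linger settings).

```python
# Toy model of producer batching: a batch is sent when it reaches
# batch_size bytes, or when the oldest record has waited linger_ms.
class BatchingBuffer:
    def __init__(self, batch_size=64 * 1024, linger_ms=10):
        self.batch_size = batch_size
        self.linger_ms = linger_ms
        self.batch = []
        self.bytes = 0
        self.first_append = None   # timestamp of the oldest buffered record

    def append(self, record, now_ms):
        """Buffer a record; return the flushed batch if a trigger fired."""
        if self.first_append is None:
            self.first_append = now_ms
        self.batch.append(record)
        self.bytes += len(record)
        full = self.bytes >= self.batch_size
        lingered = now_ms - self.first_append >= self.linger_ms
        if full or lingered:
            sent = self.batch
            self.batch, self.bytes, self.first_append = [], 0, None
            return sent
        return None

buf = BatchingBuffer(batch_size=100, linger_ms=10)
assert buf.append(b"x" * 60, now_ms=0) is None   # 60 bytes, just arrived
sent = buf.append(b"y" * 60, now_ms=5)           # 120 bytes >= 100: flush
assert len(sent) == 2
```

Larger batch_size means fewer, larger I/O operations; larger linger_ms means more latency before a send. That is exactly the tuning surface the JMX metrics above help you explore.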
12-20-2016
03:14 AM
13 Kudos
Pre-requisites
Hortonworks Data Platform 2.5 on CentOS 7.2
Python distribution that comes with HDP 2.5 (Python 2.7.5)
Download and install pip:
# wget https://bootstrap.pypa.io/get-pip.py
Install add-on package:
# pip install requests
Start Python CLI (default version):
# python
Import pre-reqs:
>>>import requests
>>>import json
>>>import sys
Environment Variables
Set the Ambari domain variable to the IP address or FQDN of your Ambari node:
>>>AMBARI_DOMAIN = '127.0.0.1'
Set the Ambari port, Ambari user and password variables to match your specifics:
>>>AMBARI_PORT = '8080'
>>>AMBARI_USER_ID = 'admin'
>>>AMBARI_USER_PW = 'admin'
Set the following variable to the IP address or FQDN of your ResourceManager node:
>>>RM_DOMAIN = '127.0.0.1'
Set the Resource Manager port variable:
>>>RM_PORT = '8088'
Ambari REST API Call Examples
Let's find the cluster name, cluster version, stack and stack version:
>>>restAPI='/api/v1/clusters'
>>>url="http://"+AMBARI_DOMAIN+":"+AMBARI_PORT+restAPI
>>>r=requests.get(url, auth=(AMBARI_USER_ID, AMBARI_USER_PW))
>>>json_data=json.loads(r.text)
>>>CLUSTER_NAME = json_data["items"][0]["Clusters"]["cluster_name"]
>>>print(CLUSTER_NAME)
>>>CLUSTER_VERSION = json_data["items"][0]["Clusters"]["version"]
>>>print(CLUSTER_VERSION)
>>>STACK = CLUSTER_VERSION.split('-')[0]
>>>print(STACK)
>>>STACK_VERSION = CLUSTER_VERSION.split('-')[1]
>>>print(STACK_VERSION)
>>>CLUSTER_INFO=json_data
>>>print(CLUSTER_INFO)
Let's find the HDP stack repository:
>>>restAPI = "/api/v1/stacks/"+STACK+"/versions/"+STACK_VERSION+"/operating_systems/redhat7/repositories/"+CLUSTER_VERSION
>>>url = "http://"+AMBARI_DOMAIN+":"+AMBARI_PORT+restAPI
>>>r= requests.get(url, auth=(AMBARI_USER_ID, AMBARI_USER_PW))
>>>json_data=json.loads(r.text)
>>>print(json_data)
>>>REPOSITORY_NAME=json_data["Repositories"]["latest_base_url"]
>>>print(REPOSITORY_NAME)
A more elegant approach is to create utility functions. See my repo: https://github.com/cstanca1/HDP-restAPI/. The restAPIFunctions.py script in the repo defines a number of useful functions that I have collected over time.
Run restAPIFunctions.py
The same example presented above can now be implemented with a single call that returns CLUSTER_NAME, CLUSTER_VERSION and CLUSTER_INFO, using the getClusterVersionAndName() function:
>>>CLUSTER_NAME,CLUSTER_VERSION,CLUSTER_INFO = getClusterVersionAndName()
>>>print(CLUSTER_NAME)
>>>print(CLUSTER_VERSION)
>>>print(CLUSTER_INFO)
Resource Manager REST API Call Examples
>>>RM_INFO=getResourceManagerInfo()
>>>RM_SCHEDULER_INFO=getRMschedulerInfo()
>>>print(RM_INFO)
>>>print(RM_SCHEDULER_INFO)
Other Functions
These are other functions included in the restAPIFunctions.py script:
getServiceActualConfigurations()
getClusterRepository()
getAmbariHosts()
getResourceManagerInfo()
getRMschedulerInfo()
getAppsSummary()
getNodesSummary()
getServiceConfigTypes()
getResourceManagerMetrics()
getCheckClusterForRollingUpgrades()
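As an illustration of the kind of helper collected in restAPIFunctions.py, here is a minimal sketch of the parsing step from the cluster example above. The function name and the sample payload are my own; only the JSON field names come from the Ambari response shown earlier.

```python
import json

def parse_cluster_info(json_text):
    """Extract cluster name, version, stack and stack version from the
    JSON returned by Ambari's /api/v1/clusters endpoint."""
    data = json.loads(json_text)
    clusters = data["items"][0]["Clusters"]
    name = clusters["cluster_name"]
    version = clusters["version"]              # e.g. "HDP-2.5"
    stack, stack_version = version.split("-", 1)
    return name, version, stack, stack_version

# Sample payload mirroring only the fields used above
sample = json.dumps(
    {"items": [{"Clusters": {"cluster_name": "mycluster", "version": "HDP-2.5"}}]}
)
print(parse_cluster_info(sample))   # ('mycluster', 'HDP-2.5', 'HDP', '2.5')
```

Keeping the HTTP call and the JSON parsing in separate functions makes helpers like this testable without a live Ambari server.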
12-18-2016
10:17 PM
13 Kudos
Background
Tungsten became the default in Spark 1.5 and can be enabled in earlier versions by setting spark.sql.tungsten.enabled to true (or disabled in later versions by setting it to false). Even without Tungsten, Spark SQL uses a columnar storage format with Kryo serialization to minimize storage cost.
Goal
The goal of Project Tungsten is to improve Spark execution by optimizing Spark jobs for CPU and memory efficiency (as opposed to network and disk I/O, which are considered fast enough).
Scope
Tungsten focuses on the hardware architecture of the platform Spark runs on, including but not limited to the JVM, LLVM, GPU, NVRAM, etc.
Optimization Features
Off-Heap Memory Management: a binary in-memory data representation (aka the Tungsten row format) with explicitly managed memory.
Cache Locality: cache-aware computations with cache-aware layout for high cache hit rates.
Whole-Stage Code Generation (aka CodeGen).
Design Improvements
Tungsten includes specialized in-memory data structures tuned for the type of operations required by Spark, improved code generation, and a specialized wire protocol. Tungsten's representation is substantially smaller than objects serialized using Java or even Kryo serializers. As Tungsten does not depend on Java objects, both on-heap and off-heap allocations are supported. Not only is the format more compact, serialization times can be substantially faster than with native serialization.
Since Tungsten no longer depends on working with Java objects, you can use either on-heap (in the JVM) or off-heap storage. If you use off-heap storage, it is important to leave enough room in your containers for the off-heap allocations; you can get an approximate idea of their size from the web UI.
Tungsten's data structures are also designed with the kind of processing for which they are used in mind. The classic example is sorting, a common and expensive operation: the on-wire representation is implemented so that sorting can be done without having to deserialize the data again. By avoiding the memory and GC overhead of regular Java objects, Tungsten is able to process larger data sets than the same hand-written aggregations.
Benefits
The following Spark jobs will benefit from Tungsten:
DataFrames: Java, Scala, Python, R
Spark SQL queries
Some RDD API programs, via general serialization and compression optimizations
Next Steps
In the future, Tungsten may make it more feasible to use certain non-JVM libraries. For many simple operations, the cost of using BLAS or similar linear algebra packages from the JVM is dominated by the cost of copying the data off-heap.
References
Project Tungsten: Bringing Apache Spark Closer to Bare Metal
High Performance Spark by Holden Karau and Rachel Warren
Slides: Deep Dive into Project Tungsten - Josh Rosen
Video: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal - Josh Rosen (Databricks)
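As noted under Background, the feature is controlled by the spark.sql.tungsten.enabled property. A minimal sketch of toggling it, assuming a Spark 1.4 deployment where Tungsten is still opt-in (the property name is from the text above; the file placement is standard Spark configuration):

```
# conf/spark-defaults.conf
# Opt in on Spark 1.4; from Spark 1.5 onward this is true by default,
# and setting it to false disables Tungsten instead.
spark.sql.tungsten.enabled   true
```

The same key can be passed per job, e.g. via --conf on spark-submit or spark-shell.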
12-17-2016
01:33 AM
11 Kudos
Introduction
Many organizations have come to rely on Hadoop for dealing with the ever-increasing quantities of data that they gather. Today, it is clear what problems Hadoop can solve; however, cloud is still not the first choice for Hadoop deployment. Pros and cons for Hadoop in the cloud have been shared across multiple blogs and books, but the question always comes up in discussions with enterprises considering Hadoop in the cloud. Thus, I thought it would be useful to collate a few pros and cons, as well as mention a pragmatic approach: a hybrid cloud for organizations that have made significant investments on-prem. For organizations adopting Hadoop for the first time, cloud is probably a better bet, especially if they don't have a lot of IT expertise and a great stream of revenue exists and needs to be exploited immediately.
Pro Cloud
Lack of space. You don't have space to keep racks of physical servers, along with the necessary power and cooling.
Flexibility. It is much easier to reorganize instances, or expand or contract your footprint, for changing business needs. Everything is controlled through cloud provider APIs and web consoles. Changes can be scripted and put into effect manually, or even automatically and dynamically based on current conditions.
New usage patterns. Cloud providers abstract computing resources such that they are not tied to physical configurations, which means they can be managed in ways that are otherwise impractical. For example, individuals could have their own instances, clusters, and even networks to work with, without much managerial overhead. The overall budget for CPU cores in your cloud provider account can be concentrated in a set of large instances, a larger set of smaller instances, or some mixture, and can even change over time. When an instance malfunctions, instead of troubleshooting what went wrong, you can just tear it down and replace it.
Worldwide availability. The largest cloud providers have data centers around the world. You can use resources close to where you work, or close to where your customers are, for the best performance. You can set up redundant clusters, or even entire computing environments, in multiple data centers, so that if local problems occur in one data center, you can shift to working elsewhere.
Data retention restrictions. If you have data that is required by law to be stored within specific geographic areas, you can keep it in clusters that are hosted in data centers in those areas.
Cloud provider features. Each major cloud provider offers an ecosystem of features to support the core functions of computing, networking, and storage. To use those features most effectively, your clusters should run in the cloud provider as well.
Capacity. Very few customers tax the infrastructure of a major cloud provider. You can establish large systems in the cloud that are not nearly as easy to put together, not to mention maintain, on-prem.
Pro On-Prem
Simplicity. Cloud providers start you off with reasonable defaults, but then it is up to you to figure out how all of their features work and when they are appropriate. It takes a lot of experience to become proficient at picking the right types of instances and arranging networks properly.
High levels of control. Beyond the general geographic locations of cloud provider data centers and the hardware specifications that providers reveal for their resources, it is not possible to have exacting, precise control over your cloud architecture. You cannot tell exactly where the physical devices sit, or what the devices near them are doing, or how data across them shares the same physical network. When the cloud provider has internal problems such as network outages, there's not much you can do but wait.
Unique hardware needs. You cannot have cloud providers attach specialized peripherals or dongles to their hardware for you. If your application requires resources that exceed what a cloud provider offers, you will need to host that part on-prem, away from your Hadoop clusters.
Saving money. For one thing, you are still paying for the resources you use. The hope is that the economy of scale a cloud provider can achieve makes it more economical for you to pay to "rent" their hardware than to run your own. You will also still need people who understand system administration and networking to take care of your cloud infrastructure. Inefficient architectures can cost a lot of money in storage and data transfer costs, or in instances that run idle.
Best of Both
Instead of running your clusters and associated applications completely in the cloud or completely on-prem, the overall system is split between the two: a hybrid cloud. Data channels are established between the cloud and on-prem worlds to connect the components needed to perform work.
Examples
Suppose there is a large, existing on-prem data processing system, perhaps using Hadoop clusters, which works well. In order to expand its capacity for running new analyses, rather than adding more on-prem hardware, Hadoop clusters can be created in the cloud. Data needed for the analyses is copied up to the cloud clusters, where it is analyzed, and the results are sent back on-prem. The cloud clusters can be brought up and torn down in response to demand, which helps keep costs lower.
Now assume the management of vast amounts of incoming data that needs to be centralized and processed. To avoid having a single choke point where all of the raw data is sent, a set of cloud clusters can share the load, perhaps each in a geographic location convenient to where the data is generated. These clusters can perform pre-processing of the data, such as cleaning and summarization, to reduce the work that the final centralized system must perform.
References
Moving Hadoop to the Cloud by Bill Havanki, O'Reilly Media, Inc., 2017.
11-24-2016
02:10 AM
10 Kudos
Behavior
The number of cells returned to the client is normally filtered based on the table configuration; however, when using the RAW => true parameter, you can retrieve all of the versions kept by HBase, unless a major compaction or a flush to disk happened in the meantime.
Demonstration
Create a table with a single column family:
create 't1', 'f1'
Configure it to retain a maximum version count of 3:
alter 't1', NAME => 'f1', VERSIONS => 3
Perform 4 puts:
put 't1','r1','f1:c1',1
put 't1','r1','f1:c1',2
put 't1','r1','f1:c1',3
put 't1','r1','f1:c1',4
Scan with RAW => true. I used VERSIONS as 100 as a catch-all; it could have been anything greater than 3 (the number of versions set previously). Unless specified, only the latest version is returned by the scan command.
scan 't1',{RAW=>true,VERSIONS=>100}
The above scan returns all four versions:
ROW    COLUMN+CELL
r1 column=f1:c1,timestamp=1479950685181, value=4
r1 column=f1:c1,timestamp=1479950685155, value=3
r1 column=f1:c1,timestamp=1479950685132, value=2
r1 column=f1:c1,timestamp=1479950627736, value=1
Flush to disk:
flush 't1'
Then scan:
scan 't1',{RAW=>true,VERSIONS=>100}
Three versions are returned:
ROW    COLUMN+CELL
r1 column=f1:c1,timestamp=1479952079260, value=4
r1 column=f1:c1,timestamp=1479952079234, value=3
r1 column=f1:c1,timestamp=1479952079209, value=2
Do four more puts:
put 't1','r1','f1:c1',5
put 't1','r1','f1:c1',6
put 't1','r1','f1:c1',7
put 't1','r1','f1:c1',8
Flush to disk:
flush 't1'
Scan:
scan 't1',{RAW=>true,VERSIONS=>100}
Six versions are returned:
ROW    COLUMN+CELL
r1 column=f1:c1,timestamp=1479952349970, value=8
r1 column=f1:c1,timestamp=1479952349925, value=7
r1 column=f1:c1,timestamp=1479952349895, value=6
r1 column=f1:c1,timestamp=1479952079260, value=4
r1 column=f1:c1,timestamp=1479952079234, value=3
r1 column=f1:c1,timestamp=1479952079209, value=2
Force major compaction:
major_compact 't1'
Scan:
scan 't1',{RAW=>true,VERSIONS=>100}
Three versions are returned:
ROW    COLUMN+CELL
r1 column=f1:c1,timestamp=1479952349970, value=8
r1 column=f1:c1,timestamp=1479952349925, value=7
r1 column=f1:c1,timestamp=1479952349895, value=6
Conclusion
When deciding the number of versions to retain, it is best to treat that number as the minimum version count available at a given time, not as a constant. Until a flush to disk and a major compaction occur, the number of versions available can be higher than what is configured for the table.
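The flush/compaction interplay demonstrated above can be sketched with a toy model. This is a Python illustration under simplifying assumptions, not HBase internals: cells accumulate in a memstore, version limits are enforced only when files are written or rewritten, and a raw scan sees every cell still held anywhere.

```python
# Toy model: RAW => true scans can see "extra" versions until retention
# is enforced by a flush or a major compaction.
class ToyStore:
    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.memstore = []   # newest first, as HBase returns cells
        self.files = []      # flushed cell lists, newest first

    def put(self, value):
        self.memstore.insert(0, value)

    def raw_scan(self):
        # RAW => true: return every cell still held anywhere
        cells = list(self.memstore)
        for f in self.files:
            cells.extend(f)
        return cells

    def flush(self):
        # a flush writes at most max_versions cells into a new file
        self.files.insert(0, self.memstore[: self.max_versions])
        self.memstore = []

    def major_compact(self):
        # a major compaction rewrites everything into one file,
        # enforcing the configured retention
        merged = self.raw_scan()[: self.max_versions]
        self.memstore = []
        self.files = [merged]

store = ToyStore(max_versions=3)
for v in (1, 2, 3, 4):
    store.put(v)
print(store.raw_scan())   # [4, 3, 2, 1]     -> all four versions
store.flush()
for v in (5, 6, 7, 8):
    store.put(v)
store.flush()
print(store.raw_scan())   # [8, 7, 6, 4, 3, 2] -> six versions
store.major_compact()
print(store.raw_scan())   # [8, 7, 6]        -> the configured three
```

The three printed scans mirror the three scan results in the demonstration: four versions before any flush, six after two flushes, and three after a major compaction.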
11-16-2016
09:59 PM
11 Kudos
Introduction
This article is a continuation of the Geo-spatial Queries with Hive Using ESRI Geometry Libraries article published a few months ago.
Objective
Demonstrate how to use a Hive context and invoke built-in ESRI UDFs for Hive from Spark SQL.
Pre-requisites
HDP 2.4.2
Steps documented in Geo-spatial Queries with Hive Using ESRI Geometry Libraries
Steps
1. Launch spark-shell with --jars as its parameter:
spark-shell --jars /home/spark/esri/esri-geometry-api.jar,/home/spark/esri/spatial-sdk-hive-1.1.1-SNAPSHOT.jar
I placed the dependency jars in the /home/spark/esri path, but you can store them in HDFS or the local filesystem and grant proper privileges to your spark user.
2. Instantiate sqlContext:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
3. From spark-shell, define temporary functions:
sqlContext.sql("""create temporary function st_point as 'com.esri.hadoop.hive.ST_Point'""");
sqlContext.sql("""create temporary function st_x as 'com.esri.hadoop.hive.ST_X'""");
4. From spark-shell, invoke your UDF:
sqlContext.sql("""from geospatial.demo_shape_point select st_x(st_point(shape))""").show;
Note: geospatial is the Hive database where the demo_shape_point table was created.
Conclusion
The Esri Geometry API for Java and the Spatial Framework for Hadoop can be used by developers building geometry functions for various geo-spatial applications using Spark as well, not only Hive.
11-03-2016
07:59 PM
13 Kudos
Introduction
This is a continuation of the Apache Storm Tuning Approach for Squirrels article that I published a couple of weeks ago. A topology is going to have to coexist on a Storm cluster with a variety of other topologies. Some of those topologies will burn CPU doing heavy calculations; others will consume large amounts of network bandwidth. No one can sanely claim to provide a silver bullet for setting up your cluster for the best performance. However, I'd like to share a few recipes and guidelines for dealing with issues as they arise. Below are six categories of contention, with common problems and strategies to alleviate or eliminate resource contention. I hope this article at least provides a good map for a deep dive into Storm tuning; each resource contention is probably worth an article of its own.
Worker Processes in a Cluster
A Storm cluster is installed with a fixed number of available worker processes across all worker nodes. Each time you deploy a new topology to the cluster, you specify how many worker processes that topology should consume; this number is specified in the code for building and submitting your topology to the Storm cluster. It is possible to deploy a topology that requires a certain number of worker processes but can't acquire them because they've all been assigned to existing topologies. This problem is easy to detect: look at the cluster summary page of the Storm UI and identify the free slots (slots correspond to worker processes). It's important to always be aware of the resources available in your cluster when deploying new topologies. If you ignore what's available, you can easily affect every topology in your cluster by deploying something that consumes too many resources.
Let's assume that you notice a topology isn't processing any data, or has a sudden drop in throughput, and zero free slots are available. You have a fixed number of worker processes that can be allocated to the topologies requesting them. You can address this problem with these strategies:
Decreasing the number of worker processes in use by existing topologies
Increasing the total number of worker processes in the cluster
Decreasing the number of worker processes in use by existing topologies is the quickest and easiest way to free up slots for other topologies in your cluster, but this may or may not be possible depending on the SLAs for your existing topologies. If you can reduce the number of worker processes being used by a topology without violating the SLA, go for it. If your SLAs don't allow you to reduce the number of slots being used by any of the topologies in your cluster, you'll have to add new worker processes to the cluster.
There are two ways to increase the total number of worker processes in the cluster. One is adding more worker processes to your worker nodes, but this won't work if your worker nodes don't have the resources to support additional JVMs. In that case, you'll need to add more worker nodes to your cluster, thus adding to the pool of worker processes. Adding new worker nodes has the least impact on existing topologies, because adding worker processes to existing nodes has the potential to cause other types of contention that must then be addressed.
Topology Worker Nodes and Processes
Let's assume that you have a problematic topology and need to identify the worker nodes and worker processes that topology is executing on. The way to do this is by looking at the Storm UI. Start by checking the Bolts section to see if anything looks amiss. Having identified the problematic bolt, you now want to see more details about what's happening with it. To do so, click on that bolt's name in the UI to get a more detailed view. From here, turn your attention to the Executors and Errors sections for the individual bolt. The Executors section is of particular interest; it tells you which worker nodes and worker processes the bolt is executing on.
From here, given the type of contention being experienced, you can take the necessary steps to identify and solve the problem at hand. Though a great tool, the Storm UI may not always show you what you need. This is where additional monitoring can help, whether that is monitoring the health of individual worker nodes or adding custom metrics in your bolt's code to give you deeper insight into how well the bolt is performing. The bottom line: don't rely solely on the Storm UI. Put other measures in place to make sure you have coverage everywhere.
Now that you have identified the contention, you'd like to change the number of worker processes running on a worker node. That number is defined by the supervisor.slots.ports property in each worker node's storm.yaml configuration file. This property defines the ports that each worker process will use to listen for messages. To increase the number of worker processes that can be run on a worker node, add a port to this list for each worker process to be added; to decrease it, remove a port for each worker process to be removed. After updating this property, you'll need to restart the Supervisor process on the worker node to apply the change. Upon restarting, Nimbus will be aware of the updated configuration and send messages only to the ports defined in this list.
Another thing to consider is the number of worker nodes you have in your cluster. If widespread changes are needed, updating the configuration and restarting the Supervisor process across hundreds or even tens of nodes is a tedious and time-consuming task, so try to use tools like Puppet, Chef, or Ansible.
Worker Process Memory
Just as you install
a Storm cluster with a fixed number of worker processes, you also set up each worker process (JVM) with a fixed amount of memory it can grow to use. The amount of memory limits the number of threads (executors) that can be launched on that JVM; each thread takes a certain amount of memory (the default is 1 MB on a 64-bit Linux JVM).
JVM contention can be a problem on a per-topology basis. The combination of memory used by your bolts, spouts, threads, and so forth might exceed what is allocated to the JVMs they're running on. JVM contention usually manifests itself as out-of-memory (OOM) errors and/or excessively long garbage collection (GC) pauses. OOM errors will appear in the Storm logs and Storm UI, usually as a stack trace starting with java.lang.OutOfMemoryError: Java heap space. Gaining visibility into GC issues requires a little more setup, but it's easily supported by both the JVM and Storm configuration. The JVM offers startup options for tracking and logging GC usage, and Storm provides a way to specify JVM startup options for your worker processes: the worker.childopts property in storm.yaml is where you'd specify these JVM options.
One interesting item to note is the value of the -Xloggc setting. Remember that you can have multiple worker processes per worker node. The worker.childopts property applies to all worker processes on a node, so specifying a regular log filename would produce one log file for all the worker processes combined. A separate log file per worker process makes tracking GC usage per JVM easier. Storm provides a mechanism for this: the ID variable is unique for each worker process on a worker node, so you can add a "%ID%" string to the GC log filename and you'll get a separate GC log file for each worker process.
Let's assume that your spouts and/or bolts are attempting to consume more memory than what has been allocated to the JVM, resulting in OOM errors and/or long GC pauses. You can address the problem in a couple of ways:
By increasing the number of worker processes being used by the topology in question
By increasing the size of your JVMs
By adding a worker process to a topology, you'll decrease the average load across all worker processes for that topology. This should result in a smaller memory footprint for each worker process (JVM), hopefully eliminating the JVM memory contention. Because increasing the size of your JVMs could require you to change the size of the machines/VMs they're running on, we recommend the "increase worker processes" solution if you can.
Swapping and balancing memory across JVMs has been one of our biggest challenges with Storm. Different topologies will have different memory usage patterns, so don't worry if you don't get it right initially; this is a never-ending process as the shape of your cluster and topologies changes. Beware when increasing the memory allocated to a JVM: as a rule of thumb, when you cross certain key points you'll notice a change in how long garbage collection takes. 500 MB, 1 GB, 2 GB, and 4 GB are all around the points where our GC time has taken a jump. It's more art than science, so bring your patience with you. There's nothing more frustrating than addressing OOM issues by increasing JVM memory size only to have it noticeably impact GC times.
The amount of memory allocated to all worker processes (JVMs) on a worker node can be changed via the worker.childopts property in each worker node's storm.yaml configuration file. This property accepts any valid JVM startup option, providing the ability to set the startup options for the initial memory allocation pool (-Xms) and maximum memory allocation pool (-Xmx) for the JVMs on the worker node. Changing this property will update all the worker processes on a particular worker node, so make sure that your worker node has enough resources for the memory increase. After updating this property, you'll need to restart the Supervisor process on the worker node to apply the change. From my old days of J2EE application tuning, I still recommend setting -Xms and -Xmx to the same value to eliminate heap management overhead. Along with being more efficient, this strategy adds the benefit of making it easier to reason about JVM memory usage, because the heap size is a fixed constant for the life of the JVM.
Worker Node Memory
Much like an individual JVM, a worker node as a whole has a limited amount of available memory. In addition to the memory needed to run your Storm worker processes (JVMs), you need memory to run the Supervisor process and any other processes on your worker node without swapping. A worker node has a fixed amount of memory that's used by its worker processes along with any other processes running on that node. If a worker node is experiencing memory contention, that worker node will be swapping. Swapping is the little death and needs to be avoided if you care about latency and throughput. This is a problem when using Storm: each worker node needs to have enough memory so that the worker processes and OS don't swap. If you want to maintain consistent performance, you must avoid swapping with Storm's JVMs. One way to keep an eye on this in Linux is the System Activity Reporter.
Use sar -S for reporting swap space utilization statistics. If you run a single worker process per worker node, it's impossible to run into contention between workers on that node. This can make maintaining consistent performance within a cluster much easier. We know of more than one development team that has opted for this approach. If possible, we advise you to seriously consider going this route.

This is a nonstarter if you aren't running in a virtualized environment; the cost is simply too high if you're running on "bare metal" with a single OS instance per physical machine. Within a virtualized environment, you'll use more resources by doing this. Assume for a moment that your OS install requires n GB of disk space and uses 2 GB of memory to run effectively. If you have eight workers running on your cluster and you assign four workers per node, you'd use 2n GB of disk and 4 GB of memory to run the OS on your cluster nodes. If you were to run a single worker per node, that would skyrocket to 8n GB of disk and 16 GB of memory. That's a fourfold increase in a rather small cluster. Imagine the additional usage that would result if you had a cluster that was 16, 32, 128, or more nodes in size. If you're running in an environment such as Amazon Web Services (AWS) where you pay per node, the costs can add up quickly. Therefore, we suggest this approach only if you're running in a private virtualized environment where the cost of hardware is relatively fixed and you have disk and memory resources to spare.

Let's assume that your worker node is swapping due to contention for that node's memory. Here are a few options:

Increase the memory available to each worker node. This would mean giving more memory to the physical machine or VM, depending on how you configured your cluster.

Lower the collective memory used by worker processes. This can be done in one of two ways.
The first is by reducing the number of worker processes per worker node. Reducing the total number of worker processes will lower the overall memory footprint of the combined remaining processes. The second is by reducing the size of your JVMs. Be careful when lowering the memory allocated to existing JVMs, though, to avoid introducing memory contention within the JVM. One safe solution is to always go the route of increasing the memory available to each machine. It's the simplest solution, and its ramifications are the easiest to understand. If you are tight on memory, lowering memory usage can work, but you open yourself up to all the problems we discussed concerning GC and OOM on a per-JVM basis.

Worker Node CPU

Worker node CPU contention occurs when the demand for CPU cycles outstrips the amount available.
This is a problem when using Storm and is one of the primary sources of
contention in a Storm cluster. If your Storm topology’s throughput is lower
than what you expect it to be, you may want to check the worker node(s) running
your topology to see if CPU contention exists. One way to keep an
eye on this in Linux is with the sar -u command for displaying real-time CPU usage of all
CPUs. Let’s assume that the throughput of your topologies is low, and based
on running the sar command, you see that CPU contention exists. To address the
problem, you have the following options:
Increasing the number of
CPUs available to the machine. This is only possible in a virtualized
environment.
Upgrading to a more powerful CPU. This is mainly an option in environments like Amazon Web Services (AWS).
Spreading the JVM load
across more worker nodes by lowering the number of worker processes per
worker node. To spread worker
process (JVM) load across more worker nodes, you need to reduce the number of
worker processes running on each worker node. Reducing the number of worker
processes per worker node results in less processing (CPU requests) being done
on each worker node. There are two scenarios you may find yourself in when
attempting this solution. The first is you have unused worker processes in your
cluster and can therefore reduce the number of worker processes on your
existing nodes, thus spreading the load. The second scenario
is where you don’t have any unused worker processes and therefore need to add
worker nodes in order to reduce the number of worker processes per worker node. Reducing the number
of worker processes per worker node is a good way to reduce the number of CPU
cycles being requested on each node. You just need to be aware of what
resources are available and in use and act appropriately in your given
scenario.

Worker Node I/O

I/O contention on a
worker node can fall under one of two categories:
Disk I/O contention,
reading from and writing to the file system
Network/socket I/O
contention, reading from and writing to a network via a socket Both types of
contention are regularly an issue for certain classes of Storm topologies. The
first step in determining if you’re experiencing either of these contentions is
to establish whether a worker node is experiencing I/O contention in general.
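One way to script such a check is with sar. This is a sketch: the inlined sample line stands in for live `sar -u 1 5` output so it is self-contained, and the 10% threshold and column positions are illustrative (sar's column layout varies by sysstat version):

```shell
# Flag a node whose %iowait exceeds 10.0. In practice you would capture
# the "Average:" line of `sar -u 1 5`; a sample line is inlined here.
sample='Average:  all  12.11  0.00  3.02  14.50  0.00  70.37'

# In this layout, field 6 is %iowait.
iowait=$(echo "$sample" | awk '{print $6}')

# Exit 0 (and warn) only when %iowait is above the threshold.
awk -v v="$iowait" 'BEGIN { exit !(v > 10.0) }' && echo "iowait high: $iowait"
# prints: iowait high: 14.50
```

Wiring a check like this into your node monitoring gives you an early warning before throughput visibly degrades.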
Once you do, you can dive down into the exact type of I/O contention your
worker node is suffering from. One way to determine if a worker node in your cluster is experiencing I/O contention is by running the sar -u command and watching the %iowait column, which reports the percentage of time the CPUs sat idle while waiting on outstanding I/O requests. A healthy topology that uses a lot of I/O shouldn't spend a lot of time waiting for the resources to become available. That's why we use 10.00% iowait as the threshold at which you start experiencing performance degradation. If you know what topologies are running on a
given worker node, you know that they use a lot of network resources or disk
I/O, and you see iowait problems, you can probably safely assume which of the two
is your issue. If you’re seeing troubling I/O contention signs, first attempt
to determine if you’re suffering from socket/network I/O contention. If you
aren’t, assume that you’re suffering from disk I/O contention. Although this
might not always be the case, it can take you a long way as you learn the tools
of the trade. If your topologies
interact over a network with external services, network/socket I/O contention
is bound to be a problem for your cluster. In our experience, the main cause
for this type of contention is that all of the ports allocated for opening
sockets are being used. Most Linux installs
will default to 1024 maximum open files/sockets per process. In an
I/O-intensive topology, it's easy to hit that limit quickly. To determine the limits of your OS, you can examine the /proc filesystem to check your process's limits. In order to do
this, you’ll first need to know your process ID. Once you do that, you can get
a listing of all limits for that process. Start by getting the PID (ps aux
| grep MyTopologyName) and then check your process limits from the /proc
filesystem. If you’re hitting
this limit, the Storm UI for your topology should display an exception in the
“Last Error” column that the max open files limit has been reached. This will
most likely be a stack trace starting with java.net.SocketException: Too
many open files. Let’s assume that your topology is experiencing reduced throughput or no throughput
at all, and you’re seeing errors for hitting the limit of open sockets. A couple of ways to
address this problem are as follows:
Increasing the number of
available ports on the worker node
Adding more worker nodes
to the cluster

For increasing the number of available ports, you'll need to edit the /etc/security/limits.conf file on most Linux distributions. These settings
will set the hard and soft limit on open files per user. The value we’re
concerned with as a Storm user is the soft limit. I don’t advise going higher
than 128k. If you do, then as a rule of thumb (until you learn more about
soft/hard limits for number of files open on Linux), I suggest setting the hard
limit to two times the value of the soft limit. Note that you need super-user
access to change limits.conf and you’re going to need to reboot the system to
make sure they take effect. Increasing the
number of worker nodes in the cluster will give you access to more ports. If
you don’t have the resources to add more physical machines or VMs, you’ll have
to try the first solution. The first real contention
issue was the number of sockets available per machine. Don’t add more workers
on other machines until you’ve increased available sockets on each node as much
as you can. Once you’ve done that, you should also look at your code. Are you opening and
closing sockets all the time? If you can keep connections open, do that.
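The limit inspection and limits.conf changes described earlier can be sketched like this (the PID lookup, user name, and limit values are all illustrative):

```shell
# Inspect the open-files limit of a running process via /proc.
# Here $$ (this shell) stands in for a worker JVM's PID, which you
# would find with something like: ps aux | grep MyTopologyName
grep 'Max open files' "/proc/$$/limits"

# Raising the limit is done in /etc/security/limits.conf -- e.g. a soft
# limit with the hard limit at twice its value, per the rule of thumb
# above (entries shown as comments; values are examples only):
#   storm  soft  nofile  65536
#   storm  hard  nofile  131072
```

After editing limits.conf you still need super-user access and a reboot, as noted above, for the new limits to take effect.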
There’s this wonderful thing calledTCP_WAIT. It’s where a TCP connection will stay open after
you close it waiting for any stray data. If you’re on a slow network link (like
many were when TCP was first designed), this is a wonderful idea that helps
prevent data corruption. If you’re on a fast modern LAN, it’ll drive you
insane. You can tune your TCP stack via various OS-specific means to lower how
long you linger in TIME_WAIT, but when you're making tons of network calls, even
that won’t be enough. Be smart: open and close connections as little as
possible. Disk I/O contention
affects how quickly you can write to disk. This could be a problem with Storm
but should be exceedingly rare. If you’re writing large volumes of data to your
logs or storing the output of calculations to files on the local filesystem, it
might be an issue, but that should be unlikely. If you have a
topology that’s writing data to disk and notice its throughput is lower than
what you’re expecting, you should check to see if the worker nodes it’s running
on are experiencing disk I/O contention. For Linux installations, you can run a
command called iotop to get a view of the disk I/O usage for the worker
nodes in question. This command displays a table of current I/O usage by processes/threads
in the system, with the most I/O-intensive processes/threads listed first. Let’s assume that you have a topology that reads/writes to/from disk, and it looks
like the worker nodes it's running on are experiencing disk I/O contention. To address this problem, you have a few options:
Write less data to disk.
This can mean cutting back within your topology. It can also mean putting
fewer worker processes on each worker node if multiple worker processes
are demanding disk on the same worker node.
Get faster disks. This
could include using a RAM disk.
If you’re writing to NFS
or some other network filesystem, stop immediately. Writing to NFS is slow
and you’re setting yourself up for disk I/O contention if you do. If you’re writing
large amounts of data to disk and you don’t have fast disks, you’re going to
have to accept it as a bottleneck. There’s not much you can do without throwing
money at the problem.

Summary
Several types of
contention exist above the topology level, so it’s helpful to be able to
monitor things like CPU, I/O, and memory usage for the operating system
your worker nodes are running on.
It is important to have
some level of familiarity with monitoring tools for the operating system
of the machines/VMs in your cluster. In Linux, these include sar, netstat, and iotop.
There’s value in knowing
common JVM startup options, such as -Xms, -Xmx, and those related to GC logging.
Although the Storm UI is
a great tool for the initial diagnosis of many types of contention, it’s
smart to have other types of monitoring at the machine/VM level to let you
know if something is awry.
Including custom
metrics/monitoring in your individual topologies will give you valuable
insights that the Storm UI may not be able to provide.
Be careful when
increasing the number of worker processes running on a worker node because
you can introduce memory and/or CPU contention at the worker node level.
Be careful when
decreasing the number of worker processes running on a worker node because
you can affect your topologies’ throughput while also introducing contention
for worker processes across your cluster.

References

Storm Applied: Strategies for real-time event processing by Sean T. Allen, Matthew Jankowski, and Peter Pathirana
10-21-2016
09:11 PM
15 Kudos
Introduction How many times have we heard that a problem can't be solved if it isn't completely understood? Yet all of us, even with a lot of experience and technical skill, still tend to jump into the solution space, bypassing key steps of the scientific approach. We just don't have enough patience and, quite often, the business is on our back to deliver a quick fix. We just want to get it done! We find ourselves like the squirrel in Ice Age, chasing the acorn with only our experience and minimal experimentation, dreaming that the acorn will fall into our lap. This article is meant to re-emphasize a repeatable approach that can be applied to tune Storm
topologies starting from understanding the problem,
forming a hypothesis, testing that hypothesis, collecting the data, analyzing
the data, and drawing conclusions based on facts rather than gut feel. Hopefully it helps to catch more acorns!

Understand the Problem

Understand the functional problem the topology solves; quite often the best tuning is executing a step differently in the workflow for the same end result, for example applying a filter first and then executing the step, rather than executing the step on everything and paying the penalty every time. You can also prioritize success, or learn to lose in order to win.

Define SLA for Business Success

This is also the time to get some sense of the business SLA and forecasted growth. Document the SLA by topology. Try to understand the reasons behind the SLA and identify opportunities for trade-offs.

Gather Data

Gather
data using Storm UI and other tools specific to the environment. Storm UI is your
first go-to tool for tuning, most of the time. Another tool to use is Storm's built-in metrics-collecting API, available since 0.9.x, which lets you build custom metrics into your topology. You may have tuned the topology, deployed it to production, everybody is happy, and two days later the issue comes back. Wouldn't it be nice to have some metrics embedded in your code, already focused on the usual suspects, e.g. external calls to a web service, a SQL query to your lookup data store, or specific customers or some other strong business entity that, due to the data and the workflow, can become the bottleneck? In a Storm topology, you have
spouts, bolts, workers that can play a role in your topology challenges. Learn about
topologies status, uptime, number of workers, executors, tasks, high-level
stats across four time windows, spout statistics, bolt statistics,
visualization of the spouts and the bolts and how they are connected, the flow
of tuples between all of the streams, and all configurations for the topology.

Define Baseline Numbers

Document statistics about how
the topology performs in production. Set up a test environment and
try that topology with different loads aligned to realistic business SLAs.

Tuning Options vs. Bottleneck
Increase parallelism (use the rebalance command) and analyze the change in capacity in the same Storm UI. If there is no improvement, focus on the bolt, which is the likely bottleneck; if you see a backlog upstream (Kafka, most likely), focus on the spout. After trying with spouts and bolts, shift focus to worker parallelism. The basic principle is to scale on a single worker with executors until adding executors no longer helps. If adjusting spout and bolt parallelism fails to provide additional benefits, play with the number of workers to see if you are now bound by the JVM you are running on and need to parallelize across JVMs. If you still don't meet the
SLA by tuning the existing spouts, bolts, and workers, it's time to start controlling the rate that flows into the topology.
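As a sketch, the parallelism adjustments above can be made on a running topology with the storm CLI, and the ingest-rate control is the topology.max.spout.pending setting (the topology and component names here are made up):

```
# Rebalance a running topology: 4 workers, 8 executors for the spout
# named "kafka-spout", waiting 30 seconds for in-flight tuples to drain.
storm rebalance myTopology -w 30 -n 4 -e kafka-spout=8

# Max spout pending can be set in storm.yaml as a default:
#   topology.max.spout.pending: 1000
# or per topology in code: conf.setMaxSpoutPending(1000);
```

Rebalancing lets you test each parallelism increment without redeploying the topology, which makes the small-increment approach described here much faster to iterate on.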
Max spout pending allows you to set a maximum number of tuples
that can be unacked at any given time. If the number of possible unacked tuples is lower than the total parallelism you've set for your topology, it could be a bottleneck. The goal, with one or many spouts, is to ensure that the maximum possible number of unacked tuples is greater than the maximum number of tuples we can process based on our parallelization, so we can feel safe saying that max spout pending isn't causing a bottleneck. Without max spout pending, tuples will continue to flow into your topology whether or not you can keep up with processing them. Max spout pending allows you to control the
ingest rate. Max spout
pending lets us erect a dam in front of our topology, apply back pressure, and
avoid being overwhelmed. Despite the optional nature of max spout pending, you
should always set it. When attempting to increase
performance to meet an SLA, increase the rate of data ingest by either
increasing spout parallelism or increasing the max spout pending. Assuming a 4x
increase in the maximum number of active tuples allowed, we’d expect to see the
speed of messages leaving our queue increase (maybe not by a factor of four,
but it’d certainly increase). If that caused the capacity metric for any of the
bolts to return to one or near one, tune the bolts again and repeat with the spouts and bolts until you hit the SLA. These methods can be applied over and over until you meet the SLAs. This effort is high, and automating the steps is desirable.

Deal with External Systems

It's easy, when interacting with external services (such as a SOA
service, database, or filesystem) to ratchet up the parallelism to a high
enough level in a topology that limits in that external service keep your
capacity from going higher. Before you start tuning parallelism in a topology
that interacts with the outside world, be positive you have good metrics on
that service. You could keep turning up the parallelism on a bolt to the point
that it brings the external service to its knees, crippling it under a mass of
traffic that it’s not designed to handle. Latency Let’s talk about one of the greatest enemies of fast code:
latency. There’s latency accessing local memory and hard drive, and accessing
another system over the network. Different interactions have different levels
of latency, and understanding the latency in your system is one of the keys to
tuning your topology. You’ll usually get fairly consistent response times and
all of a sudden those response times will vary wildly because of any number
of factors:
The
external service is having a garbage collection event.
A
network switch somewhere is momentarily overloaded.
Your
coworker wrote a runaway query that’s currently hogging most of the
database’s CPU. Something about
the data that’s likely to cause the delay. As the topology interacts with external services, let’s be smarter about our latency. We don't always need to increase parallelism and allocate more resources. We may discover after investigation that your variance is based
on the customer, or induced by an exceptional event, e.g. elections, the Olympics, etc. Certain records
are going to be slower to look up. Sometimes one of those
“fast” lookups might end up taking longer. One
strategy that has worked well with services that exhibit this problem is to
perform initial lookup attempts with a hard ceiling on timeouts (try various ceilings) and, if that fails, send the tuple to a less parallelized instance of the same
bolt that will take longer with its timeout. The end result is that the time to
process a large number of messages goes down. Your mileage
will vary with this strategy, so test before you use it. As part of your tuning
strategy, you could end up breaking a bolt into two: one for fast lookups and the other for slow lookups. At the least, let what goes fast go fast; don't back it up behind what is slow. Treat the slow ones differently and make sure they are the exception. You may even decide to address those differently and still get some value out of that data. It is up to you to determine what is more important: accepting an overall bottleneck, marking the slow items for later processing, or disregarding those producing a bottleneck. Maybe a batch approach for those is more efficient. This is a
design consideration and a reasonable trade-off most of the time. Are you willing
to accept the occasional loss of fidelity but still hit the SLA because that is
more important for your business? Is perfection helping or killing your business?

Summary
All basic timing information for a topology can be found in the
Storm UI. Metrics (both built-in and custom) are essential if you want to
have a true understanding of how your topology is operating. Establishing a baseline
set of performance numbers for your topology is the essential first step
in the tuning process. Bottlenecks are
indicated by a high capacity for a spout/bolt and can be addressed by
increasing parallelism. Increasing parallelism
is best done in small increments so that you can gain a better
understanding of the effects of each increase. Latency that is both
related to the data (intrinsic) and not related to the data (extrinsic)
can reduce your topology’s throughput and may need to be addressed. Be ready for trade-offs
References

Storm Applied: Strategies for real-time event processing by Sean T. Allen, Matthew
Jankowski, and Peter Pathirana
10-21-2016
12:06 AM
13 Kudos
Sizing

There's no simple rule of thumb for this; it's as much an art as it is a science, since it depends on the workloads and how chatty they are with your current ZKs. One way to look at this is: if you have 3 ZKs you can afford to lose one; if you have 5, you can afford to lose two. If your IT is aggressively applying security patches and other upgrades, like firmware, kernel, Java, and other packages used by Hadoop tools, and taking nodes down to do the job, then during those upgrades with 3 ZKs your ensemble runs with only two nodes, and if you are unlucky and one of them goes down, your whole cluster will go down. So, in this case 5 are better. Warning: the more ZK nodes you have, the slower ZK becomes for writes, since every write must be acknowledged by a quorum.

Placement

Zookeeper is a master node; as such, it can be collocated with
other master services. Ideally, you would not want to collocate it with another HA service. It is quite light on memory and CPU requirements, but since it is disk intensive, don't collocate it with disk-intensive services like Kafka or HDFS.

Storage Requirements

In general, Zookeeper doesn't require huge drives because it only stores metadata for the services it coordinates. It is common to use 100G to 250G for the Zookeeper data directory and logs, which is fine for many cluster deployments. Moreover, it is recommended to configure an automatic purging policy for the snapshot and log directories so that they don't end up filling all the local storage.

Dedicated or Shared?

At Yahoo!, ZooKeeper is usually deployed on
DEDICATED RHEL boxes, with dual-core processors, 2GB of RAM, and 80GB IDE hard
drives. For your
Kafka/Storm cluster, you could consider deploying ZK on DEDICATED physical
hardware (not virtual). The driving force for physical hardware or at
least for the dedicated disk is the transaction log and the high throughput
nature of Kafka and Storm. Since Kafka is usually used with Storm, have a separate Zookeeper cluster for Kafka and Storm. If Kafka and Storm do end up sharing one, please make sure that you don't put the Zookeeper cluster on the Kafka nodes; put the Zookeeper on the Storm nodes.

Caution
Rather than going to larger clusters of ZKs, it is better to split out certain services to their own ZKs when they're putting more pressure on an otherwise fairly quiet ZK cluster. It is a good thing to have a separate set of ZKs for each cluster: one quorum for Kafka, one quorum for Storm, one quorum for the rest (YARN, HBase, Hive, HDFS), and possibly a separate quorum just for HBase. The challenge is that more hardware and more administration are needed, but it pays off.

Be careful where you put that transaction log. The most performance-critical part of ZooKeeper is the transaction log. ZooKeeper must sync transactions to media before it returns a response. A dedicated transaction log device is key to consistent good performance. Putting the log on a busy device will adversely impact performance. If you only have one storage device, put trace files on NFS and increase the snapshotCount; it doesn't eliminate the problem, but it can mitigate it. ZooKeeper's transaction log must be on a dedicated device. A dedicated partition is not enough. ZooKeeper writes the log sequentially, without seeking. Sharing your log device with other processes can cause seeks and contention, which in turn can cause multi-second delays.

Do not put ZooKeeper in a situation that can cause a swap. In order for ZooKeeper to function with any sort of timeliness, it simply cannot be allowed to swap. Remember, in ZooKeeper, everything is ordered, so if one request hits the disk, all other queued requests hit the disk. Going to disk unnecessarily will almost certainly degrade your performance unacceptably. Therefore, make certain that the maximum heap size given to ZooKeeper is not bigger than the amount of real memory available to ZooKeeper. Set your Java max heap size correctly. To avoid swapping, try to set the heap size to the amount of physical memory you have, minus the amount needed by the OS and cache. The best way to determine an optimal heap size for your configuration is to run load tests.
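The dedicated-log-device, purge, and heap advice above maps to settings like the following (the paths and sizes are illustrative assumptions, not recommendations):

```
# zoo.cfg
dataDir=/var/lib/zookeeper          # snapshots
dataLogDir=/zk-txnlog/zookeeper     # transaction log on its own device
autopurge.snapRetainCount=5         # keep the 5 most recent snapshots
autopurge.purgeInterval=24          # purge every 24 hours

# conf/java.env -- cap the heap below physical RAM to avoid swapping
export JVMFLAGS="-Xms3g -Xmx3g"
```

If dataLogDir is not set, ZooKeeper writes the transaction log into dataDir, which is exactly the shared-device situation warned against above.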
If for some reason you can't, be conservative in your estimates and choose a number well below the limit that would cause your machine to swap. For example, on a 4G machine, a 3G heap is a conservative estimate to start with.

Best Practices
The ZooKeeper data directory contains the snapshot and transaction log files. It is a good practice to periodically clean up the directory if the auto-purge option is not enabled. Also, an administrator might want to keep a backup of these files, depending on the application's needs. However, since ZooKeeper is a replicated service, you need to back up the data of only one of the servers in the ensemble.

ZooKeeper uses Apache log4j as its logging infrastructure. As the logfiles grow in size, it is recommended that you set up auto-rollover of the logfiles using the built-in log4j facility for ZooKeeper logs.

The list of ZooKeeper servers used by the clients in their connection strings must match the list of ZooKeeper servers that each ZooKeeper server has. Strange behaviors might occur if the lists don't match. The server list in each Zookeeper server's configuration file should be consistent with those of the other members of the ensemble.

The ZooKeeper transaction log must be configured on a dedicated device. This is very important for achieving the best performance from ZooKeeper.

The Java heap size should be chosen with care. Swapping should never be allowed to happen on the ZooKeeper server. It is better if the ZooKeeper servers have reasonably high memory (RAM). System monitoring tools such as vmstat can be used to monitor virtual memory statistics and decide on the optimal size of memory needed, depending on the needs of the application. In any case, swapping should be avoided.

References:

https://community.hortonworks.com/questions/2498/best-practices-for-zookeeper-placement.html https://community.hortonworks.com/questions/53025/zookeeper-performance-and-metrics-when-to-resize.html https://community.hortonworks.com/questions/55868/zookeeper-on-even-master-nodes.html Apache ZooKeeper Essentials by Saurav Haloi, published by Packt Publishing, 2015