Member since: 01-11-2016
Posts: 355
Kudos Received: 228
Solutions: 74
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4576 | 06-19-2018 08:52 AM |
| | 1499 | 06-13-2018 07:54 AM |
| | 1652 | 06-02-2018 06:27 PM |
| | 1473 | 05-01-2018 12:28 PM |
| | 2587 | 04-24-2018 11:38 AM |
04-16-2018
11:29 AM
Something you can do is trigger the first Notify manually and let the gate do the rest. For instance, you can use a GenerateFlowFile connected to a Notify for the initial trigger, then stop it afterward. This works, but I don't know if there's a better way to do it. If the idea is to control flows, take a look at the ControlRate processor, which can be helpful.
04-16-2018
10:18 AM
Hi @Laurie McIntosh This is expected. The data flow in your example is blocked at the Wait processor, so no FlowFile goes through PutFile and then Notify to release the FlowFile blocked in the Wait. Notify is never triggered here. You need to have your Notify in an independent flow with the triggering logic. Thanks
03-31-2018
10:14 AM
1 Kudo
Hi @Elisabeta Nenciulescu Are you using NiFi? If yes, each FlowFile has an attribute lineageStartDate, defined as follows: Any time that a FlowFile is cloned, merged, or split, a "child" FlowFile is created. As those children are then cloned, merged, or split, a chain of ancestors is built. This value represents the date and time at which the oldest ancestor entered the system. Another way to think about this is that this attribute represents the latency of the FlowFile through the system. The value is a number representing the number of milliseconds since midnight, Jan. 1, 1970 (UTC).
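Since lineageStartDate is epoch milliseconds, the FlowFile's latency is just the difference with the current time. A minimal sketch (the helper name and sample value are illustrative, not part of NiFi):

```python
from datetime import datetime, timezone

def lineage_latency_ms(lineage_start_ms, now_ms=None):
    """Latency of a FlowFile: milliseconds elapsed since its oldest
    ancestor entered the system (lineageStartDate is epoch millis, UTC)."""
    if now_ms is None:
        now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
    return now_ms - lineage_start_ms

# Hypothetical lineageStartDate value, 1500 ms before "now":
start = 1_522_480_000_000
print(lineage_latency_ms(start, start + 1500))  # → 1500
# The raw value converts to a readable UTC timestamp:
print(datetime.fromtimestamp(start / 1000, tz=timezone.utc).isoformat())
```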
03-29-2018
06:42 PM
1 Kudo
As you can see in my second screenshot, a template is attached to a process group; that is the template's scope. In this case, the template is a resource attached to your process group, and a process group cannot be deleted until all its attached resources are deleted.
03-29-2018
06:25 PM
2 Kudos
@Vincent van Oudenhoven You need to go to the hamburger menu at the top right of the UI, click on Templates, and delete the template that you added at this process group level.
03-29-2018
09:12 AM
OK, so maybe you don't have enough FlowFiles to create a new merged FlowFile. The decision to merge is based on two things: the age of the bin and the number of records. Do you have 1000 records going through the merge? If not, try setting a short Max Bin Age to force the processor to do the merge.
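The bin-flush decision described above can be sketched like this (a simplification of MergeRecord's logic; the function name and defaults are illustrative):

```python
def should_flush(record_count, bin_age_secs, min_records=1000, max_bin_age_secs=None):
    """A bin is merged once it holds enough records, or once it has aged
    past Max Bin Age (if one is set) -- whichever happens first."""
    if record_count >= min_records:
        return True
    if max_bin_age_secs is not None and bin_age_secs >= max_bin_age_secs:
        return True
    return False

# With no Max Bin Age, 400 records never trigger a merge, however old the bin:
print(should_flush(400, 3600))                        # → False
# Setting a short Max Bin Age forces the merge:
print(should_flush(400, 3600, max_bin_age_secs=30))   # → True
```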
03-29-2018
09:02 AM
@Vivek Singh When you say "multiple csv are generated", do you mean that no original csv is merged? You have X input FlowFiles to MergeRecord and you get X output? Are they going through the success relationship? I can see that you have FlowFiles in "original, failure"; do you get errors?
03-29-2018
08:22 AM
1 Kudo
Hi @Vivek Singh Have you tried setting a blank "Correlation Attribute Name"? As you can see from the doc, this property is used to gather files having the same value for this attribute, so files with the same filename end up binned together, which leads to the behavior you are seeing: If specified, two FlowFiles will be binned together only if they have the same value for this Attribute. If not specified, FlowFiles are bundled by the order in which they are pulled from the queue.
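The two binning behaviors can be sketched as follows (a simplified model of the processor's logic; function name and bin size are illustrative):

```python
from collections import defaultdict

def bin_flowfiles(flowfiles, correlation_attr=None, bin_size=3):
    """Sketch of MergeRecord binning: with a Correlation Attribute set,
    only FlowFiles sharing that attribute's value share a bin; without
    it, FlowFiles are binned in the order pulled from the queue."""
    if correlation_attr is None:
        return [flowfiles[i:i + bin_size] for i in range(0, len(flowfiles), bin_size)]
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff[correlation_attr]].append(ff)
    return list(bins.values())

files = [{"filename": "a.csv"}, {"filename": "b.csv"}, {"filename": "a.csv"}]
# Correlating on filename splits differently named files into separate bins:
print(len(bin_flowfiles(files, "filename")))  # → 2
# With a blank correlation attribute, all three can land in one bin:
print(len(bin_flowfiles(files)))              # → 1
```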
03-26-2018
07:44 PM
Thanks @Veerendra Nath Jasthi for the confirmation. I'll delete my initial answer. Regarding Ambari 2.5.2, it doesn't support HDF 3.0.1.1. Only Ambari 2.5.1 supports HDF 3.0.1.1, as you can see in the screenshot you shared. Thanks
03-26-2018
07:29 PM
@Veerendra Nath Jasthi I am a little confused. My first answer was for HDF 3.1.1 because you provided this link in your question: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_installing-hdf-on-hdp/content/ch_install-mpack.html Are you trying to install HDF 3.1.1 or HDF 3.0.1.1?
03-25-2018
05:48 PM
1 Kudo
Hi @Chris Dan Please keep in mind that Flow Registry is the beginning of FDLC in NiFi and an important building block on which several services will be built. If you want to export a flow from the registry to back it up somewhere, you can use either: NiFi Toolkit, part of the NiFi project: https://issues.apache.org/jira/browse/NIFI-4839 NiPyAPI, a community project: as Tim suggested, you can use this tool as well and follow his excellent article. Today, the Flow Registry has a file storage provider. The Flow Registry supports plugins, and new storage providers such as Git will follow, in my opinion. Thanks
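For a scripted backup, the Registry also exposes its versioned flow snapshots over REST. A minimal sketch, assuming the standard NiFi Registry REST path layout (the host, IDs, and helper names here are hypothetical):

```python
import json
from urllib.request import urlopen

def flow_version_url(base, bucket_id, flow_id, version):
    """Build the NiFi Registry REST endpoint that returns a versioned
    flow snapshot as JSON, suitable for backing up outside the registry."""
    return f"{base}/nifi-registry-api/buckets/{bucket_id}/flows/{flow_id}/versions/{version}"

def export_flow(base, bucket_id, flow_id, version, out_path):
    # Fetch the snapshot and write it to a local JSON file.
    with urlopen(flow_version_url(base, bucket_id, flow_id, version)) as resp:
        snapshot = json.load(resp)
    with open(out_path, "w") as f:
        json.dump(snapshot, f, indent=2)

# Hypothetical host and UUIDs -- substitute your own:
# export_flow("http://registry-host:18080", "bucket-uuid", "flow-uuid", 1, "flow-v1.json")
print(flow_version_url("http://registry-host:18080", "bucket-uuid", "flow-uuid", 1))
```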
03-24-2018
10:05 PM
1 Kudo
Hello @Joe Harvy The Atlas reporting task requires building NiFi with a specific profile. If you downloaded NiFi from the Apache site, it's normal that the reporting task is not available. You have 2 options:
Use NiFi from HDF, which already has this reporting task. Or download the NiFi code from Apache and rebuild it with the Atlas profile using the following command: mvn clean install -Pinclude-atlas -DskipTests
03-23-2018
10:53 PM
@Jake Simmonds You can use the LookupRecord processor to do the enrichment. Check this article to see how to use the different options: https://community.hortonworks.com/articles/138632/data-flow-enrichment-with-nifi-lookuprecord-proces.html
03-23-2018
10:48 PM
@Kok Ching Hoo If your file is a CSV, the best thing is to use PutElasticsearchHttpRecord: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-elasticsearch-nar/1.5.0/org.apache.nifi.processors.elasticsearch.PutElasticsearchHttpRecord/index.html This is a bulk operation, so there's no need to split the file.
03-23-2018
10:41 PM
@Haitam Dadsi You can use CDC tools such as Attunity Replicate to push events to Kafka, then consume from Kafka with NiFi and update a realtime dashboard (Solr, for instance).
03-23-2018
10:25 PM
1 Kudo
Hi @Ankur Gupta Using the same Ambari for HDF and HDP has some caveats with the latest versions. You cannot install SAM and Schema Registry for HDF 3.1 on an HDP 2.6.4 cluster, and you cannot upgrade your Storm and Kafka versions if they exist on an HDP cluster. This is a temporary limitation with HDF 3.1 and will be resolved with HDP 3.0 and HDF 3.2. This is described in the docs: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_planning-your-deployment/content/ch_deployment-scenarios.html https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_installing-hdf-on-hdp/content/ch_add-hdf-to-hdp.html To have Kafka 1.x in HDP, you need to wait for HDP 3.0.
03-23-2018
06:00 PM
Hi @Jayendra Patil Any feedback on your issue?
03-23-2018
04:25 PM
Hi @Raja Chowdary You can add an UpdateAttribute processor to split the attribute into two attributes, like this. Then you can use myatt1 or myatt2 separately.
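The split itself can be sketched as below. This mirrors what the UpdateAttribute rules do; the comma delimiter is an assumption here (in NiFi Expression Language you would use something like `${myatt:substringBefore(',')}` and `${myatt:substringAfter(',')}`):

```python
def split_attribute(value, delimiter=","):
    """Derive myatt1/myatt2 from a single combined attribute value.
    The delimiter is an assumption; adjust to your actual format."""
    head, _, tail = value.partition(delimiter)
    return {"myatt1": head, "myatt2": tail}

print(split_attribute("2018-03-23,raja"))
# → {'myatt1': '2018-03-23', 'myatt2': 'raja'}
```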
03-23-2018
01:36 PM
Hi @Mark Lin Another way to manage duplicates is to use a ControlRate processor, with an expiration duration set on the connection before the ControlRate. This way, you let only one FlowFile go through every X amount of time, and the other FlowFiles get stuck in the connection and are deleted automatically when they expire. However, for this to work, you should separate your messages beforehand and not route all events to the same ControlRate, otherwise you will get one notification whatever the issues are. I hope this helps. Thanks
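The effect of this ControlRate-plus-expiry pattern can be sketched as a per-key gate (class and parameter names are illustrative, not NiFi APIs):

```python
import time

class NotificationGate:
    """Sketch of the ControlRate + connection-expiry pattern: one event
    per key passes every `interval_secs`; duplicates arriving inside the
    window are dropped, as expired FlowFiles would be. Separate keys
    (message types) each get their own gate, matching the advice to
    split messages before the ControlRate."""
    def __init__(self, interval_secs):
        self.interval = interval_secs
        self.last_sent = {}

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_sent.get(key, float("-inf")) >= self.interval:
            self.last_sent[key] = now
            return True
        return False

gate = NotificationGate(interval_secs=60)
print(gate.allow("disk-full", now=0))    # → True  (first alert passes)
print(gate.allow("disk-full", now=30))   # → False (duplicate dropped)
print(gate.allow("net-down", now=30))    # → True  (separate key, own window)
print(gate.allow("disk-full", now=90))   # → True  (window elapsed)
```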
03-23-2018
07:52 AM
2 Kudos
Hi @Pramod N Several NiFi examples are available: https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates One of them is what you are looking for: https://cwiki.apache.org/confluence/download/attachments/57904847/Hello_NiFi_Web_Service.xml?version=1&modificationDate=1449369797000&api=v2
03-22-2018
09:04 PM
Hi @Raja Chowdary Have you tried UpdateRecord? https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.UpdateRecord/index.html Look at the additional details for examples: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.UpdateRecord/additionalDetails.html
03-22-2018
08:47 PM
Hi @Mark Lin When I develop, I use a funnel to see what's happening in my flow. You can also use UpdateAttribute, adding any attribute. I think neither has much impact on resource usage. Funnel: a funnel is a NiFi component used to combine the data from several Connections into a single Connection.
03-22-2018
01:59 PM
Hi @Krishna Srinivas Atlas 0.8 has a data model for HBase which defines several components (hbase_namespace, hbase_table, etc.). However, there's no automated crawler for HBase that scans over all tables and saves the data in Atlas. You can write a script that does this and updates Atlas through its event or REST API. You have some examples here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_data-governance/content/atlas_messaging_publishing_entity_changes.html#atlas_messaging_publishing_entity_create You can also manually create HBase entities through the Atlas UI if you don't have a lot of tables.
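Such a script would mostly be building entity payloads and POSTing them to Atlas. A minimal sketch, assuming the Atlas v2 REST entity endpoint; the attribute names and qualifiedName convention are illustrative, so check them against your Atlas typedefs:

```python
import json

def hbase_table_entity(table_name, namespace, cluster="cluster1"):
    """Minimal Atlas v2 entity payload for an HBase table. The typeName
    comes from the Atlas HBase model; the qualifiedName format shown is
    an assumption -- verify it against your installed typedefs."""
    return {
        "entity": {
            "typeName": "hbase_table",
            "attributes": {
                "name": table_name,
                "qualifiedName": f"{namespace}:{table_name}@{cluster}",
            },
        }
    }

payload = hbase_table_entity("events", "default")
# POST this JSON to /api/atlas/v2/entity on your Atlas server:
print(json.dumps(payload, indent=2))
```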
03-22-2018
01:47 PM
Hi @rajdip chaudhuri Have you considered NiFi? You have out-of-the-box processors to list/fetch files and to write to HDFS. You can also use a NiFi cluster if you want to distribute the load across several nodes.
03-22-2018
07:23 AM
1 Kudo
Hi @rutuja jagtap Can you please give us more details on the context and the error? HDF doesn't include Spark. Are you trying to install HDP and HDF with the same Ambari?
03-20-2018
11:38 AM
Can you show your configuration for the different controllers and readers/writers?
03-20-2018
10:01 AM
1 Kudo
Hi @Jayendra Patil Setting the optimal value of the max thread count depends on your use cases and on which processors you are using (CPU-intensive like the convert processors, or IO-intensive like the put/get processors). I've seen better usage of my hardware with a thread count around 2x the number of cores. I've seen some clusters with 3x the number of cores. I think you can go beyond 50 in your case and monitor the behavior. The best thing to do is to proceed incrementally. I hope this helps. Abdelkrim
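The rule of thumb above can be expressed as a simple starting-point calculation (the function name and multiplier default are illustrative, not a NiFi setting):

```python
import os

def suggested_thread_count(multiplier=2):
    """Starting point for NiFi's Max Timer Driven Thread Count:
    roughly 2x the core count (some clusters run 3x), to be tuned
    upward incrementally while monitoring CPU and queue behavior."""
    cores = os.cpu_count() or 1
    return cores * multiplier

print(suggested_thread_count())   # e.g. 16 on an 8-core node
print(suggested_thread_count(3))  # the more aggressive 3x variant
```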
03-20-2018
09:33 AM
1 Kudo
Hi @dhieru singh AmbariReportingTask can be used to send metrics to AMS. You can see the GC metrics that it can send to AMS here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-ambari-nar/1.5.0/org.apache.nifi.reporting.ambari.AmbariReportingTask/additionalDetails.html In the default Grafana dashboard, this information is not used, but you can create a dashboard to show jvm.gc.runs.G1 Young Generation, for example. Below is a simple dashboard that shows this information:
03-17-2018
09:11 PM
Hi @roy p Below is a sample Blueprint for all HDF services, including the registries:
{
"Blueprints": {
"stack_name": "HDF",
"stack_version": "3.1"
},
"host_groups": [
{
"name": "host-group-3",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
}
]
},
{
"name": "host-group-2",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "ZOOKEEPER_SERVER"
}
]
},
{
"name": "host-group-4",
"components": [
{
"name": "NIFI_MASTER"
},
{
"name": "DRPC_SERVER"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "KAFKA_BROKER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIMBUS"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "NIFI_REGISTRY_MASTER"
},
{
"name": "REGISTRY_SERVER"
},
{
"name": "STORM_UI_SERVER"
}
]
},
{
"name": "host-group-1",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIFI_CA"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "ZOOKEEPER_SERVER"
}
]
}
],
"configurations": [
{
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"ams-hbase-env": {
"hbase_master_heapsize": "512",
"hbase_regionserver_heapsize": "768",
"hbase_master_xmn_size": "192"
}
},
{
"nifi-logsearch-conf": {}
},
{
"storm-site": {
"topology.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsSink\", \"parallelism.hint\": 1, \"whitelist\": [\"kafkaOffset\\\\..+/\", \"__complete-latency\", \"__process-latency\", \"__execute-latency\", \"__receive\\\\.population$\", \"__sendqueue\\\\.population$\", \"__execute-count\", \"__emit-count\", \"__ack-count\", \"__fail-count\", \"memory/heap\\\\.usedBytes$\", \"memory/nonHeap\\\\.usedBytes$\", \"GC/.+\\\\.count$\", \"GC/.+\\\\.timeMs$\"]}]",
"metrics.reporter.register": "org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter",
"storm.cluster.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter\"}]"
}
},
{
"registry-common": {
"registry.storage.connector.connectURI": "jdbc:mysql://myhost.hdf.com:3306/registry",
"registry.storage.type": "mysql",
"jar.storage.type": "local",
"registry.storage.connector.password": "StrongPassword"
}
},
{
"registry-logsearch-conf": {}
},
{
"streamline-common": {
"streamline.storage.type": "mysql",
"jar.storage.type": "local",
"streamline.storage.connector.connectURI": "jdbc:mysql://myhost.hdf.com:3306/streamline",
"streamline.dashboard.url": "http://localhost:9089",
"registry.url": "http://localhost:7788/api/v1",
"streamline.storage.connector.password": "StrongPassword"
}
},
{
"ams-hbase-site": {
"hbase.regionserver.global.memstore.upperLimit": "0.35",
"hbase.regionserver.global.memstore.lowerLimit": "0.3",
"hbase.tmp.dir": "/var/lib/ambari-metrics-collector/hbase-tmp",
"hbase.hregion.memstore.flush.size": "134217728",
"hfile.block.cache.size": "0.3",
"hbase.rootdir": "file:///var/lib/ambari-metrics-collector/hbase",
"hbase.cluster.distributed": "false",
"phoenix.coprocessor.maxMetaDataCacheSize": "20480000",
"hbase.zookeeper.property.clientPort": "61181"
}
},
{
"ams-env": {
"metrics_collector_heapsize": "512"
}
},
{
"kafka-log4j": {}
},
{
"ams-site": {
"timeline.metrics.service.webapp.address": "localhost:6188",
"timeline.metrics.cluster.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.host.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.host.aggregator.ttl": "86400",
"timeline.metrics.service.handler.thread.count": "20",
"timeline.metrics.service.watcher.disabled": "false"
}
},
{
"kafka-broker": {
"kafka.metrics.reporters": "org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter"
}
},
{
"ams-grafana-env": {
"metrics_grafana_password": "StrongPassword"
}
},
{
"streamline-logsearch-conf": {}
}
]
}
03-17-2018
06:45 PM
Hi @Karl Fredrickson If you have Knox, you can use it to encapsulate Kerberos authentication and use username/password instead. Thanks