Member since: 06-06-2016
Posts: 38
Kudos Received: 14
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1636 | 11-14-2016 09:08 AM
 | 796 | 10-31-2016 09:20 AM
05-31-2017
12:53 PM
1 Kudo
Hey @Joshua Adeleke, I haven't experienced this directly, but it looks like it could be a bug or rendering issue. Is it possible to check with another browser?
05-24-2017
01:56 PM
I want to set up Kafka like so:
SASL_SSL://localhost:9092,SSL://localhost:9093
where the keystores and truststores are different for each endpoint. Is this possible at the moment?
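For illustration, this is the kind of per-endpoint configuration being asked about, written as server.properties entries. It is only a sketch: the listener.name.<listener>-prefixed overrides assume a broker version that supports per-listener configuration (KIP-103), and every path and password below is a placeholder, not a value from this thread.

# Sketch only - assumes per-listener overrides are available; paths/passwords are placeholders
listeners=SASL_SSL://localhost:9092,SSL://localhost:9093

listener.name.sasl_ssl.ssl.keystore.location=/etc/kafka/ssl/sasl-endpoint.keystore.jks
listener.name.sasl_ssl.ssl.keystore.password=changeit
listener.name.sasl_ssl.ssl.truststore.location=/etc/kafka/ssl/sasl-endpoint.truststore.jks
listener.name.sasl_ssl.ssl.truststore.password=changeit

listener.name.ssl.ssl.keystore.location=/etc/kafka/ssl/ssl-endpoint.keystore.jks
listener.name.ssl.ssl.keystore.password=changeit
listener.name.ssl.ssl.truststore.location=/etc/kafka/ssl/ssl-endpoint.truststore.jks
listener.name.ssl.ssl.truststore.password=changeit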
Labels:
- Apache Kafka
05-15-2017
03:07 PM
I'm not moving all the directories to new places, but consolidating 8 locations down to 3 - I wasn't sure how all the metadata and splits would copy over, given some of the filenames are the same in each directory.
05-15-2017
09:15 AM
Hi @frank chen,
There is no space before -O, which could be the issue (the -O flag overwrites output from previous runs). Otherwise it looks like it should work. If I copy and paste the command I get the following:
scarroll@LAPTOP:~/tmp$ ~/nifi-toolkit-1.0.0/bin/tls-toolkit.sh standalone -n localhost -C 'CN=scarroll,OU=NIFI' -O -o security_output
<output omitted>
scarroll@LAPTOP:~/tmp$ tree security_output/
security_output/
├── CN=scarroll_OU=NIFI-O.p12
├── CN=scarroll_OU=NIFI-O.password
├── CN=scarroll_OU=NIFI.p12
├── CN=scarroll_OU=NIFI.password
├── localhost
│ ├── keystore.jks
│ ├── nifi.properties
│ └── truststore.jks
├── nifi-cert.pem
└── nifi-key.key
What version are you using?
05-12-2017
01:21 PM
Thanks! Will try. It's still in the early stages, so load is not a huge concern right now.
05-12-2017
01:02 PM
1 Kudo
I'm decommissioning some storage-heavy nodes and it is taking a really long time (days) to move all the blocks over. There doesn't seem to be much out there showing how to increase the speed (http://stackoverflow.com/questions/17789196/hadoop-node-taking-a-long-time-to-decommission), but there must be something. At this rate it will take weeks to decommission the required nodes.
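As a starting point, these are the NameNode-side re-replication settings usually discussed when decommissioning is slow, written as hdfs-site.xml entries. The values are illustrative assumptions only, not recommendations from this thread, and changes need a NameNode restart (or the equivalent Ambari config change) to take effect.

<!-- Sketch: properties commonly tuned to speed up re-replication; values are illustrative -->
<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>10</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>20</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>40</value>
</property>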
Labels:
- Apache Hadoop
05-11-2017
11:05 AM
Thanks for the quick response! So a workaround would be something like this (in pseudocode)?

correct_log_location = a
for each node:
    manually set log.dirs <- correct_log_location
    restart kafka
    wait till partitions have been migrated

Then set the correct config in Ambari and restart through Ambari.
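For illustration only, here is a rough shell rendering of that per-broker loop. The config path, ZooKeeper address and restart command are assumed HDP-style defaults, not details from this thread; in practice each broker would be handled through Ambari one host at a time.

# Run on each broker in turn; all paths/addresses below are assumptions.
NEW_LOG_DIR=/data/kafka-logs
sed -i "s|^log.dirs=.*|log.dirs=${NEW_LOG_DIR}|" /etc/kafka/conf/server.properties
systemctl restart kafka    # or restart just this broker from Ambari
# Wait until replication has caught up, i.e. no under-replicated partitions remain
while [ -n "$(/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
      --zookeeper localhost:2181 --describe --under-replicated-partitions)" ]; do
  sleep 60
done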
05-11-2017
09:44 AM
If I want to change all the locations in log.dirs, can I just make this change in Ambari and restart Kafka? There are a couple of articles saying that data will be replicated if I just delete a folder; however, they imply that the migration has to be done on a per-machine basis, e.g. https://community.hortonworks.com/articles/59715/migrating-kafka-partitions-data-to-new-data-folder.html. If Ambari applies this change to all machines at once, can it still migrate the data?
Labels:
- Apache Ambari
- Apache Kafka
04-26-2017
03:45 PM
The automated SSL setup (either with Ambari or the tls-toolkit) is awesome; however, I can only get it to work with self-signed certs. Is there any way to get it to work with a company (or external) CA?
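One approach sometimes used, sketched here only as an illustration: the standalone toolkit can reuse an existing nifi-cert.pem / nifi-key.key if they are already present in its output directory, so a company-issued (intermediate) CA certificate and key in PEM form could be dropped in before generating the node keystores. Whether this works depends on the toolkit version and on the company CA issuing a signing certificate at all; every path and hostname below is a placeholder.

# Sketch: seed the output directory with an externally issued CA cert/key (PEM)
mkdir -p security_output
cp /path/to/company-ca.pem security_output/nifi-cert.pem
cp /path/to/company-ca.key security_output/nifi-key.key
./bin/tls-toolkit.sh standalone -n 'node1.example.com' -o security_output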
04-03-2017
12:18 PM
PutSlack was such a good addition! Be careful ingesting nifi-app.log though! I've tried this before and it quickly spirals out of control as each read of the log also generates log entries which then get picked up and generate more log entries.
04-02-2017
05:21 PM
Legend! This is perfect. You've even gone the extra step of submitting the improvement ticket! Can't upvote this enough.
03-31-2017
11:38 AM
@Matt Burgess I just spoke with @Emmanouil Petsanis. To clarify, we have Z XSLTs and want to apply all of them to each incoming XML file independently, so a single incoming XML file would lead to Z outgoing JSON files: [xslt1(xml), xslt2(xml), ..., xsltZ(xml)]. We can hard-code this by just having Z TransformXML processors, but it's kind of tedious and we have multiple flows with multiple XSLTs. Is there a way to either:
- have TransformXML take a list of XSLTs, or
- use ListFile on the XSLT dir and then somehow generate Z copies of the XML, each with a different XSLT? (I know this isn't really how NiFi is designed to work.)
ExecuteScript is the best I could come up with, but I thought there might be a native way. Thanks, Seb
01-25-2017
01:43 PM
1 Kudo
# Ruby Scripting in NiFi

Today I was asked about an issue that I didn't know how to solve using NiFi. On the surface it sounded simple: just map an attribute to another value. The attribute to map was 'port', and based on the port number we add an attribute to more easily identify the system downstream, e.g. for port 10003, syslog:local; for 10004, syslog:db1; etc. After a bit of digging I found a few different options to solve this.

## Many UpdateAttribute Processors

The first is to create a new UpdateAttribute processor for each incoming stream. This labels (places an attribute on) all files that come in from that listener. It looked like this:

This looks a bit confusing and tedious but is very precise and arguably easier to read, especially when we label the processors. It also has the added advantage of not having to change the port in more than one place. If, for example, the local logs coming in over port 10002 need to change to port 10003, then we just make that change in the ListenSyslog processor and the rest remains unchanged.

## One UpdateAttribute Using the Advanced Options

The advanced options allowed me to keep all the configuration in one place, easily mapping port numbers to tags. The two disadvantages I ran into were:

1. A fairly tedious process to get each mapping. It involved:
   * Create a new rule
   * Add a name
   * Search for an existing rule to import
   * Change the port number and associated label
2. The port must now be changed in two different places if it were to change.

It would look like:

## ExecuteScript Processor

This again allows me to keep all the configuration in one place and makes it much easier to make changes. I created a processor that stores the mappings in a hash and adds the correct attribute appropriately. It looks like so:

From the UI perspective it looks very similar to the single UpdateAttribute solution. This requires the addition of the script:

{% highlight ruby %}
map = {
  10 => "system1",
  9  => "system2",
  8  => "system3",
}
map.default = "unknown"

flowFile = session.get()
unless flowFile.nil?
  # attribute values are strings, so convert to an integer before the lookup
  label = map[flowFile.getAttribute("port").to_s.to_i]
  flowFile = session.putAttribute(flowFile, "system", label)
  session.transfer(flowFile, REL_SUCCESS)
end
session.commit()
{% endhighlight %}

It is more complex in that it adds the need to understand a scripting language, and it also doesn't remove the requirement of changing the port number in more than one place. The script can add further complexity if it becomes necessary to reference it as a file rather than using the 'Script Body' option in the processor. The main advantage is that it makes it easier to change the mapping - just copy and paste one of the lines of the hash and make the changes all in one place. Given NiFi's goal of minimising the need for data flow managers to know how to code, it's unlikely this is the best approach.

# Conclusion

The first option is quite foreign to programmers, who feel like it isn't generic. This is understandable given that it does feel a bit like copy and paste. I would still say it is the most NiFi way of achieving the mapping, as it is the solution which is most self-describing and resistant to change.
Tags:
- Data Ingestion & Streaming
- How-To/Tutorial
- NiFi
- Script
01-23-2017
03:09 PM
Hey @Rohit Ravishankar, thanks for moving this to a comment! Short answer: not really. Longer answer: it's not really what NiFi does. It still sounds like you want NiFi to process a batch: something like users put files into a directory and then, after some external trigger, the job starts and reports any failures. Is there a reason why NiFi cannot just continually listen for input? Does it have to sync up with other 'jobs' further up or down stream? If so, a better approach would be to have the scheduler (e.g. Control-M) 'trigger' NiFi simply by moving the files into the pickup directory. Apologies, I can't be more specific without more details on your use case.
01-23-2017
09:46 AM
1 Kudo
Hey @Rohit Ravishankar, could you move this into a comment on my answer? Otherwise it will be difficult to follow that it is a reply to what I posted above
01-22-2017
05:42 PM
1 Kudo
Hi @Rohit Ravishankar, could you elaborate more on what you are trying to achieve with NiFi? NiFi doesn't really have a concept of a 'job' or a 'batch' which you would trigger. The usual workflow with NiFi is for it to wait for data to appear and then process that data. Waiting for data could mean listening for syslog messages, waiting for files to appear in a directory, or many other options. Processing data is done using a flow where each processor performs a different step. Each processor has a concept of inputs and outputs (usually success and failure, but not necessarily - it depends on the processor). Even if files are routed to a 'failure' relationship it doesn't necessarily mean the job has failed. For example, let's presume there is a file of JSON records I want to convert to AVRO records; if only one record fails conversion, just that record will be routed to failure. That record could then be routed to another converter with a different (maybe more general) schema. Hope this helps!
01-18-2017
03:57 PM
Hi @Andy Liang, in addition to @Pierre Villard's answer, there are three aspects of data processing joined up here:
- Streaming - simple event processing: this is what NiFi is very good at. All the information needed to do the processing is contained in the event. For example, log processing (if the log contains an error, separate it from the flow and send an email alert) or transformation (our legacy system uses XML but we want to use AVRO, so convert each XML event to AVRO).
- Streaming - complex event processing: this is what Storm is good at, as covered by Pierre.
- Batch: this is where MR/Hive/Spark (not Spark Streaming) come in. Land the data on HDFS, and then it can be processed and/or explored.
01-17-2017
04:40 PM
Thanks @Matt, that's a big help! It aligns with my understanding, although I didn't know about the attributes. I currently have:
- 3.91 GB of heap space allocated, with 97% usage
- 6k flow files / 170 MB in just two queues
- no files with large attributes that I can see (not checked all - just a sample)
- 0 active threads
With respect to point 4, this would only come into effect when the specific processor is running, correct? If all relevant split/merge processors were stopped then this shouldn't have an effect. I can only imagine it's a leak somewhere; I can't see any other reason why the heap would have grown to that size. If I were to turn off all processors, empty all queues, and the memory still didn't drop, would that indicate a leak?
01-17-2017
12:43 PM
1 Kudo
My dev NiFi instance is stuck (no active threads - nothing happening). I can see two errors in the log:

Cannot update repository because all partitions are unusable at this time. Writing to the repository would cause corruption. This most often happens as a result of the repository running out of disk space or the JVM running out of memory.

and

Unable to merge /dfs1/nifi/data/provenance_repository/journals/4289216.journal.15 with other Journal Files due to java.io.FileNotFoundException: Unable to locate file /dfs1/nifi/data/provenance_repository/journals/4289216.journal.15

As suggested above, I looked at the disks and memory. The disks are fine (>30% free), but it looks like the JVM is running out of memory, as the heap usage is currently (and consistently) 97%+; the machine itself still has 8g free. Are there legitimate reasons that NiFi might run out of memory, or does this look more like a memory leak? There are lots of custom processors running on it, but I don't have access to the code. Are there resources about Java memory management in a NiFi-specific context? Just trying to narrow down what might have caused this. NiFi version is 0.6.
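Before chasing a leak, it may be worth ruling out an undersized heap: the JVM heap for NiFi is set via the java.arg entries in conf/bootstrap.conf. The sizes below are placeholders for illustration, not recommendations.

# conf/bootstrap.conf (sizes are examples only)
java.arg.2=-Xms2g
java.arg.3=-Xmx4g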
Labels:
- Apache NiFi
01-12-2017
12:35 AM
I have a custom NiFi processor that worked great up until I tried to use a DistributedMapCache. I tried to include it like:
import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient;
...
public class ListBox extends AbstractListProcessor {
...
public static final PropertyDescriptor DISTRIBUTED_CACHE_SERVICE = new PropertyDescriptor.Builder()
.name("Distributed Cache Service")
.description("Specifies the Controller Service that should be used to maintain state about what has been pulled from HDFS so that if a new node "
+ "begins pulling data, it won't duplicate all of the work that has been done.")
.required(false)
.identifiesControllerService(DistributedMapCacheClient.class)
.build();
But then, when I run mvn clean install and copy the nar over, I get the following error:
java.util.ServiceConfigurationError: org.apache.nifi.processor.Processor: Provider org.hortonworks.processors.boxconnector.ListBox could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232) ~[na:1.8.0_91]
at java.util.ServiceLoader.access$100(ServiceLoader.java:185) ~[na:1.8.0_91]
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384) ~[na:1.8.0_91]
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) ~[na:1.8.0_91]
at java.util.ServiceLoader$1.next(ServiceLoader.java:480) ~[na:1.8.0_91]
at org.apache.nifi.nar.ExtensionManager.loadExtensions(ExtensionManager.java:116) ~[nifi-nar-utils-1.1.0.jar:1.1.0]
at org.apache.nifi.nar.ExtensionManager.discoverExtensions(ExtensionManager.java:97) ~[nifi-nar-utils-1.1.0.jar:1.1.0]
at org.apache.nifi.NiFi.<init>(NiFi.java:139) ~[nifi-runtime-1.1.0.jar:1.1.0]
at org.apache.nifi.NiFi.main(NiFi.java:262) ~[nifi-runtime-1.1.0.jar:1.1.0]
Caused by: java.lang.NoClassDefFoundError: org/apache/nifi/distributed/cache/client/DistributedMapCacheClient
I also have the dependency configured in my pom.xml file:
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-distributed-cache-client-service-api</artifactId>
</dependency>
If I copy over the distributed map cache nar before bundling, it works fine. Is there somewhere else I have to list the dependency to get it bundled into the nar?
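For comparison, this is the pattern the standard NiFi bundles use for controller-service APIs; it is only a sketch based on that convention, not on this project's actual poms (the module layout is an assumption). The service API jar is compiled against with provided scope in the processor module, and the NAR module declares a NAR-type dependency on the bundle that supplies the implementation at runtime.

<!-- processor module pom.xml (sketch): compile against the API only -->
<dependency>
  <groupId>org.apache.nifi</groupId>
  <artifactId>nifi-distributed-cache-client-service-api</artifactId>
  <scope>provided</scope>
</dependency>

<!-- NAR module pom.xml (sketch): bring in the services API NAR at runtime -->
<dependency>
  <groupId>org.apache.nifi</groupId>
  <artifactId>nifi-standard-services-api-nar</artifactId>
  <type>nar</type>
</dependency>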
Labels:
- Apache NiFi
01-03-2017
12:37 PM
@Pierre Villard Thanks! I was hoping I had missed something in the API docs
01-03-2017
11:30 AM
Trying to work out the message consumption rate per time slice is fairly difficult with the current format, where the stats are presented in a sliding 5-minute window but updated every minute. Is it possible to get stats on a per-minute or per-second basis?
Labels:
- Apache NiFi
11-22-2016
12:13 PM
I have ConsumeJMS reading from a Tibco queue on NiFi version 0.6.1. NIFI-1628 only refers to the SSL integration. However, use the latest version if that is possible, as there are improvements that are worth having.
11-14-2016
09:08 AM
2 Kudos
Hi @m mary, Looks like you're out of disk space. Can you check your disks have space? Regards, Seb
10-31-2016
09:20 AM
I am fairly certain this was caused by the disks being remounted in different positions on restart. http://sebastiancarroll.github.io/2016/10/24/azure-for-hdp-deployments.html
09-29-2016
01:56 PM
Running a cluster on Azure where, twice now, something has failed to start due to the ClusterID being out of sync. The first time, the DataNodes failed to start as their ClusterID was different to the NameNode's. Then, after enabling HA, the Standby NN and all DNs started, however the Active NN failed (which then caused the active to fail over to the standby). After manually syncing the IDs, the cluster started fine in both cases. We didn't manually format the NN either time, so I am seriously confused as to how this could have happened. In what ways can the ClusterID be changed?
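For anyone comparing the IDs, the clusterID lives in the VERSION file under each role's storage directory; the paths below are typical HDP-style defaults and are assumptions, not taken from this cluster.

# NameNode
grep clusterID /hadoop/hdfs/namenode/current/VERSION
# DataNode
grep clusterID /hadoop/hdfs/data/current/VERSION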
Labels:
- Apache Hadoop
09-25-2016
12:02 AM
4 Kudos
Step 1: Download

First, download and unzip the package (I used nifi-toolkit-1.0.0-bin.tar.gz). Inside there are a number of folders and files, but I am mostly interested in bin/tls-toolkit.sh. This can be run in either standalone or client/server mode:
- Standalone is for a one-off generation of certificates and keys
- Client/server allows you to run the tls-toolkit as a server that signs Certificate Signing Requests from clients

Step 2: Generate Keys and Certificates

Initially I ran the help:

./bin/tls-toolkit.sh standalone -h

Then I created three bundles, one for each of my servers:

./bin/tls-toolkit.sh standalone -n 'localhost(3)' -C 'CN=scarroll,OU=NIFI' -O -o ../security_output

which generated the truststore, keystore and nifi.properties file for each of my three hosts, plus my client certificates. Here is the output:

$ tree security_output/
security_output/
├── CN=scarroll_OU=NIFI.p12
├── CN=scarroll_OU=NIFI.password
├── localhost
│ ├── keystore.jks
│ ├── nifi.properties
│ └── truststore.jks
├── localhost_2
│ ├── keystore.jks
│ ├── nifi.properties
│ └── truststore.jks
├── localhost_3
│ ├── keystore.jks
│ ├── nifi.properties
│ └── truststore.jks
├── nifi-cert.pem
└── nifi-key.key

Then I copied the truststore, keystore and nifi.properties files out to each host. I was using Vagrant, so I just ran:

cp /vagrant/security_output/localhost/* /opt/nifi-1.0.0-BETA/conf/

WARNING: This will overwrite your old nifi.properties file, which is not a problem for a clean system like mine. To maintain your old configuration, you can manually copy the relevant security settings over, or pass your existing nifi.properties file into the tls-toolkit, which will modify the correct values.

Issue #1: HTTPS listening only on localhost

Initially NiFi wasn't listening on my external interface, but this is easy to resolve. Whatever value is given to the toolkit's -n switch is set as the nifi.web.https.host option. I just set nifi.web.https.host to empty and restarted NiFi. This means NiFi is now listening on all interfaces, which may not be the most secure way of running NiFi, but is fine for a demo.

Issue #2: ERR_CONNECTION_CLOSED

If you try to access the UI now, you should see an error. I was expecting to get a connection but be met with a permission denied/forbidden error, so this stumped me for a while, but it seems to be standard behaviour for NiFi. The solution is to offer up some way to authenticate yourself.

Step 3: Importing Certificates into Chrome

Since I am not integrating NiFi with any user management system, I'll need to import the client certificate into Chrome to get access to the NiFi UI. I'm on a Mac and use the Keychain Access program for this, so I can open it directly; alternatively, Settings → Manage Certificates will get you to the same place. Once there, pick a keychain that is unlocked (initially I used System, which prompted for a password every time I connected, so instead I created a new keychain called nifi-certs). Next, import the CN=scarroll_OU=NIFI.p12 file and enter the password found in the CN=scarroll_OU=NIFI.password file. Unfortunately, on a Mac this text box cannot be pasted into, so the super long secure password that the toolkit generated needs to be typed by hand (or some other workaround). Now you can restart Chrome and you will be asked to choose a certificate to present.

Issue #3: Forbidden

Now, even though NiFi knows who you are, you still aren't allowed to do anything. When open, NiFi gives power to all users by default; when secured, NiFi gives no permissions by default. The path to allowing yourself access to the UI varies depending on whether this is a new instance or an upgrade. For a new instance you add yourself (or another admin) as the Initial Admin Identity; for an upgrade you can use the legacy authorized-users.xml file. Both settings can be found in authorizers.xml. Since this is a new instance, I added myself as an initial admin and restarted NiFi. Restarting is important so the Initial Admin credentials get populated into the users.xml and authorizations.xml files. Now I can log in to my newly secured instance.
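As a quick command-line sanity check (not part of the original walkthrough), the generated PKCS12 client certificate can also be presented with curl, assuming a curl build that supports P12 client certs and the toolkit's default HTTPS port of 9443.

# Sketch only: the password comes from the matching .password file
curl -v https://localhost:9443/nifi \
  --cert-type P12 \
  --cert "CN=scarroll_OU=NIFI.p12:$(cat CN=scarroll_OU=NIFI.password)"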
Tags:
- How-To/Tutorial
- NiFi
- Security
07-28-2016
10:06 AM
There were (and still are) a number of methods, including:
- Throw data away
- Down-sample - decide what you think is important up front and throw the rest away
- Age off - periodically delete old data
- Warehouse - write old data to tapes and delete it from the disks
- Buy specialised hardware - very large, expensive, dedicated database machines which don't scale
- Don't use a traditional database - keep everything in files and distribute it manually across a cluster
- Traditional database horizontal scaling - never done it, but I've heard it's difficult

Apparently, Facebook still uses MySQL "with a complex sharding and caching strategy" - Gigaom
07-28-2016
09:18 AM
Hi @Himanshu Rawat, welcome to HCC! Whether we class data as structured or unstructured is related to its degree of organization. For example, consider the content and metadata of email. The metadata associated with the emails I have sent would be structured. It needs to be very organized so the email servers know the sender, recipient(s), CC, BCC, time sent/received, etc. For example, the time received can easily be compared to the time on other emails; I could easily sort my emails based on time and find the most recent, or something from a particular date. The content or body, on the other hand, would be considered unstructured. I could put anything in there. How would I organize emails if I only considered the content? Number of words? Spaces? Positivity of the post? What would it mean? Hope that helps.