Member since: 09-15-2015
Posts: 116
Kudos Received: 141
Solutions: 40
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1781 | 02-05-2018 04:53 PM
 | 2297 | 10-16-2017 09:46 AM
 | 2012 | 07-04-2017 05:52 PM
 | 3000 | 04-17-2017 06:44 PM
 | 2195 | 12-30-2016 11:32 AM
01-02-2016
12:51 PM
1 Kudo
This sounds like it may be a build problem. https://github.com/simonellistonball/spark-samples... has a working sample with sbt scripts to build against the Hortonworks repository, which has been tested on HDP 2.3.2. Note that the Kafka consumer API has changed a bit recently, so it's important to be aware of which Kafka version you are building against. Also, I note that you're running in local mode; we would recommend using local mode only for testing, and using --master yarn-client to run on a proper cluster.
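For reference, a minimal build.sbt fragment along the lines the sample uses; the resolver URL and version strings here are illustrative assumptions, so check the linked repo for the actual build scripts:

```scala
// Sketch of an sbt build resolving Spark against the Hortonworks repository.
// Version numbers are illustrative; pick the ones matching your HDP build.
resolvers += "Hortonworks Releases" at "http://repo.hortonworks.com/content/repositories/releases/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.4.1"
)
```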
12-03-2015
01:42 PM
1 Kudo
To do this you build a pipeline with the GetFile processor, which can pick up files and delete or move them afterwards (just as the spooldir source does). For the batching functionality you can use MergeContent, or other batching mechanisms on downstream Put processors.
11-08-2015
02:28 PM
That's a start, however PMML support in Spark is a way off being complete; in particular, there is no support for transformations yet. Spark would be a great platform for this, though it is a very heavy platform to spin up for simple scoring in a NiFi flow.
11-08-2015
01:07 PM
2 Kudos
JPMML is a great library for evaluating PMML models, including things like feature transformation and a good range of model support. However, its license is AGPL3, which makes it hard to include in Apache projects. I'm looking to evaluate PMML models as part of a custom NiFi processor, so I need an evaluator library with an Apache license.
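For context, this is roughly what scoring looks like with jpmml-evaluator; a sketch from memory, as the exact class names vary between versions, and the model path, field name, and value here are hypothetical:

```scala
import java.io.FileInputStream
import scala.collection.JavaConverters._

import org.jpmml.evaluator.ModelEvaluatorFactory
import org.jpmml.model.PMMLUtil

// Load a PMML model and score a single record (sketch; "model.pmml"
// and the raw input values are hypothetical).
val pmml = PMMLUtil.unmarshal(new FileInputStream("model.pmml"))
val evaluator = ModelEvaluatorFactory.newInstance().newModelEvaluator(pmml)
evaluator.verify()

// Let JPMML prepare each input field, applying the model's own
// feature transformations to the raw values.
val rawInputs = Map("sepal_length" -> (5.1: java.lang.Double))
val arguments = evaluator.getInputFields.asScala.map { field =>
  field.getName -> field.prepare(rawInputs(field.getName.getValue))
}.toMap

val results = evaluator.evaluate(arguments.asJava)
results.asScala.foreach { case (name, value) => println(s"$name -> $value") }
```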
Labels:
- Apache NiFi
11-04-2015
07:33 PM
3 Kudos
The other thing to note is that to use Spark Packages, you also need z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven") in the dep paragraph. There is currently a bug in the Zeppelin loader which prevents it bringing in transitive dependencies here, which we are working on; so, for spark-csv for example, you may also have to manually add the opencsv dependencies explicitly.
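By way of illustration, a complete dep paragraph might look like this; the artifact versions are illustrative, and (given the loader bug above) opencsv is loaded explicitly rather than resolved transitively:

```scala
%dep
// Must run before the Spark interpreter has started in the notebook.
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.10:1.2.0")
// Work around the transitive-dependency bug by loading opencsv directly.
z.load("net.sf.opencsv:opencsv:2.3")
```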
11-04-2015
11:15 AM
4 Kudos
There are a range of common NLP systems that work well on the platform:

- OpenNLP is a Java-native library which integrates well with, for example, MapReduce.
- NLTK, being a Python system, works well with PySpark; the NLTK components also work well with Hive for things like tokenisation and part-of-speech tagging.
- Spark has native components relevant to NLP tasks: Latent Dirichlet Allocation for topic detection is one example (a minimal sketch follows below).
- Stanford CoreNLP provides a good toolkit of NLP functions, and there is a spark-package to integrate it with SparkML pipelines.
- Solr provides a number of useful tools that apply in the NLP space, such as stemming and synonym handling, as part of its indexing and querying, so it offers some building blocks for simple NLP analysis.

There are also a number of commercial and partner solutions which handle NLP tasks, and we are looking to build tools for Entity Resolution on Spark, which will add to this.
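As a concrete example of the native Spark side, a minimal LDA topic-detection sketch against MLlib; it assumes documents have already been tokenised into term-count vectors, and the input path is hypothetical:

```scala
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

// Assumes a SparkContext `sc` (e.g. in Zeppelin or spark-shell) and an
// input file of space-separated term counts, one document per line.
val data = sc.textFile("hdfs:///data/term_counts.txt")
val corpus = data
  .map(line => Vectors.dense(line.trim.split(' ').map(_.toDouble)))
  .zipWithIndex.map(_.swap)  // LDA wants (docId: Long, termCounts: Vector)
  .cache()

val ldaModel = new LDA().setK(10).run(corpus)

// Show the top 5 term indices (with weights) for each topic.
ldaModel.describeTopics(maxTermsPerTopic = 5).zipWithIndex.foreach {
  case ((terms, weights), topic) =>
    println(s"Topic $topic: " + terms.zip(weights).mkString(", "))
}
```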
10-24-2015
11:03 PM
Tried this on a 2.3.2 cluster (a brand new build) with 1.4.1, and had the same problem with Zeppelin and Magellan. It seems like Zeppelin is doing something to the context.
10-23-2015
03:39 PM
That will work for HDP 2.2, but it is not the way to do it on 2.3. In 2.3 we have a proper RPM-based install; this stack has not yet been updated to reflect the new deployment mechanism.
10-20-2015
11:48 AM
Mirror Maker works by consuming from a source Kafka cluster and producing into a destination Kafka cluster. If I am producing messages with compression enabled into the source Kafka, is there a way for Mirror Maker to consume them without decompression, i.e. just grab the raw compressed bits and pass those on the wire to the target Kafka? Or will the consumer force decompression, with recompression at the other end (meaning uncompressed data goes over the wire)?
Labels:
- Apache Kafka
10-08-2015
06:58 PM
2 Kudos
According to https://azure.microsoft.com/en-gb/documentation/articles/virtual-machines-a8-a9-a10-a11-specs/ the A8-9 instances support an RDMA 32MBs backplane for node-to-node communication on SLES. Is the SLES image the preferred / only image which supports this networking layer, or are there RedHat-flavour alternatives? Would access to the 32MBs backplane through a multi-homed topology make a significant difference to intra-cluster communication, versus the relatively small CPU scale of A8-9? Simon