Member since: 09-15-2015

116 Posts | 141 Kudos Received | 40 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 2231 | 02-05-2018 04:53 PM |
|  | 3085 | 10-16-2017 09:46 AM |
|  | 2480 | 07-04-2017 05:52 PM |
|  | 3874 | 04-17-2017 06:44 PM |
|  | 3095 | 12-30-2016 11:32 AM |
			
    
	
		
		

01-02-2016 12:51 PM | 1 Kudo

This sounds like it may be a build problem. https://github.com/simonellistonball/spark-samples... has a working sample with sbt scripts to build against the Hortonworks repository, which has been tested on HDP 2.3.2. Note that the Kafka consumer API has changed a bit recently, so it's important to be aware of which Kafka version you are building against. Also, I note that you're running in local mode; we would recommend using local mode only for testing, and using --master yarn-client to run on a proper cluster.
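
For reference, a minimal build.sbt along these lines might look as follows; the repository URL and version numbers here are assumptions for illustration, not taken from the linked repo:

```scala
// Illustrative build.sbt for a Spark Streaming + Kafka job on HDP.
// Repository URL and versions are assumptions; match them to your cluster.
name := "spark-kafka-sample"

scalaVersion := "2.10.5"

resolvers += "Hortonworks Releases" at
  "http://repo.hortonworks.com/content/repositories/releases/"

libraryDependencies ++= Seq(
  // "provided" keeps the cluster's own Spark jars out of the assembly
  "org.apache.spark" %% "spark-core"            % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.4.1"
)
```

The resulting jar would then be submitted with spark-submit --master yarn-client rather than a local master.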
						
					

12-03-2015 01:42 PM | 1 Kudo

To do this you build a pipeline with the GetFile processor, which can pick up files and delete or move them afterwards (just as the spooldir source does). For the batching functionality you can use MergeContent, or other batching mechanisms on downstream Put processors.
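
As a rough sketch, the relevant processor settings might look like this; the values are illustrative, not a tested flow:

```
GetFile
  Input Directory  : /data/incoming    # directory to spool from
  Keep Source File : false             # remove files once picked up

MergeContent
  Merge Strategy            : Bin-Packing Algorithm
  Minimum Number of Entries : 100      # batch this many files per bundle
  Maximum Bin Age           : 5 min    # flush a partial batch after this
```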
						
					

11-08-2015 02:28 PM

That's a start; however, PMML support in Spark is a way off being complete. In particular, there is no support for transformations yet. Spark would be a great platform for this, though it is a very heavy platform to spin up just for simple scoring in a NiFi flow.
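
To make the current state concrete, here is a minimal sketch of what Spark's PMML support does cover today, namely exporting a trained MLlib model; the model choice, data, and output path are illustrative:

```scala
// Minimal sketch of Spark MLlib's current PMML support: export only.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object PmmlExport extends App {
  val sc = new SparkContext(new SparkConf().setAppName("pmml-export"))

  val data = sc.parallelize(Seq(
    Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0),
    Vectors.dense(9.0, 8.0), Vectors.dense(8.0, 9.0)
  ))

  // Train a model type that implements PMMLExportable
  val model = KMeans.train(data, k = 2, maxIterations = 20)

  // Export works for a handful of model types; there is no PMML import,
  // evaluation, or transformation support on the Spark side.
  model.toPMML("/tmp/kmeans.pmml")
}
```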
						
					

11-08-2015 01:07 PM | 2 Kudos

JPMML is a great library for evaluating PMML models, including things like feature transformation and a good range of model support. However, its license is AGPL3, which makes it hard to include in Apache projects. I'm looking to evaluate PMML models as part of a custom NiFi processor, so I need an evaluator library with an Apache license.
						
					
Labels: Apache NiFi


11-04-2015 07:33 PM | 3 Kudos

The other thing to note is that to use Spark Packages, you also need z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven") in the dep paragraph. There is currently a bug in the Zeppelin loader, which we are working on, that prevents transitive dependencies from being brought in here, so with spark-csv, for example, you may have to add the opencsv dependency explicitly as well.
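
Putting that together, the dep paragraph would look something like this; the spark-csv and opencsv versions are illustrative:

```
%dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
// workaround for the loader bug: load the transitive opencsv
// dependency explicitly as well
z.load("com.databricks:spark-csv_2.10:1.2.0")
z.load("net.sf.opencsv:opencsv:2.3")
```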
						
					

11-04-2015 11:15 AM | 4 Kudos

There are a range of common NLP systems that work well on the platform. OpenNLP is a native Java library which integrates well with, for example, MapReduce, and NLTK, being a Python system, works well with PySpark. There are also native Spark components connected to NLP tasks: Latent Dirichlet Allocation for topic detection is one example (see the sketch below). The NLTK components also work well with Hive for things like tokenisation and part-of-speech tagging.

Stanford CoreNLP also provides a good toolkit of NLP functions, and there is a spark-package to integrate it with SparkML pipelines.

Solr provides a number of useful tools that apply in the NLP space as well, such as stemming and synonym handling as part of its indexing and querying, so it offers some building blocks for simple NLP analysis. There are also a number of commercial and partner solutions which handle NLP tasks.

We are also looking to build tools for entity resolution on Spark, which will add to this.
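
As a concrete example of the native Spark angle, a minimal topic-detection sketch with MLlib's LDA; the toy corpus and parameters are illustrative:

```scala
// Minimal sketch of topic detection with MLlib's LDA.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

object LdaExample extends App {
  val sc = new SparkContext(new SparkConf().setAppName("lda-example"))

  // Each document is (id, term-frequency vector over a fixed vocabulary)
  val corpus = sc.parallelize(Seq(
    (0L, Vectors.dense(1.0, 2.0, 0.0, 5.0)),
    (1L, Vectors.dense(0.0, 1.0, 4.0, 0.0)),
    (2L, Vectors.dense(3.0, 0.0, 1.0, 2.0))
  ))

  val model = new LDA().setK(2).setMaxIterations(20).run(corpus)

  // topicsMatrix is vocabSize x k: per-topic term weights
  println(model.topicsMatrix)
}
```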
						
					

10-24-2015 11:03 PM

Tried this on an HDP 2.3.2 cluster (a brand-new build) with Spark 1.4.1, and had the same problem with Zeppelin and Magellan. It seems like Zeppelin is doing something to the context.
						
					

10-23-2015 03:39 PM

That will work for HDP 2.2, but it is not the way to do it on 2.3. On 2.3 we have a proper RPM-based install. This stack has not yet been updated to reflect the new deployment mechanism.
						
					

10-20-2015 11:48 AM

Mirror Maker works by consuming from a source Kafka cluster and producing into a destination Kafka cluster. If I am producing messages with compression enabled into the source Kafka, is there a way for Mirror Maker to consume them without decompression, i.e. just grab the raw compressed bits and pass those over the wire to the target Kafka? Or will the consumer force decompression, with recompression at the other end (meaning uncompressed data goes over the wire)?
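
For context, this is roughly where the compression settings enter the picture; the file names and values here are illustrative:

```
# MirrorMaker consumes with consumer.properties (source cluster)
# and produces with producer.properties (target cluster)
kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist '.*'

# producer.properties: compression applied on the way out
#   compression.codec=gzip   (old Scala producer)
#   compression.type=gzip    (new Java producer)
```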
						
					
Labels: Apache Kafka


10-08-2015 06:58 PM | 2 Kudos

According to https://azure.microsoft.com/en-gb/documentation/articles/virtual-machines-a8-a9-a10-a11-specs/ the A8-A9 instances support a 32 Gbit/s RDMA backplane for node-to-node communication on SLES. Is the SLES image the preferred / only image which supports this networking layer, or are there Red Hat flavour alternatives? Would access to the 32 Gbit/s backplane through a multi-homed topology make a significant difference to intra-cluster communication, versus the relatively small CPU scale of the A8-A9? Simon
						
					