Member since: 03-24-2016

- 184 Posts
- 239 Kudos Received
- 39 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 3239 | 10-21-2017 08:24 PM |
|  | 1962 | 09-24-2017 04:06 AM |
|  | 6477 | 05-15-2017 08:44 PM |
|  | 2205 | 01-25-2017 09:20 PM |
|  | 7277 | 01-22-2017 11:51 PM |

03-28-2016 06:16 AM

As the answer suggests, just use the built-in JMS spout. The Storm project page has details on how to set it up....
						
					
03-28-2016 06:06 AM

It's all about getting the Maven dependencies right; note the provided scope on storm-core (the cluster supplies it at runtime) and the logging exclusions:

<dependencies>
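	<!-- slf4j-log4j12 and log4j are excluded from each dependency below so the
	     topology jar does not drag in a logging binding that clashes with the
	     logging Storm already provides on the worker classpath -->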
    <dependency>
		<groupId>org.apache.storm</groupId>
		<artifactId>storm-core</artifactId>
		<version>0.10.0</version>
		<scope>provided</scope>
		<exclusions>
			<exclusion> 
    			<groupId>org.slf4j</groupId>
    			<artifactId>slf4j-log4j12</artifactId>
  			</exclusion>
  			<exclusion> 
    			<groupId>log4j</groupId>
    			<artifactId>log4j</artifactId>
  			</exclusion>
  		</exclusions>
    </dependency>
    <dependency>
		<groupId>org.apache.zookeeper</groupId>
		<artifactId>zookeeper</artifactId>
		<version>3.4.6</version>
		<exclusions>
			<exclusion> 
    			<groupId>org.slf4j</groupId>
    			<artifactId>slf4j-log4j12</artifactId>
  			</exclusion>
  			<exclusion> 
    			<groupId>log4j</groupId>
    			<artifactId>log4j</artifactId>
  			</exclusion>
  		</exclusions>
	</dependency>
	<dependency>
		<groupId>org.apache.storm</groupId>
		<artifactId>storm-hdfs</artifactId>
		<version>0.10.0</version>
		<exclusions>
			<exclusion> 
    			<groupId>org.slf4j</groupId>
    			<artifactId>slf4j-log4j12</artifactId>
  			</exclusion>
  			<exclusion> 
    			<groupId>log4j</groupId>
    			<artifactId>log4j</artifactId>
  			</exclusion>
  		</exclusions>
	</dependency>
	<dependency>
		<groupId>org.apache.storm</groupId>
		<artifactId>storm-kafka</artifactId>
		<version>0.10.0</version>
		<exclusions>
			<exclusion> 
    			<groupId>org.slf4j</groupId>
    			<artifactId>slf4j-log4j12</artifactId>
  			</exclusion>
  			<exclusion> 
    			<groupId>log4j</groupId>
    			<artifactId>log4j</artifactId>
  			</exclusion>
  		</exclusions>
	</dependency>
	<dependency>
	<groupId>org.apache.kafka</groupId>
	<artifactId>kafka_2.10</artifactId>
	<version>0.8.2.2</version>
	<exclusions>
                <exclusion>
                    <groupId>org.apache.zookeeper</groupId>
                    <artifactId>zookeeper</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
	</dependency>
	<dependency>
		<groupId>org.apache.storm</groupId>
		<artifactId>storm-hbase</artifactId>
		<version>0.10.0</version>
		<exclusions>
			<exclusion> 
    			<groupId>org.slf4j</groupId>
    			<artifactId>slf4j-log4j12</artifactId>
  			</exclusion>
  			<exclusion> 
    			<groupId>log4j</groupId>
    			<artifactId>log4j</artifactId>
  			</exclusion>
  		</exclusions>
	</dependency>
    <dependency>
		<groupId>org.apache.hbase</groupId>
		<artifactId>hbase-client</artifactId>
		<version>1.1.1</version>
		<exclusions>
			<exclusion> 
    			<groupId>org.slf4j</groupId>
    			<artifactId>slf4j-log4j12</artifactId>
  			</exclusion>
  			<exclusion> 
    			<groupId>log4j</groupId>
    			<artifactId>log4j</artifactId>
  			</exclusion>
  		</exclusions>	
	</dependency>
</dependencies>
 
						
					
03-28-2016 05:39 AM (2 Kudos)

Has anyone tried to create a local SparkContext within a Storm bolt to load a saved Spark model, instead of creating the model from exported weights or PMML? It seems that there is a Log4j dependency conflict between Storm and Spark.
						
					
Labels: Apache Spark

03-28-2016 04:03 AM (1 Kudo)

I don't think Spark SQL supports DML on a text-file data source just yet. You need to create a DataFrame from the source file, register a temp table from the DataFrame, select with a predicate to get the person whose age you want to update, apply a function that updates the age field, and then re-register the table with the new DataFrame. Here is the code:

import org.apache.spark.sql._

// One person per line in the source file, in the form "name,age"
case class Person(name: String, age: Int)

// Build a DataFrame from the raw text file and register it as a temp table
var personRDD = sc.textFile("/user/spark/people.txt")
var personDF = personRDD.map(x => x.split(",")).map(x => Person(x(0), x(1).trim.toInt)).toDF()
personDF.registerTempTable("people")

// Select the rows to update
var filteredDF = sqlContext.sql("SELECT name, age FROM people WHERE age = 19")
filteredDF.show()

// Apply the update to the age field and re-register the table under the same name
var agedPerson = filteredDF.map(x => Person(x.getAs[String]("name"), x.getAs[Int]("age") + 2)).toDF()
agedPerson.registerTempTable("people")
var agedPeopleDF = sqlContext.sql("SELECT * FROM people")
agedPeopleDF.show()

This assumes that you have the SparkContext (sc) and SQLContext (sqlContext) already created, one person per line in a file on HDFS at /user/spark/people.txt, and that you are running the shell as a Spark client or in Zeppelin.
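
The snippet above only re-registers the temp table; if you also want to write the updated rows back out to HDFS as text, something like the following should work. The output path is hypothetical, and note that you cannot overwrite the input file while it is still being read:

// Save the updated DataFrame as comma-separated text in a new HDFS directory
agedPeopleDF.rdd
  .map(row => row.getAs[String]("name") + "," + row.getAs[Int]("age"))
  .saveAsTextFile("/user/spark/people_updated")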
						
					
03-28-2016 03:53 AM (1 Kudo)

This answer assumes that you have the SparkContext (sc) and SQLContext (sqlContext) already created as part of the shell, and that your file has one person per line and is located on HDFS at /user/spark/people.txt.

import org.apache.spark.sql._
case class Person(name: String, age: Int)
var personRDD = sc.textFile("/user/spark/people.txt")
var personDF = personRDD.map(x => x.split(",")).map(x => Person(x(0), x(1).trim.toInt)).toDF()
personDF.registerTempTable("people")
var filteredDF = sqlContext.sql("SELECT name, age FROM people WHERE age = 19")
filteredDF.show()
var agedPerson = filteredDF.map(x => Person(x.getAs[String]("name"), x.getAs[Int]("age") + 2)).toDF()
agedPerson.registerTempTable("people")
var agedPeopleDF = sqlContext.sql("SELECT * FROM people")
agedPeopleDF.show()

If you are running your driver as a Spark client or through Zeppelin, you can bring up the YARN ResourceManager UI and see the job being distributed across the cluster.
						
					
03-28-2016 12:15 AM

I like to think of the metastore as the definition of the structure you want to impose on the unstructured data that lives on HDFS: not only column names and types, but also where rows and columns start and end in the format being queried.
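
For example, the following records in the metastore that each line of a comma-delimited file under a directory is a row with a name and an age, without moving or rewriting the data itself. This is only a sketch in the same Spark-shell style as the other answers here; it assumes sqlContext is a HiveContext, and the table name and /user/spark/people location are hypothetical:

// Impose structure on files already sitting in HDFS; the data itself is untouched
sqlContext.sql("""CREATE EXTERNAL TABLE IF NOT EXISTS people (name STRING, age INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  LOCATION '/user/spark/people'""")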
						
					
03-28-2016 12:11 AM (1 Kudo)

There is now a processor for Couchbase, and RabbitMQ is an AMQP messaging platform for which there is also a processor.
						
					
03-28-2016 12:05 AM

I would use ExtractText with a regex and then the AttributesToJSON processor to create a JSON-formatted flow file.
						
					
03-28-2016 12:00 AM

You can run a NiFi instance locally in a small JVM to gather the log file data and send it to a larger cluster for processing. I would use the ListenSyslog processor rather than UDP and point Log4j there.
						
					
03-27-2016 11:56 PM

Configure the ConsumeJMS processor, configure the MergeContent processor to create suitable file sizes for HDFS, then use the PutHDFS processor to write to HDFS directly.
						
					