Member since 09-21-2015
133 Posts
130 Kudos Received
24 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7086 | 12-17-2016 09:21 PM |
| | 4505 | 11-01-2016 02:28 PM |
| | 2225 | 09-23-2016 09:50 PM |
| | 3433 | 09-21-2016 03:08 AM |
| | 1786 | 09-19-2016 06:41 PM |
		
06-13-2016 01:52 PM

Do I just use '/' as the directory separator?
		
06-13-2016 09:05 AM

Labels: Apache NiFi
		
04-25-2016 03:54 PM

Great solution to schema inference, @Simon Elliston Ball, but I still have my question about launching Spark and/or other YARN jobs from NiFi.
		
04-21-2016 03:14 AM
5 Kudos

@Sunile Manjee, have a look at the Phoenix Query Server. It's a beta feature in HDP, but it is installable via Ambari. When you pick which nodes should be clients, DataNodes, NodeManagers, etc., you can check the Phoenix Query Server box.
		
04-20-2016 05:17 PM
1 Kudo

Labels: Apache Spark
		
04-14-2016 11:40 PM

I see that MergeContent can merge multiple flowfiles into one with specified size or flowfile count semantics. Is there a processor that does this based on elapsed time instead?

Labels: Apache NiFi
		
04-14-2016 11:03 PM

This looks interesting, but it would require an already-running Spark application and the ability to communicate with the correct Hadoop worker node, which doesn't seem straightforward. Your idea did make me think about the YARN ResourceManager's REST API, so have an upvote. I still want to see if there's a more straightforward suggestion, so I will leave the question open.
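
For reference, the YARN ResourceManager REST API mentioned above submits an application in two steps: a POST to `/ws/v1/cluster/apps/new-application` to obtain an application ID, then a POST to `/ws/v1/cluster/apps` with a JSON description of the application. The sketch below only builds those two requests with the Python standard library and does not send them. The host name, application name, and launch command are placeholders, and a real Spark submission would also need local resources and environment settings that are omitted here.

```python
import json
from urllib.request import Request

# Placeholder ResourceManager address (assumption, not from the thread).
RM_URL = "http://resourcemanager.example.com:8088"


def new_application_request(rm_url=RM_URL):
    """Step 1: ask the ResourceManager for a fresh application ID."""
    return Request(rm_url + "/ws/v1/cluster/apps/new-application", method="POST")


def submit_application_request(app_id, command, rm_url=RM_URL):
    """Step 2: submit an application description for the ID from step 1."""
    body = {
        "application-id": app_id,
        "application-name": "nifi-triggered-job",  # placeholder name
        "application-type": "YARN",
        "am-container-spec": {"commands": {"command": command}},
        "resource": {"memory": 1024, "vCores": 1},
    }
    return Request(
        rm_url + "/ws/v1/cluster/apps",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build (but do not send) a submission request with placeholder values.
req = submit_application_request("application_0000000000000_0001", "echo placeholder")
print(req.get_method(), req.full_url)
```

From NiFi, requests like these could presumably be issued with an HTTP-capable processor such as InvokeHTTP, which would avoid needing Hadoop client binaries on the NiFi host; that wiring is an assumption and is not shown here.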
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-14-2016 10:48 PM
2 Kudos

I'm aware of ExecuteProcess, which could invoke spark-submit, but I'm not running NiFi on an HDP node. I receive lots of arbitrary CSV and JSON files that I don't have pre-existing tables for. Instead of trying to script DDL creation inside NiFi, it would be nice to invoke a Spark job that infers schema and creates tables from data already loaded to HDFS.

Labels: Apache Hive, Apache NiFi, Apache Spark
		
04-13-2016 04:02 AM
2 Kudos

@Divya Gehlot, as @Sunile Manjee noted, HBase is an indexed lookup system that can also perform scans. This means you need to think about your data access and query patterns before you can create an optimal table design.

In general, you want to design your rowkeys around your access patterns. Ensure that the highest-order rowkey bits are always known to your application at HBase read time; otherwise your access will be a full scan instead of a range scan.

Users of the raw HBase API often find themselves performing logic in their application code instead of server-side within HBase's RegionServer processes. A simple but powerful way to avoid both writing large amounts of client application code and pulling significant chunks of data back is to use Apache Phoenix on top of HBase. It makes it easy to perform a more selective HBase query via SQL, which also:

1. Lends itself more naturally to thinking about how data is laid out in your tables
2. Lets you define secondary indices on the data your queries access, regardless of whether your application knows a specific rowkey (or range) it needs to access
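
To illustrate the rowkey point, here is a minimal pure-Python sketch (it is not HBase client code). It models a table as a lexicographically sorted list of rowkeys, which is how HBase orders rows, and uses a made-up `customer|date` key layout. Knowing the leading part of the key lets the lookup seek to a contiguous range; knowing only a trailing part forces a check of every row.

```python
from bisect import bisect_left

# Hypothetical composite rowkeys of the form "<customer_id>|<yyyymmdd>",
# kept sorted lexicographically, as HBase keeps rows sorted by key.
ROWKEYS = sorted([
    "cust001|20160401", "cust001|20160402",
    "cust002|20160401", "cust002|20160403",
    "cust003|20160402",
])


def range_scan(prefix):
    """Leading key part known: seek to the first match, stop at the first non-match."""
    start = bisect_left(ROWKEYS, prefix)
    rows, touched = [], 0
    for key in ROWKEYS[start:]:
        touched += 1
        if not key.startswith(prefix):
            break
        rows.append(key)
    return rows, touched


def full_scan(date):
    """Only the trailing key part known: every row has to be examined."""
    rows = [key for key in ROWKEYS if key.endswith("|" + date)]
    return rows, len(ROWKEYS)


print(range_scan("cust002|"))  # (['cust002|20160401', 'cust002|20160403'], 3)
print(full_scan("20160401"))   # (['cust001|20160401', 'cust002|20160401'], 5)
```

The second number in each result is the count of rows examined. If lookups by date alone were a common access pattern, that would argue for a different key order or for a secondary index of the kind Phoenix provides.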
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-11-2016 03:33 PM
1 Kudo

Hi @nejm hadjmbarek, can you post the error you're seeing? From the above, it looks like you've forgotten to include the ZooKeeper znode for HBase: "/hbase-unsecure". Try the following connection string instead: "jdbc:phoenix:195.154.55.93:2181:/hbase-unsecure".

If you need a simple Java application (with a Maven pom file) that embeds and uses the Phoenix JDBC driver, take a look here.
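
The connection string above follows the pattern `jdbc:phoenix:<ZooKeeper quorum>:<port>:<znode parent>`. As a minimal sketch, the hypothetical helper `phoenix_jdbc_url` below (it is not part of Phoenix) assembles such a string so that the znode is harder to forget:

```python
def phoenix_jdbc_url(zk_quorum, zk_port=2181, znode_parent=None):
    """Assemble a Phoenix JDBC URL: jdbc:phoenix:<quorum>:<port>[:<znode>].

    Hypothetical helper for illustration only; it is not part of Phoenix.
    """
    parts = ["jdbc:phoenix", zk_quorum, str(zk_port)]
    if znode_parent:
        # The znode parent is an absolute ZooKeeper path, e.g. /hbase-unsecure.
        if not znode_parent.startswith("/"):
            znode_parent = "/" + znode_parent
        parts.append(znode_parent)
    return ":".join(parts)


url = phoenix_jdbc_url("195.154.55.93", 2181, "/hbase-unsecure")
print(url)  # jdbc:phoenix:195.154.55.93:2181:/hbase-unsecure
```

The resulting string is what would be passed to the JDBC driver in place of a URL that stops at the port.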
				
			
			
			
			
			
			
			
			
			
		 
        













