Member since 06-20-2016

488 Posts | 433 Kudos Received | 118 Solutions

        My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 3601 | 08-25-2017 03:09 PM |
| | 2501 | 08-22-2017 06:52 PM |
| | 4191 | 08-09-2017 01:10 PM |
| | 8969 | 08-04-2017 02:34 PM |
| | 8946 | 08-01-2017 11:35 AM |

08-04-2017 02:34 PM (1 Kudo)

Setting this property creates a tmp directory in BOTH local and HDFS. It does so in HDFS because other properties use hadoop.tmp.dir as a base path to store data in HDFS. Example: dfs.name.dir=${hadoop.tmp.dir}/dfs/name creates this path in HDFS. There is no way to have this property NOT create a path locally. See these links for a good discussion:

https://stackoverflow.com/questions/2354525/what-should-be-hadoop-tmp-dir
https://stackoverflow.com/questions/40169610/where-exactly-should-hadoop-tmp-dir-be-set-core-site-xml-or-hdfs-site-xml
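
A minimal sketch of how that derivation looks in the config files, assuming stock property names; the local path value is an illustrative default, not something taken from this post:

```xml
<!-- core-site.xml: base scratch directory (also created on the local filesystem) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>

<!-- hdfs-site.xml: derives its storage path from hadoop.tmp.dir -->
<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.tmp.dir}/dfs/name</value>
</property>
```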
						
					
    
	
		
		

08-01-2017 11:35 AM

Using your sed approach, this should replace all NULL with an empty string:

```sh
sed 's/[\t]/,/g; s/NULL//g' > myfile.csv
```

If there is a chance that NULL is a substring of a value, you will need to do the following, where ^ is the beginning of line, $ is the end of line, and , is your field delimiter:

```sh
sed 's/[\t]/,/g; s/^NULL,/,/g; s/,NULL,/,,/g; s/,NULL$/,/g' > myfile.csv
```

Note that if your result set is large, it is probably best to use Pig on HDFS rather than sed, to leverage the parallel processing of Hadoop and save yourself a lot of time.

Note also: to have the empty string treated as NULL in the actual Hive table, use the following in the DDL:

```sql
TBLPROPERTIES('serialization.null.format'='');
```
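
A quick illustrative run of the substring-safe variant, assuming GNU sed (which understands \t in a bracket expression) and a made-up tab-delimited record:

```sh
# Hypothetical record: id, name, city, with NULL in the last two fields
printf '1\tNULL\tNULL\n' | sed 's/[\t]/,/g; s/^NULL,/,/g; s/,NULL,/,,/g; s/,NULL$/,/g'
# prints: 1,,
```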
						
					
    
	
		
		

07-28-2017 09:31 PM

Just want to be sure you are using port 10500. LLAP and non-LLAP each have their own HiveServer2 (ports 10500 and 10000, respectively).
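
For reference, a hypothetical beeline connection against the LLAP HiveServer2; the host name and database are placeholders:

```sh
# HiveServer2 Interactive (LLAP) listens on 10500; plain HiveServer2 on 10000
beeline -u 'jdbc:hive2://llap-host.example.com:10500/default'
```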
						
					
    
	
		
		

07-28-2017 08:42 PM

@Darko Milovanovic You can update it once (in version control), but unfortunately it has to be re-deployed to each separate instance in your flows. This is because each component is instantiated separately with a different global id, as described in section 5. Do note that in HDF 3.0, after you do this, NiFi keeps versions of each processor deployed, so you can use one version of a processor in one flow and another version in a different flow (all versions are available to choose from). There is active work on making reusable components shared (instantiated once), but that has not been released.
						
					
    
	
		
		

07-28-2017 08:15 PM (3 Kudos)

Both are similar in their awesome drag-and-drop UI for processing data in motion. However, they differ fundamentally in purpose and underlying technology.

**Differences**

Purpose: NiFi is meant for data flow management, while Streaming Analytics Manager (SAM) is meant for advanced (complex) real-time analytics. In general, for NiFi think acquiring, transforming, and routing data to target destinations; for SAM think complex analytics on data as it flows across the wire. Here is a more detailed comparison between flow management (NiFi) and stream analytics (SAM):

| | Flow Management (NiFi) | Stream Analytics (SAM) |
|---|---|---|
| data velocity | batch, microbatch, or streaming (from diverse sources) | streaming (from diverse sources) |
| data size (per content) | small (KB) to large (GB) | small (KB, MB) per message in stream |
| data manipulation | rich: parse, filter, join, transform, enrich, reformat | minimal changes to data |
| data flow management | powerful: queue prioritization, back pressure, route/merge, persist to target | minimal: mostly route/merge and persist to target |
| real-time analytics | basic | powerful |

So NiFi is great for managing the movement of data from diverse sources (small sensors, FTP locations, relational databases, REST APIs in the cloud, and so on) to similar targets, while modifying and making decisions on the data in between. SAM is great at watching real-time streams of data and doing advanced analytics (dashboarding/visualizations, alerting, predictions, etc.) as the data flows by.

Technology: NiFi is built around processors and connections with repositories underneath. SAM is built on top of Storm and Kafka (and Druid).

**Shared**

What do they have in common? Both have easy UI development that hides complexity underneath. Both are components of the Hortonworks Data Flow (HDF) distribution. Both share Kafka (see below). Both are managed by Ambari (admin and monitoring) and Ranger (authorization and security). Both can use the same Schema Registry to work with the data structure of content.

Do they connect? A very common pattern is this: stream data using NiFi (possibly filtering, transforming, and enriching it) and pass it to a Kafka queue to make it durable (persistent until consumed). SAM pulls from the queue (subscribes to a topic) and does advanced analytics from there (dashboarding/visualizations, alerting, predictions, etc.). SAM then pushes to Hadoop (HBase or Hive) to persist the data for further historical analysis and exploration (data science, business intelligence, etc.). The tutorial mentioned by @Wynner is an excellent example of this pattern and of the separate strengths of NiFi and SAM.
						
					
    
	
		
		

07-28-2017 12:55 PM

One point: if you specify a delimiter that is not the true delimiter in the file ... no error will be thrown. Rather, the full record (including its true delimiters) will be treated as a single field. In this case, the true delimiters are just characters in a string.
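
A hypothetical illustration in Pig (the file, the delimiter mix-up, and the schema are all invented for the example):

```pig
-- data.txt holds pipe-delimited records such as:  1|alice|nyc
-- Loading with the wrong delimiter (',') raises no error; the whole
-- line lands in the first field and the remaining fields are null.
wrong = LOAD '/data.txt' USING PigStorage(',') AS (aa:chararray, bb:chararray, cc:chararray);
DUMP wrong;   -- (1|alice|nyc,,)
```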
						
					
    
	
		
		

07-28-2017 12:34 PM (2 Kudos)

@Aditya Jadhav Small mistake: you need uppercase PigStorage('|').

```pig
lp = load '/employee.txt' using PigStorage('|') as (aa,bb,cc,dd,ee);
```

The error shows that it is looking for a Java function called pigStorage and cannot find it. In addition to Pig's native functions (to which PigStorage belongs), functions can be found in referenced libraries (e.g., third-party jars or ones you build yourself as User Defined Functions).
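
If you do need a function from an external library, the usual pattern is to register the jar first; the jar path and UDF name below are hypothetical:

```pig
-- Make the jar's classes visible to Pig, then call the UDF by its full name
REGISTER /path/to/myudfs.jar;
lp = LOAD '/employee.txt' USING PigStorage('|') AS (aa,bb,cc,dd,ee);
shouted = FOREACH lp GENERATE com.example.pig.Upper(aa);
```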
						
					
    
	
		
		

07-14-2017 04:09 PM

Thank you @ccasano. It was due to this error-handling design and InvokeHTTP not being able to establish a connection.
						
					
    
	
		
		

07-14-2017 01:55 PM

One of the Hive Interactive Query (LLAP) configs is "Hold Containers to Reduce Latency", and it is set to false by default. What specifically does this config control? And since the goal of LLAP is fast response times (down to subsecond), why is the default not true, given that the config name suggests turning it on would reduce latency?
						
					

Labels: Apache Hive
    
	
		
		

07-10-2017 06:04 PM

I recall now the port differences ... the difference is between HiveServer2 (10000) and HiveServer2 Interactive (10500), and has nothing to do with JDBC.
						
					