Member since 09-23-2015

88 Posts
109 Kudos Received
1 Solution
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 8501 | 08-24-2016 09:13 PM |

12-07-2016 03:38 PM

Issues:

1) In your table definition ("create table ...") you do not specify the LOCATION attribute of your table, so Hive defaults to looking for the files under the default warehouse directory path, while the location in your screenshot is under /user/admin/. You can run "show create table ..." to see where Hive thinks the table's files are located. By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /apps/hive/warehouse/databasename.db/tablename/. The default location can be overridden with the LOCATION clause during table creation.

2) You are specifying the format using hive.default.fileformat. I would avoid relying on this property; instead simply use "STORED AS TEXTFILE" or "STORED AS ORC" in your table definition.

Please change the above, retest, and let us know how that works.
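As a minimal sketch of both fixes (the table name, columns, HDFS path and HiveServer2 JDBC URL below are placeholders, not values from your screenshot), submitted through beeline:

```python
# Hypothetical sketch only: table name, columns, HDFS path and JDBC URL are placeholders.
import subprocess

JDBC_URL = "jdbc:hive2://hiveserver2.example.com:10000/default"

# Explicit STORED AS instead of hive.default.fileformat, and an explicit LOCATION
# instead of the warehouse default.
DDL = """
CREATE TABLE my_table (
  col1 STRING,
  col2 INT
)
STORED AS TEXTFILE
LOCATION '/user/admin/my_table'
"""

# Confirms where Hive thinks the table's files live.
CHECK = "SHOW CREATE TABLE my_table"

for stmt in (DDL, CHECK):
    subprocess.run(["beeline", "-u", JDBC_URL, "-e", stmt], check=True)
```

SHOW CREATE TABLE should now report a LOCATION under /user/admin/ rather than the warehouse default.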
						
					
11-28-2016 07:57 PM (1 Kudo)

My guess is that your local "Cluster A" config values are superseding your use of the "-D" option to override the defaultFS parameter, i.e. the local Cluster A values may be taking higher priority. I would have expected your second command with "hadoop fs -ls" to display the remote cluster's file listing; perhaps there was a typo or some other reason why the option is not being picked up?

Could you alternatively use the WebHDFS REST API (from bash or Python) to list directories? https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#LISTSTATUS
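A rough sketch of that WebHDFS alternative (the NameNode host/port, path and user name below are placeholders for the remote cluster's values):

```python
# Hypothetical sketch only: NameNode host/port, path and user are placeholders.
import requests

NAMENODE = "http://remote-nn.example.com:50070"  # HTTP address of the remote cluster's NameNode
PATH = "/user/admin"

# WebHDFS LISTSTATUS against the remote NameNode.
resp = requests.get(
    f"{NAMENODE}/webhdfs/v1{PATH}",
    params={"op": "LISTSTATUS", "user.name": "admin"},
)
resp.raise_for_status()

# One line per entry: FILE/DIRECTORY and the entry name.
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["type"], entry["pathSuffix"])
```

Because this goes straight to the remote NameNode over HTTP, it bypasses whatever your local Cluster A client configuration resolves for defaultFS.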
						
					
11-28-2016 07:46 PM

How did you write the ORC file to this location (Pig, Spark, NiFi, or something else)? Can you share the schema of the table, the contents of the folder where the ORC files are written, and any details/code on how the file was ingested?
						
					
11-23-2016 06:55 PM

							 Well done Ned! 
						
					
08-25-2016 08:23 AM

							 Appreciate the correction 😉 
						
					
08-24-2016 09:13 PM (1 Kudo)

My instinct is that the default Hive SerDe would be used and would not automatically skip over the col2 value as you've shown in your example. A few options for you:

1) Ingest the raw CSV data into a 3-column temp Hive table, then perform an "INSERT ... SELECT" from the temp table to push only the columns you need into your destination Hive table (see the sketch below).

2) Write a brief Pig script to parse the CSV data and push it to your destination Hive table.

3) Write your own Hive SerDe: https://cwiki.apache.org/confluence/display/Hive/SerDe#SerDe-Built-inandCustomSerDes

Cheers!

Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe
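A minimal sketch of option 1, assuming placeholder table names, columns, HDFS path and JDBC URL (yours will differ), submitted through beeline:

```python
# Hypothetical sketch of option 1: 3-column temp table, then INSERT ... SELECT
# only the needed columns into the destination table. All names/paths are placeholders.
import subprocess

JDBC_URL = "jdbc:hive2://hiveserver2.example.com:10000/default"

STATEMENTS = [
    # Temp table that matches the raw CSV layout exactly (all three columns).
    """
    CREATE TABLE temp_hive_table (
      col1 STRING,
      col2 STRING,
      col3 STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    """,
    # Load the raw CSV file that is already sitting in HDFS.
    "LOAD DATA INPATH '/user/admin/raw_data.csv' INTO TABLE temp_hive_table",
    # Push only col1 and col3 into the destination table, skipping col2.
    "INSERT INTO TABLE destination_table SELECT col1, col3 FROM temp_hive_table",
]

for stmt in STATEMENTS:
    subprocess.run(["beeline", "-u", JDBC_URL, "-e", stmt], check=True)
```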
						
					
08-24-2016 03:34 PM

To your questions:

- "Will the processing be distributed?" - Yes, the incoming flow of data will be evenly distributed across all available NiFi instances in the NiFi cluster; the NCM (NiFi Cluster Manager) acts as the load balancer.
- "Distribute the fetching of the source file" - could you elaborate on what you mean by this? In your existing example the standalone instance is the only one that has access to its local filesystem. How would you prefer to distribute this?
						
					
05-25-2016 10:40 AM

I haven't seen a single document that covers sanity-checking the entire cluster; this is often performed by the PS team during customer engagements. Side note: the most useful individual component test I use to smoke-test a cluster is Hive-TestBench.
						
					
04-07-2016 08:30 PM (2 Kudos)

Does Knox's Hive service configuration support the overloaded URLs used in the HiveServer2 + ZooKeeper HA approach? For example, would this be a supportable Knox configuration?

```xml
<service>
  <role>HIVE</role>
  <url>http://zk1.customer.com:2181,zk2.customer.com:2181,zk2.customer.com:2181/cliservice</url>
</service>
```

I'm not sure whether the ZK URL can be passed through Knox like this.
						
					
Labels: Apache Hive, Apache Knox

03-17-2016 10:06 PM (1 Kudo)

Please find below a potential disk & RAID configuration for a typical 12-disk server running NiFi. This design is intended for a simple log ingestion use case where the customer needs to retain very few provenance records but would also like reliability at the storage layer.

- FlowFile repo: 2 drives, RAID 1
- Provenance repo: 2 drives, RAID 1
- Content repo, either:
  - 4 drives (RAID 10) as /cont_repo1 and 4 drives (RAID 10) as /cont_repo2, or
  - 2 drives (RAID 1) each as /cont_repo1, /cont_repo2, /cont_repo3 and /cont_repo4

Thanks @mpayne & @Andrew Grande for the guidance!
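As a rough illustration of how that layout could be wired into NiFi's repository settings: the content-repo mount names come from the list above, the /flowfile_repo and /prov_repo mount names are assumed, and the property keys are the standard nifi.properties repository directories.

```python
# Hypothetical sketch: map the RAID-backed mount points above to nifi.properties
# repository settings. /flowfile_repo and /prov_repo are assumed mount names;
# multiple content repositories are declared with one property per directory suffix.
nifi_repo_properties = {
    "nifi.flowfile.repository.directory": "/flowfile_repo",           # 2 drives, RAID 1
    "nifi.provenance.repository.directory.default": "/prov_repo",     # 2 drives, RAID 1
    "nifi.content.repository.directory.cont_repo1": "/cont_repo1",    # RAID 10 (or RAID 1)
    "nifi.content.repository.directory.cont_repo2": "/cont_repo2",    # RAID 10 (or RAID 1)
}

# Print the lines as they would appear in nifi.properties.
for key, mount_point in nifi_repo_properties.items():
    print(f"{key}={mount_point}")
```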
						
					