Member since 06-17-2015

Posts: 61
Kudos Received: 20
Solutions: 4

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 2604 | 01-21-2017 06:18 PM |
| | 3097 | 08-19-2016 06:24 AM |
| | 2032 | 06-09-2016 03:23 AM |
| | 3774 | 05-27-2016 08:27 AM |

02-02-2017 02:37 AM

Yes, similar to this: https://community.hortonworks.com/questions/79103/what-is-the-best-way-to-store-small-files-in-hadoo.html#comment-80387

02-01-2017 04:12 PM

Hive is very similar to a database design, so as a first step you can create a Hive table using syntax like this (in its simplest form):

create table table_name (
  id    int,
  name  string
)
partitioned by (dt string);

(In Hive the partition column is declared only in the PARTITIONED BY clause; it is not repeated in the regular column list.) There are many variants you can add to this table creation, such as where it is stored, how it is delimited, etc., but in my opinion keep it simple first and then you can expand your mastery. This link (the one that I always refer to) covers the syntax for DDL operations and the different options in detail: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Once you have this taken care of, you can start inserting data into Hive. The different options available for this are explained in the DML documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

These two links are a good start for getting closer to Hive in general.

Then, specifically for your question on loading XML data: you can either load the whole XML file as a single column and read it with the XPath UDFs at read time, or break each XML tag out into a separate column at write time. I will go through both options in a little detail.

Writing XML data as a single column: you can simply create a table like

CREATE TABLE xmlfiles (id int, xmlfile string);

and put the entire XML document into the string column. At read time you can use the XPath UDFs (user-defined functions that ship with Hive) to extract the data. Details here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+XPathUDF

This approach makes writing data easy, but it may have some performance overhead at read time (as well as limitations on doing aggregates on the result set).

Writing XML data as columnar values into Hive: this approach is a little more drawn out at write time, but easier and more flexible for read operations. Here you first convert your XML data into either Avro or JSON and then use one of the SerDes (serializer/deserializer) to write the data to Hive. This will give you some context: https://community.hortonworks.com/repos/30883/hive-json-serde.html

Hope this makes sense. If you find this answer helpful, please 'Accept' my initial answer above.

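To make the single-column option concrete, here is a minimal sketch using the built-in XPath UDFs; the sample row and the /book/... paths are just assumptions for illustration:

-- one whole XML document per row, as in the first approach above
CREATE TABLE xmlfiles (id int, xmlfile string);

-- sample row (the <book> structure is made up for this example)
INSERT INTO xmlfiles VALUES
  (1, '<book><title>Hive Guide</title><price>30</price></book>');

-- pull individual fields out at read time with the XPath UDFs
SELECT id,
       xpath_string(xmlfile, '/book/title') AS title,
       xpath_int(xmlfile, '/book/price')    AS price
FROM   xmlfiles;

The read-time cost mentioned above shows up here: every query re-parses the XML string, which is why the SerDe route tends to read faster.
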
01-21-2017 06:18 PM

Thanks for confirming. So what I wrote is correct, i.e. changing dfs.blocksize; a restart will happen anyway.

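For reference, the change in question is the dfs.blocksize property in hdfs-site.xml; the 256 MB value below is only an example, and the new size applies to files written after the change (existing blocks keep their size):

<!-- hdfs-site.xml (example value: 268435456 bytes = 256 MB) -->
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value>
</property>
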
10-17-2016 01:40 AM

Hi, my issue was solved by updating SUSE 11 SP4. I installed the updates since the OS was in its initial state, and the error was gone after that.

08-23-2016 03:02 AM

@Scott Shaw Thanks a lot.

08-20-2016 02:20 PM

Refer to the manual installation doc for hdp-select to fix your symlink issues: https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.0/bk_upgrading_Ambari/content/_Run_HDP_Select_mamiu.html

When you have a specific error, open a question; generally you shouldn't get these errors.

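As a rough sketch of what that page has you run on each affected node (the version string below is only a placeholder for whatever is actually installed there):

# list the installed HDP versions and where the component symlinks point
hdp-select versions
hdp-select status

# repoint the /usr/hdp/current symlinks for all components to one version
hdp-select set all 2.3.x.y-nnnn
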
08-19-2016 06:24 AM

2 Kudos

@Ted Yu @emaxwell @Josh Elser Thanks, all, for your confirmation; that's why I asked if the RPM is relocatable 🙂

So the bottom line is that the Hortonworks installation directories cannot be changed: all binary and config files of HDP go under /usr and /etc, since those paths are hardcoded in the RPMs and the RPMs are not relocatable. I will close this thread.

But I believe it should support relocatability from a corporate IT policy point of view, where we often have issues putting files in /usr and /etc. I also suggest that Hortonworks make the RPMs relocatable at build time, to allow installing binary and config files in directories other than /usr and /etc. I understand that HDP consists of other software, but ultimately Hortonworks can customize this bundle to support user-specific needs.

I should open this as an idea. WDYT?

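For anyone who wants to verify this on their own copies of the packages, the RPM header records it (the file name below is just a placeholder):

# "(not relocatable)" in the Relocations field means the install paths
# are fixed and cannot be moved with rpm --prefix / --relocate
rpm -qpi some-hdp-package.rpm | grep -i relocation
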
09-15-2016 06:36 PM

@ripunjay godhani I want to be sure I understand your post. Are you saying that modifying a single Ambari property will relocate logs for all components on a restart? If so, can you share what the name of that property is? The page you linked to does not have a single mention of log location. In a perfect world, I would have left plenty of room under /var for logging, but we have a heavily used cluster with a lot of data and constant crashes from a full /var on many of the machines. I need to move everything to a new location.

08-04-2016 09:47 PM

2 Kudos

Hi @ripunjay godhani, we no longer recommend setting up NameNode HA with NFS. Instead please use the Quorum Journal Manager setup. The Apache HA with QJM documentation is a good start: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

NameNode image files will be stored on two nodes (active and standby NN) in this setup. The latest edit logs will be on the active NameNode and at least two journal nodes (usually all three, unless one journal node has an extended downtime). The NameNodes can optionally be configured to write their edit logs to separate NFS shares if you really want, but it is not necessary.

You don't need RAID 10. HDFS HA with QJM provides good durability and availability with commodity hardware.

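For context, a minimal sketch of the hdfs-site.xml pieces the QJM guide walks you through; the nameservice name, JournalNode hosts and local edits path below are placeholders:

<!-- placeholders: substitute your own nameservice and JournalNode hosts -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/hadoop/hdfs/journal</value>
</property>
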
08-08-2016 02:07 AM

I think there is an HCC article on this very topic, but https://martin.atlassian.net/wiki/x/EoC3Ag is a blog post I wrote back in mid-2015 on this subject as well, in case it helps any. Good luck!