Member since 06-07-2016
923 Posts
322 Kudos Received
115 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4076 | 10-18-2017 10:19 PM |
|  | 4323 | 10-18-2017 09:51 PM |
|  | 14805 | 09-21-2017 01:35 PM |
|  | 1830 | 08-04-2017 02:00 PM |
|  | 2410 | 07-31-2017 03:02 PM |
			
    
	
		
		
06-26-2016 05:18 AM
This is quite a custom requirement: you are converting some rows to columns and other rows to both rows and columns. You'll have to write a lot of the code yourself, but you can take advantage of the pivot functionality in Spark. Check the following link: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html

For a plain transpose you can do something like `sc.parallelize(rdd.collect.toSeq.transpose)`. See the link above for more details.
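As a rough illustration of the pivot approach described in that post, here is a minimal spark-shell sketch; the `sales` data, column names, and aggregation are made up for the example.

```scala
// Minimal sketch, assuming a spark-shell session (Spark 1.6+) where
// `sqlContext` is available; the (product, quarter, revenue) data is
// illustrative only.
import sqlContext.implicits._
import org.apache.spark.sql.functions.sum

val sales = Seq(
  ("widget", "Q1", 100), ("widget", "Q2", 150),
  ("gadget", "Q1", 200), ("gadget", "Q2", 250)
).toDF("product", "quarter", "revenue")

// pivot() turns the distinct `quarter` values into columns,
// leaving one row per product.
val pivoted = sales.groupBy("product").pivot("quarter").agg(sum("revenue"))
pivoted.show()
```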
						
					
06-26-2016 01:54 AM
@Akash Mehta So even the following won't work for you? If not, I think there is currently no other way, given that we have looked at all the other possible options.

```scala
// A DataFrame can be created from a JSON dataset represented by
// an RDD[String] storing one JSON object per string.
val anotherPeopleRDD = sc.parallelize(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val anotherPeople = sqlContext.read.json(anotherPeopleRDD)
```
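If it helps, here is a small follow-up sketch on the same `anotherPeople` DataFrame; the column names follow the sample JSON above, and the inferred schema treats `address` as a struct, so nested fields can be selected with dot notation.

```scala
// Inspect the inferred schema and pull nested JSON fields out with dot notation.
anotherPeople.printSchema()
anotherPeople.select("name", "address.city").show()
```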
						
					
06-25-2016 08:44 PM
@Sri Bandaru Since you are not running in a sandbox, what does `--master yarn` resolve to?
						
					
06-23-2016 11:51 PM
`load` will infer the schema and convert each record to a row. The question is whether it will accept an HTTP URL. Can you try?
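For reference, a minimal sketch of the `load` call being discussed, assuming a spark-shell session where `sqlContext` is available; the path is a placeholder, and whether an HTTP URL works will depend on the filesystems your Hadoop client supports.

```scala
// "/tmp/people.json" is an illustrative path; the schema is inferred
// from the JSON data itself.
val df = sqlContext.read.format("json").load("/tmp/people.json")
df.printSchema()
df.show()
```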
						
					
06-23-2016 10:58 PM
@Akash Mehta Can you do something like this?

```
dataframe = sqlContext.read.format("json").load(your json here)
```
						
					
06-23-2016 06:17 PM
1 Kudo
```sql
ALTER TABLE MAGNETO.SALES_FLAT_ORDER
SET SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde';
```

This assumes you have Hive 0.14 or later.
						
					
06-23-2016 05:47 PM
@Simran Kaur Check your data by doing a `cat` to see what it looks like: are the fields separated by a space, or by something else? Alternatively, you can create the table first, specify in the CREATE TABLE statement what you want the fields to be terminated by, and then do the import using Sqoop.
						
					
06-23-2016 05:13 PM
It is likely an issue with the field delimiter. The default in Hive is ^A, so you should specify what your fields are delimited by using `--fields-terminated-by`. You might want to pass `--lines-terminated-by` as well.
						
					
06-21-2016 04:42 PM
4 Kudos
@Kaliyug Antagonist

Hi,

I would disagree with your assumption that it doesn't make sense to back up petabytes of data. Think about what you would do if there were a fire in a data center and your data were physically destroyed. Even at petabyte scale, it is very important to have a backup and DR strategy.

Snapshots only create backups of data for a point in time. You can mark a directory "snapshottable" and then create snapshots of the data in that directory. This gives you the ability to go back in time and restore the data to that particular point. Please see the following link for more details: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html

Snapshots alone still don't solve the problem you are trying to solve. You need to back up the data using either distcp or tools like Falcon. Please see the following links:

https://community.hortonworks.com/questions/394/what-are-best-practices-for-setting-up-backup-and.html

http://hortonworks.com/apache/falcon/

As for your question number 3, when your DataNodes or your NameNode go down, I don't think your backups help. When a DataNode goes down, Hadoop will take care of recreating the lost copies by re-replicating the data, and someone in operations will likely be working to bring the DataNode back up. Similarly, if your NameNode goes down, your cluster should fail over to the standby NameNode and your operations team should be working to restore the lost NameNode. Backing up metadata doesn't help in this particular case, because with a quorum journal manager between the NameNode and the standby NameNode you already have multiple copies of the metadata (this does not discount the significance of a backup and DR strategy that includes metadata backups). Please check the following links; they will help you understand how this works:

http://hortonworks.com/blog/namenode-high-availability-in-hdp-2-0/

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hadoop-ha/content/ch_HA-NameNode.html (if you are interested in more details)

Thanks,
Imad
						
					
06-15-2016 06:07 PM
@chandramouli muthukumaran No; for HDFS files, storage will depend only on the replication factor. Think about it this way: you start with a fresh Linux install and have different mount points in your system with different capacities. Which mount points would you like to use to store your HDFS data (DataNode) as well as your metadata (NameNode)?
						
					