Posted 11-04-2017 09:05 PM
I have many questions, as I have been fiddling with the Sandbox as a Hadoop newbie. Starting with the more basic one first:

I have seen that from the CLI/shell one can view `/usr/hdp/current/spark2-thriftserver/conf/hive-site.xml` or `/usr/hdp/current/spark2-client/conf/hive-site.xml` and, under the port property, find the listed port (10016) for the Spark Thrift Server. Is this the efficient/preferred way to do this?
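To make the question concrete, here is a rough sketch of what I mean by "looking up the port", done from an R session. This is only an illustration: it assumes the xml2 package and that the file is readable from wherever R runs (e.g. R installed on the sandbox, or a copy pulled off it), and the property name `hive.server2.thrift.port` is my assumption.

```r
# Sketch: read the Thrift port out of hive-site.xml with the xml2 package.
# Assumes the file is readable from this R session; the path is the one
# mentioned above, the property name is assumed.
library(xml2)

conf <- read_xml("/usr/hdp/current/spark2-thriftserver/conf/hive-site.xml")
port_node <- xml_find_first(
  conf,
  "//property[name='hive.server2.thrift.port']/value"
)
xml_text(port_node)   # expected to print "10016" on the Sandbox
```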
Further, I am trying to do this so I can use it for an ODBC Spark SQL connection to a visualization tool, Spotfire.

I have successfully connected to the Hive tables in HiveServer2 from Spotfire on my laptop at port 10000 by downloading the Apache Hive connector; now I am hoping to do the same with the Spark ODBC driver. Any hints or advice?
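To be concrete about what I am attempting on the R side (which I would then mirror in Spotfire), something like the following. It is only a sketch: the driver names, hostname, and auth settings are placeholders that depend on which Hive/Spark ODBC drivers are actually installed on my laptop; the ports are the ones mentioned above.

```r
# Sketch of the two ODBC connections from R, using the DBI + odbc packages.
# Driver names, host and credentials are placeholders.
library(DBI)

# HiveServer2 on 10000 (this is the connection that already works for me)
hive_con <- dbConnect(
  odbc::odbc(),
  Driver = "Hortonworks Hive ODBC Driver",   # placeholder driver name
  Host   = "sandbox-hdp.hortonworks.com",    # placeholder sandbox hostname
  Port   = 10000,
  UID    = "some_user",                      # placeholder credentials
  PWD    = "some_password"
)

# Spark Thrift Server on 10016 (what I am trying to get working)
spark_con <- dbConnect(
  odbc::odbc(),
  Driver = "Hortonworks Spark ODBC Driver",  # placeholder driver name
  Host   = "sandbox-hdp.hortonworks.com",
  Port   = 10016
)

dbGetQuery(spark_con, "SHOW TABLES")
```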
I am a newbie to HDP and just trying to learn to work with data in the Hadoop file system, but frankly I don't know what the reason to prefer one connector over the other is, other than that I'd like to be able to connect with the different methods. (I am an R user and succeeded in getting the Hive tables into R as well with ODBC connectors; anything I can do in R running on my laptop I can also use with Spotfire, which is what I am currently using for analytics.) A discussion/answer on this point would be much appreciated.
Then there are some more challenging things I'd like to do. I understand that I can install R on the HDP Sandbox and carry out computations there (I have seen the SparkR tutorial on predicting airline delays), but if I can connect to the data in HDFS from outside the sandbox, I can start leveraging R's power with the Spotfire client's built-in R engine on data from the Hadoop file system. (Apparently Spotfire Server has many more data access/connectivity options, but I don't have access to Spotfire Server.) With that in mind, some of the things I am trying to get to are:

- With SparkR from an R session running on my Windows laptop, how can I use (CSV) files in HDFS on HDP to construct a SparkDataFrame, either through a Hive table in HiveServer2 or some other way? I can only think of extracting the data I am interested in from HDP and then turning it into a SparkDataFrame to carry out analysis with the SparkR library in a Windows R session. But is there such an option as connecting to a remote Spark cluster in the HDP sandbox?
- And is 'Livy for Spark2 Server' something I should get familiar with first for my purpose of accessing data from outside the sandbox? Here is a reference from the sparklyr package that alludes to this possibility: https://spark.rstudio.com/deployment.html (see the sketch after this list).
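To show what I am imagining for those last two points, here is a rough sketch using sparklyr (rather than SparkR) going through Livy, based only on the deployment page linked above. The Livy URL and port, the Spark version, and the HDFS path are all placeholders that would need to be checked against what the sandbox actually exposes (e.g. in Ambari).

```r
# Sketch: connect from a local (Windows) R session to Spark on the sandbox
# through Livy, then read a CSV that sits in HDFS into a Spark DataFrame.
# Uses sparklyr rather than SparkR; endpoint, version and path are placeholders.
library(sparklyr)
library(dplyr)

sc <- spark_connect(
  master  = "http://sandbox-hdp.hortonworks.com:8999",  # placeholder Livy endpoint
  method  = "livy",
  version = "2.1"                                        # placeholder Spark version
)

# Register the HDFS file as a Spark DataFrame without pulling it to the laptop
flights <- spark_read_csv(
  sc,
  name   = "flights",
  path   = "hdfs:///tmp/flights.csv",   # placeholder HDFS path
  header = TRUE
)

# Work on it remotely with dplyr verbs, only collect() small results locally
flights %>% count() %>% collect()

spark_disconnect(sc)
```

If this kind of remote connection is not the recommended route, pointers to the preferred one would be just as welcome.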
Thanks. I don't know how naive my questions are, but bear with me; any clarification or attempt at one will be really appreciated.
Best
						
					
Labels:
- Apache Hive
- Apache Spark