1. Ensure that the host from which you run spark-shell or spark2-shell has the corresponding Spark gateway role enabled.
- Log in to the CM WebUI.
- Go to the Spark2/Spark service.
- Click the Instances tab.
- Ensure that the Gateway role is listed for the host. If it is not, use Add roles to add it.
2. Ensure that the Hive service is selected in the Spark configuration.
- Log in to the CM WebUI.
- Go to the Spark2/Spark service.
- Click the Configuration tab.
- Type hive in the search box.
- Enable the Hive service, then redeploy the client configuration and the stale configuration.
3. Once done, open the Spark shell. The Hive context should already be available as the sqlContext variable. The example below shows a very basic SQL query on the Hive table 'sample_07', which contains sample employee data in 4 columns. A transformation is applied using filter, and the result is saved as a text file in HDFS.
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, app id = application_1510839440070_0006).
SQL context available as sqlContext.
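The commands for the query, filter, and save are not shown here, so the following is a minimal sketch of what such a session could look like. The column name salary and the threshold of 150000 are assumptions inferred from the output shown further down, not the author's confirmed commands. The output path /tmp/c is taken from that output.

```scala
// Run inside spark-shell, where sc and sqlContext are already defined.

// Load the Hive table into a DataFrame.
val df = sqlContext.sql("SELECT * FROM sample_07")

// Assumption: keep only rows with a salary above 150000
// (threshold inferred from the sample output, not confirmed by the source).
val highPaid = df.filter(df("salary") > 150000)

// Inspect the filtered rows before saving.
highPaid.show()

// Convert to an RDD of Rows, collapse to a single partition so that one
// part-00000 file is written, and save as text. Each Row is written using
// its toString form, e.g. [col1,col2,col3,col4].
// Note: saveAsTextFile fails if /tmp/c already exists in HDFS.
highPaid.rdd.coalesce(1).saveAsTextFile("/tmp/c")
```

Coalescing to one partition is convenient for a small test result like this one; for large results it would funnel all data through a single task.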
Note: This might not be the most elegant way to store the transformed dataframe, but it works for testing. There are other ways to save the result as well. Since we are dealing with columns and dataframes, you might want to consider saving it as CSV using the spark-csv library or, even better, in Parquet format.
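As a rough illustration of those alternatives, the sketch below writes a filtered dataframe as Parquet and as CSV. The output paths /tmp/c_parquet and /tmp/c_csv are hypothetical, and the salary filter is an assumption based on the sample output. The CSV part assumes the shell was started with the spark-csv package on the classpath, for example with --packages com.databricks:spark-csv_2.10:1.5.0 for a Scala 2.10 build.

```scala
// Run inside spark-shell, where sqlContext is already defined.

// Rebuild the filtered DataFrame (filter threshold is an assumption).
val df = sqlContext.sql("SELECT * FROM sample_07")
val highPaid = df.filter(df("salary") > 150000)

// Parquet support is built into Spark SQL; column names and types are
// stored with the data. The path is hypothetical.
highPaid.write.parquet("/tmp/c_parquet")

// CSV through the external spark-csv data source.
// Assumption: the spark-csv package is available on the classpath.
highPaid.write
  .format("com.databricks.spark.csv")
  .option("header", "true") // write column names as the first line
  .save("/tmp/c_csv")
```

Parquet keeps the schema, so the data can be read back with sqlContext.read.parquet without re-declaring column types, which is the main reason it is usually preferred over plain text for dataframes.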
Once saved, you can read the resulting file from HDFS and copy it to the local filesystem if needed.
[root@nightly511-unsecure-1 ~]# hdfs dfs -cat /tmp/c/part-00000
[11-1011,Chief executives,299160,151370]
[29-1022,Oral and maxillofacial surgeons,5040,178440]
[29-1023,Orthodontists,5350,185340]
[29-1024,Prosthodontists,380,169360]
[29-1061,Anesthesiologists,31030,192780]
[29-1062,Family and general practitioners,113250,153640]
[29-1063,Internists, general,46260,167270]
[29-1064,Obstetricians and gynecologists,21340,183600]
[29-1067,Surgeons,50260,191410]
[29-1069,Physicians and surgeons, all other,237400,155150]
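If you prefer to stay inside the Spark shell for the local copy, the Hadoop FileSystem API can do it as well. This is a small sketch, not the only way to do it; the local destination path /tmp/sample_07_filtered.txt is hypothetical. The command-line equivalent is hdfs dfs -get.

```scala
// Run inside spark-shell, where sc is already defined.
import org.apache.hadoop.fs.{FileSystem, Path}

// Obtain a handle to the cluster's default filesystem (HDFS) using the
// Hadoop configuration that the Spark context already carries.
val fs = FileSystem.get(sc.hadoopConfiguration)

// Copy the saved part file from HDFS to the local filesystem of the host
// running the shell. The destination path is hypothetical.
fs.copyToLocalFile(
  new Path("/tmp/c/part-00000"),
  new Path("/tmp/sample_07_filtered.txt")
)
```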