07-07-2016 04:02 PM
1) Read data from a file in Hadoop into a DataFrame in Spark (Scala):

```scala
// sc -- SparkContext
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
var hadoopFileDataFrame = hiveContext.read.format("com.databricks.spark.csv").load(filePath)
```

2) Using the DataFrame schema, create a table in Hive in Parquet format and load the data from the DataFrame into the Hive table.
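As a side note, spark-csv reads every column as a string by default; a sketch of the same read with header handling and schema inference enabled (both are standard spark-csv options, shown on the assumption that the file has a header row):

```scala
// Sketch: spark-csv treats all columns as strings unless inferSchema is on;
// "header" assumes the first line of the file holds the column names.
var hadoopFileDataFrame = hiveContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(filePath)
```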
Issue 1: The dependency for parquet-hive-bundle-1.6.0.jar is added in pom.xml. Using the following code:

```scala
var query = """CREATE TABLE Test(EMP_ID string, Organisation string, Org_Skill string, EMP_Name string)
  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
  TBLPROPERTIES ('PARQUET.COMPRESS'='SNAPPY')"""
val dataFrame = hiveContext.sql(query)
```

The code hangs: the table does get created, but I am unable to run a SELECT query against it.
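For comparison, here is a sketch of the same table declared with Hive's built-in Parquet support (available in Hive 0.13+), which avoids the deprecated parquet.hive.* classes entirely; this assumes a recent enough Hive version:

```scala
// Sketch: on Hive 0.13+ the native STORED AS PARQUET clause replaces the
// deprecated parquet.hive.* SerDe/input/output format classes.
val createNative = """CREATE TABLE Test(
    EMP_ID string, Organisation string, Org_Skill string, EMP_Name string)
  STORED AS PARQUET
  TBLPROPERTIES ('parquet.compression'='SNAPPY')"""
hiveContext.sql(createNative)
```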
Issue 2:

```scala
hadoopFileDataFrame.registerTempTable("temp")
var query = "CREATE TABLE TEST AS SELECT * FROM TEMP"
hiveContext.sql(query)
val dataFrame = hiveContext.sql("select * from test")
dataFrame.show()
```

Note: this successfully loads the data from the DataFrame into the Hive table, as printed in the console logs. But when I check the Hive table with the same SELECT statement, there is no data in the table. What is the cause of this? How can I copy the data from a DataFrame into a Hive table, store it as Parquet, and perform dynamic partitioning of the data, while ensuring the data is correctly written to the Hive table? A sketch of what I am trying to get working is below.
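A minimal sketch of the write path I am attempting, assuming dynamic partitioning is enabled on the session; "Organisation" is only an example partition column (any column from the schema would do) and "test_partitioned" is a placeholder table name:

```scala
// Sketch only: enable dynamic partitioning, then write the DataFrame as a
// partitioned, Parquet-backed Hive table via the DataFrameWriter API.
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

hadoopFileDataFrame.write
  .format("parquet")
  .partitionBy("Organisation")   // example column; adjust to the real schema
  .mode("overwrite")             // placeholder; pick the mode that fits
  .saveAsTable("test_partitioned")
```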
Labels:
Apache Hive