Error while loading dataframe into a hive partition


I'm trying to load a DataFrame into a Hive table that is partitioned as shown below.

> create table emptab(id int, name String, salary int, dept String)
> partitioned by (location String)
> row format delimited
> fields terminated by ','
> stored as parquet;


I created the DataFrame as follows:

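// Read the comma-separated input file and split each line into fields
val empfile = sc.textFile("emp")
val empdata = empfile.map(e => e.split(","))
case class employee(id:Int, name:String, salary:Int, dept:String)
// Map each record to the case class, then convert the RDD to a DataFrame
val empRDD = empdata.map(e => employee(e(0).toInt, e(1), e(2).toInt, e(3)))
val empDF = empRDD.toDF()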

But I'm getting the error below:

java.lang.RuntimeException: [1.1] failure: identifier expected


Data in my input file: emp


| id|   name|salary| dept|
|  1|   Mark|  1000|   HR|
|  2|  Peter|  1200|SALES|
|  3|  Henry|  1500|   HR|
|  4|   Adam|  2000|   IT|
|  5|  Steve|  2500|   IT|
|  6|  Brian|  2700|   IT|
|  7|Michael|  3000|   HR|
|  8|  Steve| 10000|SALES|
|  9|  Peter|  7000|   HR|
| 10|    Dan|  6000|   BS|

Could anyone tell me what mistake I am making here and how I can correct it?


Re: Error while loading dataframe into a hive partition

If you're going to partition the data by location, shouldn't your dataframe contain a column named "location" that it can use to partition the data by?


I haven't implemented partitioning in Hive / Hadoop, but that's been true for every other database I've worked with.
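
The suggestion above could be sketched roughly as follows. This is only a minimal sketch, not a confirmed fix for the error in the question: it assumes Spark 2.x or later with Hive support enabled, that the emptab table from the question already exists, and that every row belongs to a single location. The value "NY" and the object name LoadEmpPartition are placeholders that do not come from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object LoadEmpPartition {
  // Same fields as the case class in the question
  case class Employee(id: Int, name: String, salary: Int, dept: String)

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ built with Hive support
    val spark = SparkSession.builder()
      .appName("LoadEmpPartition")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Let Hive create partitions from the values found in the "location" column
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    // Parse the comma-separated input file into a DataFrame
    val empDF = spark.sparkContext.textFile("emp")
      .map(_.split(","))
      .map(e => Employee(e(0).trim.toInt, e(1).trim, e(2).trim.toInt, e(3).trim))
      .toDF()

    // Add the partition column; "NY" is a placeholder value (assumption)
    val withLocation = empDF.withColumn("location", lit("NY"))

    // insertInto matches columns by position, so the partition column
    // must come last: id, name, salary, dept, location
    withLocation.write.mode("append").insertInto("emptab")

    spark.stop()
  }
}
```

If the location differs per row, the same write should still apply as long as the DataFrame carries the right value in its last column; each distinct value would then end up in its own partition.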