Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

IllegalArgumentException when select with where clause on hive external table pointing to parquet data

SOLVED Go to solution
Highlighted

IllegalArgumentException when select with where clause on hive external table pointing to parquet data

New Contributor

Followed is the workflow

Hive>create external table if not exists table1 ( C1 string, C2 string, C3 int)

stored as parquet

location 'hdfs:/project/table1.parquet';

Parquet data are created by spark as

df.write.mode("overwrite").parquet('hdfs:/project/table1.parquet')

Hive>select * from table1 where C1='toto';

OK

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".

SLF4J: Defaulting to no-operation (NOP) logger implementation

SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Column [c1] was not found in schema!

Time taken: 0.254 seconds

1 ACCEPTED SOLUTION

Accepted Solutions

Re: IllegalArgumentException when select with where clause on hive external table pointing to parquet data

Hi @Hocine Bouzelat Can you please check if you have defined same column name while creating parquet file schema through spark?

7 REPLIES 7

Re: IllegalArgumentException when select with where clause on hive external table pointing to parquet data

Hi @Hocine Bouzelat Can you please check if you have defined same column name while creating parquet file schema through spark?

Re: IllegalArgumentException when select with where clause on hive external table pointing to parquet data

can you please post the output of below command?

parquet-tools schema <parquet file path>

If parquet-tools is not configured then please follow below link.

https://github.com/Parquet/parquet-mr/tree/master/parquet-tools

Re: IllegalArgumentException when select with where clause on hive external table pointing to parquet data

This seems to be an known issue, please set below parameter on hive shell before running select query.

set hive.optimize.ppd = false;

https://issues.apache.org/jira/browse/HIVE-11401

Re: IllegalArgumentException when select with where clause on hive external table pointing to parquet data

Hi @Hocine Bouzelatplease accept this answer if suggsted workaround worked for you.

Re: IllegalArgumentException when select with where clause on hive external table pointing to parquet data

New Contributor

yes, schema spark and schema hive are same.

also,

hive> describe table1

gives the three columns

Re: IllegalArgumentException when select with where clause on hive external table pointing to parquet data

New Contributor

Hi Jitendra,

followed is the complete processs and very explicite

pyspark program

>>>data = [('C1_1',None,'C3_1'), ('C1_2','C2_2',None),('C1_3',None,None),(None,None,'C3_4'),('C1_5','C2_5','C3_5')] 
>>>df = sqlContext.createDataFrame(data, ['C1', 'C2','C3']) 
>>>df.printSchema() 

root

|-- C1: string (nullable = true)

|-- C2: string (nullable = true)

|-- C3: string (nullable = true)

>>>df.show()

+---------+-------+-------+

| C1 | C2 | C3 |

+--------+-------+--------+

| C1_1 | null | C3_1 |

| C1_2 | C2_2| null |

| C1_3| null | null |

| null | null | C3_4 |

| C1_5| C2_5 | C3_5 |

+------+--------+-------+

>>>df.write.mode("overwrite").parquet('hdfs:/tmp/table1.parquet') 

hdfs commands

$ hdfs dfs -ls /tmp/table1.parquet 

Found 11 items

/tmp/table1.parquet/_SUCCESS /tmp/table1.parquet/_common_metadata

/tmp/table1.parquet/_metadata

/tmp/table1.parquet/part-r-00000-3b1ecab2-1942-47eb-b393-c6988876d4a1.gz.parquet

/tmp/table1.parquet/part-r-00001-3b1ecab2-1942-47eb-b393-c6988876d4a1.gz.parquet

/tmp/table1.parquet/part-r-00002-3b1ecab2-1942-47eb-b393-c6988876d4a1.gz.parquet

/tmp/table1.parquet/part-r-00003-3b1ecab2-1942-47eb-b393-c6988876d4a1.gz.parquet

/tmp/table1.parquet/part-r-00004-3b1ecab2-1942-47eb-b393-c6988876d4a1.gz.parquet

/tmp/table1.parquet/part-r-00005-3b1ecab2-1942-47eb-b393-c6988876d4a1.gz.parquet

/tmp/table1.parquet/part-r-00006-3b1ecab2-1942-47eb-b393-c6988876d4a1.gz.parquet

/tmp/table1.parquet/part-r-00007-3b1ecab2-1942-47eb-b393-c6988876d4

Hive commands

hive>create external table table1 ( C1 string, C2 string, C3 string)
   stored as parquet 
   location 'hdfs:/tmp/table1.parquet'; 

OK Time taken: 6.469 seconds

hive>describe table1; 

OK

c1 string

c2 string

c3 string

Time taken: 1.455 seconds,

Fetched: 3 row(s)

hive> select * from table1; 

OK

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".

SLF4J: Defaulting to no-operation (NOP) logger implementation

SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

C1_1 NULL C3_1

C1_2 C2_2 NULL

C1_3 NULL NULL

NULL NULL C3_4

C1_5 C2_5 C3_5

Time taken: 0.355 seconds,

Fetched: 5 row(s)

hive> select * from table1 where C1='toto'; 

OK

Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Column [c1] was not found in schema!

Time taken: 0.277 seconds

hive> select * from table1 where c1='toto'; 

OK

Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Column [c1] was not found in schema!

Time taken: 0.096 seconds

Re: IllegalArgumentException when select with where clause on hive external table pointing to parquet data

New Contributor

As workaround, the following insrt solves the inssue

hive> set hive.optimize.ppd = false;

thnks @Jitendra Yada