Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Spark Warning: WARN ReaderImpl: Cannot find field for: field_name in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9,

avatar
Contributor

I am running a Sql query in Spark:

spark.sql("select person_key, count(*) as count1 from <table_name> group by person_key order by count1 desc").show()

This throws a warning:

17/10/05 12:09:03 WARN ReaderImpl: Cannot find field for: person_key in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9,
17/10/05 12:09:03 WARN ReaderImpl: Cannot find field for: person_key in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9,
17/10/05 12:09:03 WARN ReaderImpl: Cannot find field for: person_key in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9,
17/10/05 12:09:03 WARN ReaderImpl: Cannot find field for: person_key in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9,
17/10/05 12:09:03 WARN ReaderImpl: Cannot find field for: person_key in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9,
17/10/05 12:09:03 WARN ReaderImpl: Cannot find field for: person_key in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9,
17/10/05 12:09:03 WARN ReaderImpl: Cannot find field for: person_key in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9,

But does give correct results. I want to understand what this means. I did not find anything on the net. I want this resolved because though it is giving the correct results, it is taking very long to execute. (The same query on Hive LLAP takes 3 seconds. Spark numbers are usually comparable to Hive LLAP numbers).

I checked person_key does exist in the table (I created it so the table so I know it exists). Not sure why the warning is coming.

2 REPLIES 2

avatar
Expert Contributor

That is a valid warning. Old Hive ORC writer doesn't save the correct schema into ORC files. They wrote dummy column names like, `_col1`. You are using that old ORC file. If you generate new ORC file with Hive 2, then you can not see that warning.

avatar
Rising Star

Desc <Table_name>;

you'll come to know your table would have columns like _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9..

What you need to do is Add the alias to your existing table so that your column names are the actual names and not default _col1, etc.