Pig Incompatable schema

Expert Contributor

My input file is below:

a.txt

aaa.kyl,data,data
bbb.kkk,data,data
cccccc.hj,data,data
qa.dff,data,data

A = LOAD '/pigdata/a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);

How do I resolve the error below, and what causes it?

ERROR:-
C = FOREACH A GENERATE STRSPLIT(a1,'\\u002E') as (a1:chararray, a1of1:chararray),a2,a3;
2017-02-03 00:45:42,803 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema: left is "a1:chararray,a1of1:chararray", right is ":tuple()"
1 ACCEPTED SOLUTION

Expert Contributor

You need to flatten the STRSPLIT before you can project.

C = FOREACH A GENERATE FLATTEN(STRSPLIT(a1,'\\u002E')) as (a1:chararray, a1of1:chararray),a2,a3;   


7 REPLIES

Expert Contributor

Thanks for the input. What is the problem with my relation C?

STRSPLIT generates a tuple as output. Here the tuple consists of two fields.

(a1:chararray, a1of1:chararray) is also a tuple, since it is enclosed in parentheses and also consists of two fields.

Expert Contributor

Any input on my clarification?

Expert Contributor

@vamsi valiveti

The code you wrote produces a schema like this:

((a1),(a1of1)),(a2),(a3)

Your projection won't work on a schema like this, because Pig still treats the first two fields, "((a1),(a1of1))", as one. You need to use FLATTEN in this case to split it into two separate columns.

That's exactly what my code does. I tested your data using my code, and it works perfectly.

Master Guru

The output of STRSPLIT is a tuple, so if you want to provide its schema you need to declare it explicitly as a tuple, for example "t1:tuple(...)" as shown below. After that you can refer to its fields as t1.a1 and t1.a1of1. With FLATTEN you get rid of the tuple, so you can choose which way to declare it.

grunt> b = FOREACH a generate STRSPLIT(a1,'\\u002E') as (t1:tuple(a1:chararray,a1of1:chararray)), a2, a3;
grunt> describe b; 
b: {t1: (a1: chararray,a1of1: chararray),a2: chararray,a3: chararray}
grunt> c = foreach b generate t1.a1, a3;
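
The two options discussed in this thread can be compared side by side. The following is only a minimal sketch, assuming the same file path and field names as in the question; the relation names B and C are chosen here for illustration. DESCRIBE prints the schema of each relation, which makes the difference between a nested tuple and flattened fields visible.

```pig
-- Sketch only: path and field names are taken from the question.
A = LOAD '/pigdata/a.txt' USING PigStorage(',')
    AS (a1:chararray, a2:chararray, a3:chararray);

-- Option 1: keep the STRSPLIT result as one tuple-valued field
-- and declare it as a tuple in the AS clause.
B = FOREACH A GENERATE STRSPLIT(a1, '\\u002E')
    AS (t1:tuple(a1:chararray, a1of1:chararray)), a2, a3;
DESCRIBE B;

-- Option 2: FLATTEN the tuple so its elements become top-level
-- fields, which the AS clause can then name one by one.
C = FOREACH A GENERATE FLATTEN(STRSPLIT(a1, '\\u002E'))
    AS (a1:chararray, a1of1:chararray), a2, a3;
DESCRIBE C;
DUMP C;
```

With option 1 the split parts are reached through the tuple (t1.a1, t1.a1of1). With option 2 they are ordinary columns of C.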

Master Guru

Hi @vamsi valiveti

I noticed that you ask a lot of questions but haven't accepted many answers in the last few months. So, if you don't mind, let me say a few words about how this works. Both questions and answers can be up-voted if others find them helpful. Also, for each question, one answer can be "accepted", usually by the user who asked the question, if that answer resolved the question or greatly helped the user resolve the issue on their own. Accepted answers help HCC better manage answered questions, and they also let others who find the question later know that the accepted answer indeed works and can be trusted. So it would be great if you accepted one answer to this question, and to some other questions you recently asked. Many thanks!

Expert Contributor

Thanks for the comments. I will definitely do so, starting from this post.