
Pig: Incompatible schema error

Contributor

My input file is below:

a.txt

aaa.kyl,data,data
bbb.kkk,data,data
cccccc.hj,data,data
qa.dff,data,data

A = LOAD '/pigdata/a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);

How can I resolve the error below, and what is causing it?

Statement and resulting error:
C = FOREACH A GENERATE STRSPLIT(a1,'\\u002E') as (a1:chararray, a1of1:chararray),a2,a3;
2017-02-03 00:45:42,803 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema: left is "a1:chararray,a1of1:chararray", right is ":tuple()"
Accepted solution

Expert Contributor

You need to FLATTEN the result of STRSPLIT before you can assign it a schema of two separate fields.

C = FOREACH A GENERATE FLATTEN(STRSPLIT(a1,'\\u002E')) as (a1:chararray, a1of1:chararray),a2,a3;   
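A minimal end-to-end sketch of this fix, assuming the file path and field names from the question. The optional third argument to STRSPLIT (a limit of 2, which caps the result tuple at two fields) is an addition of this sketch, assuming each a1 value should be split only at its first dot; for the sample data it gives the same result as the two-argument call.

```pig
-- Load the comma-separated file from the question (path as given there).
A = LOAD '/pigdata/a.txt' USING PigStorage(',')
    AS (a1:chararray, a2:chararray, a3:chararray);

-- STRSPLIT returns a single tuple; FLATTEN promotes its fields to top-level
-- columns, so the AS list can name them as two separate chararray fields.
-- Assumption: limit 2 = split at the first dot only (at most two pieces).
C = FOREACH A GENERATE
        FLATTEN(STRSPLIT(a1, '\\u002E', 2)) AS (a1:chararray, a1of1:chararray),
        a2, a3;

-- Inspect the resulting schema and data.
DESCRIBE C;
DUMP C;
```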

Contributor

Thanks for the input. What is the problem with my relation C?

STRSPLIT generates a tuple as output. Here that tuple consists of two fields.

(a1:chararray, a1of1:chararray) is also a tuple, since it is enclosed in parentheses and also consists of two fields.
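The error message itself points at the mismatch: the right-hand side, ":tuple()", is one field of type tuple, whereas an AS list such as (a1:chararray, a1of1:chararray) declares two separate top-level fields; the parentheses there delimit the field list rather than declaring a tuple type. A quick way to check what Pig infers is to drop the AS clause and run DESCRIBE, as in this sketch (B is a hypothetical alias used only for the check):

```pig
-- Same relation A as in the question.
A = LOAD '/pigdata/a.txt' USING PigStorage(',')
    AS (a1:chararray, a2:chararray, a3:chararray);

-- No AS clause: let Pig report what STRSPLIT actually produces.
-- Expect the first column to be a single tuple-typed field, not two chararrays.
B = FOREACH A GENERATE STRSPLIT(a1, '\\u002E'), a2, a3;
DESCRIBE B;
```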

Contributor

Any input on my follow-up question?

Expert Contributor

@vamsi valiveti

The code you wrote produces a schema like this:

((a1),(a1of1)),(a2),(a3)

Your AS clause can't work on a schema like this, because Pig still treats the first two fields, "((a1),(a1of1))", as a single field (one tuple). You need FLATTEN here to turn it into two separate columns.

That's exactly what my code does. I tested it on your data and it works.

The output of STRSPLIT is a tuple, so if you want to give it a schema you must declare the tuple explicitly, for example "t1:tuple(...)" as below; after that you can refer to its fields as t1.a1 and t1.a1of1. FLATTEN, by contrast, removes the tuple. So you can choose which way to declare it.

grunt> b = FOREACH A generate STRSPLIT(a1,'\\u002E') as (t1:tuple(a1:chararray,a1of1:chararray)), a2, a3;
grunt> describe b; 
b: {t1: (a1: chararray,a1of1: chararray),a2: chararray,a3: chararray}
grunt> c = foreach b generate t1.a1, a3;
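If you start with the named-tuple form and later want flat columns after all, the tuple can still be flattened in a later step. A sketch continuing the grunt session above (the alias d is hypothetical):

```pig
-- Continues the session above: b holds t1:(a1, a1of1), a2, a3.
-- FLATTEN(t1) promotes the tuple's two fields to top-level columns;
-- the AS list names them explicitly.
d = FOREACH b GENERATE FLATTEN(t1) AS (a1:chararray, a1of1:chararray), a2, a3;
DESCRIBE d;
```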

Hi @vamsi valiveti

I noticed that you ask a lot of questions but haven't accepted many answers in the last few months. So, if you don't mind, let me say a few words about how this works: both questions and answers can be upvoted if others find them helpful. Also, for each question one answer can be "accepted", usually by the user who asked the question, if that answer resolved the question or greatly helped the user resolve the issue themselves. Accepted answers help HCC better manage answered questions, and they also tell others who find the question later that the accepted answer indeed works and can be trusted. So it would be great if you accepted one answer to this question, and to some other questions you recently asked. Many thanks!

Contributor

Thanks for the comments. I will definitely do that, starting with this post.
