Member since 06-03-2016 | 4 Posts | 0 Kudos Received | 0 Solutions
12-02-2016 10:12 PM
Ahh, okay, thank you, that makes much more sense now. Unfortunately, we decided to just create a custom Python UDF to parse the XML column. This will be easier for us, since we won't need a separate process just to extract the XML into another file.
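One way such a script might look, assuming it runs as a Hive streaming script through `TRANSFORM` (the usual route for Python in Hive, since native Hive UDFs are JVM classes). It assumes the input columns are an id and the XML string, that the XML has no embedded tabs or newlines (Hive streams each row as one tab-separated line), and that `;` and `=` never occur in tag values. The file name `xml_to_kv.py`, the table name `mytable`, and the delimiters are hypothetical. In HiveQL the output might then be wired up with `ADD FILE xml_to_kv.py;`, `SELECT TRANSFORM(id, xml_col) USING 'python xml_to_kv.py' AS (id, kv) FROM mytable`, and `str_to_map(kv, ';', '=')` on the result.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical delimiters: pick characters that never occur in your XML values.
PAIR_SEP = ";"
KV_SEP = "="


def leaf_pairs(xml_str):
    """Return {tag: text} for every leaf element, discovering tags at runtime.

    Keys are bare tag names, so leaves with the same tag at different depths
    collide and the last one wins.
    """
    root = ET.fromstring(xml_str)
    return {
        el.tag: (el.text or "").strip()
        for el in root.iter()
        if len(el) == 0  # only elements without children
    }


def transform_line(line):
    """Turn one 'id<TAB>xml' row from Hive into 'id<TAB>k=v;k=v'."""
    row_id, _, xml_str = line.rstrip("\n").partition("\t")
    if xml_str in ("", "\\N"):  # Hive streams NULL as \N
        return row_id + "\t"
    try:
        pairs = leaf_pairs(xml_str)
    except ET.ParseError:
        # Keep the row but emit an empty map for malformed XML.
        return row_id + "\t"
    kv = PAIR_SEP.join(k + KV_SEP + v for k, v in pairs.items())
    return row_id + "\t" + kv


if __name__ == "__main__":
    # Hive feeds rows on stdin and reads tab-separated rows back from stdout.
    for line in sys.stdin:
        sys.stdout.write(transform_line(line) + "\n")
```

For example, the input row `1<TAB><r><a>x</a><b>y</b></r>` becomes `1<TAB>a=x;b=y`.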
12-01-2016 09:30 PM
I'm looking at the examples, and they don't seem to address my issue of dynamically getting the tag and value: the examples declare which tags to parse, and I can already do that with xpath.
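The distinction here is between declaring paths up front (as xpath does) and discovering tags while walking the document. A minimal sketch of the second approach using Python's standard `xml.etree.ElementTree`; the function name `xml_to_map` and the slash-separated key format are illustrative choices, not anything Hive provides. The returned dict is the shape a `map<string,string>` column would hold.

```python
import xml.etree.ElementTree as ET


def xml_to_map(xml_str):
    """Flatten every leaf element of an XML string into {path: text}.

    Tag names are discovered while walking the tree, so nothing is declared
    up front. Keys are slash-separated paths below the root element
    (e.g. "customer/name"); attributes are ignored, and if a path repeats,
    the last value wins.
    """
    if not xml_str:
        return {}
    root = ET.fromstring(xml_str)
    if len(root) == 0:
        # The root itself is a leaf: key it by its own tag.
        return {root.tag: (root.text or "").strip()}

    result = {}

    def walk(elem, path):
        if len(elem) == 0:
            # Leaf element: record its text (empty string if it has none).
            result[path] = (elem.text or "").strip()
            return
        for child in elem:
            walk(child, path + "/" + child.tag)

    for child in root:
        walk(child, child.tag)
    return result
```

For example, `<order><id>7</id><customer><name>Ann</name><city>Oslo</city></customer></order>` flattens to `{"id": "7", "customer/name": "Ann", "customer/city": "Oslo"}`, with no tag names written into the code.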
12-01-2016 04:50 PM
Hey all, I have a column in Hive that holds XML data. However, the XML format is not static: it changes based on a category column, and there are a lot of categories. Does anybody know a way to programmatically extract each tag and its value from the XML column and store them in a map column? The best thing I found is xpath in Hive, but it requires me to declare what I am extracting. I could do it with a Java/Scala/Python script, but then I would just be connecting to Hive via JDBC, pulling the data out, and doing the work client side. Unfortunately, we do not have Spark, which I think would be able to do it. Any ideas?
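Because the XML layout varies by category, it may help to inventory which leaf tags actually occur per category before choosing between a fixed xpath list and a map column. A minimal sketch using Python's standard library, assuming a sample of `(category, xml)` rows has been pulled from the table; the function name `tags_by_category` and the row shape are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def tags_by_category(rows):
    """Collect the distinct leaf tag names seen for each category.

    `rows` is an iterable of (category, xml_string) pairs. No tag names are
    declared up front; they are read from the documents themselves.
    """
    found = defaultdict(set)
    for category, xml_str in rows:
        root = ET.fromstring(xml_str)
        for el in root.iter():
            if len(el) == 0:  # leaf elements only
                found[category].add(el.tag)
    # Sort each tag set so the report is stable and easy to compare.
    return {cat: sorted(tags) for cat, tags in found.items()}
```

If the number of distinct tag sets turns out to be small, generating xpath expressions from this report is an option; if it is large, a generic map column is likely the more maintainable choice.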
06-03-2016 08:17 PM
I'm trying to use Pig to do a GROUP BY and count distinct on a dataset, and I am getting a Java error saying "java.lang.ClassCastException: org.apache.pig.data.SingleTupleBag cannot be cast to org.apache.pig.data."

Example dataset:
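| A   | B | ids  | flag | status |
|-----|---|------|------|--------|
| foo | f | 1001 | 1    | K      |
| foo | f | 1001 | 1    | K      |
| foo | c | 1002 | 1    | H      |
| bar | g | 1001 | 1    | J      |
| bar | g | 1002 |      | P      |
| bar | g | 1003 | 1    | L      |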
Here is an example of my code:

```
testtable = LOAD 'landing.testtable' USING org.apache.hive.hcatalog.pig.HCatLoader();
filtertable = FILTER testtable BY flag != ' ' AND status != 'P';
grpcount = FOREACH (GROUP filtertable BY (A, B)) {
    uniqueids = DISTINCT filtertable.ids;
    GENERATE
        group.A AS A_group,
        group.B AS B_group,
        COUNT(uniqueids) AS id_count;
}
STORE grpcount INTO 'landing.grpcount' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

This is where I get the error "java.lang.ClassCastException: org.apache.pig.data.SingleTupleBag cannot be cast to org.apache.pig.data." I'm not exactly sure what's wrong here (the Hive table is properly set up with the right datatypes as well). I assume it's erroring out on grpcount, but I'm not sure why. I am basically trying to duplicate this SQL in Pig:

```
SELECT A AS A_group, B AS B_group, COUNT(DISTINCT ids) AS id_count
FROM landing.testtable
WHERE flag != ' ' AND status NOT IN ('P')
GROUP BY A, B;
```

I found this alternate solution here, https://issues.apache.org/jira/browse/PIG-4515, but I'm not really sure how to implement it in my code :/. Using Pig 0.15 / Hortonworks 2.2.0.
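For reference, here is a small Python sketch of what the SQL above should return on the example dataset, so the Pig output can be checked against it. It uses a "de-duplicate first, then group" formulation; the analogous Pig restructuring would be to project `A, B, ids`, apply a top-level `DISTINCT`, then `GROUP` by `(A, B)` and `COUNT` the bag, so no nested `DISTINCT` is needed. Whether that avoids the `ClassCastException` on this Pig/HDP version is an untested assumption. The blank `flag` in the fifth row is assumed to be a single space; that row is excluded by `status != 'P'` either way.

```python
from collections import defaultdict

# The example dataset from the question: (A, B, ids, flag, status).
# Assumption: the blank flag in the fifth row is a single space ' '.
ROWS = [
    ("foo", "f", 1001, "1", "K"),
    ("foo", "f", 1001, "1", "K"),
    ("foo", "c", 1002, "1", "H"),
    ("bar", "g", 1001, "1", "J"),
    ("bar", "g", 1002, " ", "P"),
    ("bar", "g", 1003, "1", "L"),
]


def count_distinct_ids(rows):
    # Step 1: apply the WHERE clause (flag != ' ' AND status != 'P').
    kept = [r for r in rows if r[3] != " " and r[4] != "P"]
    # Step 2: de-duplicate (A, B, ids) triples before grouping,
    # instead of running DISTINCT inside the grouped FOREACH.
    triples = {(a, b, ids) for a, b, ids, _flag, _status in kept}
    # Step 3: group by (A, B) and count the remaining triples.
    counts = defaultdict(int)
    for a, b, _ids in triples:
        counts[(a, b)] += 1
    return dict(counts)
```

On this data the expected result is `{("foo", "f"): 1, ("foo", "c"): 1, ("bar", "g"): 2}` (dictionary order may vary): the duplicate `foo, f, 1001` rows collapse to one, and `bar, g` keeps ids 1001 and 1003.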