
Kudu UPDATE joined with an HDFS table doesn't work

Master Collaborator

Hi,

I have a small cluster with 20 nodes, 10 of which have CPUs that support SSE4.2. So I have 20 HDFS DataNodes and 10 Kudu tablet servers (10 nodes run both).
When I try to execute the query below:

UPDATE t1 SET t1.num = t2.id
FROM db1.table1 t1
JOIN db2.table2 t2
WHERE t1.name= t2.name
AND t1.active IN (1,2);

Note that table1 is a Kudu table and table2 is an HDFS/Parquet table.

I get this error message:

WARNINGS: Unable to create Kudu client: Not implemented: The CPU on this system (Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz) does not support the SSE4.2 instruction set which is required for running Kudu. If you are running inside a VM, you may need to enable SSE4.2 pass-through.


NB: I use CDH v5.12, Impala v2.9 and Kudu v1.4.

Why do I get this error, and is there another way to write the same query that avoids the problem?
Thanks in advance.

 


5 REPLIES

Expert Contributor
Hi,

Unfortunately, the Kudu client is built in such a way that it requires
SSE4.2. The CPU you are running on was discontinued in Q4 2010 and is not
supported by Kudu. That includes the Kudu client, which is used by Impala.

Unfortunately, you will not be able to query Kudu tables in a mixed cluster
with Impala daemons that do not support SSE4.2.

-Todd
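
As a rough illustration of the SSE4.2 requirement described above, here is a minimal sketch for checking whether a Linux host advertises the instruction set. It assumes an x86 Linux node where `/proc/cpuinfo` exposes a `flags` line containing `sse4_2`; the function names are hypothetical helpers, not part of Kudu or Impala.

```python
def supports_sse42(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists sse4_2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # On x86 Linux, CPU features appear on lines of the form "flags : fpu vme ..."
        if key.strip() == "flags" and "sse4_2" in value.split():
            return True
    return False


def local_host_supports_sse42(path: str = "/proc/cpuinfo") -> bool:
    """Check the local host (assumes a Linux system that provides /proc/cpuinfo)."""
    with open(path) as f:
        return supports_sse42(f.read())
```

Running `local_host_supports_sse42()` on each node that hosts an Impala daemon would show which hosts can create a Kudu client and which cannot.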

Master Collaborator

Hi,

All 10 Kudu tablet servers, and also the Kudu master server, in my cluster support SSE4.2 (e.g. Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz...).

I'm already working with Kudu and most queries run fine; an UPDATE without a JOIN against an HDFS table also works.

The Impala daemon on which I execute the UPDATE query in question also supports SSE4.2.

Expert Contributor
Your non-JOIN queries probably work because Impala is scheduling for
locality and only scheduling work on nodes with Kudu running. When you join
with HDFS data, some work is scheduled on all of the nodes in the cluster,
and then those tasks running on non-Kudu nodes still need to write output
to Kudu.

-Todd
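
The reasoning above can be sketched as a simple set difference. This is only a toy model built from the numbers in the question (20 HDFS nodes, 10 of them with SSE4.2), not Impala's actual scheduler, and the host names are hypothetical.

```python
def nodes_unable_to_write_kudu(hdfs_nodes, sse42_nodes):
    """Nodes that may be given join work on HDFS data but cannot create a Kudu client."""
    return sorted(set(hdfs_nodes) - set(sse42_nodes))


# Hypothetical host names: node01..node20 all run an HDFS DataNode,
# and only node01..node10 have CPUs with SSE4.2 (and run Kudu).
hdfs_nodes = [f"node{i:02d}" for i in range(1, 21)]
sse42_nodes = [f"node{i:02d}" for i in range(1, 11)]

at_risk = nodes_unable_to_write_kudu(hdfs_nodes, sse42_nodes)
```

Under these assumptions, `at_risk` holds the 10 hosts where a task that has to write to Kudu would hit the "Unable to create Kudu client" warning.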

Master Collaborator

Hmm, I understand.
Thank you @Todd Lipcon for the answers.

So there is currently no way to run a query like this in a mixed cluster?
Otherwise, I'll try doing the join into an intermediate table before running the UPDATE query, to avoid the nested join.

Expert Contributor
That's correct, I am not aware of a workaround for this issue.