Expert Contributor
Posts: 115
Registered: 07-17-2017
Accepted Solution

KUDU Update joining a hdfs table don't work

Hi,

I have a small cluster of 20 nodes, only 10 of which have CPUs that support SSE4.2. So I have 20 HDFS DataNodes and 10 Kudu tablet servers (10 nodes run both).
When I try to execute the query below:

UPDATE t1 SET t1.num = t2.id
FROM db1.table1 t1
JOIN db2.table2 t2
WHERE t1.name= t2.name
AND t1.active IN (1,2);

Note that table1 is a Kudu table and table2 is an HDFS/Parquet table.

I get this error message:

WARNINGS: Unable to create Kudu client: Not implemented: The CPU on this system (Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz) does not support the SSE4.2 instruction set which is required for running Kudu. If you are running inside a VM, you may need to enable SSE4.2 pass-through.


NB: I use CDH v5.12, Impala v2.9 and Kudu v1.4.

Why do I get this error, and is there another way to write the same query that avoids it?
Thanks in advance.

 

Cloudera Employee
Posts: 64
Registered: 09-28-2015

Re: KUDU Update joining a hdfs table don't work

Hi,

Unfortunately, the Kudu client is built in such a way that it requires
SSE4.2. The CPU you are running on was discontinued in Q4 2010 and is not
supported by Kudu. That includes the Kudu client, which is used by Impala.

As a result, you will not be able to query Kudu tables in a mixed cluster
where some Impala daemons run on CPUs that do not support SSE4.2.

-Todd
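
To see which hosts are affected, one option is to look for the `sse4_2` flag that Linux reports in `/proc/cpuinfo`. The sketch below is only an illustration and is not part of Kudu or Impala; the function names `cpu_supports_sse42` and `check_local_host` are made up for this example, and it assumes a Linux host.

```python
def cpu_supports_sse42(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists sse4_2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Each logical CPU has its own "flags : ..." line; one match is enough.
        if key.strip() == "flags" and "sse4_2" in value.split():
            return True
    return False


def check_local_host(path: str = "/proc/cpuinfo") -> bool:
    """Hypothetical helper: read the local cpuinfo file (Linux only) and test it."""
    with open(path) as f:
        return cpu_supports_sse42(f.read())
```

Calling `check_local_host()` on each node that runs an Impala daemon would show which ones cannot load the Kudu client.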
Expert Contributor
Posts: 115
Registered: 07-17-2017

Re: KUDU Update joining a hdfs table don't work

Hi,

All 10 Kudu tablet servers in my cluster, and also the Kudu master server, support SSE4.2 (e.g. Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, Intel(R) Xeon(R) CPU E5506  @ 2.13GHz, Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz...).

I'm already working with Kudu and most queries run fine. UPDATE statements that do not JOIN with an HDFS table also work fine.

The host of the Impala daemon on which I execute this UPDATE query also supports SSE4.2.

Cloudera Employee
Posts: 64
Registered: 09-28-2015

Re: KUDU Update joining a hdfs table don't work

Your non-JOIN queries probably work because Impala is scheduling for
locality and only scheduling work on nodes with Kudu running. When you join
with HDFS data, some work is scheduled on all of the nodes in the cluster,
and then those tasks running on non-Kudu nodes still need to write output
to Kudu.

-Todd
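
The explanation above can be pictured with a toy model. This is a deliberately simplified sketch and not Impala's actual scheduler: the `Node` class, the placement rule, and the node names are all assumptions made for illustration, using the cluster sizes given in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    has_sse42: bool   # CPU supports SSE4.2
    runs_kudu: bool   # hosts a Kudu tablet server
    runs_hdfs: bool   # hosts an HDFS DataNode


def nodes_doing_kudu_writes(nodes, joins_hdfs_table):
    """Toy rule: Kudu-only work stays on Kudu nodes; joining an HDFS table
    spreads work to every DataNode, and each of those nodes writes to Kudu."""
    chosen = [n for n in nodes if n.runs_kudu]
    if joins_hdfs_table:
        chosen += [n for n in nodes if n.runs_hdfs and not n.runs_kudu]
    return chosen


def failing_nodes(nodes, joins_hdfs_table):
    """Names of nodes that would need a Kudu client but lack SSE4.2."""
    return [n.name for n in nodes_doing_kudu_writes(nodes, joins_hdfs_table)
            if not n.has_sse42]


# Cluster from the thread: 20 DataNodes, 10 of which have SSE4.2 and run Kudu.
cluster = [Node(f"node{i:02d}", has_sse42=i < 10, runs_kudu=i < 10, runs_hdfs=True)
           for i in range(20)]
```

In this model `failing_nodes(cluster, joins_hdfs_table=False)` is empty, while passing `True` returns the ten nodes without SSE4.2, which matches the behaviour described in the thread.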
Expert Contributor
Posts: 115
Registered: 07-17-2017

Re: KUDU Update joining a hdfs table don't work

Hmm, I understand.
Thank you @Todd Lipcon for the answers.

So for now there is no way to run a query like this in a mixed cluster?
Otherwise, I'll try materializing the join in an intermediate table before running the UPDATE query, to avoid the nested join.

Cloudera Employee
Posts: 64
Registered: 09-28-2015

Re: KUDU Update joining a hdfs table don't work

That's correct, I am not aware of a workaround for this issue.