About mpercy

mpercy · ‎07-11-2018

Sounds like good news. Thanks for the update!

mpercy · ‎07-10-2018

Are you sure the bottleneck is Kudu? It may be the read from Oracle instead. Using the Kudu AUTO_FLUSH_BACKGROUND mode should give pretty fast throughput when writing (see https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html). You can also try increasing the KuduSession.setMutationBufferSpace() value, and consider your partitioning scheme. If you want more parallelism, you can scan different ranges in Oracle with different processes or threads, on the same or different client machines, and write to Kudu in parallel.
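As a rough illustration of the "scan different ranges with different threads" idea, here is a minimal Python sketch. It assumes the Oracle table has a numeric key that can be range-scanned; `split_range`, `copy_chunk`, and `parallel_copy` are hypothetical names, and the worker body is a placeholder rather than real Oracle or Kudu client code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the half-open key range [lo, hi) into at most `parts` contiguous chunks."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    total = hi - lo
    if total <= 0:
        return []
    size, extra = divmod(total, parts)
    chunks, start = [], lo
    for i in range(parts):
        # The first `extra` chunks take one additional key so sizes differ by at most 1.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append((start, end))
        start = end
    return chunks


def copy_chunk(chunk):
    """Hypothetical worker: copy one key range from Oracle to Kudu.

    Placeholder only. A real worker would run a range query such as
    `WHERE id >= :start AND id < :end` against Oracle and apply the rows
    through a Kudu session owned by this worker.
    """
    start, end = chunk
    return end - start  # pretend every key in the range was copied


def parallel_copy(lo, hi, workers=4):
    """Run one worker per key range and return the total number of keys handled."""
    chunks = split_range(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(copy_chunk, chunks))
```

Whether threads, processes, or separate client machines work best depends on where the real bottleneck is, so it is worth measuring the Oracle read on its own before tuning the Kudu side.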

mpercy · ‎07-10-2018

Hi HJ, It is not possible to do a join using the native Kudu NoSQL API. You will need to use SQL with Impala or Spark SQL, or use the Spark DataFrame APIs, to do the join. Mike

mpercy · ‎07-09-2018

You're welcome. If that worked for you, please mark my response as the answer / solution to your question.

mpercy · ‎07-05-2018

Another option is to write a Spark job that uses multiple tasks to read from Oracle and write to Kudu in parallel, or something equivalent using multiple processes or threads.
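A minimal PySpark-flavored sketch of such a job follows. The function names are hypothetical, and it assumes the Oracle JDBC driver and the kudu-spark integration are on the Spark classpath; the Kudu data source name shown here may differ between kudu-spark versions, so treat it as an assumption to check against your release.

```python
def jdbc_read_options(url, table, column, lower, upper, partitions):
    """Build the options for a partitioned Spark JDBC read.

    Spark splits the read into `partitions` tasks by striding over `column`
    between `lower` and `upper`. The bounds only control the stride; they do
    not filter rows.
    """
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(partitions),
    }


def copy_oracle_to_kudu(spark, read_opts, kudu_master, kudu_table):
    """Read from Oracle with several tasks and append the rows to a Kudu table.

    `spark` is an existing SparkSession. The Kudu table is assumed to exist
    already with a compatible schema.
    """
    df = spark.read.format("jdbc").options(**read_opts).load()
    (df.write.format("org.apache.kudu.spark.kudu")  # assumed data source name
        .option("kudu.master", kudu_master)
        .option("kudu.table", kudu_table)
        .mode("append")
        .save())
```

The host names, table names, and bounds passed in are placeholders to fill in for your environment; credentials would normally be added to the read options as well.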

mpercy · ‎07-05-2018

One option is to export to Parquet on HDFS using Sqoop, then use Impala to run CREATE TABLE ... AS SELECT * FROM your Parquet table to create and populate your Kudu table. Unfortunately, Sqoop does not have support for Kudu at this time.

mpercy · ‎07-05-2018

You will have to use "dd" to remove the last record of the container file. The latest version of Kudu trunk (after 5.15) adds a --debug option to the "kudu pbc dump" tool that, if you compile it, will tell you the offset at which the file should be truncated. If you can't compile Kudu from source to obtain that tool, an easy option is to reformat the affected tablet server and start from scratch on that server, provided you have additional replicas. Another option is to use a hex editor to find the offset where a run of 0s begins at the end of the file and truncate the 0s off of the file. Make sure to make a backup copy of the container metadata file first. This problem will be prevented in a future release.
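For the hex-editor route, the search for the trailing run of 0s can be scripted. The sketch below is only an illustration with hypothetical function names. It knows nothing about Kudu's record format, so the offset it reports is a candidate: a valid record could itself end in zero bytes, and the offset reported by "kudu pbc dump --debug" should be preferred where available. Run it only with the tablet server stopped.

```python
import os
import shutil


def trailing_zero_offset(path, block_size=65536):
    """Return the offset at which the file's trailing run of zero bytes begins.

    Scans backwards in blocks, so large files are not read into memory at once.
    Returns the file size if there are no trailing zeros, or 0 if the file is all zeros.
    """
    end = os.path.getsize(path)
    with open(path, "rb") as f:
        while end > 0:
            start = max(0, end - block_size)
            f.seek(start)
            block = f.read(end - start)
            stripped = block.rstrip(b"\x00")
            if stripped:
                return start + len(stripped)
            end = start
    return 0


def truncate_trailing_zeros(path):
    """Back up the file to `path + '.bak'`, then cut off the trailing zeros.

    Returns the new file length. The backup is made first so that the
    original can be restored if the offset turns out to be wrong.
    """
    shutil.copy2(path, path + ".bak")
    offset = trailing_zero_offset(path)
    with open(path, "r+b") as f:
        f.truncate(offset)
    return offset
```

Calling `trailing_zero_offset()` on its own is a read-only way to inspect the file before deciding whether to truncate.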

mpercy · ‎06-26-2018

Hi Razee, this issue at startup is improved a lot in CDH 5.15; see https://www.cloudera.com/documentation/enterprise/release-notes/topics/kudu_release_notes.html#relnotes_5_15_0

From that page:

"The strategy Kudu uses for automatically healing tablets which have lost a replica due to server or disk failures has been improved. The new re-replication strategy, or replica management scheme, first adds a replacement tablet replica before evicting the failed one. With the previous replica management scheme, the system first evicts the failed replica and then adds a replacement. The new replica management scheme allows for much faster recovery of tablets in scenarios where one tablet server goes down and then returns back shortly after 5 minutes or so. The new scheme also provides substantially better overall stability on clusters with frequent server failures. See KUDU-1097 for more information."

Mike

mpercy · ‎05-17-2018

Yes -- I saw the same behavior, and the workaround is adding the symlinks I detailed in a previous comment in this thread.

mpercy · ‎05-17-2018

With CentOS 7, I think the only thing required to make "pip install kudu-python" work out of the box is to add the symlinks I mentioned. I'm going to work on getting those added by default as part of the parcel install in a future release.

Member Since	‎04-08-2014 11:43 PM
Last Visited	‎05-09-2019 02:10 AM
Posts	70
Kudos received	19

Cloudera Community

Re: Apache kudu

Re: Low Performance with Kudu and potential networ...

Re: kudu Data length checksum does not match

Re: Kudu start up - ksck: table consistency check ...

Re: Using kudu with Python
