Member since: 04-08-2014
Posts: 70
Kudos Received: 20
Solutions: 12
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6838 | 07-16-2018 04:12 PM |
| | 6953 | 07-13-2018 03:17 PM |
| | 7453 | 07-10-2018 03:00 PM |
| | 7127 | 07-10-2018 02:54 PM |
| | 7708 | 07-05-2018 03:35 PM |
07-11-2018
02:54 PM
Sounds like good news. Thanks for the update!
07-10-2018
03:00 PM
1 Kudo
Are you sure the bottleneck is Kudu? It may be reading from Oracle instead. Using the Kudu AUTO_FLUSH_BACKGROUND flush mode should give pretty fast write throughput. See https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html You can also try increasing the KuduSession.setMutationBufferSpace() value, and take a look at your partitioning scheme. If you want more parallelism, consider scanning different ranges in Oracle with different processes or threads, on the same or different client machines, and performing more parallelized writes to Kudu.
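As a rough illustration of the "different ranges with different threads" idea, here is a minimal Python sketch. It only shows the range-splitting and fan-out logic. The names `split_range`, `copy_in_parallel`, and the `copy_range` callback are hypothetical, and the actual Oracle read and Kudu write are left to whatever client you use. It also assumes the source table has a numeric key you can range over.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the half-open key range [lo, hi) into up to `parts` contiguous sub-ranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if hi <= lo:
        return []
    step, extra = divmod(hi - lo, parts)
    ranges = []
    start = lo
    for i in range(parts):
        # The first `extra` sub-ranges get one additional key each.
        end = start + step + (1 if i < extra else 0)
        if end > start:
            ranges.append((start, end))
        start = end
    return ranges


def copy_in_parallel(lo, hi, workers, copy_range):
    """Call copy_range(start, end) for each sub-range, one thread per sub-range.

    copy_range is a hypothetical callback: it would scan Oracle for
    start <= key < end and write those rows to Kudu with its own session.
    """
    ranges = split_range(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: copy_range(*r), ranges))


if __name__ == "__main__":
    # Stand-in callback that just reports how many keys each worker would copy.
    print(copy_in_parallel(0, 10, 3, lambda start, end: end - start))
```

Each worker would normally open its own source connection and its own Kudu session, so the workers do not contend on a shared buffer.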
07-10-2018
02:54 PM
1 Kudo
Hi HJ, It is not possible to do a join using the native Kudu NoSQL API. You will need to use SQL with Impala or Spark SQL, or use the Spark DataFrame APIs, to do the join. Mike
07-09-2018
11:41 AM
You're welcome. If that worked for you, please mark my response as the answer / solution to your question.
07-05-2018
03:35 PM
1 Kudo
Another option is to write a Spark job that uses multiple tasks to read from Oracle and write to Kudu in parallel, or something equivalent using multiple processes or threads.
07-05-2018
03:27 PM
One option is to export to Parquet on HDFS using Sqoop, then use Impala to run a CREATE TABLE AS SELECT * FROM your Parquet table to load the data into your Kudu table. Unfortunately, Sqoop does not support Kudu at this time.
07-05-2018
10:44 AM
2 Kudos
You will have to use "dd" to remove the last record of the container file. The latest version of Kudu trunk (after 5.15) adds a --debug option to the "kudu pbc dump" tool that will tell you the offset at which you should truncate the file, if you compile it. If you can't compile Kudu from source to obtain that tool, then an easy option is to reformat the affected tablet server and start from scratch on that server, provided you have additional replicas. Another option is to use a hex editor to find the offset where a run of 0s begins at the end of the file and truncate the 0s off the file. Make sure to make a backup copy of the container metadata file first. This will be prevented in a future release.
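If a hex editor is awkward, the "find the run of 0s and truncate it" step can be sketched in a few lines of Python. This is not the dd or "kudu pbc dump" approach, just an illustrative alternative, and the function names here are hypothetical. It naively assumes every trailing 0x00 byte is garbage, which may not hold if the last valid record happens to end in a zero byte, so compare the reported offset against the file before trusting it.

```python
import os
import shutil


def trailing_zero_offset(path, chunk_size=64 * 1024):
    """Return the offset where the final run of 0x00 bytes begins.

    Returns the file size if the file does not end in zero bytes.
    """
    end = os.path.getsize(path)
    with open(path, "rb") as f:
        # Walk backwards through the file one chunk at a time.
        while end > 0:
            start = max(0, end - chunk_size)
            f.seek(start)
            chunk = f.read(end - start)
            stripped = chunk.rstrip(b"\x00")
            if stripped:
                # Last non-zero byte found; the zeros start right after it.
                return start + len(stripped)
            end = start
    return 0


def truncate_trailing_zeros(path, backup_suffix=".bak"):
    """Copy the file to path + backup_suffix, then cut off the trailing zeros."""
    offset = trailing_zero_offset(path)
    shutil.copy2(path, path + backup_suffix)  # backup first, as advised above
    with open(path, "r+b") as f:
        f.truncate(offset)
    return offset
```

Running `trailing_zero_offset` on its own is read-only, so it is a safe way to check the offset before deciding whether to truncate.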
06-26-2018
02:40 PM
1 Kudo
Hi Razee, this issue at startup is improved a lot in CDH 5.15, see https://www.cloudera.com/documentation/enterprise/release-notes/topics/kudu_release_notes.html#relnotes_5_15_0
From that page:
"The strategy Kudu uses for automatically healing tablets which have lost a replica due to server or disk failures has been improved. The new re-replication strategy, or replica management scheme, first adds a replacement tablet replica before evicting the failed one. With the previous replica management scheme, the system first evicts the failed replica and then adds a replacement. The new replica management scheme allows for much faster recovery of tablets in scenarios where one tablet server goes down and then returns back shortly after 5 minutes or so. The new scheme also provides substantially better overall stability on clusters with frequent server failures. See KUDU-1097 for more information."
Mike
05-17-2018
03:17 PM
1 Kudo
Yes -- I saw the same behavior, and the workaround is adding the symlinks I detailed in a previous comment in this thread.
05-17-2018
02:37 PM
With CentOS 7, I think the only thing required to make "pip install kudu-python" work out of the box is to add the symlinks I mentioned. I'm going to work on getting those added by default as part of the parcel install in a future release.