Member since
12-02-2017
9
Posts
0
Kudos Received
0
Solutions
10-14-2019
04:46 PM
@Plop564 I am not an expert in Spark, but my understanding is below:

1. I will have 100 output files
>>> This depends on how many partitions your original DF has. "coalesce" can only reduce the number of partitions, so if you have fewer than 100 partitions beforehand it won't do anything, because "coalesce" does not shuffle. If you want to guarantee the number of output files, I believe the "repartition" function is better.

2. Each single CSV file is locally sorted, I mean by the "date" column ascending
>>> Yes

3. Files are globally sorted, I mean CSV part-0000 has "date" values lower than CSV part-0001, CSV part-0001 has "date" values lower than CSV part-0002, and so on
>>> I believe it is also Yes, but will wait for other Spark experts to confirm.

Cheers
Eric
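To make the three points easier to check, here is a minimal plain-Python sketch. It is not Spark code and does not call any Spark API: `coalesced_partition_count` is only a toy model of the "coalesce can only reduce" rule described above, and the two `is_*_sorted` helpers are hypothetical names for checking points 2 and 3 on the "date" values read back from each output file (assumed here to be ISO-formatted strings, so that string order matches date order).

```python
from typing import Sequence


def coalesced_partition_count(current: int, requested: int) -> int:
    """Toy model (an assumption, not Spark itself): coalesce never raises the count."""
    return min(current, requested)


def is_locally_sorted(parts: Sequence[Sequence[str]]) -> bool:
    """Point 2: every file is sorted ascending on its own."""
    return all(list(p) == sorted(p) for p in parts)


def is_globally_sorted(parts: Sequence[Sequence[str]]) -> bool:
    """Point 3: files are locally sorted and each file ends before the next begins."""
    if not is_locally_sorted(parts):
        return False
    non_empty = [p for p in parts if p]  # empty files cannot break the order
    return all(a[-1] <= b[0] for a, b in zip(non_empty, non_empty[1:]))


# Made-up "date" values per output file, in part-0000, part-0001, ... order.
parts_ok = [["2019-01-01", "2019-01-02"], ["2019-01-03"], ["2019-01-04", "2019-01-05"]]
parts_bad = [["2019-01-01", "2019-01-04"], ["2019-01-02", "2019-01-03"]]

print(coalesced_partition_count(40, 100))   # fewer than 100 partitions: count unchanged
print(coalesced_partition_count(400, 100))  # more than 100 partitions: reduced
print(is_locally_sorted(parts_bad), is_globally_sorted(parts_bad))
print(is_globally_sorted(parts_ok))
```

Running the two checks on the real output files (rather than relying on my belief for point 3) would settle whether the global order survives the write.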
09-10-2018
08:00 PM
Here's what I do to build Apache Oozie 5.x from CDH6 (6.0.0) via sources:
~> git clone https://github.com/cloudera/oozie.git
~> cd oozie/ && git checkout cdh6.0.0
~> bin/mkdistro.sh -DskipTests -Puber
… (takes ~15+ minutes if building for the first time) …
~> ls -lh distro/target/
# Look for oozie-5.0.0-cdh6.0.0-distro.tar.gz
01-12-2018
02:34 AM
Hi @Tim Armstrong Thanks for the quality answer 🙂 As you mention, we don't run the latest Impala version in production, so it is indeed possible that there are bugs in Impala or in the UDF/UDAF. I will check the changelog and evaluate possible issues with those very large requests. Regarding our self-made UDF, the good news is that, after reviewing our log history, this error was also triggered before the deployment of that UDF. So if there are memory leaks currently, they may be unrelated to our work, and it may be a minor issue since we just have to wait for the cluster (and Impala version) upgrade. Otherwise, many thanks for the implementation details you gave me, they help me understand better!
12-10-2017
11:17 AM
Thanks again !
12-10-2017
11:14 AM
Hi @Tim Armstrong Thank you very much for the reply !