Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3009 | 01-26-2018 04:02 AM
 | 6344 | 12-22-2017 09:18 AM
 | 3039 | 12-05-2017 06:13 AM
 | 3303 | 10-16-2017 07:55 AM
 | 9412 | 10-04-2017 08:08 PM
01-26-2018 04:02 AM
1 Kudo
I know these are well-known feature requests, and ones I share. I don't know that they are planned for any particular release, but I'm sure they are already tracked as possible features.
12-29-2017 05:29 AM
1 Kudo
You can make a DataFrame over all the files and then filter out the lines you don't want, or you can make a DataFrame for just the files you want and then union them together. Both are viable. If you're saying different data types are mixed into sections of each file, that's harder, as you'd need to use something like mapPartitions to carefully process each file 3 times.
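As a minimal Scala sketch of both approaches (the paths and the leading-"#" filter are assumptions for illustration, not from the original question):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-files").getOrCreate()

// Approach 1: read every file with a glob, then filter out unwanted lines.
val all = spark.read.textFile("/data/input/*.txt")
val filtered = all.filter(line => !line.startsWith("#"))  // hypothetical marker

// Approach 2: read only the files you want, then union the results.
val a = spark.read.textFile("/data/input/file1.txt")
val b = spark.read.textFile("/data/input/file2.txt")
val combined = a.union(b)
```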
12-27-2017 05:58 AM
Not sure what you're trying to do there, but it looks like you have a simple syntax error: bucketBy is a method. Please start with the API docs.
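For reference, a minimal Scala sketch of a valid bucketBy call; the sample data, bucket count, and table name are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bucketBy-example").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")  // hypothetical data

// bucketBy(numBuckets, colName) is a method on DataFrameWriter, and a
// bucketed write has to end in saveAsTable rather than save.
df.write
  .bucketBy(4, "id")
  .sortBy("id")
  .saveAsTable("bucketed_example")
```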
12-27-2017 05:57 AM
I think you mean something like df.write.mode(SaveMode.Overwrite).saveAsTable(...)? It depends on what language this is.
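In Scala, for example (the DataFrame and table name are hypothetical):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("overwrite-example").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "value")  // hypothetical data

// Replace the table's contents if it already exists.
df.write.mode(SaveMode.Overwrite).saveAsTable("example_table")
```

In Python, the equivalent is df.write.mode("overwrite").saveAsTable(...).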
12-22-2017 09:18 AM
1 Kudo
This looks like a mismatch between the version of pandas that Spark uses on the driver and whatever is installed on the workers for the executors.
12-05-2017 06:13 AM
It now installs using Cloudera Manager, so yes, you want the host to be part of the CM cluster in order to assign it to the Workbench.
11-22-2017 05:56 AM
1 Kudo
I'm not sure how you would do that. We support spark-submit and the Workbench, not Jupyter. It's clear how to configure spark-submit, and you configure the Workbench with spark-defaults.conf. You can see your Spark job's configuration in its UI, on the Environment tab.
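For example, a few spark-defaults.conf entries; the property names are standard Spark settings, and the values are illustrative assumptions:

```
spark.executor.memory   4g
spark.executor.cores    2
spark.driver.memory     2g
```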
11-22-2017 05:42 AM
This has nothing to do with CM. It has to do with your app's memory configuration. The relevant settings are right there in the error.
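As a hedged sketch, the usual knobs look like this in spark-defaults.conf (or as --conf arguments to spark-submit); the values are illustrative, and the right settings to raise are the ones your error names:

```
spark.driver.memory                  2g
spark.executor.memory                4g
# Off-heap overhead in MB, for YARN containers killed over their memory limit
spark.yarn.executor.memoryOverhead   512
```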
11-17-2017 01:49 AM
If you have 4 topics with 3 partitions each, then you need 12 executor slots to process them fully in parallel, and you have only 3 slots. If you are using receiver-based streaming, you may need 1 more, too. Also, 1 core per executor is generally very low. Your result is therefore not surprising, and your second config is much more reasonable.
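To make the arithmetic concrete: 4 topics × 3 partitions = 12 partitions to consume, so you want 12 concurrent task slots. One way to provision that (standard Spark-on-YARN properties; the split between executors and cores is just one illustrative choice):

```
# 4 executors x 3 cores = 12 task slots
spark.executor.instances  4
spark.executor.cores      3
```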
11-13-2017 11:14 AM
1 Kudo
No, it requires Spark 2.