Member since 01-16-2014
336 Posts
43 Kudos Received
31 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4135 | 12-20-2017 08:26 PM |
| | 3889 | 03-09-2017 03:47 PM |
| | 3299 | 11-18-2016 09:00 AM |
| | 6285 | 05-18-2016 08:29 PM |
| | 4714 | 02-29-2016 01:14 AM |
06-29-2015
07:02 PM
Thank you for the answer, Sean. I actually misspoke: I just need to upgrade to Spark 1.3 (I'm currently using Spark 1.2). I've been trying to use this guide: https://s3.amazonaws.com/quickstart-reference/cloudera/hadoop/latest/doc/Cloudera_EDH_on_AWS.pdf However, I am still only getting Spark 1.2. Do you have any suggestions on how I can use this guide to get Spark 1.3?
06-29-2015
08:56 AM
Thanks for the explanation. And thanks for tolerating me extending the original question. This issue can be closed.
06-16-2015
06:14 PM
Using class path precedence is not the correct solution for all cases. A solution that works in all cases is to shade the classes that you have modified (Maven or Gradle can do that for you). In your case, you need to shade the Parquet classes that you have modified when you package the jar. Be careful if you change classes like Parquet's: you could end up with files that are only readable with your code, which would force you to keep packaging it with all jobs. That could cause problems later if you decide to use a different method to access the files.
Wilfred
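One way to sanity-check that shading took effect is to list the entries of the packaged jar and confirm that no class files remain under the original package path. The sketch below is only an illustration, not part of the original answer: the package prefix `parquet`, the relocated prefix `myshaded/parquet`, and the helper names are all assumptions, so substitute whatever your Maven or Gradle relocation rule actually uses.

```python
import io
import zipfile


def unshaded_classes(jar, original_prefix):
    """Return the .class entries that still live under the original package.

    `jar` is a path or file-like object for the packaged jar (a jar is a zip).
    `original_prefix` is the package you meant to relocate, e.g. "parquet"
    (an assumed name; use the package of the classes you modified).
    """
    prefix = original_prefix.replace(".", "/").rstrip("/") + "/"
    with zipfile.ZipFile(jar) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.startswith(prefix) and name.endswith(".class")
        )


def build_demo_jar(entries):
    """Build a tiny in-memory jar with empty entries, for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in entries:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Hypothetical jar contents: one class was relocated, one was missed.
    demo = build_demo_jar(
        [
            "META-INF/MANIFEST.MF",
            "myshaded/parquet/hadoop/ParquetWriter.class",
            "parquet/hadoop/ParquetReader.class",
        ]
    )
    leftovers = unshaded_classes(demo, "parquet")
    if leftovers:
        print("Still under the original package:", leftovers)
    else:
        print("No classes left under the original package.")
```

An empty result suggests the relocation covered everything under that prefix; any entry that is listed would still compete with the cluster's own copy on the class path.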
06-09-2015
08:28 PM
Can you check the path separator? I would have expected that on Windows you would use \ and not /. Can you also explain how you start PySpark: do you use the cmd scripts, or do you run it under Cygwin? BTW: we do not test Windows as a client, so you might be running into a known issue.
Wilfred
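To make the separator question concrete, here is a small sketch using Python's standard `ntpath` module, which applies Windows path rules on any platform. The example path and the function name are made up for illustration; whether the separator is actually the cause of the problem in this thread is not confirmed.

```python
import ntpath


def to_windows_path(path):
    """Normalize a path using Windows rules, turning / into \\."""
    return ntpath.normpath(path)


if __name__ == "__main__":
    # Hypothetical path written with forward slashes.
    original = "C:/spark/python/lib"
    print(original, "->", to_windows_path(original))
```

Under Cygwin the shell generally expects forward slashes, so which form is right depends on how PySpark is launched.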
05-26-2015
06:11 PM
In CM & CDH 5.4 you should unset it and let it use the one that is there on the nodes. Much faster. Wilfred
05-25-2015
05:13 PM
1 Kudo
A1: Check the HDFS Design page for details on what is stored where. The edits log and file system image are on the NN. Look for the section on the persistence of file system metadata. For more detail on setting up the cluster, follow Cluster Setup.
A2: If you have the disk, then having a mirrored disk will make it more resilient. Making a backup is still a good idea 😉
Wilfred
05-14-2015
04:39 AM
Thank you for the explanation! BTW, where do you see the 'Comments'? I don't have that! << The comment on the setting in CM should have explained it for you: "Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx." >>
05-13-2015
08:21 PM
I was able to fix the issue. We use Chef to set up the HBase configuration on the worker node, but there was a problem with the Chef settings which caused the HBase connection setting to be missing on the worker node. After I fixed Chef, the HBase connection was set up fine. Thanks