Member since 08-18-2017 | 6 posts | 1 kudos received | 1 solution
09-10-2017 09:23 PM
@Geoffrey Shelton Okot you're absolutely right, thanks a lot! The first time I used "/fast_nfs/yarn/local" I was losing containers (I was getting "file not found" error messages), and I was struggling to understand why. Then tiredness made me think of using hdfs:/// instead, and I ended up sending this dumb question. My initial problem was caused by using the same NFS filesystem and path for all the nodes, so every NodeManager was writing to the same directory. After mounting a separate /fast_nfs/nodeXXX/ directory on each node (where nodeXXX contains the yarn/local subdirectory) and adjusting permissions, it worked perfectly. In other words, node1 gets /fast_nfs/node001 mounted as /scratch, node2 gets /fast_nfs/node002 mounted as /scratch, and so on: /fast_nfs is the NFS filesystem, each node mounts its own subdirectory of it, and yarn.nodemanager.local-dirs is set to "/scratch/yarn/local" on every node.
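The per-node layout described above can be sketched in Python. This is a minimal illustration, not the exact setup from the post: the NFS server hostname `nfs-server`, the `defaults` mount options, and the helper names `node_export`, `fstab_line`, and `yarn_local_dirs_property` are all hypothetical. It shows the key point: every node gets a different backing directory on /fast_nfs, while yarn.nodemanager.local-dirs keeps the same value, /scratch/yarn/local, everywhere.

```python
import xml.etree.ElementTree as ET

NFS_ROOT = "/fast_nfs"                   # shared NFS filesystem (from the post)
MOUNT_POINT = "/scratch"                 # same mount point on every node
LOCAL_DIRS = MOUNT_POINT + "/yarn/local"
NFS_SERVER = "nfs-server"                # hypothetical NFS server hostname


def node_export(node_index: int) -> str:
    """Per-node subdirectory on the shared NFS filesystem, e.g. 1 -> /fast_nfs/node001."""
    return f"{NFS_ROOT}/node{node_index:03d}"


def fstab_line(node_index: int) -> str:
    """Hypothetical /etc/fstab entry mounting this node's own subdir at /scratch."""
    return f"{NFS_SERVER}:{node_export(node_index)} {MOUNT_POINT} nfs defaults 0 0"


def yarn_local_dirs_property(value: str = LOCAL_DIRS) -> str:
    """yarn-site.xml <property> element; the value is identical on every node."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.nodemanager.local-dirs"
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    for i in (1, 2, 3):
        print(fstab_line(i))
    print(yarn_local_dirs_property())
```

Because each node mounts its own subdirectory, the NodeManagers no longer share a single local directory even though the configured path is the same. Setting directory ownership and permissions, which the post also mentions, is not covered by this sketch.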
09-01-2017 04:37 PM
Since there doesn't seem to be an easy way to convert a DataFrame into an RDD in Spark Structured Streaming, I found a way to manipulate the dataset to fit my needs instead. I used the split function from the pyspark.sql.functions module to split the DataFrame's column (a string containing the independent variables for my ML model) into several new columns, and then I used the VectorAssembler class from pyspark.ml.feature to merge the new columns into a single vector column.