Member since: 05-17-2016
Posts: 190
Kudos Received: 46
Solutions: 11
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1387 | 09-07-2017 06:24 PM
 | 1790 | 02-24-2017 06:33 AM
 | 2575 | 02-10-2017 09:18 PM
 | 7066 | 01-11-2017 08:55 PM
 | 4707 | 12-15-2016 06:16 PM
07-15-2016
06:10 PM
Could you please post a little more information on the job, the submit command, etc.? What is your data source?
07-14-2016
06:48 PM
I guess if the data set does not contain a '\t' character, then '\t'.join and saveAsTextFile should work for you. Otherwise, you just need to wrap the strings in double quotes, as with normal CSVs.
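A minimal PySpark sketch of both cases (the sample data and output paths here are placeholders, not from the original thread):

```python
# Placeholder data; assumes an existing SparkContext `sc`.
rdd = sc.parallelize([("a", "b"), ("c", "d")])

# Case 1: no tabs in the data, so join the fields directly.
rdd.map(lambda fields: "\t".join(fields)).saveAsTextFile("/tmp/out-tsv")

# Case 2: fields may contain tabs, so quote them CSV-style first.
def quote(field):
    return '"' + field.replace('"', '""') + '"'

rdd.map(lambda fields: "\t".join(quote(f) for f in fields)) \
   .saveAsTextFile("/tmp/out-tsv-quoted")
```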
07-14-2016
02:23 PM
Could you provide more details on the RDD that you would like to save tab-delimited? On the question about storing DataFrames as a tab-delimited file, below is what I have in Scala using the spark-csv package: df.write.format("com.databricks.spark.csv").option("delimiter", "\t").save("output path")
EDIT: With the RDD of tuples, as you mentioned, you could either join the tuple fields with "\t" or use mkString if you prefer not to use an additional library. On your RDD of tuples you could do something like .map { x => x.productIterator.mkString("\t") }.saveAsTextFile("path-to-store")
@Don Jernigan
07-13-2016
08:58 PM
Is your RDD an RDD of strings? On the second part of the question, if you are using spark-csv, the package supports saving simple (non-nested) DataFrames. There is an option to specify the delimiter, which is ',' by default but can be changed, e.g. .save('filename.csv', 'com.databricks.spark.csv', delimiter="DELIM")
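For reference, a hedged PySpark sketch of the same save using the Spark 1.4+ writer API (the DataFrame contents, delimiter, and output path are placeholders; spark-csv must be on the classpath, e.g. via --packages com.databricks:spark-csv_2.10:1.5.0):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="csv-save-demo")
sqlContext = SQLContext(sc)

# Placeholder DataFrame; replace with your own.
df = sqlContext.createDataFrame([("a", 1), ("b", 2)], ["name", "count"])

# Save with a custom delimiter via spark-csv (default is ',').
df.write.format("com.databricks.spark.csv") \
    .option("delimiter", "\t") \
    .save("/tmp/out-tsv")
```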
06-08-2016
06:56 PM
1 Kudo
The difference is noticeable only when we run in cluster mode without actually knowing where the driver is. In the other case, if we know where the driver is set to launch, both methods behave similarly.
--files is a submit-time parameter; main() can run anywhere and only needs to know the file name. In code, I can refer to the file by a file:// URI.
In the case of addFile(), since this is a code-level setting, main() needs to know the file location in order to perform the add. As per the API doc, the path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS, or FTP URI.
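A hedged PySpark sketch of the two approaches (the file names and paths are made up for illustration):

```python
# Submit-time approach: ship the file with the job, e.g.
#   spark-submit --files /local/path/app.conf my_job.py
# main() only needs the base name of the file.

from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="files-demo")

# Code-level approach: main() must know where the file lives.
# A local path, an HDFS path, or an HTTP/HTTPS/FTP URI all work.
sc.addFile("hdfs:///config/app.conf")

# In both cases the shipped copy can be resolved by its base name:
with open(SparkFiles.get("app.conf")) as f:
    print(f.read())
```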
06-08-2016
03:05 PM
@Benjamin Leonhardi Thanks for pointing this out. I overlooked this flag.
06-08-2016
02:08 PM
@Rajkumar Singh, doesn't the application.properties file need to be in a key-value format?
06-08-2016
01:58 PM
Thanks @Jitendra Yadav. I will take a look at the addFile API. I would like to try getting control of the driver, as clukasik pointed out.
06-08-2016
01:55 PM
@clukasik, thank you. I have had a look at broadcast variables, but I guess with the current requirement I just need the RDD.