Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3440 | 01-26-2018 04:02 AM
 | 7076 | 12-22-2017 09:18 AM
 | 3535 | 12-05-2017 06:13 AM
 | 3847 | 10-16-2017 07:55 AM
 | 11192 | 10-04-2017 08:08 PM
08-24-2015
12:00 AM
Actually, I don't know the exact reasons; I was stuck on this problem for a few days, even with the firewalls on all machines disabled from the very start. I used to deploy Hadoop, Spark, and so on by extracting source tarballs. Fortunately, an edge node seems to be a good way to access cluster resources.
08-06-2015
02:36 AM
I don't think it has to do with functional programming per se, but yes, it's because the function/code being executed has to be sent from the driver to the executors, and so the function object itself must be serializable. It has no relation to security.
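As a hedged illustration (the class names here are invented, not from the thread): the usual way this surfaces in Spark is that referencing a constructor parameter inside an RDD closure captures the whole enclosing object, which then fails to serialize.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical example: `factor` is a constructor parameter, so using it in
// the closure captures `this`, and Multiplier itself is not Serializable.
class Multiplier(factor: Int) {
  def scale(rdd: RDD[Int]): RDD[Int] =
    rdd.map(_ * factor)  // fails at runtime: task not serializable
}

// Copying the value to a local val means only that val is captured,
// so nothing non-serializable is shipped from the driver to the executors.
class FixedMultiplier(factor: Int) {
  def scale(rdd: RDD[Int]): RDD[Int] = {
    val f = factor
    rdd.map(_ * f)
  }
}
```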
08-05-2015
11:05 AM
Calling persist() on an RDD only marks it for persistence; the data is actually persisted later, when something causes the RDD to be computed for the first time. It is not immediately evaluated.
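A minimal sketch, assuming an existing SparkContext `sc` and a made-up input path:

```scala
// persist() is lazy: it marks the RDD for caching but computes nothing yet
val lengths = sc.textFile("data.txt").map(_.length)
lengths.persist()

val n1 = lengths.count()  // first action: computes the RDD and fills the cache
val n2 = lengths.count()  // second action: served from the persisted copy
```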
07-27-2015
04:16 AM
The first case is: read - shuffle - persist - count. The second case is: read (from the persisted copy) - count. You are right that coalesce does not always shuffle, but it may in this case; it depends on whether you started with more or fewer partitions. You should look at the Spark UI to see whether a shuffle occurred.
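A small sketch of the distinction, assuming an existing RDD named `rdd` (the partition counts are invented):

```scala
val fewer = rdd.coalesce(10)                  // reducing partitions: can avoid a shuffle
val more  = rdd.coalesce(200, shuffle = true) // growing partitions: requires a shuffle

fewer.persist()
fewer.count()  // first count: read - (possible) shuffle - persist - count
fewer.count()  // second count: read from the persisted copy - count
```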
07-26-2015
02:20 AM
I don't think that was the problem; I changed the code as below and it worked. The issue was in the toDF method:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.SparkConf
import sys.process._

class cc extends Runnable {
  val conf = new SparkConf().setAppName("LoadDW")
  val sc = new SparkContext(conf)
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext.implicits._

  override def run(): Unit = {
    var fileName = "DimCustomer.txt"
    val fDimCustomer = sc.textFile("DimCustomer.txt")

    val schemaString = "ID Name City EffectiveFrom EffectiveTo"

    // Build the DataFrame from an explicit schema instead of relying on toDF
    val schema = StructType(List(
      StructField("ID", IntegerType, true),
      StructField("Name", StringType, true),
      StructField("City", StringType, true),
      StructField("EffectiveFrom", IntegerType, true),
      StructField("EffectiveTo", IntegerType, true)
    ))

    println("----->>>>>>sdsdsd2222\n")

    var dimCustomerRDD = fDimCustomer.map(_.split(','))
      .map(r => Row(r(0).toInt, r(1), r(2), r(3).toInt, r(4).toInt))

    var customerDataFrame = sqlContext.createDataFrame(dimCustomerRDD, schema)
    customerDataFrame.registerTempTable("Cust_1")

    val customers = sqlContext.sql("select * from Cust_1")
    customers.show()
    println("+")
  }
}

object pp extends App {
  val cp = new cc()
  val rThread = new Thread(cp)
  rThread.start()
}
```
07-02-2015
12:14 AM
It is just polling HDFS for new files, on the order of every ~5 minutes or so. No, that message is exactly from this process of refreshing the model by looking for a new one. "No available generation" means no models have been built yet. There's a delay between the time new data arrives -- which could include a new user or item -- and when it is incorporated into a model. It could be a long time, depending on how long your model builds take. When a new model arrives, you can't just drop all existing users, since the new model won't have any info about very new users or items. This mechanism helps keep track of which users/items should be retained in memory even if they do not exist in the new model. The new model replaces the old one user-by-user and item-by-item rather than by loading an entire new model at once. Yes, you have a state with old and new data at once, but this is fine for recommendations; they're not incompatible. It's just the current and newer state of an estimate of the user/item vectors.
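For illustration only, a hypothetical sketch (not the project's actual code) of what a user-by-user swap with retention of very new users might look like:

```scala
// Hypothetical merge: new vectors overwrite old ones entry by entry, and
// users too new to appear in the incoming model are kept if recently seen.
def mergeModels(current: Map[String, Array[Float]],
                incoming: Map[String, Array[Float]],
                recentlySeen: Set[String]): Map[String, Array[Float]] = {
  val updated = current ++ incoming  // replace user-by-user, not wholesale
  updated.filter { case (user, _) =>
    incoming.contains(user) || recentlySeen.contains(user)  // retain new users
  }
}
```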
06-29-2015
07:02 PM
Thank you Sean for the answer. I actually misspoke and just need to upgrade to Spark 1.3 (I'm using Spark 1.2). I've been trying to use this guide: https://s3.amazonaws.com/quickstart-reference/cloudera/hadoop/latest/doc/Cloudera_EDH_on_AWS.pdf But I am still only getting Spark 1.2. Do you have any suggestions on how I can use this guide to get Spark 1.3?
06-29-2015
08:01 AM
Cool. I just read your changes, and it seems they only impact the local computation (not the Hadoop computation). Correct? Yes, I know the Hadoop computation is already doing the right thing and needs no fix.
06-23-2015
09:16 AM
Hi Sean, You are right. It has to do with config. I have figured it out. Thanks so much! Ying