Member since 03-20-2016
56 Posts
18 Kudos Received
0 Solutions
06-06-2016 03:45 PM
Hi John, I would recommend reading the paper "Spark SQL: Relational Data Processing in Spark", which describes the Catalyst Optimizer steps you mention in more detail: https://web.eecs.umich.edu/~prabal/teaching/resources/eecs582/armbrust15sparksql.pdf
05-30-2016 12:11 PM
Thanks a lot for your help! Now I get it!
05-26-2016 01:11 AM
Hi, thanks for your answer, but I'm not following. I think the answer I accepted fixed the issue: when I start the shell with spark-shell --master spark://masterhost:7077, the web UI on port 8080 shows:
Cores in use: 4 Total, 4 Used
Memory in use: 4.0 GB Total, 2.0 GB Used
Applications: 1 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
So it seems to be working already when spark-shell is started that way, right? Or are you suggesting it should be spark-shell --master "local" spark:///mastehost:7077?
05-19-2016 10:04 PM
Please note: I modified the original comment above because it allocated too much PermGen space. I changed the value from 8192M to the following, which would require a total of 3 GB of RAM to run spark-shell: "-XX:MaxPermSize=1024M -Xmx2048m"
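The 3 GB figure is the PermGen cap plus the maximum heap: 1024 MB + 2048 MB = 3072 MB. The small Python sketch below reads both sizes out of the option string to make the arithmetic explicit; jvm_size_mib is a hypothetical helper and only understands sizes with a k/m/g suffix. Treat the result as a lower bound, since the JVM also needs memory for thread stacks, the code cache and other native allocations. Also note that on Java 8 PermGen was replaced by Metaspace, and the JVM ignores -XX:MaxPermSize there with a warning.

```python
import re

# Hypothetical helper: size suffixes accepted by this sketch, converted to MiB.
_TO_MIB = {"k": 1 / 1024, "m": 1, "g": 1024}


def jvm_size_mib(options: str, flag: str) -> float:
    """Return the size given to `flag` (e.g. '-Xmx') in MiB.

    Only values with a k/m/g suffix (either case) are recognised.
    """
    match = re.search(re.escape(flag) + r"(\d+)([kKmMgG])\b", options)
    if match is None:
        raise ValueError(f"{flag} with a k/m/g size not found in {options!r}")
    number, unit = match.groups()
    return int(number) * _TO_MIB[unit.lower()]


opts = "-XX:MaxPermSize=1024M -Xmx2048m"
perm = jvm_size_mib(opts, "-XX:MaxPermSize=")  # PermGen cap
heap = jvm_size_mib(opts, "-Xmx")  # maximum heap
total = perm + heap
# prints: PermGen 1024 MiB + heap 2048 MiB = 3072 MiB (3 GiB)
print(f"PermGen {perm:g} MiB + heap {heap:g} MiB = {total:g} MiB ({total / 1024:g} GiB)")
```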
05-09-2016 11:26 AM
@John Cod Unfortunately, that is too vague. It also looks to me like you are using MapReduce. May I ask which distribution you are using? If it's CDH, you won't have Tez, and Hive will be slow. Cloudera has its own query engine, Impala, and is now moving to Hive on Spark, so they do not really support the latest open-source Hive. On CDH I would go with Parquet + Impala (or switch to Hive on HDP or any other open Hadoop distribution).
05-06-2016 07:06 PM
Thanks for your help. Do you also know what the DAG visualization of the jobs that run after we execute a query represents? Does it show the physical plan or the logical plan?
04-29-2016 01:01 PM
On each node, /var/log/hadoop/hdfs/hadoop-hdfs-datanode-<hostname>.log contains the DataNode log and /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-<hostname>.log contains the NodeManager log. You can also look at the .out files with the same names in the same directories.
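To check both logs quickly on a node, here is a minimal Python sketch that builds the paths above (.log and .out) and prints the last lines of each file. The helper names are hypothetical and the paths are the ones from this post. It assumes that <hostname> in the file name matches what socket.gethostname() returns on that node, which may be the short name or the FQDN depending on how the node is configured.

```python
import socket
from collections import deque
from pathlib import Path
from typing import Dict, List, Optional

# File name patterns from the post; "{host}" stands for <hostname>.
_PATTERNS = {
    "datanode": "/var/log/hadoop/hdfs/hadoop-hdfs-datanode-{host}",
    "nodemanager": "/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-{host}",
}


def node_log_files(host: Optional[str] = None) -> Dict[str, List[Path]]:
    """Return the .log and .out paths for each service on the given node."""
    host = host or socket.gethostname()  # default: this machine's hostname
    return {
        service: [Path(pattern.format(host=host) + ext) for ext in (".log", ".out")]
        for service, pattern in _PATTERNS.items()
    }


def tail(path: Path, n: int = 50) -> List[str]:
    """Return the last n lines of path, or [] if the file does not exist."""
    if not path.is_file():
        return []
    with path.open(errors="replace") as fh:
        return list(deque(fh, maxlen=n))  # keeps only the last n lines in memory


if __name__ == "__main__":
    for service, paths in node_log_files().items():
        for path in paths:
            print(f"== {service}: {path}")
            print("".join(tail(path, 20)), end="")
```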
04-04-2016 01:35 PM
Thank you so much, it is working now! It just shows some warnings: "version information not found in metastore..." and "failed to get database default returning NoSuchObjectException". Since these are only warnings, it should be working fine, right?
07-27-2016 02:28 PM
I have a 2-node configuration. I tried 127.0.0.1, the internal IP, and the hostname, with the same results as @John Code. I added Spark through Ambari, and I'm not sure what other configuration might be missing. No matter what I do, I get:
16/07/27 14:30:30 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
16/07/27 14:30:30 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries!
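"Cannot assign requested address" is the operating system's message for EADDRNOTAVAIL: the process tried to bind to an IP address that is not configured on any local network interface. Port 0 means "any free port", so a failure on port 0 points at the address rather than at a port conflict. A quick way to test candidate addresses outside Spark is the minimal Python sketch below. can_bind and check_candidates are illustrative helpers, and the address Spark actually uses for the driver depends on your setup (for example the SPARK_LOCAL_IP environment variable or the spark.driver.host setting).

```python
import socket
from typing import Dict, Iterable


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to (host, port) on this machine.

    Port 0 lets the OS pick any free port, so a failure here points at the
    address itself (e.g. EADDRNOTAVAIL, "Cannot assign requested address").
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # also covers socket.gaierror for unresolvable names
            return False
    return True


def check_candidates(hosts: Iterable[str]) -> Dict[str, bool]:
    """Map each candidate driver address or hostname to whether it can be bound."""
    return {host: can_bind(host) for host in hosts}


if __name__ == "__main__":
    name = socket.gethostname()
    for host, ok in check_candidates(["127.0.0.1", name]).items():
        print(f"{host:<30} {'OK' if ok else 'cannot bind'}")
```

If the hostname fails while 127.0.0.1 works, the hostname most likely resolves (for example via /etc/hosts) to an address that this node does not own.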