About joncodin

m_a_vervuurt · ‎06-06-2016

Hi John, I would recommend reading the paper "Spark SQL: Relational Data Processing in Spark", which describes the Catalyst optimizer steps you mention in more detail. https://web.eecs.umich.edu/~prabal/teaching/resources/eecs582/armbrust15sparksql.pdf

joncodin · ‎05-30-2016

Thanks for your help, really! Now I get it!

joncodin · ‎05-26-2016

Hi, thanks for your answer, but I don't understand. I think the answer that I accepted fixed the issue, because after starting the spark-shell with spark-shell --master spark://masterhost:7077, the master UI on port 8080 shows:

    Cores in use: 4 Total, 4 Used
    Memory in use: 4.0 GB Total, 2.0 GB Used
    Applications: 1 Running, 0 Completed
    Drivers: 0 Running, 0 Completed
    Status: ALIVE

So it seems that it is already working when spark-shell is started that way, right? But you are suggesting that it should be spark-shell --master "local" spark:///masterhost:7077?
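For readers comparing the two forms in this question: a `local` master string runs Spark inside a single JVM on the current machine, while `spark://host:port` connects to a standalone master. The following is only a minimal sketch of that distinction; `describe_master` is a hypothetical helper written for illustration and is not part of Spark.

```python
import re
from urllib.parse import urlparse


def describe_master(master: str) -> str:
    """Hypothetical helper: describe which mode a --master value selects."""
    # Matches "local", "local[4]" and "local[*]".
    local = re.fullmatch(r"local(?:\[(\d+|\*)\])?", master)
    if local:
        threads = local.group(1)
        if threads is None:
            return "local mode, 1 worker thread"
        if threads == "*":
            return "local mode, one worker thread per core"
        return f"local mode, {threads} worker thread(s)"

    # Matches "spark://host:port" (standalone cluster manager).
    parsed = urlparse(master)
    try:
        port = parsed.port
    except ValueError:  # the port part is not a valid number
        port = None
    if parsed.scheme == "spark" and parsed.hostname and port:
        return f"standalone cluster at {parsed.hostname}:{port}"

    return "other or unrecognized master string"
```

With this sketch, `describe_master("spark://masterhost:7077")` reports a standalone cluster at `masterhost:7077`, which matches the running application shown in the master UI above.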

phargis · ‎05-19-2016

Please note, I modified the original comment above because it allocated too much PermGen space. I changed the value from 8192M to the following, which requires a total of 3 GB of RAM to run spark-shell: "-XX:MaxPermSize=1024M -Xmx2048m"
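The 3 GB figure is the sum of the two limits in that option string. As a rough check, here is a small sketch that adds up only the maximum heap (`-Xmx`) and PermGen (`-XX:MaxPermSize`) values; `jvm_budget_bytes` is a hypothetical helper, and the result ignores any other memory the JVM uses.

```python
import re

# Size suffixes accepted by the JVM flags, expressed in bytes.
_UNIT_BYTES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def _size_bytes(value: str) -> int:
    """Convert a JVM size such as '2048m' or '1G' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG])", value)
    if not match:
        raise ValueError(f"unsupported size: {value!r}")
    return int(match.group(1)) * _UNIT_BYTES[match.group(2).lower()]


def jvm_budget_bytes(options: str) -> int:
    """Hypothetical helper: sum the -Xmx and -XX:MaxPermSize limits."""
    total = 0
    for token in options.split():
        if token.startswith("-Xmx"):
            total += _size_bytes(token[len("-Xmx"):])
        elif token.startswith("-XX:MaxPermSize="):
            total += _size_bytes(token.split("=", 1)[1])
    return total


budget = jvm_budget_bytes("-XX:MaxPermSize=1024M -Xmx2048m")
budget_gb = budget / 1024 ** 3  # 1024 MB + 2048 MB = 3072 MB = 3 GB
```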

joncodin · ‎05-09-2016

Thank you, really. It helped me understand this a lot better.

bleonhardi · ‎05-09-2016

@John Cod Unfortunately, that is too vague. Also, it looks to me like you are using MapReduce? May I ask which distribution you are using? If it's CDH, then you won't have Tez and Hive will be slow. Cloudera has its own query engine, Impala, and is now moving to Hive on Spark, so it does not really support the latest open-source Hive. On CDH I would go with Parquet + Impala (or switch to Hive and HDP or any other open Hadoop distribution).

joncodin · ‎05-06-2016

Thanks for your help. Do you also know what the DAG visualization, the diagram of the jobs that run after we execute a query, represents? Does that visualization show the physical plan or the logical plan?

ravi1 · ‎04-29-2016

On each node, /var/log/hadoop/hdfs/hadoop-hdfs-datanode-<hostname>.log holds the DataNode log and /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-<hostname>.log holds the NodeManager log. You can also look at the .out files with the same names in the same directories.

joncodin · ‎04-04-2016

Thank you, really. Now it is working! It only shows some warnings about "version information not found in metastore..." and "failed to get database default returning NoSuchObjectException". But since they are only warnings, it should be working fine, right?

spbryfczynski · ‎07-27-2016

I have a 2-node configuration. I tried 127.0.0.1, the internal IP, and the hostname. Same results as @John Code. I did add Spark through Ambari. Not sure what other configuration it might be missing. No matter what I do:

    16/07/27 14:30:30 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
    16/07/27 14:30:30 ERROR SparkContext: Error initializing SparkContext.
    java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries!
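"Cannot assign requested address" generally means the process tried to bind to an address that is not assigned to any local network interface. One way to narrow this down outside Spark is to test each candidate address directly. This is only a diagnostic sketch under that assumption; `can_bind` and `check_candidates` are hypothetical helpers, and the candidates you pass in (loopback, internal IP, hostname) are whatever you configured.

```python
import socket


def can_bind(address: str) -> bool:
    """Return True if a TCP socket can be bound to `address` on an ephemeral port."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            # Port 0 asks the OS for any free port, as in the log above.
            sock.bind((address, 0))
        return True
    except OSError:
        # Covers both unresolvable hostnames and addresses that are
        # not assigned to a local interface.
        return False


def check_candidates(candidates):
    """Hypothetical helper: map each candidate address to its bind result."""
    return {candidate: can_bind(candidate) for candidate in candidates}
```

For example, `check_candidates(["127.0.0.1", socket.gethostname()])` shows whether the loopback address and the machine's own hostname are bindable. If the hostname comes back `False`, it may resolve to an address this node does not own, which would be consistent with the error above.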

Last Visited	‎06-22-2016 06:53 PM

Member Since	‎03-20-2016 04:13 PM
Posts	56
Kudos received	18

Cloudera Community

Re: Catalyst optimization phases

Re: Spark physical plan doubts (TungstenAggregate,...

Re: Spark strange behavior: Im executing a query a...

Re: Error initializing SparkContext., Containers l...

Re: spark sql interaction with hive doubts

Re: create hive orc table

Re: Spark SQL Internally

Re: hadoop 3 nodes configuration issues: Datanode ...

Re: Help to start spark with no errors

Re: ./spark-shell dont starts corretlcy