Member since: 07-10-2017
Posts: 68
Kudos Received: 30
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4138 | 02-20-2018 11:18 AM |
| | 3369 | 09-20-2017 02:59 PM |
| | 17977 | 09-19-2017 02:22 PM |
| | 3574 | 08-03-2017 10:34 AM |
| | 2240 | 07-28-2017 10:01 AM |
11-29-2018
01:57 PM
What do you mean by the Ambari platform? You can use either Zeppelin or Superset, which you were already using. Zeppelin has many interpreters and can connect to Hive, Spark, and MySQL. https://zeppelin.apache.org/supported_interpreters.html Visualization is easier in Superset: create a Hive table or a MySQL table from that CSV, then add that database in Superset and add its existing tables. https://superset.incubator.apache.org/tutorial.html#connecting-to-a-new-database
11-29-2018
06:50 AM
@Ftoon Kedwan I think you've got the concept wrong. In Superset, you add datasources/databases and tables on the assumption that they already exist in your environment (Superset doesn't create them for you). For example, you'll have a MySQL database somewhere, and you provide a SQLAlchemy URL to add it. You can then go to add tables and add tables that already exist in that database. When you add a database/datasource, nothing checks that the physical entity actually exists (unless you explicitly run a test connection while adding it), so you were able to add it (not create it).
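As a rough illustration of what such a SQLAlchemy URL looks like, here is a minimal Python sketch that assembles one in the general `dialect://user:password@host:port/database` form. The helper name `build_sqlalchemy_uri` and every value passed to it (user, password, host, database) are made up for the example and are not taken from the original question.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_uri(dialect, user, password, host, port, database):
    """Assemble a URI of the form dialect://user:password@host:port/database."""
    # Percent-encode the credentials so characters such as '@' or '/' in a
    # password do not break URI parsing.
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical connection details, for illustration only.
uri = build_sqlalchemy_uri(
    "mysql", "superset_user", "p@ss/word", "db.example.com", 3306, "sales"
)
print(uri)
```

The resulting string is what you would paste into the SQLAlchemy URI field when adding the database. The database and its tables must already exist on the MySQL side.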
08-24-2018
10:20 AM
1 Kudo
You can set the -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps flags in YARN_NODEMANAGER_OPTS and then view the NodeManager logs in a GC visualizer such as gceasy.io. This error occurs when nearly all objects are still referenced (live), so successive GC cycles reclaim less than 2% of the heap. https://plumbr.io/outofmemoryerror/gc-overhead-limit-exceeded
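To make the condition concrete, here is a simplified Python sketch of the rule behind "GC overhead limit exceeded". The thresholds mirror the HotSpot defaults (`GCTimeLimit=98`, `GCHeapFreeLimit=2`); the function name and the sample numbers are hypothetical, and the real JVM check is more involved than this single comparison.

```python
def gc_overhead_limit_hit(gc_time_fraction, heap_reclaimed_fraction,
                          time_limit=0.98, free_limit=0.02):
    """Return True when GC dominates run time yet frees almost no heap.

    gc_time_fraction: share of wall time spent in garbage collection (0..1).
    heap_reclaimed_fraction: share of the heap freed by those collections (0..1).
    """
    return gc_time_fraction > time_limit and heap_reclaimed_fraction < free_limit


# Hypothetical readings taken from a GC log, for illustration only.
print(gc_overhead_limit_hit(0.99, 0.01))  # GC-bound and nothing reclaimed
print(gc_overhead_limit_hit(0.99, 0.10))  # GC-bound, but heap is being freed
```

If the GC logs show the first pattern, the heap is full of live objects, so the usual options are a larger NodeManager heap or finding what is holding the references.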
02-20-2018
11:18 AM
1 Kudo
Hive views are a logical construct with no associated storage. The view definition is kept in the metastore and persists until the view is dropped, but no data is materialized for it, so you won't see a directory in the HDFS Hive warehouse corresponding to the view. See the links below for reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView https://community.hortonworks.com/content/supportkb/48761/what-is-a-hive-view.html
11-30-2017
09:12 AM
Check whether SPARK_HOME in the interpreter settings points to the correct pyspark. Is it set to the value below? SPARK_HOME /usr/hdp/current/spark2-client/ Where are you setting the Spark properties: in spark-env.sh or via Zeppelin? Check this thread: https://issues.apache.org/jira/browse/ZEPPELIN-295 Set spark.driver.memory=4G and spark.driver.cores=2. Check spark.memory.fraction (if it's set to 0.75, reduce it to 0.6): https://issues.apache.org/jira/browse/SPARK-15796 Check the logs: run tail -f /var/log/zeppelin/zeppelin-interpreter-spark2-spark-zeppelin-{HOSTNAME}.log on the Zeppelin host.
11-19-2017
12:08 PM
I also tried that once and it didn't seem to work for some reason. Please try using the 'screen' utility. https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/ Press Ctrl+a, then c to create a new window, run your script there without nohup (that is, ./run_beeline_hql.sh), and detach from that session with Ctrl+a, then d. The process will keep running in the background, which you can verify with ps.
11-19-2017
12:04 PM
Try including the option --driver com.mysql.jdbc.Driver in the import command.
11-17-2017
12:33 PM
1 Kudo
Good to hear that, anobi. I could not find a way to limit the number of sessions to a particular value. However, if you set spark.sql.hive.thriftServer.singleSession to true, only one session can be run, which doesn't scale very well. Please run spark.conf.getAll(); you may find other properties related to the number of sessions. Also, please accept/upvote any answers that helped you. Thank you.
11-16-2017
01:14 PM
1 Kudo
@anobi do Did you try setting spark.sql.thriftServer.incrementalCollect to true? I am not running multiple queries at a time, so that may be why I'm not seeing this. Try decreasing the number of simultaneous sessions after setting incrementalCollect to true.
11-16-2017
05:59 AM
2 Kudos
You may also be facing a bug. Check the links below against your Spark version. https://issues.apache.org/jira/browse/SPARK-18857 https://forums.databricks.com/questions/344/how-does-the-jdbc-odbc-thrift-server-stream-query.html https://stackoverflow.com/questions/35046692/spark-incremental-collect-to-a-partition-causes-outofmemory-in-heap Regardless, please try setting spark.sql.thriftServer.incrementalCollect to true in the Thrift server conf, or start the Thrift server with that setting. It is false by default, so it is an important thing to check, and it has a direct effect on driver heap usage (if that is in fact what you're running out of). Read the link below: http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/
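For the "start the Thrift server with that" option, a minimal Python sketch of the launch command might look like the following. It relies on the start script accepting spark-submit style `--conf key=value` options. The script path `./sbin/start-thriftserver.sh` is an assumption based on a stock Spark layout and the helper name is made up; on an HDP cluster the script location will differ.

```python
import shlex


def thriftserver_command(script="./sbin/start-thriftserver.sh",
                         incremental_collect=True):
    """Build the argument list for starting the Spark Thrift server."""
    # The script path above is an assumed default; adjust it to your install.
    value = "true" if incremental_collect else "false"
    return [script, "--conf",
            f"spark.sql.thriftServer.incrementalCollect={value}"]


cmd = thriftserver_command()
# Print the command as a single shell-safe line; pass cmd to subprocess.run
# only once the path is confirmed for your environment.
print(shlex.join(cmd))
```

The same key and value can instead go into the Thrift server's Spark configuration so that the setting survives restarts.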