About tsharma

tsharma · ‎11-29-2018

By Ambari platform you mean? You can use either zeppelin or superset which you were using. Zeppelin has a lot of interpreters and it can connect to hive/spark/mysql. https://zeppelin.apache.org/supported_interpreters.html Visualization in superset is easier, you can create hive tables using that csv or create mysql using that. You can then add that database in superset and add existing tables. https://superset.incubator.apache.org/tutorial.html#connecting-to-a-new-database

tsharma · ‎11-29-2018

@Ftoon Kedwan I think you've got the concept wrong In superset, you add datasources/databases and tables assuming they are already present in your environment. (it doesn't create those for you). For example, you'll have a mysql db somewhere and you'll have to provide an sqlalchemy url to add it. You can then go to add tables and add tables which already exist in the database. While adding database/datasource, there's no check whether the physical entity is present. (unless can do explicit test connection while adding), so you were able to add it (not create it).

tsharma · ‎08-24-2018

You can set -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps flags in YARN_NODEMANAGER_OPTS and then view nodemanager logs in GC visualizer like gceasy.io. This error occurs when all objects are referenced/live and subsequent GC cycles can't reclaim > 2% of heap space. https://plumbr.io/outofmemoryerror/gc-overhead-limit-exceeded

tsharma · ‎02-20-2018

Hive Views are a logical construct with no associated storage. Neither are they permanent, they exist only for the session. I don't think you'll be able to see a directory in hdfs hive warehouse corresponding to the view. See below links for reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView https://community.hortonworks.com/content/supportkb/48761/what-is-a-hive-view.html

tsharma · ‎11-30-2017

Check whether SPARK_HOME in interpreter settings points to correct pyspark. Is it set to below value? SPARK_HOME /usr/hdp/current/spark2-client/ Where are you setting spark properties, in spark-env.sh or via Zeppelin? Check this thread: https://issues.apache.org/jira/browse/ZEPPELIN-295 Do spark.driver.memory=4G, spark.driver.cores=2. Check spark.memory.fraction (If it's set to 0.75, reduce it to 0.6) https://issues.apache.org/jira/browse/SPARK-15796 Check logs-> do tail -f /var/log/zeppelin/zeppelin-interpreter-spark2-spark-zeppelin-{HOSTNAME}.log in zeppelin host.

tsharma · ‎11-19-2017

I also tried that once and it didn't seem to work for some reason. Please try using 'screen' utility. https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/ Use Ctrl+a+c to create a new one, run your script there without nohup, that is ./run_beeline_hql.sh and detach from that session by using Ctrl+a+d. The process will keep running in the background which you can check by ps.

tsharma · ‎11-19-2017

Try including the option --driver com.mysql.jdbc.Driver in the import command.

tsharma · ‎11-17-2017

Good to hear that anobi. I could not find how to restrict sessions to a particular value. However if you set this spark.sql.hive.thriftServer.singleSession true. Only 1 session can be run. This doesn't scale very well. Please run spark.conf.getAll(), you may find other properties related to num sessions. Also please accept/upvote any answers if they helped you in concept. Thank You

tsharma · ‎11-16-2017

@anobi do Did you try setting spark.sql.thriftServer.incrementalCollect true? I am not running multiple queries at a time, so maybe because of that I'm not seeing this, Try decreasing number of simultaneous sessions after setting incremental to true.

tsharma · ‎11-16-2017

You may also be facing a bug. Check below links and your spark version. https://issues.apache.org/jira/browse/SPARK-18857 https://forums.databricks.com/questions/344/how-does-the-jdbc-odbc-thrift-server-stream-query.html https://stackoverflow.com/questions/35046692/spark-incremental-collect-to-a-partition-causes-outofmemory-in-heap Regardless, please try with spark.sql.thriftServer.incrementalCollect true in thrift conf or start thrift-server with that. It is set to False by default, this would be an important thing to check and has a direct implication on driver heap (if you're in fact running out of that). Read link below: http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/

Online	Offline
Last Visited	‎01-08-2021 08:08 AM

Member Since	‎07-10-2017 03:41 AM
Last Visited	‎01-08-2021 08:08 AM
Posts	68
Kudos received	30

Cloudera Community

Re: how to check views in hive from hdfs?

Re: Extract timestamp from filename and add it in ...

Re: Pyspark dataframe: How to replace

Re: What is IPC client in Hive? What does it do?

Re: Window Operations on Spark Streaming

Re: I am not able to create a table inside my data...

Re: I am not able to create a table inside my data...

Re: Nodemanager process crashed due to 'GC overhea...

Re: how to check views in hive from hdfs?

Re: Why is my Spark job stuck?

Re: Unable to successfully launch beeline script f...

Re: I am getting below error while trying to impor...

Re: Need Spark Thrift Server Design because STS ha...

Re: Need Spark Thrift Server Design because STS ha...

Re: Need Spark Thrift Server Design because STS ha...