Member since: 02-23-2016
Posts: 51
Kudos Received: 96
Solutions: 4

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2452 | 05-25-2016 04:42 PM |
 | 3659 | 05-16-2016 01:09 PM |
 | 1619 | 04-27-2016 05:40 PM |
 | 5784 | 02-26-2016 02:14 PM |
03-29-2018
09:06 PM
This is awesome.
01-27-2017
07:51 PM
It has been disallowed for security reasons. There is an RMP in the pipeline to address this. As a workaround, the data should be transformed before it reaches Hive.
11-18-2016
02:41 PM
@Kirk Haslbeck
Ranger lets you authorize centrally: you can allow or deny access at the user level for creating or deleting directories, but it does not provide recovery options. Please see http://hortonworks.com/apache/ranger/
09-16-2016
01:08 PM
@Kirk Haslbeck There is currently an open issue with aggregating date column statistics for partitioned tables: https://issues.apache.org/jira/browse/HIVE-14773. When the Hive client queries the metastore database for date column statistics, it runs into an NPE. This may show up as increased lag in Tez because it takes longer for DAG execution to kick in. To work around this, you can delete the column statistics for the date column from PART_COL_STATS in the metastore database.
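A minimal sketch of that workaround, run directly against the metastore database (not through HiveServer2) and using placeholder names for the affected database, table, and date column:

```sql
-- Placeholders: 'mydb', 'my_table', and 'event_date' stand in for the
-- affected database, partitioned table, and DATE column.
DELETE FROM PART_COL_STATS
WHERE DB_NAME = 'mydb'
  AND TABLE_NAME = 'my_table'
  AND COLUMN_NAME = 'event_date'
  AND COLUMN_TYPE = 'date';
```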
09-13-2016
08:25 PM
1 Kudo
@Kirk Haslbeck Michael is correct; you will get 5 total executors.
08-24-2016
08:04 PM
7 Kudos
Brandon Wilson has a great article that shows how to use the "CACHE TABLE" command in Tableau; however, more recent drivers have come out, and you can now connect directly to the Thrift server using a Spark SQL driver. This walkthrough uses HDP 2.5 and the Simba Spark ODBC driver. First, pull up a Tableau connection and select the Thrift server. I also had to open VirtualBox port 10015. Next, if you don't have the driver, Tableau will take you to a page where you can download a Spark SQL driver; choose the Simba Spark ODBC driver from that package. Once you establish a valid connection, Tableau flags the connections based on the driver: you will see the Hive connection from Brandon's article and the new Spark connection. Next, using the CACHE command, enter the statement below into Tableau's initial SQL box. Finally, check Spark's storage for the warehouse/crimes table (or any table of your choosing) in memory. Some visuals from Tableau are included.
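A minimal sketch of such an initial SQL statement, assuming the crimes table from Brandon's article (substitute your own table name):

```sql
-- Runs once when Tableau opens the connection.
-- 'crimes' is assumed from the example table in Brandon's article.
CACHE TABLE crimes;
```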
01-31-2018
09:20 AM
How do I set the JAVA_HOME path?
06-10-2016
10:47 PM
Upvoted! If the notebook works on the sandbox, please consider including it in https://github.com/hortonworks-gallery/zeppelin-notebooks. This is the set of demo notebooks that gets installed automatically when Zeppelin is installed via Ambari.
05-30-2016
04:57 PM
@Kirk Haslbeck - I was working on something similar: writing PySpark to use Spark SQL to analyze data in S3 with the S3A filesystem client. I documented my work with instructions here: https://community.hortonworks.com/articles/36339/spark-s3a-filesystem-client-from-hdp-to-access-s3.html
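The linked article covers the PySpark setup in detail; as a minimal sketch of the SQL side, assuming a hypothetical bucket and path and that the S3A credentials are configured as the article describes (Spark 2.x syntax):

```sql
-- Hypothetical bucket and dataset path; fs.s3a access/secret keys are assumed
-- to be configured as described in the linked article.
CREATE TEMPORARY VIEW s3_data
USING parquet
OPTIONS (path 's3a://my-bucket/my-dataset/');

SELECT COUNT(*) FROM s3_data;
```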