Member since: 02-23-2016
Posts: 51
Kudos Received: 96
Solutions: 4

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2452 | 05-25-2016 04:42 PM |
 | 3659 | 05-16-2016 01:09 PM |
 | 1619 | 04-27-2016 05:40 PM |
 | 5784 | 02-26-2016 02:14 PM |
03-29-2018
09:06 PM
This is awesome.
01-27-2017
07:51 PM
It has been disallowed for security reasons. There is an RMP in the pipeline to address this. As a workaround, the data should be transformed before it reaches Hive.
11-18-2016
02:41 PM
@Kirk Haslbeck
Ranger lets you authorize centrally: you can allow or deny access at the user level for creating or deleting directories, but it does not provide recovery options. Please see http://hortonworks.com/apache/ranger/
09-16-2016
01:08 PM
@Kirk Haslbeck There is currently an open issue with aggregating date column statistics for partitioned tables: https://issues.apache.org/jira/browse/HIVE-14773. When the Hive client queries the metastore database for date column statistics, it runs into an NPE. This may show up as increased lag in Tez because it takes longer for DAG execution to kick in. To work around this, you can delete the column statistics for the date column from PART_COL_STATS in the metastore database.
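A minimal sketch of that workaround, run directly against the metastore database (not through HiveServer2) and using placeholder names for the affected database, table, and date column:

```sql
-- Placeholders: 'mydb', 'my_table', and 'event_date' stand in for the
-- affected database, partitioned table, and DATE column.
DELETE FROM PART_COL_STATS
WHERE DB_NAME = 'mydb'
  AND TABLE_NAME = 'my_table'
  AND COLUMN_NAME = 'event_date'
  AND COLUMN_TYPE = 'date';
```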
09-13-2016
08:25 PM
1 Kudo
@Kirk Haslbeck Michael is correct; you will get 5 total executors.
08-24-2016
08:04 PM
7 Kudos
Brandon Wilson has a great article that shows how to use the "CACHE TABLE" command in Tableau; however, more recent drivers have come out, and you can now connect directly to the Thrift server using a Spark SQL driver. This walkthrough uses HDP 2.5 and the Simba Spark ODBC driver. First, pull up a Tableau connection and select the Thrift server. I also had to open VirtualBox port 10015. Next, if you don't have the driver, Tableau will take you to a page where you can download a Spark SQL driver; choose the Simba Spark ODBC driver from that package. Once you establish a valid connection, Tableau flags the connections based on the driver: you will see the Hive connection from Brandon's article and the new Spark connection. Next, using the CACHE command, enter the statement below into Tableau's initial SQL box. Finally, check Spark's storage for the warehouse/crimes table (or any table of your choosing) in memory. Some visuals from Tableau are included.
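A minimal sketch of such an initial SQL statement, assuming the crimes table from Brandon's article (substitute your own table name):

```sql
-- Runs once when Tableau opens the connection.
-- 'crimes' is assumed from the example table in Brandon's article.
CACHE TABLE crimes;
```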
01-31-2018
09:20 AM
How do I set the JAVA_HOME path?
06-10-2016
10:47 PM
Upvoted! If the notebook works on the sandbox, please consider including it in https://github.com/hortonworks-gallery/zeppelin-notebooks. This is the set of demo notebooks that gets installed automatically when Zeppelin is installed via Ambari.
05-30-2016
04:57 PM
@Kirk Haslbeck - I was working on something similar: writing PySpark to use Spark SQL to analyze data in S3 with the S3A filesystem client. I documented my work with instructions here: https://community.hortonworks.com/articles/36339/spark-s3a-filesystem-client-from-hdp-to-access-s3.html
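The linked article covers the PySpark setup in detail; as a minimal sketch of the SQL side, assuming a hypothetical bucket and path and that the S3A credentials are configured as the article describes (Spark 2.x syntax):

```sql
-- Hypothetical bucket and dataset path; fs.s3a access/secret keys are assumed
-- to be configured as described in the linked article.
CREATE TEMPORARY VIEW s3_data
USING parquet
OPTIONS (path 's3a://my-bucket/my-dataset/');

SELECT COUNT(*) FROM s3_data;
```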