Member since 02-23-2016
51 Posts
96 Kudos Received
4 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1348 | 05-25-2016 04:42 PM |
 | 2391 | 05-16-2016 01:09 PM |
 | 906 | 04-27-2016 05:40 PM |
 | 3705 | 02-26-2016 02:14 PM |
03-29-2018
01:49 PM
15 Kudos
An article on the challenges of, and solutions to, predicting machine failures in the field. The full details can be found here: https://github.com/kirkhas/zeppelin-notebooks/tree/master/Preventive_maintenance
Step #1 Feature Selection
Step #2 Geolocation
Step #3
- Scythe is a time-series library authored by Kirk Haslbeck for these purposes
- Needed to resample the data into trips or route segments (Scythe Resample)
- Needed to step-interpolate the miles since last service into steps such as 4K and 5K, rather than treating it as a continuous regression value
Step #4
- Indexing and OneHotEncoding to the rescue. Found that a particular "Make" was more problematic than most.
ROC Curve - A near-perfect model
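The step interpolation and the indexing plus one-hot encoding mentioned in Steps #3 and #4 can be sketched in plain Python. This is only an illustration of the idea, not the Scythe API or the notebook's code: the function names, the 1,000-mile step size, and the sample readings and makes are all assumptions made for the example.

```python
def step_bucket(miles, step=1000):
    """Floor a continuous mileage reading to a step, e.g. 4350 -> 4000.

    The 1,000-mile step size is an assumption for illustration.
    """
    return (int(miles) // step) * step


def index_labels(values):
    """Give each distinct label an integer index, in first-seen order."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


def one_hot(value, index):
    """Turn a label into a 0/1 vector with a single 1 at its index."""
    vector = [0] * len(index)
    vector[index[value]] = 1
    return vector


# Hypothetical sample data: miles since last service and vehicle make.
readings = [4350.0, 4999.9, 5120.0]
makes = ["MakeA", "MakeB", "MakeA"]

buckets = [step_bucket(m) for m in readings]
make_index = index_labels(makes)
encoded = [one_hot(m, make_index) for m in makes]

print(buckets)
print(encoded)
```

Bucketing turns the mileage into a small set of discrete steps, and one-hot encoding lets a model weigh each make separately, which is how a relationship with one particular make could show up.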
03-29-2018
01:39 PM
Repo Description
Preventive Maintenance Data Science Use-Case for Fleet Cost Avoidance.
Repo Info
Github Repo URL: https://github.com/kirkhas/zeppelin-notebooks/tree/master/Preventive_maintenance
Github account name: kirkhas
Repo name: Preventive_maintenance
01-31-2017
04:11 PM
1 Kudo
Updating this thread. Hive has primary and foreign keys for metadata and query optimization. https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
ALTER TABLE TABLENAME ADD CONSTRAINT COLNAME_PK PRIMARY KEY (CS_ID);
ALTER TABLE TABLENAME ADD CONSTRAINT COLNAME_FK1 FOREIGN KEY (TBL_ID) REFERENCES TBLS
01-27-2017
06:39 PM
Is there another method or workaround that can replace the "transform" method, or a suggested usage that resolves the error below?
select transform(host, ip) using 'python parse_mro.py' as (host string, ip string) from table1;
Error: Error while processing statement: FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException(Query with transform clause is disallowed in current configuration.) (state=08S01,code=12)
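For context, a script called through TRANSFORM follows a simple contract: Hive streams each input row to the script's standard input as one line of tab-separated fields and reads tab-separated output rows from its standard output. The post does not show the contents of parse_mro.py, so the sketch below is hypothetical: the function name and the cleanup logic (trimming whitespace, lower-casing the host) are assumptions used only to show the shape of such a script.

```python
import io
import sys


def transform(stdin, stdout):
    """Read tab-separated (host, ip) rows and write cleaned rows back out."""
    for line in stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not line.strip():
            continue  # skip blank or malformed rows
        host = fields[0].strip().lower()  # hypothetical cleanup step
        ip = fields[1].strip()
        stdout.write(host + "\t" + ip + "\n")


# Demonstration with in-memory streams. A real script would instead call
# transform(sys.stdin, sys.stdout).
sample_input = io.StringIO("Host01.Example.com \t10.0.0.1\n\n")
sample_output = io.StringIO()
transform(sample_input, sample_output)
result = sample_output.getvalue()
print(repr(result))
```

Because the contract is just rows in and rows out, the same logic can live somewhere other than a TRANSFORM script when that clause is disallowed.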
Labels: Apache Hive, Apache Ranger
11-18-2016
02:19 PM
What about Ranger: can it provide protection at this level? And assuming data does get removed, are there any recovery options?
11-18-2016
01:23 PM
What are the best recovery options if a product like Ab Initio runs an m_rm command that deletes HDFS data in one of the environments? These types of low-level executions bypass the Hadoop dfs rm command, which puts deleted data in the trash folder for recovery. The trash interval is configured for 21 days in the Hortonworks environment. The data had to be recreated from the source files, but if this were prod, what would the best recovery options be?
Labels: Apache Hadoop
09-14-2016
01:33 PM
1 Kudo
When running Hive 1.2.1 on HDP 2.4, Hive successfully connects to the metastore and then later drops the connection. It seems an exception is thrown after it successfully connects to the metastore. I noticed that if we turn off the CBO settings, it bypasses the metastore and skips this exception. We are using ORC and have run compute stats.
2016-09-07 04:44:08,643 INFO [main]: hive.metastore (HiveMetaStoreClient.java:isCompatibleWith(296)) - Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
2016-09-07 04:44:08,647 INFO [main]: hive.metastore (HiveMetaStoreClient.java:open(382)) - Trying to connect to metastore with URI
INFO [main]: hive.metastore (HiveMetaStoreClient.java:open(478)) - Connected to metastore.
--
2016-09-07 04:44:08,647 INFO [main]: hive.metastore (HiveMetaStoreClient.java:open(382)) - Trying to connect to metastore with URI
2016-09-07 04:44:08,649 INFO [main]: hive.metastore (HiveMetaStoreClient.java:open(478)) - Connected to metastore.
2016-09-07 04:44:08,664 INFO [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2016-09-07 04:44:08,729 WARN [main]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(184)) - MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_aggr_stats_for(ThriftHiveMetastore.java:3033)
2016-09-07 04:44:13,784 WARN [main]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(184)) - MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
Labels: Apache Hive
09-13-2016
02:50 PM
4 Kudos
How will Spark designate resources in Spark 1.6.1+ when using num-executors? This question comes up a lot, so I wanted to use a baseline example: an 8-node cluster (2 name nodes, 1 edge node, 5 worker nodes), with each worker node having 20 cores and 256 GB of RAM. If num-executors = 5, will you get 5 executors in total or 5 on each node? Table below for illustration.
cores | executors per node | executors total |
---|---|---|
25 | 5 | 25 |
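The two readings of the question can be compared with a little arithmetic. On YARN, spark-submit documents `--num-executors` as the number of executors to launch for the application, which makes it a total rather than a per-node count. The sketch below uses the cluster from the example (5 worker nodes, 20 cores each); the function names and the value of 4 cores per executor are assumptions for illustration, and memory limits and YARN overhead are ignored.

```python
def total_if_application_wide(num_executors):
    """num-executors read as a total for the whole application."""
    return num_executors


def total_if_per_node(num_executors, worker_nodes):
    """num-executors read (incorrectly, on YARN) as a per-node count."""
    return num_executors * worker_nodes


def core_bound_capacity(worker_nodes, cores_per_node, executor_cores):
    """Upper bound on executors that fit, counting cores only."""
    return worker_nodes * (cores_per_node // executor_cores)


worker_nodes = 5      # from the example cluster
cores_per_node = 20   # from the example cluster
num_executors = 5     # value asked about in the question
executor_cores = 4    # assumption: not stated in the question

as_total = total_if_application_wide(num_executors)
as_per_node = total_if_per_node(num_executors, worker_nodes)
capacity = core_bound_capacity(worker_nodes, cores_per_node, executor_cores)

print(as_total, as_per_node, capacity)
```

Under the application-wide reading, 5 executors are requested in total, and YARN decides which worker nodes host them. Reaching 25 executors on this cluster would mean asking for 25 explicitly, which fits within the core-bound capacity under the assumed 4 cores per executor.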
Labels: Apache Spark
08-29-2016
06:43 PM
Can it run both Spark 1.6.1 and Spark 2.0, or just Spark 2.0?
08-24-2016
08:04 PM
7 Kudos
Brandon Wilson has a great article that shows how to use the "CACHE TABLE" command in Tableau. However, more recent drivers have come out, and you can now connect directly to the Thrift server using a spark-sql driver. This example uses HDP 2.5 and SimbaSparkOdbc.
First, pull up a Tableau connection and select the Thrift server. I additionally had to open VirtualBox port 10015.
Next, if you don't have the driver, Tableau will take you to a page where you can download a spark-sql driver; inside that package, choose this driver.
Once you establish a valid connection, you will see Tableau flag the connections based on the driver. Below you will see the Hive connection from Brandon's article and the new Spark connection.
Next, using the CACHE command, enter the below into Tableau's initial SQL box.
Finally, check Spark's storage for the warehouse/crimes table in memory, or any table of your choosing for that matter.
Some visuals from Tableau.
Labels: