Member since: 07-31-2019
Posts: 346
Kudos Received: 258
Solutions: 62

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 181 | 08-22-2018 06:02 PM
 | 90 | 03-26-2018 11:48 AM
 | 237 | 03-15-2018 01:25 PM
 | 337 | 03-01-2018 08:13 PM
 | 67 | 02-20-2018 01:05 PM
06-20-2017
01:25 PM
@JAYA PARASU You could also try taking an HDFS snapshot: https://hortonworks.com/blog/protecting-your-enterprise-data-with-hdfs-snapshots/ https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html You can set up a cron job that takes the snapshot and does the copy on a regular basis.
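If you'd rather drive the snapshot from code instead of calling the shell from cron, here's a minimal sketch using Hadoop's FileSystem API. The directory path and snapshot name are hypothetical, and it assumes the directory has already been made snapshottable with hdfs dfsadmin -allowSnapshot:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotBackup {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(new Configuration());

        Path dir = new Path("/data/important");             // hypothetical directory
        String name = "backup-" + System.currentTimeMillis();

        // Requires a one-time: hdfs dfsadmin -allowSnapshot /data/important
        fs.createSnapshot(dir, name);

        // The snapshot is now readable under /data/important/.snapshot/<name>
        // and can be copied off-cluster (e.g. with distcp) by the same cron job.
    }
}
```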
05-25-2017
04:34 PM
Hi @Divya Reddy, I'll try to track down numbers, but this HCC article provides some additional useful information: https://community.hortonworks.com/questions/2517/maximum-hive-table-partitions-allowed-recommended.html
05-10-2017
02:37 PM
@Avijeet Dash Most likely it is using the account you log into Ambari with. You can try using a different account, but to me the errors don't indicate a permissions issue.
05-10-2017
02:11 PM
@Avijeet Dash Double-check the HiveServer2 JDBC URL in the view definition; the typical default shape is jdbc:hive2://<hs2-host>:10000/default, with 10000 being the standard HiveServer2 port. I've also had luck creating a new Hive view instance.
05-10-2017
01:28 PM
Hi @spring spring If the CSV files are sent to a directory, I'd consider using HDF to pick them up and flow them directly into HDP. Once there, you have a number of options: you can try the new LLAP dynamic text cache https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/ to query them directly, or convert them to ORC tables. You can also move them into Elasticsearch or Solr and create dashboards. Once the files are in Hive, though, you can use any visualization tool you want via ODBC or JDBC connections. Hope this helps.
05-10-2017
01:22 PM
Hi @Avijeet Dash Confirm that HiveServer2 is running. You can also try restarting the service via Ambari.
05-08-2017
01:36 AM
@Mário Rodrigues Use https://github.com/hortonworks/hive-testbench. The default format is ORC.
04-18-2017
06:49 PM
3 Kudos
Hi @Adnan Alvee I reached out to our certification team and this was their response: "We do have plans to revamp all Certifications to align closer with upcoming courses. We are planning for an Advance Big Data Analyst course and certification which will probably match up IBMs in content and scope. Some of the topics that will be covered here are Hadoop, Spark, Pig and optionally HBase. Although we do not have a hard date for the course/certification launch, we are targeting late 2017".
04-18-2017
11:57 AM
Hi @Kuldeep Mishra I'm assuming you mean HDP 2.6? Solr is part of the 2.6 stack as HDPSearch. The current documentation is wrong and is in the process of being updated. Solr has always been a separate support cost. Hope this helps. Scott
04-02-2017
11:54 AM
Hi @Bala Vignesh N V, Truncating the partition will not remove the partition metadata. To remove the column metadata for all partitions, you'll want to include the CASCADE clause in an ALTER TABLE statement. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName/Type/Position/Comment
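As a concrete sketch (the host, table, and column names are all made up), here's what that looks like when issued over JDBC:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AlterTableCascade {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // hive-jdbc on the classpath
        String url = "jdbc:hive2://hs2-host:10000/default"; // hypothetical host
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // CASCADE pushes the column change into the metadata of every
            // existing partition, not just the table-level definition
            stmt.execute(
                "ALTER TABLE sales CHANGE COLUMN amount amount DECIMAL(10,2) CASCADE");
        }
    }
}
```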
03-27-2017
07:24 PM
Hi @Abhijeet Rajput, it is recommended to analyze all tables, ORC included, on a regular basis for performance. Statistics are more valuable on larger tables than on smaller ones. Sorting is not necessary and, in fact, is not allowed on ACID tables. As of HDP 2.5, Hive uses both a rule-based optimizer and a cost-based optimizer built on Apache Calcite. Enabling the CBO makes the best use of statistics. Also, you may want to take a look at LLAP, which is in tech preview in 2.5 and will be GA in 2.6. Hope this helps.
03-11-2017
02:35 PM
@ccasano I'd also like to add to my comment that, by default, LLAP leverages the LRFU algorithm with a preemption strategy on the frequently-used side. This means LLAP will always preempt long-running queries in favor of short, ad hoc queries, yet it still allows for the occasional "bring-back-all-the-data" scenario without flushing the ad hoc query cache. This allows for better query concurrency and provides optimal performance for the majority of BI workloads. In addition, LLAP does not run each query in its own YARN container, which would limit concurrency since each container is a user session, aka a job. Most BI use cases involve many users running ad hoc queries and then keeping their session open as they look at the reports. By leveraging Tez AMs for queries, LLAP gets much higher concurrency. With the combination of LLAP's LRFU algorithm, use of Tez AMs, caching, and AtScale's Adaptive Cache, users get a nice boost in performance and concurrency out of the box.
03-10-2017
03:56 PM
2 Kudos
Hi @ccasano Current limits are more tightly related to query performance, because queries that take a long time can hold threads open, which only serves to back up other users. So getting queries into the Adaptive Cache and improving query performance is important. That being said, some clients will have enterprise-grade concurrency requirements. When that is the case, the recommendation is to stand up additional AtScale nodes, which will synchronize, with a load balancer in front. Of course, this also assumes the increased concurrency demand can be serviced by the HDP cluster leveraging the Adaptive Cache. Hope this helps.
03-01-2017
06:25 PM
Take a look at NiFi's QueryDatabaseTable processor: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseTable/
02-27-2017
07:22 PM
Hi @Mayank Bhatt See if this applies to your issue: https://community.hortonworks.com/content/kbentry/73416/hive-metastore-is-getting-down-frequently.html Also make sure the Hive cluster delegation token storage class is set correctly:

  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
  </property>
02-23-2017
06:54 PM
5 Kudos
Hi @James Dinkel I'm guessing there is a memory sizing issue. Make sure you follow these sizing rules (a worked example is sketched after the list):
- MemPerDaemon (container size) > LLAP heap size (Java process heap) + cache size (off-heap) + headroom
- MemPerDaemon should be a multiple of the YARN minimum allocation and less than yarn.nodemanager.resource.memory-mb
- Headroom is capped at 6 GB
- QueueSize (YARN queue) >= MemPerDaemon * num daemons + Slider AM + (Tez AM size * concurrency)
- CacheSize = MemPerDaemon - (Hive Tez container size * num executors)
- Num executors per daemon = (MemPerDaemon - cache size) / Hive Tez container size

In addition, be sure your LLAP queue is set up appropriately and has sufficient capacity:

- <queue>.user-limit-factor = 1
- <queue>.maximum-am-resource-percent = 1 (it's actually a factor between 0 and 1)
- <queue>.capacity = 100
- <queue>.max-capacity = 100
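To make the arithmetic concrete, here is a small sanity-check sketch of the rules above; every number is an illustrative assumption for a single daemon on a roughly 48 GB NodeManager, not a recommendation:

```java
public class LlapSizingCheck {
    public static void main(String[] args) {
        long memPerDaemon = 49152; // daemon container size (MB), multiple of YARN min allocation
        long llapHeap     = 24576; // daemon Java process heap (MB)
        long cacheSize    = 16384; // off-heap cache (MB)
        long headroom     = 6144;  // capped at 6 GB per the rule above
        long tezContainer = 4096;  // hive.tez.container.size (MB)

        // Num executors per daemon = (MemPerDaemon - cache) / Tez container size
        long executors = (memPerDaemon - cacheSize) / tezContainer; // = 8 here

        System.out.println("Executors per daemon: " + executors);
        if (memPerDaemon <= llapHeap + cacheSize + headroom) {
            System.out.println("WARNING: container cannot hold heap + cache + headroom");
        }
    }
}
```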
02-08-2017
10:26 PM
Hi @PJ Have you double-checked your YARN and Hive settings to determine if something has changed? Specifically, check your YARN container settings.
02-02-2017
04:36 PM
@Georg Heiler Syncsort also provides a high-performance bulk data import utility called Data Funnel, which runs parallel import jobs that can significantly accelerate large data loads. Bulk data loads are not a good use case for NiFi.
01-24-2017
03:12 PM
Hi @suresh krish It appears you may have a Ranger policy preventing access to the table. You can disable Ranger authorization through Ambari in the Hive configs, or review the Hive Ranger policies and grant the appropriate authorization. This HCC thread has some additional information: https://community.hortonworks.com/questions/64345/how-to-add-another-hiveserver-for-current-metastor.html
01-24-2017
03:06 PM
@Leonid Fedotov You have the option to attach bundles directly to support tickets. You can also send a bundle via email. In any case, SmartSense is part of our support subscription and requires a unique customer ID.
01-23-2017
07:44 PM
2 Kudos
Hi @Geetha Anne Our most recent release is 2.5.3. We provide support for a rolling window of the previous 2 versions. This means we still provide support for 2.3. HDP 2.3 was released on June 8, 2015.
01-18-2017
04:08 PM
@Yasir Faiz I believe LLAP will be GA in 2.6, which is due out sometime in Q2. The advantage of HDP 2.5 would be the ability to test LLAP and Hive 2.0. HDP 2.5 also includes a number of fixes to Hive 1.2.1: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_release-notes/content/fixed_issues.html
01-18-2017
03:34 PM
2 Kudos
@Yasir Faiz Hortonworks fully supports Spark, but we do not support Hive on Spark. There are a couple of reasons for this:
1. Performance: Hive on Tez showed a 50x or greater improvement over Hive on MR, while Hive on Spark showed only a 3x improvement over Hive on MR.
2. Scale: Hive on Tez has been proven to scale to data sets well beyond what Hive on Spark can handle.
Here is a presentation discussing some of the differences: http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final With LLAP, Hive speed increases even further, but LLAP is not a replacement for Tez; in fact, LLAP still leverages the Tez engine. Hope this helps!
01-17-2017
06:03 PM
2 Kudos
Hi @Joe Harvy You can create scripts that will recreate the databases and tables from the Hive metastore. This blog walks you through the steps: https://sharebigdata.wordpress.com/2016/06/12/hive-metastore-internal-tables/
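One lightweight variant, sketched below, is to let Hive itself emit the DDL with SHOW CREATE TABLE rather than querying the metastore tables directly. The host and database in the URL are placeholders, and it assumes the hive-jdbc driver is on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class DumpTableDdl {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://hs2-host:10000/default"; // hypothetical host/database
        try (Connection conn = DriverManager.getConnection(url, "hive", "")) {
            // First collect the table names...
            List<String> names = new ArrayList<>();
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
            // ...then have Hive print the CREATE statement for each one
            for (String table : names) {
                try (Statement stmt = conn.createStatement();
                     ResultSet rs = stmt.executeQuery("SHOW CREATE TABLE " + table)) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1)); // DDL, one line per row
                    }
                    System.out.println(";");
                }
            }
        }
    }
}
```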
01-17-2017
02:07 PM
Hi @Shihab Hive 2.0 is in tech preview for HDP 2.5. The performance gain comes from LLAP, which leverages a persistent, distributed cache: http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/ You enable LLAP by setting "Interactive Query" to "Yes" in the Hive config settings; this sets up an additional HiveServer2 instance. You can run both Hive 1.2 and 2.0 side by side on the same cluster.
Please try it out. We'd love to hear your feedback.
01-09-2017
04:28 AM
@Ram D Here are some links which may help:
R Cheat Sheet: http://cran.r-project.org/doc/contrib/YanchangZhao-refcard-data-mining.pdf
Installing RHadoop: http://www.research.janahang.com/install-rhadoop-on-hortonworks-hdp-2-0/
Demo to Test: http://hortonworks.com/hadoop-tutorial/using-rhadoop-to-predict-visitors-amount/
R Running on YARN: http://blogs.perficient.com/multi-shoring/blog/2014/07/29/get-r-running-over-yarn-based-mapreduce-3/
01-04-2017
12:47 PM
@Chandra Manohar There are some blogs that walk through installing Impala on HDP, but most are probably dated; Impala on HDP is not supported or tested. Instead, I would recommend installing the latest 2.5 Sandbox and enabling LLAP for interactive querying: http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/
01-03-2017
04:43 PM
@Sami Ahmad You'll want to use Beeline going forward, since the Hive CLI is being deprecated. Beeline's JDBC connection provides a higher level of security than the Hive CLI.
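If you're connecting applications rather than people, the same JDBC endpoint Beeline uses is available directly. Here's a minimal connectivity sketch; the host is a placeholder, and you'd add Kerberos or SSL parameters to the URL as your cluster requires:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectivityTest {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // hive-jdbc on the classpath
        // 10000 is the default HiveServer2 binary-transport port
        String url = "jdbc:hive2://hs2-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```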
01-03-2017
01:09 PM
1 Kudo
Hi @Ajay Sharma, I'd also strongly recommend our support subscription and implementing SmartSense. http://hortonworks.com/products/subscriptions/smartsense/. This is the quickest and best way to get a complete overview of your cluster health based on best practices and real-time workloads.