Member since 03-16-2016
707 Posts
1753 Kudos Received
203 Solutions
My Accepted Solutions
| Views | Posted |
|---|---|
| 6961 | 09-21-2018 09:54 PM |
| 8721 | 03-31-2018 03:59 AM |
| 2613 | 03-31-2018 03:55 AM |
| 2754 | 03-31-2018 03:31 AM |
| 6174 | 03-27-2018 03:46 PM |
03-10-2017
03:05 PM
1 Kudo
@Mohammed El Moumni By default, a queue is limited to 1 GB in size or 10,000 files. To change these limits, open "Configure" on that queue and go to the Settings tab. See the attached screenshot. If it helps, please vote/accept the response. It is also possible that a queue or processor further downstream is stuck because of the same default limit. In that case you have to increase the limit there and let the processor start processing, so that the backlog shrinks, before the queue you reported can start to drain. Imagine the whole flow as a river with all kinds of streams and obstructions...
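To make the river picture concrete, here is a toy sketch of how a full downstream queue can hold up the queue before it. It is only a model under stated assumptions: the `Connection` class, `run_processor` function, and batch size are hypothetical names invented for illustration, not NiFi's API, and only the 10,000-object limit comes from the answer above.

```python
from collections import deque


class Connection:
    """Toy stand-in for a queue between two processors (not NiFi's API)."""

    def __init__(self, name, max_objects=10_000):
        self.name = name
        self.max_objects = max_objects  # object-count limit, 10,000 by default
        self.items = deque()

    def is_full(self):
        return len(self.items) >= self.max_objects


def run_processor(source, target, batch=1_000):
    """Move up to `batch` items from source to target.

    Nothing is moved while the target queue is at its limit,
    which is how a downstream obstruction backs up the flow.
    """
    moved = 0
    while source.items and moved < batch and not target.is_full():
        target.items.append(source.items.popleft())
        moved += 1
    return moved


# The reported queue is full, and so is the queue after it.
upstream = Connection("upstream")
downstream = Connection("downstream")
upstream.items.extend(range(10_000))
downstream.items.extend(range(10_000))

stuck = run_processor(upstream, downstream)    # moves nothing
downstream.max_objects = 20_000                # raise the downstream limit
drained = run_processor(upstream, downstream)  # now items can flow again
```

In this model the upstream queue only starts to shrink after the downstream limit is raised, which matches the advice to look further down the flow first.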
03-08-2017
07:14 PM
4 Kudos
@Ram Ghase You are trying to follow a demo written for Spark 2.1, but your sandbox is at best at 2.0. You should follow tutorials that match the Spark version deployed on the HDP 2.5 sandbox, Spark 1.6.2. Spark 2.0 is also possible, but I would wait for the HDP 2.6 sandbox, which will probably be released next month. The error is self-explanatory. If you wish to address it, you could add the missing libraries.
03-08-2017
07:00 PM
3 Kudos
@elliot gimple Hive is not like a traditional RDBMS with regard to DML operations, because Hive stores its data as files in HDFS. Keep in mind that each partition has a file, each bucket adds another file, and so on. When you perform a DML action against a row, you effectively overwrite a file rather than append to it. HDFS was architected this way for good reasons.
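A rough way to picture the "overwrite a file, not append" point is the sketch below. It uses the local filesystem as a stand-in for HDFS, and the `update_row` helper and the `part-00000.csv` file name are hypothetical, chosen only for illustration. It is a simplified analogy, not how Hive actually implements DML.

```python
import os
import tempfile


def update_row(path, key, new_value):
    """Change one row by rewriting the whole file, then swapping it in."""
    directory = os.path.dirname(path)
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    rows_written = 0
    with os.fdopen(fd, "w") as tmp, open(path) as src:
        for line in src:
            row_key, _, value = line.rstrip("\n").partition(",")
            if row_key == key:
                value = new_value  # the only row that actually changes
            tmp.write(f"{row_key},{value}\n")
            rows_written += 1
    os.replace(tmp_path, path)  # the old file is replaced as a unit
    return rows_written


# Usage: a three-row "partition file"; updating one row rewrites all three.
workdir = tempfile.mkdtemp()
data_file = os.path.join(workdir, "part-00000.csv")
with open(data_file, "w") as f:
    f.write("1,alpha\n2,beta\n3,gamma\n")

rewritten = update_row(data_file, "2", "BETA")
with open(data_file) as f:
    contents = f.read()
```

The cost of the update scales with the size of the file, not with the single row that changed, which is why row-level DML is comparatively expensive here.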
03-08-2017
06:52 PM
2 Kudos
@Subramaniyam KMV I assume you mean Mosquitto MQTT. Here is an example of installation on CentOS 7: https://www.digitalocean.com/community/tutorials/how-to-install-and-secure-the-mosquitto-mqtt-messaging-broker-on-centos-7 You can probably skip the "secure" part. This is not specific to HDP 2.5; for your case, you can treat the sandbox as just a CentOS VM. What matters is the OS and the availability of resources.
03-08-2017
06:48 PM
2 Kudos
@som You would have to be more explicit about the versions of Hive, Spark, etc., and also explain what "failing" means. Accessing Hive views via the Hive context from Spark is no different from accessing tables. Anyhow, check the following: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_spark-hive-access.html https://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/ Hopefully this helps.
03-07-2017
05:46 PM
5 Kudos
It is common to allocate capacity to an interactive queue during the day, when the business is active, and to a batch queue during the night, when batch workloads are frequently executed. To configure this scenario, schedule-based policies are used. Per the HDP Hive Performance Tuning Guide (8/29/2016), section 3.6.8, this is an alpha Apache feature. Can anyone elaborate on this feature? How would it be used to set up time-based queue capacity (steps, screenshots)? Is it actually available, and if it is not available yet, when will it be?
Labels: Apache YARN
03-07-2017
05:00 PM
3 Kudos
@CriCL Got you. I like to use RStudio connected to HDP and use R Markdown to generate PDF files: http://rmarkdown.rstudio.com/pdf_document_format.html True, this is more of a tool for data science. On the other hand, any tool that you like and that supports ODBC or JDBC can connect to Hive, or to HBase via Phoenix. Try that approach.
03-07-2017
02:33 AM
3 Kudos
@Lior Hadaya This comes down to the CBO (cost-based optimizer) and the statistics collected on your tables. You may have the settings mentioned here set to true: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/hive_perf_best_pract_use_col_stats_cost_base_opt.html As such, the behavior can change over time as the statistics change. You could also force statistics collection on a specific table or even on specific columns.
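As a sketch of what "force stats on a table or column" could look like in practice, the helper below just assembles HiveQL text that you would then run in your Hive client. The four property names are the ones I believe the linked tuning page covers, so check them against your version. The `sales` table and its columns are hypothetical, and the helper functions are illustrative, not part of any Hive library.

```python
# Properties commonly associated with the CBO and statistics usage in Hive.
# Assumption: verify these names against the documentation for your HDP version.
CBO_SETTINGS = (
    "hive.cbo.enable",
    "hive.compute.query.using.stats",
    "hive.stats.fetch.column.stats",
    "hive.stats.fetch.partition.stats",
)


def cbo_session_statements(enabled=True):
    """Return SET statements that switch the CBO-related properties on or off."""
    flag = "true" if enabled else "false"
    return [f"SET {name}={flag};" for name in CBO_SETTINGS]


def analyze_statements(table, columns=None):
    """Return ANALYZE statements for table-level and, optionally, column-level stats."""
    statements = [f"ANALYZE TABLE {table} COMPUTE STATISTICS;"]
    if columns:
        cols = ", ".join(columns)
        statements.append(
            f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS {cols};"
        )
    return statements


# Hypothetical table and columns, for illustration only.
script = cbo_session_statements() + analyze_statements(
    "sales", ["customer_id", "amount"]
)
```

Printing `"\n".join(script)` gives a small script to paste into Beeline or a similar client. Re-running the ANALYZE statements after large data changes keeps the optimizer's view of the table current.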
03-07-2017
02:26 AM
4 Kudos
@Sree Kupp The 15 additional seconds are spent outside of Hive. As such, the changes you make to Tez settings will only impact (positively or negatively) the Hive query; they will not address your 15-second lag outside Hive. Your delay is a combination of network latency, ODBC, and Power BI rendering the result in the UI. The focus for tuning should be on ODBC tuning to chunk the data better and increase throughput, on addressing the network latency (if any), and on caching on the Power BI side. There are a few things that you could do on the Hive side too, but it is a long shot to guess what you did and what you did not do. For example, use the ORC format, use Interactive Query (LLAP), increase the use of caching for map-side joins, and improve parallelism by setting a few parameters that allow better chunking of the data.