Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5130 | 09-21-2018 09:54 PM
 | 6495 | 03-31-2018 03:59 AM
 | 1969 | 03-31-2018 03:55 AM
 | 2180 | 03-31-2018 03:31 AM
 | 4834 | 03-27-2018 03:46 PM
03-30-2017
03:34 PM
2 Kudos
@Sree Kupp Have you tried setting num_executors=46 in the session variables of the JDBC connection string? The JDBC URL has the following format:

jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>

Try setting the <sessionConfs> parameter to num_executors=46. Keep in mind that this usage is not documented, nor supported by HWX or CDH.

I prefer to use Hive/LLAP instead:
https://cwiki.apache.org/confluence/display/Hive/LLAP
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_hive-performance-tuning/content/ch_hive_llap.html

This is supported and very promising. Additionally, with the new features included in HDP 2.6, to be released in the next few weeks, it will be generally available and a definite option for Enterprise Data Warehouse optimization and a single pane of SQL on Hadoop, with ANSI SQL 2011 compliance in the very near future.
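For illustration, a minimal Scala sketch of opening such a connection; the host, port, and credentials are placeholders, the Hive JDBC driver is assumed to be on the classpath, and (as noted above) num_executors in the <sessionConfs> slot is undocumented:

```scala
import java.sql.DriverManager

// Hypothetical HiveServer2 endpoint; num_executors rides in the <sessionConfs> slot
val url  = "jdbc:hive2://hs2-host:10000/default;num_executors=46"
val conn = DriverManager.getConnection(url, "hive", "")

// Trivial query to confirm the session came up with the supplied settings
val rs = conn.createStatement().executeQuery("SELECT 1")
while (rs.next()) println(rs.getInt(1))

conn.close()
```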
03-30-2017
03:48 AM
5 Kudos
It seems that the following improvement addressed this requirement in v2.6: https://issues.apache.org/jira/browse/YARN-1051
03-21-2017
07:51 PM
3 Kudos
@Boris Demerov Stay tuned. Sometime April.
03-20-2017
08:20 PM
2 Kudos
@P D That is the usual QA step. Pick and choose from here: https://github.com/aengusrooneyhortonworks/HadoopBenchmarks If you use HDFS, Hive, and HBase, choose the applicable ones. At a minimum you could run hive-testbench and teragen/terasort, and maybe one benchmark for HBase. You could do all of those, but it may take time. Alternatively, you could just log in to Hive and run some queries, then log in to HBase and run the usual commands in the HBase shell; you could also run SQL via Phoenix. This is a smoke test suite that you could build for upgrades (see the sketch below). You may have to include tests for all the tools in the ecosystem: there will be Storm topologies you have to handle, Spark jobs you have to test, etc. A test plan for each tool is a good thing.
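A minimal sketch of such a post-upgrade smoke harness, driving the command-line clients from Scala via sys.process; every host name, path, and command here is an assumption to be adapted to the services actually installed:

```scala
import scala.sys.process._

object SmokeTest extends App {
  // Each check is a label plus a shell command; hosts and paths are placeholders
  // (the examples jar path follows the typical HDP layout).
  val checks = Seq(
    "HDFS write"   -> "hdfs dfs -touchz /tmp/smoke.marker",
    "TeraGen"      -> "hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 100000 /tmp/smoke-teragen",
    "Hive query"   -> "beeline -u jdbc:hive2://hs2-host:10000/default -e 'SHOW DATABASES;'",
    "HBase status" -> "echo status | hbase shell"
  )

  val failed = checks.filterNot { case (label, cmd) =>
    println(s"== $label ==")
    Seq("bash", "-c", cmd).! == 0   // run via bash; keep the check if exit code is 0
  }

  failed.foreach { case (label, _) => println(s"FAILED: $label") }
  if (failed.nonEmpty) sys.exit(1)
}
```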
03-20-2017
06:25 PM
@Guy Riems Please read this: https://community.hortonworks.com/questions/89641/disk-size-used-is-bigger-than-replication-number-m.html#answer-89728 To know what is available, take the number of unused blocks and multiply it by the block size. If your replication factor is 3, take the unused blocks x size/block across all your DataNodes and divide by three. To know what is used, it is the same arithmetic: used blocks x size/block, but keep in mind your blocks are most likely filled at < 100% ... +++ If this helped, please vote/accept as best answer.
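A worked example of that arithmetic as a small Scala sketch; the block count and block size below are made-up numbers:

```scala
// Made-up inputs for illustration
val blockSizeBytes = 128L * 1024 * 1024   // 128 MB HDFS block size
val unusedBlocks   = 9000L                // unused blocks across all DataNodes (assumed)
val replication    = 3                    // dfs.replication

val rawFreeBytes = unusedBlocks * blockSizeBytes
val usableBytes  = rawFreeBytes / replication  // capacity for new, 3x-replicated data

println(f"raw free: ${rawFreeBytes / 1e9}%.1f GB, usable: ${usableBytes / 1e9}%.1f GB")
```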
03-20-2017
06:00 PM
2 Kudos
@dvt isoft Not necessarily. That would only be the case if your blocks were 100% filled with data. Let's say you have a 1024 MB file and the block size is 128 MB. That is exactly 8 blocks at 100%. Now let's say you have a 968 MB file and the block size is 128 MB. That is still 8 blocks, but with lower usage. A block once used by a file cannot be reused for a different file; that's why loading small files can be a waste. Just imagine 100 files of 100 KB each: they will use 100 blocks of 128 MB, more than 10x the block count of the examples above. You need to understand your files, block % usage, etc. The command you executed shows empty blocks x size/block ... I know that is confusing 🙂 +++ If this is helpful please vote and accept as the best answer.
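If you want to check this on your own files, here is a sketch using the Hadoop FileSystem API; the path is a placeholder and hadoop-client is assumed to be on the classpath:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs     = FileSystem.get(new Configuration())
val status = fs.getFileStatus(new Path("/data/example.dat"))  // hypothetical file
val blocks = fs.getFileBlockLocations(status, 0, status.getLen)

// Compare how many blocks the file occupies against the configured block size
println(s"${status.getLen} bytes in ${blocks.length} block(s), block size ${status.getBlockSize} bytes")
```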
03-20-2017
05:55 PM
3 Kudos
@Bruce Perez Good to hear. That was a brute-force solution which worked, and it is not a big deal for a sandbox. However, for your learning exercise, re-read your question and my answer. That will be helpful when you get out of the sandbox and deal with actual production servers, where brute-force solutions have implications. You should not change the IP address just because Ambari shows something else; I responded to your question as it was stated, for a server. When you do a production installation you will see there is a step for host registration, and at that time you will have the IP address set. Changes to that IP address are still possible, but they can have a ripple effect if not handled properly.
03-20-2017
05:49 PM
@Morten R Not to mention, especially because you are a beginner, you should start directly with DataFrames instead of RDDs. That is where Spark is going ...
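A tiny illustration of the DataFrame-first style (a Spark 2.x SparkSession is assumed; the input path and column names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("df-first").getOrCreate()
import spark.implicits._

// Read straight into a DataFrame; no hand-rolled RDD parsing needed
val people = spark.read.json("/tmp/people.json")   // hypothetical input
people.filter($"age" > 21).groupBy($"city").count().show()
```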
03-20-2017
05:48 PM
1 Kudo
@Morten R There are so many working examples. Sincerely, only you can debug your code. I suggest you take an easier path: use Phoenix like any JDBC driver, with all its SQL capability. It will simplify your code and also take advantage of Phoenix's distributed nature out of the box. https://github.com/apache/phoenix/tree/master/phoenix-spark
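Following the pattern in the phoenix-spark README linked above, a minimal sketch of reading a Phoenix table into a DataFrame; the table name and ZooKeeper quorum are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("phoenix-read").getOrCreate()

// Load a Phoenix table as a DataFrame via the phoenix-spark data source
val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "TABLE1")        // hypothetical Phoenix table
  .option("zkUrl", "zk-host:2181")  // hypothetical ZooKeeper quorum
  .load()

df.show()
```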
03-20-2017
05:43 PM
2 Kudos
@Bruce Perez I suspect a refresh problem with the hosts file content ... Hopefully your server has a static IP address, because if that is not the case this could be a recurring problem. Open a terminal window and first try to stop and start the Ambari agent:

# ambari-agent stop
# ambari-agent start

Check if that fixed the problem. Then try to stop and start the Ambari server:

# ambari-server stop
# ambari-server start

Anyhow, the host IP address is not something managed by Ambari; it is supposed to display the actual IP address of the server. You were not supposed to change the server IP address to match what the Ambari UI shows; rather, the Ambari UI is supposed to show what the server IP address is actually set to 🙂 ++++ Let me know if this helped.