Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5130 | 09-21-2018 09:54 PM
 | 6495 | 03-31-2018 03:59 AM
 | 1969 | 03-31-2018 03:55 AM
 | 2180 | 03-31-2018 03:31 AM
 | 4834 | 03-27-2018 03:46 PM
03-30-2017
03:34 PM
2 Kudos
@Sree Kupp Have you tried setting num_executors=46 in the session variables of the JDBC connection string? The JDBC URL has the following format:

jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>

Try setting the <sessionConfs> parameter to num_executors=46. Keep in mind that this usage is not documented, nor supported by HWX or CDH.

I prefer to use Hive/LLAP instead:
https://cwiki.apache.org/confluence/display/Hive/LLAP
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_hive-performance-tuning/content/ch_hive_llap.html

This is supported and very promising. Additionally, with the new features included in HDP 2.6, to be released in the next few weeks, it will be generally available and a definite option for Enterprise Data Warehouse optimization and a single pane of SQL on Hadoop, with ANSI SQL 2011 compliance in the very near future.
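For illustration, a minimal Scala sketch of opening such a connection; the host, port, and credentials are placeholders, the Hive JDBC driver is assumed to be on the classpath, and (as noted above) num_executors in the <sessionConfs> slot is undocumented:

```scala
import java.sql.DriverManager

// Hypothetical HiveServer2 endpoint; num_executors rides in the <sessionConfs> slot
val url  = "jdbc:hive2://hs2-host:10000/default;num_executors=46"
val conn = DriverManager.getConnection(url, "hive", "")

// Trivial query to confirm the session came up with the supplied settings
val rs = conn.createStatement().executeQuery("SELECT 1")
while (rs.next()) println(rs.getInt(1))

conn.close()
```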
03-30-2017
03:48 AM
5 Kudos
It seems that the following improvement addressed this requirement in v2.6: https://issues.apache.org/jira/browse/YARN-1051
03-21-2017
07:51 PM
3 Kudos
@Boris Demerov Stay tuned. Sometime April.
03-20-2017
08:20 PM
2 Kudos
@P D That is the usual QA step. Pick and choose from here: https://github.com/aengusrooneyhortonworks/HadoopBenchmarks If you use HDFS, Hive, and HBase, choose the applicable ones. At a minimum you could run hive-testbench and teragen/terasort, and maybe one benchmark for HBase. You could do all of those, but it may take time. Alternatively, you could just log in to Hive and run some queries, then log in to HBase and run the usual commands in the HBase shell; you could also run SQL via Phoenix. This is a smoke test suite that you could build for upgrades (see the sketch below). You may have to include tests for all the tools in the ecosystem: there will be Storm topologies you have to handle, Spark jobs you have to test, etc. A test plan for each tool is a good thing.
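A minimal sketch of such a post-upgrade smoke harness, driving the command-line clients from Scala via sys.process; every host name, path, and command here is an assumption to be adapted to the services actually installed:

```scala
import scala.sys.process._

object SmokeTest extends App {
  // Each check is a label plus a shell command; hosts and paths are placeholders
  // (the examples jar path follows the typical HDP layout).
  val checks = Seq(
    "HDFS write"   -> "hdfs dfs -touchz /tmp/smoke.marker",
    "TeraGen"      -> "hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 100000 /tmp/smoke-teragen",
    "Hive query"   -> "beeline -u jdbc:hive2://hs2-host:10000/default -e 'SHOW DATABASES;'",
    "HBase status" -> "echo status | hbase shell"
  )

  val failed = checks.filterNot { case (label, cmd) =>
    println(s"== $label ==")
    Seq("bash", "-c", cmd).! == 0   // run via bash; keep the check if exit code is 0
  }

  failed.foreach { case (label, _) => println(s"FAILED: $label") }
  if (failed.nonEmpty) sys.exit(1)
}
```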
03-20-2017
06:25 PM
@Guy Riems Please read this: https://community.hortonworks.com/questions/89641/disk-size-used-is-bigger-than-replication-number-m.html#answer-89728 To know what is available, take the number of unused blocks and multiply it by the block size. If your replication factor is 3, take the unused blocks x size/block across all your DataNodes and divide by three. To know what is used, it is the same arithmetic: used blocks x size/block, but keep in mind your blocks are most likely filled at < 100% ... +++ If this helped, please vote/accept as best answer.
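A worked example of that arithmetic as a small Scala sketch; the block count and block size below are made-up numbers:

```scala
// Made-up inputs for illustration
val blockSizeBytes = 128L * 1024 * 1024   // 128 MB HDFS block size
val unusedBlocks   = 9000L                // unused blocks across all DataNodes (assumed)
val replication    = 3                    // dfs.replication

val rawFreeBytes = unusedBlocks * blockSizeBytes
val usableBytes  = rawFreeBytes / replication  // capacity for new, 3x-replicated data

println(f"raw free: ${rawFreeBytes / 1e9}%.1f GB, usable: ${usableBytes / 1e9}%.1f GB")
```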
03-20-2017
06:00 PM
2 Kudos
@dvt isoft Not necessarily. That would only be the case if your blocks were 100% filled with data. Let's say you have a 1024 MB file and the block size is 128 MB. That is exactly 8 blocks at 100%. Now let's say you have a 968 MB file and the block size is 128 MB. That is still 8 blocks, but with lower usage. A block once used by a file cannot be reused for a different file; that's why loading small files can be a waste. Just imagine 100 files of 100 KB each: they will use 100 blocks of 128 MB, more than 10x the block count of the examples above. You need to understand your files, block % usage, etc. The command you executed shows empty blocks x size/block ... I know that is confusing 🙂 +++ If this is helpful please vote and accept as the best answer.
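If you want to check this on your own files, here is a sketch using the Hadoop FileSystem API; the path is a placeholder and hadoop-client is assumed to be on the classpath:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs     = FileSystem.get(new Configuration())
val status = fs.getFileStatus(new Path("/data/example.dat"))  // hypothetical file
val blocks = fs.getFileBlockLocations(status, 0, status.getLen)

// Compare how many blocks the file occupies against the configured block size
println(s"${status.getLen} bytes in ${blocks.length} block(s), block size ${status.getBlockSize} bytes")
```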
03-20-2017
05:55 PM
3 Kudos
@Bruce Perez Good to hear. That was a brute-force solution which worked, and it is not a big deal for a sandbox. However, for your learning exercise, re-read your question and my answer. That will be helpful when you get out of the sandbox and deal with actual production servers, where brute-force solutions have implications. You should not change the IP address just because Ambari shows something else; I responded to your question as it was stated, for a server. When you do a production installation you will see there is a step for host registration, and at that time you will have the IP address set. Changes to that IP address are still possible, but they can have a ripple effect if not handled properly.
03-20-2017
05:49 PM
@Morten R Not to mention, especially because you are a beginner, you should start directly with DataFrames instead of RDDs. That is where Spark is going ...
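A tiny illustration of the DataFrame-first style (a Spark 2.x SparkSession is assumed; the input path and column names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("df-first").getOrCreate()
import spark.implicits._

// Read straight into a DataFrame; no hand-rolled RDD parsing needed
val people = spark.read.json("/tmp/people.json")   // hypothetical input
people.filter($"age" > 21).groupBy($"city").count().show()
```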
03-20-2017
05:48 PM
1 Kudo
@Morten R There are so many working examples. Sincerely, only you can debug your code. I suggest you take an easier path: use Phoenix like any JDBC driver, with all its SQL capability. It will simplify your code and also take advantage of Phoenix's distributed nature out of the box. https://github.com/apache/phoenix/tree/master/phoenix-spark
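Following the pattern in the phoenix-spark README linked above, a minimal sketch of reading a Phoenix table into a DataFrame; the table name and ZooKeeper quorum are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("phoenix-read").getOrCreate()

// Load a Phoenix table as a DataFrame via the phoenix-spark data source
val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "TABLE1")        // hypothetical Phoenix table
  .option("zkUrl", "zk-host:2181")  // hypothetical ZooKeeper quorum
  .load()

df.show()
```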
03-20-2017
05:43 PM
2 Kudos
@Bruce Perez I suspect a refresh problem with the hosts file content ... Hopefully your server has a static IP address, because if that is not the case this could be a recurring problem. Open a terminal window and first try to stop and start the Ambari agent:

# ambari-agent stop
# ambari-agent start

Check if that fixed the problem. Then try to stop and start the Ambari server:

# ambari-server stop
# ambari-server start

Anyhow, the host IP address is not something managed by Ambari; it is supposed to display the actual IP address of the server. You were not supposed to change the server IP address to match what the Ambari UI shows; rather, the Ambari UI is supposed to show what the server IP address is actually set to 🙂 ++++ Let me know if this helped.