Member since: 05-02-2019
Posts: 319
Kudos Received: 144
Solutions: 58
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3720 | 06-03-2019 09:31 PM
 | 763 | 05-22-2019 02:38 AM
 | 1086 | 05-22-2019 02:21 AM
 | 618 | 05-04-2019 08:17 PM
 | 802 | 04-14-2019 12:06 AM
02-20-2019
07:21 PM
I am assuming you have five different Capacity Scheduler queues set up, giving each a guaranteed 20% with bursting to 100% (probably better to stay a bit under that, such as 90%, to limit the preemption thrashing that could occur in some scenarios). The preemption configuration you describe sounds like it is working, but it would only take away 20% of user1's usage so that user2 could get its guarantee of 20%. There are more fine-grained user limit factors within the queues, but that doesn't sound like what you want either, which I understand to be: 1 user (100%), 2 users (50% each), 3 users (33%), 4 users (25%), and 5 users (20%). With the queue setup I suggested of 20-100%, you would get something like: 1 user (100%); add a 2nd user (user1 80%, user2 20%); add a 3rd user (user1 60%, user2 20%, user3 20%); and on down to adding the 5th user, where they all get 20%. Of course, that assumes a LOT of things. Generally speaking, a queue has a minimum guarantee, and preemption can help you get hold of that when other queues are consuming more than their minimum. I do not believe there is a way to get the time-delayed release of resources you asked for above.
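For reference, here is a minimal sketch of the Capacity Scheduler properties such a setup might use. The queue names q1 through q5 are made up for illustration, and the values assume the 20% guarantee / 90% burst ceiling described above; this is not a drop-in config.
# Sketch only: hypothetical queues q1..q5, each guaranteed 20% and able to burst to 90%.
cat > /tmp/capacity-scheduler-snippet.xml <<'EOF'
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>q1,q2,q3,q4,q5</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.q1.capacity</name>
  <value>20</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.q1.maximum-capacity</name>
  <value>90</value>
</property>
<!-- repeat capacity / maximum-capacity for q2 through q5 -->
EOF
# After merging these into capacity-scheduler.xml (or applying the same values via Ambari),
# have the ResourceManager pick them up:
yarn rmadmin -refreshQueues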
02-20-2019
07:04 PM
Looks like @Bryan Bende already answered this over on https://stackoverflow.com/questions/54791414/how-i-can-use-hbase-2-0-4-version-in-nifi
01-28-2019
09:10 PM
You can find them at https://github.com/HortonworksUniversity/DevPH_Labs
01-08-2019
10:03 PM
Did you ever come up with a workable solution for this? If so, would you mind sharing what you settled on? Thanks!
12-30-2018
08:31 PM
Can you provide a very simple, but indicative, example of what you are looking for? Maybe a few rows and details of the query you would want to run against them. As you already know, the filter language documented at https://hbase.apache.org/book.html#thrift.filter_language will end up scanning everything. And also as you know, creating another table whose rowkey is aligned with the query you want to run fast will take effort to keep in sync as you upsert your data. There's always Phoenix if you want to stay completely in HBase, but maybe a hybrid of HBase and Solr might work. Again, a simplified example could help.
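To illustrate the trade-off (the table and column names below are entirely made up): a filtered scan still touches every row server-side, while a second table keyed by the value you search on turns the same lookup into a cheap bounded scan.
# Hypothetical tables: 'orders' keyed by order_id, 'orders_by_customer' keyed by customer_id.
# A filtered scan still reads every row:
echo "scan 'orders', {FILTER => \"SingleColumnValueFilter('cf', 'customer_id', =, 'binary:42')\"}" | hbase shell
# The query-aligned table makes the same lookup a short bounded scan,
# at the cost of keeping it in sync on every upsert:
echo "scan 'orders_by_customer', {STARTROW => '42', STOPROW => '43'}" | hbase shell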
12-01-2018
07:31 PM
1 Kudo
If their times are in sync now, I'm not sure of any inherent problems that would prevent you from starting them back up. Good luck & happy Hadooping!
12-01-2018
07:09 PM
The "Cadillac Answer" from Hortonworks is to use Data Lifecycle Manager, https://hortonworks.com/products/data-platforms/dataplane/data-lifecycle-manager/, as it handles Hive replication as documented at https://docs.hortonworks.com/HDPDocuments/DLM1/DLM-1.2.0/administration/content/dlm_replication_of_data_using_hive.html. Hive does not natively have the same cluster-to-cluster replication features that HBase has.
12-01-2018
07:05 PM
See similar question at https://community.hortonworks.com/questions/47798/hbase-graphical-client.html for some ideas. Good luck & happy Hadooping!
04-27-2018
11:12 AM
Surely NOT the same issue, but along this line of buggy behavior in the HDP Sandbox (2.6.0.3), using Hive and getting messages mentioning the hostnames sandbox and sandbox.hortonworks.com, I got this message a few times:
FAILED: SemanticException Unable to determine if hdfs://sandbox.hortonworks.com:8020/user/root/salarydata is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://sandbox.hortonworks.com:8020/user/root/salarydata, expected: hdfs://sandbox:8020
It seems to go away if I just exit the SSH connection and establish it again.
04-26-2018
10:59 PM
SANDBOX VERSION AFFECTED
HDP 2.6.0.3 Sandbox as identified below.
# wget https://downloads-hortonworks.akamaized.net/sandbox-hdp-2.6/HDP_2.6_docker_05_05_2017_15_01_40.tar.gz
# md5sum HDP_2.6_docker_05_05_2017_15_01_40.tar.gz
886845a5e2fc28f773c59dace548e516  HDP_2.6_docker_05_05_2017_15_01_40.tar.gz

ISSUE
When using the classic Hive CLI, after a while the following error surfaces.
[root@sandbox demos]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.0.3-8/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.UnknownHostException: sandbox
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:547)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: sandbox
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:438)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:690)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:631)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:179)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:530)
... 8 more
Caused by: java.net.UnknownHostException: sandbox
... 21 more
[root@sandbox demos]#

RESOLUTION
Modify /etc/hosts to allow sandbox to be resolved just as sandbox.hortonworks.com does.
[root@sandbox ~]# cat /etc/hosts
127.0.0.1       localhost
::1             localhost ip6-localhost ip6-loopback
fe00::0         ip6-localnet
ff00::0         ip6-mcastprefix
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters
172.17.0.2      sandbox.hortonworks.com
[root@sandbox ~]# cp /etc/hosts /tmp/
[root@sandbox ~]# vi /etc/hosts
[root@sandbox ~]# diff /etc/hosts /tmp/hosts
7c7
< 172.17.0.2      sandbox.hortonworks.com sandbox
---
> 172.17.0.2      sandbox.hortonworks.com
[root@sandbox ~]#
02-13-2018
11:02 PM
I'm guessing we are having the same problem. I downloaded the 3.0.2 .ova file for VirtualBox and validated that my download's MD5 hash (shown below) lined up nicely with the published value.
HW13005:HDF302 lmartin$ ls -l
total 20381776
-rw-------@ 1 lmartin staff 10427907072 Feb 5 17:08 HDF_3.0.2.0_virtualbox_01_18_2018.ova
HW13005:HDF302 lmartin$ md5 HDF_3.0.2.0_virtualbox_01_18_2018.ova
MD5 (HDF_3.0.2.0_virtualbox_01_18_2018.ova) = eaadf00b49bf7f088e4f7f3d30adbaa5
HW13005:HDF302 lmartin$
But I got the following error when following the "import appliance" instructions at https://hortonworks.com/tutorial/sandbox-deployment-and-install-guide/section/1/. Is this what you encountered? I'll see if I can reach the Sandbox team for them to investigate a bit and let us know.
02-13-2018
10:44 PM
1 Kudo
I'm guessing you've already seen http://hbase.apache.org/0.94/book/secondary.indexes.html which basically is telling you that you'll need to have a second table whose rowkey is your "secondary index" and is only being used to find the rowkey needed for the actual table. The coprocessor strategy, as I understand it, is to just formalize & automate the "dual-write secondary index" strategy. Good luck and happy Hadooping!
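As a rough sketch of that dual-write approach (the table, rowkey, and column names below are hypothetical), every write to the main table is paired with a write to the index table, and a read by the secondary key becomes two lookups:
# Hypothetical tables: 'users' keyed by user_id, plus 'users_by_email' keyed by email,
# whose only job is to hold the real rowkey.
echo "put 'users', 'user123', 'cf:email', 'a@example.com'" | hbase shell
echo "put 'users_by_email', 'a@example.com', 'cf:user_id', 'user123'" | hbase shell
# A read by email is then a get against 'users_by_email' to find the rowkey,
# followed by a get against 'users' with that rowkey.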
02-13-2018
10:38 PM
1 Kudo
I don't want to oversimplify this process (Hortonworks Professional Services does these conversions with customers all the time, and there is often more at play than simply moving the data, such as testing apps before & after), but... you can leverage DistCp, https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html, as your tool to move the data from your original cluster to your new one. For the HBase data, I'd look to its Snapshots feature, http://hbase.apache.org/book.html#ops.snapshots, including its ability to export a snapshot to another cluster, as a solid approach. Good luck and happy Hadooping!
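Roughly, the two pieces would look like this (the NameNode addresses, paths, and table names are placeholders):
# Copy plain HDFS data between clusters with DistCp:
hadoop distcp hdfs://old-nn:8020/data hdfs://new-nn:8020/data
# For HBase, snapshot a table and export the snapshot to the new cluster:
echo "snapshot 'mytable', 'mytable-snap'" | hbase shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot mytable-snap \
  -copy-to hdfs://new-nn:8020/apps/hbase/data \
  -mappers 8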
02-13-2018
10:27 PM
Theoretically... yes, that should work. I'd stop HDFS as you are thinking, copy the contents of /hadoop/hdfs/data into /HDFS (and maybe just leave the original there as a fallback!!), and then update the dfs.datanode.data.dir property to point to /HDFS instead of the /hadoop/hdfs/data default location. Using Ambari, you can find it as identified by the red arrows in the attached screenshot. After the Ambari change is made and pushed to the DataNodes, you can start HDFS back up and see whether it worked well or not. Again, it theoretically should work, but if this is your production system, I'd do a dry run on another cluster (could do that on a single-node pseudo-cluster) to gain some confidence that it all works well. Good luck and happy Hadooping!
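On each DataNode, the copy step might look something like this (a sketch that assumes HDFS is already stopped and /hadoop/hdfs/data is the only data directory):
# Copy rather than move, so the original stays behind as a fallback:
rsync -a /hadoop/hdfs/data/ /HDFS/
# Then point dfs.datanode.data.dir at /HDFS in Ambari (HDFS > Configs), restart HDFS,
# and sanity-check the file system:
hdfs fsck / | tail -n 20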
02-13-2018
09:14 PM
1 Kudo
It looks like you are only letting YARN use 25 GB of your worker nodes' 64 GB, as well as only 6 of your 16 CPU cores, so these values should be raised. Check out the details at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/determine-hdp-memory-config.html for a script that can help you set some baseline values for these properties. As for the Spark jobs: each of these jobs is requesting a certain size and number of containers, and I'm betting each job is a bit different. Since Spark jobs get their resources first, it would seem normal for a specific job (as long as neither the resource request nor the fundamental input dataset size changes) to take a comparable time to run from invocation to invocation. That isn't necessarily the case across different Spark jobs, which may be doing entirely different things. Good luck and happy Hadooping/Sparking!
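The properties in question live in yarn-site.xml. The numbers below are illustrative guesses only; the linked script will compute values tuned to your disks, cores, and co-located services:
# Illustrative values for 64 GB / 16-core workers (not authoritative):
#   yarn.nodemanager.resource.memory-mb  = 57344   (about 56 GB for YARN containers)
#   yarn.nodemanager.resource.cpu-vcores = 12
#   yarn.scheduler.maximum-allocation-mb = 57344
# Check what a worker currently has configured:
grep -A1 'yarn.nodemanager.resource' /etc/hadoop/conf/yarn-site.xml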
02-13-2018
09:07 PM
I'm thinking your best course of action is to create your new HDP cluster on AWS and then use DistCp, https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html, as your tool to move the data from your original cluster to your new one on AWS.
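A rough sketch of that copy (bucket, host, and path names are placeholders, and the S3A route assumes the connector and credentials are configured on the source cluster):
# Either push straight into S3 from the old cluster...
hadoop distcp hdfs://onprem-nn:8020/data s3a://my-bucket/data
# ...or copy cluster-to-cluster once the new HDP cluster is up on AWS:
hadoop distcp hdfs://onprem-nn:8020/data hdfs://aws-nn:8020/data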
02-13-2018
09:04 PM
Does your Windows machine have the "hadoop fs" client installed on it, along with the proper configuration files to find the NameNode? Can you share the output that shows what happens when you run a command such as "hadoop fs -put /my/localfile.txt /user/myhomedir/newfolder/remotefile.txt"?
01-12-2018
09:02 PM
https://stackoverflow.com/questions/45100487/how-data-is-split-into-part-files-in-sqoop can start to explain more, but ultimately (and thanks to the power of open-source) you'll have to go look for yourself - you can find source code at https://github.com/apache/sqoop. Good luck and happy Hadooping!
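As a concrete (made-up) illustration: the number of part files normally follows the number of mappers, and the split column controls how rows are divided among them.
# Made-up connection string, table, and column names.
# Four mappers produce four part-m-* files, with rows divided by ranges of order_id:
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --table orders \
  --split-by order_id \
  --num-mappers 4 \
  --target-dir /user/me/orders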
12-15-2017
07:46 PM
Not sure I ever thought I'd see "traditional" and "NoSQL" in the same sentence. 😉 Seriously, @Timothy Spann is correct that HBase & Phoenix are part of HDP (thus easily stood-up and managed if you run HDP) and available for your use AND @Harald Berghoff is also correct that you don't "need to use" HBase -- if your solution is solved best with another (notice I didn't say "traditional" -- wasn't HBase here before MongoDB anyways; hehe! I ~think~ they started in 2006 and 2007, respectively) NoSQL database then by all means leverage it. Good luck and happy Hadooping (or whatever Big Data-y framework you are using)!!
12-09-2017
08:45 PM
From looking at your RM UI, it sure looks like both of these jobs are basically fighting each other to get running. Meaning, the AppMaster containers are running, but they can't get any more containers from YARN. My recommendation would be to give the VM 10 GB of memory (that's how I run it on my 16 GB laptop) when you restart it. I'd also try to run it from the command line just to take the Ambari View out of the picture, but if you want to run it in Ambari, then kill any lingering application via the RM UI should it hang again. Good luck and happy Hadooping!
11-01-2017
08:26 PM
Unfortunately, it is a bit more complicated than all of that. In general, Spark is lazily executed, so depending on what you do, even the "temp view" tables/DataFrame(Set)s may not stay around from DAG to DAG. There is an explicit cache method you can use on a DataFrame(Set), but even then you may be trying to cache something that simply won't fit in memory. No worries, Spark assumes that your DF(S)/RDD collections won't fit and it inherently handles this. I'm NOT trying to sell you on anything, but some deeper learning could probably help you. I'm a trainer here at Hortonworks (and again, not really trying to sell you something, but pointing to a resource/opportunity) and we spend several days building up this knowledge in our https://hortonworks.com/services/training/class/hdp-developer-enterprise-spark/ class. Again, apologies for sounding like a salesperson, but my general thought is that there's still a bit more for you to learn about Spark internals, and that might take some more interactive ways of building up that knowledge.
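For what it's worth, here is a tiny sketch of the explicit cache in spark-shell (the table and column names are made up, and this assumes Spark 2.x):
# Made-up Hive table 'web_logs' with a made-up 'status' column.
spark-shell <<'EOF'
val df = spark.table("web_logs")         // lazily defined, nothing is read yet
df.cache()                               // ask Spark to keep it in memory (it spills if it will not fit)
df.count()                               // first action materializes the cache
df.filter($"status" === 500).count()     // second action reuses the cached data
EOF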
11-01-2017
08:17 PM
Sounds like the answer I gave over at https://community.hortonworks.com/questions/139184/sandbox-26-unable-to-start-failed-to-start-crash-r.html?childToView=139428#answer-139428 -- good luck and happy Hadooping!
10-31-2017
09:51 PM
Generically speaking, yes, I'd just run the query that is built on your Hive tables, as Spark SQL is going to "figure out" what it needs to do in its optimizer before doing any work anyway. If the performance is within your SLA, then I'd just go with that, but of course, you could always use that as a baseline to compare against if/when you try other approaches in your code. Happy Hadooping (eh hem... Sparking!) and good luck!
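For example (the query and table names are placeholders, and this assumes the spark-sql CLI is on your path), the existing HiveQL can be run as-is and timed as a baseline:
# Placeholder query; Spark SQL plans and optimizes it before doing any work:
spark-sql -e "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"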
10-26-2017
10:56 AM
Yes, that's the experience I described in my earlier comment: "each time ambari is started, so the last time it happened I logged onto the ambari UI and was able to start up the services individually and all came up". Meaning that only Ambari will be running, and you can then start up the services via that UI. I'm a bit pessimistic, so each time I do this I walk down the list from the top (starting with HDFS) and start each service individually (well... only the ones I'm about to use, as this little VM is still very resource-starved) instead of using the "start all" feature.
09-27-2017
06:42 PM
Awesome! Was it what I suggested above, or something else?
09-27-2017
04:54 PM
2 Kudos
My 2.6.1 Sandbox for VirtualBox hangs on the second startup (tested it twice), usually as shown in the attached screenshot (twice I got something like yours, too). Each time, Ambari does start, so the last time it happened I logged onto the Ambari UI, was able to start up the services individually, and all came up. Can you check to see if you are getting the same "false positive" (or "mostly done") errors I have been having and verify whether the HDP cluster does come up? It all worked fine for me after that. Good luck and happy Hadooping!
09-04-2017
09:20 PM
Take the free self-paced course at http://public.hortonworksuniversity.com/hdp-overview-apache-hadoop-essentials-self-paced-training. Additionally, Hadoop: The Definitive Guide, https://smile.amazon.com/Definitive-version-revised-English-Chinese/dp/7564159170/, is still a very good resource.
09-04-2017
09:19 PM
Take the free self-paced course at http://public.hortonworksuniversity.com/hdp-overview-apache-hadoop-essentials-self-paced-training as a good start.
09-04-2017
09:18 PM
The 1-day essentials course is available for free at http://public.hortonworksuniversity.com/hdp-overview-apache-hadoop-essentials-self-paced-training in a self-paced format. Enjoy and good luck on the exam!
09-04-2017
09:15 PM
As https://hortonworks.com/services/training/certification/hca-certification/ states, "the HCA certification is a multiple-choice exam that consists of 40 questions with a passing score of 75%". Good luck!!