Member since: 05-02-2019
Posts: 319
Kudos Received: 145
Solutions: 59

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6997 | 06-03-2019 09:31 PM
 | 1671 | 05-22-2019 02:38 AM
 | 2122 | 05-22-2019 02:21 AM
 | 1321 | 05-04-2019 08:17 PM
 | 1627 | 04-14-2019 12:06 AM
04-26-2018 10:59 PM
SANDBOX VERSION AFFECTED

HDP 2.6.0.3 Sandbox, as identified below.

# wget https://downloads-hortonworks.akamaized.net/sandbox-hdp-2.6/HDP_2.6_docker_05_05_2017_15_01_40.tar.gz
# md5sum HDP_2.6_docker_05_05_2017_15_01_40.tar.gz
886845a5e2fc28f773c59dace548e516  HDP_2.6_docker_05_05_2017_15_01_40.tar.gz

ISSUE

When using the classic Hive CLI, after a while the following error surfaces.

[root@sandbox demos]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.0.3-8/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.UnknownHostException: sandbox
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:547)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: sandbox
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:438)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:690)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:631)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:179)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:530)
... 8 more
Caused by: java.net.UnknownHostException: sandbox
... 21 more
[root@sandbox demos]#

RESOLUTION

Modify /etc/hosts so that the hostname sandbox resolves just as sandbox.hortonworks.com does.

[root@sandbox ~]# cat /etc/hosts
127.0.0.1    localhost
::1          localhost ip6-localhost ip6-loopback
fe00::0      ip6-localnet
ff00::0      ip6-mcastprefix
ff02::1      ip6-allnodes
ff02::2      ip6-allrouters
172.17.0.2   sandbox.hortonworks.com
[root@sandbox ~]# cp /etc/hosts /tmp/
[root@sandbox ~]# vi /etc/hosts
[root@sandbox ~]# diff /etc/hosts /tmp/hosts
7c7
< 172.17.0.2   sandbox.hortonworks.com sandbox
---
> 172.17.0.2   sandbox.hortonworks.com
[root@sandbox ~]#
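As a quick follow-up check (hypothetical commands, not part of the original transcript), you can confirm the short hostname now resolves and that the Hive CLI starts cleanly:

getent hosts sandbox        # should now list 172.17.0.2 with both names
hive -e 'show databases;'   # should no longer throw UnknownHostException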
02-13-2018 10:44 PM
1 Kudo
I'm guessing you've already seen http://hbase.apache.org/0.94/book/secondary.indexes.html, which basically tells you that you'll need a second table whose rowkey is your "secondary index" and which is used only to find the rowkey needed for the actual table. The coprocessor strategy, as I understand it, just formalizes and automates that "dual-write secondary index" approach; a sketch follows below. Good luck and happy Hadooping!
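To make the dual-write idea concrete, here is a minimal hbase shell sketch (the 'users' and 'users_by_email' tables and their column families are hypothetical, not from your schema):

# The index table simply maps the secondary key (email) back to the real rowkey.
echo "put 'users', 'user42', 'info:email', 'user42@example.com'
put 'users_by_email', 'user42@example.com', 'i:rowkey', 'user42'
get 'users_by_email', 'user42@example.com'" | hbase shell

A read by email is then two lookups: one get against users_by_email to recover the rowkey, then one get against users. A coprocessor would just perform the second put automatically on every write.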
02-13-2018 10:38 PM
1 Kudo
I don't want to oversimplify this process, and Hortonworks Professional Services does these conversions with customers all the time (there is often more at play than simply moving the data, such as testing apps before and after). That said, you can leverage DistCp, https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html, as your tool to move the data from your original cluster to your new one. For the HBase data, I'd look to its Snapshots feature, http://hbase.apache.org/book.html#ops.snapshots, including its ability to export a snapshot to another cluster, as a solid approach; a rough sketch follows. Good luck and happy Hadooping!
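Roughly, the two moves look like this (the NameNode hostnames, table name, and paths below are placeholders for your environment):

# Copy plain HDFS data with DistCp (runs as a MapReduce job on the cluster).
hadoop distcp hdfs://oldnn:8020/data hdfs://newnn:8020/data

# For HBase: take a snapshot on the source cluster, then export it to the new one.
echo "snapshot 'mytable', 'mytable_snap'" | hbase shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot mytable_snap -copy-to hdfs://newnn:8020/hbase -mappers 4

On the destination cluster, clone_snapshot 'mytable_snap', 'mytable' in the hbase shell then materializes the table.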
02-13-2018 09:14 PM
1 Kudo
It looks like you are only letting YARN use 25 GB of your worker nodes' 64 GB, as well as only 6 of your 16 CPU cores, so these values (yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores) should be raised. Check out the details at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/determine-hdp-memory-config.html for a script that can help you set some baseline values for these properties. As for the Spark jobs: interestingly enough, each of these jobs requests a certain size and number of containers, and I'm betting each job is a bit different. Since Spark jobs get their resources first, it would seem normal that a specific job (as long as neither the resource request nor the fundamental input dataset size changes) takes a comparable time to run from invocation to invocation. That isn't necessarily the case across different Spark jobs, which may be doing entirely different things. Good luck and happy Hadooping/Sparking!
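If memory serves, the helper script from that doc is invoked along these lines (treat the script name and flags as an assumption and verify against the linked page; the values reflect your 16-core/64 GB nodes):

# -c cores, -m memory in GB, -d number of data disks, -k whether HBase is installed.
python yarn-utils.py -c 16 -m 64 -d 4 -k False

It prints suggested starting values for the YARN and MapReduce memory properties, which you can then plug into Ambari.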
01-12-2018 09:02 PM
https://stackoverflow.com/questions/45100487/how-data-is-split-into-part-files-in-sqoop can start to explain more, but ultimately (and thanks to the power of open source) you'll have to go look for yourself; you can find the source code at https://github.com/apache/sqoop. Good luck and happy Hadooping!
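The short version you'll see borne out in the code: the number of part files tracks the mapper count, and the split column determines the key ranges each mapper pulls. A made-up example (connection string, table, and column are placeholders):

# Illustration only: 4 mappers split the orders table on the id column,
# producing part-m-00000 through part-m-00003 under the target directory.
sqoop import --connect jdbc:mysql://dbhost/shop --table orders \
  --split-by id --num-mappers 4 --target-dir /user/demo/orders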
12-09-2017 08:45 PM
From looking at your RM UI, it sure looks like both of these jobs are basically fighting each other to get running. Meaning, the ApplicationMaster containers are running, but they can't get any more containers from YARN. My recommendation would be to give the VM 10 GB of memory (that's how I run it on my 16 GB laptop) when you restart it. I'd also try to run it from the command line just to take the Ambari View out of the picture, but if you want to run it in Ambari, then kill any lingering application via the RM UI should it hang again. Good luck and happy Hadooping!
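If you'd rather do the killing from the command line than the RM UI, the yarn CLI works too (the application ID below is a placeholder):

# List running applications, then kill the stuck one by its ID.
yarn application -list
yarn application -kill application_1512345678901_0001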
11-01-2017 08:26 PM
Unfortunately, it is a bit more complicated than that. In general, Spark is lazily executed, so depending on what you do, even the "temp view" tables/DataFrame(Set)s may not stay around from DAG to DAG. There is an explicit cache method you can use on a DataFrame(Set), but even then you may be trying to cache something that simply won't fit in memory. No worries: Spark assumes that your DF(S)/RDD collections won't fit and inherently handles this. I'm not trying to sell you on anything, but some deeper learning could probably help. I'm a trainer here at Hortonworks (again, just pointing to a resource/opportunity), and we spend several days building up this knowledge in our https://hortonworks.com/services/training/class/hdp-developer-enterprise-spark/ class. Apologies for sounding like a salesperson, but my general thought is that there's still a bit more for you to learn about Spark internals, and that might take a more interactive way of building up the knowledge.
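To make the caching point concrete, here's a tiny spark-shell session (the Hive table name is made up); note that a cached DataFrame whose partitions don't fit in memory is spilled or recomputed rather than failing:

spark-shell <<'EOF'
val df = spark.read.table("demo_db.events")  // hypothetical Hive table
df.createOrReplaceTempView("events")
df.cache()  // explicit cache; partitions that do not fit are spilled/recomputed
spark.sql("SELECT COUNT(*) FROM events").show()  // first action materializes the cache
EOF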
10-31-2017 09:51 PM
Generically speaking, yes, I'd just run the query that is built upon your Hive tables, as Spark SQL is going to "figure out" what it needs to do in its optimizer before doing any work anyway. If the performance is within your SLA, then I'd go with that; of course, you could always use that as a baseline for comparison if/when you try other approaches in your code. Happy Hadooping (ahem... Sparking!) and good luck!
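One easy way to capture that baseline (the query and table names below are placeholders) is to time the statement through the spark-sql CLI and rerun it after each change:

# Hypothetical baseline measurement.
time spark-sql -e "SELECT customer_id, SUM(amount) FROM sales.orders GROUP BY customer_id"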
09-04-2017 09:20 PM
Take the free self-paced course at http://public.hortonworksuniversity.com/hdp-overview-apache-hadoop-essentials-self-paced-training. Additionally, Hadoop: The Definitive Guide, https://smile.amazon.com/Definitive-version-revised-English-Chinese/dp/7564159170/, is still a very good resource.
09-04-2017 09:19 PM
Take the free self-paced course at http://public.hortonworksuniversity.com/hdp-overview-apache-hadoop-essentials-self-paced-training as a good start.