Member since: 05-02-2019
Posts: 319
Kudos Received: 145
Solutions: 59
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7170 | 06-03-2019 09:31 PM
 | 1744 | 05-22-2019 02:38 AM
 | 2194 | 05-22-2019 02:21 AM
 | 1382 | 05-04-2019 08:17 PM
 | 1684 | 04-14-2019 12:06 AM
11-08-2016
10:36 PM
That "no such id: sandbox" is the concerning error to me. I hate to ask, but could you please download the zipped VM again and start all over. I'd like to attach the latest setup guide as well, but HCC has a file size limit that is preventing me from doing that. If it doesn't work this next time, please send an email to training-support@hortonworks.com (you can reference this HCC post, too) which will create an more easy to track internal support case for our Training DevOps team to further help you. We could also attach the latest setup guide that way if needed.
11-08-2016
02:34 PM
2 Kudos
For this Training VM, there is a hidden folder named .sys in the directory you are in above; it contains a recreate_sandbox.sh script that can be used to recreate the needed Docker instance and get everything operational again. You can run it as shown below.
root@ubuntu:~# cd .sys
root@ubuntu:~/.sys# pwd
/root/.sys
root@ubuntu:~/.sys# ./recreate_sandbox.sh
creating sandbox....
docker stop/waiting
docker start/running, process 5857
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
sandbox started at 172.17.0.2
root@ubuntu:~/.sys# date
Tue Sep 20 22:15:10 EDT 2016
root@ubuntu:~/.sys# date
Tue Sep 20 22:19:51 EDT 2016
root@ubuntu:~/.sys# ssh sandbox
The authenticity of host 'sandbox (172.17.0.2)' can't be established.
RSA key fingerprint is 2e:0c:53:b1:d4:06:7d:ab:bd:79:f9:17:08:f2:8a:4b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'sandbox,172.17.0.2' (RSA) to the list of known hosts.
Last login: Sun Dec 20 14:06:45 2015 from ip-172-17-0-1.ec2.internal
[root@sandbox ~]# hdfs dfs -ls /
Found 7 items
drwxrwxrwx - yarn hadoop 0 2015-10-15 09:45 /app-logs
drwxr-xr-x - hdfs hdfs 0 2015-10-15 09:45 /apps
drwxr-xr-x - hdfs hdfs 0 2015-10-15 09:44 /hdp
drwxr-xr-x - mapred hdfs 0 2015-10-15 09:44 /mapred
drwxrwxrwx - mapred hadoop 0 2015-10-15 09:44 /mr-history
drwxrwxrwx - hdfs hdfs 0 2015-10-15 09:46 /tmp
drwxr-xr-x - hdfs hdfs 0 2015-10-20 14:31 /user
[root@sandbox ~]#
NOTE: If it fails again in the future, try the restart_sandbox.sh script instead of recreating everything.
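For reference, the core of a script like this is usually little more than removing the broken container and starting a fresh one from the saved image. The sketch below is only an approximation of that idea (the image name, ports, and hostname are placeholders), not the actual contents of recreate_sandbox.sh:
#!/bin/bash
# Rough sketch only -- the real recreate_sandbox.sh shipped with the VM may differ.
docker rm -f sandbox 2>/dev/null                               # drop any stale container named "sandbox"
docker run -d --name sandbox --hostname sandbox \
  -p 8080:8080 -p 2222:22 \
  sandbox:latest                                               # placeholder image name/tag
docker inspect -f '{{ .NetworkSettings.IPAddress }}' sandbox   # print the new container's IP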
10-24-2016
07:47 PM
As I don't know the answer to "what do you want to do", I invite you to take a peek at the responses to https://community.hortonworks.com/questions/12787/how-to-integrate-kafka-to-pull-data-from-rdbms.html as it is along the same line of thinking (I believe). Technically, Kafka does have a Connector API, http://kafka.apache.org/documentation.html#connect, which could theoretically do what you are asking, but I do not know anyone who has done exactly that with Kafka (most folks are writing more traditional pub/sub clients). As for "in practice", I did a quick Google search for "kafka connect sql server" and found two non-open-source solutions that work with Kafka Connect to do what you described, but it doesn't look like there is a completely open-source solution available at the moment. On the Flume front, I think there is only a JDBC Channel, not a source or sink (at least not in 1.5.2, which ships with HDP 2.5). I'm thinking NiFi (aka HDF) and/or Sqoop might be better tools for retrieving data from an RDBMS like SQL Server.
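If you do end up going the Sqoop route, a minimal import from SQL Server looks roughly like the sketch below (the server, database, table, and user are placeholders, and the Microsoft JDBC driver would need to be on Sqoop's classpath):
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=sales" \
  --username someuser -P \
  --table orders \
  --target-dir /user/someuser/orders \
  --num-mappers 4     # parallel map tasks, each pulling a slice of the table into HDFS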
10-21-2016
05:26 PM
More details can be viewed from the "source" at http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
10-21-2016
01:32 PM
Just to make sure we are in step on nomenclature: "Sources" and "Sinks" are http://flume.apache.org terminology, whereas http://kafka.apache.org is all about Publishers and Subscribers that interact through Topics (essentially persisted message queues) in a Kafka cluster. If that makes sense and you just want to understand the interactions between Kafka publishers & subscribers, then check out http://kafka.apache.org/intro for some introductory material. On the Flume front, it seems Kafka Source & Sink options became available in 1.6.0, as seen in the current (1.7.0) user guide at https://flume.apache.org/FlumeUserGuide.html. As a point of reference, HDP 2.5 includes Flume 1.5.2 as detailed at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.html, so that is not yet available via HDP.
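If you just want to see the publish/subscribe flow for yourself, the console clients that ship with Kafka are the quickest way to poke at it (the hosts and topic below are placeholders; note that newer Kafka releases replace the --zookeeper/--broker-list flags with --bootstrap-server):
bin/kafka-topics.sh --create --zookeeper zkhost:2181 --replication-factor 1 --partitions 1 --topic test
bin/kafka-console-producer.sh --broker-list brokerhost:9092 --topic test              # type messages, one per line
bin/kafka-console-consumer.sh --zookeeper zkhost:2181 --topic test --from-beginning   # prints everything published to the topic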
10-21-2016
01:21 PM
1 Kudo
They ALL are, especially when talking about so many files that were under-replicated. Ultimately, the NN is the one who determines whether a file is under-replicated. It is then the NN's job to tell one of the DNs that holds a good copy of a block's replica to copy it to another DN. The NN isn't going to do any of the actual movement of bits -- it just coordinates the whole effort. Hope this helps!
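If you want to watch the NN do this, fsck is the easiest window into it, and setrep lets you trigger it on purpose (the path below is just an example):
hdfs fsck / -files -blocks         # the summary at the end shows how many blocks the NN currently considers under-replicated
hdfs dfs -setrep -w 3 /some/path   # change a path's replication factor and wait while the NN schedules the extra copies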
10-19-2016
10:31 PM
I just peeked in the Training support system (the one triggered by emailing certification@hortonworks.com) and it looks like we've been able to discuss these items with you. I'll send you a quick update on the ones I'm aware of and see if we can fully run your concerns to ground. If I missed anything, then please reply to the particular automated email and we'll work to make sure you have answers to your questions. Thanks!
10-18-2016
08:09 PM
1 Kudo
You are correct that we do not post any sample questions for the HCA certification, but we will evaluate if that is a logical step for this entry-level certification. You correctly found the objectives at http://hortonworks.com/wp-content/uploads/2016/08/ExamObjectives-HCAssociate.pdf and while these are "vast" as you described, please note that the HCA "provides for individuals an entry point and validates the fundamental skills required to progress to the higher levels of the Hortonworks certification program". If you are comfortable with the materials discussed in the Hadoop Essentials course, http://hortonworks.com/training/class/hadoop-essentials/, then you are an ideal candidate for this examination. Good luck!
10-17-2016
11:54 AM
The whole goal of having partitions is to allow Hive to limit the files it has to look at in order to fulfill the SQL request you send to it. On the other hand, you also clearly understand that having too many small files to look at is a performance/scalability drag. With so few records for each day, I'd suggest partitioning at the month level (as a single string, such as @Joseph Niemiec and @bpreachuk suggest in their answers to https://community.hortonworks.com/questions/29031/best-pratices-for-hive-partitioning-especially-by.html). This will allow you to keep your "original" dates as a column and let the partition month be a new virtual column. Of course, you'll need to explain to your query writers the benefit of using this partition column in their queries, but you will then get the value of partitioning while having 1/30th of the files, each of them roughly 30x bigger. Good luck!
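As a rough sketch of what that looks like (the table and column names here are made up), the month becomes the partition column while the day-level date stays an ordinary column:
hive -e "
CREATE TABLE events (
  event_date STRING,      -- original per-day date kept as a normal column
  payload    STRING
)
PARTITIONED BY (event_month STRING);   -- e.g. '2016-10'

-- filtering on the partition column means Hive only reads that month's files
SELECT count(*) FROM events WHERE event_month = '2016-10' AND event_date = '2016-10-17';
"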
10-06-2016
04:52 PM
@Artem Ervits is right; this would defeat the purpose of the test.