Member since: 05-02-2019
Posts: 319
Kudos Received: 145
Solutions: 59
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7170 | 06-03-2019 09:31 PM |
| | 1744 | 05-22-2019 02:38 AM |
| | 2195 | 05-22-2019 02:21 AM |
| | 1383 | 05-04-2019 08:17 PM |
| | 1684 | 04-14-2019 12:06 AM |
12-05-2016 07:32 PM
Is anyone aware of any plans for integrations between Knox and Accumulo?
Labels:
- Apache Accumulo
- Apache Knox
11-24-2016 12:17 AM
+1 on suggestion #1
11-22-2016 04:52 PM
1 Kudo
Yes, there was some messaging early this year (which it seems the Spring folks replicated on their site) indicating a change in the certification program, but it was determined that the planned changes were being rolled out too quickly and did not give folks already preparing for the existing certification exams enough time to complete them. The certification program will likely continue to evolve, but our intention is to provide adequate time to adjust whenever any future changes are introduced. Good luck on the HDPCD exam!
11-18-2016 04:49 PM
1 Kudo
Nope. There are no certification prerequisites for HDPCD. Good luck on the exam!!
11-18-2016 06:53 AM
5 Kudos
Most of the answers you are looking for are explained in http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_controlling_parallelism, but here are my 1-2-3 answers to your questions.

1. Absolutely, Sqoop is building a SQL query (actually one for each mapper) against the source table it is ingesting into HDFS from. The mappers (default is four, but you can override it) leverage the split-by column, and Sqoop basically tries to build an intelligent set of WHERE clauses so that each mapper gets a logical "slice" of the table. As an example, if we used three mappers and a split-by column that is an integer ranging from 0 to 1,000,000 in the actual data (i.e. Sqoop can do a pretty easy min and max call to the DB on the split-by column), then Sqoop's first mapper would try to get values 0-333333, the second mapper would pull 333334-666666, and the last would grab 666667-1000000.

2. Nope, Sqoop runs a map-only job, with each mapper (3 in my example above) running a query with a specific range to prevent any kind of overlap. Each mapper then just drops its data into the target-dir HDFS directory in a file named part-m-00000 (well, the 2nd one ends with 00001 and the 3rd one ends with 00002).

3. The composite export is represented by the target-dir HDFS directory (it basically follows the MapReduce naming scheme for files).

I'm hoping the parallelism piece makes sense now and that this helps out some. As with everything, some simple testing on your own will help it all make sense. As for an architectural diagram, check out the image (and additional details) at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/using_sqoop_to_move_data_into_hive.html which might aid in your understanding. Happy Hadooping!!
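To make the parallelism piece concrete, here is a minimal sketch of an import matching the scenario above (three mappers splitting on an integer column). The connection string, table name, column name, and target directory are made up for illustration, and the generated queries are only an approximation of what Sqoop actually issues.

```
# Hypothetical import: 3 mappers, splitting on an integer column named "id"
sqoop import \
  --connect jdbc:mysql://dbhost/somedb \
  --username someuser \
  --table some_table \
  --split-by id \
  -m 3 \
  --target-dir /user/student/some_table

# Sqoop first asks the database for the split column's bounds, roughly:
#   SELECT MIN(id), MAX(id) FROM some_table
# and then each mapper runs its own bounded query, roughly:
#   mapper 0: ... WHERE id >= 0      AND id <  333334   -> part-m-00000
#   mapper 1: ... WHERE id >= 333334 AND id <  666667   -> part-m-00001
#   mapper 2: ... WHERE id >= 666667 AND id <= 1000000  -> part-m-00002
```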
11-15-2016 02:36 PM
For the HCA exam, there are no sample tests. Please review the objectives link I provided above. The Essentials course is also offered as a one-day public training fairly regularly, as shown at https://ilearning.seertechsolutions.com/lmt/clmsCatalogSummary.prMain?site=hw&in_region=hw, and we are considering making it available as a free on-demand offering as well.
11-11-2016 01:25 PM
1 Kudo
As with all "interesting" questions like this, the best answer is to try it and see for yourself. My hypothesis was that Sqoop would report that these directives are incompatible with each other, and I was glad to see that is exactly what happened when I gave it a try myself:

[root@sandbox Lab3.1]# sqoop import --connect jdbc:mysql://sandbox/test?user=root --table salaries --columns gender,age --query "select * from salaries s where s.salary > 90000.00 and \$CONDITIONS" --split-by gender -m 2 --target-dir willItWork
16/11/11 08:22:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
Cannot specify --query and --table together.
Try --help for usage instructions.
[root@sandbox Lab3.1]#
11-11-2016 01:03 PM
1 Kudo
I do not believe you can use Ambari to configure this, but the manual instructions at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_command-line-installation/content/install_kafka_rpms.html call out some notes on how to approach this "not recommended" strategy. I guess a member of the Support team would have to be solicited for a more definitive answer, but... it sounds like it is supported.
11-10-2016 01:17 PM
2 Kudos
User Defined Functions (UDFs) come to the rescue. Search for "Filter Functions" in http://pig.apache.org/docs/r0.15.0/udf.html and you'll see a rough example of how to do this. Now, your "isEmpty" (or whatever you call the function) will be implemented differently: in yours, you would need to walk each field of the row (called "input" in that example UDF) and check for null. If all of the row's fields are null, you ultimately return a boolean value that can be used in your script (after you build and register the UDF). If this is your first Pig UDF, there are plenty of examples on the internet, including mine at https://martin.atlassian.net/wiki/x/C4BRAQ. Good luck!
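If it helps to see the shape of such a filter function, here is a minimal sketch in Java, assuming the rule is "treat the row as empty only when every field is null." The package and class names are made up, and this is not the implementation from the linked example, just an illustration of the FilterFunc pattern.

```
// Hypothetical Pig filter UDF: returns true only when every field is null.
package com.example.pig;

import java.io.IOException;

import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

public class AllFieldsNull extends FilterFunc {
    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return true;  // treat a missing/empty row as "empty"
        }
        // Walk each field; any non-null value means the row is not empty.
        for (int i = 0; i < input.size(); i++) {
            if (input.get(i) != null) {
                return false;
            }
        }
        return true;
    }
}
```

After packaging it into a jar, you would REGISTER the jar in your Pig script and filter with something along the lines of `clean = FILTER raw BY NOT com.example.pig.AllFieldsNull(*);` (passing `*` hands the whole row to the UDF).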
11-09-2016 01:47 PM
The self-paced library's setup guide is current; I was using that as a ruse to help me move this concern over to our internal tracking system. 😉 Yes, if "hdfs dfs -ls /" is responding, then by all means march forward. If at some point it stops working (these VMs don't really like to be stopped and started), then please try the restart_sandbox.sh script mentioned earlier, with recreate_sandbox.sh as a "nuclear option". Good luck!
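In case it helps, here is a rough sketch of that decision flow as shell commands. The script names come from earlier in this thread; their locations on the VM are assumptions on my part.

```
# Quick health check: if this lists the HDFS root, keep going with the labs
hdfs dfs -ls /

# If HDFS stops responding (e.g. after the VM was stopped and started),
# try the restart script first...
./restart_sandbox.sh

# ...and only rebuild from scratch as the "nuclear option"
# ./recreate_sandbox.sh
```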