Member since: 05-02-2019
Posts: 319
Kudos Received: 145
Solutions: 59
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7110 | 06-03-2019 09:31 PM |
| | 1723 | 05-22-2019 02:38 AM |
| | 2174 | 05-22-2019 02:21 AM |
| | 1358 | 05-04-2019 08:17 PM |
| | 1671 | 04-14-2019 12:06 AM |
06-04-2019
01:06 PM
You could try to do a single INSERT INTO statement per partition and run as many of these simultaneously as your cluster has resources for.
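A minimal sketch of that idea, in Python just to show the shape of it. The table names (`sales_new`, `sales_old`), the partition column `ds`, and the `fake_run` helper are all made up for illustration; swap `fake_run` for whatever client you actually use to submit HiveQL, and size `max_parallel` to what your cluster can handle.

```python
from concurrent.futures import ThreadPoolExecutor


def build_insert(target, source, partition_col, value, columns):
    """Build one static-partition INSERT for a single partition value."""
    cols = ", ".join(columns)
    return (
        f"INSERT INTO TABLE {target} PARTITION ({partition_col}='{value}') "
        f"SELECT {cols} FROM {source} WHERE {partition_col}='{value}'"
    )


def run_all(statements, run_one, max_parallel):
    """Run the statements concurrently, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, statements))


def fake_run(sql):
    """Hypothetical stand-in for a real Hive client call."""
    return f"would run: {sql}"


# Hypothetical partition values; one INSERT per partition.
statements = [
    build_insert("sales_new", "sales_old", "ds", day, ["id", "amount"])
    for day in ["2019-06-01", "2019-06-02"]
]
results = run_all(statements, run_one=fake_run, max_parallel=2)
```

Each statement only touches its own partition, so they don't step on each other; the only real limit is how many YARN containers you can get at once.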
06-03-2019
09:31 PM
Yep, create a new one defined the way you want the partitions to be and then insert into that new one using dynamic partitioning and you'll be good to go. Good luck and happy Hadooping.
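Roughly, the insert could look like what this little Python helper spits out. The table and column names are hypothetical, and I'm assuming the usual `hive.exec.dynamic.partition` settings apply to your Hive version. The one thing to get right is that the partition columns come last in the SELECT, in the same order as in the PARTITION clause.

```python
def dynamic_partition_script(target, source, data_columns, partition_columns):
    """Build a HiveQL script that copies source into target using
    dynamic partitioning (partition columns must come last in the SELECT)."""
    select_list = ", ".join(list(data_columns) + list(partition_columns))
    partition_spec = ", ".join(partition_columns)
    return "\n".join([
        "SET hive.exec.dynamic.partition=true;",
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
        f"INSERT INTO TABLE {target} PARTITION ({partition_spec})",
        f"SELECT {select_list} FROM {source};",
    ])


# Hypothetical tables: events_old is the existing table,
# events_new was created with PARTITIONED BY (year, month).
script = dynamic_partition_script(
    "events_new", "events_old", ["id", "payload"], ["year", "month"]
)
print(script)
```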
05-22-2019
02:38 AM
I would strongly suggest you look at HBase's snapshotting model as detailed at https://hbase.apache.org/book.html#ops.snapshots. Creating a snapshot is very fast because it does NOT copy the underlying HFiles on HDFS (it just keeps references, or "pointers", to them). Then you can use the ExportSnapshot process, which copies the needed underlying HFiles over to the second HBase cluster. This model won't use any extra space on the source cluster (well, delete the snapshot once you are done!), and on the target cluster it only uses the space for the HFiles you have to create there anyway, which is exactly what this process does. Good luck and happy HBasing!
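Here is a rough sketch of the steps, written as a Python helper that only builds the commands (it doesn't run anything). The table name, snapshot name, and target HDFS URL are placeholders, and the `clone_snapshot` step on the target cluster is my assumption about how you'd want to materialize the table there; `restore_snapshot` is the other option described in the HBase book.

```python
def hbase_shell_steps(table, snapshot):
    """HBase shell commands for each stage of the copy."""
    return {
        "source_before_export": f"snapshot '{table}', '{snapshot}'",
        "target_after_export": f"clone_snapshot '{snapshot}', '{table}'",
        "source_cleanup": f"delete_snapshot '{snapshot}'",
    }


def export_snapshot_command(snapshot, target_root, mappers=None):
    """Argument list for the ExportSnapshot MapReduce job (run on the source)."""
    cmd = [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", target_root,
    ]
    if mappers is not None:
        cmd += ["-mappers", str(mappers)]
    return cmd


# Placeholder names; replace with your own table and target cluster.
steps = hbase_shell_steps("my_table", "my_table_snap")
export_cmd = export_snapshot_command(
    "my_table_snap", "hdfs://target-nn:8020/hbase", mappers=16
)
print(steps["source_before_export"])
print(" ".join(export_cmd))
print(steps["target_after_export"])
print(steps["source_cleanup"])
```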
05-22-2019
02:21 AM
1 Kudo
As @gdeleon suggested... "that dog won't hunt". Basically, you'll need at least two YARN containers for each Hive user/query going on: one to house the applicationMaster and another to start doing some actual work (the first one there is getting their application into the "Running" state). The "Accepted" state means those users were able to get a container for their applicationMasters, but then there isn't enough space for YARN to grant enough actual containers to do much else. Again, it just isn't designed for this. A better solution would be to let each student have their own HDP Sandbox (and they won't need to allocate 32GB VMs). Good luck and happy Hadooping!
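To put some back-of-the-napkin numbers on that, here is a tiny Python model. The memory sizes are made up, and it deliberately ignores scheduler settings such as `yarn.scheduler.capacity.maximum-am-resource-percent` that cap how much of the cluster applicationMasters may take, so treat it as an illustration of the squeeze and not a capacity planner.

```python
def comfortable_query_capacity(total_mb, am_mb, task_mb):
    """How many queries can each hold an applicationMaster plus one task container."""
    return total_mb // (am_mb + task_mb)


def containers_after_ams(total_mb, am_mb, task_mb, queries):
    """If every query grabs its applicationMaster first, return
    (AMs started, task containers that still fit in what is left)."""
    ams_started = min(queries, total_mb // am_mb)
    left_mb = total_mb - ams_started * am_mb
    return ams_started, left_mb // task_mb


# Hypothetical classroom: 8 GB of YARN memory, 1 GB per container, 8 students.
capacity = comfortable_query_capacity(8192, 1024, 1024)
ams, tasks = containers_after_ams(8192, 1024, 1024, queries=8)
print(capacity, ams, tasks)
```

With those made-up numbers, eight students can all get an applicationMaster, but nothing is left over for a single container to do real work, which is the stuck situation described above.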
05-06-2019
08:53 PM
Hey @Matt Clarke, if there is a better way to do this w/o RPG as you suggested in your answer over in https://community.hortonworks.com/questions/245373/nifi-cluster-listensmtp.html, would you have time to update this article to account for that? I point folks to this link all the time. Thanks!
05-04-2019
08:17 PM
Probably for using HCatalog, which can be extremely useful for Pig programmers even if they don't want to use Hive, since they can leverage it for schema management instead of defining AS clauses in their LOAD commands? Just as likely this is something hard-coded into Ambari? If you really don't want Hive, I bet you can just delete it after installation. For giggles, I stood up an HDFS-only HDP 3.1.0 cluster for https://community.hortonworks.com/questions/245432/is-it-possible-to-install-only-hdfs-on-linux-machi.html?childToView=245544#answer-245544 and just added Pig (it required YARN, MR, Tez & ZK, but that makes sense!) and it did NOT require Hive to be added as seen below. Good luck and happy Hadooping!
05-04-2019
06:52 PM
I didn't see such a property when I looked at http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.ListFTP/index.html, but some quick solutions to jumpstart this could be to simply replace this processor with a new one (which will have its own state management) or, if you are using "timestamps" for the "Listing Strategy" property, you could always run a Linux "touch" command on the files on the FTP server, which should trick the processor into grabbing them again. Good luck and happy Flowfiling!
05-01-2019
03:50 PM
It is confusing what triggers this task to run. Do you have any additional info on that or know if there is any way to configure it more precisely?
04-16-2019
02:52 PM
I need to play with the S3 processors a bit to be more helpful, but wondering if there is any issue getting these files in a NiFi cluster and if you should be marking the pull processor to be an "isolated processor" to run only on the "primary node" as Brian describes in his answer to https://community.hortonworks.com/questions/99328/nifi-isolated-processors-in-a-clustered.html. Worth giving it a try first to see if that's part of the problem.
04-14-2019
12:06 AM
Not sure how you got into this shape, but the balancer can fix it. https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
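If it helps to see what "balanced" means to the balancer, here is a small Python sketch of the threshold rule from that guide: a DataNode counts as balanced when its utilization is within the threshold (10% by default) of the overall cluster utilization. The node names and usage numbers are invented; the real thing is kicked off with something like `hdfs balancer -threshold 10`.

```python
def unbalanced_nodes(nodes, threshold_pct=10.0):
    """Return {node: deviation} for nodes whose utilization differs from
    the cluster utilization by more than threshold_pct percentage points.

    nodes maps a node name to a (used, capacity) pair in the same unit.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity

    deviations = {}
    for name, (used, capacity) in nodes.items():
        node_util = 100.0 * used / capacity
        if abs(node_util - cluster_util) > threshold_pct:
            deviations[name] = round(node_util - cluster_util, 1)
    return deviations


# Invented example: three DataNodes with 100 TB each.
example = {"dn1": (90, 100), "dn2": (10, 100), "dn3": (50, 100)}
print(unbalanced_nodes(example))
```

A positive deviation means the node is over-utilized and the balancer would move blocks off it; a negative one means it would receive blocks.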