Member since: 05-02-2019
Posts: 319
Kudos Received: 145
Solutions: 58
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4319 | 06-03-2019 09:31 PM |
| | 900 | 05-22-2019 02:38 AM |
| | 1220 | 05-22-2019 02:21 AM |
| | 695 | 05-04-2019 08:17 PM |
| | 916 | 04-14-2019 12:06 AM |
06-04-2019
01:06 PM
You could try to do a single INSERT INTO statement per partition and run as many of these simultaneously as your cluster has resources for.
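Something along these lines, one statement per target partition (table and column names here are just made up for illustration):

```sql
-- hypothetical names; fire one of these per partition, as many in parallel as the cluster can take
INSERT INTO TABLE sales_by_day PARTITION (sale_date = '2019-06-01')
SELECT order_id, customer_id, amount
FROM   sales_staging
WHERE  sale_date = '2019-06-01';
```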
06-03-2019
09:31 PM
Yep, create a new table defined with the partitioning you want, then insert into it using dynamic partitioning and you'll be good to go. Good luck and happy Hadooping.
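A minimal sketch of that move, with made-up table names and an ORC target:

```sql
-- enable dynamic partitioning for the session
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- new table laid out with the partitioning you actually want
CREATE TABLE events_repart (id BIGINT, payload STRING)
PARTITIONED BY (event_date STRING)
STORED AS ORC;

-- let Hive route each row to the right partition on the way in
INSERT INTO TABLE events_repart PARTITION (event_date)
SELECT id, payload, event_date
FROM   events_old;
```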
05-22-2019
02:38 AM
I would strongly suggest you look at HBase's snapshotting model as detailed at https://hbase.apache.org/book.html#ops.snapshots. The snapshot create process is very fast as it does NOT create a copy of the underlying HFiles on HDFS (it just keeps "pointers" to them). Then you can use the ExportSnapshot process to copy the needed underlying HFiles over to the second HBase cluster. This model won't use any extra space on the source cluster (well, as long as you delete the snapshot once you are done!), and the space on the target cluster isn't really extra either, since you'd have to get all those HFiles created over there anyway, which is exactly what this process does. Good luck and happy HBasing!
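Roughly, the moving pieces look like this (the table name, snapshot name, and target NameNode address are all hypothetical; point -copy-to at the target cluster's hbase.rootdir):

```bash
# in the hbase shell on the source cluster -- the snapshot itself is cheap and fast:
#   hbase> snapshot 'web_events', 'web_events_snap_20190522'

# copy the snapshot's referenced HFiles over to the target cluster
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot web_events_snap_20190522 \
  -copy-to hdfs://target-nn:8020/apps/hbase/data \
  -mappers 16

# then, in the hbase shell on the target cluster, materialize a table from it:
#   hbase> clone_snapshot 'web_events_snap_20190522', 'web_events'
```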
05-22-2019
02:26 AM
Are you trying to replace the functionality of creating the host/date paths and files, and/or do you want NiFi to recursively walk the growing directories to get at the underlying syslog.log files?
05-22-2019
02:21 AM
1 Kudo
As @gdeleon suggested... "that dog won't hunt". Basically, you'll need at least two YARN containers for each concurrent Hive user/query: one to house the ApplicationMaster and another to start doing some actual work (the first one is what gets the application into the "Running" state). The "Accepted" state means those users were able to get a container for their ApplicationMasters, but then there wasn't enough room for YARN to grant enough actual containers to do much else. Again, it just isn't designed for this. A better solution would be to let each student have their own HDP Sandbox (and they won't need to allocate 32GB VMs). Good luck and happy Hadooping!
05-06-2019
08:53 PM
Hey @Matt Clarke, if there is a better way to do this w/o RPG as you suggested in your answer over in https://community.hortonworks.com/questions/245373/nifi-cluster-listensmtp.html, would you have time to update this article to account for that? I point folks to this link all the time. Thanks!
05-04-2019
08:38 PM
Missing the Ambari server name before the ":8080"?
05-04-2019
08:17 PM
Probably because of HCatalog, which can be extremely useful for Pig programmers even if they don't want to use Hive and just want to leverage it for schema management instead of defining AS clauses in their LOAD commands? Just as likely, this is something hard-coded into Ambari? If you really don't want Hive, I bet you can just delete it after installation. For giggles, I stood up an HDFS-only HDP 3.1.0 cluster for https://community.hortonworks.com/questions/245432/is-it-possible-to-install-only-hdfs-on-linux-machi.html?childToView=245544#answer-245544 and just added Pig (it required YARN, MR, Tez & ZK, but that makes sense!) and it did NOT require Hive to be added, as seen below. Good luck and happy Hadooping!
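For context on the HCatalog angle, here's the kind of difference it makes in a Pig script (the table name, path, and schema below are made up for illustration):

```pig
-- with HCatalog (run pig with -useHCatalog): the schema comes from the metastore
clicks = LOAD 'web_clicks' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- without HCatalog: the schema has to be spelled out by hand in every script
clicks_raw = LOAD '/data/web_clicks' USING PigStorage(',')
    AS (user_id:chararray, url:chararray, click_ts:long);
```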
05-04-2019
07:50 PM
Sounds like this should be running as an Isolated Processor, configured to run on the Primary Node only instead of All Nodes. Then, to take full advantage of both of the NiFi nodes you have, you'll want to create a Remote Process Group back to yourself, much as explained in https://community.hortonworks.com/articles/97773/how-to-retrieve-files-from-a-sftp-server-using-nif.html. Good luck and happy Flowfiling!
05-04-2019
07:36 PM
Sure... why not!?!? 🙂 I just installed HDP 3.1.0 via Ambari as barebones as I could on a small 5-node cluster (1 master and 4 workers). It did make me install Ambari Metrics, SmartSense and ZK, but I was able to delete those after everything was installed, as shown in my screenshot. That said, I'd leave those in (and ZK will be required if you want HDFS HA), but I wanted to make the point that you could have JUST HDFS. Good luck and happy Hadooping!
05-04-2019
06:52 PM
I didn't see such a property when I looked at http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.ListFTP/index.html, but some quick solutions to jumpstart this could be to simply replace this processor with a new one (which will have its own state management), or, if you are using "timestamps" for the "Listing Strategy" property, you could always run a Linux "touch" command on the files on the FTP server, which should trick the processor into grabbing them again. Good luck and happy Flowfiling!
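The "touch" trick would just look something like this on the FTP server (path and file pattern are hypothetical):

```bash
# bump the modification times so a timestamp-based listing treats the files as new
touch /srv/ftp/outbound/*.csv
```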
05-04-2019
06:46 PM
The best answer to "will it work?" is "what did your testing show?". Give it a try and let us all know. I'm guessing you just have some old legacy code you can't change? I'm also guessing it will work! So... to be more precise, the 2.6.5 release notes component page, https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/comp_versions.html, says Hadoop 2.7.3 is being used. The Apache Hadoop page for MRv1 compatibility with MRv2, https://hadoop.apache.org/docs/r2.7.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html, suggests you are probably in good shape running previously compiled MRv1 apps on MRv2, but doesn't quite guarantee you'll be so lucky if you need to recompile them against MRv2. Good luck and happy Hadooping!
05-04-2019
06:39 PM
I believe your biggest problem is that you are trying to use the HDP Sandbox for something of any decent size. That environment wasn't built for running 100s of GB of data (which itself is surely not all that "big" as Big Data goes). The Sandbox also has a bunch of configuration settings focused on running a pseudo-cluster (all on one box), which is NOT ideal for any job of real size. You did go down the right path of changing the max amount of memory that YARN can use, but at the end of the day, your box only has two CPUs and you really can't run that many containers anyway. You'd probably also need to change the size of the Tez containers for Hive/Tez to ask for more than whatever the tiny Sandbox configuration is granting you. I don't know the costing model, but I'm betting 4 boxes with 16GB each would be cheaper than this 64GB one you are using now, and that would allow you to spread the workload across multiple machines (and yes, you'd have to install HDP via Ambari, but the http://docs.hortonworks.com site can help a LOT). Good luck and happy Hadooping!
05-04-2019
06:29 PM
As blueprints focus heavily on the host_groups concept, which usually means some highly specialized master setups and then a more generic worker model, I feel that using blueprints beyond the initial cluster layout really works best when you are adding more workers. My INITIAL recommendation (I'm arm-chair QBing this w/o all the details) would be to simply go to Ambari's UI, add the new host via the wizard process, and then assign whatever master and worker processes you need.
05-04-2019
06:21 PM
Could you provide some additional details? Screenshots of the Ambari dashboard, HDFS service page, host list, etc.?
05-01-2019
03:50 PM
It isn't clear what triggers this task to run. Do you have any additional info on that, or know if there is any way to configure it more precisely?
04-16-2019
02:52 PM
I need to play with the S3 processors a bit to be more helpful, but I'm wondering if there is any issue with getting these files in a NiFi cluster, and whether you should be marking the pull processor as an "isolated processor" that runs only on the "primary node", as Brian describes in his answer to https://community.hortonworks.com/questions/99328/nifi-isolated-processors-in-a-clustered.html. Worth giving that a try first to see if it's part of the problem.
04-14-2019
12:06 AM
Not sure how you got into this shape, but the balancer can fix it. https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
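Running it is basically a one-liner (the threshold is how far, in percentage points, any DataNode's utilization may drift from the cluster average; 10 is a common starting point):

```bash
# rebalance block placement until every DataNode is within 10 percentage
# points of the cluster's average utilization
hdfs balancer -threshold 10
```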
04-12-2019
02:26 PM
Yes, the collect() ACTION does require everything to come back to the driver, but there is a better ACTION for you. Try using foreach(), which is like an RDD's map() function in that it works on each partition of the underlying dataset independently of the other partitions (so you can run it wide!!). It returns nothing, which is probably what you want it to do anyway. Good luck and happy Sparking!
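A minimal spark-shell sketch of the difference (the RDD here is just made-up sample data):

```scala
val rdd = spark.sparkContext.parallelize(1 to 1000000)

// collect() drags every element back to the driver before doing anything with it:
// rdd.collect().foreach(println)

// foreach() runs the work out on the executors, partition by partition,
// and returns nothing to the driver
rdd.foreach(x => println(x))
```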
03-18-2019
02:55 PM
You just need to point the LOCATION clause of your EXTERNAL TABLE's DDL at your /FLIGHT folder. Hive will crawl all the subfolders. You might also consider using PARTITIONED BY with a single date-like partition column instead of having separate folders for year, month, and day. That lets you do things like WHERE my_partition_col > '19991115' AND my_partition_col < '20010215', which would be much tougher if you partition by specific year, month, and day values.
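If you did lay the folders out as flight_date=YYYYMMDD partitions, the DDL could look something like this (column names are invented for illustration):

```sql
-- external table wrapped around the existing /FLIGHT data
CREATE EXTERNAL TABLE flights (
  carrier    STRING,
  flight_num STRING,
  dep_delay  INT
)
PARTITIONED BY (flight_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/FLIGHT';

-- register any partition folders already sitting under /FLIGHT
MSCK REPAIR TABLE flights;

-- range predicates on the single partition column are then simple
SELECT carrier, AVG(dep_delay)
FROM   flights
WHERE  flight_date > '19991115' AND flight_date < '20010215'
GROUP BY carrier;
```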
03-06-2019
11:51 PM
Pro: cheap. Con: not scalable. I know... you already knew that! 😉 Also, I'm not trying to be a smart @$$, but I do mean what I say at https://twitter.com/LesterMartinATL/status/504004795557236736 about these tiny clusters (yes, 3 worker nodes is a tiny cluster). A 3-node cluster might make sense for some problems, but you'll never be able to do anything at scale on it, and it surely won't perform well for anything at the edge of what it can process.

Regarding your 3TB capacity: does that mean each node has 3TB dedicated to itself (raw 9TB, but effectively 3TB with a replication factor of 3), or that each box has 1TB? I'm asking because for things like Terasort we also have to consider the job's intermediate data (i.e. the info moving from the mappers to the reducers, which in this case will be 500GB itself) as well as the final output back on HDFS; yes, another 500GB. The intermediate data isn't stored on HDFS, but if the input and output are, then that's 1TB (in + out), and really 3TB all by themselves if both are set with a replication factor of 3. Even with a replication factor of 1, this all smells problematic to me on this small cluster. If you only have 1TB of disk on each node, then this will surely never run, as just mentioned.

Even if space weren't an issue, you'd need to run something like 3900 mappers just to process that (if my math of dividing 500GB by a 128MB block size is right), plus a shed-load of reducers, and that would take forever on three nodes. It has been many years since I was regularly running Terasort, but a very old heuristic of mine was a max of 30 minutes on a 10-worker-node cluster comprised of boxes with 128GB of RAM and 10-12 disks.

Clearly there are many variables at hand, and I'm not sure looking at your specific output would immediately shed light on the exact problem, but what I would recommend is to start small and scale up. Run a 500MB gen and sort. Then double that to 1GB, then 2GB, 4GB, and so on, making sure each run's results make sense compared with the last, and I believe you'll see a good pattern. Eventually this will all run out of horsepower (aka nodes), but it will give you a better benchmark. That is, the last good-sized run will give you a number that should be cut roughly in half when you double the number of worker nodes! Good luck and happy Hadooping!
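The "start small and double it" loop could look roughly like this (the examples jar path below is the usual HDP location, so adjust for your install; teragen's first argument is the number of 100-byte rows, so 5,000,000 rows is about 500MB):

```bash
EXAMPLES_JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar

# ~500MB run
hadoop jar $EXAMPLES_JAR teragen  5000000 /benchmarks/tera/in-500mb
hadoop jar $EXAMPLES_JAR terasort /benchmarks/tera/in-500mb /benchmarks/tera/out-500mb

# then double the row count (~1GB, ~2GB, ~4GB, ...) and compare elapsed times run over run
hadoop jar $EXAMPLES_JAR teragen 10000000 /benchmarks/tera/in-1gb
hadoop jar $EXAMPLES_JAR terasort /benchmarks/tera/in-1gb /benchmarks/tera/out-1gb
```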
03-06-2019
11:26 PM
While I'm doubtful these three directories are the very best answer to this problem, the old "three directories for the NN metadata" guidance came about long before a solid HA solution was available and, as https://twitter.com/LesterMartinATL/status/527340416002453504 points out, it was (and actually still is) all about disaster recovery. The old adage was to configure the NN to write to three different disks (via the directories) -- two local and one off the box, such as a remote mount point. Why? Well... as you know, that darn metadata holds the keys to the whole file system, and if it ever gets lost then ALL of your data is non-recoverable!! I personally think this is still valuable even with HA, as the JournalNodes are focused on the edits files and do a great job of keeping that information on multiple machines, but the checkpoint image files only exist on the two NN nodes in an HA configuration and, well... I just like to sleep better at night. Good luck and happy Hadooping!
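In hdfs-site.xml terms, that old adage looks something like this (the paths are hypothetical; two local disks plus one off-box mount):

```xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/disk1/hdfs/namenode,/data/disk2/hdfs/namenode,/mnt/remote-nfs/hdfs/namenode</value>
</property>
```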
03-06-2019
11:15 PM
Welcome to Phoenix... where the cardinal rule is: if you are going to use Phoenix for a table, then don't look at it or use it directly from the HBase API. What you are seeing is pretty normal. I don't see your DDL, but I'll give you an example to compare against. Check out the DDL at https://github.com/apache/phoenix/blob/master/examples/WEB_STAT.sql and focus on the CORE column, which is a BIGINT, and the ACTIVE_VISITOR column, which is an INTEGER. Here's the data that gets loaded into it: https://github.com/apache/phoenix/blob/master/examples/WEB_STAT.csv. Here's what it looks like via Phoenix... Here's what it looks like through the HBase shell (using the API)... Notice the CORE and ACTIVE_VISITOR values looking a lot like your example? Yep, welcome to Phoenix. Remember, access Phoenix tables only through Phoenix and you'll be all right. 🙂 Good luck and happy Hadooping/HBasing!
03-06-2019
11:01 PM
If the compressed file contained just a single file, the Pig approach shown in https://stackoverflow.com/questions/34573279/how-to-unzip-gz-files-in-a-new-directory-in-hadoop might have been useful. No matter what you do, you'll have to handle this in a single mapper from whatever data access framework you use, so it won't be a parallelized job, but I understand your desire to save the time and network of pulling from HDFS and then putting the data back once extracted. The Java MapReduce example at http://cutler.io/2012/07/hadoop-processing-zip-files-in-mapreduce/ also assumes the compressed file is a single file, but maybe it could be a start for some custom work you might be able to do. Good luck and happy Hadooping!
02-20-2019
08:25 PM
1 Kudo
There are a TON of variables at play here. First up, the "big" dataset isn't really all that big for Hive or Spark, and that will always play into the variables. My *hunch* (just a hunch) is that your Hive query from beeline is able to use an existing session and gets access to as many containers as it would like. Conversely, Zeppelin may have a SparkContext with a smaller number of executors than your Hive query can get access to. Of course, the "flaw in my slaw" is that these datasets are relatively small anyway. Spark's "100x improvement" line is always about iterative (aka ML/AI) processing, but for traditional querying and data pipelining, Spark runs faster when there are a bunch of tasks (mappers and reducers) to run and it can transition between them in milliseconds within the pre-allocated executor containers, instead of the seconds Hive has to burn talking to YARN's RM to get the needed containers. I realize this isn't as much an answer as an opinion piece, now that I review it before hitting "post answer". 🙂 Either way, good luck and happy Hadooping/Sparking!
02-20-2019
08:10 PM
With Hive 3 pushing hard toward fully managed tables in native file formats as transactional tables (see https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/managing-hive/content/hive_acid_operations.html for more info), this "direct from Spark to Hive" approach will get much harder due to the underlying "delta files" that get created when data is added/modified/removed in a Hive table. The Spark LLAP connector will aid in this integration. That said, historically, the better answer is often to simply save your DF from Spark to HDFS, wrap it with an external Hive table, and then do an INSERT INTO your existing Hive table with a SELECT * FROM your new external table. This lets Hive do all the heavy lifting and file conversions as needed, and it takes care of any partitioning and/or bucketing that you have in place. Good luck and happy Hadooping!
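A rough sketch of that pattern, with the table names, columns, and staging path all made up. Assume the DataFrame was first written from Spark to HDFS with something like df.write.format("orc").save("/tmp/staging/new_events"):

```sql
-- wrap the Spark output with an external table
CREATE EXTERNAL TABLE new_events_staging (id BIGINT, payload STRING, event_date STRING)
STORED AS ORC
LOCATION '/tmp/staging/new_events';

-- may be needed if the target table is dynamically partitioned
SET hive.exec.dynamic.partition.mode = nonstrict;

-- let Hive handle the ACID bookkeeping, file conversion, and partitioning
INSERT INTO TABLE events PARTITION (event_date)
SELECT id, payload, event_date
FROM   new_events_staging;
```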
02-20-2019
07:58 PM
1 Kudo
That pseudo-cluster itself is a scalability bottleneck. 😉 Storm likes to scale by having many Supervisor (worker) processes. As for the specific stats on your component, you can drill into your topology in the Storm UI, then drill into your bolt to see how it is doing and gauge for yourself whether it is working well enough. You'll get rolling statistics of how long it takes to process things, such as shown below. You could also scale up the number of instances the bolt has, but again, the single-server pseudo-cluster is likely going to be your first bottleneck.
02-20-2019
07:44 PM
On the legacy Hortonworks Professional Services team, we would call this an Ambari Takeover. I'm not sure if there is a formal documented procedure available on the web, but our support and/or consulting teams could help with this. Here's an article I found with a few seconds of googling about this concept in general: http://www.adaltas.com/en/2018/11/15/hadoop-cluster-takeover-with-apache-ambari/. Good luck & happy Hadooping!
02-20-2019
07:35 PM
1 Kudo
I have never used it to do both at the same time and https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_data-access/content/using_sqoop_to_move_data_into_hive.html says "HDFS or Hive". Good luck and happy Hadooping!
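For reference, the two targets look like this in a sqoop import (the connection details, table, and paths below are all hypothetical):

```bash
# land the table as files in HDFS only
sqoop import \
  --connect jdbc:mysql://db-host/sales --username etl -P \
  --table orders \
  --target-dir /data/raw/orders

# or import it straight into a Hive table instead
sqoop import \
  --connect jdbc:mysql://db-host/sales --username etl -P \
  --table orders \
  --hive-import --hive-table staging.orders
```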
02-20-2019
07:24 PM
Were the install instructions at https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.3.1/installing-upgrading-hdf.html not useful? Also, just a callout that NiFi itself is a masterless cluster configuration. I'm assuming that maybe you want to use the "master" for something like Ambari only?