About bwilson

bwilson · ‎02-14-2017

Thanks! I wanted to confirm before took such a drastic move 🙂 Worked perfectly.

bwilson · ‎02-14-2017

I am aware that there is no way to delete Atlas tags via the UI or REST endpoints. However, I am wondering if there is a simple way to truncate the underlying database or wipe it so we start with what is essentially a fresh Atlas installation?

bwilson · ‎01-25-2017

Hi @Devpriyo Bhattacharya, You cannot install HDP natively on Windows 10. You can, however, run HDP on your Windows 10 laptop via Docker or virtual machines but I expect the behavior to be very unpredictable. You are going to be *very* resource constrained and you will likely experience occasional component failures and/or slow response time. That said, typically I would recommend that you leverage the dockerized sandbox for this situation. However, if you wish to go through the process of a customized installation then you can take the whole thing for a test drive using Docker for Windows. To do this you will need to do the following: Download and install Docker for Windows Launch a CentOS 7 container with appropriate ports opened - most importantly 8080 for Ambari and 10000 for HiveServer2 - you may realize that you need others open later for various UIs and connectivity (e.g., 50070 for the HDFS UI) Connect to the CentOS 7 container and run through the standard Ambari installation process to install your custom single-node HDP installation. I recommend installing the very minimum number of components due to your resource limitation. This will get you a single-node HDP installation running on your laptop that you can use for basic functionality testing. It will be similar to the Sandbox with the exception being that you have hand-selected the components that you wish to install.

bwilson · ‎11-23-2016

@Dagmawi Mengistu Happy to help. If you don't need any more detail then feel free to accept the answer so we can close out the issue. Thanks!

bwilson · ‎11-23-2016

Hi @Dagmawi Mengistu, We do not currently support start/stop of cluster created via HDC. The likely reason that you are seeing the above error is that the local instance storage was chosen to support HDFS at cluster creation. This is short-lived and does not persist through start/stop of instances in EC2. In general, HDC clusters are intended to be for ephemeral workloads. If you want to start and stop the compute resources to control costs then I recommend creating a persistent metastore backed by RDS and providing that when you create new clusters. This way you can spin up and destroy clusters as you see fit and the data can be persisted via the shared metastore and S3. I hope this helps.

bwilson · ‎11-11-2016

Hi @Saminathan A One thing you can do is drop the SplitLine processor and go straight to the ExtractText processor where you can use a regex to pull out the first 5 lines via a regex. Then you can use the groups within that regex to work on the individual groups (e.g., the first 5 lines) in the UpdateAttribute processor. This regex should work for you: ^(.*)\n(.*)\n(.*)\n(.*)\n(.*)\n.*

bwilson · ‎11-02-2016

There is also a good amount of detailing covering all of the knobs and dials related to configuring the Capacity Scheduler here. Note that in the latest versions of Ambari there is a Capacity Scheduler View where you can graphically configure the queues instead of getting into the weeds of the XML.

bwilson · ‎10-26-2016

Hi @Houssam Manik, The big benefit that you get by utilizing snapshots with distCP is that you can do incremental backups when distCP'ing the snapshotted directory in the future by leveraging the differential between the snapshots. Jing provides some context around this in the second answer here. The work to complete this is discussed in HDFS-7535 and some more context is provided there. This was first pulled into Hadoop 2.7.0

bwilson · ‎10-19-2016

Hi @Santhosh B Gowda, Assuming that this is happening on a single JournalNode then you can try the following: As a precaution, stop HDFS. This will shut down all Journalnodes as well. On the node in question, move the fsimage edits directory (/hadoop/hdfs/journal/stanleyhotel/current) to an alternate location. Copy the fsimage edits directory (/hadoop/hdfs/journal/stanleyhotel/current) from a functioning JournalNode to this node. Start HDFS. This should get this Journalnode back inline with the others and get you back to a properly functioning HA state.

bwilson · ‎10-10-2016

Hi @Simran Kaur, You can still achieve this using out-of-the-box functions in Hive as you mentioned. You just missed getting the string in the right format. For clarity, the basic steps are: Replace the 'T' in the string with a space so the date is in the format expected by Hive. Convert the string to a unix timestamp using the unix_timestamp function. Convert the timestamp to your preferred date format using the from_unixtime function. Here is a quick example you can run in Hive to see the result for the string you provided: select from_unixtime(unix_timestamp(regexp_replace('2016-09-13T06:03:51Z', 'T',' ')), 'dd-MM-yyyy HH-mm-ss'); Notice that the only additional step is the replace operation.

Online	Offline
Last Visited	‎02-15-2022 08:21 PM

Member Since	‎09-14-2015 08:02 PM
Last Visited	‎02-15-2022 08:21 PM
Posts	79
Kudos received	88

Cloudera Community

Re: Install HDP 2.4.2 on my Dual Core, 8GB RAM Win...

Re: HDCloud datanodes unmount after restart of ser...

Re: How to extract first 5 record from flow file u...

Re: Export/Import HDFS snapshots

Re: journal node edit log issue

Re: Is there a way to reset the Atlas database?

Is there a way to reset the Atlas database?

Re: Install HDP 2.4.2 on my Dual Core, 8GB RAM Win...

Re: HDCloud datanodes unmount after restart of ser...

Re: HDCloud datanodes unmount after restart of ser...

Re: How to extract first 5 record from flow file u...

Re: Capacity schedular.

Re: Export/Import HDFS snapshots

Re: journal node edit log issue

Re: convert ISO8601 date in d mm yyhh mm ss format...