Member since: 05-29-2017
Posts: 408
Kudos Received: 123
Solutions: 9
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2786 | 09-01-2017 06:26 AM
 | 1698 | 05-04-2017 07:09 AM
 | 1460 | 09-12-2016 05:58 PM
 | 2061 | 07-22-2016 05:22 AM
 | 1626 | 07-21-2016 07:50 AM
05-18-2016
11:15 AM
@Saurabh Kumar Then I can only think of increasing the space available to yarn.nodemanager.log-dirs by adding multiple mount points. Still, I suspect that something else is also occupying the space.
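As a rough sketch of what that looks like (the /grid/N mount point paths are examples and assume one directory per local disk), yarn.nodemanager.log-dirs takes a comma-separated list of directories in yarn-site.xml:

<!-- yarn-site.xml: comma-separated list of local log directories, one per disk (paths are examples) -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/grid/0/hadoop/yarn/log,/grid/1/hadoop/yarn/log,/grid/2/hadoop/yarn/log</value>
</property>

The NodeManager spreads container logs across these directories, so each additional mount point adds usable log space.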
05-30-2016
11:29 AM
@Sowmya Ramesh @Benjamin Leonhardi, I found the solution for this issue. Before the upgrade the value of "oozie.wf.rerun.failnodes" was "false", but after upgrading to HDP-2.3.4 it is "true", so on a rerun only the failed action nodes of the Oozie workflow instance are executed and the successful actions are skipped. To restore the previous behaviour, the following property has to be set in the properties section of the Process entity:
<property name="oozie.wf.rerun.failnodes" value="false"/>
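For context, a minimal sketch of where that property sits in a Falcon Process entity (the process name is made up and the other required elements are omitted for brevity):

<!-- Falcon process entity, trimmed to the properties block that carries Oozie configuration -->
<process name="sample-process">
  <!-- clusters, parallel, order, frequency, workflow, retry, etc. omitted -->
  <properties>
    <property name="oozie.wf.rerun.failnodes" value="false"/>
  </properties>
</process>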
04-28-2017
04:10 PM
1 Kudo
This is a good article by our intern James Medel on protecting against accidental deletion:

USING HDFS SNAPSHOTS TO PROTECT IMPORTANT ENTERPRISE DATASETS

Some time back, we introduced the ability to create snapshots to protect important enterprise data sets from user or application errors. HDFS snapshots are read-only, point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or on the entire file system, and are:

- Performant and reliable: snapshot creation is atomic and instantaneous, no matter the size or depth of the directory subtree.
- Scalable: snapshots do not create extra copies of blocks on the file system. Snapshots are highly optimized in memory and stored along with the NameNode's file system namespace.

In this blog post we'll walk through how to administer and use HDFS snapshots.

ENABLE SNAPSHOTS

In an example scenario, web server logs are loaded into HDFS a few times a day for processing and long-term storage, and the dataset is organized into directories that hold one day's log files each:

/data/weblogs
/data/weblogs/20130901
/data/weblogs/20130902
/data/weblogs/20130903

Since the web server logs are stored only in HDFS, it's imperative that they are protected from deletion. To provide data protection and recovery for the web server log data, snapshots are enabled for the parent directory:

hdfs dfsadmin -allowSnapshot /data/weblogs

Snapshots need to be explicitly enabled for directories. This gives system administrators the level of granular control they need to manage data in HDP.

TAKE POINT-IN-TIME SNAPSHOTS

The following command creates a point-in-time snapshot of the /data/weblogs directory and its subtree:

hdfs dfs -createSnapshot /data/weblogs

This creates a snapshot and gives it a default name that matches the timestamp at which the snapshot was created; users can provide an optional snapshot name instead of the default. With the default name, the created snapshot path is /data/weblogs/.snapshot/s20130903-000941.091.

Users can schedule a cron job to create snapshots at regular intervals. For example, the cron entry 30 18 * * * rm /home/someuser/tmp/* removes the contents of the tmp folder at 18:30 every day. In the same way, the entry 30 18 * * * hdfs dfs -createSnapshot /data/weblogs schedules a snapshot of the weblogs directory to be created each day at 18:30.

To view the state of the directory at the recently created snapshot:

hdfs dfs -ls /data/weblogs/.snapshot/s20130903-000941.091
Found 3 items
drwxr-xr-x   - web hadoop          0 2013-09-01 23:59 /data/weblogs/.snapshot/s20130903-000941.091/20130901
drwxr-xr-x   - web hadoop          0 2013-09-02 00:55 /data/weblogs/.snapshot/s20130903-000941.091/20130902
drwxr-xr-x   - web hadoop          0 2013-09-03 23:57 /data/weblogs/.snapshot/s20130903-000941.091/20130903

RECOVER LOST DATA

As new data is loaded into the web logs dataset, there could be an erroneous deletion of a file or directory. For example, an application could delete the set of logs pertaining to Sept 2nd, 2013, stored in the /data/weblogs/20130902 directory. Since /data/weblogs has a snapshot, the snapshot protects the file blocks from being removed from the file system; the deletion only modifies the metadata to remove /data/weblogs/20130902 from the working directory.

To recover from this deletion, the data is restored by copying it from the snapshot path:

hdfs dfs -cp /data/weblogs/.snapshot/s20130903-000941.091/20130902 /data/weblogs/

This restores the lost set of files to the working dataset:

hdfs dfs -ls /data/weblogs
Found 3 items
drwxr-xr-x   - web hadoop          0 2013-09-01 23:59 /data/weblogs/20130901
drwxr-xr-x   - web hadoop          0 2013-09-04 12:10 /data/weblogs/20130902
drwxr-xr-x   - web hadoop          0 2013-09-03 23:57 /data/weblogs/20130903

Since snapshots are read-only, HDFS also protects against user or application deletion of the snapshot data itself. The following operation will fail:

hdfs dfs -rmdir /data/weblogs/.snapshot/s20130903-000941.091/20130902

NEXT STEPS

With HDP 2.1, you can use snapshots to protect your enterprise data from accidental deletion, corruption and errors. Download HDP to get started.
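As a small follow-up sketch (directory and snapshot names reused from the article above; the retention policy is your own choice), snapshots can also be listed and cleaned up with the standard snapshot commands so that old snapshots do not pin blocks forever:

# List all snapshottable directories on the cluster
hdfs lsSnapshottableDir

# List the snapshots that exist for the weblogs directory
hdfs dfs -ls /data/weblogs/.snapshot

# Delete a snapshot that is no longer needed; its blocks are freed once no
# other snapshot or the working copy still references them
hdfs dfs -deleteSnapshot /data/weblogs s20130903-000941.091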
11-03-2016
04:48 AM
@Saurabh Try: set hive.exec.scratchdir=/new_dir;
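A quick sketch of the two usual ways to supply that property (/new_dir and my_query.hql are placeholders; the HDFS directory must exist and be writable by the querying user, and the property must not be on hive.conf.restricted.list):

# Per session, from the Hive CLI or Beeline prompt:
set hive.exec.scratchdir=/new_dir;

# Or per invocation, on the command line:
hive --hiveconf hive.exec.scratchdir=/new_dir -f my_query.hql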
04-12-2016
05:28 AM
Thanks @Benjamin Leonhardi
04-05-2016
10:18 AM
So CASCADE works because it forces the deletion of all objects belonging to that object (similar to DELETE ... CASCADE for row deletes). Now the question is why your DROP FUNCTION did not work. I don't know; we might have to look into the logs to figure that out. I have seen flakiness with functions in Hive before on an older version, so it might just be a bug, or a restart may be required. Again, without logs it's hard to say.
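For illustration only (the database and function names are made up), the contrast being described is roughly:

-- Dropping a single permanent UDF; this is the statement that reportedly failed
DROP FUNCTION IF EXISTS sales_db.parse_ua;

-- Dropping the database with CASCADE removes the objects registered in it as
-- well, which is what worked here
DROP DATABASE IF EXISTS sales_db CASCADE;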
08-03-2016
05:05 AM
The issue can also be resolved in lower versions by using the DISTRIBUTE BY clause in the query, for example:
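A sketch of what that could look like (the table and column names are hypothetical; the original query from this thread is not shown here):

-- Route rows with the same key to the same reducer before writing, which
-- controls how the output is grouped across reducers
INSERT OVERWRITE TABLE weblogs_clean
SELECT ip, url, status, log_date
FROM weblogs_raw
DISTRIBUTE BY log_date;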
03-04-2016
07:10 PM
1 Kudo
@Shishir Saxena: Thanks for the reply. When I tried the above location it failed as expected:

[root@m1 ~]# hadoop fs -ls jceks://hdfs/user/
ls: No FileSystem for scheme: jceks

But when I listed my user directory inside HDFS, the file showed up:

[root@m1 ~]# hadoop fs -ls /user/root/
Found 6 items
drwxr-xr-x   - root hdfs          0 2016-01-25 23:30 /user/root/.hiveJars
drwx------   - root hdfs          0 2016-02-29 04:31 /user/root/.staging
drwxr-xr-x   - root hdfs          0 2016-02-24 18:16 /user/root/OozieTest
-rwxr-xr-x   3 root hdfs       1484 2016-02-03 21:19 /user/root/Output.json
-rwx------   3 root hdfs        504 2016-03-02 04:14 /user/root/mysql.password.jceks

Catting the file only shows the encrypted keystore content:

[root@m1 ~]# hadoop fs -cat /user/root/mysql.password.jceks
encodedParamst[B[encryptedContentq~Lsun.paramsAlgtLjava/lang/String;LsealAlgq~xpur[B??T?xp0xrjavax.crypto.SealedObject>6=?÷Tp[ _ܬ??uq~?"?5?????-?y?L;XF6??zQ !z???????"???>I?cU?ɾ!

So that answered my question. Thanks once again.
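For reference, a hedged sketch of how such a credential file is normally created and inspected through the Credential Provider API rather than with plain hadoop fs commands (the alias mysql.password is inferred from the file name above):

# Create a credential entry inside a JCEKS keystore stored on HDFS
hadoop credential create mysql.password -provider jceks://hdfs/user/root/mysql.password.jceks

# List the aliases the keystore holds; the values themselves stay encrypted
hadoop credential list -provider jceks://hdfs/user/root/mysql.password.jceks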
03-02-2016
02:22 PM
1 Kudo
@Neeraj Sabharwal: Thanks for your support. I found the issue; it was a misconfiguration in the hdfs-site.xml file. I had not added the target cluster's HA properties to the client hdfs-site.xml, and because of that it was failing. Now it is working fine.
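For anyone hitting the same thing, a rough sketch of the HA client properties that typically have to be mirrored for the remote cluster (the nameservice IDs sourcens/targetns and the host names are placeholders):

<!-- client-side hdfs-site.xml: make the remote HA nameservice resolvable -->
<property>
  <name>dfs.nameservices</name>
  <value>sourcens,targetns</value>
</property>
<property>
  <name>dfs.ha.namenodes.targetns</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.targetns.nn1</name>
  <value>target-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.targetns.nn2</name>
  <value>target-nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.targetns</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>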