Member since: 05-29-2017
Posts: 408
Kudos Received: 123
Solutions: 9
11-22-2017
02:23 PM
When users run a Hive query in Zeppelin via the JDBC interpreter, the query is submitted as the anonymous user instead of the actual logged-in user:

INFO [2017-11-02 03:18:20,405] ({pool-2-thread-2} RemoteInterpreter.java[pushAngularObjectRegistryToRemote]:546) - Push local angular object registry from ZeppelinServer to remote interpreter group 2CNQZ1ES5:shared_process
WARN [2017-11-02 03:18:21,825] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2058) - Job 20171031-075630_2029577092 is finished, status: ERROR, exception: null, result: %text org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Unable to fetch table ushi_gl. org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/apps/hive/warehouse/adodb.db/ushi_gl":hive:hdfs:drwxr-x---
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:381)
at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermission(RangerHdfsAuthorizer.java:338)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:109)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4111)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1137)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:854)

Root cause: This is a bug in Zeppelin 0.7.0.2 and is fixed in a newer Zeppelin version.

Resolution: Add your username and password under the Credentials option in Zeppelin so the JDBC interpreter connects as the actual user.
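If you want to confirm outside Zeppelin that the query succeeds once real credentials are supplied, a quick check with beeline works; the HiveServer2 URL, user, and password below are placeholders for your environment.

# Placeholders only -- substitute your HiveServer2 URL, user, and password.
beeline -u "jdbc:hive2://hiveserver2.example.com:10000/default" -n your_user -p your_password \
  -e "select count(*) from adodb.ushi_gl;"
# If this runs as your_user (check the HDFS/Ranger audit), the Zeppelin failure is only the missing credentials.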
02-28-2017
07:52 PM
4 Kudos
Sometimes we need to get a list of all long-running YARN applications and, based on a threshold, kill them. Sometimes we also need to do this only for a specific YARN queue (see the note after the sample run below). In such situations the following script will do the job.

#!/bin/bash
if [ "$#" -lt 1 ]; then
echo "Usage: $0 <max_life_in_mins>"
exit 1
fi
yarn application -list 2>/dev/null | grep "report" | grep "RUNNING" | awk '{print $1}' > job_list.txt
for jobId in `cat job_list.txt`
do
  finish_time=`yarn application -status $jobId 2>/dev/null | grep "Finish-Time" | awk '{print $NF}'`
  if [ $finish_time -ne 0 ]; then
    # The application finished between listing and the status check; skip it.
    echo "App $jobId is not running"
    continue
  fi
  # Start-Time is reported in milliseconds, so divide by 1000 before comparing with `date +%s`.
  time_diff=`date +%s`-`yarn application -status $jobId 2>/dev/null | grep "Start-Time" | awk '{print $NF}' | sed 's!$!/1000!'`
  time_diff_in_mins=`echo "("$time_diff")/60" | bc`
  echo "App $jobId is running for $time_diff_in_mins min(s)"
  if [ $time_diff_in_mins -gt $1 ]; then
    echo "Killing app $jobId"
    yarn application -kill $jobId
  else
    echo "App $jobId should continue to run"
  fi
done
Sample run (the argument is the threshold in minutes):

[yarn@m1.hdp22 ~]$ ./kill_application_after_some_time.sh 30
App application_1487677946023_5995 is running for 0 min(s)
App application_1487677946023_5995 should continue to run
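If you need to restrict this to a specific YARN queue, one simple option is to add a filter on the queue column when building job_list.txt; the queue name below is just an example.

# Example only -- replace "etl" with your queue name.
yarn application -list 2>/dev/null | grep "RUNNING" | grep -w "etl" | awk '{print $1}' > job_list.txt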
02-27-2017
06:43 PM
1 Kudo
Sometimes you may need the last-accessed time of files in HDFS. You can get it in the following ways:

Option 1: Ranger audit. You can get it from the Ranger audit data, but I would not prefer this because the output is confusing due to the many tmp files and directories present in the audit DB.

Option 2: Use Java and build a small program. With the help of the Java APIs you can get this, plus some other useful information about HDFS files, in a few lines of code in a clean and efficient way.

Step 1: Create the Java program:

package com.saurabh;
import java.io.*;
import java.util.*;
import java.net.*;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
// For Date Conversion from long to human readable.
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.concurrent.TimeUnit;
public class Accesstime {
  public static void main(String[] args) throws Exception {
    System.out.println("usage: hadoop jar accessTime.jar <local file-path>");
    System.out.println("********************************************************************");
    System.out.println("Owner,LastAccessed(Days),LastAccessed(Date),FileName");
    System.out.println("********************************************************************");
    final String delimiter = ",";
    List<String> inputLines = new ArrayList<String>();
    if (args.length != 0) {
      try {
        FileSystem fs = FileSystem.get(new Configuration());
        Scanner myScanner = new Scanner(new File(args[0]));
        FileStatus status;
        while (myScanner.hasNextLine()) {
          String line = myScanner.nextLine();
          status = fs.getFileStatus(new Path(line));
          DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
          String owner = status.getOwner();
          long lastAccessTimeLong = status.getAccessTime();
          Date lastAccessTimeDate = new Date(lastAccessTimeLong);
          Date date = new Date();
          String currentDate = df.format(date);
          // System.out.println(currentDate + " " + df.format(lastAccessTimeDate));
          long diff = date.getTime() - lastAccessTimeDate.getTime();
          inputLines.add(owner + delimiter + TimeUnit.DAYS.convert(diff, TimeUnit.MILLISECONDS)
              + delimiter + df.format(lastAccessTimeDate) + delimiter + line);
        }
        // Sort by the "days since last access" column, least recently accessed first.
        Comparator<String> comp = new Comparator<String>() {
          public int compare(String line1, String line2) {
            return (-1 * (Long.valueOf(line1.split(delimiter)[1].trim())
                .compareTo(Long.valueOf(line2.split(delimiter)[1].trim()))));
          }
        };
        Collections.sort(inputLines, comp);
        Iterator itr = inputLines.iterator();
        // System.out.println("--------Printing Array List-----------");
        while (itr.hasNext()) {
          System.out.println(itr.next());
        }
      } catch (Exception e) {
        System.out.println("File not found");
        e.printStackTrace();
      }
    } else {
      System.out.println("Please provide the absolute file path.");
    }
  }
}

Step 2: Export the program to a jar and copy the jar file to your cluster.

Step 3: Create a local file containing absolute HDFS file paths, one per line:

[root@m1 ~]# cat input.txt
/user/raghu/wordcount_in/words.txt
/user/raghu/wordcount_out/_SUCCESS
/user/raghu/wordcount_out/part-r-00000

Step 4: Now run the jar and it will print the required details (Owner,LastAccessed(Days),LastAccessed(Date),FileName):

[root@m1 ~]# hadoop jar accessTime.jar input.txt
usage: hadoop jar accessTime.jar <local file-path>
********************************************************************
Owner,LastAccessed(Days),LastAccessed(Date),FileName
********************************************************************
saurkuma,20,2017-02-06,/user/raghu/wordcount_out/_SUCCESS
raghu,16,2017-02-10,/user/raghu/wordcount_in/words.txt
raghu,16,2017-02-10,/user/raghu/wordcount_out/part-r-00000
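If you prefer to compile on the cluster itself instead of exporting a jar from an IDE, something along these lines should work; the file and directory names are just examples.

# A minimal sketch: compile Accesstime.java against the Hadoop classpath and package it.
mkdir -p classes
javac -cp $(hadoop classpath) -d classes Accesstime.java
jar cf accessTime.jar -C classes .
# Without a Main-Class entry in the manifest, pass the class name explicitly.
hadoop jar accessTime.jar com.saurabh.Accesstime input.txt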
09-12-2016
12:21 PM
When you create a database or internal tables from the Hive CLI, they are created with 777 permission by default. Even if you have set a umask in HDFS, the permission will still be the same. You can change this with the following steps.

1. From the command line on the Ambari server node, edit the file:

vi /var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/scripts/hive.py

Search for hive_apps_whs_dir, which should take you to this block:

params.HdfsResource(params.hive_apps_whs_dir,
  type="directory",
  action="create_on_execute",
  owner=params.hive_user,
  group=params.user_group,
  mode=0755
)

2. Modify the value for mode to the desired permission, for example 0750. Save and close the file.

3. Restart the Ambari server to propagate the change to all nodes in the cluster:

ambari-server restart

4. From the Ambari UI, restart HiveServer2 to apply the new permission to the warehouse directory. If multiple HiveServer2 instances are configured, any one instance can be restarted.

hive> create database test2;
OK
Time taken: 0.156 seconds
hive> dfs -ls /apps/hive/warehouse;
Found 9 items
drwxrwxrwx - hdpuser hdfs 0 2016-09-08 01:54 /apps/hive/warehouse/test.db
drwxr-xr-x - hdpuser hdfs 0 2016-09-08 02:04 /apps/hive/warehouse/test1.db
drwxr-x--- - hdpuser hdfs 0 2016-09-08 02:09 /apps/hive/warehouse/test2.db

I hope this helps you serve your purpose.
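Note that the new mode only applies to databases created after the change; as the listing above shows, test.db created earlier is still 777. If you also want to tighten existing database directories, a manual chmod is one option; the path below is just the example database from above.

# Example only -- tighten an existing database directory created before the change.
hdfs dfs -chmod 750 /apps/hive/warehouse/test.db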
08-01-2016
12:01 PM
1 Kudo
There are situations when you unfortunately and unknowingly delete /hdp/apps/2.3.4.0-3485, either with or without -skipTrash. Then you are in trouble and other services are impacted: you will not be able to run Hive, MapReduce, or Sqoop jobs, and you will get the errors shown below.

Case 1: If you deleted it without -skipTrash, it is very easy to recover:

[root@m1 ranger-hdfs-plugin]# hadoop fs -rmr /hdp/apps/2.3.4.0-3485
rmr: DEPRECATED: Please use 'rm -r' instead.
16/07/28 01:59:22 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://HDPTSTHA/hdp/apps/2.3.4.0' to trash at: hdfs://HDPTSTHA/user/hdfs/.Trash/Current

In this case recovery is easy, because the deleted directory goes to the current trash of the user who deleted it and you can copy it back from there:

hadoop fs -cp hdfs://HDPTSTHA/user/hdfs/.Trash/Current/hdp/apps/2.3.4.0 /hdp/apps/
Case 2: If you deleted it with -skipTrash, you need to execute the following steps.

[root@m1 ranger-hdfs-plugin]# hadoop fs -rmr -skipTrash /hdp/apps/2.3.4.0-3485
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted /hdp/apps/2.3.4.0-3485

Now when I try to start Hive it throws the error below:

[root@m1 admin]# hive
WARNING: Use "yarn jar" to launch YARN applications.
16/07/27 22:05:04 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
Logging initialized using configuration in file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: /hdp/apps/2.3.4.0-3485/tez/tez.tar.gz
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507)

Resolution: Don't worry, you can resolve this issue with the following steps. Note: replace <hdp-version> with your HDP version, for example 2.3.4.0-3485.

Step 1: First, recreate the required directories:

hdfs dfs -mkdir -p /hdp/apps/<hdp-version>/mapreduce
hdfs dfs -mkdir -p /hdp/apps/<hdp-version>/hive
hdfs dfs -mkdir -p /hdp/apps/<hdp-version>/tez
hdfs dfs -mkdir -p /hdp/apps/<hdp-version>/sqoop
hdfs dfs -mkdir -p /hdp/apps/<hdp-version>/pig

Step 2: Now copy the required tarballs into the corresponding directories:

hdfs dfs -put /usr/hdp/<hdp-version>/hadoop/mapreduce.tar.gz /hdp/apps/<hdp-version>/mapreduce/
hdfs dfs -put /usr/hdp/<hdp-version>/hive/hive.tar.gz /hdp/apps/<hdp-version>/hive/
hdfs dfs -put /usr/hdp/<hdp-version>/tez/lib/tez.tar.gz /hdp/apps/<hdp-version>/tez/
hdfs dfs -put /usr/hdp/<hdp-version>/sqoop/sqoop.tar.gz /hdp/apps/<hdp-version>/sqoop/
hdfs dfs -put /usr/hdp/<hdp-version>/pig/pig.tar.gz /hdp/apps/<hdp-version>/pig/

Step 3: Now change the directory owner and permissions:

hdfs dfs -chown -R hdfs:hadoop /hdp
hdfs dfs -chmod -R 555 /hdp/apps/<hdp-version>

Now you will be able to start the Hive CLI and other jobs:

[root@m1 ~]# hive
WARNING: Use "yarn jar" to launch YARN applications.
16/07/27 23:33:42 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
Logging initialized using configuration in file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
hive>

I hope this helps you restore your cluster. Please feel free to give your suggestions.
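If you would rather not type each command, a small loop can recreate and repopulate the directories. This is only a sketch: it assumes hdp-select is available on the node and reports the active version, so verify the value it prints before running.

# Sketch, assuming `hdp-select` reports the active HDP version on this node.
HDP_VERSION=$(hdp-select status hadoop-client | awk '{print $3}')
echo "Restoring /hdp/apps for version: $HDP_VERSION"
for comp in mapreduce hive tez sqoop pig; do
  hdfs dfs -mkdir -p /hdp/apps/$HDP_VERSION/$comp
done
hdfs dfs -put /usr/hdp/$HDP_VERSION/hadoop/mapreduce.tar.gz /hdp/apps/$HDP_VERSION/mapreduce/
hdfs dfs -put /usr/hdp/$HDP_VERSION/hive/hive.tar.gz /hdp/apps/$HDP_VERSION/hive/
hdfs dfs -put /usr/hdp/$HDP_VERSION/tez/lib/tez.tar.gz /hdp/apps/$HDP_VERSION/tez/
hdfs dfs -put /usr/hdp/$HDP_VERSION/sqoop/sqoop.tar.gz /hdp/apps/$HDP_VERSION/sqoop/
hdfs dfs -put /usr/hdp/$HDP_VERSION/pig/pig.tar.gz /hdp/apps/$HDP_VERSION/pig/
hdfs dfs -chown -R hdfs:hadoop /hdp
hdfs dfs -chmod -R 555 /hdp/apps/$HDP_VERSION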
07-31-2016
03:43 AM
1 Kudo
I have seen an issue with the Application Timeline Server (ATS). ATS uses a LevelDB database stored in the location specified by yarn.timeline-service.leveldb-timeline-store.path in yarn-site.xml, and all metadata is stored in *.sst files under that location, so we may face a space issue there. But it is not good practice to delete *.sst files directly: an *.sst file is a sorted table of key/value entries sorted by key, and entries are partitioned into different *.sst files by key rather than by timestamp, so there is actually no "old" *.sst file to delete. To control the space used by the LevelDB store, you can instead enable TTL (time to live). Once it is enabled, timeline entities older than the TTL are discarded, and you can set the TTL to a smaller value than the default to give timeline entities a shorter lifetime.

<property>
<description>Enable age off of timeline store data.</description>
<name>yarn.timeline-service.ttl-enable</name>
<value>true</value>
</property>
<property>
<description>Time to live for timeline store data in milliseconds.</description>
<name>yarn.timeline-service.ttl-ms</name>
<value>604800000</value>
</property>

But if by mistake you deleted these files manually, as I did, then you may see ATS issues or get the following errors:

error code: 500, message: Internal Server Error{"message":"Failed to fetch results by the proxy from url: http://server:8188/ws/v1/timeline/TEZ_DAG_ID?limit=11&_=1469716920323&primaryFilter=user:$user&","status":500,"trace":"{\"exception\":\"WebApplicationException\",\"message\":\"java.io.IOException: org.iq80.leveldb.DBException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store.ldb/6378017.sst: No such file or directory\",\"javaClassName\":\"javax.ws.rs.WebApplicationException\"}"}

Or:

(AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 116 missing files; e.g.: /tmp/hadoop/yarn/timeline/leveldb-timeline-store.ldb/001052.sst
org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 116 missing files; e.g.: /tmp/hadoop/yarn/timeline/leveldb-timeline-store.ldb/001052.sst

Resolution:

1. Go to the configured location /hadoop/yarn/timeline/leveldb-timeline-store.ldb, where you will see a text file named "CURRENT":
cd /hadoop/yarn/timeline/leveldb-timeline-store.ldb
ls -ltrh | grep -i CURRENT
2. Copy your CURRENT file to some temporary location:
cp /hadoop/yarn/timeline/leveldb-timeline-store.ldb/CURRENT /tmp
3. Remove the file:
rm /hadoop/yarn/timeline/leveldb-timeline-store.ldb/CURRENT
4. Restart the YARN service via Ambari.

These steps resolved the issue for me. I hope they help you as well.
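To see whether the TTL settings are keeping the store under control (or whether you are heading toward the same space issue again), you can simply watch the size of the LevelDB directory; the path below is the example value of yarn.timeline-service.leveldb-timeline-store.path used in this post.

# Check how much space the timeline LevelDB store is using.
du -sh /hadoop/yarn/timeline/leveldb-timeline-store.ldb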
07-21-2016
09:15 AM
@Jonas Straub Can we configure Ranger for Solr without having Kerberos in our cluster?
07-15-2016
10:45 AM
@Harsh Jain I am getting the following error, so can you please help me get it resolved?

[solr@m1 solr]$ bin/solr create -c SolrCollection -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/ -n mySolrConfigs -s 2 -rf 2
Connecting to ZooKeeper at m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181
Re-using existing configuration directory mySolrConfigs
Creating new collection 'SolrCollection' using command:
http://192.168.56.41:8983/solr/admin/collections?action=CREATE&name=SolrCollection&numShards=2&replicationFactor=2&maxShardsPerNode=4&collection.configName=mySolrConfigs
{
"responseHeader":{
"status":0,
"QTime":1299},
"failure":{"":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://192.168.56.41:8983/solr: Error CREATEing SolrCore 'SolrCollection_shard2_replica1': Unable to create core [SolrCollection_shard2_replica1] Caused by: Open quote is expected for attribute \"{1}\" associated with an element type \"name\"."}}
07-05-2016
12:04 PM
@Ali Bajwa I have resolved it by adding proxy settings for the nifi user:
hadoop.proxyuser.nifi.groups=*
hadoop.proxyuser.nifi.hosts=*
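These properties belong in core-site, and HDFS needs a restart for them to take effect. On Ambari-managed clusters, one way to set them from the command line is Ambari's configs.sh helper, if your Ambari version ships it; the host, cluster name, and credentials below are placeholders, so check the script's usage on your version first.

# Sketch using Ambari's configs.sh helper, if present; adjust host, cluster name, and credentials.
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set ambari.example.com MyCluster core-site hadoop.proxyuser.nifi.groups "*"
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set ambari.example.com MyCluster core-site hadoop.proxyuser.nifi.hosts "*"
# Restart HDFS (and dependent services) afterwards so the new proxyuser settings take effect.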
07-05-2016
07:26 AM
@Ali Bajwa I am getting the below error when I try to replicate the same case in my sandbox on VM Fusion.