Member since: 04-20-2016
Posts: 61
Kudos Received: 17
Solutions: 13
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 880 | 06-30-2017 01:16 PM |
| | 882 | 06-30-2017 01:03 PM |
| | 1123 | 06-30-2017 12:50 PM |
| | 1126 | 06-30-2017 12:40 PM |
| | 13156 | 06-30-2017 12:36 PM |
08-21-2017
09:54 AM
LZO is installed and configured properly on all nodes, and Spark and MapReduce jobs run fine. But we are seeing an exception when running a Spark job through an Oozie java-action: it is not able to load the Hadoop native library, even though the path is present in spark-defaults.conf. Our jobs do not even use LZO compression, so I am not sure why it is picking up LZO.
spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
Below is the exception from the YARN application logs:
ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2147)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2112)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:185)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:234)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
ERROR lzo.LzoCodec: Cannot load native-lzo without native-hadoop
Labels:
- Apache Oozie
- Apache Spark
06-30-2017
01:16 PM
@rakanchi Currently Ambari does not support the definition of multiple nameservices: it assumes hdfs_site['dfs.nameservices'] is a string naming a single nameservice. This is a known Ambari bug, https://issues.apache.org/jira/browse/AMBARI-15506, fixed in Ambari 2.4.0.
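For illustration, a federated hdfs-site of the following shape is what trips up older Ambari versions, since dfs.nameservices carries a comma-separated list rather than a single name (the nameservice and NameNode names below are placeholders):
dfs.nameservices=ns1,ns2
dfs.ha.namenodes.ns1=nn1,nn2
dfs.ha.namenodes.ns2=nn3,nn4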
06-30-2017
01:03 PM
@rakanchi This is a known bug: multi-threaded access to CredentialProviderFactory is not thread-safe. I had a similar case with a customer and had to apply the Hadoop patch HADOOP-14195.
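For illustration, the race typically surfaces when several threads resolve passwords through a credential provider configured along these lines (the jceks path below is just a placeholder):
hadoop.security.credential.provider.path=jceks://hdfs/user/hdfs/mycreds.jceks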
06-30-2017
12:50 PM
@rakanchi You need to configure storm_jaas.conf with the client sections shown below and pass that file to the Storm topology.
StormClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/etc/security/keytabs/hdfs.headless.keytab"
storeKey=true
useTicketCache=false
serviceName="nimbus"
principal="hdfs@example.com";
};
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/etc/security/keytabs/hdfs.headless.keytab"
storeKey=true
useTicketCache=false
serviceName="zookeeper"
principal="hdfs@example.com";
};
Then pass the JAAS file to the topology with the -c option:
storm jar /usr/hdp/current/storm-client/contrib/storm-starter/storm-starter-*-jar-with-dependencies.jar storm.starter.WordCountTopology wordcount -c java.security.auth.login.config=/my/custom/jaas/path
Let me know if it helps!
06-30-2017
12:40 PM
@rakanchi It looks like permissions are not set correctly for the topics. Verify that the Kafka policies in Ranger are correct for the ATLAS_HOOK and ATLAS_ENTITIES topics (see https://github.com/emaxwell-hw/Atlas-Ranger-Tag-Security), and grant the 'hive' and 'atlas' users permissions on those Atlas Kafka topics. Let me know if it helps!
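As a rough sketch, the Ranger Kafka policy should cover something like the following (the exact access types and user list depend on your setup):
Topic:  ATLAS_HOOK, ATLAS_ENTITIES
Users:  atlas, hive
Access: Publish, Consume, Create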
06-30-2017
12:36 PM
@rakanchi Spark currently doesn't support the "insert into" feature; you need to create a DataFrame and append it to the table:
var data = sqlContext.createDataFrame(Seq(("ZZ", "m:x", 34.0))).toDF("pv", "metric", "value")
data.show()
data.write.mode("append").saveAsTable("results_test_hive")
println(sqlContext.sql("select * from results_test_hive").count())
06-28-2017
08:45 AM
1 Kudo
@prsingh You need to pass the Databricks CSV dependencies: either download the jars or pull the dependency at run time.
1) Pull the dependency at run time:
pyspark --packages com.databricks:spark-csv_2.10:1.2.0
df = sqlContext.read.load('file:///root/file.csv',format='com.databricks.spark.csv',header='true',inferSchema='true')
2) Or pass the jars when starting the shell:
a) Download the jars:
wget http://search.maven.org/remotecontent?filepath=org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar -O commons-csv-1.1.jar
wget http://search.maven.org/remotecontent?filepath=com/databricks/spark-csv_2.10/1.0.0/spark-csv_2.10-1.0.0.jar -O spark-csv_2.10-1.0.0.jar
b) Start the PySpark shell with those jars:
./bin/pyspark --jars "spark-csv_2.10-1.0.0.jar,commons-csv-1.1.jar"
c) Load the file as a DataFrame:
df = sqlContext.read.load('file:///root/file.csv',format='com.databricks.spark.csv',header='true',inferSchema='true')
Let me know if the above helps!
06-28-2017
08:26 AM
We are facing issues with an Oozie workflow: it started failing after we changed the schema on one of the tables.
ERROR SchemaCheckXCommand:517 - Found [1] extra indexes for columns in table [WF_ACTIONS]: [wf_id]
Labels:
- Apache Oozie
03-31-2017
02:52 PM
Thanks @krajguru, that was exactly what I was looking for 🙂
03-31-2017
02:46 PM
How can I list, through the Ambari REST API, the existing variable names and their values used across the various Ambari configurations, for example {{namenode_heapsize}}?
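For reference, I can read an individual configuration type with a call like the one below (cluster name, host, and credentials are placeholders), but I am looking for the template variables such as {{namenode_heapsize}} and their resolved values:
curl -u admin:admin "http://ambari-host:8080/api/v1/clusters/MyCluster/configurations?type=hadoop-env&tag=version1"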
Labels:
- Apache Ambari