About mqureshi

mqureshi · 07-06-2016

Check the link I just added to my answer.

mqureshi · 07-06-2016

Hi @Qi Wang, which user is running the sqoop command? Can you verify that the file /etc/hive/2.5.0.0-817/0/xasecure-audit.xml exists, and that the user running sqoop import has read access to it? Also, check the following link; it might be your issue: https://community.hortonworks.com/questions/369/installed-ranger-in-a-cluster-and-running-into-the.html
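If it helps, here is a minimal Java sketch of that check. It assumes you run it as the same OS user that runs sqoop (for example via sudo -u), because it only reports what the JVM's current user can see. The class name AuditConfCheck is hypothetical; it uses only java.nio.file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class AuditConfCheck {
    // Reports whether the file exists and is readable by the user running this JVM.
    static String check(Path p) {
        if (!Files.exists(p)) {
            return "missing: " + p;
        }
        if (!Files.isReadable(p)) {
            return "not readable by " + System.getProperty("user.name") + ": " + p;
        }
        return "ok: " + p;
    }

    public static void main(String[] args) {
        // Defaults to the path from the error; pass another path as the first argument if needed.
        Path p = Paths.get(args.length > 0 ? args[0] : "/etc/hive/2.5.0.0-817/0/xasecure-audit.xml");
        System.out.println(check(p));
    }
}
```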

mqureshi · 07-06-2016

@Sunile Manjee Yes. Here is what I did. Let me know if you have any questions.

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    // kerberos_principal, kerberos_keytab, MyType, objectOfMyType, header, delimiter
    // and path are application-specific.
    try {
        UserGroupInformation ugi =
            UserGroupInformation.loginUserFromKeytabAndReturnUGI(kerberos_principal, kerberos_keytab);
        objectOfMyType = ugi.doAs(new PrivilegedExceptionAction<MyType>() {
            @Override
            public MyType run() throws Exception {
                System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
                System.setProperty("spark.kryo.registrator", "fire.util.spark.Registrator");
                System.setProperty("spark.akka.timeout", "900");
                System.setProperty("spark.worker.timeout", "900");
                System.setProperty("spark.storage.blockManagerSlaveTimeoutMs", "3200000");

                // Create the Spark context inside doAs() so it runs as the Kerberos user logged in above
                SparkConf sparkConf = new SparkConf().setAppName("MyApp");
                sparkConf.setMaster("local");
                sparkConf.set("spark.broadcast.compress", "false");
                sparkConf.set("spark.shuffle.compress", "false");
                JavaSparkContext ctx = new JavaSparkContext(sparkConf);

                // JavaSparkContext has no SQL entry point of its own; wrap it in a SQLContext
                SQLContext sqlContext = new SQLContext(ctx);
                DataFrame tdf = sqlContext.read().format("com.databricks.spark.csv")
                    .option("header", String.valueOf(header)) // use first line of all files as header
                    .option("inferSchema", "true")             // automatically infer data types
                    .option("delimiter", delimiter)
                    .load(path);

                // some more application-specific code here
                return objectOfMyType;
            }
        });
    } catch (Exception exception) {
        exception.printStackTrace();
    }

mqureshi · 07-06-2016

I figured this out. I changed the master to local and simply loaded the remote HDFS data. It still threw an exception because it's a kerberized cluster. I was using UserGroupInformation and creating a proxy user with a valid keytab to access my cluster, but it was failing because I was creating the JavaSparkContext outside of the doAs method. Once I created the JavaSparkContext inside doAs, as the right proxy user, everything worked.

mqureshi · 07-01-2016

The Hive JDBC jar should be at the following location; you can copy it from there: /usr/hdp/current/hive-client/lib/hive-jdbc.jar

mqureshi · 07-01-2016

Hi, I am trying to run an application from Eclipse so I can set breakpoints and watch the values of my variables change. I create a JavaSparkContext from a SparkConf object, which needs access to my yarn-site.xml and core-site.xml so it knows how to connect to the cluster. I have these files under /etc/hadoop/conf, and I set two environment variables, HADOOP_CONF_DIR and YARN_CONF_DIR, on the Mac where Eclipse runs, using ~/Library/LaunchAgents/environment.plist. I have verified that these variables are available after I boot the Mac, and System.getenv("HADOOP_CONF_DIR") in my app in Eclipse returns the right location. I have also tried adding the environment variables to my build configuration in Eclipse.

Even so, my code consistently fails because it does not pick up yarn-site.xml or core-site.xml. I run into the following:

    INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    16/07/01 00:57:16 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

As you can see, it is not connecting to the correct ResourceManager address. Here is what create() looks like. Please let me know what you think, as this is blocking me.

    public static JavaSparkContext create() {
        System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        System.setProperty("spark.kryo.registrator", "fire.util.spark.Registrator");
        System.setProperty("spark.akka.timeout", "900");
        System.setProperty("spark.worker.timeout", "900");
        System.setProperty("spark.storage.blockManagerSlaveTimeoutMs", "3200000");

        // create spark context
        SparkConf sparkConf = new SparkConf().setAppName("MyApp");
        // if (clusterMode == false) {
        sparkConf.setMaster("yarn-client");
        sparkConf.set("spark.broadcast.compress", "false");
        sparkConf.set("spark.shuffle.compress", "false");
        // }
        JavaSparkContext ctx = new JavaSparkContext(sparkConf); // <- fails here
        return ctx;
    }
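A note on the symptom: 0.0.0.0:8032 is the fallback that yarn-default.xml produces when yarn.resourcemanager.hostname is left at its default, so the driver is most likely not seeing yarn-site.xml at all. As far as I know, Hadoop's Configuration loads core-site.xml and yarn-site.xml from the classpath, so setting environment variables alone may not be enough when launching from an IDE; adding the conf directory (e.g. /etc/hadoop/conf) to the run configuration's classpath is the usual remedy. The sketch below is a stdlib-only diagnostic under those assumptions: RmAddressCheck is a hypothetical name, and it simply reads yarn.resourcemanager.address from $HADOOP_CONF_DIR/yarn-site.xml and reports whether that directory is on java.class.path.

```java
import java.io.File;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RmAddressCheck {
    // Returns the <value> of the named <property> in a Hadoop-style *-site.xml, or null if absent.
    static String readProperty(File siteXml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(siteXml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    // True if dir appears as an entry in the given classpath string.
    static boolean onClasspath(String classpath, File dir) {
        File target = dir.getAbsoluteFile();
        return Arrays.stream(classpath.split(File.pathSeparator))
                .map(entry -> new File(entry).getAbsoluteFile())
                .anyMatch(target::equals);
    }

    public static void main(String[] args) throws Exception {
        String confDir = System.getenv("HADOOP_CONF_DIR");
        if (confDir == null || confDir.isEmpty()) {
            System.out.println("HADOOP_CONF_DIR is not set in this JVM");
            return;
        }
        File dir = new File(confDir);
        File yarnSite = new File(dir, "yarn-site.xml");
        if (yarnSite.isFile()) {
            System.out.println("yarn.resourcemanager.address = "
                    + readProperty(yarnSite, "yarn.resourcemanager.address"));
        } else {
            System.out.println("yarn-site.xml not found in " + dir);
        }
        System.out.println(dir + " on classpath: "
                + onClasspath(System.getProperty("java.class.path"), dir));
    }
}
```

If the address prints correctly but the directory is not on the classpath, that would be consistent with the 0.0.0.0:8032 fallback above.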

mqureshi · 06-30-2016

@hoda moradi You will have to do some research, but you might be missing a jar file. Are you sure you have the JDBC jar files on the classpath? See the following two links:
https://community.hortonworks.com/questions/19396/oozie-hive-action-errors-out-with-exit-code-12.html
https://community.hortonworks.com/articles/9148/troubleshooting-an-oozie-flow.html
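One quick way to answer the classpath question is to ask a JVM whether, and from which jar, it can load the Hive JDBC driver class (org.apache.hive.jdbc.HiveDriver, the HiveServer2 driver in hive-jdbc.jar). This is a small sketch with a hypothetical class name, DriverCheck. Keep in mind that, as far as I know, an Oozie Hive action runs with Oozie's own classpath (the sharelib), so this only tells you about the JVM you run it in.

```java
import java.security.CodeSource;

class DriverCheck {
    // Returns where the class was loaded from, a marker for JDK classes, or null if it is not on the classpath.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false, DriverCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/unknown)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String driver = "org.apache.hive.jdbc.HiveDriver";
        String where = locate(driver);
        System.out.println(where == null
                ? driver + " is NOT on the classpath"
                : driver + " loaded from " + where);
    }
}
```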

mqureshi · 06-30-2016

Hi @hoda moradi Here is the issue you are running into:

    User: hive is not allowed to impersonate anonymous
        at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:266)

I am assuming this is simple development and you are not too concerned about policies. If you are, then only your organization's security team can tell you which users hive can impersonate. Basically, you need to enable Hive impersonation. Can you check whether the following is set to true in your hive-site.xml?

    <property>
      <name>hive.server2.enable.impersonation</name>
      <description>Enable user impersonation for HiveServer2</description>
      <value>true</value>
    </property>

Also check the following link to set up the proxyuser settings for the hive user in core-site.xml: http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_ambari_views_guide/content/_setup_HDFS_proxy_user.html

You need to set the following. Remember, these definitely cannot be * if this is for work, and that is where your security team comes in: they will tell you whom the hive user can impersonate.

    hadoop.proxyuser.hive.groups=*
    hadoop.proxyuser.hive.hosts=*
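To illustrate why * matters, here is a deliberately simplified Java model of the proxy-user rule: the superuser (hive) may impersonate a user only if one of that user's groups is listed in hadoop.proxyuser.hive.groups and the superuser's request comes from a host listed in hadoop.proxyuser.hive.hosts, with * matching anything. ProxyUserModel is hypothetical and is not Hadoop's implementation; it ignores IP ranges, hadoop.proxyuser.*.users and other details.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ProxyUserModel {
    // Splits a comma-separated list value; "*" is kept as a wildcard entry.
    static Set<String> parse(String value) {
        Set<String> out = new HashSet<>();
        if (value == null) {
            return out; // unset property: nothing allowed
        }
        for (String s : value.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Simplified check: may superUser impersonate a user belonging to userGroups, from host?
    static boolean mayImpersonate(Map<String, String> conf, String superUser,
                                  Set<String> userGroups, String host) {
        Set<String> groups = parse(conf.get("hadoop.proxyuser." + superUser + ".groups"));
        Set<String> hosts = parse(conf.get("hadoop.proxyuser." + superUser + ".hosts"));
        boolean groupOk = groups.contains("*") || userGroups.stream().anyMatch(groups::contains);
        boolean hostOk = hosts.contains("*") || hosts.contains(host);
        return groupOk && hostOk;
    }

    public static void main(String[] args) {
        Map<String, String> open = Map.of(
                "hadoop.proxyuser.hive.groups", "*",
                "hadoop.proxyuser.hive.hosts", "*");
        Map<String, String> strict = Map.of(
                "hadoop.proxyuser.hive.groups", "analysts,etl",
                "hadoop.proxyuser.hive.hosts", "hs2.example.com");
        System.out.println(mayImpersonate(open, "hive", Set.of("finance"), "any.host"));
        System.out.println(mayImpersonate(strict, "hive", Set.of("finance"), "hs2.example.com"));
    }
}
```

With * every group and host passes, which is why it is convenient for a sandbox but too permissive for a shared cluster.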

mqureshi · 06-28-2016

Do you have Ambari running? If so, you should be able to check the status of your JobHistory Server (JHS) from Ambari. Otherwise, the following should bring up the UI, assuming you haven't modified the default ports: http://<host>:19888
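If you would rather check from code, the JobHistory Server also exposes a REST API; this sketch requests its /ws/v1/history/info endpoint on the default port 19888. JhsCheck is a hypothetical name, and the host comes from the first argument (localhost if omitted). Any HTTP response means the server is listening; a connection error suggests it is down or on a different port.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class JhsCheck {
    // Builds the JobHistory Server REST "info" URL for a host and port.
    static String infoUrl(String host, int port) {
        return "http://" + host + ":" + port + "/ws/v1/history/info";
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 19888; // default JHS web port
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(infoUrl(host, port)).toURL().openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            System.out.println("HTTP " + conn.getResponseCode() + " from " + conn.getURL());
        } catch (IOException e) {
            System.out.println("JobHistory Server not reachable: " + e.getMessage());
        } finally {
            conn.disconnect();
        }
    }
}
```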

mqureshi · 06-28-2016

@hoda moradi Can you please share your log? Is your job history server running? Thanks

Last Visited	10-31-2017 03:17 AM

Member Since	06-07-2016 09:05 AM
Posts	923
Kudos received	310

Cloudera Community

Re: YARN recommended configuration

Re: How to resolve for NULL values when they are c...

Re: Why is spark has better speed than Hadoop

Re: Is it possible to assign Hadoop queues to Hado...

Re: Kafka NiFi HDF Installation

Re: Sandbox HDP 2.5 TP: error when running sqoop, ...

Re: Sandbox HDP 2.5 TP: error when running sqoop, ...

Re: Connecting to remote spark 1.6 as yarn-client ...

Re: Connecting to remote spark 1.6 as yarn-client ...

Re: Main class [org.apache.oozie.action.hadoop.Hiv...

Connecting to remote spark 1.6 as yarn-client from...

Re: Main class [org.apache.oozie.action.hadoop.Hiv...

Re: How fix the JA006 error in Oozie.

Re: How fix the JA006 error in Oozie.

Re: How fix the JA006 error in Oozie.