Member since: 02-07-2017
Posts: 23
Kudos Received: 2
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 14557 | 01-23-2018 03:02 PM |
03-30-2018
10:34 AM
Can you please share the complete log? A few lines of the log don't help much.
02-15-2018
04:22 PM
Share the error details please.
02-15-2018
04:02 PM
Can you attach sample input data and the NiFi flow (export it as XML)? That would help in understanding what you are doing and deciding where things could be changed.
02-15-2018
03:57 PM
Why don't you use the MergeContent processor to concatenate the flow-file content?
02-04-2018
05:31 AM
Please attach your nifi-app.log from when the issue happens. That would give us a bit more of the necessary detail.
02-04-2018
05:18 AM
@Matt Krueger Your table is ACID, i.e. transaction-enabled. Spark doesn't support reading Hive ACID tables. Take a look at SPARK-15348 and SPARK-16996.
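If you want to double-check, the table properties will show it (the table name below is just an example):
DESCRIBE FORMATTED my_acid_table;
-- an ACID table lists transactional=true under Table Parameters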
02-04-2018
04:26 AM
I believe when you say "on Azure", you actually mean you are using the Hive shell (or Beeline) from a node of an Azure HDInsight cluster. In that case, add the following line at the beginning of your .hql script: set hive.cli.print.header=true;
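A minimal sketch of how the script could start (the query and table name are just examples):
set hive.cli.print.header=true;
SELECT * FROM my_table LIMIT 10;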
02-04-2018
04:11 AM
Could you share a bit more about your issue, like the scenario or the process you are following? You would get more responses that way. 🙂
01-28-2018
03:41 AM
Take a look at this guide: https://cwiki.apache.org/confluence/display/hive/languagemanual+dml#LanguageManualDML-Loadingfilesintotables
You should try either
INSERT INTO TABLE ${hiveconf:inputtable} SELECT * FROM datafactory7 LIMIT 14;
or
LOAD DATA INPATH '<HDFS PATH WHERE FILES LOCATED>' INTO TABLE ${hiveconf:inputtable};
Note that the table name should not be wrapped in single quotes; ${hiveconf:inputtable} is substituted as an identifier, not a string literal.
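For reference, a sketch of how the variable would be passed in (the script and table names are my examples):
hive --hiveconf inputtable=my_table -f load_into_table.hql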
01-27-2018
10:34 AM
I could see the following in the error log:
ERROR: org.apache.hadoop.security.authorize.AuthorizationException: User: livy is not allowed to impersonate admin
Looks like your Hadoop cluster is a secure one. You need to grant livy the ability to impersonate the originating user by adding two proxy-user properties to core-site.xml. Take a look at this guide.
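Roughly, the two properties look like this ('*' is just a permissive example value; restrict the hosts and groups to what your cluster actually needs), and you will need to restart the affected services afterwards:
<property>
  <name>hadoop.proxyuser.livy.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.livy.groups</name>
  <value>*</value>
</property>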
01-27-2018
10:28 AM
How different is it from using livy to do the same?
01-27-2018
10:22 AM
Did you take a look at the log you shared? It says it "cannot resolve" some columns:
LogType:stdout
Log Upload Time:Fri Jan 26 04:38:43 -0500 2018
LogLength:1195
Log Contents:
Traceback (most recent call last):
File "Essbaselog.py", line 57, in <module>
dfinal=sqlContext.sql("Select metaDataTemp.id,errorDataTemp.ErrorNum,errorDataTemp.ExecutionTimeStamp,errorDataTemp.ExecutionDate,errorDataTemp.ExecutionTime,errorDataTemp.Message,date_format(current_date(), 'd/M/y') from metaDataTemp Inner Join errorDataTemp on 1=1 where errorDataTemp.ErrorNum BETWEEN metaDataTemp.ErrorStart and metaDataTemp.ErrorEnd")
File "/data/hadoop/yarn/local/usercache/user/appcache/application_1516887566537_0020/container_e22_1516887566537_0020_01_000001/pyspark.zip/pyspark/sql/context.py", line 583, in sql
File "/data/hadoop/yarn/local/usercache/user/appcache/application_1516887566537_0020/container_e22_1516887566537_0020_01_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/data/hadoop/yarn/local/usercache/user/appcache/application_1516887566537_0020/container_e22_1516887566537_0020_01_000001/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
pyspark.sql.utils.AnalysisException: u"cannot resolve 'metaDataTemp.ErrorStart' given input columns ExecutionTime, ExecutionTimeStamp, Message, ErrorNum, id, ExecutionDate, ExecutionEpoch;"
End of LogType:stdout
01-27-2018
10:11 AM
A small correction: it was introduced in Ranger 0.7, and the policies should look like this:
// HDFS
resource: path=/home/{USER}
user: {USER}
// Hive
resource: database=db_{USER}; table=*; column=*
user: {USER}
Here {USER} is substituted with the user id of the currently logged-in user.
01-24-2018
07:08 AM
Spark by default looks for files in HDFS, but if for some reason you want to load a file from the local filesystem, you need to prepend "file://" to the file path. So your code would be:
Dataset<Row> jsonTest = spark.read().json("file:///tmp/testJSON.json");
However, this becomes a problem when you submit in cluster mode, since the job executes on the worker nodes: every worker node would be expected to have that file at that exact path, so it will fail. To get around this, you can pass the file path in the --files parameter of spark-submit, which ships the file to the working directory of each container, so you can refer to it by the file name alone. For example, if you submit the following way:
> spark-submit --master <your_master> --files /tmp/testJSON.json --deploy-mode cluster --class <main_class> <application_jar>
then you can simply read the file like this:
Dataset<Row> jsonTest = spark.read().json("testJSON.json");
01-23-2018
03:02 PM
1 Kudo
Cloudera has a nice two-part tuning guide. Attaching the links:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
01-23-2018
02:28 PM
Just google "Install Python 3 in Linux".
01-23-2018
02:14 PM
Looks like you have some security/access policies set up, e.g. with Ranger. Create or modify a policy to grant the WRITE permission.
01-23-2018
02:13 PM
Apart from the JVM limits, which you can increase, there are no definite limitations on the size or number of flowfile records as such. I would say design your flow, and if you feel you're being throttled, follow some good design practices to tweak it. Take a look at this: https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
01-23-2018
01:49 PM
The attached logs don't have the error stack trace. Attach the correct nifi-app.log file; as for the port info, you can get that from Ambari.
01-23-2018
01:03 PM
As @Matt Burgess said, when you use the ReplaceText processor the way he described, your flowfile content would be changed from
[flowfile name="11" size="1KB"........ timestamp="2018-01-21 10:00:00.00000000"][end of file]
to
name="11" size="1KB"........ timestamp="2018-01-21 10:00:00.00000000"
You can then connect the success relationship from ReplaceText to ExtractText and use (.*) as the regular expression to capture the content and assign it to an attribute. You can skip that extra step if all you are trying to do is extract the content into an attribute and then use ReplaceText to write the attribute back to the flowfile content, since the ReplaceText in the approach above already leaves the desired content in the flowfile.
04-05-2017
03:36 PM
1 Kudo
I'm looking for a way to transfer flowfiles from one process group to another process group's input port using NiFi's REST API. The REST API doc is kinda vague; I assume the "Data Transfer" section is the one I should be interested in. I was going through it but couldn't get it to work. I wanted to try this endpoint:
http://localhost/nifi-api/data-transfer/input-ports/{portId}/transactions/{transactionId}/flow-files
I know I can get the portId from the NiFi UI, but what about the transactionId?
Note: I thought we might have to create a transaction ourselves using "/data-transfer/{portType}/{portId}/transactions", so I tried
http://localhost/nifi-api/data-transfer/input-ports/xxxxx-xx-xxx-xx/transactions
but it says "The requested port with id "xxxxxx" is not found in root level". My port is not at root level but inside a process group, so how should I specify it here?
Labels:
- Apache NiFi
02-07-2017
06:26 AM
I went with the LOAD DATA command because the INSERT command takes time. Especially when we are talking about thousands or tens of thousands of records coming in, it can take hours to insert them. LOAD is simple and fast since it just copies the files from HDFS into Hive's warehouse directory.
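For reference, this is roughly the kind of statement my ReplaceText step builds (the path, table, and partition column are placeholders):
LOAD DATA INPATH '/landing/avro/split-0001.avro' INTO TABLE my_table PARTITION (load_date='2017-02-07');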
02-07-2017
05:06 AM
I have started working with NiFi. I am working on a use case to load data into Hive. I get a CSV file, then I use SplitText to split the incoming flow-file into multiple flow-files (split record by record). Then I use ConvertToAvro to convert each split CSV flow-file into an Avro file. After that, I put the Avro files into a directory in HDFS and trigger the "LOAD DATA" command using the ReplaceText + PutHiveQL processors.
I'm splitting the file record by record in order to get the partition value (since LOAD DATA doesn't support dynamic partitioning). The flow looks like this:
GetFile (CSV) --- SplitText (split line count: 1, header line count: 1) --- ExtractText (use regex to get the partition fields' values and assign them to attributes) --- ConvertToAvro (specifying the schema) --- PutHDFS (writing to an HDFS location) --- ReplaceText (LOAD DATA command with partition info) --- PutHiveQL
The thing is, since I'm splitting the CSV file one record at a time, it generates too many Avro files. For example, if the CSV file has 100 records, it creates 100 Avro files. Since I want to get the partition values, I have to split them one record at a time. I want to know whether there is any way to achieve this without splitting record by record, i.e. by batching it. I'm quite new to this, so I'm unable to crack it yet. Help me with this.
PS: Do suggest an alternate approach to achieving this use case if there is one.
Labels:
- Apache Hadoop
- Apache NiFi