Posts: 1973
Kudos Received: 1225
Solutions: 124
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1999 | 04-03-2024 06:39 AM |
| | 3171 | 01-12-2024 08:19 AM |
| | 1725 | 12-07-2023 01:49 PM |
| | 2505 | 08-02-2023 07:30 AM |
| | 3517 | 03-29-2023 01:22 PM |
10-28-2016
01:28 PM
HDFS client, Pig, Sqoop, Spark, Tez, and Falcon are all useful clients to have there, as well as JDK 8 and the Hadoop configuration files. Java 8 and the Hadoop configuration files are useful for NiFi servers to have as well.
... View more
10-27-2016
06:44 PM
1 Kudo
Run it on a cluster so you have more RAM. Running on one machine won't support that data size.
16/10/25 18:30:41 INFO BlockManager: Reporting 4 blocks to the master.
Exception in thread "qtp1394524874-84" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap$KeySet.iterator(HashMap.java:912)
at java.util.HashSet.iterator(HashSet.java:172)
at sun.nio.ch.Util$2.iterator(Util.java:243)
at org.spark-project.jetty.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:600)
at org.spark-project.jetty.io.nio.SelectorManager$1.run(SelectorManager.java:290)
at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
... View more
10-27-2016
06:43 PM
1 Kudo
How big is your cluster? It sounds like you may need more RAM; that's a big join when the datasets come together. What does the History UI show? Try 895 executors and 32 GB of RAM. How many nodes do you have in the cluster? How big are these files in gigabytes? How much RAM is available on the cluster? Do NOT run this from the shell. Run it as compiled code and submit it to yarn-cluster; the shell is not designed for large jobs and is more for developing and testing parts of your code. Can you upgrade to Spark 1.6.2? Newer Spark is faster and more efficient. Here are some settings to use: https://community.hortonworks.com/articles/34209/spark-16-tips-in-code-and-submission.html
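As a rough sketch only (the class name and jar below are placeholders, and the executor count, cores, and memory should be tuned to what your cluster actually has), submitting the compiled job to YARN rather than running it in the shell looks something like:
# Sketch: placeholder class and jar; tune --num-executors / memory to your cluster
spark-submit \
  --master yarn-cluster \
  --class com.example.BigJoinJob \
  --num-executors 895 \
  --executor-memory 32G \
  --executor-cores 4 \
  --driver-memory 8G \
  big-join-job.jar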
... View more
10-27-2016
06:41 PM
You install HDF 2.0 on one cluster and HDP 2.5 on another. For a sandbox, download NiFi from nifi.apache.org/downloads and you can run it as a standalone Java application on the HDP sandbox. Just set the port to 8090.
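A minimal sketch of that port change, assuming the default standalone install layout:
# In conf/nifi.properties of the standalone NiFi install, then restart NiFi:
nifi.web.http.port=8090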
... View more
10-27-2016
05:21 PM
It's Avro format.
... View more
10-27-2016
05:20 PM
1 Kudo
Your output is Avro. I looked at your ZIP and that's an Avro file. Flume outputs Avro from Twitter: https://www.tutorialspoint.com/apache_flume/fetching_twitter_data.htm You can also ingest Twitter to HDFS via Apache NiFi: http://hortonworks.com/blog/hdf-2-0-flow-processing-real-time-tweets-strata-hadoop-slack-tensorflow-phoenix-zeppelin/
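If you want to look at that data from Spark, a minimal sketch (assuming the com.databricks:spark-avro package is on the classpath; the HDFS path is a placeholder for wherever Flume landed the tweets) would be:
import com.databricks.spark.avro._
// Sketch only: placeholder path for the Flume output directory
val tweets = sqlContext.read.avro("hdfs:///user/flume/twitter_data/")
tweets.printSchema()
tweets.show(5)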
... View more
10-27-2016
04:50 PM
2 Kudos
Okay, silly mistake:
myfilemap.foreachRDD(rdd => if (!rdd.isEmpty()) {
  rdd.collect().foreach(println)
})
... View more
10-27-2016
04:30 PM
Spark Scala Code

// Batch
val file = sc.textFile("hdfs://isi.xyz.com:8020/user/test/AsRun.txt")
val testdataframe = file.map(x => x.split("\\|"))
testdataframe.take(5)

// Streaming Code
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import StreamingContext._
import org.apache.hadoop.conf._
import org.apache.hadoop.fs._

object RatingsMatch {
  def main(args: Array[String]) {
    // set app name
    val sparkConf = new SparkConf().setAppName("RatingsMatch")
    val conf = new SparkContext(sparkConf)
    val ssc = new StreamingContext(conf, Seconds(240))
    val file = ssc.textFileStream(args(0))
    //file.foreachRDD(rdd => rdd.map(x => x.split("\\|")).foreach(println))
    //val myfilemap = file.map(x => x.split(","))
    //myfilemap.print()
    val myfilemap = file.transform(rdd => {rdd.map(x => x.split("\\|"))})
    myfilemap.print()
    // As Run Schema
    //myfilemap.foreachRDD{rdd =>
    //  rdd.foreach.toArray(println)
    //}
    ssc.start()
    ssc.awaitTermination()
  }
}
I am trying to set up a Spark Streaming job. I've been able to get the cookie-cutter sample word count to run. Now I am trying with our data. I can split and map the text file from Zeppelin or in the CLI using the batch engine. However, when I do the same for the DStream, I get the output pasted below the code. Any thoughts? I've tried a handful of approaches with Streaming using dstream.map, foreachRDD, and dstream.transform. I thought it may have been the regular expression used to parse, so I tried changing it to a ",". However, I still get the same results.
[Ljava.lang.String;@c080470
[Ljava.lang.String;@1d6b8b9
[Ljava.lang.String;@2876a606
[Ljava.lang.String;@7fe36aa3
[Ljava.lang.String;@3304daab
[Ljava.lang.String;@723bf02
[Ljava.lang.String;@1af86f76
[Ljava.lang.String;@7eaab8f8
[Ljava.lang.String;@6b6ee404
[Ljava.lang.String;@71af9dc4
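For reference, the [Ljava.lang.String;@... lines are the default Array.toString of the split results, so the split itself is working. A hypothetical sketch of printing the fields readably from the myfilemap stream above:
// Sketch only: render each Array[String] as a readable line instead of its default toString
val readable = myfilemap.map(fields => fields.mkString(" | "))
readable.print()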
... View more
Labels:
- Apache Spark
10-27-2016
01:15 PM
Does the user running NiFi have permissions to those files?
From the source code:
// If Kerberos Service Principal and keytab location not configured, throws exception
if (!properties.isKerberosSpnegoSupportEnabled() || kerberosService == null) {
    throw new IllegalStateException("Kerberos ticket login not supported by this NiFi.");
}
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/api/AccessResource.java
See: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#kerberos_login_identity_provider
Initial Admin Identity (New NiFi Instance): If you are setting up a secured NiFi instance for the first time, you must manually designate an "Initial Admin Identity" in the authorizers.xml file. This initial admin user is granted access to the UI and given the ability to create additional users, groups, and policies. The value of this property could be a DN (when using certificates or LDAP) or a Kerberos principal. If you are the NiFi administrator, add yourself as the "Initial Admin Identity".
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#authorizers-setup
Do you have Kerberos?
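For illustration only (the DN below is a placeholder and the exact layout depends on your NiFi version, so check the authorizers-setup link above), the file-based authorizer entry in conf/authorizers.xml looks roughly like:
<!-- Sketch only: replace the Initial Admin Identity value with your own DN or Kerberos principal -->
<authorizer>
    <identifier>file-provider</identifier>
    <class>org.apache.nifi.authorization.FileAuthorizer</class>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial Admin Identity">CN=nifiadmin, OU=NIFI</property>
    <property name="Legacy Authorized Users File"></property>
</authorizer>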
... View more
10-27-2016
01:09 PM
What version of NiFi are you running? Can you update to a newer one? I recommend you use http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/index.html What user are you running as?
... View more