Posts: 1973
Kudos Received: 1225
Solutions: 124
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1999 | 04-03-2024 06:39 AM |
| | 3171 | 01-12-2024 08:19 AM |
| | 1725 | 12-07-2023 01:49 PM |
| | 2505 | 08-02-2023 07:30 AM |
| | 3517 | 03-29-2023 01:22 PM |
10-28-2016
01:28 PM
HDFS client, Pig, Sqoop, Spark, Tez, and Falcon are all useful clients to have there, as well as JDK 8 and the Hadoop configuration files. Java 8 and the Hadoop configuration files are useful for NiFi servers to have as well.
... View more
10-27-2016
06:44 PM
1 Kudo
Run it on a cluster so you have more RAM. Running on one machine won't support that data size.
16/10/25 18:30:41 INFO BlockManager: Reporting 4 blocks to the master.
Exception in thread "qtp1394524874-84" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap$KeySet.iterator(HashMap.java:912)
at java.util.HashSet.iterator(HashSet.java:172)
at sun.nio.ch.Util$2.iterator(Util.java:243)
at org.spark-project.jetty.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:600)
at org.spark-project.jetty.io.nio.SelectorManager$1.run(SelectorManager.java:290)
at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
... View more
10-27-2016
06:43 PM
1 Kudo
How big is your cluster? It sounds like you may need more RAM; that's a big join when the datasets come together. What does the History UI show? Try 895 executors and 32 GB of RAM. How many nodes do you have in the cluster? How big are these files in gigabytes? How much RAM is available on the cluster? Do NOT run this from the shell. Run it as compiled code and submit it to yarn-cluster; the shell is not designed for large jobs and is more for developing and testing parts of your code. Can you upgrade to Spark 1.6.2? Newer Spark is faster and more efficient. Here are some settings to use: https://community.hortonworks.com/articles/34209/spark-16-tips-in-code-and-submission.html
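As a rough sketch only (the class name and jar below are placeholders, and the executor count, cores, and memory should be tuned to what your cluster actually has), submitting the compiled job to YARN rather than running it in the shell looks something like:
# Sketch: placeholder class and jar; tune --num-executors / memory to your cluster
spark-submit \
  --master yarn-cluster \
  --class com.example.BigJoinJob \
  --num-executors 895 \
  --executor-memory 32G \
  --executor-cores 4 \
  --driver-memory 8G \
  big-join-job.jar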
... View more
10-27-2016
06:41 PM
You install HDF 2.0 on one cluster and HDP 2.5 on another. For a sandbox, download NiFi from nifi.apache.org/downloads and you can run it as a standalone Java application on the HDP sandbox. Just set the port to 8090.
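A minimal sketch of that port change, assuming the default standalone install layout:
# In conf/nifi.properties of the standalone NiFi install, then restart NiFi:
nifi.web.http.port=8090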
... View more
10-27-2016
05:21 PM
It's Avro format.
... View more
10-27-2016
05:20 PM
1 Kudo
Your output is Avro. I looked at your ZIP and that's an Avro file. Flume outputs Avro from Twitter: https://www.tutorialspoint.com/apache_flume/fetching_twitter_data.htm You can also ingest Twitter to HDFS via Apache NiFi: http://hortonworks.com/blog/hdf-2-0-flow-processing-real-time-tweets-strata-hadoop-slack-tensorflow-phoenix-zeppelin/
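If you want to look at that data from Spark, a minimal sketch (assuming the com.databricks:spark-avro package is on the classpath; the HDFS path is a placeholder for wherever Flume landed the tweets) would be:
import com.databricks.spark.avro._
// Sketch only: placeholder path for the Flume output directory
val tweets = sqlContext.read.avro("hdfs:///user/flume/twitter_data/")
tweets.printSchema()
tweets.show(5)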
... View more
10-27-2016
04:50 PM
2 Kudos
Okay, silly mistake:
myfilemap.foreachRDD(rdd => if (!rdd.isEmpty()) {
  rdd.collect().foreach(println)
})
... View more
10-27-2016
04:30 PM
Spark Scala Code

// Batch
val file = sc.textFile("hdfs://isi.xyz.com:8020/user/test/AsRun.txt")
val testdataframe = file.map(x => x.split("\\|"))
testdataframe.take(5)

// Streaming Code
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import StreamingContext._
import org.apache.hadoop.conf._
import org.apache.hadoop.fs._

object RatingsMatch {
  def main(args: Array[String]) {
    // set app name
    val sparkConf = new SparkConf().setAppName("RatingsMatch")
    val conf = new SparkContext(sparkConf)
    val ssc = new StreamingContext(conf, Seconds(240))
    val file = ssc.textFileStream(args(0))
    //file.foreachRDD(rdd => rdd.map(x => x.split("\\|")).foreach(println))
    //val myfilemap = file.map(x => x.split(","))
    //myfilemap.print()
    val myfilemap = file.transform(rdd => {rdd.map(x => x.split("\\|"))})
    myfilemap.print()
    // As Run Schema
    //myfilemap.foreachRDD{rdd =>
    //  rdd.foreach.toArray(println)
    //}
    ssc.start()
    ssc.awaitTermination()
  }
}
I am trying to set up a Spark Streaming job. I've been able to get the cookie-cutter sample word count to run. Now I am trying with our data. I can split and map the text file from Zeppelin or in the CLI using the batch engine. However, when I do the same for the DStream, I get the output pasted below the code. Any thoughts? I've tried a handful of approaches with Streaming using dstream.map, foreachRDD, and dstream.transform. I thought it may have been the regular expression used to parse, so I tried changing it to a ",". However, I still get the same results.
[Ljava.lang.String;@c080470
[Ljava.lang.String;@1d6b8b9
[Ljava.lang.String;@2876a606
[Ljava.lang.String;@7fe36aa3
[Ljava.lang.String;@3304daab
[Ljava.lang.String;@723bf02
[Ljava.lang.String;@1af86f76
[Ljava.lang.String;@7eaab8f8
[Ljava.lang.String;@6b6ee404
[Ljava.lang.String;@71af9dc4
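For reference, the [Ljava.lang.String;@... lines are the default Array.toString of the split results, so the split itself is working. A hypothetical sketch of printing the fields readably from the myfilemap stream above:
// Sketch only: render each Array[String] as a readable line instead of its default toString
val readable = myfilemap.map(fields => fields.mkString(" | "))
readable.print()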
... View more
Labels:
- Apache Spark
10-27-2016
01:15 PM
Does the user running NiFi have permissions to those files?
From the source code:
// If Kerberos Service Principal and keytab location not configured, throws exception
if (!properties.isKerberosSpnegoSupportEnabled() || kerberosService == null) {
    throw new IllegalStateException("Kerberos ticket login not supported by this NiFi.");
}
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/api/AccessResource.java
See: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#kerberos_login_identity_provider
Initial Admin Identity (New NiFi Instance): If you are setting up a secured NiFi instance for the first time, you must manually designate an "Initial Admin Identity" in the authorizers.xml file. This initial admin user is granted access to the UI and given the ability to create additional users, groups, and policies. The value of this property could be a DN (when using certificates or LDAP) or a Kerberos principal. If you are the NiFi administrator, add yourself as the "Initial Admin Identity".
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#authorizers-setup
Do you have Kerberos?
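For illustration only (the DN below is a placeholder and the exact layout depends on your NiFi version, so check the authorizers-setup link above), the file-based authorizer entry in conf/authorizers.xml looks roughly like:
<!-- Sketch only: replace the Initial Admin Identity value with your own DN or Kerberos principal -->
<authorizer>
    <identifier>file-provider</identifier>
    <class>org.apache.nifi.authorization.FileAuthorizer</class>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial Admin Identity">CN=nifiadmin, OU=NIFI</property>
    <property name="Legacy Authorized Users File"></property>
</authorizer>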
... View more
10-27-2016
01:09 PM
What version of NiFi are you running? Can you update to a newer one? I recommend you use http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/index.html What user are you running as?
... View more