Member since: 04-09-2016
Posts: 27
Kudos Received: 2
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13761 | 08-04-2016 07:42 PM
 | 2415 | 07-21-2016 11:44 PM
 | 3372 | 07-14-2016 12:57 PM
08-04-2016
07:42 PM
1 Kudo
This got solved by the command below, which takes the NameNode out of safe mode:

sudo -u hdfs hdfs dfsadmin -safemode leave
08-04-2016
01:53 PM
Hi All, my CDH 5.5 was running fine, but now when I type the spark-shell command I see the sqlContext issue below. Can anyone suggest what I should do to remove this issue? spark-shell was running perfectly until now. Please note I restarted Cloudera Manager from the admin console.

16/08/04 13:38:51 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
        at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
16/08/04 13:38:51 INFO SparkContext: Successfully stopped SparkContext
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /user/cloudera/.sparkStaging/application_1470339377450_0002. Name node is in safe mode.
The reported blocks 919 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 921. The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1416)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNames

<console>:10: error: not found: value sqlContext
import sqlContext.implicits._
       ^
<console>:10: error: not found: value sqlContext
import sqlContext.sql
Labels:
- Apache Spark
08-03-2016
11:43 AM
Awesome, here is the working code:

import org.apache.spark.SparkContext
val data = scala.io.Source.fromURL("http://10.3.9.34:9900/messages").mkString
val list = data.split("\n").filter(_ != "")
val rdds = sc.parallelize(list)
rdds.saveAsTextFile("/user/cloudera/spark/fromsource")
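Since the stated goal of this thread is an analytics report on this data, here is a minimal, hedged sketch (not from the original post) of a first aggregation on the same rdds RDD: counting log lines per calendar day. It assumes each line starts with a "<Month> <day> <time>" prefix, as in the /var/log/messages sample quoted in the post below; the output path is hypothetical.

// Sketch only: count syslog lines per calendar day from the RDD built above.
val perDay = rdds
  .map(line => line.split("\\s+").take(2).mkString(" "))   // e.g. "Jul 31"
  .map(day => (day, 1))
  .reduceByKey(_ + _)                                       // lines per day

perDay.collect().foreach(println)
// Could also be written back to HDFS for Hue/Tableau, e.g. (hypothetical path):
// perDay.map { case (d, n) => s"$d,$n" }.saveAsTextFile("/user/cloudera/spark/daily_counts")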
08-03-2016
10:24 AM
First of all, thanks Umesh, you got half of my problem solved, and I really appreciate that. The only issue now is that it is not saving to the HDFS location /user/cloudera/flume because of an illegal character:

scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> val data = scala.io.Source.fromURL("http://10.3.9.34:9900/messages").mkString
data: String = "Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 kernel: imklog 4.6.2, log source = /proc/kmsg started. Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1342" x-info="http://www.rsyslog.com"] (re)start Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic Aug 1 03:36:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic Aug 2 03:16:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic Aug 3 03:24:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic "

scala> val list = data.split("\n").filter(_ != "")
list: Array[String] = Array(Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 kernel: imklog 4.6.2, log source = /proc/kmsg started., Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1342" x-info="http://www.rsyslog.com"] (re)start, Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic, Aug 1 03:36:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic, Aug 2 03:16:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic, Aug 3 03:24:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic)

scala> val rdds = sc.parallelize(list)
rdds: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:26

scala> rdds.saveAsTextFile(“/user/cloudera/flume”)
<console>:1: error: illegal character '\u201c'
rdds.saveAsTextFile(“/user/cloudera/flume”)
^
<console>:1: error: illegal character '\u201d'
rdds.saveAsTextFile(“/user/cloudera/flume”)
^
scala>

Can you please help?
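A likely cause, going by the compiler error quoted above rather than anything stated elsewhere in the thread: the path is wrapped in curly "smart" quotes (\u201c and \u201d), which the Scala compiler rejects as illegal characters. Re-typed with plain ASCII double quotes, the same call should compile:

scala> rdds.saveAsTextFile("/user/cloudera/flume")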
08-02-2016
11:36 PM
Hi all, my requirement is to create analytics from http://10.3.9.34:9900/messages, i.e. pull data from http://10.3.9.34:9900/messages, put it into the HDFS location /user/cloudera/flume, and from HDFS create an analytics report using Tableau or the Hue UI. I tried the code below at the Scala console of spark-shell on CDH 5.5, but I am unable to fetch data from the HTTP link:

import org.apache.spark.SparkContext
val dataRDD = sc.textFile(“http://10.3.9.34:9900/messages”)
dataRDD.collect().foreach(println)
dataRDD.count()
dataRDD.saveAsTextFile(“/user/cloudera/flume”)

I get the error below at the Scala console:

java.io.IOException: No FileSystem for scheme: http
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2623)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2637)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2680)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2662)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:379)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
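For context: sc.textFile only understands filesystems Hadoop knows about (for example hdfs:// or file://), which is why an http:// URL fails with "No FileSystem for scheme: http". Below is a minimal sketch of the workaround this thread eventually settled on (see the 08-03-2016 posts above): fetch the page with an HTTP client on the driver, then parallelize the lines into an RDD before saving to HDFS. The output path here is hypothetical.

import org.apache.spark.SparkContext
// Fetch over HTTP on the driver, then distribute the lines as an RDD.
val data = scala.io.Source.fromURL("http://10.3.9.34:9900/messages").mkString
val lines = data.split("\n").filter(_.nonEmpty)
sc.parallelize(lines).saveAsTextFile("/user/cloudera/spark/from_http")  // hypothetical output path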
07-23-2016
01:41 PM
I typed spark-shell and got the Scala console.
07-21-2016
11:44 PM
Thanks, this got solved by this post: https://community.cloudera.com/t5/Hadoop-101-Training-Quickstart/CDH-5-5-VirtualBox-unable-to-connect-to-Spark-Master-Worker/td-p/34491
07-21-2016
09:40 PM
My intention is to pull data from a webserver to HDFS. I tried with Flume, but the data is not getting pushed to HDFS, hence I wrote the simple Scala program below in CDH 5.5. Please note I checked that the Spark service is up in the Cloudera Manager console. Here is what I tried running at the Scala console:

scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")
<console>:14: error: not found: value sc
       val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")
       ^

scala> dataRDD.collect().foreach(println)
<console>:15: error: not found: value dataRDD
       dataRDD.collect().foreach(println)
       ^

scala> dataRDD.count()
<console>:15: error: not found: value dataRDD
       dataRDD.count()
       ^

scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")
<console>:16: error: not found: value sc
       val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")

The exact error is:

16/07/21 23:35:35 ERROR SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/user/spark/applicationHistory":spark:supergroup:drwxr-xr-x
Labels:
- Apache Spark
07-14-2016
12:57 PM
Thank you, this got solved by the configuration and command below.

agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memoryChannel

# Describe/configure source1
agent1.sources.netcat-collect.type = exec
agent1.sources.netcat-collect.bind = 10.3.9.34
agent1.sources.netcat-collect.port = 22
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp

# Describe solrSink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10

# Use a channel which buffers events to a file
# -- The component type name, needs to be FILE.
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity = 10000
# Amount of time (in millis) between checkpoints
agent1.channels.memoryChannel.checkpointInterval 300000
# Max size (in bytes) of a single log file
agent1.channels.memoryChannel.maxFileSize = 2146435071

# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel

Below is the command to pull data from the weblog into HDFS:

flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf-file /home/cloudera/flume/conf/flume.conf
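As a follow-up sketch (not from the original post): once the agent has rolled files into the dated directories under /user/cloudera/flume, they can be read back from spark-shell with a glob for the analytics step, assuming the %y-%m-%d directory layout configured above.

// Hedged sketch: read everything Flume has written under the dated subdirectories.
val flumeData = sc.textFile("/user/cloudera/flume/*/*")
flumeData.take(10).foreach(println)   // sanity-check a few events
println(flumeData.count())            // total number of events landed so far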
07-12-2016
10:54 AM
Hi Team, I am using the Cloudera VM with CDH 5.5.0. I am trying to pull weblog data using Flume from /var/log/wtmp at IP address 10.3.9.34 on port 22. Let me mention that I did ssh root@10.3.9.34 from the command prompt of CDH 5.5 and was able to connect to this weblog IP address. I am trying to pull the weblog from this IP address and put it into the HDFS path /user/cloudera/flume/, so I ran the flume-ng command below:

flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf-file /home/cloudera/flume/conf/flume.conf

The problem is I am getting a fatal error "java.lang.NullPointerException" during the import. Below are my flume.conf details:

agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memory

# Describe/configure source1
agent1.sources.netcat-collect.type = netcat
agent1.sources.netcat-collect.bind = 10.3.9.34
agent1.sources.netcat-collect.port = 22
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp

# Describe solrSink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10

# Use a channel which buffers events to a file
# -- The component type name, needs to be FILE.
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity = 10000
# Amount of time (in millis) between checkpoints
agent1.channels.memoryChannel.checkpointInterval 3000
# Max size (in bytes) of a single log file
agent1.channels.memoryChannel.maxFileSize = 2146435071

# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel

The execution log is attached to this thread: https://drive.google.com/file/d/0B7FLyvHGgEJaYnM2d3JfRXMwNEU/view?usp=sharing

Can someone help me find the resolution?
Labels:
- Apache Flume
- Apache Hadoop
- HDFS