Member since 04-09-2016
27 Posts
2 Kudos Received
3 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 14286 | 08-04-2016 07:42 PM |
|  | 2879 | 07-21-2016 11:44 PM |
|  | 3924 | 07-14-2016 12:57 PM |

Posted 08-04-2016 07:42 PM
1 Kudo

This got solved by the command below:

```
sudo -u hdfs hdfs dfsadmin -safemode leave
```
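For anyone hitting the same thing: `hdfs dfsadmin -safemode get` reports the current state without changing it, and the same check can be done from spark-shell. A minimal sketch using Hadoop's client API, assuming fs.defaultFS points at HDFS so the cast holds:

```scala
// Sketch: query the NameNode's safe-mode state from spark-shell.
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.hdfs.DistributedFileSystem
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction

val dfs = FileSystem.get(sc.hadoopConfiguration).asInstanceOf[DistributedFileSystem]
// SAFEMODE_GET only reads the state; it does not change it.
println(s"NameNode in safe mode: ${dfs.setSafeMode(SafeModeAction.SAFEMODE_GET)}")
```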
Posted 08-04-2016 01:53 PM

Hi All,

My CDH 5.5 was running fine, but now when I type the spark-shell command I see the sqlContext issue below. Can anyone suggest what I should do to resolve it? spark-shell was running perfectly until this started. Please note I restarted Cloudera Manager from the admin console.

```
16/08/04 13:38:51 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
	at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
16/08/04 13:38:51 INFO SparkContext: Successfully stopped SparkContext
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /user/cloudera/.sparkStaging/application_1470339377450_0002. Name node is in safe mode.
The reported blocks 919 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 921.
The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1416)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNames

<console>:10: error: not found: value sqlContext
import sqlContext.implicits._
       ^
<console>:10: error: not found: value sqlContext
import sqlContext.sql
```
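For context: sqlContext is created by spark-shell only after SparkContext starts, and SparkContext is failing here because the NameNode is in safe mode, so the "not found: value sqlContext" errors are a symptom rather than the cause. Once safe mode is off (see the reply above) and spark-shell is restarted, a quick sanity check might look like this:

```scala
// In a fresh spark-shell, sc and sqlContext exist once startup succeeds.
sc.parallelize(1 to 10).count()   // exercises SparkContext end to end
import sqlContext.implicits._     // resolves only if sqlContext was created
```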
Labels: Apache Spark

Posted 08-03-2016 11:43 AM

Awesome, here is the working code:

```scala
import org.apache.spark.SparkContext
val data = scala.io.Source.fromURL("http://10.3.9.34:9900/messages").mkString
val list = data.split("\n").filter(_ != "")
val rdds = sc.parallelize(list)
rdds.saveAsTextFile("/user/cloudera/spark/fromsource")
```
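A small optional follow-up, assuming the same shell session: re-read the output directory to confirm the save. Note that saveAsTextFile refuses to overwrite, so the target directory must not already exist.

```scala
// Confirm the write by reading the output back and comparing counts.
val saved = sc.textFile("/user/cloudera/spark/fromsource")
assert(saved.count() == rdds.count())
saved.take(3).foreach(println)   // spot-check a few lines
```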
Posted 08-03-2016 10:24 AM

First of all, thanks Umesh. You got half of my problem solved, which I really appreciate, but the only remaining issue is that it is not saving to the HDFS location /user/cloudera/flume because of an illegal character:

```scala
scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> val data = scala.io.Source.fromURL("http://10.3.9.34:9900/messages").mkString
data: String =
"Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 kernel: imklog 4.6.2, log source = /proc/kmsg started.
Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1342" x-info="http://www.rsyslog.com"] (re)start
Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic
Aug 1 03:36:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic
Aug 2 03:16:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic
Aug 3 03:24:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic
"

scala> val list = data.split("\n").filter(_ != "")
list: Array[String] = Array(Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 kernel: imklog 4.6.2, log source = /proc/kmsg started., Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1342" x-info="http://www.rsyslog.com"] (re)start, Jul 31 03:38:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic, Aug 1 03:36:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic, Aug 2 03:16:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic, Aug 3 03:24:01 MSAT-T8360-62-RHEL64-24-103934 rhsmd: This system is registered to RHN Classic)

scala> val rdds = sc.parallelize(list)
rdds: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:26

scala> rdds.saveAsTextFile(“/user/cloudera/flume”)
<console>:1: error: illegal character '\u201c'
rdds.saveAsTextFile(“/user/cloudera/flume”)
^
<console>:1: error: illegal character '\u201d'
rdds.saveAsTextFile(“/user/cloudera/flume”)
^
scala>
```

Can you please help?
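For the record, '\u201c' and '\u201d' are the curly typographic quotes (“ ”), usually picked up by pasting code through a browser or word processor; Scala only accepts the plain ASCII double quote as a string delimiter. Retyped with straight quotes, the same call is fine:

```scala
// Same call, but with plain ASCII quotes instead of “ ” (U+201C / U+201D).
rdds.saveAsTextFile("/user/cloudera/flume")
```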
Posted 08-02-2016 11:36 PM

Hi, my requirement is to create analytics from http://10.3.9.34:9900/messages, that is, pull data from http://10.3.9.34:9900/messages, put it in the HDFS location /user/cloudera/flume, and then build an analytics report on top of HDFS using Tableau or the Hue UI. I tried the code below at the Scala console of spark-shell on CDH 5.5, but I am unable to fetch data from the HTTP link:

```scala
import org.apache.spark.SparkContext
val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")
dataRDD.collect().foreach(println)
dataRDD.count()
dataRDD.saveAsTextFile("/user/cloudera/flume")
```

I get the error below at the Scala console:

```
java.io.IOException: No FileSystem for scheme: http
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2623)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2637)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2680)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2662)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:379)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
```
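The underlying issue: sc.textFile resolves the URI through Hadoop's FileSystem API, which has implementations registered for schemes like hdfs:// and file:// but none for http, hence the IOException. The workaround that later worked in this thread (see the 08-03-2016 reply above) is to fetch over HTTP on the driver and parallelize, roughly:

```scala
// Fetch on the driver with plain Scala I/O, then hand lines to Spark;
// sc.textFile cannot read http:// since Hadoop registers no such FileSystem.
val page  = scala.io.Source.fromURL("http://10.3.9.34:9900/messages").mkString
val lines = sc.parallelize(page.split("\n").filter(_.nonEmpty))
lines.saveAsTextFile("/user/cloudera/flume")
```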
Posted 07-23-2016 01:41 PM

I typed spark-shell and got the Scala console.
Posted 07-21-2016 11:44 PM

Thanks, this got solved by this post: https://community.cloudera.com/t5/Hadoop-101-Training-Quickstart/CDH-5-5-VirtualBox-unable-to-connect-to-Spark-Master-Worker/td-p/34491
Posted 07-21-2016 09:40 PM

My intention is to pull data from a webserver into HDFS. I tried Flume, but the data was not getting pushed to HDFS, so I wrote the simple Scala program below in CDH 5.5. Please note I checked that the Spark service is up in the Cloudera Manager console. Here is what I ran at the Scala console:

```scala
scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")
<console>:14: error: not found: value sc
val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")
^

scala> dataRDD.collect().foreach(println)
<console>:15: error: not found: value dataRDD
dataRDD.collect().foreach(println)
^

scala> dataRDD.count()
<console>:15: error: not found: value dataRDD
dataRDD.count()
^

scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")
<console>:16: error: not found: value sc
val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")
```

The exact error is:

```
16/07/21 23:35:35 ERROR SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/user/spark/applicationHistory":spark:supergroup:drwxr-xr-x
```
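Worth noting: "not found: value sc" here just means SparkContext never initialized; the real failure is the AccessControlException, because user cloudera cannot write the Spark event-log directory /user/spark/applicationHistory. A sketch for inspecting that directory from the shell's Hadoop client (assuming the HDFS client config is on the classpath):

```scala
// Inspect ownership/permissions of the event-log dir; the fix is to make
// it writable by the submitting user (e.g. via hdfs dfs -chmod/-chown).
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val st = fs.getFileStatus(new Path("/user/spark/applicationHistory"))
println(s"${st.getOwner}:${st.getGroup} ${st.getPermission}")
```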
Labels: Apache Spark

Posted 07-14-2016 12:57 PM

Thank you, this got solved by the configuration and command below:

```
agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memoryChannel

# Describe/configure source1
agent1.sources.netcat-collect.type = exec
agent1.sources.netcat-collect.bind = 10.3.9.34
agent1.sources.netcat-collect.port = 22
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp

# Describe the HDFS sink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10

# Use a memory channel to buffer events
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity = 10000
# Amount of time (in millis) between checkpoints
agent1.channels.memoryChannel.checkpointInterval = 300000
# Max size (in bytes) of a single log file
agent1.channels.memoryChannel.maxFileSize = 2146435071

# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel
```

Below is the command to pull data from the weblog to HDFS:

```
flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf-file /home/cloudera/flume/conf/flume.conf
```
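A quick way to confirm ingestion from spark-shell, sketched against this exact sink configuration (the %y-%m-%d path pattern puts files in dated subdirectories, hence the glob):

```scala
// Read back whatever the Flume HDFS sink has written so far.
val events = sc.textFile("/user/cloudera/flume/*/flume-*")
println(s"events ingested: ${events.count()}")
events.take(5).foreach(println)
```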
Posted 07-12-2016 10:54 AM

Hi Team,

I am using the Cloudera VM with CDH 5.5.0. I am trying to pull weblog data using Flume from /var/log/wtmp at IP address 10.3.9.34 on port 22. Note that I did ssh root@10.3.9.34 from the command prompt of CDH 5.5 and was able to connect to this weblog IP address.

I am trying to pull the weblog from this IP address and put it into the HDFS path /user/cloudera/flume/, so I ran this flume-ng command:

```
flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf-file /home/cloudera/flume/conf/flume.conf
```

The problem is that I get a fatal error, "java.lang.NullPointerException", during import. Below are my flume.conf details:

```
agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memory
# Describe/configure source1
agent1.sources.netcat-collect.type = netcat
agent1.sources.netcat-collect.bind = 10.3.9.34
agent1.sources.netcat-collect.port = 22
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp
# Describe solrSink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10

# Use a channel which buffers events to a file
# -- The component type name, needs to be FILE.
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity = 10000

# Amount of time (in millis) between checkpoints
agent1.channels.memoryChannel.checkpointInterval 3000
# Max size (in bytes) of a single log file
agent1.channels.memoryChannel.maxFileSize = 2146435071
# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel
```

The execution log is attached in this thread: https://drive.google.com/file/d/0B7FLyvHGgEJaYnM2d3JfRXMwNEU/view?usp=sharing

Can someone help me in finding the resolution?
Labels: Apache Flume, Apache Hadoop, HDFS