Member since: 12-09-2015
43 Posts
18 Kudos Received
1 Solution
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 14375 | 12-17-2015 07:27 AM |
01-09-2018 06:42 AM

Then how do I solve that issue and process the file? I also tried json_file = sqlContext.read.json('/user/admin/emp/empData.json'), but it does not work; the same issue occurs.
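sqlContext.read.json expects one JSON object per line, so a single JSON document that spans multiple lines will not parse this way. A minimal PySpark sketch of one common workaround, assuming Spark 2.2 or later (where the multiLine option exists); only the HDFS path comes from the post above:

```python
# Sketch only: assumes Spark 2.2+ (where the multiLine option was added) and
# that empData.json holds JSON documents spanning multiple lines.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-multiline-json").getOrCreate()

# The default reader expects line-delimited JSON (one object per line);
# multiLine lets Spark parse documents that span several lines instead.
df = spark.read.option("multiLine", "true").json("/user/admin/emp/empData.json")
df.printSchema()
df.show(5)
```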
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
01-08-2018 10:14 AM
$ pyspark
>>> json_file = sqlContext.read.json(sc.wholeTextFiles('/user/admin/emp/*').values())
18/01/08 15:34:36 ERROR Utils: Uncaught exception in thread stdout writer for python2.7
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.spark_project.guava.io.ByteStreams.copy(ByteStreams.java:211)
at org.spark_project.guava.io.ByteStreams.toByteArray(ByteStreams.java:252)
at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:79)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:65)
at org.apache.spark.rdd.NewHadoopRDD$anon$1.hasNext(NewHadoopRDD.scala:182)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Exception in thread "stdout writer for python2.7" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.spark_project.guava.io.ByteStreams.copy(ByteStreams.java:211)
at org.spark_project.guava.io.ByteStreams.toByteArray(ByteStreams.java:252)
at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:79)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:65)
at org.apache.spark.rdd.NewHadoopRDD$anon$1.hasNext(NewHadoopRDD.scala:182)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269) 
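The OutOfMemoryError comes from WholeTextFileRecordReader buffering each entire input file as a single in-memory record. A minimal sketch of the usual alternative, assuming the files under /user/admin/emp/ are line-delimited JSON (one object per line) and a Spark 2.x SparkSession; both assumptions go beyond what the post states:

```python
# Sketch only: assumes the files under /user/admin/emp/ are line-delimited
# JSON (one JSON object per line) and a Spark 2.x SparkSession.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-json-directory").getOrCreate()

# sc.wholeTextFiles() materialises each entire file as a single record on the
# JVM side, which is what triggers the Java heap OutOfMemoryError above.
# Pointing the JSON reader at the path lets Spark split the input into
# ordinary line-sized records instead.
df = spark.read.json("/user/admin/emp/*")
df.printSchema()
df.show(5)
```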
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Spark
    
	
		
		
11-09-2016 06:11 AM

I already ran import org.apache.spark.sql.hive.orc._ and import org.apache.spark.sql._, but I still have the same issue. I am using HDP 2.3.
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
11-08-2016 07:29 AM

Hi @Matthieu Lamairesse

Error:

scala> df.write.format("orc").saveAsTable("default.sample_07_new_schema")
<console>:33: error: value write is not a member of org.apache.spark.sql.DataFrame
       df.write.format("orc").saveAsTable("default.sample_07_new_schema")
          ^
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
11-04-2016 02:12 PM

Can Oozie be installed and run without Hadoop? The Oozie material I have read says Hadoop is required. Let's say I have 2 plain Java applications. I want to chain these 2 applications in an Oozie workflow and produce the final JSON output from the 2nd Java application. I don't want to rewrite these 2 applications as MapReduce programs; they should stay plain Java code. Please suggest: how can I run Oozie without Hadoop? Is it possible?
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hadoop
- Apache Kafka
- Apache Oozie
    
	
		
		
11-02-2016 01:29 PM

Hive table (original):
Database name: Student
Table name: Student_detail

| id | name | dept |
|---|---|---|
| 1 | siva | cse |

Needed output:
Database name: CSE
Table name: New_tudent_detail

| s_id | s_name | s_dept |
|---|---|---|
| 1 | siva | cse |

I want to migrate the Student_detail Hive table into New_tudent_detail without data loss, using Spark:
- Different column names
- Different database
- Different table
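A minimal PySpark sketch of one way such a migration can be done, assuming a Spark 2.x session with Hive support and that the target database CSE already exists; the database, table, and column names come from the post, and the rename-and-save approach is only one option:

```python
# Sketch only: assumes Hive support is enabled and that the target database
# CSE already exists (e.g. created beforehand with CREATE DATABASE CSE).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("migrate-student-detail")
         .enableHiveSupport()
         .getOrCreate())

# Read the original table and rename the columns to the new schema.
src = spark.table("Student.Student_detail")
renamed = (src.withColumnRenamed("id", "s_id")
              .withColumnRenamed("name", "s_name")
              .withColumnRenamed("dept", "s_dept"))

# Write every row into the new table in the other database.
renamed.write.mode("overwrite").saveAsTable("CSE.New_tudent_detail")
```

On a Spark 1.x cluster the same flow would go through a HiveContext rather than a SparkSession.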
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hive
- Apache Spark
    
	
		
		
09-22-2016 03:19 PM

Hi @Mats Johansson

I have a cluster with 1 name node and 3 data nodes. One data node failed, so I removed it from the cluster and added a new data node. After adding the new node I got:

WARNING : There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks

So I removed the corrupted files in the cluster. After running hdfs fsck / it now reports "The filesystem under path '/' is HEALTHY", which is good, but it also shows: Under-replicated blocks: 1572982 (95.59069 %). The problem now is that Hadoop re-replicates files from one data node to another at only about 6 per second. I executed hadoop dfs -setrep -R -w 3 /, and it estimates replication will take 24 days; I cannot wait 24 days. I want to speed up replication and balance it across the data nodes. I have dfs.namenode.replication.work.multiplier.per.iteration set to 2, and I do not have the properties dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. I am using Hadoop 1.x. What is the best way to balance my cluster?
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
09-22-2016 05:11 AM

I executed the command hadoop dfs -setrep -R -w 3 / and it works fine, but I have 5,114,551 under-replicated blocks and it will take 24 days. How can I solve that problem faster?
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hadoop
    
	
		
		
03-22-2016 10:56 AM

I already did that step:

hive> add "somepath/mongo-hadoop-hive.jar"
hive> add "somepath/mongo-hadoop-core.jar"
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
03-22-2016 07:28 AM
2 Kudos

Jars: mongo-hadoop-core-1.4.0, mongo-hadoop-hive-1.4.0, mongo-java-driver-2.10.1

hive> CREATE EXTERNAL TABLE minute_bars
    > (
    >     id STRING,
    >     Symbol STRING,
    >     `Timestamp` STRING,
    >     Day INT,
    >     Open DOUBLE,
    >     High DOUBLE,
    >     Low DOUBLE,
    >     Close DOUBLE,
    >     Volume INT
    > )
    > STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
    > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id",
    >  "Symbol":"Symbol", "Timestamp":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
    > TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minbars');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/hadoop/io/BSONWritable
hive> 
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hive
         
					
				













