Member since 03-21-2017

18 Posts
2 Kudos Received
0 Solutions

06-25-2020 02:48 AM

I will check our Spark 2.4.5 application code for compatibility with Spark 2.3.2. Is Ambari & HDP going to be discontinued in the near future as part of the Cloudera and Hortonworks merger? We need to plan our choice of software accordingly.
						
					
06-25-2020 01:57 AM

Thanks for the reply. Can we install the Hadoop and Spark 2.4.5 packages on a multi-node cluster without using HDP, Ambari, or Cloudera? We already have applications running on Spark 2.4.5 and do not want to go back to older versions; we are even planning to upgrade them to Spark 3 soon because of better Delta Lake compatibility. If we install the Hadoop and Spark packages manually on each node of the cluster, can there be any maintenance issues later in production?
						
					
06-24-2020 02:31 AM

Hi, I need to set up a 5-node cluster with Hadoop 3.1.0 and Spark 2.4.5. Someone recommended using Ambari to do so. I checked Ambari, but it seems it can only be used to install HDP, and the latest HDP does not support Spark 2.4.5. Please suggest the best way to set up the required big data cluster.
						
					
Labels: Apache Ambari, Apache Hadoop, Apache Spark
	
		
		
08-13-2017 02:39 PM

Use mapPartitions if you want to add the header to every output file, or if there is only a single partition:

topPriceResultsDF
  .map(x => x.mkString(","))
  .mapPartitions(iter => Iterator(header) ++ iter)
  .saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40resultsWithHeader.csv")

Use mapPartitionsWithIndex if you want to add the header only to the first file:

topPriceResultsDF
  .map(x => x.mkString(","))
  .repartition(2)
  .mapPartitionsWithIndex({
    case (0, iter) => Iterator(header) ++ iter
    case (_, iter) => iter
  })
  .saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40resultsWithHeader.csv")

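Both snippets assume a header string is already in scope. A minimal sketch, assuming the header row should simply be the DataFrame's column names joined with commas:

// Assumed helper: build the header line from the DataFrame's column names,
// e.g. "date,open_price,high_price,low_price,close_price,volume,adj_price"
val header: String = topPriceResultsDF.columns.mkString(",")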
						
					
08-11-2017 11:32 AM

Hi All,
How can we add a header to Spark SQL query results before saving the results to a text file? The Spark version is 1.6.

val topPriceResultsDF = sqlContext.sql("SELECT * FROM retail_db.yahoo_stock_orc WHERE open_price > 40 AND high_price > 40 ORDER BY date ASC")
topPriceResultsDF.map(x => x.mkString(",")).saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40_results(comma).csv")

This saves only the data, but I also need to add a header line like (date,open_price,high_price,low_price,close_price,volume,adj_price). Please help if anyone has an idea!! I cannot use the databricks library.

The output should look like:

date,open_price,high_price,low_price,close_price,volume,adj_price
1997-07-09,40.75008,45.12504,40.75008,43.99992,37545600,1.83333

Thanks!!
						
					
Labels: Apache Spark
	
		
		
04-01-2017 01:40 AM

I got the same issue in the Hortonworks sandbox environment. The script was correct but was still throwing this error:

Unable to open iterator for alias

I found that the JobHistory server was not running by default. I could not work out the connection between the two, but after starting the history server my Pig script worked in both Tez and MapReduce mode. Try it and see if it works for you as well.

[mapred@sandbox ~]$ cd /usr/hdp/current/hadoop-mapreduce-historyserver/sbin
[mapred@sandbox sbin]$ ls
mr-jobhistory-daemon.sh
[mapred@sandbox sbin]$ mr-jobhistory-daemon.sh start historyserver
						
					
03-30-2017 05:17 AM
2 Kudos

		
	
				
		
	
		
					
Hi All, I have downloaded the millionsongsubset data from http://static.echonest.com/millionsongsubset_full.tar.gz, uploaded it to HDFS, and tried to load it and print a sample:

songs = LOAD '/user/root/datasets/millionsongsubset_full.tar.gz';
songs_limit = LIMIT songs 10;
DUMP songs_limit;

Records are displayed as below. Please suggest how to load the downloaded data in the right format.

grunt> DUMP songs_limit;
2017-03-30 05:03:24,383 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2017-03-30 05:03:24,458 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-30 05:03:24,459 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-03-30 05:03:24,474 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2017-03-30 05:03:24,479 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-30 05:03:24,507 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-30 05:03:24,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-03-30 05:03:24,524 [main] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.gz]
2017-03-30 05:03:24,609 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://sandbox.technocrafty:8020/tmp/temp-607255022/tmp-1815565156/_temporary/0/task__0001_m_000001
2017-03-30 05:03:24,646 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-30 05:03:24,655 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-30 05:03:24,656 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(MillionSongSubset/0000755000175000017500000000000011516357374014450 5ustar  thierrythierryMillionSongSubset/AdditionalFiles/0000755000175000017500000000000011516366075017501 5ustar  thierrythierryMillionSongSubset/AdditionalFiles/subset_unique_tracks.txt0000644000175000017500000317201311516365717024515 0ustar  thierrythierryTRAAAAW128F429D538<SEP>SOMZWCG12A8C13C480<SEP>Casual<SEP>I Didn't Mean To)
(TRAAABD128F429CF47<SEP>SOCIWDW12A8C13D406<SEP>The Box Tops<SEP>Soul Deep)
(TRAAADZ128F9348C2E<SEP>SOXVLOJ12AB0189215<SEP>Sonora Santanera<SEP>Amor De Cabaret)
(TRAAAEF128F4273421<SEP>SONHOTT12A8C13493C<SEP>Adam Ant<SEP>Something Girls)
(TRAAAFD128F92F423A<SEP>SOFSOCN12A8C143F5D<SEP>Gob<SEP>Face the Ashes)
(TRAAAMO128F1481E7F<SEP>SOYMRWW12A6D4FAB14<SEP>Jeff And Sheri Easter<SEP>The Moon And I (Ordinary Day Album Version))
(TRAAAMQ128F1460CD3<SEP>SOMJBYD12A6D4F8557<SEP>Rated R<SEP>Keepin It Real (Skit))
(TRAAAPK128E0786D96<SEP>SOHKNRJ12A6701D1F8<SEP>Tweeterfriendly Music<SEP>Drop of Rain)
(TRAAARJ128F9320760<SEP>SOIAZJW12AB01853F1<SEP>Planet P Project<SEP>Pink World)
(TRAAAVG12903CFA543<SEP>SOUDSGM12AC9618304<SEP>Clp<SEP>Insatiable (Instrumental Version))

Thanks!!
						
					
Labels: Apache Pig
	
		
		
03-21-2017 08:51 AM

Thanks Jay!! I got it. I do not need to make the change in Ambari; I can do it simply through the CLI, and it worked. Making the change in Ambari was not reflected in "/etc/hive/conf/hive-site.xml"; I don't know why.

hive> set hive.exec.post.hooks;
hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook, org.apache.atlas.hive.hook.HiveHook
hive> set hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook;
hive> select current_database();
OK
default
Time taken: 3.074 seconds, Fetched: 1 row(s)
						
					
03-21-2017 08:22 AM

							 hive> select current_database();
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
   at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.registerDatabase(HiveMetaStoreBridge.java:109)
   at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.registerTable(HiveMetaStoreBridge.java:270)
   at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:309)
   at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:202)
   at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:160)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
						
					
03-21-2017 08:18 AM

In Advanced Settings > General > property name "hive.exec.post.hooks", I removed the "org.apache.atlas.hive.hook.HiveHook" entry. Still the same error!
						
					