Member since 08-16-2016

642 Posts
131 Kudos Received
68 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3976 | 10-13-2017 09:42 PM |
|  | 7471 | 09-14-2017 11:15 AM |
|  | 3796 | 09-13-2017 10:35 PM |
|  | 6031 | 09-13-2017 10:25 PM |
|  | 6598 | 09-13-2017 10:05 PM |

02-08-2017 08:42 AM

Just to be clear, you want the output on the MR job launch and progress, right? Like this...

```
INFO : Compiling command(queryId=hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb): select distinct wafernum_part('fab.op2451',wafernum) from fab.op2451 where storeday in ("2017-01-05")
INFO : converting to local hdfs:/lib/business-dedupe-2.1.0.jar
INFO : Added [/tmp/2e04052d-c322-4047-a4d5-c52d67ddc46c_resources/business-dedupe-2.1.0.jar] to class path
INFO : Added resources: [hdfs:/lib/business-dedupe-2.1.0.jar]
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb); Time taken: 0.253 seconds
INFO : Executing command(queryId=hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb): select distinct wafernum_part('fab.op2451',wafernum) from fab.op2451 where storeday in ("2017-01-05")
INFO : Query ID = hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:8
INFO : Submitting tokens for job: job_1486193162125_5338
INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for mbige7303763: HDFS_DELEGATION_TOKEN owner=user, renewer=yarn, realUser=hive/hive_princ, issueDate=1486571884281, maxDate=1487176684281, sequenceNumber=67028, masterKeyId=147)
INFO : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 0c 6d 62 69 67 65 37 33 30 33 37 36 33 0c 6d 62 69 67 65 37 33 30 33 37 36 33 2e 68 69 76 65 2f 61 62 6f 2d 6c 70 33 2d 65 78 74 65 64 30 31 2e 77 64 63 2e 63 6f 6d 40 48 49 54 41 43 48 49 47 53 54 2e 47 4c 4f 42 41 4c 8a 01 5a 1e 96 8f 7d 8a 01 5a 42 a3 13 7d 8e 0e b4 30
INFO : The url to track the job: https://RM_host:8090/proxy/application_1486193162125_5338/
INFO : Starting Job = job_1486193162125_5338, Tracking URL = https://RM_host:8090/proxy/application_1486193162125_5338/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/bin/hadoop job -kill job_1486193162125_5338
INFO : Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 1
INFO : 2017-02-08 16:38:11,038 Stage-1 map = 0%, reduce = 0%
INFO : 2017-02-08 16:38:22,266 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 12.78 sec
INFO : 2017-02-08 16:38:24,308 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 116.9 sec
INFO : 2017-02-08 16:38:51,878 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 329.89 sec
INFO : 2017-02-08 16:38:52,896 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 330.75 sec
INFO : 2017-02-08 16:38:54,933 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 346.65 sec
INFO : 2017-02-08 16:38:55,952 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 348.19 sec
INFO : 2017-02-08 16:38:57,988 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 358.66 sec
INFO : 2017-02-08 16:38:59,009 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 360.07 sec
INFO : 2017-02-08 16:39:01,048 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 363.5 sec
INFO : 2017-02-08 16:39:07,176 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 367.49 sec
INFO : MapReduce Total cumulative CPU time: 6 minutes 7 seconds 490 msec
INFO : Ended Job = job_1486193162125_5338
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 8 Reduce: 1 Cumulative CPU: 367.49 sec HDFS Read: 942254005 HDFS Write: 64 SUCCESS
INFO : Total MapReduce CPU Time Spent: 6 minutes 7 seconds 490 msec
INFO : Completed executing command(queryId=hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb); Time taken: 64.188 seconds
INFO : OK
```

What are your logging levels in Hive, specifically for HS2? Check the HiveServer2 Logging Threshold setting in CM.
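If you aren't seeing that output in your client session, the per-operation log level may be turned down. As a sketch (the hive.server2.logging.operation.level property assumes a Hive release that supports it, roughly Hive 1.2+; older releases expose a boolean hive.server2.logging.operation.verbose instead), you can raise it for a session:

```
-- Assumes operation logging is enabled on HS2 (hive.server2.logging.operation.enabled=true).
-- VERBOSE streams the full MR launch/progress lines back to the client.
set hive.server2.logging.operation.level=VERBOSE;
```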
						
					
02-07-2017 08:59 PM

Try 'sudo yum list installed | grep <package>'. That will tell you whether each package is already installed and which repository it came from. Let me know if you can't find one or more of them.
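For example, to check several of them in one pass (the package names below are only illustrative):

```
# Swap in whichever dependencies you are chasing.
sudo yum list installed | grep -E 'httpd|openssl|fuse'
```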
						
					
02-07-2017 08:56 PM

Doesn't it bind to IP address 0.0.0.0 by default? Or maybe it is bound to 0.0.0.0 in the configs; I may be mistaken on that. You can try binding it to the correct IP or hostname. In the same safety valve, Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini, add http_host under the desktop section:

```
[desktop]
http_host=hue-host.example.com
```
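If you instead want Hue listening on every interface, a sketch of the same safety valve entry (0.0.0.0 and port 8888 are just the common defaults, not values taken from your cluster):

```
[desktop]
# Bind to all interfaces; 8888 is Hue's usual default port, adjust as needed.
http_host=0.0.0.0
http_port=8888
```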
						
					
02-07-2017 11:53 AM

There are a number of settings for JobHistory that could be causing it to expire and remove the logs. The first is the number of milliseconds that it will keep a job and its logs around (log removal only applies if log aggregation is in use); the default is 7 days. The second is the number of jobs, and the default is 20,000. That sounds like a lot, but I have seen large, active clusters burn through that in 2-3 days.

mapreduce.jobhistory.max-age-ms
mapreduce.jobhistory.joblist.cache.size
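As a sketch of what raising both settings could look like in mapred-site.xml (the 14-day and 50,000 values are only illustrative; size the retention to your cluster's job volume):

```
<!-- Keep finished jobs and their aggregated logs for 14 days instead of the 7-day default. -->
<property>
  <name>mapreduce.jobhistory.max-age-ms</name>
  <value>1209600000</value>
</property>

<!-- Raise the retained job count above the 20,000 default. -->
<property>
  <name>mapreduce.jobhistory.joblist.cache.size</name>
  <value>50000</value>
</property>
```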
						
					
02-07-2017 11:48 AM

Fuse should be part of the CDH repo, but httpd, openssl, etc. should come from the OS or possibly the EPEL repos. What OS are you using? You will need to manage these dependencies manually or set up the CM repo on all nodes and use your package manager. Just checking, but is there a reason you are not pushing it through CM?
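A rough sketch on a yum-based OS, assuming the CDH repo is already configured on the node (hadoop-hdfs-fuse is the usual CDH package name for the FUSE piece; the others come from the base or EPEL repos):

```
# OS-level dependencies from the base/EPEL repos; adjust names for your distro.
sudo yum install -y httpd openssl
# The HDFS FUSE bits ship in the CDH repo.
sudo yum install -y hadoop-hdfs-fuse
```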
						
					
02-06-2017 10:19 PM

Try it without env:

set CONSOLETYPE=vt;

This didn't throw errors for me. I did not test it further, though, to see if it actually changes anything.
						
					
02-06-2017 12:25 AM (1 Kudo)

OK, find this Python script in your Hue install location. Below is the path for CDH:

/opt/cloudera/parcels/CDH/lib/hue/tools/app_reg/app_reg.py

This is the default Hue location:

/usr/share/hue/tools/app_reg/app_reg.py

Then try to install the Impala app:

/opt/cloudera/parcels/CDH/lib/hue/tools/app_reg/app_reg.py --install /opt/cloudera/parcels/CDH/lib/hue/apps/impala

/usr/share/hue/tools/app_reg/app_reg.py --install /usr/share/hue/apps/impala
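If the app still doesn't show up after registering it, a Hue restart is usually needed. A sketch for a package-based install (on a CM-managed cluster, restart the Hue service from the CM UI instead):

```
# Package-based installs only; CM-managed Hue should be restarted through CM.
sudo service hue restart
```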
						
					
02-05-2017 11:02 PM

It actually isn't listed.

app_blacklist=search,rdbms,zookeeper,security,pig,spark,security

The Impala app sections look OK as long as Impala is running on the same host as Hue. Can you post a screenshot of your Hue groups?
						
					
02-05-2017 10:17 PM

What I am reading is that you are passing the info to query a single table of the 1000 and insert it into your bigger table, is that right? So you would launch this script and Spark job 1000 times. I recommend a different approach to make better use of Spark, and I think I have the solution to your issue.

Warning: my last experience with Spark, Hive, and Parquet was in Spark 1.6.0, and Parquet took a lot of memory due to how the writers behave.

I recommend that you change the job to create a union of each DF. In the Spark application you would loop through each table, read the data, and then union it to the last. This will be heavy on memory usage, since it holds everything at once, but it is a more efficient use of Spark.

I can't get into a spark-shell right now, but this doesn't look right: format is a method of a DF, but you have it just after the SQL statement. What are you passing to 'repository'? Are the source tables in Parquet format already?

sqlContext.sql('create external table if not exists testing.{}(iim_time string, plc_time string, package string, group string, tag_name string, tag_value float, ngp float) row format delimited fields terminated by "," stored as parquet'.format(repository))
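A minimal sketch of that union approach in PySpark 1.6, assuming a HiveContext and made-up table names (testing.src_0 ... and testing.big_table are placeholders, not names from your job):

```
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="union-all-tables")
sqlContext = HiveContext(sc)

# Hypothetical list of the ~1000 source tables; build it to match your naming scheme.
source_tables = ["testing.src_{}".format(i) for i in range(1000)]

combined = None
for name in source_tables:
    df = sqlContext.table(name)  # read each Hive table as a DataFrame
    combined = df if combined is None else combined.unionAll(df)  # Spark 1.6-style union

# Write everything into the (hypothetical) target table in a single job; appends by default.
combined.write.insertInto("testing.big_table")
```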
						
					
02-05-2017 09:34 PM (2 Kudos)

Is there a specific reason that you are looking for that version of the MR client jobclient jar? The proper jar for your CDH version can be found at the location below:

/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/

If you need that specific one, try searching the CDH or Maven repository for it.
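As a quick sketch for seeing what actually ships with the parcel (assumes a parcel-based install at the default location):

```
# List the jobclient jars bundled with the active CDH parcel.
ls /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/ | grep jobclient
```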
						
					