Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3458 | 10-13-2017 09:42 PM
 | 6228 | 09-14-2017 11:15 AM
 | 3184 | 09-13-2017 10:35 PM
 | 5115 | 09-13-2017 10:25 PM
 | 5762 | 09-13-2017 10:05 PM
02-08-2017
08:42 AM
Just to be clear, you want the output on the MR job launch and progress, right? Like this...

INFO : Compiling command(queryId=hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb): select distinct wafernum_part('fab.op2451',wafernum) from fab.op2451 where storeday in ("2017-01-05")
INFO : converting to local hdfs:/lib/business-dedupe-2.1.0.jar
INFO : Added [/tmp/2e04052d-c322-4047-a4d5-c52d67ddc46c_resources/business-dedupe-2.1.0.jar] to class path
INFO : Added resources: [hdfs:/lib/business-dedupe-2.1.0.jar]
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb); Time taken: 0.253 seconds
INFO : Executing command(queryId=hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb): select distinct wafernum_part('fab.op2451',wafernum) from fab.op2451 where storeday in ("2017-01-05")
INFO : Query ID = hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:8
INFO : Submitting tokens for job: job_1486193162125_5338
INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for mbige7303763: HDFS_DELEGATION_TOKEN owner=user, renewer=yarn, realUser=hive/hive_princ, issueDate=1486571884281, maxDate=1487176684281, sequenceNumber=67028, masterKeyId=147)
INFO : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 0c 6d 62 69 67 65 37 33 30 33 37 36 33 0c 6d 62 69 67 65 37 33 30 33 37 36 33 2e 68 69 76 65 2f 61 62 6f 2d 6c 70 33 2d 65 78 74 65 64 30 31 2e 77 64 63 2e 63 6f 6d 40 48 49 54 41 43 48 49 47 53 54 2e 47 4c 4f 42 41 4c 8a 01 5a 1e 96 8f 7d 8a 01 5a 42 a3 13 7d 8e 0e b4 30
INFO : The url to track the job: https://RM_host:8090/proxy/application_1486193162125_5338/
INFO : Starting Job = job_1486193162125_5338, Tracking URL = https://RM_host:8090/proxy/application_1486193162125_5338/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/bin/hadoop job -kill job_1486193162125_5338
INFO : Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 1
INFO : 2017-02-08 16:38:11,038 Stage-1 map = 0%, reduce = 0%
INFO : 2017-02-08 16:38:22,266 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 12.78 sec
INFO : 2017-02-08 16:38:24,308 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 116.9 sec
INFO : 2017-02-08 16:38:51,878 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 329.89 sec
INFO : 2017-02-08 16:38:52,896 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 330.75 sec
INFO : 2017-02-08 16:38:54,933 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 346.65 sec
INFO : 2017-02-08 16:38:55,952 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 348.19 sec
INFO : 2017-02-08 16:38:57,988 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 358.66 sec
INFO : 2017-02-08 16:38:59,009 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 360.07 sec
INFO : 2017-02-08 16:39:01,048 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 363.5 sec
INFO : 2017-02-08 16:39:07,176 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 367.49 sec
INFO : MapReduce Total cumulative CPU time: 6 minutes 7 seconds 490 msec
INFO : Ended Job = job_1486193162125_5338
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 8 Reduce: 1 Cumulative CPU: 367.49 sec HDFS Read: 942254005 HDFS Write: 64 SUCCESS
INFO : Total MapReduce CPU Time Spent: 6 minutes 7 seconds 490 msec
INFO : Completed executing command(queryId=hive_20170208163838_f60cb50c-255e-4ea0-8c23-eb894bba7bbb); Time taken: 64.188 seconds
INFO : OK

What are your logging levels in Hive, specifically for HS2? Check the HiveServer2 Logging Threshold in CM.
02-07-2017
08:59 PM
Try 'sudo yum list installed | grep <package>'. That will tell you whether each package is installed and which repository it came from. Let me know if you can't find one or more of them.
02-07-2017
08:56 PM
Doesn't it bind to IP address 0.0.0.0 by default? Or maybe it is bound to 0.0.0.0 in the configs; I may be mistaken on that. You can try binding it to the correct IP or hostname. In the same safety valve, Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini, add http_host under the [desktop] section:

[desktop]
http_host=hue-host.example.com
02-07-2017
11:53 AM
There are a number of settings for the JobHistory Server that could be causing it to expire and remove the logs. The first is the number of milliseconds that it will keep a job and its logs around (log removal only applies if log aggregation is in use); the default is 7 days. The second is the maximum number of jobs it retains, and the default is 20,000. That sounds like a lot, but I have seen large, active clusters burn through that in 2-3 days.

mapreduce.jobhistory.max-age-ms (default 604800000 ms, i.e. 7 days)
mapreduce.jobhistory.joblist.cache.size (default 20000)
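As an illustration only (these are not recommended values, just an example of raising retention to 14 days and the job cache to 50,000; in CM they would go into the JobHistory Server's safety valve for mapred-site.xml), the overrides would look like:

mapreduce.jobhistory.max-age-ms=1209600000
mapreduce.jobhistory.joblist.cache.size=50000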
02-07-2017
11:48 AM
Fuse should be part of the CDH repo, but httpd, openssl, etc. should come from the OS or possibly EPEL repos. What OS are you using? You will need to manage these dependencies manually, or set up the CM repo on all nodes and use your package manager. Just checking, but is there a reason you are not pushing it through CM?
02-06-2017
10:19 PM
Try it without env:

set CONSOLETYPE=vt;

This didn't throw errors for me, but I did not test it further to see if it actually changes anything.
02-06-2017
12:25 AM
1 Kudo
Ok, find this Python script in your Hue install location. Below is the path for CDH:

/opt/cloudera/parcels/CDH/lib/hue/tools/app_reg/app_reg.py

This is the default for Hue:

/usr/share/hue/tools/app_reg/app_reg.py

Try to install the Impala app:

/opt/cloudera/parcels/CDH/lib/hue/tools/app_reg/app_reg.py --install /opt/cloudera/parcels/CDH/lib/hue/apps/impala

or

/usr/share/hue/tools/app_reg/app_reg.py --install /usr/share/hue/apps/impala
02-05-2017
11:02 PM
It actually isn't listed:

app_blacklist=search,rdbms,zookeeper,security,pig,spark,security

The Impala app sections look OK as long as Impala is running on the same host as Hue. Can you post a screenshot of your Hue groups?
02-05-2017
10:17 PM
What I am reading is that you are passing the info to query a single table of the 1000 and insert it into your bigger table, is that right? So you would launch this script and Spark job 1000 times. I recommend a different approach to make better use of Spark, and I think I have the solution to your issue. Warning: my last experience with Spark, Hive, and Parquet was in Spark 1.6.0, and Parquet took a lot of memory due to how the writers behave.

I recommend that you change the job to create a union of the DataFrames. So in the Spark application you would loop through each table, read the data, and then union it with the previous result (see the sketch below). This will be heavy on memory usage, since it has to hold all of it, but it is a more efficient use of Spark.

I can't get into a spark-shell right now, but this doesn't look right. format is a method of a DataFrame, but you have it just after the SQL statement. What are you passing as 'repository'? Are the source tables in Parquet format already?

sqlContext.sql('create external table if not exists testing.{}(iim_time string, plc_time string, package string, group string, tag_name string, tag_value float, ngp float) row format delimited fields terminated by "," stored as parquet'.format(repository))
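Here is a minimal sketch of the union approach, assuming Spark 1.6-era PySpark (HiveContext, DataFrame.unionAll). The table names and the target table are hypothetical placeholders, not your actual schema:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="union-tables")
sqlContext = HiveContext(sc)

# Hypothetical list of the ~1000 source table names.
table_names = ["testing.source_{:04d}".format(i) for i in range(1, 1001)]

combined = None
for name in table_names:
    df = sqlContext.table(name)  # read each source table as a DataFrame
    combined = df if combined is None else combined.unionAll(df)

# One write at the end instead of one insert job per table.
combined.write.format("parquet").mode("append").saveAsTable("testing.big_table")

The trade-off, as noted above, is that a single large write holds everything at once, so watch your executor memory.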
02-05-2017
09:34 PM
2 Kudos
Is there a specific reason that you are looking for that version of the MapReduce jobclient jar? The proper jar for your CDH version can be found at the location below:

/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/

If you need that specific one, try searching the CDH or Maven repositories for it.