Member since: 09-17-2015
Posts: 103
Kudos Received: 61
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2285 | 06-15-2017 11:58 AM
 | 2087 | 06-15-2017 09:18 AM
 | 1966 | 06-09-2017 10:45 AM
 | 1407 | 06-07-2017 03:52 PM
 | 3047 | 01-06-2017 09:41 PM
06-12-2017
11:37 AM
Hi @srinivas s, if you're using rack awareness you should be able to get rid of the 50 datanodes by decommissioning them without losing any blocks; otherwise you probably will lose some. Rebalance time depends on your network and cluster utilization; you can adjust some parameters to make it faster if necessary, basically:
hdfs dfsadmin -setBalancerBandwidth <bandwidth (bytes/s)>
or within your HDFS params (example):
dfs.balance.bandwidthPerSec=100000000
dfs.datanode.max.transfer.threads=16384
dfs.datanode.balance.max.concurrent.moves=500
Please check Accept if you're satisfied with my answer.
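As a quick sketch of what that looks like in practice (the 100 MB/s figure and the threshold are just examples to adapt to your cluster):
# takes effect immediately on every datanode, no restart needed (value is in bytes per second)
hdfs dfsadmin -setBalancerBandwidth 104857600
# then run the balancer; -threshold is the allowed deviation (in %) from the average disk usage
hdfs balancer -threshold 10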
06-09-2017
10:45 AM
1 Kudo
Hi, there is a "Hortonworks Sandbox Archive" section just below "Hortonworks Sandbox in the Cloud".
06-07-2017
03:52 PM
1 Kudo
job.properties only contains properties to be propagated to your workflows, so the short answer is yes, you can have a single job.properties for your 20 workflows. Bundles and coordinators require other parameters (like start/end date), so it's better to have a specific job.properties for them.
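As a rough sketch, a shared job.properties can stay very small (host names and paths below are placeholders), with each workflow picking its own application path at submission time:
nameNode=hdfs://mycluster
jobTracker=resourcemanager.example.com:8050
queueName=default
oozie.use.system.libpath=true
# submit a given workflow by overriding only its application path, e.g.:
# oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config job.properties -D oozie.wf.application.path=${nameNode}/apps/workflow-1 -run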
01-18-2017
02:17 PM
On RHEL/CentOS you might encounter an exception when trying to stop or restart Oozie:
resource_management.core.exceptions.Fail: Execution of 'cd /var/tmp/oozie && /usr/hdp/current/oozie-server/bin/oozie-stop.sh' returned 1. -bash: line 0: cd: /var/tmp/oozie: No such file or directory
This is likely caused by the /etc/cron.daily/tmpwatch cron script, which deletes files and directories left unmodified for 30+ days:
[root@local ~]# cat /etc/cron.daily/tmpwatch
#! /bin/sh
flags=-umc
/usr/sbin/tmpwatch "$flags" -x /tmp/.X11-unix -x /tmp/.XIM-unix \
-x /tmp/.font-unix -x /tmp/.ICE-unix -x /tmp/.Test-unix \
-X '/tmp/hsperfdata_*' 10d /tmp
/usr/sbin/tmpwatch "$flags" 30d /var/tmp
for d in /var/{cache/man,catman}/{cat?,X11R6/cat?,local/cat?}; do
if [ -d "$d" ]; then
/usr/sbin/tmpwatch "$flags" -f 30d "$d"
fi
done
Just recreate the directory and you're good to go:
[root@local ~]# mkdir /var/tmp/oozie
[root@local ~]# chown oozie:hadoop /var/tmp/oozie
[root@local ~]# chmod 755 /var/tmp/oozie
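If you want to keep tmpwatch from removing the directory again, one option (assuming the stock script shown above) is to exclude it from the /var/tmp sweep:
# in /etc/cron.daily/tmpwatch, exclude the Oozie temp directory from cleanup
/usr/sbin/tmpwatch "$flags" -x /var/tmp/oozie 30d /var/tmp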
01-06-2017
10:27 PM
You should consider running Hadoop Streaming using your python mapper and reducer. Take a look at https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.2.3_Streaming for an example of such a workflow. First try executing your streaming job directly with something like:
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /user/theuser/input.csv -output /user/theuser/out
Then it'll be easier to schedule it with Oozie; worst case scenario you'll do a shell action with that command. Please accept the answer if I answered your question.
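If you do end up with the shell action fallback, the action would simply call a small wrapper around that same command (run_streaming.sh is just a hypothetical name, paths are the ones from the example above):
#!/bin/bash
# run_streaming.sh - wrapper shipped with the Oozie shell action
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
  -files mapper.py,reducer.py \
  -mapper mapper.py \
  -reducer reducer.py \
  -input /user/theuser/input.csv \
  -output /user/theuser/out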
01-06-2017
09:41 PM
@justlearning Oozie can't do mapreduce by itself; it's a Hadoop scheduler which launches workflows composed of actions, and those actions can be mapreduce jobs. Here you want to run a job defined by workflow.xml with the parameters in job.properties, so the syntax is:
oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config job.properties -run
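Once it's submitted, the same CLI can be used to follow the job (replace <job-id> with the id printed by the -run command):
oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -info <job-id>
oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -log <job-id>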
11-24-2016
05:52 PM
1 Kudo
@rama did you try increasing your mappers' memory? And is your query failing with hive.execution.engine=mr as well?
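For reference, a sketch of trying both suggestions from the command line (the 4 GB figure and my_query.hql are just examples):
hive --hiveconf hive.execution.engine=mr \
     --hiveconf mapreduce.map.memory.mb=4096 \
     --hiveconf mapreduce.map.java.opts=-Xmx3276m \
     -f my_query.hql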
10-23-2016
10:39 AM
@Pierre Villard I got it working with -D mapred.job.name=mySqoopTest
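For context, the generic -D option has to come right after the Sqoop tool name, before the tool-specific arguments; a hypothetical import would look like:
sqoop import -D mapred.job.name=mySqoopTest --connect jdbc:mysql://dbhost/mydb --table mytable --target-dir /user/theuser/mytable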
09-21-2016
03:16 PM
hi @Roberto Sancho, you can put 2 agent configs into the same config file; basically you'll have:
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1
agent2.sources = source2
agent2.channels = channel2
agent2.sinks = sink2
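Each agent is then started from that same file by passing its own name (paths below are examples):
flume-ng agent --name agent1 --conf /etc/flume/conf --conf-file /etc/flume/conf/flume.conf
flume-ng agent --name agent2 --conf /etc/flume/conf --conf-file /etc/flume/conf/flume.conf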
09-20-2016
05:00 PM
You can update the mount points by overriding them in a new Ambari Config Group (then restart those nodes, of course, but you'll be notified). Since HDFS will re-replicate the blocks that were on those 3 disks per node, you shouldn't have any data loss.
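Concretely, the property you'd override in that config group is dfs.datanode.data.dir; a sketch with example mount points (your paths will differ):
# current value (example)
dfs.datanode.data.dir=/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data,/grid/3/hadoop/hdfs/data
# override for the affected hosts, dropping the failed disks
dfs.datanode.data.dir=/grid/0/hadoop/hdfs/data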