Member since: 09-17-2015
Posts: 103
Kudos Received: 61
Solutions: 18
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2364 | 06-15-2017 11:58 AM |
 | 2231 | 06-15-2017 09:18 AM |
 | 2942 | 06-09-2017 10:45 AM |
 | 1456 | 06-07-2017 03:52 PM |
 | 3184 | 01-06-2017 09:41 PM |
07-08-2016 05:19 PM
@sankar rao you shouldn't wipe the entire /tmp directory: that would indeed affect your currently running jobs. There's no built-in way to do this, but you can schedule a cron job that deletes the files/directories older than x days. You'll find some examples around; here is a quick (dirty but efficient) shell script that cleans up files only:

#!/bin/bash
# Delete files (not directories) under /tmp in HDFS older than the given number of days.
usage="Usage: dir_diff.sh [days]"
if [ -z "$1" ]
then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# Lines starting with "-" in the recursive listing are regular files.
hadoop fs -ls -R /tmp/ | grep "^-" | while read -r f; do
  # Field 6 of the listing is the modification date.
  file_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$file_date" +%s) ) / (24 * 60 * 60) ))
  if [ "$difference" -gt "$1" ]; then
    # The last field is the path; paths containing spaces are not handled.
    hdfs dfs -rm -f "$(echo "$f" | awk '{print $NF}')"
  fi
done
06-28-2016 01:43 PM
1 Kudo
When using a PostgreSQL database for Hue, you might have encountered the following error:
[root@hue ~]# cd /usr/lib/hue
[root@hue hue]# source ./build/env/bin/activate
(env)[root@hue hue]# hue syncdb
Traceback (most recent call last):
File "/usr/lib/hue/build/env/bin/hue", line 9, in <module>
load_entry_point('desktop==2.6.1', 'console_scripts', 'hue')()
File "/usr/lib/hue/desktop/core/src/desktop/manage_entry.py", line 60, in entry
execute_manager(settings)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/management/__init__.py", line 438, in execute_manager
utility.execute()
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/management/__init__.py", line 379, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/management/__init__.py", line 261, in fetch_command
klass = load_command_class(app_name, subcommand)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/management/__init__.py", line 67, in load_command_class
module = import_module('%s.management.commands.%s' % (app_name, name))
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/utils/importlib.py", line 35, in import_module
__import__(name)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/South-0.8.2-py2.6.egg/south/management/commands/__init__.py", line 10, in <module>
import django.template.loaders.app_directories
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/template/loaders/app_directories.py", line 21, in <module>
mod = import_module(app)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/utils/importlib.py", line 35, in import_module
__import__(name)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/contrib/admin/__init__.py", line 1, in <module>
from django.contrib.admin.helpers import ACTION_CHECKBOX_NAME
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/contrib/admin/helpers.py", line 1, in <module>
from django import forms
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/forms/__init__.py", line 17, in <module>
from models import *
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/forms/models.py", line 6, in <module>
from django.db import connections
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/__init__.py", line 77, in <module>
connection = connections[DEFAULT_DB_ALIAS]
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/utils.py", line 91, in __getitem__
backend = load_backend(db['ENGINE'])
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/utils.py", line 32, in load_backend
return import_module('.base', backend_name)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/utils/importlib.py", line 35, in import_module
__import__(name)
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/backends/postgresql_psycopg2/base.py", line 24, in <module>
raise ImproperlyConfigured("Error loading psycopg2 module: %s" % e)
django.core.exceptions.ImproperlyConfigured: Error loading psycopg2 module: No module named psycopg2
Download the psycopg2 module: https://pypi.python.org/packages/source/p/psycopg2/psycopg2-2.6.1.tar.gz#md5=842b44f8c95517ed5b792081a2370da1
Then install it with easy_install:
(env)[root@hue hue]# easy_install /root/psycopg2-2.6.1.tar.gz
Processing psycopg2-2.6.1.tar.gz
Running psycopg2-2.6.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-IDKkmV/psycopg2-2.6.1/egg-dist-tmp-CR0nZv
zip_safe flag not set; analyzing archive contents...
psycopg2.tests.test_types_basic: module references __file__
psycopg2.tests.test_module: module references __file__
Adding psycopg2 2.6.1 to easy-install.pth file
Installed /usr/lib/hue/build/env/lib/python2.6/site-packages/psycopg2-2.6.1-py2.6-linux-x86_64.egg
Processing dependencies for psycopg2==2.6.1
Finished processing dependencies for psycopg2==2.6.1
Done! (You might, however, consider Ambari Views.)
05-12-2016 02:00 AM
1 Kudo
With Sqoop (SQL from/to Hadoop), you can use a password file instead of a plaintext password, which is more secure:
<arg>--password-file</arg>
<arg>hdfs://NAMENODE/teradata.password</arg>
You may end up with an error like:
3737 [main] ERROR org.apache.sqoop.teradata.TeradataSqoopExportHelper - Exception running Teradata export job
com.teradata.connector.common.exception.ConnectorException: java.sql.SQLException: [Teradata Database] [TeraJDBC 15.00.00.20] [Error 8017] [SQLState 28000] The UserId, Password or Account is invalid.
If you wrote your password ("Password" here) to the file with vi, the file ends with a line feed control character, which makes the password invalid. To check whether an LF ends the password file, use od -c (it dumps the file character by character, with octal offsets):
[root@localhost ~]# od -c teradata.password
0000000 P a s s w o r d \n
0000011
You'll have to delete the newline control character using tr:
[root@localhost ~]# tr -d '\n' < teradata.password > teradata.password.new
[root@localhost ~]# od -c teradata.password.new
0000000 P a s s w o r d
0000010
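To avoid the trailing line feed in the first place, the file can be written with printf instead of an editor. The following is only a minimal sketch: the function names write_password_file and has_no_trailing_newline are hypothetical helpers introduced here, and the chmod 400 step is an assumption about how you want to restrict access.

```shell
#!/bin/bash
# Hypothetical helper: write a password to a file WITHOUT a trailing newline.
write_password_file() {
  local path="$1" password="$2"
  # printf '%s' emits the string exactly as given, with no line feed appended.
  printf '%s' "$password" > "$path"
  # Assumption: only the owner should be able to read the file.
  chmod 400 "$path"
}

# Hypothetical helper: succeeds only if the last byte of the file is not a line feed.
has_no_trailing_newline() {
  # Command substitution strips a trailing newline, so a final LF yields an empty string.
  [ -n "$(tail -c 1 "$1")" ]
}
```

For example, `write_password_file teradata.password 'Password'` followed by `has_no_trailing_newline teradata.password && echo OK` should print OK; the file can then be uploaded with `hdfs dfs -put` as usual.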
05-02-2016 01:52 PM
1 Kudo
It depends on the memory available on your cluster. Each node has an amount of RAM allocated to YARN (yarn.nodemanager.resource.memory-mb), and each mapper has a container size (in MapReduce it's mapreduce.map.memory.mb). Dividing the first by the second gives you an idea of how many mappers can run at once on a node (keep in mind that memory is also used by ApplicationMasters, reducers, etc.).
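As a rough illustration of that arithmetic, here is a small sketch. The function name max_mappers_per_node and the 48 GB / 4 GB values are assumptions chosen for the example, not recommendations; the result is an upper bound, since ApplicationMasters and reducers take their share of the same memory.

```shell
#!/bin/bash
# Hypothetical helper: rough upper bound on concurrent map containers for one node.
# $1: value of yarn.nodemanager.resource.memory-mb
# $2: value of mapreduce.map.memory.mb
max_mappers_per_node() {
  local node_mb="$1" map_mb="$2"
  # Integer division: a container that only partially fits cannot be allocated.
  echo $(( node_mb / map_mb ))
}

# Illustrative values only: 48 GB given to YARN on the node, 4 GB per mapper.
max_mappers_per_node 49152 4096   # prints 12
```

Multiply the result by the number of NodeManagers to get a cluster-wide estimate.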
05-02-2016 07:41 AM
@Sumit Nigam yes, it will split at the 10G size. That doesn't mean you'll end up with 10G regions; it just means that a region will split when it reaches 10G.
04-29-2016 08:35 PM
2 Kudos
Let's add that the 10GB size is not a fixed size, since the default algorithm used is IncreasingToUpperBoundRegionSplitPolicy and not ConstantSizeRegionSplitPolicy (but you can set the latter by altering the table from the HBase shell, for example).
This means that you can't estimate the HDFS size with simple maths from the number of regions and the region size parameter. Regardless of the policy being used, a 10GB region which has just split doesn't give you two 10GB regions.
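To see why 10GB acts as an upper bound rather than a fixed size, here is a rough sketch of the threshold IncreasingToUpperBoundRegionSplitPolicy applies, assuming the formula documented for HBase 1.x: the smaller of hbase.hregion.max.filesize and (regions of the table on the region server)^3 x 2 x hbase.hregion.memstore.flush.size. The formula differs between HBase versions, so check the Javadoc of yours; the function name split_threshold_mb and the 128MB flush size are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: split threshold in MB under the assumed HBase 1.x formula
#   min(max_filesize, regions^3 * 2 * flush_size)
# $1: regions of the table on this region server
# $2: hbase.hregion.memstore.flush.size in MB
# $3: hbase.hregion.max.filesize in MB
split_threshold_mb() {
  local regions="$1" flush_mb="$2" max_mb="$3"
  local candidate=$(( regions * regions * regions * 2 * flush_mb ))
  if [ "$candidate" -lt "$max_mb" ]; then
    echo "$candidate"
  else
    echo "$max_mb"
  fi
}

# Assumed values: 128MB flush size, 10GB (10240MB) max file size.
for n in 1 2 3 4; do
  echo "regions=$n threshold=$(split_threshold_mb "$n" 128 10240)MB"
done
```

Under these assumptions the first splits happen well below 10GB, and the 10GB cap only applies once a server hosts four or more regions of the table.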
04-26-2016 09:18 AM
3 Kudos
Hi, if you just want to compare HDP component configurations, you can use a simple shell script to export all configurations on each cluster and vimdiff the two files. I made a very simple script to achieve that, feel free to use/update it: https://github.com/laurentedel/hadoop-scripts/blob/master/backup_configs.sh
04-18-2016 04:11 PM
2 Kudos
You can force the number of reducers with SET mapreduce.job.reduces=XX
04-14-2016 10:48 PM
2 Kudos
I would say NameNode 8020, DataNode 50010, WebHDFS 50070 and HttpFS 14000?