Member since: 10-01-2015
Posts: 3933
Kudos Received: 1148
Solutions: 374

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1105 | 05-03-2017 05:13 PM
 | 954 | 05-02-2017 08:38 AM
 | 1033 | 05-02-2017 08:13 AM
 | 1276 | 04-20-2017 12:28 AM
 | 1230 | 04-10-2017 10:51 PM
04-10-2017
10:51 PM
2 Kudos
Remove the Atlas hook class from your hive-site.xml; unless you are running Atlas, you don't need it:

<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
03-28-2017
02:27 AM
1 Kudo
Answered here: https://community.hortonworks.com/questions/66805/issue-when-deploying-a-new-parser-topology-in-stor.html#answer-66998
03-26-2017
02:41 AM
Does that mean HDP 2.5+ supports the date type in Avro 1.8+? Because that would be awesome, Venkat.
03-26-2017
02:34 AM
3 Kudos
I used to see that behavior in the first two weeks of every new month, but seeing that we are in the 4th week, I can't explain it. The leaderboard is not always up to date. I'll raise a bug.
03-24-2017
03:12 PM
4 Kudos
@Namit Maheshwari on the last moderator call, we decided to raise the reputation level required to accept answers to 1000.
03-18-2017
10:04 AM
Is that hostname part of the full FQDN? What does hostname -f return? Make sure /etc/hosts has an entry of the following form:

IP servername.fqdn.com servername
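For example, a hypothetical entry (the IP address and names are placeholders):

# /etc/hosts
192.168.1.10   servername.fqdn.com   servername

With an entry like that in place, hostname -f should return servername.fqdn.com.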
03-17-2017
08:14 PM
@Carol Elliott actually, see if you can use the -D or -conf options:

-conf <configuration file>    specify an application configuration file
-D <property=value>           use value for given property

The -conf, -D, -fs and -jt arguments control the configuration and Hadoop server settings. For example, -D mapred.job.name=<job_name> can be used to set the name of the MR job that Sqoop launches; if not specified, the name defaults to the jar name for the job, which is derived from the table name used. So in the same fashion, try:

sqoop import -D hive.exec.scratchdir=...

https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_using_generic_and_specific_arguments
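A fuller sketch of such a command, assuming a MySQL source (the JDBC URL, table, and scratch directory are placeholders); note that generic -D arguments must precede the tool-specific ones:

# hypothetical example; substitute your own connection string, table, and directory
sqoop import -D hive.exec.scratchdir=/tmp/my_scratch \
  --connect jdbc:mysql://dbhost/sales --table orders --hive-import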
03-17-2017
08:09 PM
@Carol Elliott I didn't try this, but can you try the following:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/location_of_your_hive_site.xml/*

Edit your copy of hive-site.xml to have its own scratchdir, so you will have your own copy of hive-site.xml. If it works, you can add the export to your ~/.bash_profile and source it:

source ~/.bash_profile
03-17-2017
11:58 AM
Did you try to enable them in mapred-site and tez-site? It is interesting that it cannot find it. Also try setting it in mapred-site and including it with your workflow, rather than enabling it on the whole cluster.
03-17-2017
11:52 AM
As an alternative, you can change the scratchdir in one of three ways (see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration):

1. Using the set command in the CLI or Beeline to set session-level values for the configuration variable for all statements subsequent to the set command. For example, the following command sets the scratch directory (which is used by Hive to store temporary output and plans) to /tmp/mydir for all subsequent statements:

set hive.exec.scratchdir=/tmp/mydir;

2. Using the --hiveconf option of the hive command (in the CLI) or the beeline command for the entire session. For example:

bin/hive --hiveconf hive.exec.scratchdir=/tmp/mydir

3. In hive-site.xml. This is used for setting values for the entire Hive configuration (see hive-site.xml and hive-default.xml.template). For example:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/mydir</value>
<description>Scratch space for Hive jobs</description>
</property>
03-17-2017
11:48 AM
HDFS has a mechanism called quotas; it is possible that your admin team set storage quotas on individual user directories. You can set a larger quota on your directory and avoid the situation:

# requires superuser privileges
# set space quota of 1kb on a directory, can be k, m, g, etc.
sudo -u hdfs hdfs dfsadmin -setSpaceQuota 1k /quotasdir
# add a file
sudo -u hdfs hdfs dfs -touchz /quotasdir/1
# notice file is 0 bytes
sudo -u hdfs hdfs dfs -ls /quotasdir/
# for demo purposes, we need to upload a large file, larger than 1kb into directory, watch the prompt
sudo -u hdfs hdfs dfs -chown -R root:hdfs /quotasdir
hdfs dfs -put /root/install.log /quotasdir/
15/11/25 15:10:47 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /quotasdir is exceeded: quota = 1024 B = 1 KB but diskspace consumed = 402653184 B = 384 MB
at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:211)
at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:239)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:907)
# remove space quota
sudo -u hdfs hdfs dfsadmin -clrSpaceQuota /quotasdir
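To check the quota currently applied to a directory, you can use the standard count -q command:

# shows namespace and space quotas (and how much of each remains) for the directory
hdfs dfs -count -q /quotasdir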
03-16-2017
07:54 PM
@Sam Pat first of all, thanks for checking out my article. I see you have a company reference in your error message; please edit your comment and remove it. Secondly, can you run your Python script without Oozie? I have a feeling you're trying to execute a Python 2 script with Python 3 as the default interpreter. You should add the interpreter line to your script and try again. Take a look at my scripts; I have a version for Python 2:

#! /usr/bin/env python

and Python 3:

#! /usr/bin/env /usr/local/bin/python3.3

If your cluster has Python 3 installed, make sure it's installed across the whole cluster with the same path. If it's Python 2, then also make sure every node is configured correctly with the location of the interpreter.
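One way to verify the interpreter path is consistent, assuming you have SSH access to each node (the hostnames below are placeholders):

# check where python3 lives on every node; node1..node3 are hypothetical hostnames
for h in node1 node2 node3; do
  ssh "$h" 'hostname; which python3 || echo "python3 not found"'
done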
03-16-2017
02:51 PM
@Amit Panda to do this on a continuous basis, you need to set up an Oozie job that runs a script to find old data and move it to the new location. Alternatively, you can use Apache NiFi to watch a directory for old data and move it. There's nothing out of the box that will do this for you.
03-15-2017
11:44 PM
Have you seen this? http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-user-guide/content/ams_performance_tuning.html
03-15-2017
10:47 PM
Please see http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-user-guide/content/ams_performance_tuning.html and https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning
03-15-2017
10:42 PM
Please see http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-reference/content/ch_amb_ref_customizing_ambari_log_pid_dirs.html In Ambari 2.5, you will also be able to easily configure log4j properties for other components.
03-15-2017
10:33 PM
1 Kudo
Please check whether the failures occur only on the exact same node. Also, can you drill down into the YARN job logs and see what error you get?
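If you know the application ID, you can pull the aggregated logs for a finished job with the YARN CLI (the ID below is a placeholder):

# fetch aggregated container logs for one application
yarn logs -applicationId application_1489000000000_0001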
03-15-2017
07:40 PM
3 Kudos
@zaenal rifai the solution to your problem is to wait for Ambari 2.5 and use the Workflow Manager view. I just tested a workflow with 26 actions and it displayed well, though it's hard to read with so many actions. I attached screenshots to my response. The other option, of course, is to file a JIRA and either submit a patch or wait for the community to work on it. I say use Workflow Manager when it comes out. The collapsed view is from the original design of the flow; the images titled many and many2 are the result of running the workflow and represent the flow graph, which is what you're trying to do. With WFM, you can visually display your flow graph both before and after execution of the flow.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="Workflow3">
<start to="email_1"/>
<action name="email_1">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_2"/>
<error to="kill"/>
</action>
<action name="email_2">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_3"/>
<error to="kill"/>
</action>
<action name="email_3">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_4"/>
<error to="kill"/>
</action>
<action name="email_4">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_5"/>
<error to="kill"/>
</action>
<action name="email_5">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_6"/>
<error to="kill"/>
</action>
<action name="email_6">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_7"/>
<error to="kill"/>
</action>
<action name="email_7">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_8"/>
<error to="kill"/>
</action>
<action name="email_8">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_9"/>
<error to="kill"/>
</action>
<action name="email_9">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_10"/>
<error to="kill"/>
</action>
<action name="email_10">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_11"/>
<error to="kill"/>
</action>
<action name="email_11">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_12"/>
<error to="kill"/>
</action>
<action name="email_12">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_13"/>
<error to="kill"/>
</action>
<action name="email_13">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_14"/>
<error to="kill"/>
</action>
<action name="email_14">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_15"/>
<error to="kill"/>
</action>
<action name="email_15">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_16"/>
<error to="kill"/>
</action>
<action name="email_16">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_17"/>
<error to="kill"/>
</action>
<action name="email_17">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_18"/>
<error to="kill"/>
</action>
<action name="email_18">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_19"/>
<error to="kill"/>
</action>
<action name="email_19">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_20"/>
<error to="kill"/>
</action>
<action name="email_20">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_21"/>
<error to="kill"/>
</action>
<action name="email_21">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_22"/>
<error to="kill"/>
</action>
<action name="email_22">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_23"/>
<error to="kill"/>
</action>
<action name="email_23">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_24"/>
<error to="kill"/>
</action>
<action name="email_24">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_25"/>
<error to="kill"/>
</action>
<action name="email_25">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="email_26"/>
<error to="kill"/>
</action>
<action name="email_26">
<email xmlns="uri:oozie:email-action:0.2">
<to>address@hortonworks.com</to>
<subject>1</subject>
<body>1</body>
</email>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
<end name="end"/>
</workflow-app>
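If you hand-edit a workflow like this, you can sanity-check the XML with the Oozie CLI before submitting it (the file path is a placeholder):

oozie validate /path/to/workflow.xml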
03-15-2017
07:07 PM
@Bhupesh Khanna please open a support ticket; we want to look into this issue directly.
03-15-2017
12:55 PM
To change the actual date of a file, you need to rewrite it. That was not the original question, as far as I understand. Please open a new question with your exact requirements.
03-14-2017
01:29 PM
If you use the Log Search utility, it automatically parses logs by severity level for you. If you intend to do it manually, you can search for the ERROR level.
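For the manual route, a simple grep works (the log path below is a placeholder; adjust it for the component you are troubleshooting):

# print ERROR-level lines with line numbers
grep -n "ERROR" /var/log/hive/hiveserver2.log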
03-14-2017
02:50 AM
@P D you have a version mismatch between Ambari and Grafana. I can't conclusively say it is related to your issue, but it's certainly not optimal: if your Ambari version is 2.4.2, ambari-metrics-grafana must match that version. On my machine it looks like so:

ambari-metrics-monitor-2.5.0.1-51.x86_64
ambari-infra-solr-client-2.5.0.1-51.noarch
ambari-metrics-hadoop-sink-2.5.0.1-51.x86_64
ambari-metrics-grafana-2.5.0.1-51.x86_64
ambari-agent-2.5.0.1-51.x86_64
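On an rpm-based system, you can list the installed Ambari packages to spot a version mismatch:

# list every installed ambari package and its version
rpm -qa | grep ambari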
03-13-2017
10:19 PM
You can find logs in /var/log/hive unless you changed the directory, or you can use Ambari Log Search if you enabled it: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-user-guide/content/accessing_log_search.html
03-13-2017
08:17 PM
1 Kudo
@Sunile Manjee you can leverage WebHCat for this as one idea: https://cwiki.apache.org/confluence/display/Hive/WebHCat+UsingWebHCat#WebHCatUsingWebHCat-ErrorCodesandResponses

# this will execute a hive query and save the result to an hdfs file in your home directory called output
curl -s -d execute="select+*+from+sample_08;" \
  -d statusdir="output" \
  'http://localhost:50111/templeton/v1/hive?user.name=root'

# if you ls the directory, it will have two files, stderr and stdout
hdfs dfs -ls output

# if the job succeeded, you can cat the stdout file and view the results
hdfs dfs -cat output/stdout

When you invoke the job, you will get a response with a job ID. You can also check with the WebHDFS API whether the output directory exists and there is no error log; in that case the job succeeded.

curl -i "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/output/?op=LISTSTATUS"

Another idea is to leverage Oozie to wire the jobs together. Once a job completes, you can use the SLA monitoring features of Oozie to check whether the job completed, or send an email (SLA is not needed for this). Whichever way you go, you can have NiFi watch these events, either from a JMS topic in ActiveMQ if you intend to use SLA, or from the email alert. https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html

Probably an even better idea is to query ATS via the REST API: https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/TimelineServer.html I think this is the most sane approach: you can query ATS for a finished job and get its status. Once you know the job ID (there are ways to get it, one of them via my first example), a second processor can query ATS for the completion state.
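A minimal sketch of that ATS query, assuming the Application History REST endpoint is enabled (the hostname and application ID are placeholders; 8188 is the default Timeline Server port):

# ask the Timeline Server for one application's report, including its final status
curl -s -H "Accept: application/json" \
  "http://timelinehost.example.com:8188/ws/v1/applicationhistory/apps/application_1489000000000_0001"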
03-13-2017
06:11 PM
2 Kudos
@Amit Panda here's a slightly modified script from a Stack Overflow thread:

#!/bin/bash
usage="Usage: dir_diff.sh [directory] [days]"
if [[ $# -ne 2 ]]
then
echo $usage
exit 1
fi
now=$(date +%s)
hadoop fs -ls -R $1 | grep "^d" | while read f; do
dir_date=`echo $f | awk '{print $6}'`
difference=$(( ( $now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60 ) ))
if [ $difference -gt $2 ]; then
echo $f;
fi
done
I don't have files older than 10 days on my HDFS, so I execute with a 1-day argument, like so:

sudo sh dir_diff.sh /tmp 1
drwx------ - ambari-qa hdfs 0 2017-03-11 15:41 /tmp/ambari-qa
drwx------ - ambari-qa hdfs 0 2017-03-11 15:41 /tmp/ambari-qa/staging
drwxr-xr-x - hdfs hdfs 0 2017-03-11 15:39 /tmp/entity-file-history
drwxr-xr-x - yarn hadoop 0 2017-03-11 15:39 /tmp/entity-file-history/active
drwx------ - hive hdfs 0 2017-03-11 15:42 /tmp/hive/hive/17c0213c-358a-4c89-b803-800762144a21
drwx------ - hive hdfs 0 2017-03-11 15:42 /tmp/hive/hive/17c0213c-358a-4c89-b803-800762144a21/_tmp_space.db
drwx------ - hive hdfs 0 2017-03-11 15:42 /tmp/hive/hive/96049638-4aee-42cc-95f6-0652b3a66cae
drwx------ - hive hdfs 0 2017-03-11 15:42 /tmp/hive/hive/96049638-4aee-42cc-95f6-0652b3a66cae/_tmp_space.db
drwx------ - hive hdfs 0 2017-03-11 15:41 /tmp/hive/hive/e4fe18d1-5cb4-4088-93ff-cf4aac410301
drwx------ - hive hdfs 0 2017-03-11 15:41 /tmp/hive/hive/e4fe18d1-5cb4-4088-93ff-cf4aac410301/_tmp_space.db
drwxr-xr-x - ambari-qa hdfs 0 2017-03-11 15:41 /tmp/tezsmokeinput
On my 2.5 Sandbox, it returns this:

sh dir_diff.sh /tmp 10
drwxr-xr-x - hdfs hdfs 0 2016-10-25 07:48 /tmp/entity-file-history
drwxr-xr-x - yarn hadoop 0 2016-10-25 07:48 /tmp/entity-file-history/active
drwxrwxrwx - guest hdfs 0 2017-01-12 18:42 /tmp/freewheel
drwxrwxrwx - guest hdfs 0 2017-01-12 18:46 /tmp/freewheel/hdfs
drwx-wx-wx - ambari-qa hdfs 0 2016-10-25 07:51 /tmp/hive
drwx------ - ambari-qa hdfs 0 2016-10-25 08:09 /tmp/hive/ambari-qa
drwx------ - hive hdfs 0 2017-01-23 20:51 /tmp/hive/hive/_tez_session_dir
drwx------ - hive hdfs 0 2017-01-16 16:03 /tmp/hive/hive/ff5fb9ba-01db-45d3-b924-e1bd6ee5203b
drwx------ - hive hdfs 0 2017-01-16 16:03 /tmp/hive/hive/ff5fb9ba-01db-45d3-b924-e1bd6ee5203b/_tmp_space.db
Once you get a list of those directories, you can issue:

hdfs dfs -mv file newdir

We're adding some new Grafana dashboards in the next release of Ambari that can tell with granularity who the HDFS users are and what files they're creating. There's also an Activity Explorer dashboard you can check out in the latest Ambari + SmartSense for some other HDFS file statistics, especially when you're looking for small files.
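To chain the two steps, a hypothetical one-liner (the archive path is a placeholder; $8 is the path field in the ls output the script echoes):

# move every directory the script reports into an archive location
sh dir_diff.sh /tmp 10 | awk '{print $8}' | while read -r d; do
  hdfs dfs -mv "$d" /archive/
done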
03-13-2017
02:17 PM
@Ali benchmarks are based on many factors; your setup may differ from other deployments. In my experience, separating the two gives optimal performance, as you can see from one of the responses in the links I provided. It really depends on your volumes; to be cost-effective, sure, you can colocate, but when your application becomes mission-critical, you will regret that decision.
03-13-2017
12:52 PM
At the minimum, you can use a Pig script like this to count rows as well:

-- Sample script to count rows in an HBase table
SET DEFAULT_PARALLEL 20;
A = LOAD 'hbase://table_name' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-loadKey true') as (rowkey:bytearray);
B = GROUP A ALL;
C = FOREACH B GENERATE COUNT(A);
DUMP C;
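To run it, save the script to a file (the name count_rows.pig is hypothetical) and invoke Pig; this assumes the HBase client jars are available on Pig's classpath:

pig count_rows.pig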
03-13-2017
12:50 PM
1 Kudo
It's not a good approach to count rows via Hive; the Hive implementation relies on the HBase SerDe, and I don't know how robust it is. Please use the HBase-native utility instead:

hbase org.apache.hadoop.hbase.mapreduce.RowCounter $TABLE