Member since: 08-02-2016
Posts: 19
Kudos Received: 3
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3440 | 02-14-2017 04:43 AM
06-02-2017
07:10 PM
Dr. Breitweg, you'll need to make the change with Ambari rather than manually editing the config file. Please refer to the following page: https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-operations/content/set_rack_id_individual_hosts.html
03-08-2017
09:23 PM
This looks to be a precision thing. Postgres 8.2 (which HAWQ is loosely based on) stores timestamps with microsecond precision, whereas the from_unixtime() function expects the number in seconds, which explains why you're a few centuries in the future 🙂
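A quick way to see the unit mismatch from the shell (the epoch value below is made up for illustration; GNU date assumed):
us=1488999799000000                # hypothetical microsecond-precision epoch value
date -u -d @"$us"                  # treated as seconds, this lands far in the future
date -u -d @"$((us / 1000000))"    # divide by 1,000,000 first to get the intended date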
03-07-2017
06:16 PM
I had this issue as well; I modified the imports section of the topology_script to be compatible with both Python 2 and Python 3:

from __future__ import print_function
import sys, os

# Python 2's string module provides join(); fall back to str.join() on Python 3
try:
    from string import join
except ImportError:
    join = lambda s: " ".join(s)

# ConfigParser was renamed to configparser in Python 3
# (catch ImportError rather than ModuleNotFoundError, which only exists on Python 3.6+)
try:
    import ConfigParser
except ImportError:
    import configparser as ConfigParser
02-14-2017
04:46 PM
1 Kudo
Shikhar - I actually experienced the exact same issue; for me it was due to an incomplete PXF upgrade. Initially I noticed (on my system) that the PXF RPMs were at a different build version than HAWQ was - this was the first thing I fixed. I then explored a number of dead ends, ultimately noticing that the timestamps of the files in /var/pxf could not possibly be correct, given when I had upgraded HAWQ to a newer version. After I confirmed these files were created at runtime, I removed the entire directory and re-ran "service pxf-service init", which re-created that folder and its nested files from the correct RPM version. After doing that, all was well in PXF land. 🙂
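A minimal sketch of the version check and the re-initialisation steps described above, to be run (as root) on every HAWQ/PXF node:
rpm -qa | grep -i pxf          # confirm the installed PXF RPM build matches your HAWQ version
rm -rf /var/pxf                # remove the stale runtime directory
service pxf-service init       # re-create /var/pxf from the currently installed RPMs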
02-14-2017
04:43 AM
Try removing /var/pxf, then run "service pxf-service init" on every HAWQ/PXF host
02-13-2017
09:23 PM
> 5. Also see whether you need to set pxf_service_address to point to the hive metastore

Shouldn't this be pointed at the namenode host:port, or namenode host:51200?
10-17-2016
03:11 PM
1 Kudo
@Shikhar Agarwal - make sure the following kernel settings are in place and applied on each machine you wish to run HAWQ on. You may also need to follow the CLI installation guide (linked below); a sketch of how to apply the settings follows the list:

kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1281 65535
net.core.netdev_max_backlog = 200000
vm.overcommit_memory = 2
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152

http://hdb.docs.pivotal.io/201/hdb/install/install-cli.html#topic_eqn_fc4_15
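A minimal sketch of applying the settings (assuming they are appended to /etc/sysctl.conf on each host):
sysctl -p /etc/sysctl.conf                   # reload kernel parameters without rebooting
sysctl kernel.shmmax vm.overcommit_memory    # spot-check a couple of the values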
09-01-2016
11:16 PM
Alternatively, what are the limitations of out-of-stack version support for Falcon? The snapshot-based replication in Falcon 0.10 provides exactly the functionality I'm looking for, but I'm currently running on HDP 2.3 / 2.4.
09-01-2016
06:55 PM
Would you be able to provide an example of what this code change might look like in the existing FeedReplicator code?
09-01-2016
06:19 PM
Hello,
I'm working with Falcon using the built-in HDFS mirroring capabilities and would like to enable two distcp options in the workflow XML: the -atomic and -strategy flags. Below is my Oozie workflow with these two options commented out, as this approach was unsuccessful (for reference, the standalone distcp equivalent of what I'm after is sketched after the workflow). Is there a way to pass these in using a -D option, or would the FeedReplicator class need to be modified for this functionality?

<workflow-app xmlns='uri:oozie:workflow:0.3' name='falcon-dr-fs-workflow'>
<start to='dr-replication'/>
<!-- Replication action -->
<action name="dr-replication">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property> <!-- hadoop 2 parameter -->
<name>oozie.launcher.mapreduce.job.user.classpath.first</name>
<value>true</value>
</property>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>oozie.launcher.mapred.job.priority</name>
<value>${jobPriority}</value>
</property>
<property>
<name>oozie.use.system.libpath</name>
<value>true</value>
</property>
<property>
<name>oozie.action.sharelib.for.java</name>
<value>distcp</value>
</property>
<property>
<name>oozie.launcher.oozie.libpath</name>
<value>${wf:conf("falcon.libpath")}</value>
</property>
<property>
<name>oozie.launcher.mapreduce.job.hdfs-servers</name>
<value>${drSourceClusterFS},${drTargetClusterFS}</value>
</property>
</configuration>
<main-class>org.apache.falcon.replication.FeedReplicator</main-class>
<arg>-Dmapred.job.queue.name=${queueName}</arg>
<arg>-Dmapred.job.priority=${jobPriority}</arg>
<!--arg>-atomic</arg>
<arg>-strategy</arg>
<arg>dynamic</arg-->
<arg>-maxMaps</arg>
<arg>${distcpMaxMaps}</arg>
<arg>-mapBandwidth</arg>
<arg>${distcpMapBandwidth}</arg>
<arg>-sourcePaths</arg>
<arg>${drSourceDir}</arg>
<arg>-targetPath</arg>
<arg>${drTargetClusterFS}${drTargetDir}</arg>
<arg>-falconFeedStorageType</arg>
<arg>FILESYSTEM</arg>
<arg>-availabilityFlag</arg>
<arg>${availabilityFlag == 'NA' ? "NA" : availabilityFlag}</arg>
<arg>-counterLogDir</arg>
<arg>${logDir}/job-${nominalTime}/${srcClusterName == 'NA' ? '' : srcClusterName}</arg>
</java>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>
Workflow action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
</message>
</kill>
<end name="end"/>
</workflow-app>
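For reference, this is roughly the standalone distcp invocation I'm trying to reproduce through Falcon (paths, hosts, and values below are placeholders):
hadoop distcp \
  -Dmapred.job.queue.name=default \
  -atomic \
  -strategy dynamic \
  -m 10 \
  -bandwidth 100 \
  hdfs://source-nn:8020/data/source \
  hdfs://target-nn:8020/data/target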
Labels:
- Apache Falcon
- Apache Hadoop
- Apache Oozie
08-22-2016
04:58 PM
Thank you for the suggestion. distcp all on its own runs fine on these sandboxes - it seems to be related to the number of containers required for a Falcon-based distcp replication: one top-level app master container, one distcp app master container, and then some number of worker containers to perform the actual replication. I believe this is resolved. Thank you again!
08-18-2016
09:58 PM
Falcon is going to drive me to an early grave. I've been able to create the Falcon replication job (hdfs-replication-workflow) and define both the source and target clusters it uses. The process seems to start up in Falcon, two Oozie jobs also start up (see below), and some YARN applications appear (see below), but I can't seem to get the distcp portion to actually move data. I'm not sure whether this is related to being in (two) sandbox environments, but there isn't an obvious way to debug what's happening... most things either say "COMPLETED" or stay "RUNNING" forever - nothing noticeable is failing or creating helpful logs. Not sure if that is specific enough, but Falcon itself is a can of worms that I don't totally understand yet... Oozie and YARN and dispersed logs... sorry, I'll stop complaining.

Oozie jobs:
Job ID | App Name | Status | User | Group | Started | Ended
0000067-160817182129092-oozie-oozi-W | falcon-dr-fs-workflow | RUNNING | gpadmin | - | 2016-08-18 21:55 GMT | -
0000066-160817182129092-oozie-oozi-W | FALCON_PROCESS_DEFAULT_drSyncTest | RUNNING | gpadmin | - | 2016-08-18 21:55 GMT | -

YARN applications (states SUBMITTED/ACCEPTED/RUNNING; total: 2):
Application-Id | Application-Name | Application-Type | User | Queue | State | Final-State | Progress | Tracking-URL
application_1471553333960_0013 | oozie:launcher:T=java:W=falcon-dr-fs-workflow:A=dr-replication:ID=0000071-160817182129092-oozie-oozi-W | MAPREDUCE | gpadmin | default | RUNNING | UNDEFINED | 5% | http://sandbox2.hortonworks.com:18800
application_1471553333960_0014 | distcp: oozie:action:T=java:W=falcon-dr-fs-workflow:A=dr-replication:ID=0000071-160817182129092-oozie-oozi-W | MAPREDUCE | gpadmin | default | ACCEPTED | UNDEFINED | 0% | N/A
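For reference, the commands I've been using to poke at the jobs above (the Oozie endpoint/port here is a guess at the default install):
oozie job -oozie http://sandbox2.hortonworks.com:11000/oozie -info 0000067-160817182129092-oozie-oozi-W
yarn application -status application_1471553333960_0014
yarn logs -applicationId application_1471553333960_0013    # only available once the application finishes (requires log aggregation)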
08-04-2016
03:50 PM
Hi @sbhat - this is certainly helpful, thank you for the reference!
08-02-2016
05:32 PM
I'm trying to migrate from Pivotal Hadoop to HDP without re-deploying the cluster. I've managed to get Ambari 2.x to migrate the Postgres database schema and assume control of the cluster services, but when I navigate to the Ambari Stack/Version page (main/admin/stack/versions) the expected "version tiles" aren't listed. I've tried referencing a working HDP 2.4 cluster to see if there are database entries missing that are used to populate these fields, but I haven't been able to track them down. In the manage versions page (views/ADMIN_VIEW/2.2.2.0/INSTANCE/#/stackVersions) I do see both the 2.2.9 and 2.4.2 repositories registered correctly. Any ideas? Thanks.
Labels:
- Apache Ambari