Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43
My Accepted Solutions
Views | Posted |
---|---|
1172 | 01-16-2018 03:38 PM |
6139 | 11-13-2017 05:45 PM |
3032 | 11-13-2017 12:30 AM |
1518 | 10-27-2017 03:58 AM |
28427 | 10-19-2017 03:17 AM |
03-15-2017
08:26 PM
1 Kudo
An Oozie sub-workflow is failing with the message "LauncherMapper died, check Hadoop LOG for job". On a subsequent attempt, the workflow completes successfully. Various Oozie jobs are failing randomly at any sub-workflow and work fine on a subsequent attempt. I tried checking the Hadoop logs, but when the failure happens there are no MapReduce or YARN logs for that job/application. Any hints are highly appreciated.
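For reference, a rough sketch of the kind of commands used to look for these logs; the workflow and application IDs below are placeholders:

```
# Placeholders: replace the IDs with the failing workflow's own identifiers.
# (Assumes OOZIE_URL is set; otherwise add -oozie http://<oozie-host>:11000/oozie)
oozie job -info 0000123-170315000000000-oozie-oozi-W      # lists actions and their external (launcher) job IDs
# Aggregated YARN logs for the launcher, if log aggregation is enabled and any logs exist:
yarn logs -applicationId application_1489500000000_0042
```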
Labels:
- Apache Oozie
03-14-2017
12:05 AM
1 Kudo
@Viswa - Kindly accept the answer if it has helped you.
03-09-2017
09:28 PM
1 Kudo
@Viswa - You are absolutely correct about this.
03-09-2017
09:04 PM
7 Kudos
@Viswa dfs.namenode.checkpoint.period is the number of seconds between two periodic checkpoints. dfs.namenode.checkpoint.txns means the standby will create a checkpoint of the namespace every 'dfs.namenode.checkpoint.txns' transactions, regardless of whether 'dfs.namenode.checkpoint.period' has expired.
A real-life analogy is how we take a car in for regular maintenance: dfs.namenode.checkpoint.period is like the number of months between services (say 6 months), and dfs.namenode.checkpoint.txns is like the number of miles driven (say 5,000 miles). You have to take your car in for service if it has been 6 months since the last service, even if you have driven fewer than 5,000 miles, OR if it has not yet been 6 months since the most recent service but you have already driven 5,000 miles.
Thus a new checkpoint is created when either the checkpoint period is reached or the number of unchecked transactions maxes out, whichever happens first.
This article gives a nice understanding of HDFS metadata directories and how the above two properties fit into the ecosystem.
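If you want to see what your cluster currently uses, here is a minimal sketch using hdfs getconf; the values in the comments are only the common defaults and may differ on your cluster:

```
# Print the effective checkpoint settings on this cluster.
hdfs getconf -confKey dfs.namenode.checkpoint.period   # commonly 3600 (seconds)
hdfs getconf -confKey dfs.namenode.checkpoint.txns     # commonly 1000000 (transactions)
```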
03-06-2017
06:26 PM
1 Kudo
As it turns out, the command to find the FQDN is hostname --fqdn. Even though I ignored all the warnings and proceeded to the next steps to add the node to the cluster, in the end it all worked well and the post-install test was successful. So I guess this warning can be ignored for the practice exam.
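For anyone hitting the same warning, a rough sketch of how the FQDN can be verified and, if needed, mapped; the IP address and domain below are placeholders:

```
# Check what the OS currently reports (run on the node itself).
hostname --fqdn          # in this case it returned just "node1"
cat /etc/hosts           # look for a line mapping the node's IP to its names

# If no fully qualified name is configured, one option is an /etc/hosts entry like
# the following (IP and domain are placeholders):
# 172.31.0.11  node1.example.com  node1
```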
03-06-2017
04:13 PM
I launched the HDPCA practice exam in AWS and there is a task to add 'node1'. While adding a new node using Ambari, I have to mention the FQDN. This detail is not mentioned in the exam task HTML file. How do I find the FQDN? P.S. I am able to ssh into node1. Update: I tried the following command to get the FQDN: hostname --fqdn. This command also gave me the response "node1". When I enter node1 as the FQDN in Ambari, I get the following warning message: "node1 is not a valid FQDN. Do you want to continue?" This warning message is confusing me regarding the FQDN. When I ignored the above warning, after install, the host check failed.
Labels:
- Apache Ambari
03-02-2017
07:08 PM
2 Kudos
@Viswa As per the exam objectives (https://hortonworks.com/services/training/certification/exam-objectives/#hdpca) there is no specific list of services that are in scope for the examination. As far as debugging/monitoring the installation is concerned, you may refer to this official documentation: http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.0.0/Ambari_Doc_Suite/ADS_v200.html#ref-556d8737-67b1-43da-8331-bccb6ff28ac6 The above link lists generic steps for every aspect of installation, right from stopping all services to selecting a service to install, configuring it, and monitoring its progress. It also shows an output log snapshot and an error log snapshot from ambari-agent for the service you are trying to install. The logs list error messages such as which command failed or whether the repo used is wrong, and you can derive hints from them to further your cause. Hope this helps.
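If you prefer to inspect the logs directly on the hosts, here is a rough sketch assuming the default Ambari log and data locations (paths may differ if they were customized during install):

```
# Agent-side install/command errors (default location on the target host).
tail -n 100 /var/log/ambari-agent/ambari-agent.log
# Server-side orchestration errors (default location on the Ambari server host).
tail -n 100 /var/log/ambari-server/ambari-server.log
# Per-command stdout/stderr files written by the agent for each install step:
ls /var/lib/ambari-agent/data/     # output-*.txt and errors-*.txt files
```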
02-28-2017
03:01 PM
1 Kudo
@Viswa You can check the following:
1. Check your security group rules. You need a security group rule that allows inbound traffic from your public IPv4 address on the proper port.
2. Check the route table for the subnet. You need a route that sends all traffic destined outside the VPC to the Internet gateway for the VPC.
3. Check the network access control list (ACL) for the subnet. The network ACLs must allow inbound and outbound traffic from your local IP address on the proper port. The default network ACL allows all inbound and outbound traffic.
4. If your computer is on a corporate network, ask your network administrator whether the internal firewall allows inbound and outbound traffic from your computer on port 22 (for Linux instances) or port 3389 (for Windows instances). If you have a firewall on your computer, verify that it allows the same traffic.
5. Check that your instance has a public IPv4 address.
For details on how to perform the above checks, see the AWS troubleshooting guide, which lists the steps for various known error messages (including the one you are getting): http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesConnecting.html
In my case the problem was that I was behind a corporate firewall, so I had to find my proxy server details and then configure the same proxy settings in PuTTY.
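As an illustration only, a few of these checks can also be run from the AWS CLI; the instance ID, security group ID, and public IP below are placeholders:

```
# Placeholders: replace the IDs and IP with your own values.
# 1. Inspect the security group's inbound rules (look for port 22 / 3389 from your IP).
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0

# 5. Confirm the instance has a public IPv4 address.
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].PublicIpAddress'

# Quick connectivity test to the SSH port from your machine (Linux instance).
nc -vz <public-ip> 22
```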
02-22-2017
11:03 AM
Yes, you do not necessarily have to use the sandbox. The article I shared was a tutorial and hence used a sandbox. In your case you can perform the required tasks in your own Hadoop cluster.
02-22-2017
08:43 AM
2 Kudos
@aishwarya srivastava Ideally you could adapt the NiFi Change Data Capture use case to extract data from MySQL, Oracle, MSSQL, and other traditional RDBMSs. You can refer to this article on HCC: https://community.hortonworks.com/articles/55422/change-data-capture-using-nifi.html Here is an overview:
When the RDBMS that manages the source data table supports it, turn on the table's CDC feature, which automatically creates in the background a dedicated CDC table containing all of the columns of the source data table, plus additional metadata columns that can be used to support downstream ETL processing. The RDBMS automatically detects new and changed records in the source data table and duplicates them into the dedicated CDC table. Against that dedicated CDC table, run a QueryDatabaseTable processor, which uses a SQL SELECT query to fetch the latest records written to the CDC table since the last time the processor executed successfully.
If the source data table has columns holding the timestamps of when each record was created or last updated, and if you do not have access to a Hadoop environment that supports Sqoop, you can still use NiFi to bulk-extract the records from the source data table in parallel streams. First, logically fragment the source data table into windows of time, such as a given month of a given year. For each window of time, create a corresponding QueryDatabaseTable processor. In this way you can easily execute the extract across N threads of a NiFi node (or on N NiFi nodes). Essentially, you create the first QueryDatabaseTable, clone it N-1 times, and simply edit the predicate expression of each SQL SELECT so that it fetches the source data table records created within the desired window of time (a sketch of such per-window predicates follows below).
If the source data table has the timestamp for update events as well, then clones of these bulk-extract QueryDatabaseTable processors can be slightly modified and used to grab ongoing updates to records, as well as new records, created within those windows of time. These ongoing CDC-type QueryDatabaseTable processors can be scheduled to execute based on the probability of update events for a given window of time. The update timestamp can then be used by NiFi to hand off individual CDC records to specific NiFi processors for routing, mediation, data transformation, data aggregation, and data egress (e.g., PutKafka).
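Purely as an illustration of what the per-window predicates could look like (the table, column, host, and user names here are hypothetical, assuming a MySQL source table with a created_at timestamp column), each window's query can be sanity-checked from the shell before being wired into its cloned QueryDatabaseTable processor:

```
# Hypothetical example: source table `orders` with a `created_at` timestamp column.
# Each statement corresponds to the predicate of one cloned QueryDatabaseTable processor.
mysql -h source-db.example.com -u etl_user -p -e \
  "SELECT COUNT(*) FROM orders WHERE created_at >= '2017-01-01' AND created_at < '2017-02-01';"
mysql -h source-db.example.com -u etl_user -p -e \
  "SELECT COUNT(*) FROM orders WHERE created_at >= '2017-02-01' AND created_at < '2017-03-01';"
```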