Member since: 03-16-2016
Posts: 707
Kudos Received: 1751
Solutions: 203

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1027 | 09-21-2018 09:54 PM
 | 1426 | 03-31-2018 03:59 AM
 | 435 | 03-31-2018 03:55 AM
 | 556 | 03-31-2018 03:31 AM
 | 1134 | 03-27-2018 03:46 PM
11-13-2018
03:15 AM
It seems that the template is malformed and it has nothing to do with NiFi 1.8. The same issue occurs with NiFi 1.3, which was the version used in the demo. I'll close this question.
11-13-2018
12:59 AM
I'm trying to re-use this demo: https://community.hortonworks.com/articles/121794/running-sql-on-flowfiles-using-queryrecord-process.html, with Apache NiFi 1.8. I edited the template to account for NiFi 1.8 and uploaded it, but when I try to add it to the workspace I get the following error: org.apache.nifi.processors.attributes.UpdateAttribute is not known to this NiFi instance. Any ideas?
09-21-2018
09:54 PM
1 Kudo
@Sami Ahmad 1) NiFi is definitely an option. If CDC is important to you, be aware that only the MySQL CDC processor is supported. Unfortunately, CDC processors for other databases are not available due to licensing issues with vendors like Oracle. So, if you use NiFi, you need to write your queries smartly to catch the changes and limit the impact on the source databases. 2) Another good option is Attunity, but that comes at a higher cost. 3) I have seen others using Spark for your use case. 4) I am doing some guesswork here, because I don't have the details from your 4 previous questions. As I recall, incremental data import is supported via a sqoop job and not directly via sqoop import. I see that you did it as a job, but could it be a typo in your syntax? I see a space between "dash-dash" and "import" 🙂 Joking, but you may want to check; I have seen strange messages out of Sqoop. https://community.hortonworks.com/questions/131541/is-sqoop-incremental-load-possible-for-hive-orc-ta.html Could you point to the URLs of the previous questions or clarify/add more info to the current one? 5) Regarding "Also i use "--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'" for non incremental loads and i was told that soon this would be working , still not ?", I need a little bit more context, or maybe this was already another question you submitted and I can check.
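For what it's worth, a saved incremental import job usually looks something like the sketch below; the connection string, credentials, table, and column names are placeholders, not taken from your setup:

```
# minimal sketch of a Sqoop incremental import defined as a saved job
# (connection, credentials, table, and column names are placeholders)
sqoop job --create orders_incremental -- import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl --password-file /user/etl/.db-password \
  --table orders \
  --incremental append --check-column order_id --last-value 0 \
  --target-dir /data/raw/orders

# each run continues from the last-value stored in the job metastore
sqoop job --exec orders_incremental
```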
09-21-2018
09:09 PM
@Sami Ahmad I doubt this is a conspiracy :). I found one instance of those 4 previous attempts: https://community.hortonworks.com/questions/131541/is-sqoop-incremental-load-possible-for-hive-orc-ta.html It would have been good if you could have referenced the URLs of the previous 4 attempts so we could get some historical information. It is not clear which version of Hive you use, 1.2.1 or 2.1.0, or whether you created the target Hive table as transactional, but anyway, long story short, the following is the practice that Hortonworks recommends on HDP 2.6.0, assuming that is your HDP version since your question was tagged with it: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_data-access/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html
07-11-2018
03:22 PM
1 Kudo
@Amit Ranjan Falcon has been deprecated and replaced by more comprehensive services included in DataPlane Services: https://docs.hortonworks.com/HDPDocuments/DPS1/DPS-1.1.0/index.html I agree with your assessment of those three tools. However, I would like to point out that NiFi provides reporting tasks, and I have seen enterprises enabling those reporting tasks and building custom dashboards (Grafana). https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_user-guide/content/Reporting_Tasks.html https://pierrevillard.com/2017/05/16/monitoring-nifi-ambari-grafana/ Keep in mind that NiFi can execute Spark jobs interactively via Livy, and that it can start flows on a schedule or on an event. Each flow can be considered a job and can be monitored via a reporting task, so if you build a dashboard monitoring all the flows, you get that operational monitoring per "job". Additionally, remember that with NiFi you get lineage and data governance integration with Atlas, not to mention integrated security via Ranger. DataPlane Services, specifically Data Steward Studio, will provide that enterprise-level data governance combining information from multiple clusters. See: https://docs.hortonworks.com/HDPDocuments/DSS1/DSS-1.0.0/getting-started/content/dss_data_steward_studio_overview.html
05-22-2018
02:46 PM
@Milind More It is not part of the sandbox. Read more here: https://hortonworks.com/products/data-services/ Documentation and installation instructions: https://docs.hortonworks.com/HDPDocuments/DPS1/DPS-1.1.0/index.html If this response was helpful, please vote and accept it as the best answer.
05-07-2018
06:30 PM
@Milind More DataPlane Services / Data Lifecycle Manager: https://hortonworks.com/products/data-services/#dlm DLM supports HDFS and Hive for now, but it will soon support HBase and Kafka as well. DLM is a successor of Falcon, but it goes beyond what Falcon was able to do. It will also support auto-tiering, among other features, so you can replicate a subset of data from your on-premises cluster to object stores like AWS S3.
04-26-2018
05:48 PM
2 Kudos
@Tony Zhang 10500 is the default (unsecured) port for Interactive Query (LLAP). If you don't use LLAP, your port should be 10000, but it is even better to use ZooKeeper service discovery:
beeline -i testbench.settings -u "jdbc:hive2://localhost:2181/tpcds_bin_partitioned_orc_30;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
You may also want to check that ZooKeeper port 2181 is open.
04-15-2018
08:53 PM
13 Kudos
Abstract

The objective of this article is to cover key design and deployment considerations for a highly available Apache Kafka service.

Introduction

Through its distributed design, Kafka supports a large number of permanent or ad-hoc consumers, is highly available and resilient to node failures, and supports automatic recovery. These characteristics make Kafka an ideal fit for communication and integration between components of large-scale data systems. However, Kafka's application-level resilience is not enough to have a true HA system. Consumers, producers, and the network design also need to be HA. ZooKeeper nodes and brokers need to be able to communicate among themselves, and producers and consumers need to be able to reach the Kafka API. A fast car can reach its maximum speed only on a good track; a swamp is a swamp for a fast or a slow car.
Kafka Design

Kafka utilizes ZooKeeper for storing metadata about the brokers, topics, and partitions. Writes to ZooKeeper are only performed on changes to the membership of consumer groups or on changes to the Kafka cluster itself. This amount of traffic is minimal, and it does not justify the use of a dedicated ZooKeeper ensemble for a single Kafka cluster. Many deployments will use a single ZooKeeper ensemble for multiple Kafka clusters.

Prior to Apache Kafka 0.9.0.0, consumers, in addition to the brokers, utilized ZooKeeper to directly store information about the composition of the consumer group, what topics it was consuming, and to periodically commit offsets for each partition being consumed (to enable failover between consumers in the group). With version 0.9.0.0, a new consumer interface was introduced which allows this to be managed directly with the Kafka brokers. However, there is a concern with consumers and ZooKeeper under certain configurations. Consumers have a configurable choice to use either ZooKeeper or the Kafka brokers for committing offsets, and they can also configure the interval between commits. If the consumer uses ZooKeeper for offsets, each consumer will perform a ZooKeeper write at every interval for every partition it consumes. A reasonable interval for offset commits is 1 minute, as this is the period of time over which a consumer group will read duplicate messages in the case of a consumer failure. These commits can be a significant amount of ZooKeeper traffic, especially in a cluster with many consumers, and will need to be taken into account. It may be necessary to use a longer commit interval if the ZooKeeper ensemble is not able to handle the traffic. However, it is recommended that consumers using the latest Kafka libraries use the Kafka brokers for committing offsets, removing the dependency on ZooKeeper.

Outside of using a single ensemble for multiple Kafka clusters, it is not recommended to share the ensemble with other applications, if it can be avoided. Kafka is sensitive to ZooKeeper latency and timeouts, and an interruption in communications with the ensemble will cause the brokers to behave unpredictably. This can easily cause multiple brokers to go offline at the same time, should they lose ZooKeeper connections, which will result in offline partitions. It also puts stress on the cluster controller, which can show up as subtle errors long after the interruption has passed, such as when trying to perform a controlled shutdown of a broker. Other applications that can put stress on the ZooKeeper ensemble, either through heavy usage or improper operations, should be segregated to their own ensemble.
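As a concrete illustration, here is a minimal sketch of new-consumer (0.9+) settings that commit offsets to the brokers rather than to ZooKeeper; the broker host names and group id are placeholders:

```
# consumer.properties (sketch); host names and group id are placeholders
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
group.id=example-consumer-group
# commit offsets to Kafka automatically, once per minute as discussed above
enable.auto.commit=true
auto.commit.interval.ms=60000
```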
Infrastructure and Network Design Challenges

The application is the last layer, on top of the other six layers of the OSI stack, including the network, data link, and physical layers. A power source that is not redundant can take out the rack switch, leaving none of the servers in the rack accessible. There are at least two issues with that implementation: a non-redundant power source for the switch, and lack of redundancy for the actual rack switch. Add to that a single communication path from consumers and producers, and exactly that path is down, so your mission-critical system does not deliver the service that is maybe your main stream of revenue. Add to that that rack awareness is not implemented and all topic partitions reside in the infrastructure hosted by the exact rack that failed due to a bad switch or a bad power source. Did it happen to you? What was the price you had to pay when that bad day came?
Network Design and Deployment Considerations

Implementing a resilient Kafka cluster is similar to implementing a resilient HDFS cluster. Kafka or HDFS reliability is about data/application reliability; at most it compensates for limited server failure, not for network failure, especially when the infrastructure and network have many single points of failure coupled with a bad deployment of Kafka ZooKeepers, brokers, or replicated topics. Dual NICs, dedicated core switches, redundant top-of-rack switches, and balancing replicas across racks are common elements of a good network design for HA. Kafka, like HDFS, supports rack awareness. A single point of failure should not impact more than one ZooKeeper node, one broker, or one partition of a topic. ZooKeeper nodes and brokers should have HA communication: if one path is down, another path is used. They should ideally be distributed in different racks. Network redundancy needs to provide an alternate access path to the brokers for producers and consumers when a failure arises. Also, as a good practice, brokers should be distributed across multiple racks.
Configure Network Correctly

Network configuration with Kafka is similar to other distributed systems, with a few caveats mentioned below. Infrastructure, whether on-premises or cloud, offers a variety of different IP and DNS options. Choose an option that keeps inter-broker network traffic on the private subnet and allows clients to connect to the brokers. Inter-broker and client communication use the same network interface and port.

When a broker is started, it registers its hostname with ZooKeeper. The producer since Kafka 0.8.1 and the consumer since 0.9.0 are configured with a bootstrapped (or "starting") list of Kafka brokers. Prior versions were configured with ZooKeeper. In both cases, the client makes a request (either to a broker in the bootstrap list, or to ZooKeeper) to fetch all broker hostnames and begin interacting with the cluster.

Depending on how the operating system's hostname and network are configured, brokers on server instances may register hostnames with ZooKeeper that aren't reachable by clients. The purpose of advertised.listeners is to address exactly this problem; the configured protocol, hostname, and port in advertised.listeners are registered with ZooKeeper instead of the operating system's hostname.
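A minimal server.properties sketch of the listener settings described above; the interface and host name are placeholders:

```
# server.properties (sketch); host name is a placeholder
listeners=PLAINTEXT://0.0.0.0:9092
# the address clients will receive in metadata and use to reach this broker
advertised.listeners=PLAINTEXT://broker1.internal.example.com:9092
```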
In a multi-datacenter architecture, careful consideration has to be given to MirrorMaker. Under the covers, MirrorMaker is simply a consumer and a producer joined together. If MirrorMaker is configured to consume from a static IP, the single broker tied to that static IP will be reachable, but the other brokers in the source cluster won't be. MirrorMaker needs access to all brokers in the source and destination data centers, which in most cases is best implemented with a VPN between data centers.
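For reference, a minimal MirrorMaker invocation sketch; the config file names and topic pattern are placeholders:

```
# sketch; consumer/producer config file names and the topic whitelist are placeholders
bin/kafka-mirror-maker.sh \
  --consumer.config source-cluster.consumer.properties \
  --producer.config target-cluster.producer.properties \
  --num.streams 2 \
  --whitelist "replicated.*"
```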
Client service discovery can be implemented in a number of different ways. One option is to use HAProxy on each client machine, proxying localhost requests to an available broker. Synapse works well for this. Another option is to use a load balancer appliance. In this configuration, ensure the load balancer is not public to the internet. Sessions and stickiness do not need to be configured because Kafka clients only make a request to the load balancer at startup. A health check can be a ping or a telnet.
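To illustrate the first option, here is a minimal HAProxy sketch for proxying localhost requests to the brokers; the broker host names are placeholders:

```
# haproxy.cfg (sketch); broker host names are placeholders
listen kafka
    bind 127.0.0.1:9092
    mode tcp
    option tcplog
    server broker1 broker1.internal.example.com:9092 check
    server broker2 broker2.internal.example.com:9092 check
    server broker3 broker3.internal.example.com:9092 check
```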
Distribute Kafka brokers across multiple racks

Kafka was designed to run within a single data center. As such, we discourage distributing brokers in a single cluster across multiple data centers. However, we do recommend "stretching" brokers in a single cluster across multiple racks, whether in the same private network or over multiple private networks within the same data center. Each enterprise is responsible for implementing resilient systems and weighing these considerations. A multi-rack cluster offers stronger fault tolerance because a failed rack won't cause Kafka downtime.

However, in this configuration, prior to Kafka 0.10, you must assign partition replicas manually to ensure that the replicas for each partition are spread across racks. Replicas can be assigned manually either when a topic is created, or by using the kafka-reassign-partitions command line tool. Kafka 0.10 or later supports rack awareness, which makes spreading replicas across racks much easier to configure.
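A minimal sketch of the rack-awareness configuration on Kafka 0.10+; the rack id, ZooKeeper address, and topic name are placeholders:

```
# in each broker's server.properties (sketch); rack id is a placeholder
broker.rack=rack-1

# replicas of newly created topics are then spread across racks, e.g.:
bin/kafka-topics.sh --create --zookeeper zk1:2181 \
  --topic events --partitions 6 --replication-factor 3
```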
At a minimum, for the HA of your Kafka-based service:

- Use dedicated "Top of Rack" (TOR) switches (they can be shared with the Hadoop cluster).
- Use dedicated core switching blades or switches.
- If deployed to a physical environment, make certain to place the cluster in a VLAN.
- For the switches communicating between the racks, establish HA connections.
- Implement the Kafka rack-awareness configuration. Note that it does not apply retroactively to existing ZooKeepers, brokers, or topics.
- Test your infrastructure periodically for resiliency and for the impact on Kafka service availability, e.g. disconnect one rack switch impacting a single broker. If the network design and topic replication were implemented correctly, producers and consumers should be able to work as usual; the only acceptable impact should be due to reduced capacity (producers may accumulate a processing lag or consumers may see some performance impact), but everything else should be 100% functional.
- Implement continuous monitoring of producers and consumers to detect failure events, with alerts and possibly automated corrective actions (a consumer-lag check sketch follows this list).
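As referenced in the last item above, a minimal sketch of a consumer-lag check that could feed such monitoring; the bootstrap server and consumer group name are placeholders:

```
# sketch; bootstrap server and consumer group are placeholders
bin/kafka-consumer-groups.sh --bootstrap-server broker1:9092 \
  --describe --group example-consumer-group
# the per-partition LAG column can be scraped and alerted on
```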
See below a simplified example of a logical architecture for HA. The diagram shows an HA implementation for an enterprise that uses 5 racks, 5 ZooKeeper nodes distributed across multiple racks, multiple brokers distributed across those 5 racks, replicated topics, and HA producers and consumers. This is similar to an HDFS HA network design.
Distribute ZooKeeper nodes across multiple racks

ZooKeeper should be distributed across multiple racks as well, to increase fault tolerance. To tolerate a rack failure, ZooKeeper must be running in at least three different racks. Obviously, a private network can also fail as a unit. An enterprise can place ZooKeeper nodes in separate networks, but that comes at the price of latency; it is a conscious trade-off between performance and reliability, and separating them across racks is a good compromise. In a configuration where three ZooKeeper nodes are running in two racks, if the rack with two ZooKeeper nodes fails, ZooKeeper will not have quorum and will not be available.
Monitor broker performance and terminate poorly performing brokers

Kafka broker performance can decrease unexpectedly over time for unknown reasons. It is a good practice to terminate and replace a broker if, for example, the 99th percentile of produce/fetch request latency is higher than is tolerable for your application.
Datacenter Layout

For development systems, the physical location of the Kafka brokers within a datacenter is not as much of a concern, as there is not as severe an impact if the cluster is partially or completely unavailable for short periods of time. When serving production traffic, however, downtime means dollars lost, whether through loss of services to users or loss of telemetry on what the users are doing. This is when it becomes critical to configure replication within the Kafka cluster, and it is also when it is important to consider the physical location of brokers in their racks in the datacenter. If a multi-rack deployment model and rack-awareness configuration were not implemented prior to deploying Kafka, expensive maintenance to move servers around may be needed. Without rack awareness, the Kafka broker cannot take rack placement into account when assigning new partitions to brokers, so everything created before rack awareness was implemented remains brittle to failure. The broker cannot account for two brokers being located in the same physical rack, or in the same network, and therefore can easily assign all replicas for a partition to brokers that share the same power and network connections in the same rack. Should that rack have a failure, these partitions would be offline and inaccessible to clients. In addition, it can result in additional data loss on recovery due to an unclean leader election. The best practice is to have each Kafka broker in a cluster installed in a different rack, or at the very least not sharing single points of failure for infrastructure services such as power and network. This typically means at least deploying the servers that will run brokers with dual power connections (to two different circuits) and dual network switches (with a bonded interface on the servers themselves to fail over seamlessly). Even with dual connections, there is a benefit to having brokers in completely separate racks. From time to time, it may be necessary to perform physical maintenance on a rack or cabinet that requires it to be offline (such as moving servers around, or rewiring power connections).
Other Good Practices

I am sure I missed other good practices, so here are a few more:

- Produce to a replicated topic.
- Consume from a replicated topic (consumers must be in the same consumer group).
- Each partition gets assigned a consumer; you need to have more partitions than consumers.
- Use resilient producers that spill data to local disk until Kafka is available.

Other Useful References

Mirroring data between clusters: http://kafka.apache.org/documentation.html#basic_ops_mirror_maker
Data centers: http://kafka.apache.org/documentation.html#datacenters

Thanks

I'd like to thank all Apache Kafka contributors and the community that drives the innovation, as well as everyone who contributes to improving the documentation and publications on how to design and deploy Kafka as a service in HA mode.
03-31-2018
04:03 AM
1 Kudo
@heta desai Go to http://AtlasHostName:21000/login.jsp Replace AtlasHostName with the host name for your Atlas.
03-31-2018
03:59 AM
2 Kudos
@Timothy Spann Reproducible in a brand new HDF 3.1 cluster using Ambari 2.6.1.0. It happened to me too, exactly the same way. I upgraded Ambari to the latest version and that fixed it for me.
03-31-2018
03:55 AM
1 Kudo
@Kalyan Das No. You don't have to.
03-31-2018
03:52 AM
1 Kudo
@hema moger Use Ambari to install Hortonworks Data Platform (HDP) or Hortonworks DataFlow (HDF). Software download: https://hortonworks.com/downloads/ Documentation: docs.hortonworks.com
03-31-2018
03:44 AM
2 Kudos
@Bhushan Kandalkar It seems that you are looking to mirror to a previous release. That is possible because 0.10 is backward compatible with 0.9.0. There is no special setup; just use MirrorMaker 0.10.1, as if you were mirroring 0.10.1 to 0.10.1. Good references: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_kafka-component-guide/content/running-mirrormaker.html https://community.hortonworks.com/articles/79891/kafka-mirror-maker-best-practices.html ++++ If helpful, please vote and accept the best answer.
03-31-2018
03:31 AM
3 Kudos
@Arvind Ramugade SmartSense is not quoted separately. It is part of HDP or HDF support, which is quoted mostly per node. Please validate your cost understanding with sales: https://hortonworks.com/contact-sales/
03-31-2018
03:14 AM
3 Kudos
@Paula DiTallo Hive is part of the HDP sandbox. However, you have HDFS and HBase as part of the HDF sandbox. To fully use these services, you must allocate more memory to the sandbox's virtual machine or turn off existing services. If you want these services to start automatically, turn off maintenance mode from within the Ambari dashboard. To learn how to connect from HDF to HDP, check this out: https://www.simonellistonball.com/technology/nifi-sandbox-hdfs-hdp/ +++ Please don't forget to vote for a helpful answer and accept the best answer.
03-31-2018
12:10 AM
4 Kudos
@Saikrishna Tarapareddy You can use their API to download the data from those tables. Those examples show how to select the data. However, you may be dealing with a lot of data. You may want to extract it from BigQuery, store it in a Google Cloud Storage (GCS) bucket, and connect NiFi to GCS, which is supported nicely with the GCS processors to list, fetch, put, and delete from GCS. That is the most efficient way. Look at this reference to see how to extract the data: https://cloud.google.com/bigquery/docs/exporting-data You can schedule a job to extract the data to a GCS bucket, and NiFi will just pick it up.
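As an illustration of the export step, the bq command line tool can write a table out to GCS; the dataset, table, and bucket names below are placeholders:

```
# sketch; dataset, table, and bucket names are placeholders
bq extract --destination_format=NEWLINE_DELIMITED_JSON \
  'mydataset.mytable' gs://my-bucket/exports/mytable-*.json
```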
03-31-2018
12:03 AM
4 Kudos
@Leszek Leszczynski This is a known bug with the version you use. Let's wait for Richard to advise.
03-31-2018
12:01 AM
4 Kudos
@rdoktorics Richard, Cloudbreak 1.16.5 is the version that is presented on the hortonworks.com website, in the Software Download / HDP section. However, the documentation shows Cloudbreak 2.4 (see https://docs.hortonworks.com/). Is that right? Where should Leszek go to download Cloudbreak 2.4 for his HDP 2.6.4 installation requirement?
03-30-2018
07:00 PM
4 Kudos
@Leszek Leszczynski It could be a bug. The reason is well explained in this article; however, in your case, the data node services are not present as expected. https://community.hortonworks.com/articles/12981/impact-of-hdfs-using-cloudbreak-scale-dwon.html I'll escalate the question to the Cloudbreak team.
03-30-2018
06:48 PM
5 Kudos
@Max Gluzman I can't think of another cause: when the VM is shut down by brute force, some services need an extremely long time to recover on restart. The sandbox runs with all services. I suggest a graceful shutdown of all services before shutting down the VM. To start/stop all services, please see: https://community.hortonworks.com/questions/88211/rest-api-for-stopping-and-starting-all-services-on.html I also suggest keeping up and running only the services that you need. That way, more resources are available for what you need, and the stop and restart will take less time. +++ If this helped, please vote it up. Also, if any answer was the best, please accept it.
03-30-2018
06:44 PM
5 Kudos
@Rudy Hartono It is difficult to get to the root cause from the error message that you have provided. Can you share the detailed error message from the HiveServer2 log and the application log? You can generate the application log by running 'yarn logs -applicationId <application_id>'. Also, could you provide the DDL statement for your table?
03-30-2018
06:34 PM
5 Kudos
I am not clear on the benefit of merging all these XML files into a single huge XML file that seems to be on the order of tens of GB. NiFi has a default limit of 1 GB per flowfile, and while that can be changed, tens of GB is a huge single file. What happens with this file eventually? What efficient method is used to ingest such a file instead of multiple files? Any tool I know ingests multiple, properly sized files better, since parallelism can be properly achieved. XML is not the most optimal format for ingesting a large file. I'd love to hear more about the reasoning behind this one big file to be ingested, and why it has to remain XML rather than a more efficient format; NiFi could have converted the XML to something else. An alternative to NiFi for this task would be to use Spark with an XML processing framework.
03-30-2018
06:32 PM
5 Kudos
@Arun Yadav Apache NiFi (part of HDF) allows you to build a pipeline capable of performing all the actions you need. You would build a flow that for the most part will look like this: screen-shot-2018-03-30-at-22942-pm.png Here are some good references: https://pierrevillard.com/2017/09/07/xml-data-processing-with-apache-nifi/ https://community.hortonworks.com/articles/65400/xml-processing-encoding-validation-parsing-splitti.html With the latest versions, you can also benefit from the use of record-based processors. +++ If this was helpful, please vote. Also, accept the best answer.
03-30-2018
06:24 PM
5 Kudos
@Alex Woolford Look at: https://cwiki.apache.org/confluence/display/KAFKA/KIP-103%3A+Separation+of+Internal+and+External+traffic https://issues.apache.org/jira/browse/KAFKA-4565 If helpful, please vote and accept best answer.
03-28-2018
01:32 PM
3 Kudos
@Saikrishna Tarapareddy The community processor mentioned by Tim is a good example of how to write a custom processor. It is limited to the Put action and is quite old; you would have to rebuild it using more up-to-date libraries. Community processors are not supported by Hortonworks.
03-28-2018
04:22 AM
3 Kudos
@Mushtaq Rizvi Yes. Please follow the instructions on how to add HDF components to an existing HDP 2.6.1 cluster: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1/bk_installing-hdf-on-hdp/content/upgrading_ambari.html This is not the latest HDF, but it is compatible with HDP 2.6.1, and I was pretty happy with its stability and recommend it. You would be able to add Apache NiFi 1.5, but also Schema Registry. NiFi Registry is part of the latest HDF 3.1.x; however, you would have to install it in a separate cluster, and it is not worth the effort for what you are trying to achieve right now. I would proceed with the HDP upgrade when you are ready for HDF 3.2, which will probably be launched in the next couple of months. In case you can't add another node to your cluster for NiFi, try to use one of the nodes that has low CPU utilization and some disk available for NiFi lineage data storage. It depends on how much lineage you want to preserve, but you should probably be fine with several tens of GB for starters. If this response helped, please vote and accept the answer.
03-28-2018
04:13 AM
5 Kudos
@Saikrishna Tarapareddy Unfortunately, there is no specialized processor to connect to Google BigQuery and execute queries. There have been some discussions about a set of new processors to support various Google Cloud services, but those processors are still to be planned into a release. Until then you can use the ExecuteScript processor. Here is an example of how to write a script using Python: https://cloud.google.com/bigquery/create-simple-app-api#bigquery-simple-app-print-result-python At https://cloud.google.com/bigquery/create-simple-app-api you can see other examples using other languages also supported by the ExecuteScript processor. Obviously, there is always the possibility of developing your own processor leveraging the Java example provided by the Google docs. Example of how to build a NiFi custom processor: https://community.hortonworks.com/articles/4318/build-custom-nifi-processor.html If this response reasonably addressed your question, please vote and accept the answer.
03-27-2018
09:39 PM
4 Kudos
@Binu Varghese Unfortunately, this is not built-in. Just to get you started, here are a few good references that could help you build one: https://community.hortonworks.com/articles/38149/how-to-create-and-register-custom-ambari-alerts.html https://community.hortonworks.com/articles/138117/how-to-setup-a-custom-ambari-metrics-alert.html https://cwiki.apache.org/confluence/display/AMBARI/3.+Monitoring+Scenarios#id-3.MonitoringScenarios-alerts Anyhow, WebHDFS is the REST API you need to use: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html https://dzone.com/articles/hadoop-rest-api-webhdfs
03-27-2018
09:37 PM
3 Kudos
@Christian Lunesa As you probably know, the 500 Internal Server Error is a very general HTTP status code that means something has gone wrong on the website's server, but the server could not be more specific about what the exact problem is. You need to provide more information; there could be many reasons. 1) Does it happen with other tables? 2) Is your cluster kerberized? 3) Did you check the Ambari server log for more details?