03-31-2022
03:25 PM
1 Kudo
Intro
Online shopping is on the rise as more of us stay at home and let our credit cards do the walking. Keeping pace with that trend is an unfortunate increase in credit card fraud.
It’s no surprise, really. According to Forbes, online fraud has been a growing problem for the past few years. And now, as consumers and businesses adapt to the worldwide pandemic and make more credit card transactions in the card-not-present (CNP) space, the resulting uptick in online shopping and e-commerce has opened up an even bigger playground for fraudsters to try out new tricks.
Fraud detection has long been a major challenge for financial services institutions, and artificial intelligence has enormous potential to detect and prevent financial fraud.
Therefore, we are starting a series of articles on this topic and on how we can use Cloudera tools to implement a complete credit card fraud detection solution. But first, let's begin with a simple way to implement it:
Keep It Simple
In this MVP, we will use Apache NiFi to ingest and transform simulated data from a public API, convert it into the format expected by our fraud detection algorithm, publish it to an Apache Kafka topic, and use Apache Flink's SQL console to run a simple fraud detection algorithm. All of this works even better with scalability, so the icing on the cake will be to move the ingest and transformation flow to Cloudera Data Flow Services on Kubernetes.
All of the components mentioned are available in CDF (Cloudera Data Flow) and CSA (Cloudera Streaming Analytics):
CLOUDERA DATA-IN-MOTION PLATFORM
Prerequisites
We will use CDP Public Cloud with the following CDF and CSA Data Hubs:
Data Hub: 7.2.14 - Flow Management Light Duty with Apache NiFi, Apache NiFi Registry
Data Hub: 7.2.14 - Streams Messaging Light Duty: Apache Kafka, Schema Registry, Streams Messaging Manager, Streams Replication Manager, Cruise Control
Data Hub: 7.2.14 - Streaming Analytics Light Duty with Apache Flink
1 - Data ingestion
Let's start by ingesting our data in NiFi. With the InvokeHTTP processor, we can collect data from the randomuser API.
A simple call to: https://randomuser.me/api/?nat=br will return something like this:
{
"results": [
{
"gender": "female",
"name": {
"title": "Miss",
"first": "Shirlei",
"last": "Freitas"
},
"location": {
"street": {
"number": 6133,
"name": "Rua Santa Luzia "
},
"city": "Belford Roxo",
"state": "Amapá",
"country": "Brazil",
"postcode": 88042,
"coordinates": {
"latitude": "78.0376",
"longitude": "74.2175"
},
"timezone": {
"offset": "+11:00",
"description": "Magadan, Solomon Islands, New Caledonia"
}
},
"email": "shirlei.freitas@example.com",
"login": {
"uuid": "d73f9a11-d61c-424d-8309-51d6d8e83a73",
"username": "organicfrog175",
"password": "1030",
"salt": "yhVkrYWm",
"md5": "2bf9beb695c663a0a83aa060f27629c0",
"sha1": "f4dfdef9f2d2a9d04a0622636d0851b5d000164a",
"sha256": "e0a96117182914b3fa7fef22829f6692607bd58eb012b8fee763e34b21acf043"
},
"dob": {
"date": "1991-09-06T08:31:08.082Z",
"age": 31
},
"registered": {
"date": "2009-06-26T00:02:49.893Z",
"age": 13
},
"phone": "(59) 5164-1997",
"cell": "(44) 4566-5655",
"id": {
"name": "",
"value": null
},
"picture": {
"large": "https://randomuser.me/api/portraits/women/82.jpg",
"medium": "https://randomuser.me/api/portraits/med/women/82.jpg",
"thumbnail": "https://randomuser.me/api/portraits/thumb/women/82.jpg"
},
"nat": "BR"
}
],
"info": {
"seed": "fad8d9259d3f2b0b",
"results": 1,
"page": 1,
"version": "1.3"
}
}
Using the JoltTransformJSON processor, we can easily transform the previous JSON into our own structure. We are going to use the following JOLT specification to clean and adjust our data:
[
{
"operation": "shift",
"spec": {
"results": {
"*": {
"login": { "username": "customer_id", "uuid": "account_number" },
"name": { "first": "name", "last": "lastname" },
"email": "email",
"gender": "gender",
"location": {
"street": { "number": "charge_amount" },
"country": "country",
"state": "state",
"city": "city",
"coordinates": {
"latitude": "lat",
"longitude": "lon"
}
},
"picture": { "large": "image" }
}
}
}
},
{
"operation": "default",
"spec": {
"center_inferred_lat": -5.0000,
"center_inferred_lon": -5.0000,
"max_inferred_distance": 0.0,
"max_inferred_amount": 0.0
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"lat": "=toDouble",
"lon": "=toDouble"
}
}
]
And our transformed output will be:
{
"customer_id" : "organicfrog175",
"account_number" : "d73f9a11-d61c-424d-8309-51d6d8e83a73",
"name" : "Shirlei",
"lastname" : "Freitas",
"email" : "shirlei.freitas@example.com",
"gender" : "female",
"charge_amount" : 6133,
"country" : "Brazil",
"state" : "Amapá",
"city" : "Belford Roxo",
"lat" : 78.0376,
"lon" : 74.2175,
"image" : "https://randomuser.me/api/portraits/women/82.jpg",
"max_inferred_distance" : 0.0,
"center_inferred_lat" : -5.0,
"center_inferred_lon" : -5.0,
"max_inferred_amount" : 0.0
}
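To make the three JOLT operations easier to follow, here is a minimal Python sketch of the same mapping. It is only an illustration of what the specification does, not part of the NiFi flow; the function name to_transaction is hypothetical, and the sketch assumes the response carries a single entry in "results", as in the sample above.

```python
def to_transaction(response: dict) -> dict:
    """Flatten one randomuser.me response into the transaction record shown above."""
    user = response["results"][0]  # assumption: exactly one result per call
    location = user["location"]

    # "shift": keep only the fields we need and rename them
    record = {
        "customer_id": user["login"]["username"],
        "account_number": user["login"]["uuid"],
        "name": user["name"]["first"],
        "lastname": user["name"]["last"],
        "email": user["email"],
        "gender": user["gender"],
        "charge_amount": location["street"]["number"],
        "country": location["country"],
        "state": location["state"],
        "city": location["city"],
        "lat": location["coordinates"]["latitude"],
        "lon": location["coordinates"]["longitude"],
        "image": user["picture"]["large"],
    }

    # "default": add these fields only if they are not already present
    defaults = {
        "center_inferred_lat": -5.0,
        "center_inferred_lon": -5.0,
        "max_inferred_distance": 0.0,
        "max_inferred_amount": 0.0,
    }
    for key, value in defaults.items():
        record.setdefault(key, value)

    # "modify-overwrite-beta" with =toDouble: the API returns coordinates as strings
    record["lat"] = float(record["lat"])
    record["lon"] = float(record["lon"])
    return record
```

Note that the street number is reused as charge_amount simply because it is a convenient numeric field in the simulated data.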
Now we can use the UpdateRecord processor to fill some fields with random numbers, and then publish our JSON data to Kafka using the PublishKafka2RecordCDP processor.
UpdateRecord Processor
PublishKafka2RecordCDP Processor
(Pay attention to the Kafka broker variables, which must be filled in according to your Kafka cluster endpoints.)
In the end, our NiFi flow will be something like this:
(You can download this flow definition attached to this article)
2 - Data Buffering
In the Kafka cluster, we can create a new Kafka topic just by clicking the "Add new" button in SMM (Streams Messaging Manager). I've created skilltransactions as an example.
Once the NiFi flow and the Kafka topic are created, it is time to start the flow and watch our data arrive in the Kafka topic. You can also use the data explorer icons to see all the data ingested so far.
3 - Streaming SQL Analytics
Apache Flink is an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation. Flink provides a high-throughput, low-latency streaming engine as well as support for event-time processing and state management.
Flink's Table API is a SQL-like expression language for relational stream and batch processing that can be embedded in Flink's Java and Scala DataSet and DataStream APIs. The Table API and SQL interface operate on a relational Table abstraction. Tables can be created from external data sources or existing DataStreams and DataSets.
Cloudera has developed an application called Cloudera SQL Stream Builder that can map our Kafka Topics and query all data as a table through Flink's Table API.
We can easily create our "virtual table" mapping with a Table Connector in SSB:
After creating this "virtual table", we can use SQL to calculate how far a transaction was made from the customer's inferred center, using the power, sin, cos, asin, sqrt, and radians SQL functions (the haversine formula, with an Earth radius of roughly 3961 miles):
select account_number, charge_amount,
2 * 3961 * asin(sqrt(
power(sin(radians((lat - center_inferred_lat) / 2)), 2)
+ cos(radians(center_inferred_lat)) * cos(radians(lat))
* power(sin(radians((lon - center_inferred_lon) / 2)), 2)
)) as distance, max_inferred_distance, max_inferred_amount
from `skilltransactions`
WHERE
2 * 3961 * asin(sqrt(
power(sin(radians((lat - center_inferred_lat) / 2)), 2)
+ cos(radians(center_inferred_lat)) * cos(radians(lat))
* power(sin(radians((lon - center_inferred_lon) / 2)), 2)
)) > max_inferred_distance
To see more details about this query, please visit this great article by @sunile_manjee, on our Cloudera Community.
We can also create our own function and just call it in our query.
For instance, let's create a DISTANCE_BETWEEN function and use it on our final query.
Final query
select account_number, charge_amount, DISTANCE_BETWEEN(lat, lon, center_inferred_lat, center_inferred_lon) as distance, max_inferred_distance, max_inferred_amount
from `skilltransactions`
WHERE DISTANCE_BETWEEN(lat, lon, center_inferred_lat, center_inferred_lon) > max_inferred_distance
OR charge_amount > max_inferred_amount
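For readers who want to check the logic outside of SQL, the distance function and the final WHERE clause can be sketched in Python as follows. This is an illustration only, not the function definition registered in SSB (which the article does not show): it assumes DISTANCE_BETWEEN is the standard haversine great-circle distance with the same 3961-mile radius used in the earlier query, and the name is_suspicious is hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3961  # same constant as in the SQL query


def distance_between(lat, lon, center_lat, center_lon):
    """Haversine great-circle distance in miles between two points given in degrees."""
    dlat = math.radians(lat - center_lat)
    dlon = math.radians(lon - center_lon)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(center_lat)) * math.cos(math.radians(lat))
         * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 so floating-point rounding cannot push asin out of its domain
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def is_suspicious(tx):
    """Mirror the final query: too far from the inferred center OR amount too high."""
    distance = distance_between(tx["lat"], tx["lon"],
                                tx["center_inferred_lat"], tx["center_inferred_lon"])
    return (distance > tx["max_inferred_distance"]
            or tx["charge_amount"] > tx["max_inferred_amount"])
```

With the default values from the JOLT step (both thresholds at 0.0), every transaction with a positive amount is flagged, which is why those fields are meant to be replaced with more realistic values before the rule is useful.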
At this moment, our query should be able to detect suspicious transactions in real-time and you can call the police. 😜
But Wait! There's more!
It's time to see it in Production mode!
4 - From Development to Production
With this architecture, you may face some issues on Black Friday or a similar big event. To handle that, you will need to ingest all streaming data with high performance and scalability; in other words… NiFi on Kubernetes.
Cloudera DataFlow service can deploy NiFi flows in Kubernetes, providing all scalability needed for a production environment.
CLOUDERA DATA FLOW SERVICE – PUBLIC CLOUD
Follow the deployment wizard to see your flow running in containers:
DEPLOYMENT WIZARD
KEY PERFORMANCE INDICATORS
DASHBOARD
DEPLOYMENT MANAGER
5 - Conclusion
This is the very first article on this streaming journey; here we used Cloudera Data Flow to ingest, buffer, and process events in real time. I hope that after this article you understand CDF and CSA, have seen Cloudera's streaming capabilities, and, after all, can also call the police.
See you in the next article, where we will use machine learning on Kubernetes (Cloudera Machine Learning) to improve the accuracy of our Simple Credit Card Fraud Detection and go live in production.
05-28-2018
04:21 PM
3 Kudos
How do you know if your customers (and potential customers) are talking about you on social media? The key to making the most of social media is listening to what your audience has to say about you, your competitors, and the market in general. Once you have the data you can undertake analysis, and finally, reach social business intelligence; using all these insights to know your customers better and improve your marketing strategy.
This is the third part of the series of articles on how to ingest social media data as a stream using the integration of HDP and HDF tools.
To follow this article, you first need to complete the previous two:
https://community.hortonworks.com/articles/177561/streaming-tweets-with-nifi-kafka-tranquility-druid.html
https://community.hortonworks.com/content/kbentry/182122/integrating-nifi-to-druid-with-a-custom-processor.html
Let’s get started!
In this new article, we will address two main points:
1 - How to collect data from various social networks at the same time
2 - How to integrate storage between HIVE and DRUID, making the most of this integration.
I've called this project "The Social Media Stalker".
So, our new architecture diagram would look like this:
Ok, it’s time to get hands-on!
Let's divide this work into 3 parts:
Create NiFi ingestion for Druid
Set up Hive-Druid integration
Update our Superset dashboard
1. Create NiFi ingestion for Druid
You can access each social network through its own API, but that takes time and patience... Let’s cut to the chase and go straight to a single point able to collect all social data through a single API.
There are a lot of social media monitoring tools:
https://www.brandwatch.com/blog/top-10-free-social-media-monitoring-tools/
We are going to use this one:
https://www.social-searcher.com/
(Its main advantage is that the API response includes sentiment analysis.)
It’s quite simple to get data from the social networks you want: just make a request to the API, passing parameters such as q = search term and network = desired social network.
Example: https://api.social-searcher.com/v2/search?q=Hortonworks&network=facebook&limit=20
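As a small illustration of how the same request can be built for each network, here is a Python sketch that assembles the URL from the q, network, and limit parameters shown in the example. The function name search_url is hypothetical, and the sketch only builds the URL; it does not call the API or cover any other parameter the service may require.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.social-searcher.com/v2/search"


def search_url(term, network, limit=20):
    """Build the search URL for one search term on one social network."""
    query = urlencode({"q": term, "network": network, "limit": limit})
    return f"{BASE_URL}?{query}"


# One URL per network; "twitter" as a network value is an assumption here,
# only "facebook" appears in the example above.
for network in ("facebook", "twitter"):
    print(search_url("Hortonworks", network))
```

In NiFi, the same idea translates to one InvokeHTTP processor per network, each with its own Remote URL.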
To make this request, let’s use the InvokeHTTP processor in NiFi.
This request will result in a data schema like this: { "userId", "lang", "location", "name", "network", "posted", "sentiment", "text" }
I did the same for 7 social networks (Twitter, Facebook, Youtube, Instagram, Reddit, GooglePlus, Vimeo) to have a flow like this:
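To make the per-post record concrete, here is a minimal Python sketch of the clean-up that the ReplaceText template in this article applies before the record goes to Druid: special characters, double quotes, and newlines are stripped from the text, and a timestamp is added. The function names are hypothetical, and using epoch milliseconds for the timestamp is an assumption made for this sketch.

```python
import json
import re
import time

# Same character class as the replaceAll() call in the ReplaceText template
_SPECIALS = re.compile(r"[$&+,:;=?@#|'<>.^*()%!-]")
FIELDS = ("userId", "lang", "location", "name", "network", "posted", "sentiment", "text")


def clean_text(text):
    """Remove special characters, double quotes, and newlines from a post's text."""
    text = _SPECIALS.sub("", text)
    return text.replace('"', "").replace("\n", "")


def to_record(post, now_ms=None):
    """Serialize one post to the JSON record shape used in this flow."""
    record = {field: post.get(field, "") for field in FIELDS}
    record["text"] = clean_text(record["text"])
    # Assumption: timestamp expressed as epoch milliseconds
    record["timestamp"] = int(time.time() * 1000) if now_ms is None else now_ms
    return json.dumps(record)
```

Stripping quotes and newlines matters because the template builds the JSON by string substitution, so an unescaped quote in the text would otherwise break the record.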
*To build this NiFi flow, follow the previous article. I've updated my ReplaceText processor with:
{"userId":"${user.userId}","lang":"${user.lang}","location":"${user.location}","name":"${user.name}","network":"${user.network}","posted":"${user.posted}","sentiment":"${user.sentiment}","text":"${user.text:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('"',''):replace('\n','')}","timestamp":${now()}}
Obviously, you should create your table in Druid, adding the Sentiment and Network fields, as described in the previous articles. I called my Druid table “SocialStalker”.
Once you have done that, just press play to see all the data in Druid.
2. Setup Hive-Druid Integration
Our main goal is to be able to index data from Hive into Druid, and to be able to query Druid datasources from Hive. Completing this work will bring benefits to the Druid and Hive systems alike:
– Efficient execution of OLAP queries in Hive. Druid is a system especially well tailored to the execution of OLAP queries on event data. Hive will be able to take advantage of its efficiency for the execution of this type of query.
– Introducing a SQL interface on top of Druid. Druid queries are expressed in JSON, and Druid is queried through a REST API over HTTP. Once a user has declared a Hive table that is stored in Druid, we will be able to transparently generate Druid JSON queries from the input Hive SQL queries.
– Being able to execute complex operations on Druid data. There are multiple operations that Druid does not support natively yet, e.g. joins. Putting Hive on top of Druid will enable the execution of more complex queries on Druid data sources.
– Indexing complex query results in Druid using Hive. Currently, indexing in Druid is usually done through MapReduce jobs. We will enable Hive to index the results of a given query directly into Druid, e.g., as a new table or a materialized view (HIVE-10459), and start querying and using that dataset immediately.
Even though there is an overlap between the two if you are using Hive LLAP, it's important to look at each one's advantages separately:
The power of Druid comes from precise IO optimization, not brute compute force. End queries are performed to drill down on selected dimensions for a given timestamp predicate for better performance.
Druid queries should use timestamp predicates, so that Druid knows how many segments to scan. This will yield better results.
Any UDFs or SQL functions to be executed on Druid tables will be performed by Hive. The performance of these queries depends solely on Hive. At this point they do not function as Druid queries.
If aggregations over aggregated data are needed, queries will run as Hive LLAP queries, not as Druid queries.
Hands on Hive-Druid!
To do this, you first need to be using Hive Interactive (with LLAP) in order to use the Druid integration.
--> Enable Hive Interactive Query
--> Download hive-druid-handler
If you do not have hive-druid-handler in your HDP version, just download it:
https://javalibs.com/artifact/org.apache.hive/hive-druid-handler
https://github.com/apache/hive/tree/master/druid-handler
…and copy it into the hive-server2/lib folder:
cp hive-druid-handler-3.0.0.3.0.0.3-2.jar /usr/hdp/current/hive-server2/lib
... then restart Hive.
We need to provide Druid data source information to Hive. Let’s register the Druid data sources in Hive (CREATE EXTERNAL TABLE) for all the data that is already stored in Druid:
ADD JAR /usr/hdp/current/hive-server2/lib/hive-druid-handler-3.0.0.3.0.0.3-2.jar;
CREATE EXTERNAL TABLE SocialStalker
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "SocialStalker");
Now you should be able to query all Druid data in Hive, but for that you MUST use Beeline in interactive mode:
beeline
!connect jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2
Finally, you can use your Beeline terminal to run any query against your Druid table.
You can insert data into your table and make some changes to your Superset slices (as in the previous articles) to complete step 3 and see your Superset dashboard like this one:
Conclusion:
You can use HDP and HDF to build an end-to-end platform which allows you to achieve the success of your social media marketing campaign as well as the ultimate success of your business. If you don’t pay attention to how your business is doing, you are really only doing half of the job. It is the difference between walking around in the dark and having an illuminated path that allows you to have an understanding and an awareness of how your business is doing and how you can continually make improvements that will bring you more and more exposure and a rock-solid reputation.
Many companies are using social media monitoring to strengthen their businesses. Those business people are savvy enough to realize the importance of social media, how it positively influences their businesses, and how critical the monitoring piece of the strategy is to their ultimate success.
References:
https://pt.slideshare.net/Hadoop_Summit/interactive-analytics-at-scale-in-apache-hive-using-druid-80145456
https://cwiki.apache.org/confluence/display/Hive/Druid+Integration
https://br.hortonworks.com/blog/sub-second-analytics-hive-druid/
06-07-2019
06:40 AM
can you provide the XML of this flow?
06-19-2018
01:25 PM
Hi, when I launch Tranquility it stops on its own: ...
2018-06-19 10:30:11,753 [main] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a], end rebalancing consumer tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a try #0
2018-06-19 10:30:11,755 [main] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a], Creating topic event watcher for topics (couldwork)
2018-06-19 10:30:11,764 [main] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a], Topics to consume = List(couldwork)
2018-06-19 10:30:11,768 [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Verifying properties
2018-06-19 10:30:11,769 [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property client.id is overridden to tranquility-kafka
2018-06-19 10:30:11,769 [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property metadata.broker.list is overridden to fr-001slli124.groupinfra.com:6667
2018-06-19 10:30:11,769 [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property request.timeout.ms is overridden to 30000
2018-06-19 10:30:11,787 [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread] INFO kafka.client.ClientUtils$ - Fetching metadata from broker id:1001,host:fr-001slli124.groupinfra.com,port:6667 with correlation id 0 for 1 topic(s) Set(couldwork)
2018-06-19 10:30:11,790 [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread] INFO kafka.producer.SyncProducer - Connected to fr-001slli124.groupinfra.com:6667 for producing
2018-06-19 10:30:11,808 [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread] INFO kafka.producer.SyncProducer - Disconnecting from fr-001slli124.groupinfra.com:6667
2018-06-19 10:30:11,855 [ConsumerFetcherThread-tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-0-1001] INFO kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-0-1001], Starting
2018-06-19 10:30:11,858 [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread] INFO k.consumer.ConsumerFetcherManager - [ConsumerFetcherManager-1529404211455] Added fetcher for partitions ArrayBuffer([[couldwork,0], initOffset 45 to broker id:1001,host:fr-001slli124.groupinfra.com,port:6667] )
2018-06-19 10:30:13,278 [Thread-4] INFO c.metamx.tranquility.kafka.KafkaMain - Initiating shutdown...
2018-06-19 10:30:13,278 [Thread-4] INFO c.m.tranquility.kafka.KafkaConsumer - Shutting down - attempting to flush buffers and commit final offsets
2018-06-19 10:30:13,281 [Thread-4] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a], ZKConsumerConnector shutting down
2018-06-19 10:30:13,288 [Thread-4] INFO k.c.ZookeeperTopicEventWatcher - Shutting down topic event watcher.
2018-06-19 10:30:13,288 [Thread-4] INFO k.consumer.ConsumerFetcherManager - [ConsumerFetcherManager-1529404211455] Stopping leader finder thread
2018-06-19 10:30:13,288 [Thread-4] INFO k.c.ConsumerFetcherManager$LeaderFinderThread - [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread], Shutting down
2018-06-19 10:30:13,289 [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread] INFO k.c.ConsumerFetcherManager$LeaderFinderThread - [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread], Stopped
2018-06-19 10:30:13,289 [Thread-4] INFO k.c.ConsumerFetcherManager$LeaderFinderThread - [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-leader-finder-thread], Shutdown completed
2018-06-19 10:30:13,289 [Thread-4] INFO k.consumer.ConsumerFetcherManager - [ConsumerFetcherManager-1529404211455] Stopping all fetchers
2018-06-19 10:30:13,290 [Thread-4] INFO kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-0-1001], Shutting down
2018-06-19 10:30:13,291 [ConsumerFetcherThread-tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-0-1001] INFO kafka.consumer.SimpleConsumer - Reconnect due to socket error: java.nio.channels.ClosedByInterruptException
2018-06-19 10:30:13,291 [ConsumerFetcherThread-tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-0-1001] INFO kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-0-1001], Stopped
2018-06-19 10:30:13,291 [Thread-4] INFO kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a-0-1001], Shutdown completed
2018-06-19 10:30:13,292 [Thread-4] INFO k.consumer.ConsumerFetcherManager - [ConsumerFetcherManager-1529404211455] All connections stopped
2018-06-19 10:30:13,294 [ZkClient-EventThread-14-10.80.145.201:2181,10.80.145.200:2181,10.80.145.199:2181] INFO org.I0Itec.zkclient.ZkEventThread - Terminate ZkClient event thread.
2018-06-19 10:30:13,298 [Thread-4] INFO org.apache.zookeeper.ZooKeeper - Session: 0x1641734cd6b0000 closed
2018-06-19 10:30:13,298 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down
2018-06-19 10:30:13,298 [Thread-4] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_FR-001SLLI129-1529404211282-a8c8a12a], ZKConsumerConnector shutdown completed in 17 ms
2018-06-19 10:30:13,298 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Commit thread interrupted.
2018-06-19 10:30:13,299 [Thread-4] INFO c.m.tranquility.kafka.KafkaConsumer - Finished clean shutdown.
09-03-2017
11:17 AM
Have you checked NiFi throughput with the Content Repository in JBOD mode instead of RAID? Basically, let the application decide how to distribute the data.
06-11-2016
08:46 PM
1 Kudo
Hello @Thiago. It is possible to achieve communication across secured and unsecured clusters. A common use case for this is using DistCp for transfer of data between clusters.
As mentioned in other answers, the configuration property ipc.client.fallback-to-simple-auth-allowed=true tells a secured client that it may enter a fallback unsecured mode when the unsecured server side fails to satisfy authentication. However, I recommend not setting this in core-site.xml, and instead setting it on the command line invocation specifically for the DistCp command that needs to communicate with the unsecured cluster. Setting it in core-site.xml means that all RPC connections for any application are eligible for fallback to simple authentication. This potentially expands the attack surface for man-in-the-middle attacks.
Here is an example of overriding the setting on the command line while running DistCp:
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo
The command must be run while logged into the secured cluster, not the unsecured cluster.
This is adapted from one of my prior answers: https://community.hortonworks.com/questions/294/running-distcp-between-two-cluster-one-kerberized.html
06-12-2016
07:35 PM
The easiest way to do it: just log in to Ambari using these credentials:
User: admin
Pass: 4o12t0n
Cheers