Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4067 | 08-20-2018 08:26 PM |
| | 1962 | 08-15-2018 01:59 PM |
| | 2390 | 08-13-2018 02:20 PM |
| | 4138 | 07-23-2018 04:37 PM |
| | 5045 | 07-19-2018 12:52 PM |
11-11-2016
04:24 PM
@Timothy Spann Counts on ORC tables should be fast, because the query can use the stripe footer info instead of scanning the data. Have you run stats on the table?
11-11-2016
03:00 AM
1 Kudo
As of the latest release of HDP, I see little to no reason to use MR over Tez. I would default to Tez and use MR only if and when required (not many use cases).
11-10-2016
03:58 AM
One easy way to do this is with hive-serde-schema-generator (https://github.com/strelec/hive-serde-schema-gen). Another way is to use the Hive JSON SerDe (https://github.com/rcongiu/Hive-JSON-Serde). The formatted JSON is below:
{
"repoType":1,
"repo":"abc_hadoop",
"reqUser":"ams",
"evtTime":"2016-09-19 13:14:40.197",
"access":"READ",
"resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
"resType":"path",
"result":1,
"policy":41,
"reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
"enforcer":"ranger-acl",
"cliIP":"123.129.390.140",
"agentHost":"hostname.sample.com",
"logType":"RangerAudit",
"id":"94143368-600c-44b9-a0c8-d906b4367537",
"seq_num":1240883,
"event_count":1,
"event_dur_ms":0
}
Since the JSON is not nested, both of the above options are definitely doable. However, the easiest way may be the approach described here: https://community.hortonworks.com/articles/37937/importing-and-querying-json-data-in-hive.html
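For a flat record like the one above, the table definition can be derived mechanically from the keys and value types. The following is only a rough sketch of that idea, not the output of either tool linked above: the type mapping, the function names, the table name and the location path are all assumptions made for illustration. The SerDe class name is the one shipped by the Hive-JSON-Serde project.

```python
import json


def hive_type(value):
    """Map a JSON value to a Hive column type (assumed, simplified mapping)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    # Strings, nulls and anything nested fall back to string in this sketch.
    return "string"


def flat_json_to_ddl(record_text, table_name, location):
    """Build a CREATE TABLE statement from one flat JSON record."""
    record = json.loads(record_text)
    columns = ",\n".join(
        f"  `{name}` {hive_type(value)}" for name, value in record.items()
    )
    return (
        f"CREATE EXTERNAL TABLE {table_name} (\n{columns}\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # A few fields taken from the sample record; table name and path are hypothetical.
    sample = '{"repoType":1,"repo":"abc_hadoop","evtTime":"2016-09-19 13:14:40.197"}'
    print(flat_json_to_ddl(sample, "ranger_audit", "/tmp/ranger_audit_json"))
```

Columns such as evtTime come out as string here; whether to cast them to timestamp is a separate decision.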
11-10-2016
12:34 AM
As the root user you should be able to see the same files. You might be in a different directory when you use the web shell (port 4200) than when you ssh into the Linux box. Run the pwd command and verify you are in the same location when you issue ls.
11-10-2016
12:22 AM
I found the issue. 9092 was not my port. I went to Ambari and found that the listener port was set to 6667.
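A quick way to confirm which port the broker is actually listening on, without opening Ambari, is to probe the candidate ports from the broker node. This is a minimal sketch using only the Python standard library; the function name is made up for illustration, and the host and the two ports are simply the ones mentioned in this thread.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9092 is the port used in the failing command; 6667 is the one found in Ambari.
    for port in (9092, 6667):
        state = "listening" if port_is_open("localhost", port) else "not reachable"
        print(f"localhost:{port} is {state}")
```

An open port only shows that something accepts TCP connections there; it does not prove the listener is Kafka, so the producer test is still needed.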
11-09-2016
09:31 PM
On HDP 2.5 I am running a simple test to produce a message to a Kafka topic, test1, and it fails. I have one broker and am running this on the broker node.
[kafka@sunman0 bin]$ ./kafka-console-producer.sh --broker-list localhost:9092 --topic test1
jump
[2016-11-09 21:21:45,184] ERROR Error when sending message to topic test1 with key: null, value: 4 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
Any ideas?
Labels: Apache Kafka
11-09-2016
05:39 PM
1 Kudo
Is it possible to export lineage from Atlas via Kafka? I don't see that being possible using the topics Atlas creates. However, it is worth asking on HCC.
Labels: Apache Atlas
11-09-2016
05:16 PM
Does HDP officially support multiple Kafka brokers on a single node? If so, can someone point me to how to set this up correctly so that it is supported?
Labels: Apache Kafka
11-08-2016
09:39 PM
1 Kudo
Ah, my bad, I found the answer here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html. It is not the same. The UI runs on all nodes.
Primary Node: Every cluster has one Primary Node. On this node, it is possible to run "Isolated Processors" (see below). ZooKeeper is used to automatically elect a Primary Node. If that node disconnects from the cluster for any reason, a new Primary Node will automatically be elected. Users can determine which node is currently elected as the Primary Node by looking at the Cluster Management page of the User Interface.
Isolated Processors: In a NiFi cluster, the same dataflow runs on all the nodes. As a result, every component in the flow runs on every node. However, there may be cases when the DFM would not want every processor to run on every node. The most common case is when using a processor that communicates with an external service using a protocol that does not scale well. For example, the GetSFTP processor pulls from a remote directory, and if the GetSFTP Processor runs on every node in the cluster and tries simultaneously to pull from the same remote directory, there could be race conditions. Therefore, the DFM could configure the GetSFTP on the Primary Node to run in isolation, meaning that it only runs on that node. It could pull in data and, with the proper dataflow configuration, load-balance it across the rest of the nodes in the cluster.
Note that while this feature exists, it is also very common to simply use a standalone NiFi instance to pull data and feed it to the cluster. It just depends on the resources available and how the Administrator decides to configure the cluster.
11-08-2016
09:37 PM
1 Kudo
I found the information here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
NiFi Cluster Coordinator: A NiFi Cluster Coordinator is the node in a NiFi cluster that is responsible for carrying out tasks to manage which nodes are allowed in the cluster and providing the most up-to-date flow to newly joining nodes. When a DataFlow Manager manages a dataflow in a cluster, they are able to do so through the User Interface of any node in the cluster. Any change made is then replicated to all nodes in the cluster.