Member since: 06-20-2016
488 Posts
433 Kudos Received
118 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3101 | 08-25-2017 03:09 PM
 | 1964 | 08-22-2017 06:52 PM
 | 3383 | 08-09-2017 01:10 PM
 | 8061 | 08-04-2017 02:34 PM
 | 8113 | 08-01-2017 11:35 AM
08-22-2017
06:52 PM
This REST call will get you the host names of all nodes in the cluster:
http://your.ambari.server/api/v1/clusters/yourClusterName/hosts
See these links on the Ambari API:
https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/hosts.md
https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md
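If you want to script this, here is a minimal sketch using Python and the requests library; the server address, cluster name, and credentials below are placeholders you would replace with your own:

# Sketch: list cluster host names via the Ambari REST API.
# The base URL, cluster name, and credentials are placeholders.
import requests

AMBARI = "http://your.ambari.server"   # Ambari server base URL
CLUSTER = "yourClusterName"            # cluster name as it appears in Ambari
AUTH = ("admin", "admin")              # replace with real Ambari credentials

resp = requests.get(f"{AMBARI}/api/v1/clusters/{CLUSTER}/hosts", auth=AUTH)
resp.raise_for_status()

# Each item in the response wraps a "Hosts" object that carries host_name.
for item in resp.json()["items"]:
    print(item["Hosts"]["host_name"])

The items/Hosts/host_name response structure is the one documented in the hosts.md link above.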
08-22-2017
02:21 PM
2 Kudos
What happens when you try the URL in your browser? Which browser are you using? Did you try another browser? What happens when you use the Ambari login URL http://localhost:8080/#/login? Which version of the HDP sandbox?
08-21-2017
08:20 PM
Followed instructions and it worked. Thanks @Sriharsha Chintalapani
08-15-2017
07:27 PM
I have installed HDF 3.0.1. I am using a PublishKafkaRecord processor in NiFi to access a schema via HortonworksSchemaRegistry. I am getting the below error from PublishKafkaRecord.

Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "validationLevel" (class com.hortonworks.registries.schemaregistry.SchemaMetadata), not marked as ignorable (6 known properties: "compatibility", "type", "name", "description", "evolve", "schemaGroup"])
at [Source: {"schemaMetadata":{"type":"avro","schemaGroup":"test","name":"simple","description":"simple","compatibility":"BACKWARD","validationLevel":"ALL","evolve":true},"id":3,"timestamp":1502815970781}; line: 1, column: 140] (through reference chain: com.hortonworks.registries.schemaregistry.SchemaMetadataInfo["schemaMetadata"]->com.hortonworks.registries.schemaregistry.SchemaMetadata["validationLevel"])

How do I get NiFi to ignore the validationLevel attribute for the schema and not throw this error?
Labels:
- Apache Kafka
- Apache NiFi
- Schema Registry
08-14-2017
11:27 PM
2 Kudos
@Bala Vignesh N V You will need to do a little data engineering to prep your data for the Hive table ... basically, replacing the pipe | with a comma. You can do this easily in Pig by running this script:

a = LOAD '/sourcefilepath' AS (fullrecord:chararray);
b = FOREACH a GENERATE REPLACE(fullrecord, '\\|', ',');
STORE b INTO '/targetfilepath' USING PigStorage(',');

You could also do this pipe replace in sed before loading to HDFS. Pig is advantageous, however, because it will run in MapReduce or Tez and be much faster (parallel processing), especially for large files. The fact that you have some values that include the delimiter is a problem ... unless there is a clear pattern, you will have to write a program that finds each record with too many delimiters and then replace these one by one (e.g. replace 'new york, usa' with 'new york usa'). If you used Pig, the b = line would have to be repeated for each such value containing the delimiter. If you are unfamiliar with Pig, this is a good tutorial showing how to implement the above: https://hortonworks.com/tutorial/how-to-process-data-with-apache-pig/
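If you prefer to do the replacement before the file ever lands in HDFS (the sed route mentioned above), here is a minimal sketch in Python; the file paths are placeholders, and it assumes the pipe only ever appears as the field delimiter:

# Sketch: convert a pipe-delimited file to CSV before loading it into HDFS.
# IN_PATH and OUT_PATH are placeholder paths; adjust for your environment.
IN_PATH = "sourcefile.txt"    # local pipe-delimited input
OUT_PATH = "sourcefile.csv"   # comma-delimited output, then load with hdfs dfs -put

with open(IN_PATH, "r", encoding="utf-8") as src, \
     open(OUT_PATH, "w", encoding="utf-8") as dst:
    for line in src:
        # Swap every pipe for a comma; everything else is left untouched.
        dst.write(line.replace("|", ","))

Like sed, this runs on a single machine, so for very large files the Pig approach above (MapReduce or Tez) will scale much better.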
08-10-2017
09:49 PM
1 Kudo
Simple SAM flow: Kafka -> (Storm) filter -> Kafka. Fails at Storm, which reports:

com.hortonworks.registries.schemaregistry.serde.SerDesException: Unknown protocol id [123] received while deserializing the payload at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapsh

Wondering what could cause this. (Schema seems properly configured.)
08-10-2017
01:57 PM
2 Kudos
I think this link will show you how to change the logging level from INFO to WARN or ERROR, as well as how to adjust rotating log sizes: https://community.hortonworks.com/content/supportkb/49455/atlas-default-logging-is-filling-up-file-system-ho.html
08-09-2017
04:10 PM
Streaming the data to Hadoop does consume a lot of CPU, even though the data is only passing through. Putting the client on the edge node isolates this load and thus prevents CPU contention with the jobs running on the cluster. The edge node is typically used for client tooling a) to keep users from logging into master or worker nodes, and b) to isolate resource usage, as with Sqoop's CPU use here.
08-09-2017
01:10 PM
3 Kudos
All good questions, and fortunately the answer is very simple: all data passes through the edge node with no staging or landing there. Even better, the data goes directly to Hadoop, where a MapReduce job (all mappers, no reducers) imports the rows in parallel. Useful refs:
https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+MR+Execution+Engine
https://blogs.apache.org/sqoop/entry/apache_sqoop_overview
https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781784396688/6/ch06lvl1sec59/sqoop-2-architecture
08-05-2017
05:44 PM
HDP 2.6 allows the {user} variable in Ranger policies, e.g. for row-level filtering:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/user_variable_ref.html
https://community.hortonworks.com/questions/102532/set-user-user-in-ranger-policy.html
Are there any other variables besides {user} available, perhaps for groups?
Labels:
- Apache Hive
- Apache Ranger