Member since: 06-20-2016
488 Posts
433 Kudos Received
118 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3101 | 08-25-2017 03:09 PM
 | 1964 | 08-22-2017 06:52 PM
 | 3383 | 08-09-2017 01:10 PM
 | 8061 | 08-04-2017 02:34 PM
 | 8113 | 08-01-2017 11:35 AM
08-22-2017
06:52 PM
This REST call will get you the host names of all nodes in the cluster:
http://your.ambari.server/api/v1/clusters/yourClusterName/hosts
See these links on the Ambari API:
https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/hosts.md
https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md
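If you want to script this, here is a minimal sketch using Python and the requests library; the server address, cluster name, and credentials below are placeholders you would replace with your own:

# Sketch: list cluster host names via the Ambari REST API.
# The base URL, cluster name, and credentials are placeholders.
import requests

AMBARI = "http://your.ambari.server"   # Ambari server base URL
CLUSTER = "yourClusterName"            # cluster name as it appears in Ambari
AUTH = ("admin", "admin")              # replace with real Ambari credentials

resp = requests.get(f"{AMBARI}/api/v1/clusters/{CLUSTER}/hosts", auth=AUTH)
resp.raise_for_status()

# Each item in the response wraps a "Hosts" object that carries host_name.
for item in resp.json()["items"]:
    print(item["Hosts"]["host_name"])

The items/Hosts/host_name response structure is the one documented in the hosts.md link above.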
08-22-2017
02:21 PM
2 Kudos
What happens when you try the URL in your browser? Which browser are you using? Did you try another browser? What happens when you use the Ambari login URL http://localhost:8080/#/login? Which version of the HDP sandbox?
08-21-2017
08:20 PM
Followed instructions and it worked. Thanks @Sriharsha Chintalapani
08-15-2017
07:27 PM
I have installed HDF 3.0.1. I am using a PublishKafkaRecord processor in NiFi to access a schema via HortonworksSchemaRegistry. I am getting the below error from PublishKafkaRecord.

Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "validationLevel" (class com.hortonworks.registries.schemaregistry.SchemaMetadata), not marked as ignorable (6 known properties: "compatibility", "type", "name", "description", "evolve", "schemaGroup"])
at [Source: {"schemaMetadata":{"type":"avro","schemaGroup":"test","name":"simple","description":"simple","compatibility":"BACKWARD","validationLevel":"ALL","evolve":true},"id":3,"timestamp":1502815970781}; line: 1, column: 140] (through reference chain: com.hortonworks.registries.schemaregistry.SchemaMetadataInfo["schemaMetadata"]->com.hortonworks.registries.schemaregistry.SchemaMetadata["validationLevel"])

How do I get NiFi to ignore the validationLevel attribute for the schema and not throw this error?
Labels:
- Apache Kafka
- Apache NiFi
- Schema Registry
08-14-2017
11:27 PM
2 Kudos
@Bala Vignesh N V You will need to do a little data engineering to prep your data for the Hive table ... basically, replacing the pipe | with a comma. You can do this easily in Pig by running this script:

a = LOAD '/sourcefilepath' AS (fullrecord:chararray);
b = FOREACH a GENERATE REPLACE(fullrecord, '\\|', ',');
STORE b INTO '/targetfilepath' USING PigStorage(',');

You could also do this pipe replace in sed before loading to HDFS. Pig is advantageous, however, because it will run in MapReduce or Tez and be much faster (parallel processing), especially for large files. The fact that you have some values that include the delimiter is a problem ... unless there is a clear pattern, you will have to write a program that finds each record with too many delimiters and then replace these one by one (e.g. replace 'new york, usa' with 'new york usa'). If you used Pig, the b = line would have to be repeated for each such value containing the delimiter. If you are unfamiliar with Pig, this is a good tutorial showing how to implement the above: https://hortonworks.com/tutorial/how-to-process-data-with-apache-pig/
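If you prefer to do the replacement before the file ever lands in HDFS (the sed route mentioned above), here is a minimal sketch in Python; the file paths are placeholders, and it assumes the pipe only ever appears as the field delimiter:

# Sketch: convert a pipe-delimited file to CSV before loading it into HDFS.
# IN_PATH and OUT_PATH are placeholder paths; adjust for your environment.
IN_PATH = "sourcefile.txt"    # local pipe-delimited input
OUT_PATH = "sourcefile.csv"   # comma-delimited output, then load with hdfs dfs -put

with open(IN_PATH, "r", encoding="utf-8") as src, \
     open(OUT_PATH, "w", encoding="utf-8") as dst:
    for line in src:
        # Swap every pipe for a comma; everything else is left untouched.
        dst.write(line.replace("|", ","))

Like sed, this runs on a single machine, so for very large files the Pig approach above (MapReduce or Tez) will scale much better.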
08-10-2017
09:49 PM
1 Kudo
Simple SAM flow: Kafka -> (Storm) filter -> Kafka. Fails at Storm, which reports:

com.hortonworks.registries.schemaregistry.serde.SerDesException: Unknown protocol id [123] received while deserializing the payload at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapsh

Wondering what could cause this. (Schema seems properly configured.)
08-10-2017
01:57 PM
2 Kudos
I think this link will show you how to change the logging level from INFO to WARN or ERROR, as well as how to adjust rotating log sizes: https://community.hortonworks.com/content/supportkb/49455/atlas-default-logging-is-filling-up-file-system-ho.html
08-09-2017
04:10 PM
Streaming the data to Hadoop does consume a lot of CPU, even though the data is only passing through. Putting the client on the edge node isolates this load and thus prevents CPU contention with the jobs running on the cluster. The edge node is typically used for client tooling a) to keep users from logging into master or worker nodes, and b) to isolate resource usage, as with Sqoop's CPU use here.
08-09-2017
01:10 PM
3 Kudos
All good questions, and fortunately the answer is very simple: all data passes through the edge node with no staging or landing there. Even better, the data goes directly to Hadoop, where a MapReduce job (all mappers, no reducers) imports the rows in parallel. Useful refs:
https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+MR+Execution+Engine
https://blogs.apache.org/sqoop/entry/apache_sqoop_overview
https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781784396688/6/ch06lvl1sec59/sqoop-2-architecture
08-05-2017
05:44 PM
HDP 2.6 allows the {user} variable in Ranger policies, e.g. for row-level filtering:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/user_variable_ref.html
https://community.hortonworks.com/questions/102532/set-user-user-in-ranger-policy.html
Are there any other variables besides {user} available, perhaps for groups?
Labels:
- Apache Hive
- Apache Ranger