Member since: 09-29-2015
Posts: 142
Kudos Received: 45
Solutions: 15

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1729 | 06-08-2017 05:28 PM |
| | 6265 | 05-30-2017 02:07 PM |
| | 1595 | 05-26-2017 07:48 PM |
| | 3923 | 04-28-2017 02:48 PM |
| | 2413 | 04-28-2017 02:41 PM |
10-04-2016
03:23 PM
I faced the same issue. I used Sqoop to import a table, and then the search function just hung. I re-imported the VM, and now I can't access the Atlas dashboard at all; I get a 503 error.
09-30-2016
06:40 PM
Sweet. Glad I could help.
09-30-2016
04:21 PM
Try this: ExecuteSQL > SplitAvro > ConvertAvroToJSON > EvaluateJsonPath. SplitAvro splits the result set into individual Avro records, ConvertAvroToJSON converts each Avro record to JSON, and EvaluateJsonPath lets you create new FlowFile attributes from JSON path expressions.
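For example (the field and attribute names here are just placeholders, not from your data): if each Avro record carries id and name fields, the FlowFile content after ConvertAvroToJSON would look like {"id": 1, "name": "alice"}, and in EvaluateJsonPath you could set Destination to flowfile-attribute and add dynamic properties such as record.id = $.id and record.name = $.name to pull those values out as attributes.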
09-30-2016
04:09 PM
1 Kudo
Try putting a SplitAvro processor in front of your ConvertAvroToJSON, so the flow looks like this: ExecuteSQL > SplitAvro > ConvertAvroToJSON > PutMongo
09-30-2016
02:03 PM
3 Kudos
Are you sure that only the first record was written? The NiFi doc says ConvertAvroToJSON converts to a single JSON object.
09-15-2016
02:44 PM
Oh, sorry I missed that.
09-13-2016
06:37 PM
I have done the following in my main method:

public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    ...
}

My app needs a JSON file, so in my run configuration I just put the following on the Arguments > Program arguments tab:

/Users/bhagan/Documents/jsonfile.json

Also make sure you have all the dependencies you need in your pom.xml. Run it and check the output. Give it a shot and let us know if you get it working.
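For context, here is a minimal sketch of how the pieces fit together, assuming the JSON file path arrives as the first program argument; the class name and the line-count use of the file are my own illustration, not part of the original post:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SimpleApp {
    public static void main(String[] args) {
        // Local master so the app runs directly from the IDE run configuration
        SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // The run configuration passes the JSON file path as the first program argument
        String jsonPath = args[0];

        // Illustrative use of the file: read it as text and report the line count
        JavaRDD<String> lines = sc.textFile(jsonPath);
        System.out.println("Read " + lines.count() + " lines from " + jsonPath);

        sc.stop();
    }
}
```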
08-23-2016
01:16 PM
@Sunile Manjee Yes, I did flatten the json. Here is what I used (all one line): {"enumTypes":[],"structTypes":[],"traitTypes": [{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"EXPIRES_ON","attributeDefinitions":[{"name":"expiry_date","dataTypeName":"string","multiplicity":"required","isComposite":false,"isUnique":false,"isIndexable":true,"reverseAttributeName": null}]}],"classTypes":[]} But for me, I had left out an attribute.
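For anyone finding this later: a flattened definition like the one above is normally submitted to Atlas over its types REST endpoint; on the Atlas releases of that era this was something along the lines of curl -u admin:admin -H "Content-Type: application/json" -d @trait.json http://atlas-host:21000/api/atlas/types, where the credentials, host name, and file name are placeholders, and the exact endpoint should be checked against the Atlas REST docs for your release.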
07-29-2016
05:21 PM
1 Kudo
I was reviewing some posts related to Pig and found the following question interesting: https://community.hortonworks.com/questions/47720/apache-pig-guarantee-that-all-the-value-in-a-colum.html#answer-47767

I wanted to share an alternative solution using Pentaho Data Integration (PDI), an open source ETL tool that provides visual MapReduce capabilities. PDI is YARN ready, so when you configure PDI to use your HDP cluster (or sandbox) and run the attached job, it runs as a YARN application.

The following image is your Mapper. Above, you see the main transformation. It reads input, which you configure in the Pentaho MapReduce job (seen below). The transformation follows a pattern: immediately split the delimited file into individual fields. Next, I use a Java Expression step to determine whether a field is numeric; if it is not, we set the value of the field to the String "null" (a sketch of that check appears below). Then, to prepare for MapReduce output, we concatenate the fields back together into a single value and pass the key/value pair to the MapReduce Output.

Once you have the main MapReduce transformation created, you wrap it in a PDI MapReduce job. If you're familiar with MapReduce, you will recognize the configuration options below, which you would otherwise set in code. Next, configure your Mapper. The job succeeds! And the file is in HDFS.
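For readers who want to see the shape of that numeric check, here is a small sketch in plain Java; the class name, method name, regex, and sample values are mine, not taken from the attached job:

```java
public class NumericOrNull {
    // Return the field unchanged if it looks numeric; otherwise return the literal string "null".
    static String numericOrNull(String field) {
        return (field != null && field.matches("-?\\d+(\\.\\d+)?")) ? field : "null";
    }

    public static void main(String[] args) {
        System.out.println(numericOrNull("42.5"));  // prints 42.5
        System.out.println(numericOrNull("abc"));   // prints null
    }
}
```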
07-26-2016
12:17 AM
1 Kudo
It is often the case that we need to install Hortonworks in environments with strict requirements. One such requirement may be that all HTTP traffic must go through a dedicated proxy server.

When installing Hortonworks HDP using Ambari, you can find instructions for configuring Ambari to use the proxy on the docs.hortonworks.com website. For example, here is the page for configuring Ambari 2.2: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_ambari_reference_guide/content/ch_setting_up_an_internet_proxy_server_for_ambari.html

Notice that the instructions mention that you must also configure yum to use the proxy. It's important to note that those instructions will configure all yum repositories to use the proxy, and you may not want this behavior. So while it is fine to set the proxy at yum's global level, you should review any existing repository configurations to determine whether any of them should bypass the proxy. If a repository should not use the proxy, you can update its configuration with the following option:

proxy=_none_

Additionally, while preparing for an HDP installation, you will also use the tools wget and curl. I suggest you confirm that these tools are also set up to use the proxy. If not, it's as easy as setting the proxy options in their configuration files.

Wget has a global file, /usr/local/etc/wgetrc. Wget options:

use_proxy = on
http_proxy = http://proxyhost:port

Curl does not have a global file, so you can create a .curlrc in your home directory:

proxy = [protocol://][user:password@]proxyhost[:port]

Once you have Ambari, yum, wget, and curl configured to use your proxy, you'll be ready to start the installation.
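To make the yum piece concrete (the host, port, and repository file are placeholders rather than anything from the Hortonworks docs): the global setting is a line like proxy=http://proxyhost:port in /etc/yum.conf, and a repository that should bypass the proxy gets proxy=_none_ added to its section in the corresponding file under /etc/yum.repos.d/.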