About kumarvaibhav199

kumarvaibhav199 · ‎08-25-2018

@Dan Chaffelson The myid file is already present in the required path inside the Docker container. I doubt there is a need to give the Docker IPs in the connection string and host address properties of the ZooKeeper and NiFi properties files. As of now I have given my host IP, with two ports open on the two Docker containers (8051, 8052), so I think it is trying to find it on the host only and not in the Docker containers. NiFi has an embedded ZK running individually on each Docker instance. A quorum only comes into the picture if individual ZK nodes are set up and we need to specify which one is the primary.

kumarvaibhav199 · ‎08-09-2018

Thanks @Shu

kumarvaibhav199 · ‎05-24-2018

@Shu I tried almost everything mentioned above, but still no luck.

kumarvaibhav199 · ‎03-14-2018

@Shu Appreciate your efforts.

Former Member · ‎07-06-2017

You might consider using the HDP Sandbox on Azure: https://hortonworks.com/hadoop-tutorial/deploying-hortonworks-sandbox-on-microsoft-azure/

eberezitsky · ‎01-26-2017

Hi @Vaibhav Kumar, if you want to create a bag matching the target table's structure, you can do the following:

    a = load 'file.csv' using PigStorage(',') as (x:int, y:int, w:int);
    b = foreach a generate x, y, (int)null as z, w;
    describe b;
    -- b: {x: int,y: int,z: int,w: int}

aervits · ‎02-01-2017

@Vaibhav Kumar The recommendations from my colleagues are valid: you have strings in the header row of your CSV documents. You can certainly filter by some known entity, but there is a more advanced version of the CSV Pig loader called CSVExcelStorage. It is part of the Piggybank library that comes bundled with HDP, hence the register command. You can pass different control parameters to it. The Mortar blog is an excellent source of information on working with Pig: http://help.mortardata.com/technologies/pig/csv

    grunt> register /usr/hdp/current/pig-client/piggybank.jar;
    grunt> a = load 'BJsales.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as (Num:int,time:int,BJsales:float);
    grunt> describe a;
    a: {Num: int,time: int,BJsales: float}
    grunt> b = limit a 5;
    grunt> dump b;

Output:

    (1,1,200.1)
    (2,2,199.5)
    (3,3,199.4)
    (4,4,198.9)
    (5,5,199.0)

Notice that I am not filtering any relation. I'm telling the loader to skip the header outright, which saves a few keystrokes and doesn't waste any cycles processing anything extra.

pardeep_kumar · ‎11-03-2016

Glad that it got resolved.

kumarvaibhav199 · ‎10-08-2016

I'm not sure about my data, because here even composite keys can produce duplicates, so going with an analytical function is not a good choice for me. Anyhow, the query is not taking that much time for me. I voted up your solution. Thanks for your response. 🙂 @Constantin Stanca

maheshmsh88 · ‎10-04-2016

Hi Vaibhav, please go through this link: https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup

Member Since	‎07-19-2016 06:01 PM
Last Visited	‎01-07-2019 11:10 AM
Posts	88
Kudos received	13

Cloudera Community

Re: Load Hive table using Hiveconf Variable

Re: PIG Store Command not working in local mode

Re: Nifi instances in docker containers doesn't st...

Re: update a get request to elastic search in nifi

Re: Connecting to mysql using nifi

Re: Converting json to CSV giving blank using Nifi

Re: Hortonworks Sandbox 2.6 Stucked

Re: Handing nullls as alias at pig level

Re: Pig Error : ERROR org.apache.pig.tools.grunt.G...

Re: Load Hive table using Hiveconf Variable

Re: Select nth row in hive

Re: Dynamic transpose of rows to columns