Member since: 06-28-2016
Posts: 34
Kudos Received: 1
Solutions: 0
09-16-2020
03:27 PM
I believe this will fail if you stop your job today and run it tomorrow: now() will resolve to a different day, and you will miss that day's data...
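To make that concrete, here is a minimal Python sketch of the failure mode, under the assumption that the job filters on a date derived from now(); the load_rows_for_date helper and the checkpoint idea are hypothetical, purely for illustration.

    from datetime import date, timedelta

    def run_with_now(load_rows_for_date):
        # Fragile: always processes "yesterday" relative to now().
        # If a daily run is skipped, the day it would have covered is never fetched.
        target = date.today() - timedelta(days=1)
        return load_rows_for_date(target)

    def run_with_checkpoint(load_rows_for_date, last_processed):
        # Safer: walk forward from the last processed date, so a skipped run
        # is caught up on the next execution instead of being lost.
        rows, day = [], last_processed + timedelta(days=1)
        while day < date.today():
            rows.extend(load_rows_for_date(day))
            day += timedelta(days=1)
        return rows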
02-02-2018
10:45 PM
@Biswajit Chakraborty
If you are using the GetFTP processor, then after pulling files the processor adds a getftp.remote.source attribute to the FlowFile. You can use this attribute to build the filename in an UpdateAttribute processor by adding a new property:
filename    ${filename}_${getftp.remote.source}    //add the remote source name to the filename
You can change the expression language to change the filename as follows:
${filename:append(${getftp.remote.source})}    //result 711866091328995HDF04-1
(or)
${filename}${getftp.remote.source}    //result 711866091328995HDF04-1
Example: if the filename value is 711866091328995 and the getftp.remote.source value is HDF04-1, then the output FlowFile from UpdateAttribute will have the filename 711866091328995_HDF04-1, because we are adding the remote source value to the filename with an underscore.
Alternatively, if you are having issues with identical filenames being overwritten, the FlowFile also has an attribute named uuid. Using the UUID (a unique identifier for this FlowFile) as the filename keeps every filename unique, so there are no overwriting issues. Configs:
filename    ${uuid}
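As a side note, if you ever need more control than UpdateAttribute gives you, the same renaming can be scripted in an ExecuteScript processor. This is only a minimal sketch (Jython engine), not part of the original answer, and it assumes the getftp.remote.source attribute is present on the FlowFile.

    # ExecuteScript (Jython): rename the FlowFile using the GetFTP source attribute
    flowFile = session.get()
    if flowFile is not None:
        name = flowFile.getAttribute('filename')
        source = flowFile.getAttribute('getftp.remote.source')
        if source:
            # e.g. 711866091328995 + HDF04-1 -> 711866091328995_HDF04-1
            flowFile = session.putAttribute(flowFile, 'filename', name + '_' + source)
        session.transfer(flowFile, REL_SUCCESS)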
01-15-2018
03:30 AM
@Bala, sorry for the very late response. My purpose is to read some data files (server logs), transform them into a proper format, and prepare a data warehouse (in my case, Hive) for analysis later. So in my project I have three main activities:
1) read and transform data from the txt/log files (for which I am using Spark -- frequency: daily job)
2) prepare a data warehouse with that daily data (for which I am inserting the Spark DataFrames into a Hive table -- frequency: daily job)
3) show the results (for this I am again using Spark SQL along with Hive, as that is faster than using a Hive query alone, and I will use Zeppelin or Tableau for data visualization -- frequency: weekly job or as required)
From my reading and understanding, I guess Spark SQL alone plus caching would be much faster than Spark + Hive, but I think I have no other option, as I have to run the analysis on repository data. Do you suggest any other approach for this use case?
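For what it's worth, here is a minimal PySpark sketch of steps 1 and 2 above, assuming a hypothetical log layout and a Hive table called logs_dw.daily_logs; the path, regex patterns, and table name are placeholders, not part of the original setup.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import regexp_extract, to_date

    # enableHiveSupport so the DataFrame can be written into a Hive table
    spark = (SparkSession.builder
             .appName("daily-log-load")
             .enableHiveSupport()
             .getOrCreate())

    # 1) Read and transform the raw server log (layout assumed for this sketch)
    raw = spark.read.text("/data/incoming/server.log")
    parsed = raw.select(
        regexp_extract("value", r"^(\S+)", 1).alias("host"),
        to_date(regexp_extract("value", r"\[(\d{4}-\d{2}-\d{2})", 1)).alias("event_date"),
        regexp_extract("value", r'"(\w+) ', 1).alias("method"),
    )

    # 2) Append the day's data into the Hive warehouse table
    #    (step 3 then queries this table with Spark SQL / Zeppelin / Tableau)
    parsed.write.mode("append").saveAsTable("logs_dw.daily_logs")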
09-23-2017
05:17 PM
Thanks a lot for your help, you saved my day... thanks again!
06-14-2017
03:38 PM
1 Kudo
Hi, the problem is that your query is syntactically wrong. The right query to achieve your goal is:
select memberid, max(insertdtm)
from finaldata
group by memberid
having datediff(current_date, max(insertdtm)) > 30;
Hope it helps.
08-11-2016
07:50 PM
Please consider publishing an article on this, others will find it useful as it's not an obvious find.
07-11-2016
06:25 PM
@Biswajit Chakraborty The official Hortonworks documentation for deploying HBase clusters is spread out across multiple guides, which makes it difficult to find. A refresh of docs.hortonworks.com, coming soon with a new release of HDP, should correct this problem. For now, you can find some information in these links: HBase Cluster Capacity and Region Sizing, Add HBase RegionServer, and Optimizing HBase I/O. I think you are set with installing HBase, but in case you are not, one way to access the installation steps is to use the links in Using Apache HBase and Apache Phoenix (this information will also be enhanced and moved soon). Let me know if this helps. Thanks for your patience.
07-05-2016
02:14 PM
Cool, but you accepted the wrong answer 🙂