Member since: 06-07-2016

923 Posts
322 Kudos Received
115 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 4076 | 10-18-2017 10:19 PM |
|  | 4324 | 10-18-2017 09:51 PM |
|  | 14809 | 09-21-2017 01:35 PM |
|  | 1831 | 08-04-2017 02:00 PM |
|  | 2410 | 07-31-2017 03:02 PM |
			
    
	
		
		
07-25-2017 10:12 PM
@PJ These directories exist on the JournalNodes, if that is what you are using, or on whichever disk you specify in Ambari for the NameNode when you do your install. I think you will find the following link helpful: https://hortonworks.com/blog/hdfs-metadata-directories-explained/
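For reference, these locations are controlled by properties along the following lines in hdfs-site.xml; the paths shown are example values only, the real ones are whatever you choose during the Ambari install:

```xml
<!-- Example values only; the actual directories are whatever you pick in Ambari -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/hadoop/hdfs/namenode</value>   <!-- fsimage and edits on the NameNode host -->
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/hadoop/hdfs/journal</value>    <!-- shared edits when NameNode HA uses JournalNodes -->
</property>
```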
						
					
07-26-2017 02:35 PM
							 Thanks @Matt Clarke  
						
					
11-19-2018 10:21 PM
Here's a detailed implementation of a Slowly Changing Dimension Type 2 in Hive using an exclusive-join approach, assuming the source sends a complete data file, i.e. old, updated and new records.

Steps:

1. Load the latest file data into the STG table.

2. Select all the already-expired records from the HIST table:

select * from HIST_TAB where exp_dt != '2099-12-31'

3. Select all the records which have not changed, using an inner join between STG and HIST and a filter on HIST.column = STG.column:

select hist.*
from HIST_TAB hist
inner join STG_TAB stg
  on hist.key = stg.key
where hist.column = stg.column

4. Select all the new and changed records from STG_TAB using an exclusive left join with HIST_TAB, setting the effective date to the load date and the expiry date to '2099-12-31':

select stg.*, current_date as eff_dt, '2099-12-31' as exp_dt
from STG_TAB stg
left join (select * from HIST_TAB where exp_dt = '2099-12-31') hist
  on hist.key = stg.key
where hist.key is null
   or hist.column != stg.column

5. Select the old versions of the updated records from the HIST table by left joining with the STG table, and close them out by setting their expiry date to the load date:

select hist.key, hist.column, hist.eff_dt, current_date as exp_dt
from (select * from HIST_TAB where exp_dt = '2099-12-31') hist
left join STG_TAB stg
  on hist.key = stg.key
where stg.key is not null
  and hist.column != stg.column

6. UNION ALL the results of steps 2-5 and INSERT OVERWRITE the result into the HIST table (a consolidated sketch of this final step follows below).

A more detailed implementation of SCD Type 2 can be found here: https://github.com/sahilbhange/slowly-changing-dimension

Hope this helps!
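To make step 6 concrete, here is a minimal sketch of the final merge, using id for the business key and attr for the tracked attribute (stand-ins for the key/column placeholders above), and assuming HIST_TAB carries eff_dt and exp_dt columns:

```sql
-- Hypothetical consolidated refresh: UNION ALL of steps 2-5 written back to HIST_TAB.
-- Hive stages the query result before the overwrite, so reading HIST_TAB here works;
-- writing to a temporary table first is a safer variant.
INSERT OVERWRITE TABLE HIST_TAB
SELECT id, attr, eff_dt, exp_dt
FROM (
  -- step 2: rows that were already expired, carried over unchanged
  SELECT id, attr, eff_dt, exp_dt
  FROM HIST_TAB
  WHERE exp_dt != '2099-12-31'

  UNION ALL

  -- step 3: current rows whose attribute did not change
  SELECT hist.id, hist.attr, hist.eff_dt, hist.exp_dt
  FROM HIST_TAB hist
  INNER JOIN STG_TAB stg ON hist.id = stg.id
  WHERE hist.exp_dt = '2099-12-31'
    AND hist.attr = stg.attr

  UNION ALL

  -- step 4: new and changed rows from staging, opened as the current version
  SELECT stg.id, stg.attr, current_date AS eff_dt, '2099-12-31' AS exp_dt
  FROM STG_TAB stg
  LEFT JOIN (SELECT * FROM HIST_TAB WHERE exp_dt = '2099-12-31') hist
    ON hist.id = stg.id
  WHERE hist.id IS NULL
     OR hist.attr != stg.attr

  UNION ALL

  -- step 5: previous versions of changed rows, closed out with the load date
  SELECT hist.id, hist.attr, hist.eff_dt, current_date AS exp_dt
  FROM (SELECT * FROM HIST_TAB WHERE exp_dt = '2099-12-31') hist
  INNER JOIN STG_TAB stg ON hist.id = stg.id
  WHERE hist.attr != stg.attr
) merged;
```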
						
					
07-11-2017 08:01 AM
Sorry, I attached the wrong code. Please find it here: nodereadfrommongo.txt
						
					
05-29-2017 06:17 AM
@mqureshi The cluster currently only has one active NameNode.
Is there a better way to find out which node is active? I used the following as well, but it does not distinguish between the two NameNodes:

[ayguha@dh01 ~]$ curl --user admin:admin http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE&metrics/dfs/FSNamesystem/HAState=active
[1] 16533
-bash: metrics/dfs/FSNamesystem/HAState=active: No such file or directory
[ayguha@dh01 ~]$ {
  "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE",
  "items" : [
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh01.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au"
      }
    },
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh02.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au"
      }
    }
  ]
}
  Also hdfs-site.xml does not have the property dfs.namenode.rpc-address.
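For what it is worth, in the session above the unquoted & makes bash put curl in the background and treat the HAState part as a separate command, so the active-state filter never reaches Ambari. Quoting the URL should send the whole predicate; a sketch using the same host, cluster and credentials as above (the fields parameter just trims the response down to the host names):

```sh
# Quote the URL so '&' stays part of the query string instead of backgrounding curl
curl --user admin:admin \
  'http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE&metrics/dfs/FSNamesystem/HAState=active&fields=HostRoles/host_name'
```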
 
						
					
05-22-2017 09:05 PM
@mqureshi The raw file that was written has the same problem when I view it in the Files View in Ambari. Perhaps the problem is only in how it is displayed, even though the encoding is correctly UTF-8.
						
					
05-22-2017 08:19 PM
1 Kudo
I was able to figure it out. I used the EvaluateJsonPath processor to grab the 'Raw_Json' and 'partition_date' columns, then used the AttributesToJSON processor to turn those two attributes back into JSON. Afterwards the InferAvroSchema processor was able to infer the 'Raw_Json' column as a string, and the data is now going into the Hive table correctly via Hive Streaming.
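For anyone recreating this flow, the relevant processor settings look roughly like the following; the attribute names match the post, everything else is an illustrative configuration rather than an export of the actual flow:

```
EvaluateJsonPath
  Destination      : flowfile-attribute
  Raw_Json         : $.Raw_Json          (dynamic property)
  partition_date   : $.partition_date    (dynamic property)

AttributesToJSON
  Attributes List  : Raw_Json,partition_date
  Destination      : flowfile-content

InferAvroSchema  ->  PutHiveStreaming
```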
						
					
05-21-2017 11:48 AM
Here is my interface configuration, from both my Linux host and the Hortonworks sandbox where NiFi runs. Thanks again!
						
					
06-08-2017 01:15 PM
Dear Vinay Upala, please join me on Skype to help with my request. I have the same issue; the commands won't run.
						
					
05-12-2017 10:44 PM
1 Kudo
@Shiv Kabra I think there might be some confusion about what NiFi does. I also think you are making this more complex than it needs to be.

First things first: there is a ReplaceText processor which you *might* be able to use to mask data, by matching content and replacing it with your masking values. It supports regular expressions (a sketch of a masking configuration follows below).

Now, since you are new to NiFi, I will try to give you an overview of what NiFi is purpose-built for. NiFi is a data-flow management tool. It helps you create a data flow in a few minutes without writing a single line of code. NiFi enables you to ingest data from multiple sources using different protocols, where the data might be in different formats, and to process that data: enriching metadata, changing formats (for example JSON to Avro), filtering records, tracking lineage, moving data across data centers (cloud and on-prem) securely, sending it to different destinations, and much more. Companies use NiFi to manage enterprise data flow. Its rich features include queuing (at each processor level), back pressure and lineage.

2. Can I pass the tables list as an input parameter to the process?

To do what? Which processor? Check the list of processors here: https://nifi.apache.org/docs.html

3. Can I restart a process in case there is any failure during execution?

One of the best features of NiFi. When a failure occurs, you can replay records, stop the flow at a processor level, make changes and restart it.

4. Does it have any built-in processor to handle such requests, i.e. masking sensitive information in tables?

I think ReplaceText should do what you are looking for. NiFi is extensible, so you can also write your own processor if one of the 200-plus existing ones is not enough for you. There is also an ExecuteScript processor that you can use to call outside scripts.
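As a minimal illustration of the ReplaceText masking idea (the pattern and values below are made-up examples, not from this thread; property names are as in recent NiFi versions):

```
ReplaceText
  Replacement Strategy : Regex Replace
  Evaluation Mode      : Line-by-Line
  Search Value         : \b\d{3}-\d{2}-\d{4}\b      (hypothetical SSN-like pattern)
  Replacement Value    : ***-**-****
```

Each line of content that matches the pattern has the matched value replaced with the mask before the flow file moves on to the next processor.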
						
					