Member since: 05-18-2016
Posts: 71
Kudos Received: 39
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3490 | 12-16-2016 06:12 PM
 | 1200 | 11-02-2016 05:35 PM
 | 4618 | 10-06-2016 04:32 PM
 | 1953 | 10-06-2016 04:21 PM
 | 1702 | 09-12-2016 05:16 PM
10-06-2016
04:21 PM
Please contact Hortonworks support, which administers the test. You should be able to reschedule the test once the issue is resolved.
10-03-2016
10:36 PM
1 Kudo
This is awesome. Now we can demo clustered NiFi servers to clients instead of a standalone instance.
09-19-2016
05:08 PM
Does Confluent Inc. support cost extra if someone wants to install this in production? Is there a Hortonworks out-of-the-box solution for ingesting data from RESTful APIs into HDFS?
09-16-2016
12:57 PM
I had questions about the need for the triggers. The main reasons for creating the triggers in MySQL are:
1) The triggers set a date and time stamp whenever a row is inserted or updated, and the NiFi processor polls on that date/time column to pull the latest data from the RDBMS into NiFi and generate a flow file. The date/time field is critical.
2) They also make it possible to tell whether a record was inserted or updated, both in MySQL and in Hive, so we know the state of the record in the source system. This field is only used for demo purposes; it is not strictly required.
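To illustrate the first point: the incremental pull the NiFi processor performs is conceptually equivalent to a query like the one below (a sketch only, using the cdc_test table from the article; the actual SQL is generated by the processor, and the last-seen timestamp is tracked in the processor's state, not supplied by hand):

select * from cdc_test
where created_date > '2016-09-16 12:00:00';  -- last maximum value the processor has seen (illustrative)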
09-14-2016
02:06 PM
Thanks @Joshua Adeleke. This solution is not acceptable for my client, as they want to install the metastore DB in Oracle. Were any support tickets opened for this issue?
09-13-2016
07:13 PM
Was this ERROR solved? One of the clients I am working with is facing the same error. Is there a patch available for the metastore to work with Oracle?
09-12-2016
05:16 PM
3 Kudos
I am assuming you are using the Hortonworks Data Platform. Create an external table pointing to the HDFS location of these CSV files. Once the data is loaded onto your server, move the files to this HDFS location with a cron-like tool such as ncron, which triggers as soon as a file is copied into the source directory, so each file is moved only after it has completely transferred. You can then read the files with Hive select statements from your Java program via JDBC, REST, or whatever interface you prefer.
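For example, the external table could look roughly like this (table name, columns, and the HDFS landing path are placeholders, assuming comma-delimited CSV files):

create external table csv_landing
(
col_a string,
col_b string,
col_c int
)
row format delimited
fields terminated by ','
stored as textfile
location '/data/csv_landing/';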
09-08-2016
03:17 AM
13 Kudos
Prerequisites
1) Download the HDP Sandbox
2) MySQL database (should already be present in the sandbox)
3) NiFi 0.6 or later (download and install a new version of NiFi, or use Ambari to install NiFi in the sandbox)

MySQL Setup (Source Database)

In this setup we will create a table in MySQL and create a few triggers on the table to emulate transactions. The triggers record whether the change was an insert or an update, and they set the timestamp on the inserted/updated row. (This is very important, as NiFi will poll on this column to extract changes based on the timestamp.)

unix> mysql -u root -p
Enter password: <enter>
mysql> create database test_cdc;
mysql> create user 'test_cdc'@'localhost' identified by 'test_cdc';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'test_cdc'@'localhost' WITH GRANT OPTION;
mysql> GRANT ALL PRIVILEGES ON *.* TO 'test_cdc'@'%' IDENTIFIED BY 'test_cdc' WITH GRANT OPTION;
mysql> flush privileges;
mysql> exit;
unix> mysql -u test_cdc -p test_cdc
mysql> create table cdc_test
(
Column_A int,
Column_B text,
Created_date datetime,
INFORMATION text
);
Create Triggers in MySQL

mysql> create trigger CDC_insert
before insert on cdc_test
for each row
set NEW.created_date = NOW(),
    NEW.information = 'INSERT';

mysql> create trigger CDC_UPDATE
before update on cdc_test
for each row
set NEW.created_date = NOW(),
    NEW.information = 'UPDATE';
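As a quick, optional sanity check (the values here are just an example), insert a row without supplying the last two columns and confirm that the trigger populates them:

mysql> insert into cdc_test (Column_A, Column_B) values (1, 'trigger check');
mysql> select * from cdc_test;

Created_date should now hold the current timestamp and INFORMATION should be 'INSERT'.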
Hive Setup (Destination Database)

In Hive we create an external table with exactly the same structure as the MySQL table; NiFi will capture changes from the source and insert them into this Hive table. Using the Ambari Hive view or the Hive CLI, create the following table in the Hive default database (I used the Hive CLI):

unix> hive
hive> create external table HIVE_TEST_CDC
(
COLUMN_A int,
COLUMN_B string,
CREATED_DATE string,
INFORMATION string
)
stored as avro
location '/test-nifi/CDC/';

Note: I am not covering how to create a managed Hive table in ORC format here; that will be covered in a different article.

NiFi Setup

This is a simple NiFi setup; the QueryDatabaseTable processor is only available as part of the default processors from NiFi 0.6 onwards.

QueryDatabaseTable Processor Configuration

It is very intuitive. The main things to configure are the DBCPConnectionPool and the Maximum-value Columns; choose the date/time stamp column that can serve as a cumulative change-management column. This is the only limitation of this processor: it is not a true CDC solution and relies on a single column. If data is reloaded into the table with older values in that column, it will not be replicated into HDFS or any other destination. The processor does not rely on transaction logs or redo logs the way Attunity or Oracle GoldenGate do; for a complete CDC solution, use Attunity or Oracle GoldenGate.

DBCPConnectionPool Configuration

PutHDFS Processor Configuration

Configure the Hadoop core-site.xml and hdfs-site.xml and the destination HDFS directory, in this case /test-nifi/CDC. Make sure this directory exists in HDFS; otherwise create it with the following command:

unix> hadoop fs -mkdir -p /test-nifi/CDC

Make sure all the processors are running in NiFi.
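For reference, the key property values for this flow would look roughly like the following (the property names are NiFi's; the connection URL, driver class, and credentials are assumptions based on the MySQL setup above and may differ in your environment):

QueryDatabaseTable
  Database Connection Pooling Service : DBCPConnectionPool
  Table Name                          : cdc_test
  Maximum-value Columns               : created_date

DBCPConnectionPool
  Database Connection URL    : jdbc:mysql://localhost:3306/test_cdc
  Database Driver Class Name : com.mysql.jdbc.Driver
  Database User              : test_cdc
  Password                   : test_cdc

PutHDFS
  Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  Directory                      : /test-nifi/CDC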
Testing CDC

Run a bunch of insert statements against the MySQL database:

unix> mysql -u test_cdc -p

At the mysql CLI, run the following inserts:

insert into cdc_test values (3, 'cdc3', null, null);
insert into cdc_test values (4, 'cdc3', null, null);
insert into cdc_test values (5, 'cdc3', null, null);
insert into cdc_test values (6, 'cdc3', null, null);
insert into cdc_test values (7, 'cdc3', null, null);
insert into cdc_test values (8, 'cdc3', null, null);
insert into cdc_test values (9, 'cdc3', null, null);
insert into cdc_test values (10, 'cdc3', null, null);
insert into cdc_test values (11, 'cdc3', null, null);
insert into cdc_test values (12, 'cdc3', null, null);
insert into cdc_test values (13, 'cdc3', null, null);
select * from cdc_test;

Then go to Hive using the CLI and check whether the records were transferred over by NiFi:

hive> select * from hive_test_cdc;

Voila…
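As an additional check, you can confirm that NiFi wrote Avro files into the HDFS directory configured on PutHDFS:

unix> hadoop fs -ls /test-nifi/CDC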
08-24-2016
10:43 PM
1 Kudo
http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/ Check out section 3.4 of the tutorial, which uses a reg-ex; that would help you load the table with just the columns you need. Another way, as @srai said, is to create an external table mapped to the CSV file, then create a managed table and load it with an insert into the managed table selecting from the external table, explicitly stating the columns you want to load in the insert statement.
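A minimal sketch of the second approach (table and column names are placeholders; staging_csv stands for the external table mapped to the CSV file):

create table managed_target (col_a string, col_c string)
stored as orc;

insert into table managed_target
select col_a, col_c
from staging_csv;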
08-24-2016
07:58 PM
1 Kudo
Try including this option: --map-column-java blob_column_name=String,clob_column_name=String
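In context, the Sqoop import would look roughly like this (connection string, credentials, table, and column names are placeholders; the relevant part is --map-column-java):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  -P \
  --table MY_TABLE \
  --map-column-java BLOB_COL=String,CLOB_COL=String \
  --target-dir /data/my_table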