Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11116 | 04-15-2020 05:01 PM |
| | 7019 | 10-15-2019 08:12 PM |
| | 3061 | 10-12-2019 08:29 PM |
| | 11238 | 09-21-2019 10:04 AM |
| | 4189 | 09-19-2019 07:11 AM |
10-11-2017
01:08 PM
1 Kudo
Hi @Gayathri Devi, you can use Spark SQL to get data from the Hive table and create a DataFrame. There is another, better way to get the data from the HBase table: construct the HBase RDD/DataFrame from scratch. That route is more scalable and a better fit for the Spark Catalyst engine. You can refer to the links below on how to get data directly from HBase without using a Hive table (a minimal sketch follows the links):
https://hortonworks.com/blog/spark-hbase-connector-a-year-in-review/
https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
https://github.com/hortonworks-spark/shc
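For reference, a minimal Scala sketch of the DataFrame-based read with the shc connector linked above, run from spark-shell (so `spark` is the SparkSession) with the shc-core package on the classpath; the table name "tblname", the column family "RAW", and the column names are only placeholders, not anything from the original question:

import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Catalog that maps the HBase row key and columns to DataFrame fields
// (table name, column family, and columns below are placeholders)
val catalog = s"""{
    |"table":{"namespace":"default", "name":"tblname"},
    |"rowkey":"key",
    |"columns":{
      |"hbid":{"cf":"rowkey", "col":"key", "type":"string"},
      |"Mv":{"cf":"RAW", "col":"Mv", "type":"double"},
      |"COUNTRY":{"cf":"RAW", "col":"COUNTRY", "type":"string"}
    |}
  |}""".stripMargin

// Build a DataFrame directly on the HBase table (no Hive layer in between)
val df = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

df.show()

Because this is a DataFrame source, Catalyst can push filters and column pruning down to the HBase scan, which is why this route tends to scale better than going through the Hive mapping.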
10-10-2017
01:38 PM
2 Kudos
Hi @Gayathri Devi, you don't have to mention any compression format property in the Hive CREATE TABLE statement. Because Hive is just pointing to the HBase table, if the HBase table is compressed then Hive automatically picks up the compression format. Just create the table without any compression property, like below:

CREATE EXTERNAL TABLE tablename (hbid string, Mv double, COUNTRY string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,RAW:Mv,RAW:COUNTRY")
TBLPROPERTIES ("hbase.table.name"="tblname");

Example: I created an HBase table with Snappy compression, put 3 records into it, then scanned the table.

hbase(main)#create 'tbl_snp', { NAME => 'cf', COMPRESSION => 'SNAPPY' }
hbase(main)#put 'tbl_snp','1','cf:name','hcc'
hbase(main)#put 'tbl_snp','2','cf:name','hdp'
hbase(main)#put 'tbl_snp','3','cf:name','hdf'
hbase(main)#scan 'tbl_snp'
ROW COLUMN+CELL
1 column=cf:name, timestamp=1507641820083, value=hcc
2 column=cf:name, timestamp=1507641848288, value=hdp
3 column=cf:name, timestamp=1507641855165, value=hdf
3 row(s) in 0.0190 seconds

Then I created a Hive table on top of the HBase table tbl_snp, without any compression property in the statement.

Create table statement:
create external table default.tbl_snp(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
("hbase.columns.mapping"=":key,cf:name") TBLPROPERTIES ("hbase.table.name"="tbl_snp"); select * from default.tbl_snp;
+-------------+---------------+--+
| tbl_snp.id | tbl_snp.name |
+-------------+---------------+--+
| 1 | hcc |
| 2 | hdp |
| 3 | hdf |
+-------------+---------------+--+
3 rows selected (0.876 seconds)

I selected from the Hive table and got all the records that exist in the HBase table, even though the Hive table was created without any compression property.
10-10-2017
01:20 AM
2 Kudos
@Sammy Gold, I tried to add a new element to the CSV content. We can add a new field by changing the AvroSchemaRegistry schema to the one below:

{"name": "origFormatName","namespace": "someFields","type": "record","fields": [{ "name": "name", "type": "string" },{ "name": "age", "type": "int" },{ "name": "height", "type": "int" },{ "name": "weight", "type": "int" },{ "name": "school", "type": "string" },{ "name": "heightmm", "type": "int" }]}

But I don't think we can do math calculations. Here is what I tried: with Replacement Value Strategy set to Record Path Value, the processor does not do any math on the existing /height field, e.g. /heightmm as /height*10 or /height:multiply(10).

When does this processor work for adding values? It works if you just want /heightmm to take the existing /height value: set Replacement Value Strategy to Record Path Value with /heightmm as /height.

Example:
name,age,height,weight,school,heightmm
james,19,2222,56,,2222
jake,20,2222,62,,2222
sam,21,2222,55,,2222
Mike,24,2222,64,,2222

So we can only replace with existing values; I don't think we can do math operations. Another way: to add the new field to the existing CSV data, you can extract the height value into an attribute (e.g. with the ExtractText processor) and use a ReplaceText processor to build the new field from the height attribute. Here we keep the Replacement Strategy property as Append, which keeps the content as is and appends the new value to it. Or you can set the Replacement Strategy property to Regex Replace with a Replacement Value of $1${height:multiply(10)}, which gives the same result as the Append method.
10-09-2017
11:04 PM
Hi @dhieru singh, you can copy hdfs-site.xml and core-site.xml to the NiFi lib path (/usr/hdf/current/nifi/lib) and restart NiFi; then you don't have to specify any path, because NiFi loads all the .xml files from the lib path. If the xml's are in the NiFi lib path, you don't have to give any path in the Hadoop Configuration Resources property. Or you can use the hdfs-site.xml and core-site.xml under /etc/hdp (i.e. /etc/hdp/hdfs-site.xml,/etc/hdp/core-site.xml) and specify that path in the HDFS processor configuration; NiFi will then read the configuration resources from the path you specified. If you don't have the xml's in the NiFi lib path, you do have to give a path in the Hadoop Configuration Resources property.
10-09-2017
08:07 PM
1 Kudo
@eric valoschin
Can you try the following command:

hadoop jar /usr/hdp/2.5.3.0-37/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.3.0-37.jar \
-Dmapred.reduce.tasks=1 \
-input "<path-to-input-directory>" \
-output "<path-to-output-directory>" \
-mapper cat \
-reducer cat

Check which version of the hadoop-streaming jar you have by looking under /usr/hdp, give your own input path, and make sure the output directory does not already exist, as this job merges the files and creates the output directory for you.

Here is what I tried:

#hdfs dfs -ls /user/yashu/folder2/
Found 2 items
-rw-r--r-- 3 hdfs hdfs 150 2017-09-26 17:55 /user/yashu/folder2/part1.txt
-rw-r--r-- 3 hdfs hdfs 20 2017-09-27 09:07 /user/yashu/folder2/part1_sed.txt

#hadoop jar /usr/hdp/2.5.3.0-37/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.3.0-37.jar \
> -Dmapred.reduce.tasks=1 \
> -input "/user/yashu/folder2/" \
> -output "/user/yashu/folder1/" \
> -mapper cat \
> -reducer cat

folder2 has 2 files. After running the above command, the merged output is stored in the folder1 directory, and the 2 files were merged into 1 file as you can see below.

#hdfs dfs -ls /user/yashu/folder1/
Found 2 items
-rw-r--r-- 3 hdfs hdfs 0 2017-10-09 16:00 /user/yashu/folder1/_SUCCESS
-rw-r--r-- 3 hdfs hdfs 174 2017-10-09 16:00 /user/yashu/folder1/part-00000
10-08-2017
07:04 PM
@Sammy Gold, in the UpdateRecord processor can you change the /fields[2]/height search property to /height, as shown in the screenshot below? It will replace all the height values with 2222.
10-08-2017
06:07 PM
@Foivos A, keep your ftp command in a shell script, and keep this shell script local, because you want to list HDFS files.

Script1: ftp.sh
ncftpput -f /path/to/login.txt /path/to/ftp/remote${1}

Then call the above script1 from shell script2. Include $1 in script2, because script1 accepts one argument; that way script2 receives the arguments from NiFi and calls script1 with them.

Script2: streamcommand_ftp.sh
<path-to-ftp.sh>/ftp.sh $1

Then call streamcommand_ftp.sh from the ExecuteStreamCommand processor and set the Command Arguments property to your filename.
10-08-2017
05:41 PM
@Shailesh Nookala, I tested with your CSV file and it is working fine. Can you attach screenshots of all of your processor configs, please?
10-06-2017
02:34 AM
1 Kudo
@Gayathri Devi, we cannot create a Hive-HBase table in Avro format, because the Hive table is just a wrapper on top of the HBase table: all the data is stored in HBase, and we are only mapping fields to the Hive table to expose the data in a structured manner from Hive. If you want an Avro (or ORC) format table, you can prepare a snapshot table and use it for your needs:

Create table default.avro_table stored as avro as select * from hive_hbase_tablename;
Create table default.orc_table stored as orc as select * from hive_hbase_tablename;

In this way you can create Avro (or ORC) snapshot tables from Hive-HBase tables, and you can add a where clause to copy only the required data from the hive_hbase_table.
10-05-2017
11:28 PM
@Sumit Sharma, I think there are no links to share, but I have attached my .xml file (flow-extract-mergexml.xml); you can download that xml, upload it, and change it to fit your requirements. You can refer to the link below on how to import an xml template into your NiFi canvas:
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1.1/bk_user-guide/content/Import_Template.html