Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
Views | Posted |
---|---|
11116 | 04-15-2020 05:01 PM |
7019 | 10-15-2019 08:12 PM |
3061 | 10-12-2019 08:29 PM |
11238 | 09-21-2019 10:04 AM |
4189 | 09-19-2019 07:11 AM |
10-11-2017
01:08 PM
1 Kudo
Hi @Gayathri Devi,
You can use Spark SQL to read the data from the Hive table and create a DataFrame.
A better way to get data from an HBase table is the Spark-HBase connector described in the links below. With this method we construct the HBase RDD from scratch, which is more scalable and a better fit for the Spark Catalyst engine.
You can refer to the links below for how to get data directly from HBase without using a Hive table:
https://hortonworks.com/blog/spark-hbase-connector-a-year-in-review/
https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
https://github.com/hortonworks-spark/shc
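As a rough idea of what reading through the connector looks like, here is a minimal PySpark-side sketch. It assumes the shc package from the GitHub link is on the Spark classpath; the helper names (build_shc_catalog, read_hbase) and the table, column family, and field names are only for illustration and are not from your cluster.

```python
import json

# Data source name used by the shc connector (assumes the shc jar is on the classpath).
SHC_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def build_shc_catalog(table, row_key_field, columns, namespace="default"):
    """Build the JSON catalog that maps HBase cells to DataFrame fields.

    columns maps a DataFrame field name to (column family, qualifier, type).
    The row key is exposed as a string field named row_key_field.
    """
    cols = {row_key_field: {"cf": "rowkey", "col": "key", "type": "string"}}
    for field, (cf, qualifier, col_type) in columns.items():
        cols[field] = {"cf": cf, "col": qualifier, "type": col_type}
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": "key",
        "columns": cols,
    })


def read_hbase(spark, catalog):
    """Load the HBase table as a DataFrame (needs a live SparkSession)."""
    return spark.read.options(catalog=catalog).format(SHC_FORMAT).load()


# Hypothetical table with one column family 'cf' and one qualifier 'name'.
catalog = build_shc_catalog("tbl_snp", "id", {"name": ("cf", "name", "string")})
```

With a running session you would then call read_hbase(spark, catalog) and query the result like any other DataFrame.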
10-10-2017
01:38 PM
2 Kudos
Hi @Gayathri Devi,
You don't have to specify any compression property in the Hive create table statement. Hive only points to the HBase table and reads it through the HBase storage handler, so if the HBase table is compressed, HBase handles the decompression transparently and Hive needs no extra setting.
Just write the create table statement without a compression property, like below:
CREATE EXTERNAL TABLE tablename(hbid string, Mv double, COUNTRY string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,RAW:Mv,RAW:COUNTRY")
TBLPROPERTIES ("hbase.table.name"="tblname");
Example:-
I created an HBase table with Snappy compression, put 3 records into it, and then scanned the table.
hbase(main)#create 'tbl_snp', { NAME => 'cf', COMPRESSION => 'SNAPPY' }
hbase(main)#put 'tbl_snp','1','cf:name','hcc'
hbase(main)#put 'tbl_snp','2','cf:name','hdp'
hbase(main)#put 'tbl_snp','3','cf:name','hdf'
hbase(main)#scan 'tbl_snp'
ROW COLUMN+CELL
1 column=cf:name, timestamp=1507641820083, value=hcc
2 column=cf:name, timestamp=1507641848288, value=hdp
3 column=cf:name, timestamp=1507641855165, value=hdf
3 row(s) in 0.0190 seconds
Then I created a Hive table on top of the HBase tbl_snp table, without any compression property in the statement.
Create Table Statement:-
create external table default.tbl_snp(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
("hbase.columns.mapping"=":key,cf:name") TBLPROPERTIES ("hbase.table.name"="tbl_snp");
select * from default.tbl_snp;
+-------------+---------------+--+
| tbl_snp.id | tbl_snp.name |
+-------------+---------------+--+
| 1 | hcc |
| 2 | hdp |
| 3 | hdf |
+-------------+---------------+--+
3 rows selected (0.876 seconds)
The select from the Hive table returned all the records that exist in the HBase table, even though the Hive table was created without a compression property.
10-10-2017
01:20 AM
2 Kudos
@Sammy Gold,
I tried adding a new element to the CSV content. We can add a new field by changing the schema in AvroSchemaRegistry to the one below:
{"name": "origFormatName","namespace": "someFields","type": "record","fields": [{ "name": "name", "type": "string" },{ "name": "age", "type": "int" },{ "name": "height", "type": "int" },{ "name": "weight", "type": "int" },{ "name": "school", "type": "string" },{ "name": "heightmm", "type": "int" }]}
but I don't think we can do math calculations with it.
Things I tried:-
With Replacement Value Strategy set to Record Path Value, the processor does not do any math on the existing /height field. Neither of these values worked for /heightmm:
/height*10
/height:multiply(10)
When does this processor work for adding values?
It works if heightmm should simply take the /height value:
Replacement Value Strategy: Record Path Value
/heightmm: /height
Example:-
name,age,height,weight,school,heightmm
james,19,2222,56,,2222
jake,20,2222,62,,2222
sam,21,2222,55,,2222
Mike,24,2222,64,,2222
So we can only replace with existing values; I don't think we can do math operations this way.
Another way:-
To add the new value to the existing CSV data, extract the height content as an attribute (for example with the ExtractText processor) and use the ReplaceText processor to build the new record from the height attribute. Here we set the Replacement Strategy property to Append, which keeps the content as is and appends the new value to it.
(or)
Set the Replacement Strategy property to Regex Replace with Replacement Value $1${height:multiply(10)}; it gives the same result as the Append method.
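To check what the flow should produce, here is a small offline sketch in plain Python (not NiFi) that appends heightmm as height multiplied by 10. The function name add_heightmm and the factor argument are made up for this illustration; it only shows the expected output of the ReplaceText approach, not how NiFi computes it.

```python
import csv
import io


def add_heightmm(csv_text: str, factor: int = 10) -> str:
    """Return the CSV with a heightmm column equal to height * factor."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames)
    if "heightmm" not in fields:
        fields.append("heightmm")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Same arithmetic as ${height:multiply(10)} in the Replacement Value.
        row["heightmm"] = str(int(row["height"]) * factor)
        writer.writerow(row)
    return out.getvalue()


sample = "name,age,height,weight,school\njames,19,2222,56,\n"
print(add_heightmm(sample))
```

You can compare the printed rows with the flowfile content after ReplaceText to confirm the flow is doing what you expect.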
10-09-2017
11:04 PM
Hi @dhieru singh,
You can copy hdfs-site.xml and core-site.xml to the NiFi lib path and restart NiFi. Then you don't have to specify a path, because NiFi will load all the .xml files from the lib path.
path:- /usr/hdf/current/nifi/lib
If the xml files are in the NiFi lib path, you don't have to give any path for the Hadoop Configuration Resources property.
(or)
You can use hdfs-site.xml and core-site.xml from /etc/hdp/hdfs-site.xml,/etc/hdp/core-site.xml and specify that path in the HDFS processor configuration; NiFi will then read the configuration resources from the path you specified.
If the xml files are not in the NiFi lib path, you have to give the path in the Hadoop Configuration Resources property.
10-09-2017
08:07 PM
1 Kudo
@eric valoschin
Can you try the following command:
hadoop jar /usr/hdp/2.5.3.0-37/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.3.0-37.jar \
-Dmapred.reduce.tasks=1 \
-input "<path-to-input-directory>" \
-output "<path-to-output-directory>" \
-mapper cat \
-reducer cat
Check under /usr/hdp which version of the hadoop streaming jar you have, then give the input path. Make sure the output directory does not already exist, because this job merges the files and creates the output directory for you.
Here is what I tried:-
#hdfs dfs -ls /user/yashu/folder2/
Found 2 items
-rw-r--r-- 3 hdfs hdfs 150 2017-09-26 17:55 /user/yashu/folder2/part1.txt
-rw-r--r-- 3 hdfs hdfs 20 2017-09-27 09:07 /user/yashu/folder2/part1_sed.txt
#hadoop jar /usr/hdp/2.5.3.0-37/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.3.0-37.jar \
> -Dmapred.reduce.tasks=1 \
> -input "/user/yashu/folder2/" \
> -output "/user/yashu/folder1/" \
> -mapper cat \
> -reducer cat
folder2 has 2 files. After running the above command, the merged output is stored in the folder1 directory, and the 2 files were merged into 1 file, as you can see below.
#hdfs dfs -ls /user/yashu/folder1/
Found 2 items
-rw-r--r-- 3 hdfs hdfs 0 2017-10-09 16:00 /user/yashu/folder1/_SUCCESS
-rw-r--r-- 3 hdfs hdfs 174 2017-10-09 16:00 /user/yashu/folder1/part-00000
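One thing to keep in mind with this approach: the lines pass through the shuffle before they reach the single reducer, so the merged file comes out ordered by key (the text before the first tab on each line) rather than in the original file order. Here is a rough plain-Python sketch of that behaviour, not the actual Hadoop code path; the function name simulate_cat_merge is just for illustration, and the relative order of lines with equal keys is not guaranteed in a real job.

```python
def simulate_cat_merge(files: list[list[str]]) -> list[str]:
    """Approximate 'mapper cat, reducer cat' with a single reducer.

    files is a list of input files, each given as a list of lines.
    The mapper passes every line through unchanged; the shuffle then
    sorts lines by key, where the key is the text before the first tab.
    """
    mapped = [line for lines in files for line in lines]
    return sorted(mapped, key=lambda line: line.split("\t", 1)[0])


# Two small hypothetical input files.
merged = simulate_cat_merge([["b", "a"], ["c\tx", "a\ty"]])
print(merged)
```

If you need the lines in their original order, this streaming job is not the right tool for the merge.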
10-08-2017
07:04 PM
@Sammy Gold, in the UpdateRecord processor, can you change the /fields[2]/height property to /height, as shown in the screenshot below? It will then replace every height value with 2222.
10-08-2017
06:07 PM
@Foivos A,
Keep your ftp command in a shell script, and keep that shell script on the local file system, because you want to list HDFS files.
Script1:- ftp.sh
ncftpput -f /path/to/login.txt /path/to/ftp/remote${1}
Then call script 1 from script 2. Include $1 in script 2: script 1 accepts one argument, so script 2 has to take the argument from NiFi and pass it on when it calls script 1.
Script2:- streamcommand_ftp.sh
<path-to-ftp.sh>/ftp.sh $1
Then call streamcommand_ftp.sh from the ExecuteStreamCommand processor and set the Command Arguments property to your filename.
10-08-2017
05:41 PM
@Shailesh Nookala, I tested with your CSV file and it works fine. Can you please attach screenshots of all your processor configurations?
10-06-2017
02:34 AM
1 Kudo
@Gayathri Devi,
We cannot create a Hive-HBase table in Avro format, because Hive is just a wrapper on top of the HBase table. All the data is stored in HBase; we only map its fields to the Hive table and expose the data in a structured manner from Hive.
If you want an Avro (or) ORC format table, you can prepare a snapshot table and use it for your needs:
Create table default.avro_table stored as avro as select * from hive_hbase_tablename;
Create table default.orc_table stored as orc as select * from hive_hbase_tablename;
In this way you can create Avro (or) ORC tables from Hive-HBase tables, and you can add a where clause to the select to copy only the required data from hive_hbase_tablename.
10-05-2017
11:28 PM
@Sumit Sharma
I don't think there are any links to share, but I have attached my .xml file. You can download it, upload it to your NiFi instance, and change it to fit your requirements.
flow-extract-mergexml.xml
You can refer to the link below for how to import the xml file into your NiFi canvas:
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1.1/bk_user-guide/content/Import_Template.html