Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 11278 | 04-15-2020 05:01 PM |
|  | 7167 | 10-15-2019 08:12 PM |
|  | 3155 | 10-12-2019 08:29 PM |
|  | 11614 | 09-21-2019 10:04 AM |
|  | 4393 | 09-19-2019 07:11 AM |
05-03-2018
05:10 AM
1 Kudo
@B X
We can reduce the fields by using the ConvertRecord processor: set the Schema Access Strategy of the Record Reader to Use Embedded Avro Schema, and in the Record Writer define your desired schema with only the required fields. For renaming fields and creating a unique field (used as the row key in HBase), use the UpdateRecord processor.
Example:
Here is my sample flow to reduce and create/rename fields. The first 3 processors are used to generate an Avro data file with this input data:
id,name,age,state
1,foo,20,FL
2,foo2,30,TX
3,foo3,40,CA
ConvertRecord processor:
The Record Reader is an Avro Reader that uses the embedded Avro schema.
Record Writer: in the writer I used "Schema Text" as the Schema Access Strategy (see the example schema text after the JSON output below), but you can use a schema registry with Schema Name as the access strategy instead. The incoming Avro file has 4 fields, but the AvroRecordSetWriter writes only the id and state fields, so the output flowfile content will have only 2 fields (id, state); we have reduced the content of the input flowfile between the Avro reader and writer. The output flowfile is in Avro format, so to view the data I used a ConvertAvroToJSON processor:
[{
"id": "1",
"state": "FL"
},
{
"id": "2",
"state": "TX"
},
{
"id": "3",
"state": "CA"
}]
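For reference, here is a minimal sketch of what the writer's "Schema Text" property could look like for this example (the record name is my own assumption; only the id and state fields are kept):
{
  "type" : "record",
  "name" : "reduced_record",
  "fields" : [
    { "name" : "id", "type" : "string" },
    { "name" : "state", "type" : "string" }
  ]
}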
UpdateRecord processor:
We are going to create/rename the fields in this processor. Record Reader: use the embedded Avro schema. Record Writer: see the writer configs further down.
1. Creating a field:
/row_id
concat( /id, '${filename}' , /state , '${UUID()}' )
In the value above we are concatenating id and state (from the record path) with the filename and a UUID (associated with the flowfile); this gives you a unique id that you can also use as the HBase row key.
2. Changing the field names. To rename fields we map them to the new desired names:
/rename_id
/id
/rename_state
/state
This gets the id and state values from the record path and assigns them to the rename_id and rename_state fields.
AvroRecordSetWriter configs: we are adding new fields and renaming existing ones, so change the writer schema accordingly (see the sketch after the output below). The output flowfile is again in Avro format, so to view the data I used a ConvertAvroToJSON processor; the new content of the flowfile has our new/renamed fields:
[ {
"row_id" : "1194346641658863.avroFL0d7a8c16-d986-4329-8c02-02662db1cb98",
"rename_id" : "1",
"rename_state" : "FL"
}, {
"row_id" : "2194346641658863.avroTX1db33ffe-142c-4afa-aba4-e1583b1d4d91",
"rename_id" : "2",
"rename_state" : "TX"
}, {
"row_id" : "3194346641658863.avroCAfe096132-525b-4f72-939d-324f2e62407d",
"rename_id" : "3",
"rename_state" : "CA"
} ]
I have attached my flow.xml so you can follow along easily; save/upload the template to your instance: 188134-reduce-and-create-rename-fields.xml
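For reference, a minimal sketch of the updated writer schema text for the UpdateRecord step (the record name is my own assumption; the field names match the example above):
{
  "type" : "record",
  "name" : "renamed_record",
  "fields" : [
    { "name" : "row_id", "type" : "string" },
    { "name" : "rename_id", "type" : "string" },
    { "name" : "rename_state", "type" : "string" }
  ]
}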
05-02-2018
03:21 AM
@Paul Byrum
I don't think we can write to c:\ directly, but can you use the below commands to check the files in the test directory (is there any setting causing these files to be hidden? see the check after the configs below)? Open a cmd terminal:
c:\Users> cd ..
c:\> cd c:\test
c:\test> dir
Then you are going to see all the files in the directory. If you still don't see any files, then use the log line
PutFile[id=1967bc4d-0163-1000-1af5-b40c46be92b6] Produced copy of StandardFlowFileRecord[uuid=58d18018-a864-4c12-a98d-498bbf8d19d1,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1525145201278-982, container=default, section=982], offset=336986, length=168493],offset=0,name=3706261132832753,size=168493] at location c:/test/3706261132832753
to configure a GetFile processor: point it at your test directory and set the File Filter to the same filename that was stored into the directory.
Input Directory
C:\test
File Filter
3706261132832753
Keep Source File
false
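As a quick check for hidden files (in case an Explorer or attribute setting is hiding them, as mentioned above), dir with the /a switch lists hidden and system files as well:
c:\> dir /a c:\test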
05-02-2018
01:10 AM
@Sami Ahmad
Unfortunately, I'm not able to reproduce the scenario on my end. I created a countries table in HBase with 40 versions, then ran a Sqoop import into this table with 1 mapper:
sqoop import --connect '**' --username root --password 'root' --query "SELECT '1' as CountryID,CountryCode from countries where \$CONDITIONS" -m 1 --hbase-table countries --column-family c --hbase-row-key CountryID
I'm assigning the value '1' to every CountryID, and I have 241 records in the countries table. Once the import finished, even though the table keeps 40 versions, HBase showed only the last record (that means a Sqoop import to an HBase table will not store all the versions, but the same writes from the HBase shell would store all the versions):
hbase> scan 'countries',{ VERSIONS => 40}
ROW COLUMN+CELL
1 column=c:CountryCode, timestamp=1525221297408, value=ZW
I ran the same Sqoop import again to this table, and then I was able to see 2 versions, as shown below:
hbase> scan 'countries',{ VERSIONS => 40}
ROW COLUMN+CELL
1 column=c:CountryCode, timestamp=1525221408212, value=ZW
1 column=c:CountryCode, timestamp=1525221297408, value=ZW
After a third run I was able to see 3 versions:
hbase> scan 'countries',{ VERSIONS => 40}
ROW COLUMN+CELL
1 column=c:CountryCode, timestamp=1525221470733, value=ZW
1 column=c:CountryCode, timestamp=1525221408212, value=ZW
1 column=c:CountryCode, timestamp=1525221297408, value=ZW
If I ran the same Sqoop import 40 times, I would have 40 versions with the same value ZW in the HBase table. My HBase version is 1.1.2.2.5.3.0-37 and Sqoop is 1.4.6.2.5.3.0-37.
In your case, even the first import shows 3 versions (I think the -m 4 argument is causing 3 versions instead of 1). Try this:
hbase> disable 'PUR_ACCT_PHX1'
hbase> drop 'PUR_ACCT_PHX1'
hbase> create 'PUR_ACCT_PHX1',{NAME => 'cf1', VERSIONS => 40}
Then run the Sqoop import with 1 mapper:
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --table PATRON.TAB4 --hbase-table PUR_ACCT_PHX1 --column-family cf1 --hbase-row-key "ACCT_NUM" -m 1
hbase(main):001:0> scan 'PUR_ACCT_PHX1',{VERSIONS => 40}
I think with this scan command you should see only 1 version instead of 3.
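As a side note, you can confirm that the VERSIONS setting actually took effect on the recreated table with the describe command:
hbase> describe 'PUR_ACCT_PHX1'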
05-01-2018
11:27 PM
@Suhag Pandya
You can use the instr function in Hive to return the first occurrence of a substring in your string:
select instr(string str, string substr);
Example:
hive> select instr('foobar','o'); -- returns the first position of 'o' in the string 'foobar'
+------+--+
| _c0 |
+------+--+
| 2 |
+------+--+
To return the first position of the substring starting from the nth position of your string, use the locate function in Hive.
Example:
hive> select locate('o','fooobar',4); -- returns the first occurrence from the 4th position onwards
+------+--+
| _c0 |
+------+--+
| 4 |
+------+--+
For a case-insensitive match, use the lower function together with locate.
Example:
hive> select locate('o',lower('FOOOBAR'),4);
+------+--+
| _c0 |
+------+--+
| 4 |
+------+--+
There is no built-in Hive function yet that gives you the nth occurrence or a regex match of the substring, but you can chain locate calls (see the sketch below) or create a UDF for this functionality.
- If the answer addressed your question, click the Accept button below to accept it; that would be a great help to community users looking for a quick solution to these kinds of issues.
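As a workaround sketch until then, you can chain locate calls to reach later occurrences; for example, the second occurrence of 'o' in 'fooobar':
hive> select locate('o','fooobar', locate('o','fooobar') + 1); -- returns 3, the position of the second 'o'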
05-01-2018
09:40 AM
@srini
Could you make sure you have a valid JSON message in the "/user/***********/json2/" path and keep each JSON message on one line (no new lines)? (or) just create a new file locally and then put that file into your Hadoop directory:
bash$ vi sample_json.txt  #create a new file with the below sample message in it
{ "purchaseid": { "ticketnumber": "23546852222", "location": "vizag", "Travelerhistory": { "trav": { "fname": "ramu", "lname": "gogi", "travelingarea": { "destination": { "stationid": "KAJKL", "stationname": "hyd" } }, "food": { "foodpref": [{ "foodcode": "CK567", "foodcodeSegment": "NOVEG" }, { "foodcode": "MM98", "foodcodeSegment": "VEG" } ] } } } } }
bash$ hadoop fs -rm -r /user/***********/json2/  #for testing purposes we are deleting all the files in the dir
bash$ hadoop fs -put -f sample_json.txt /user/***********/json2/  #put the newly created file into the hdfs dir
- If you are still having an issue, attach a screenshot of your input data in the /user/***********/json2/ path.
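As a side note, if jq happens to be installed on your edge node (it may not be), you can collapse a pretty-printed JSON file into a single line before putting it into HDFS:
bash$ jq -c . pretty_json.txt > sample_json.txt  #pretty_json.txt is a hypothetical multi-line input file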
05-01-2018
05:02 AM
1 Kudo
@srini
Try with the below create table statement:
hive> Create table ds1414(
purchaseid struct<ticketnumber:string,location:string,
Travelerhistory:struct<
trav:struct<fname:string,lname:string,
travelingarea:struct< destination :struct<stationid:string,stationname:string>>,
food :struct<foodpref:array<struct<foodcode:string,foodcodeSegment:string>>>
>
>
>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/***********/json2/';
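If Hive complains that it cannot find the org.apache.hive.hcatalog.data.JsonSerDe class, add the hcatalog core jar first; the path below is an HDP-style assumption, so adjust it to your install:
hive> ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;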
Examples of accessing data from the hive table are as follows:
hive> select * from ds1414;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| ds1414.purchaseid |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| {"ticketnumber":"23546852222","location":"vizag","travelerhistory":{"trav":{"fname":"ramu","lname":"gogi","travelingarea":{"destination":{"stationid":"KAJKL","stationname":"hyd"}},"food":{"foodpref":[{"foodcode":"CK567","foodcodesegment":"NOVEG"},{"foodcode":"MM98","foodcodesegment":"VEG"}]}}}} |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
hive> select purchaseid.ticketnumber from ds1414;
+---------------+--+
| ticketnumber |
+---------------+--+
| 23546852222 |
+---------------+--+
hive> select ds1414.purchaseid.ticketnumber,purchaseid.location from ds1414;
+---------------+-----------+--+
| ticketnumber | location |
+---------------+-----------+--+
| 23546852222 | vizag |
+---------------+-----------+--+
hive> select purchaseid.Travelerhistory.trav.food.foodpref from ds1414;
+-----------------------------------------------------------------------------------------------+--+
| foodpref |
+-----------------------------------------------------------------------------------------------+--+
| [{"foodcode":"CK567","foodcodesegment":"NOVEG"},{"foodcode":"MM98","foodcodesegment":"VEG"}] |
+-----------------------------------------------------------------------------------------------+--+
hive> select purchaseid.Travelerhistory.trav.food.foodpref.foodcode from ds1414;
+-------------------+--+
| foodcode |
+-------------------+--+
| ["CK567","MM98"] |
+-------------------+--+
hive> select purchaseid.Travelerhistory.trav.food.foodpref.foodcode[0] from ds1414;
+--------+--+
| _c0 |
+--------+--+
| CK567 |
+--------+--+
Let us know if you have any questions.
04-30-2018
12:01 AM
@Paul Byrum
The funnel is just for testing where the flowfiles are routing to, i.e. whether they go to the Success or Failure relationship; no other configs are needed for the funnel. Since 9 flowfiles were routed to the Success relationship (i.e. no permission issues were encountered), now check the files in the directory (c:\data). If you are still not able to see the files in the directory, share the PutFile processor configs and a listing of the files in the data directory as UI (or) cmd screenshots.
04-29-2018
10:51 PM
@Paul Byrum
Check the permissions once, because there needs to be 10.1 MB In and also Out. For debugging purposes, use a funnel and connect the success and failure relationships from the PutFile processor to it.
Sample Flow:
Check whether the files are routing to the success or failure relationship; if the files are routing to the failure relationship, then go to nifi-app.log to find the root cause (a sketch for searching the log follows below).
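A quick sketch for searching the log on Windows (the NiFi install path below is an assumption; nifi-app.log sits under the logs directory of your install):
c:\nifi> findstr /i "putfile" logs\nifi-app.log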
04-29-2018
09:23 PM
@Paul Byrum
Use a cmd terminal to go to the data directory, and put the full path in the Directory value. You need to use a path like the one below if you want to see those files in your Data directory:
C:\Users\<username>\Data
- Go to the c:\Data directory using the command prompt (cmd) to check the files already written from NiFi.
Example: if I want to write a file from NiFi to my Downloads directory, I need to put the below path in the PutFile Directory value:
C:\Users\Shu\Downloads
If I use C:\Downloads, we are writing the file into C:\Downloads, not into C:\Users\Shu\Downloads; use the cmd terminal (cd C:\Downloads) to see those files.
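As a small sketch, cmd can expand the per-user path for you so you can copy the exact value into the PutFile Directory property:
c:\> echo %USERPROFILE%\Data
C:\Users\Shu\Data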
04-29-2018
03:44 AM
@Abhinav Joshi
Yes, if the ports are blocked we need to open them on the NiFi servers. I tried the above example on my local NiFi instance, where no ports are blocked (a quick reachability check is sketched below).
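A minimal sketch for checking whether a port is reachable from a NiFi node, assuming nc (netcat) is available; replace the host and port with your own:
bash$ nc -vz <remote-host> <port>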