Member since 06-03-2016
Posts: 66
Kudos Received: 21
Solutions: 7
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1310 | 12-03-2016 08:51 AM |
 | 393 | 09-15-2016 06:39 AM |
 | 533 | 09-12-2016 01:20 PM |
 | 464 | 09-11-2016 07:04 AM |
 | 621 | 09-09-2016 12:19 PM |
12-23-2019
09:55 PM
@MattWho, thank you for the details. I understand your point that the same file cannot be accessed across cluster nodes. However, I am running NiFi on a single node (without a cluster) and was thinking this should work. Yes, I do use UpdateAttribute with the file name convention below; it generates a separate flow file for every message, and I am trying to end up with one single file per node. ${filename:replace(${filename},"FileName_"):append(${now():format("yyyy-MM-dd-HH-mm-ss")})}.Json Thank you
09-02-2019
01:44 AM
What are the impacts on other ports if I change from TCP6 to TCP? And will my Ambari server work on TCP?
03-13-2019
03:53 AM
I'm trying to import MongoDB data into Hive. The jar versions that I have used are:
ADD JAR /root/HDL/mongo-java-driver-3.4.2.jar;
ADD JAR /root/HDL/mongo-hadoop-hive-2.0.2.jar;
ADD JAR /root/HDL/mongo-hadoop-core-2.0.2.jar;
My cluster versions are Ambari 2.6.0.0, HDFS 2.7.3, Hive 1.2.1000, HBase 1.1.2, Tez 0.7.0; MongoDB Server version: 3.6.5.
Hive script:
CREATE TABLE swipeCardTable
( ID STRING,
EmpID STRING,
BeginDate STRING,
EndDate STRING,
Time STRING,
Type STRING,
Location STRING,
Terminal STRING)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id","EmpID":"emp_id","BeginDate":"begin_date","EndDate":"end_date","Time":"time","Type":"time_event_type","Location":"location","Terminal":"terminal"}')
TBLPROPERTIES('mongo.uri'='mongodb://ro-myworld:romyworld123@10.10.170.43:27017/myworld.swipeCardTable')
Output:
hive> select * from sampletable;
OK
Failed with exception java.io.IOException:java.io.IOException: Failed to aggregate sample documents. Note that this Splitter implementation is incompatible with MongoDB versions prior to 3.2.
Please suggest how I can solve this. Thanks, Mohan V
09-12-2018
04:54 AM
Hello All,
I am trying to put MongoDB data into a MySQL server.
My GetMongo processor details:
My sample JSON looks like this:
{
"_id" : "MMH",
"_class" : "com.SwipecardLocationDetails",
"location_description" : "INDIA"
}
PutSQL:
DBCPConnectionPool:
I have used the PutSQL query as:
insert into locations values (${_id}, ${_class}, ${location_description})
But it is giving me an error. @Shu, please correct me. Thanks, Mohan V
09-11-2018
01:58 PM
@Shu I'm still having the same issue. I checked the firewalls and they are off. I have downloaded the JDBC drivers for SQL Server and put them into this directory: C:\Program Files\Java\jre1.8.0_171\lib\ext
In PutDatabaseRecord:
URL: jdbc:sqlserver://hjcorpsql-04:1433;databaseName=Test1;user=MyorganizationName\pbiuser;password=Secure@99;
ClassName: com.microsoft.sqlserver.jdbc.SQLServerDriver
Driver Location: E:/Software/sqljdbc_6.0/enu/jre8/sqljdbc42.jar
untitled5.png
I'm running NiFi on a production system. The SQL server is SQL Server Enterprise Edition 2016 on Windows Server 2012 R2. Please help me; I have been stuck here for the last week.
06-07-2017
12:38 PM
@Mohan V A few observations about your above flow...
1. You are trying to pass an absolute path and filename to the "Directory" property of the PutFile processor. PutFile is designed to write files to the target directory using the filename associated with the FlowFile it receives, so what you are doing will not work. Instead, you should add an UpdateAttribute processor between your MergeContent processor and your PutFile processor to set a new desired filename on your merged files. How do you plan on handling multiple merged FlowFiles, since they will all then end up with the same filename? I suggest making them unique by adding the FlowFile UUID to the filename. Below is an example of doing this using UpdateAttribute:
2. Out of your MergeContent processor you are routing both the original (all your un-merged FlowFiles) and merged relationships to the PutFile processor. Why? Typically the original relationship is auto-terminated or routed elsewhere if needed.
3. I also see from your screenshot that the PutFile processor is producing a "bulletin" (red square in the upper right corner). Floating your cursor over the red square will pop up the bulletin, which should explain why the PutFile is failing.
4. It appears as though you are auto-terminating the failure relationship on PutFile. This is a dangerous practice as it could easily result in data loss. A more typical scenario is to loop the failure relationship back on the PutFile processor to trigger a retry in the event of failure.
Thanks, Matt
03-21-2017
05:43 PM
Check /var/log/hive/hivemetastore.out as well.
12-21-2016
04:41 AM
Thanks for the reply, Ward Bekker. I have tried what you suggested but I still didn't get exactly what I need. In my table the column family is cd and the table name is companydetail. Sample table data:
ROW COLUMN+CELL
\x00\x00\x00\x00\x00\x00\x06\xA6 column=cd:cct, timestamp=1475738991531, value=Atlanta
\x00\x00\x00\x00\x00\x00\x06\xA6 column=cd:cnt, timestamp=1475740226346, value=Network ICE Corp.
\x00\x00\x00\x00\x00\x00\x06\xA6 column=cd:ct, timestamp=1475740596684, value=ISYI srl
I believe here cct is a col_prefix. And here we are only giving the column family, but we are not mentioning anywhere which table we want to get the data from. When I tried this, I got nothing: 0 records.
CREATE TABLE hbase_11(value map<string,int>, row_key int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = "cd:cct.*,:key" );
I have tried another way:
CREATE EXTERNAL TABLE hbase_table_1(value map<string,int>, row_key int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = "cd:cct.*,:key" ) TBLPROPERTIES("hbase.table.name" = "companydetail", "hbase.mapred.output.outputtable" = "companydetail_hive");
I get the output as:
{"cct":null} NULL
{"cct":null} NULL
{"cct":null} NULL
{"cct":null} NULL
{} NULL
{"cct":null} NULL
{} NULL
{"cct":null} NULL
{"cct":null} NULL
{"cct":null} NULL
{"cct":null} NULL
Time taken: 0.45 seconds, Fetched: 1291 row(s)
The table consists of 1291 rows, and if, as I said, cct is a col_prefix for the table, then it has 1291 col_prefixes. I can't get the whole table data because I would have to do this for all 1291 prefixes. Please help me see what I am missing here.
12-16-2016
07:45 AM
2 Kudos
Hello @Mohan V PutKafka is designed to work with Kafka version 0.8 series. If you're using Kafka 0.9, please use PublishKafka processor instead. Thanks, Koji
12-03-2016
08:51 AM
Thanks for the suggestion, jss. But it couldn't solve the issue completely. I moved those files into a temp directory and tried to start the server again, but now it gave another error:
ERROR: Exiting with exit code -1.
REASON: Ambari Server java process died with exitcode 255. Check /var/log/ambari-server/ambari-server.out for more information.
When I checked the logs, I found that the current DB version is not compatible with the server. Then I tried these steps:
wget -O /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.1.0/ambari.repo
yum install ambari-server -y
ambari-server setup -y
wget -O /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.1.1/ambari.repo
yum upgrade ambari-server -y
ambari-server upgrade
ambari-server start
After I ran these commands the Ambari server did start, but then a surprising thing happened. I had removed Ambari completely and was trying to reinstall it, yet when I completed all the above steps and opened the Ambari UI, it was again pointing to the same host that I had removed previously, showing heartbeat lost. Then I realised the Ambari agent was not installed, so I installed and started it:
yum -y install ambari-agent
ambari-agent start
Then, when I tried to start the services, it didn't work. I checked from the command prompt whether those services still exist by entering zookeeper, but the command was not found, because the service is not installed on my host. So I started removing the services that were sitting on the host in a dead state, using this command:
curl -u admin:admin -H "X-Requested-By: Ambari" -X DELETE http://localhost:8080/api/v1/clusters/hostname/services/servicename
But it didn't work; I got an error message:
"message" : "CSRF protection is turned on. X-Requested-By HTTP header is required."
Then I edited the ambari.properties file and added this property:
vi /etc/ambari-server/conf/ambari.properties
api.csrfPrevention.enabled=false
ambari-server restart
When I retried, it worked. But when I tried to remove Hive it didn't work, because MySQL is running on my machine. This command did work:
curl -u admin:admin -X DELETE -H 'X-Requested-By:admin' http://localhost:8080/api/v1/clusters/mycluster/hosts/host/host_components/MYSQL_SERVER
Then, when I tried to add the services back starting with ZooKeeper, it again gave me an error like:
resource_management.core.exceptions.Fail: Applying Directory['/usr/hdp/current/zookeeper-client/conf'] failed, looped symbolic links found while resolving /usr/hdp/current/zookeeper-client/con
I checked the directories and found that these links were pointing back to the same directories, so I ran these commands to solve the issue:
rm /usr/hdp/current/zookeeper-client/conf
ln -s /etc/zookeeper/2.3.2.0-2950/0 /usr/hdp/current/zookeeper-client/conf
And it worked. At last I successfully reinstalled Ambari as well as Hadoop on my machine. Thank you.
12-01-2016
08:28 AM
Thanks for the reply, jss. I have already tried everything you suggested, but I am still getting the same issue. When I start the DataNode through the Ambari UI, the following error occurs:
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 1. /etc/profile: line 45: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
-bash: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/conf/hadoop-env.sh: line 100: /dev/null: Permission denied
ls: write error: Broken pipe
/usr/hdp/2.3.4.7-4/hadoop/libexec/hadoop-config.sh: line 155: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/conf/hadoop-env.sh: line 100: /dev/null: Permission denied
ls: write error: Broken pipe
starting datanode, logging to /data/log/hadoop/hdfs/hadoop-hdfs-datanode-.out
/usr/hdp/2.3.4.7-4//hadoop-hdfs/bin/hdfs.distro: line 30: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/conf/hadoop-env.sh: line 100: /dev/null: Permission denied
ls: write error: Broken pipe
/usr/hdp/2.3.4.7-4/hadoop/libexec/hadoop-config.sh: line 155: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh: line 187: /dev/null: Permission denied
12-03-2016
02:04 PM
I tried what you suggested, Ankit Singhal, but I'm still getting the same issue.
a = load 'hbase://tablename' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cd','-loadKey -caster HBaseBinaryConverter') as (rowKey:chararray,cd:map[]);
Output:
(���,[sicd#Commercial Nonphysical Research,parent_cid#ؗ,parent_cn#Algorithmics (UK) Ltd.,emp#,industry#Corporate Services,sic#8732,equifaxId#,subIndustry#Market Research Services,revenue#,cct#London,street#101 Finsbury Pavement,ic#12,state#England,fax#44 20 7862 4008,ultimate_pcoId#
�,parent_ccnt#United Kingdom,zip#ec2a 1rs,cnt#United Kingdom,ultimate_pconame#International Business Machines Corp.,subsidiary#�,ultimate_pcocn#United States,cs#Subsidiary,ct#Private,rc#USD,phone#+44 20 7862 4000,naicsDescription#Marketing Research and Public Opinion Polling,name#Algorithmics Risk Management Limited,naics#541910,fd#2002])
Please suggest.
11-26-2016
09:35 PM
@cmcbugg, please see details of the issue in this link -> https://community.hortonworks.com/questions/68497/kafka-error-while-fetching-metadata-topicmetadata.html
Essentially, this is a fresh HDP 2.4 instance, and I've just enabled the Ranger-Kafka plugin.
ZooKeeper node permissions:
[kafka1@sandbox ~]$ ls -lrt /hadoop/zookeeper/
total 8
-rw-r--r-- 1 root root 1 2016-03-14 14:17 myid
drwxr-xr-x 2 zookeeper hadoop 4096 2016-11-26 19:44 version-2
kafka1 user permissions (on HDFS):
[kafka1@sandbox ~]$ hadoop fs -ls /user/
Found 11 items
drwxrwx--- - ambari-qa hdfs 0 2016-03-14 14:18 /user/ambari-qa
drwxr-xr-x - hcat hdfs 0 2016-03-14 14:23 /user/hcat
drwxr-xr-x - hive hdfs 0 2016-03-14 14:23 /user/hive
drwxr-xr-x - kafka1 hdfs 0 2016-11-26 20:31 /user/kafka1
drwxr-xr-x - kafka2 hdfs 0 2016-11-26 20:32 /user/kafka2
Any ideas on what needs to be changed to enable this?
09-20-2016
07:43 AM
2 Kudos
It's working now. I had to change my ambari.properties file: added
db.mysql.jdbc.name=/var/lib/ambari-server/resources/mysql-connector-java-5.1.28.jar
and modified these lines:
server.jdbc.rca.url=jdbc:mysql://localhost:3306/ambari
server.jdbc.url=jdbc:mysql://localhost:3306/ambari
09-19-2016
05:38 PM
@Mohan V The error message you are seeing is coming from Pig, correct? What do the Elasticsearch logs indicate is happening on that side? Looking at your initial tweet example, I wonder if the problem may be related to a Left-To-Right / Right-To-Left language issue. I can't say that I've seen it with your particular example before, but it has been known to cause issues.
09-12-2016
01:38 PM
@gkeys please take a look into this and suggest where I am going wrong. https://community.hortonworks.com/questions/56017/pig-to-elasesticsearch-stringindexoutofboundsexcep.html
09-11-2016
09:30 PM
@Mohan V Though there are efforts to make it work, there are no supported ways to do it directly with Kafka and Pig. You can leverage something like Apache NiFi to read from Kafka, dump the messages to HDFS, and then consume them with Pig. Since Kafka produces messages continuously and a Pig job has a start and an end, Pig really isn't a good fit for it. All that said, here's an attempt to make it work: http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3C-3358174115189989131@unknownmsgid%3E
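To illustrate the NiFi-then-Pig pattern described above, here is a minimal Pig sketch; the HDFS path /landing/kafka/mytopic is a hypothetical location where a NiFi flow (for example ConsumeKafka -> PutHDFS) would have written the messages as text files:
-- Load the newline-delimited Kafka messages that NiFi dumped to HDFS (hypothetical path)
msgs = LOAD '/landing/kafka/mytopic' USING TextLoader() AS (line:chararray);
-- From here the messages can be parsed, filtered, and stored like any other batch Pig input
dump msgs;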
09-12-2016
04:12 AM
Thank you, gkeys... you are the best!
09-11-2016
12:48 PM
@Mohan V Very glad to see you solved it yourself by debugging -- it is the best way to learn and improve your skills 🙂
09-09-2016
12:19 PM
1 Kudo
I think I got it on my own. Actually, I had forgotten the credentials and entered the wrong password. In the end it was resolved by entering the right credentials.
09-09-2016
05:54 PM
Bottom line is, if the xlsx is a single tab (sheet) in the spreadsheet, you can use the piggybank function CSVExcelStorage to load the spreadsheet as below:
REGISTER <pathTo>/piggybank.jar
rawdata = load 'myData.xlsx' using org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as (col1,col2,..);
If your xlsx has multiple tabs (sheets), you can separate each sheet into a separate xlsx file and use piggybank as above for each resulting file.
To separate sheets manually from inside Excel, see https://www.extendoffice.com/documents/excel/628-excel-split-workbook.html
To separate programmatically using VB scripting, see http://bredow.me/home/visual-basic-for-excel/vb-processes-and-snippets/split-worksheets-into-separate-files/
See also: https://community.hortonworks.com/questions/31968/hi-is-there-a-way-to-load-xlsx-file-into-hive-tabl.html
09-08-2016
12:51 PM
2 Kudos
@Mohan V I would:
1. Land the data in a landing zone in HDFS. Decide whether to keep this going forward or not (you may want to reuse the raw data).
2. Then use Pig scripts to transform the data into your HBase tables as tab-delimited output (see next step). Importantly, this involves inserting a key as the first column of your resulting tsv file. HBase of course is all about well-designed keys. You will use Pig's CONCAT() function to create a key from existing fields. It is often useful to concatenate fields into a key with a "-" separating each field in the resulting composite key. A single tsv output will be used to bulk load a single HBase table (next step). These should be output to a tmp dir in HDFS to be used as input in the next step. A minimal sketch of this step is shown below, after the list. Note: you could take your Pig scripting to the next level and create a single flexible Pig script for creating tsv output for all HBase tables; see https://community.hortonworks.com/content/kbentry/51884/pig-doing-yoga-how-to-build-superflexible-pig-scri.html . Not necessary though.
3. Then do a bulk import into your HBase table for each tsv. (Inserting record by record will be much too slow for large tables.) See the following links on bulk imports:
http://hbase.apache.org/0.94/book/arch.bulk.load.html
http://hbase.apache.org/book.html#importtsv
I have used this workflow frequently, including loading 2.53 billion relational records into an HBase table. The more you do it, the more automated you find yourself making it.
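A minimal Pig sketch of step 2, assuming hypothetical input fields custid, txn_date, and amount, with the composite key built from custid and txn_date:
-- Load the raw data from the landing zone (hypothetical path and schema)
raw = LOAD '/landing/raw/orders' USING PigStorage('\t') AS (custid:chararray, txn_date:chararray, amount:chararray);
-- Build a composite row key with '-' separators and make it the first column of the output
keyed = FOREACH raw GENERATE CONCAT(custid, CONCAT('-', txn_date)) AS rowkey, custid, txn_date, amount;
-- Write tab-delimited output to a tmp dir in HDFS, to be bulk loaded with ImportTsv in step 3
STORE keyed INTO '/tmp/orders_tsv' USING PigStorage('\t');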
09-08-2016
02:37 PM
@Artem Ervits thanks for your valuable explanation. Using that, I tried it another way, i.e. without storing the output to a text file and loading it back with PigStorage; instead, I tried to filter based on the word and store the result into HBase directly. Above I mentioned only the scenario I need, but here are the actual script and data that I used.
Script:
A = foreach (group epoch BY epochtime) { data = foreach epoch generate created_at,id,user_id,text; generate group as pattern, data; }
Using this I got the output below:
(word1_1473344765_265217609700,{(Wed Apr 20 07:23:20 +0000 2016,252479809098223616,450990391,rt @joey7barton: ..give a word1 about whether the americans wins a ryder cup. i mean surely he has slightly more important matters. #fami ...),(Wed Apr 22 07:23:20 +0000 2016,252455630361747457,118179886,@dawnriseth word1 and then we will have to prove it again by reelecting obama in 2016, 2020... this race-baiting never ends.)})
(word2_1473344765_265217609700,{(Wed Apr 21 07:23:20 +0000 2016,252370526411051008,845912316,@maarionymcmb word2 mere ta dit tu va resté chez toi dnc tu restes !),(Wed Apr 23 07:23:20 +0000 2016,252213169567711232,14596856,rt @chernynkaya: "have you noticed lately that word2 is getting credit for the president being in the lead except pres. obama?" ...)})
Now, without dumping it or storing it into a file, I tried this:
B = FILTER A BY pattern == 'word1_1473325383_265214120940';
describe B;
B: {pattern: chararray,data: {(json::created_at: chararray,json::id: chararray,json::user_id: chararray,json::text: chararray)}}
STORE B into 'hbase://word1_1473325383_265214120940' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:data');
The output was reported as success, but there is no data stored in the table. When I checked the logs, the warning below appeared:
2016-09-08 19:45:46,223 [Readahead Thread #2] WARN org.apache.hadoop.io.ReadaheadPool - Failed readahead on ifile
EBADF: Bad file descriptor
Please don't hesitate to suggest what I am missing here. Thank you.
09-12-2016
01:20 PM
1 Kudo
Thanks for your reply, Artem Ervits. I think it was because of the different versions that I used in my script. When I used the same versions of Elephant Bird it worked fine for me, as suggested by @gkeys. Script:
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER json-simple-1.1.1.jar
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
extracted = foreach twitter generate (chararray)$0#'created_at' as created_at,(chararray)$0#'id' as id,(chararray)$0#'id_str' as id_str,(chararray)$0#'text' as text,(chararray)$0#'source' as source,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,(boolean)$0#'favorited' as favorited,(long)$0#'favorite_count' as favorite_count,(long)$0#'retweet_count' as retweet_count,(boolean)$0#'retweeted' as retweeted,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'place') as place;
dump extracted;
And it worked fine.
09-07-2016
07:18 AM
I think I got it on my own. Here is the script that I have written:
res = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*',words::word),'.*'));
epoch = FOREACH res GENERATE CONCAT(CONCAT(word,'_'),(chararray)ToUnixTime(CurrentTime(created_at))) AS epochtime;
res1 = FOREACH (GROUP epoch BY epochtime) GENERATE group, epoch;
dump res1;
09-06-2016
01:07 PM
I think I found the answer on my own:
B = FOREACH words GENERATE CONCAT(CONCAT(word,'_'),(chararray)ToUnixTime(CurrentTime()));
I just removed A. from the inner CONCAT, and it worked fine.
09-02-2016
06:19 PM
1 Kudo
As far as using Pig to insert the data into an HBase table, these links should be helpful:
https://community.hortonworks.com/questions/31164/hbase-insert-from-pig.html
http://princetonits.com/blog/technology/loading-customer-data-into-hbase-using-a-pig-script/
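For reference, here is a minimal Pig sketch of the pattern those links describe; the table name customers, the column family cf, and the input path and schema are all hypothetical:
-- Load a tab-delimited file whose first field will become the HBase row key (hypothetical input)
data = LOAD '/tmp/customers.tsv' USING PigStorage('\t') AS (rowkey:chararray, name:chararray, city:chararray);
-- The remaining fields map, in order, to the listed column-family:qualifier pairs
STORE data INTO 'hbase://customers' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:name cf:city');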
09-02-2016
07:16 AM
Yes, it worked. Thank you very much.
09-15-2016
06:39 AM
2 Kudos
I got it on my own. I think it was because of the different versions that I used in my script. When I used the same versions of Elephant Bird it worked fine for me, as suggested by @gkeys. Script:
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER json-simple-1.1.1.jar
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
extracted =foreach twitter generate (chararray)$0#'created_at' as created_at,(chararray)$0#'id' as id,(chararray)$0#'id_str' as id_str,(chararray)$0#'text' as text,(chararray)$0#'source' as source,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,(boolean)$0#'favorited' as favorited,(long)$0#'favorite_count' as favorite_count,(long)$0#'retweet_count' as retweet_count,(boolean)$0#'retweeted' as retweeted,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'place') as place;
dump extracted;
And it worked fine.