Member since 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185

My Accepted Solutions
Views | Posted
---|---
5432 | 08-12-2016 01:02 PM
2205 | 08-08-2016 10:00 AM
2613 | 08-03-2016 04:44 PM
5523 | 08-03-2016 02:53 PM
1430 | 08-01-2016 02:38 PM
02-17-2016
10:32 AM
1 Kudo
No. If you implemented a JDBCSpout, there would be nothing in HDFS at all. Storm by itself has nothing to do with HDFS. It is, however, often used together with HDFS for storing realtime results, via the HDFSBolt. I have also seen implementations reading from HDFS, but it is not a requirement. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_storm-user-guide/content/writing-data-with-storm-hdfs-connector.html By default Storm has no dependency on HDFS. Using HDFS as a source is not that common anyway, since Storm normally works on realtime data (Kafka, MQ, HTTP calls, TCP input, reading from a spooling directory, whatever). So if you implemented a JDBCSpout using the DB2 JDBC library, it would not store anything in HDFS unless you also used an HDFSBolt.
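As a rough illustration of that wiring: JdbcSpout here is a hypothetical custom spout (see the sketch in my earlier answer below), while the HdfsBolt setup follows the storm-hdfs connector documentation linked above. The NameNode URL and HDFS path are placeholders.

import backtype.storm.topology.TopologyBuilder;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;

public class Db2ToHdfsTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // Hypothetical spout reading from DB2 over JDBC (not part of storm-jdbc)
        builder.setSpout("jdbc-spout", new JdbcSpout(), 1);

        // Only this bolt touches HDFS; without it, nothing is written there
        HdfsBolt hdfsBolt = new HdfsBolt()
                .withFsUrl("hdfs://mynamenode:8020")                              // placeholder NameNode URL
                .withFileNameFormat(new DefaultFileNameFormat().withPath("/storm/db2/"))
                .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter("|"))
                .withRotationPolicy(new FileSizeRotationPolicy(5.0f, Units.MB))   // rotate files at 5 MB
                .withSyncPolicy(new CountSyncPolicy(1000));                       // sync every 1000 tuples

        builder.setBolt("hdfs-bolt", hdfsBolt, 1).shuffleGrouping("jdbc-spout");
        // submit builder.createTopology() via StormSubmitter in a real deployment
    }
}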
02-17-2016
09:48 AM
1 Kudo
Internal server error? Can you look into the YARN logs and see what it complains about? You said it works fine without Kerberos, so perhaps with Kerberos you don't have the rights to kick off an application or something. The NodeManager log is at: /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-sandbox.hortonworks.com.log
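If the application got far enough to be accepted, you can also pull its aggregated logs with the YARN CLI (assuming log aggregation is enabled; the application id is a placeholder):

yarn logs -applicationId <application_id>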
02-17-2016
09:44 AM
1 Kudo
Perhaps just a simple thing: can you try it with -u : (note the space before the colon) instead of -u:? Below is the prefix I use for a webhdfs command:
curl --negotiate -u : -i -s -X PUT
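For example, a complete WebHDFS create call would look something like this (host and path are placeholders; the first PUT returns a redirect to a datanode for the actual write):

curl --negotiate -u : -i -s -X PUT "http://<namenode-host>:50070/webhdfs/v1/tmp/test.txt?op=CREATE&overwrite=true"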
02-17-2016
09:09 AM
4 Kudos
I do not see a native way to stream data from a database in Storm. There is a JDBC connector, but it is for inserting results and doing lookups. It's not impossible, however. Other streaming products I have worked with in the past could stream from a database (essentially by specifying WHERE conditions or re-querying the same table every x seconds), so you could definitely implement a Storm spout like that. Depending on data volumes you might have to partition the load similar to how Sqoop does it (having multiple spouts that each read with a WHERE condition on some id), or if the volumes are not too large, just have a single spout. If it is a simple single-connection example, I am sure you could implement it in very short time. http://storm.apache.org/documentation/storm-jdbc.html

If I were to implement a JDBCSpout, I would take the Twitter example from the links below and replace the Twitter code with a JDBC connection opened against DB2. If you read some kind of staging table that gets refreshed every x seconds, you would read it completely and then check whether a specific amount of time has passed (the nextTuple method is called continuously). If you only want to read new tuples, you would have to add a WHERE condition based on some timestamp in the DB2 table. The book also has some pointers on how to make parallel spouts in case a single connection is not fast enough, as sketched below. https://github.com/storm-book/examples-ch04-spouts https://www.safaribooksonline.com/library/view/getting-started-with/9781449324025/ch04.html
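To make that concrete, here is a minimal sketch of such a spout. This is illustration only, not the storm-jdbc API: the class name, connection URL, and the staging table with its id/payload/created_ts columns are all made up, and real code would need proper credential handling, a PreparedStatement, and ack/fail logic.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;
import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class JdbcSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Connection conn;
    private Timestamp lastSeen = new Timestamp(0L); // only emit rows newer than this

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            // Assumption: DB2 JDBC driver on the classpath; URL/credentials would come from conf
            conn = DriverManager.getConnection("jdbc:db2://db2host:50000/MYDB", "user", "password");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void nextTuple() {
        // nextTuple() is called continuously, so we re-query the staging table each pass
        // (use a PreparedStatement in real code instead of string concatenation)
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT id, payload, created_ts FROM staging WHERE created_ts > '" + lastSeen + "'")) {
            while (rs.next()) {
                collector.emit(new Values(rs.getLong("id"), rs.getString("payload")));
                lastSeen = rs.getTimestamp("created_ts");
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        Utils.sleep(5000); // wait 5 seconds before re-querying the table
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "payload"));
    }
}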
02-16-2016
11:00 AM
1 Kudo
There are ways to make a custom stack. It's all open source. Here are some of @Ali Bajwa's excellent demo stacks: https://github.com/abajwa-hw/ambari-workshops This article may also help: http://mozartanalytics.com/how-to-create-a-software-stack-for-ambari/
02-16-2016
10:00 AM
3 Kudos
Not completely sure what you mean, but it is possible to create separate config groups for these two hosts (Manage Config Groups at the top of the config page). This means you could create a config for host1 and a different config for host2. I don't think it's possible to start/stop them separately through Ambari, though, unless you go directly to the host and start/stop them there.
02-16-2016
09:36 AM
3 Kudos
Essentially Ambari uses something called stacks. A stack is a module that governs one of the services in the cluster, with functions for install/start/stop, dependencies, etc. Unlike parcels, most of the stacks are based on yum/zypper packages under the covers for installation, which is nice and Linux-standard. https://cwiki.apache.org/confluence/display/AMBARI/Stacks+and+Services I am not too familiar with parcels, but I would think they are more or less the equivalent. Just to consolidate the comment below, here are some example stacks by @Ali Bajwa: https://github.com/abajwa-hw/ambari-workshops And another helpful article: http://mozartanalytics.com/how-to-create-a-software-stack-for-ambari/
02-15-2016
10:40 AM
1 Kudo
You have a retention period for your feed, which means Falcon comes up and tries to delete all folders that belong to feed instances older than 5 minutes. So it expects something like: path='/inputfolder/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}' i.e. folders in HDFS that look like /inputfolder/2016-01-01-09-15, etc. It can then delete all folders that are older than the retention limit. It is all very nicely explained in the Oozie documentation (Falcon is built on Oozie): http://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a5._Dataset
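For illustration, the relevant pieces of such a feed definition might look like this (the cluster name and validity dates are made up):

<clusters>
    <cluster name="myCluster" type="source">
        <validity start="2016-01-01T00:00Z" end="2099-12-31T00:00Z"/>
        <retention limit="minutes(5)" action="delete"/>
    </cluster>
</clusters>
<locations>
    <location type="data" path="/inputfolder/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}"/>
</locations>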
02-15-2016
10:24 AM
2 Kudos
Can you add your feed XML? There should be a location, something like this: <locations>
<location type="data" path="/mydata/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
</locations> Normally feeds are parametrized to correspond to the frequency of the feed. So my questions are: does the folder exist in HDFS? Do you have any retention policy? "Eviction" sounds like that, and it would not work on a folder without a time component. The easiest way would be to upload your feed XML.
02-13-2016
11:39 AM
1 Kudo
If you have root, try the command as the yarn user, e.g.: sudo -u yarn chmod -R 755 /data/slot7... But if you are getting a disk hardware error now, I would assume something is wrong with one of the drives of the node. Can you copy files into that folder? Do a health check? Etc.