Member since 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185

My Accepted Solutions
Views | Posted
---|---
5432 | 08-12-2016 01:02 PM
2205 | 08-08-2016 10:00 AM
2613 | 08-03-2016 04:44 PM
5523 | 08-03-2016 02:53 PM
1430 | 08-01-2016 02:38 PM
02-17-2016
10:32 AM
1 Kudo
No. If you implemented a JDBCSpout, there would be nothing in HDFS at all. Storm by itself has nothing to do with HDFS. It is, however, often used together with HDFS for storing realtime results, via the HDFSBolt. I have also seen implementations reading from HDFS, but it is not a requirement. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_storm-user-guide/content/writing-data-with-storm-hdfs-connector.html By default Storm has no dependency on HDFS. Using HDFS as a source is not that common anyway, since Storm normally works on realtime data (Kafka, MQ, HTTP calls, TCP input, reading from a spooling directory, whatever). So if you implemented a JDBCSpout using the DB2 JDBC library, it would not store anything in HDFS unless you also used an HDFSBolt.
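As a rough illustration of that wiring: JdbcSpout here is a hypothetical custom spout (see the sketch in my earlier answer below), while the HdfsBolt setup follows the storm-hdfs connector documentation linked above. The NameNode URL and HDFS path are placeholders.

import backtype.storm.topology.TopologyBuilder;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;

public class Db2ToHdfsTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // Hypothetical spout reading from DB2 over JDBC (not part of storm-jdbc)
        builder.setSpout("jdbc-spout", new JdbcSpout(), 1);

        // Only this bolt touches HDFS; without it, nothing is written there
        HdfsBolt hdfsBolt = new HdfsBolt()
                .withFsUrl("hdfs://mynamenode:8020")                              // placeholder NameNode URL
                .withFileNameFormat(new DefaultFileNameFormat().withPath("/storm/db2/"))
                .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter("|"))
                .withRotationPolicy(new FileSizeRotationPolicy(5.0f, Units.MB))   // rotate files at 5 MB
                .withSyncPolicy(new CountSyncPolicy(1000));                       // sync every 1000 tuples

        builder.setBolt("hdfs-bolt", hdfsBolt, 1).shuffleGrouping("jdbc-spout");
        // submit builder.createTopology() via StormSubmitter in a real deployment
    }
}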
02-17-2016
09:48 AM
1 Kudo
Internal server error? Can you look into the YARN logs and see what it complains about? You said it works fine without Kerberos, so perhaps with Kerberos you don't have the rights to kick off an application or something. The NodeManager log is at: /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-sandbox.hortonworks.com.log
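If the application got far enough to be accepted, you can also pull its aggregated logs with the YARN CLI (assuming log aggregation is enabled; the application id is a placeholder):

yarn logs -applicationId <application_id>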
02-17-2016
09:44 AM
1 Kudo
Perhaps just a simple thing: can you try it with -u : (note the space before the colon) instead of -u:? Below is the prefix I use for a webhdfs command:
curl --negotiate -u : -i -s -X PUT
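For example, a complete WebHDFS create call would look something like this (host and path are placeholders; the first PUT returns a redirect to a datanode for the actual write):

curl --negotiate -u : -i -s -X PUT "http://<namenode-host>:50070/webhdfs/v1/tmp/test.txt?op=CREATE&overwrite=true"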
02-17-2016
09:09 AM
4 Kudos
I do not see a native way to stream data from a database in Storm. There is a JDBC connector, but it is for inserting results and doing lookups. It's not impossible, however. Other streaming products I have worked with in the past could stream from a database (essentially by specifying WHERE conditions or re-querying the same table every x seconds), so you could definitely implement a Storm spout like that. Depending on data volumes you might have to partition the load similar to how Sqoop does it (having multiple spouts that each read with a WHERE condition on some id), or if the volumes are not too large, just have a single spout. If it is a simple single-connection example, I am sure you could implement it in very short time. http://storm.apache.org/documentation/storm-jdbc.html

If I were to implement a JDBCSpout, I would take the Twitter example from the links below and replace the Twitter code with a JDBC connection opened against DB2. If you read some kind of staging table that gets refreshed every x seconds, you would read it completely and then check whether a specific amount of time has passed (the nextTuple method is called continuously). If you only want to read new tuples, you would have to add a WHERE condition based on some timestamp in the DB2 table. The book also has some pointers on how to make parallel spouts in case a single connection is not fast enough, as sketched below. https://github.com/storm-book/examples-ch04-spouts https://www.safaribooksonline.com/library/view/getting-started-with/9781449324025/ch04.html
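To make that concrete, here is a minimal sketch of such a spout. This is illustration only, not the storm-jdbc API: the class name, connection URL, and the staging table with its id/payload/created_ts columns are all made up, and real code would need proper credential handling, a PreparedStatement, and ack/fail logic.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;
import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class JdbcSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Connection conn;
    private Timestamp lastSeen = new Timestamp(0L); // only emit rows newer than this

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            // Assumption: DB2 JDBC driver on the classpath; URL/credentials would come from conf
            conn = DriverManager.getConnection("jdbc:db2://db2host:50000/MYDB", "user", "password");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void nextTuple() {
        // nextTuple() is called continuously, so we re-query the staging table each pass
        // (use a PreparedStatement in real code instead of string concatenation)
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT id, payload, created_ts FROM staging WHERE created_ts > '" + lastSeen + "'")) {
            while (rs.next()) {
                collector.emit(new Values(rs.getLong("id"), rs.getString("payload")));
                lastSeen = rs.getTimestamp("created_ts");
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        Utils.sleep(5000); // wait 5 seconds before re-querying the table
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "payload"));
    }
}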
02-16-2016
11:00 AM
1 Kudo
There are ways to make a custom stack. It's all open source. Here are some of @Ali Bajwa's excellent demo stacks: https://github.com/abajwa-hw/ambari-workshops This article may also help: http://mozartanalytics.com/how-to-create-a-software-stack-for-ambari/
02-16-2016
10:00 AM
3 Kudos
Not completely sure what you mean, but it is possible to create separate config groups for these two hosts (Manage Config Groups at the top of the config page). This means you could create a config for host1 and a different config for host2. I don't think it's possible to start/stop them separately through Ambari, though, unless you go directly to the host and start/stop them there.
02-16-2016
09:36 AM
3 Kudos
Essentially Ambari uses something called stacks. A stack is a module that governs one of the services in the cluster, with functions for install/start/stop, dependencies, etc. Unlike parcels, most of the stacks are based on yum/zypper packages under the covers for installation, which is nice and Linux-standard. https://cwiki.apache.org/confluence/display/AMBARI/Stacks+and+Services I am not too familiar with parcels, but I would think they are more or less the equivalent. Just to consolidate the comment below, here are some example stacks by @Ali Bajwa: https://github.com/abajwa-hw/ambari-workshops And another helpful article: http://mozartanalytics.com/how-to-create-a-software-stack-for-ambari/
02-15-2016
10:40 AM
1 Kudo
You have a retention period for your feed, which means Falcon comes up and tries to delete all folders that belong to feed instances older than 5 minutes. So it expects something like: path='/inputfolder/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}' i.e. folders in HDFS that look like /inputfolder/2016-01-01-09-15, etc. It can then delete all folders that are older than the retention limit. It is all very nicely explained in the Oozie documentation (Falcon is built on Oozie): http://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a5._Dataset
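For illustration, the relevant pieces of such a feed definition might look like this (the cluster name and validity dates are made up):

<clusters>
    <cluster name="myCluster" type="source">
        <validity start="2016-01-01T00:00Z" end="2099-12-31T00:00Z"/>
        <retention limit="minutes(5)" action="delete"/>
    </cluster>
</clusters>
<locations>
    <location type="data" path="/inputfolder/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}"/>
</locations>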
02-15-2016
10:24 AM
2 Kudos
Can you add your feed XML? There should be a location, something like this: <locations>
<location type="data" path="/mydata/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
</locations> Normally feeds are parametrized to correspond to the frequency of the feed. So my questions are: does the folder exist in HDFS? Do you have any retention policy? "Eviction" sounds like that, and it would not work on a folder without a time component. The easiest way would be to upload your feed XML.
02-13-2016
11:39 AM
1 Kudo
If you have root, try the command as the yarn user, e.g.: sudo -u yarn chmod -R 755 /data/slot7... But if you are getting a disk hardware error now, I would assume something is wrong with one of the drives of the node. Can you copy files into that folder? Do a health check? Etc.