How to use flume http-source to POST data from a weather website into hdfs?

Explorer

I am trying to get data from a weather website that allows access via an API. I am able to make a GET request, and response.text is some XML data. But I am still struggling with how to write the data from the API to an HDFS sink via Flume. I am using the Ambari UI to configure my agent. I will probably run Python code to make the GET requests to the weather API. Do I need to make POST requests to the API, and will the Flume agent automatically listen for the POST requests?

Re: How to use flume http-source to POST data from a weather website into hdfs?

Master Collaborator

There are a variety of ways to do this. The easiest would be to use ExecSource: you write Python code that fetches the data and prints it to stdout, one line per event. Here is a sample agent conf:

agent.sources = pstream
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.sources.pstream.channels = memoryChannel
agent.sources.pstream.type = exec
agent.sources.pstream.command = python /tmp/print_weatherInfo.py
agent.sinks = hdfsSink
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memoryChannel
agent.sinks.hdfsSink.hdfs.path = hdfs://hdp/weatherdata
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.writeFormat = Text

You can also write a custom source, but that needs a little bit of Java coding.
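
For illustration, here is a minimal sketch of what such a script could look like. It assumes Python 3 (so the agent's command would call python3), and the function names and the 10-minute polling interval are hypothetical choices, not something from the conf above. The exec source treats each line on stdout as one event, so the sketch collapses a multi-line XML response into a single line. Note that this also collapses whitespace inside element text, which may or may not be acceptable for your data.

```python
import sys
import time
import urllib.request


def to_event_line(text):
    """Collapse all whitespace so one API response becomes one line (one Flume event)."""
    return " ".join(text.split())


def fetch(url, timeout=30):
    """Return the response body of a GET request as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def run(url, interval_seconds=600):
    """Poll the API forever, printing one line per response."""
    while True:
        print(to_event_line(fetch(url)))
        sys.stdout.flush()  # make the line visible to Flume immediately
        time.sleep(interval_seconds)


# Polling only starts when a URL is passed on the command line, e.g.
#   python3 /tmp/print_weatherInfo.py "<weather API URL>"
if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1])
```

The loop is there because a script that prints once and exits would deliver a single batch of events. The exec source does not restart the command unless it is configured to do so.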

Re: How to use flume http-source to POST data from a weather website into hdfs?

Explorer

@Deepesh Well, my idea is that I can just make a POST request to the HTTP source, which should accept the data and pass it on to the sink. I am aiming to use the Flume HTTP source. I have also described the issue I am facing in another post, which is closely related to this one. When I make the POST below from Python, I get a status of 400, which I don't understand.

>>> r = requests.post('http://hdp.localdomain:8989', data={'k': 'v'})
>>> r.status_code
400
Re: How to use flume http-source to POST data from a weather website into hdfs?

New Contributor

Here is the full solution:

    #weather.conf starts here 
    WeatherAgent.sources=pstream 
    WeatherAgent.channels=memoryChannel
    WeatherAgent.sinks=HDFS 
    
    
    WeatherAgent.sources.pstream.type=exec
    WeatherAgent.sources.pstream.command = python /home/DA186007/print_weatherInfo.py
    
    
    WeatherAgent.sinks.HDFS.type=hdfs
    WeatherAgent.sinks.HDFS.hdfs.path=hdfs://hdp133m1.labs.teradata.com:8020/user/DA186007/Weather_feed
    WeatherAgent.sinks.HDFS.hdfs.fileType=DataStream
    WeatherAgent.sinks.HDFS.hdfs.writeFormat=Text
    
    
    WeatherAgent.channels.memoryChannel.type=memory
    WeatherAgent.channels.memoryChannel.capacity=1000
    WeatherAgent.channels.memoryChannel.transactionCapacity=100
    
    
    WeatherAgent.sinks.HDFS.channel=memoryChannel
    WeatherAgent.sources.pstream.channels=memoryChannel
    #weather.conf end here 
    
    
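    #print_weatherInfo.py starts here
    # Python 2 script (urllib2 does not exist in Python 3)
    import urllib2
    # Fetch the sample response; the exec source turns each printed line into one event
    response = urllib2.urlopen('https://samples.openweathermap.org/data/2.5/weather?q=London,uk&appid=b6907d289e10d714a6e88b30761fae22')
    html = response.read()
    print(html)
    #print_weatherInfo.py ends here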
    
    
    #flume command to run the job 
    flume-ng agent -n WeatherAgent -c conf -f weather.conf
    
    
    

Re: How to use flume http-source to POST data from a weather website into hdfs?

Super Guru

Don't use Flume anymore; it was removed from HDP 3.

Also, you can't use samples.open...; the correct URL is api.open...

You also need to sign up for a real API key.

Use NiFi.
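
To make the URL and API key points concrete, here is a small sketch of building the request URL. It assumes the elided host is api.openweathermap.org with the same /data/2.5/weather path as the sample URL earlier in the thread. The function name and the OPENWEATHER_API_KEY environment variable are hypothetical.

```python
import os
from urllib.parse import urlencode

# Assumed endpoint: same path as the sample URL, on the "api" host.
API_BASE = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, api_key):
    """Return the current-weather URL for a city, using the caller's own API key."""
    if not api_key:
        raise ValueError("an OpenWeatherMap API key is required")
    return API_BASE + "?" + urlencode({"q": city, "appid": api_key})


# Usage, with the key read from a hypothetical environment variable:
# url = build_weather_url("London,uk", os.environ["OPENWEATHER_API_KEY"])
```

Reading the key from the environment keeps it out of the script and out of the agent conf.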
