Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3663 | 05-03-2017 05:13 PM |
| | 3018 | 05-02-2017 08:38 AM |
| | 3280 | 05-02-2017 08:13 AM |
| | 3223 | 04-10-2017 10:51 PM |
| | 1690 | 03-28-2017 02:27 AM |
02-05-2016 03:04 PM
@John Smith you got me there; as you can see, my attempt with your file worked. Alternatively, take a look at CSVExcelStorage, as it has more capability than PigStorage (link). I am not saying this is the cause here, and I don't know what's wrong, but here's a note. I'm not sure how valid it still is, since it has been around for a while and they don't mention which version of Pig they were using: "Limitations: PigStorage is an extremely simple loader that does not handle special cases such as embedded delimiters or escaped control characters; it will split on every instance of the delimiter regardless of context. For this reason, when loading a CSV file it is recommended to use CSVExcelStorage rather than PigStorage with a comma delimiter."
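If it helps, here's a minimal sketch of the difference, runnable in local mode; the piggybank.jar path is the usual HDP location and the file names are placeholders, so adjust for your install:
# minimal sketch: CSVExcelStorage keeps a quoted, embedded comma in one field,
# where PigStorage(',') would split it into two
# the piggybank.jar path and file names below are placeholders; adjust for your install
cat > /tmp/quoted.csv <<'EOF'
id,name
1,"Smith, John"
EOF
cat > /tmp/csv_demo.pig <<'EOF'
REGISTER /usr/hdp/current/pig-client/piggybank.jar;
data = LOAD '/tmp/quoted.csv'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
       AS (id:int, name:chararray);
DUMP data; -- expect (1,Smith, John) as two fields, not three
EOF
pig -x local /tmp/csv_demo.pig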
02-05-2016 01:40 PM
@Pradeep kumar great, glad to help.
02-05-2016 01:37 PM
@John Smith then look at how to infer the schema through the Avro Java API. You don't need avro-tools in that case.
02-05-2016 01:36 PM
@John Smith those are all valid questions :) I haven't tried it, as there was never a need. Try it out and post an article! As far as accessing it from Pig, I'm not sure that's possible; again, try it out. You might be able to look at the source code and write a UDF that does what avro-tools does, I don't know. By the way, avro-tools versions coincide with the version of Avro, so I'd suggest downloading the latest avro-tools available, which at this moment is 1.8.0.
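For reference, a minimal sketch of grabbing avro-tools and dumping a file's schema; the Maven Central URL is the standard location and mydata.avro is a placeholder:
# minimal sketch: download avro-tools and print an Avro file's schema
# mydata.avro is a placeholder; point it at your own file
curl -O https://repo1.maven.org/maven2/org/apache/avro/avro-tools/1.8.0/avro-tools-1.8.0.jar
java -jar avro-tools-1.8.0.jar getschema mydata.avro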
02-05-2016 01:31 PM
4 Kudos
@Daniel Hendrix I wrote a small tutorial on Flume as a how-to rather than use-case based like the ones you specified. I think by far the best resource would be the Flume website; it has examples for every possible sink, source, and channel. Please see below.
# HDP 2.3.2 Sandbox
# Example, single-node Flume configuration using netcat source, memory channel and logger sink
# install telnet
yum install -y telnet
# start flume with this configuration
******************************************************************************
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
******************************************************************************
# in another terminal
telnet localhost 44444
# type anything
# then in the original terminal
tail -f /var/log/flume/flume-a1.log
# Example: netcat source, HDFS sink as DataStream
# create hdfs flume directory
sudo -u hdfs hdfs dfs -mkdir /flume
sudo -u hdfs hdfs dfs -mkdir /flume/events
sudo -u hdfs hdfs dfs -chown -R flume:hdfs /flume/events
******************************************************************************
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
******************************************************************************
# show the output in hdfs
sudo -u flume hdfs dfs -ls /flume/events/
sudo -u flume hdfs dfs -cat /flume/events/*/*/*/*
02-05-2016 12:16 PM
1 Kudo
@ARUNKUMAR RAMASAMY
Take a look at our disaster-recovery tag; you might find a lot of useful content (link). It depends on which services you are running; there is info on a lot of topics, like Hive DR and using Falcon. Your question is too general.
02-05-2016 12:04 PM
@John Smith can you clarify: are you trying to do this programmatically using Java, or in a Pig script? You can look up the schema using avro-tools by passing the getschema flag (link). I once kept the schema in HDFS as XML, but it can be any format, even the JSON output of avro-tools, and then processed new records against it. Maybe what you suggest is better, to get the schema at read time. You can probably try reading it and passing an hdfs:// scheme rather than file:///.
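Here's a minimal sketch of that workflow; mydata.avro, the avro-tools jar, and the /schemas path are placeholders:
# minimal sketch: extract the schema as JSON and park it in HDFS for later use
# mydata.avro and /schemas are placeholders; adjust to your layout
java -jar avro-tools-1.8.0.jar getschema mydata.avro > mydata.avsc
hdfs dfs -mkdir -p /schemas
hdfs dfs -put mydata.avsc /schemas/mydata.avsc
# downstream jobs can then reference hdfs:///schemas/mydata.avsc instead of a file:/// path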
02-05-2016 11:44 AM
@keerthana gajarajakumar you need a tool like PuTTY or Cygwin, and then use scp to upload the jar: scp -P 2222 -r file root@127.0.0.1: Your file will then be in the /root directory.
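For example (port 2222 and root@127.0.0.1 are the HDP sandbox defaults; myjar.jar is a placeholder):
# minimal sketch: copy a jar into the sandbox and confirm it arrived
scp -P 2222 myjar.jar root@127.0.0.1:
ssh -p 2222 root@127.0.0.1 ls -l /root/myjar.jar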
02-05-2016 11:40 AM
@keerthana gajarajakumar please post your solution as an answer so we can close this out.