Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3663 | 05-03-2017 05:13 PM |
| | 3018 | 05-02-2017 08:38 AM |
| | 3280 | 05-02-2017 08:13 AM |
| | 3223 | 04-10-2017 10:51 PM |
| | 1690 | 03-28-2017 02:27 AM |
02-05-2016 03:04 PM
@John Smith you got me there; as you can see, my attempt with your file worked. Alternatively, take a look at CSVExcelStorage, as it has more capability than PigStorage (link). I am not saying this is the cause here, and I don't know what's wrong, but here's a note. I'm not sure how valid it still is, since it has been around for a while and they don't mention which version of Pig they were using: "Limitations: PigStorage is an extremely simple loader that does not handle special cases such as embedded delimiters or escaped control characters; it will split on every instance of the delimiter regardless of context. For this reason, when loading a CSV file it is recommended to use CSVExcelStorage rather than PigStorage with a comma delimiter."
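If it helps, here's a minimal sketch of the difference, runnable in local mode; the piggybank.jar path is the usual HDP location and the file names are placeholders, so adjust for your install:
# minimal sketch: CSVExcelStorage keeps a quoted, embedded comma in one field,
# where PigStorage(',') would split it into two
# the piggybank.jar path and file names below are placeholders; adjust for your install
cat > /tmp/quoted.csv <<'EOF'
id,name
1,"Smith, John"
EOF
cat > /tmp/csv_demo.pig <<'EOF'
REGISTER /usr/hdp/current/pig-client/piggybank.jar;
data = LOAD '/tmp/quoted.csv'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
       AS (id:int, name:chararray);
DUMP data; -- expect (1,Smith, John) as two fields, not three
EOF
pig -x local /tmp/csv_demo.pig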
02-05-2016 01:40 PM
@Pradeep kumar great, glad to help.
02-05-2016 01:37 PM
@John Smith then look at how to infer the schema through the Avro Java API. You don't need avro-tools in that case.
02-05-2016 01:36 PM
@John Smith those are all valid questions :) I haven't tried it, as there was never a need. Try it out and post an article! As far as accessing it from Pig, I'm not sure that's possible; again, try it out. You might be able to look at the source code and write a UDF that does what avro-tools does, I don't know. By the way, avro-tools versions coincide with the version of Avro, so I'd suggest downloading the latest avro-tools available, which at this moment is 1.8.0.
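For reference, a minimal sketch of grabbing avro-tools and dumping a file's schema; the Maven Central URL is the standard location and mydata.avro is a placeholder:
# minimal sketch: download avro-tools and print an Avro file's schema
# mydata.avro is a placeholder; point it at your own file
curl -O https://repo1.maven.org/maven2/org/apache/avro/avro-tools/1.8.0/avro-tools-1.8.0.jar
java -jar avro-tools-1.8.0.jar getschema mydata.avro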
02-05-2016 01:31 PM
4 Kudos
@Daniel Hendrix I wrote a small tutorial on Flume as a how-to rather than use-case based like the ones you specified. I think by far the best resource would be the Flume website; it has examples for every possible sink, source, and channel. Please see below.
# HDP 2.3.2 Sandbox
# Example, single-node Flume configuration using netcat source, memory channel and logger sink
# install telnet
yum install -y telnet
# start flume with this configuration
******************************************************************************
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
******************************************************************************
# in another terminal
telnet localhost 44444
# type anything
# then in the original terminal
tail -f /var/log/flume/flume-a1.log
# Example: netcat source, HDFS sink as DataStream
# create hdfs flume directory
sudo -u hdfs hdfs dfs -mkdir /flume
sudo -u hdfs hdfs dfs -mkdir /flume/events
sudo -u hdfs hdfs dfs -chown -R flume:hdfs /flume/events
******************************************************************************
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
******************************************************************************
# show the output in hdfs
sudo -u flume hdfs dfs -ls /flume/events/
sudo -u flume hdfs dfs -cat /flume/events/*/*/*/*
02-05-2016 12:16 PM
1 Kudo
@ARUNKUMAR RAMASAMY
Take a look at our disaster-recovery tag; you might find a lot of useful content (link). It depends on which services you are running; there is info on a lot of topics, like Hive DR and using Falcon. Your question is too general.
02-05-2016 12:04 PM
@John Smith can you clarify: are you trying to do this programmatically using Java, or in a Pig script? You can look up the schema using avro-tools by passing the getschema flag (link). I once kept the schema in HDFS as XML, but it can be any format, even the JSON output of avro-tools, and then processed new records against it. Maybe what you suggest is better, to get the schema at read time. You can probably try reading it and passing an hdfs:// scheme rather than file:///.
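Here's a minimal sketch of that workflow; mydata.avro, the avro-tools jar, and the /schemas path are placeholders:
# minimal sketch: extract the schema as JSON and park it in HDFS for later use
# mydata.avro and /schemas are placeholders; adjust to your layout
java -jar avro-tools-1.8.0.jar getschema mydata.avro > mydata.avsc
hdfs dfs -mkdir -p /schemas
hdfs dfs -put mydata.avsc /schemas/mydata.avsc
# downstream jobs can then reference hdfs:///schemas/mydata.avsc instead of a file:/// path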
02-05-2016 11:44 AM
@keerthana gajarajakumar you need a tool like PuTTY or Cygwin, and then use scp to upload the jar: scp -P 2222 -r file root@127.0.0.1: Your file will then be in the /root directory.
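For example (port 2222 and root@127.0.0.1 are the HDP sandbox defaults; myjar.jar is a placeholder):
# minimal sketch: copy a jar into the sandbox and confirm it arrived
scp -P 2222 myjar.jar root@127.0.0.1:
ssh -p 2222 root@127.0.0.1 ls -l /root/myjar.jar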
02-05-2016 11:40 AM
@keerthana gajarajakumar please post your solution as an answer so we can close this out.