Member since: 06-05-2018
Posts: 3
Kudos Received: 0
Solutions: 0
06-20-2018
02:50 PM
Hi Gerd, Thanks very much for your valuable input. I can store the files directly in HDFS, as you said. Say I use Apache Spark to process the files (we already have a Java application that processes them): can we integrate our existing Java application with Spark? I believe that would be very hard and would need a huge code change. Do you have any high-level suggestions on how to approach the integration? Thanks, Sathiyanarayana Kumar. N
06-06-2018
03:43 AM
Hi Gerd, Thanks very much for your valuable reply. My requirement is to process huge files that are delivered from an upstream system. Today we use Spring Batch to split the files into smaller files, process them as a batch, and store them in an Oracle DB, with each smaller file persisted in its own row as a compressed BLOB. We poll the DB, pull the files, and process each one with our application. The processed/transformed files are then fed into ActiveMQ for further processing. We face a lot of queue-related issues, such as persisting objects in the queue, restarting the queue after system failures, and reprocessing everything again regardless of whether it has already been processed. This is where I thought of bringing in Kafka, which you mentioned does not suit my need. As of today we have a pipeline of queues with three levels (Process 1 -> Queue -> Process 2 -> Queue -> Process 3 -> Final Product). As you said, the viable solution is to use HDFS/Spark/Hive, but what about the different transformation stages? How should we handle them? Thanks again for your valuable suggestion. Sathiyanarayana Kumar. N
06-05-2018
06:40 AM
I have files of several GB each, and I convert them into smaller chunks. Even after splitting, the smallest chunk is still 5 to 10 MB. I think it is not a good idea to post objects of 5 to 10 MB to a Kafka server. How can I split a 5 MB file further into chunks of a few kilobytes, push them to the Kafka server, and later recreate the file on the consumer end and process it? Is there any better way to process big files?
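
To make the question concrete, here is a minimal sketch of the split-and-recreate idea in plain Java, with no Kafka client involved. The class name `FileChunker`, the `Chunk` fields, and the chunk size are hypothetical names chosen for illustration; this is one possible shape under those assumptions, not a recommended design.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: splits a byte array into fixed-size chunks and rebuilds it.
public class FileChunker {

    // One slice of a file plus the metadata a consumer needs to rebuild it.
    public static final class Chunk {
        public final String fileId;  // identifies which file the chunk belongs to
        public final int index;      // zero-based position of this chunk
        public final int total;      // number of chunks the file was split into
        public final byte[] payload; // the chunk's bytes

        public Chunk(String fileId, int index, int total, byte[] payload) {
            this.fileId = fileId;
            this.index = index;
            this.total = total;
            this.payload = payload;
        }
    }

    // Splits data into chunks of at most chunkSize bytes (the last one may be shorter).
    public static List<Chunk> split(String fileId, byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Ceiling division; an empty file still yields one empty chunk.
        int total = Math.max(1, (data.length + chunkSize - 1) / chunkSize);
        List<Chunk> chunks = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            int from = i * chunkSize;
            int to = Math.min(data.length, from + chunkSize);
            chunks.add(new Chunk(fileId, i, total, Arrays.copyOfRange(data, from, to)));
        }
        return chunks;
    }

    // Rebuilds the original bytes; chunks may arrive in any order but none may be missing.
    public static byte[] reassemble(List<Chunk> received) {
        if (received.isEmpty()) {
            throw new IllegalArgumentException("no chunks received");
        }
        int total = received.get(0).total;
        if (received.size() != total) {
            throw new IllegalStateException("expected " + total + " chunks, got " + received.size());
        }
        List<Chunk> sorted = new ArrayList<>(received);
        sorted.sort(Comparator.comparingInt((Chunk c) -> c.index));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < total; i++) {
            Chunk c = sorted.get(i);
            if (c.index != i) {
                throw new IllegalStateException("missing or duplicate chunk at index " + i);
            }
            out.write(c.payload, 0, c.payload.length);
        }
        return out.toByteArray();
    }
}
```

With a setup like this, each `Chunk` would be serialized and sent as one message, and the consumer would buffer chunks per `fileId` until all `total` pieces have arrived before calling `reassemble`. If the `fileId` were used as the Kafka record key, the default partitioner would place all chunks of a file on the same partition, which keeps them in send order. Whether this is preferable to storing the file elsewhere and sending only a reference to it is exactly what I am unsure about.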
Labels:
- Apache Kafka