Member since: 03-08-2018
Posts: 11
Kudos Received: 0
Solutions: 0
04-17-2018
11:55 AM
Is there a way for Spark to read the content of a sequence file in HDFS one record at a time? My problem is that I have a collection of large sequence files (>15 GB). Each sequence file was created by merging many small files. I want to iterate over and process these small files one by one, to avoid the memory cost of loading the whole 15 GB at once. For example:
JavaPairRDD<String, Byte> file = jsc.sequenceFile("url", String.class, Byte.class);
// my wanted operation, in pseudocode
file.foreach(record -> {
    process(record._1, record._2);
    commitAndContinue();
});
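A minimal sketch of one way to do this (not part of the original post), assuming the merged sequence file stores (original file name, file bytes) pairs as Text/BytesWritable; the HDFS path and the process() helper are placeholders.

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SequenceFileByRecord {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("sequence-file-by-record");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            // Assumed layout: key = original small-file name, value = its bytes.
            JavaPairRDD<Text, BytesWritable> records =
                    jsc.sequenceFile("hdfs:///data/merged.seq", Text.class, BytesWritable.class);

            // foreach streams records through each partition on the executors,
            // so the whole 15 GB file is never held in memory at once.
            records.foreach(record ->
                    process(record._1.toString(), record._2.copyBytes()));
        }
    }

    // Placeholder for the per-file processing the question refers to.
    private static void process(String fileName, byte[] content) {
    }
}

If the records must be processed on the driver instead of on the executors, records.toLocalIterator() pulls one partition at a time rather than collecting the whole RDD.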
Labels:
- Apache Hadoop
- Apache Spark
04-11-2018
11:54 AM
Running the server on Windows and using the same code to produce from a Linux machine worked normally. Could it be a security or ports issue? @Geoffrey Shelton Okot
04-10-2018
10:06 AM
@Geoffrey Shelton Okot Also note that the CLI from Windows (using the .bat files) does not work.
04-10-2018
09:33 AM
@Geoffrey Shelton Okot It is the usual code found on the net:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TestProducer { // class and main wrapper assumed; the original post showed only the method body
    public static void main(String[] args) {
        System.out.println("start produce");
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.137.130:9092");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        for (int i = 0; i < 800; i++) {
            System.out.println(i);
            String sentence = "{i=" + i + " \"_id\" : { \"$oid\" : \"5a8eefc21013462f98eb78e3\" }, \"metadata\" : { \"airportid\" : 0.0 }, \"filename\" : \"1.jpg\", \"aliases\" : null, \"chunkSize\" : { \"$numberLong\" : \"261120\" }, \"uploadDate\" : { \"$date\" : 1519316930479 }, \"length\" : { \"$numberLong\" : \"1504413\" }, \"contentType\" : null, \"md5\" : \"28573cc502292a21ec0dfb6bea15d0fb\" }";
            producer.send(new ProducerRecord<String, String>("test1", sentence, sentence));
        }
        producer.close();
    }
}
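Not part of the original reply, but one detail worth noting when debugging this from the Windows host: producer.send() is asynchronous, so a client that cannot reach the broker may appear to run fine and simply buffer records until close. Blocking on the returned Future makes the underlying error visible; the snippet below is a sketch of that (a drop-in replacement for the send line inside the loop), not code from the thread.

// Debugging sketch: block on each send so broker connectivity problems
// surface as exceptions instead of being buffered silently.
try {
    producer.send(new ProducerRecord<String, String>("test1", sentence, sentence)).get();
} catch (Exception e) {
    e.printStackTrace(); // typically a timeout when the broker's advertised address is unreachable
}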
04-09-2018
03:42 PM
I am saying that the API (consuming from Java code) is not working; of course I also used the .bat files to test the CLI, and it didn't work either. @Jordan Moore, @Geoffrey Shelton Okot
04-09-2018
08:33 AM
Sorry, actually it is not working from a Windows host. I tried a Linux (CentOS) host and it worked normally. Note that the Kafka server is running on a Linux (CentOS) host. @Geoffrey Shelton Okot
04-07-2018
09:46 PM
The Kafka produce and consume APIs only work from a Linux host. When I try to connect through the API from another Windows machine, I cannot produce any data. Note that I tried to configure the advertised listener as indicated in many blogs, and telnet from the remote Windows machine to the broker port works, but I still cannot produce or consume messages. Also note that when I run the Kafka server on Windows and the producer/consumer API on Linux, it works normally.
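For reference (not from the original question), the listener settings usually meant by "configure the advertised listener" look like the sketch below; the IP is taken from the producer code earlier in this thread and is only an assumption about the actual setup.

# server.properties sketch - 192.168.137.130 is the broker address used in the
# producer code above and is assumed to be the host's externally reachable IP
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://192.168.137.130:9092

The point is that a client first contacts bootstrap.servers and is then redirected to whatever address the broker advertises; if advertised.listeners resolves to something only the Linux host can reach (for example localhost), a remote Windows client can still telnet to port 9092 yet fail to produce or consume.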
Labels:
- Apache Kafka