I have used this combination in cases where the data model was complex and changing over time. It's pretty easy to create an Avro schema and the Java bindings. There are cases where Avro is a better fit than Parquet. If you are not sure, it may be worthwhile to start with Avro and do a performance analysis; you can always change to Parquet very easily. Nishan
I just installed the CDH 5.4 Sandbox, and when I try to access HDFS from Java I get this error:

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-545953227-127.0.0.1-1429800393650:blk_1073742225_1401 file=/tmp/b.txt
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:888)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:568)

I have another VM (CDH 5.3) where the same code works just by including core-site.xml and hdfs-site.xml in the classpath, so something seems to be wrong in the 5.4 VM. I can read b.txt with hadoop fs -cat /tmp/b.txt, so the file itself is fine. I have checked the state of HDFS with hadoop fsck and dfsadmin, and there are no missing blocks. I also added the hostname/IP to the hosts file in Windows. What's wrong? Any clue?