Hi, I'm looking to extract a large amount of data from a few Oracle tables and transfer it to an HDFS file system. There appear to be two possible ways of achieving this:

1. Use Sqoop to extract the data and copy it across the network directly to HDFS.
2. Extract the data to a local file system; then, once that has completed, copy (FTP?) the data to the Hadoop system.

Clearly the second option is more work, but that isn't the issue. It's been suggested that, because Sqoop copies data across the network, the locks on the Oracle tables might be held for longer than would otherwise be required. I'll be extracting large amounts of data and copying it to a remote location (so there will be significant network latency). Does anybody know if this is correct?
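For context, a Sqoop import of the kind described in option 1 might look like the sketch below. The JDBC URL, credentials, table name, and split column are placeholders for illustration, not values from the original post:

```shell
# Pull the ORDERS table from Oracle directly into HDFS, splitting the
# work across 4 parallel mappers on the primary-key column.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
  --username SCOTT \
  --password-file /user/scott/.oracle_pw \
  --table ORDERS \
  --split-by ORDER_ID \
  --num-mappers 4 \
  --target-dir /data/oracle/orders
```

Each mapper issues its own SELECT over a slice of the split column's range, so the transfer duration (and hence how long any read-consistency resources are held on the Oracle side) depends on both the network throughput and the degree of parallelism chosen with `--num-mappers`.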
I know that Cloudera is keen to replace classic MapReduce with Spark, and that this will require Pig to support Spark as a valid execution mode. I've read on other pages of the Cloudera website that this is going to be supported, but I can't find any dates or version numbers telling me when this is going to happen. Any ideas?