Description
This is an umbrella issue for the work toward multicloud EMR with Whirr.
Some of the things that must happen to get there (as discussed on IRC):
- Hadoop deployment must be "rock solid"
- Submitting and monitoring a Hadoop MapReduce job through Whirr
- distcp from blobstore to Hadoop/HBase cluster
- CLI component for job submission and monitoring
Some additional things that would be nice to have:
- Pig service
- Hive service
- Sqoop service
- mixed regular and spot instances in EMR
- multistage provisioning (different cluster sizes for different phases)