DO NOT USE THIS INSTANCE FOR LIVE DATA!
500 out of 100M rows are duplicates. For details, see http://search-hadoop.com/m/9UY0h26jwA21rW0i1/v=threaded