hadoop - How does Pig store results to multiple locations with a single map-only job?
I'm a beginner with Pig and Hadoop, and I'm trying to understand what's going on behind the scenes in a simple Pig script. I'm reading in data, splitting it into 3 new relations, and storing each one in a different directory. On my pseudo-distributed Hadoop installation the script runs as 1 map-only job.
I have been trying to figure out how I would implement this in plain Java map/reduce as a single map-only job. The filtering/splitting is trivial, but I don't know how I'd make a map job send different key/value pairs to different outputs. Come to think of it, I don't know how I'd send output to multiple places even in 1 full map/reduce job.
    rawtweets = load 'geotaggedtweets' using PigStorage(',') as (...);
    split rawtweets into ustweets if country == 'us',
                         gbtweets if country == 'gb',
                         idtweets if country == 'id';
    store ustweets into 'testustweets' using PigStorage(',');
    store gbtweets into 'testgbtweets' using PigStorage(',');
    store idtweets into 'testidtweets' using PigStorage(',');
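The filtering/splitting half really is simple. As a rough plain-Java sketch (no Hadoop) of what the `split` statement asks for, each record is tested against every branch condition and copied into each branch whose condition holds; Pig's SPLIT can send a record to several relations or to none. The class and helper names are hypothetical, and since the real schema is elided as `(...)` above, the sketch assumes the country code sits in a known column passed in as `countryIndex`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class SplitSketch {

    // Mimics SPLIT ... IF: every record is checked against every branch,
    // added to each branch whose condition holds, and dropped if none do.
    static Map<String, List<String>> split(List<String> records,
                                           Map<String, Predicate<String[]>> branches) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String name : branches.keySet()) {
            out.put(name, new ArrayList<>());
        }
        for (String line : records) {
            // Naive comma split, roughly what PigStorage(',') does for simple data.
            String[] fields = line.split(",", -1);
            for (Map.Entry<String, Predicate<String[]>> b : branches.entrySet()) {
                if (b.getValue().test(fields)) {
                    out.get(b.getKey()).add(line);
                }
            }
        }
        return out;
    }

    // Hypothetical helper for "country == code"; assumes the country code
    // is in column countryIndex because the real schema is not shown.
    static Predicate<String[]> countryIs(int countryIndex, String code) {
        return f -> f.length > countryIndex && f[countryIndex].equals(code);
    }

    public static void main(String[] args) {
        Map<String, Predicate<String[]>> branches = new LinkedHashMap<>();
        branches.put("ustweets", countryIs(1, "us"));
        branches.put("gbtweets", countryIs(1, "gb"));
        branches.put("idtweets", countryIs(1, "id"));
        List<String> input = List.of("1,us,hello", "2,gb,hiya", "3,fr,salut", "4,id,halo");
        System.out.println(split(input, branches));
    }
}
```

This only groups records in memory; getting each group into its own directory from one pass is the part that needs something like MultipleOutputs.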
Edit: Ugh... I've done it again. I never seem to come up with the answers to my questions until I've gone through the whole process of writing and submitting the question. The Hadoop class I'm looking for is MultipleOutputs.
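What MultipleOutputs adds is the ability for one task to keep several output writers open at once and choose one per record. The sketch below imitates that idea with plain `java.nio` files so it runs without Hadoop: `NamedOutputs` is a hypothetical stand-in, not Hadoop's API, the directory names are borrowed from the Pig script, the `part-m-00000` file name only mimics a map task's output naming, and the country column index is again an assumption.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiOutputSketch {

    // Hypothetical stand-in for MultipleOutputs (NOT Hadoop's API): one open
    // writer per named output, created lazily on the first write to it.
    static final class NamedOutputs implements AutoCloseable {
        private final Path baseDir;
        private final Map<String, BufferedWriter> writers = new HashMap<>();

        NamedOutputs(Path baseDir) {
            this.baseDir = baseDir;
        }

        void write(String name, String record) throws IOException {
            BufferedWriter w = writers.get(name);
            if (w == null) {
                Path dir = baseDir.resolve(name);
                Files.createDirectories(dir);
                // File name only imitates a map-only task's output file.
                w = Files.newBufferedWriter(dir.resolve("part-m-00000"), StandardCharsets.UTF_8);
                writers.put(name, w);
            }
            w.write(record);
            w.newLine();
        }

        @Override
        public void close() throws IOException {
            for (BufferedWriter w : writers.values()) {
                w.close();
            }
        }
    }

    // The "map" step: a single pass over the input, routing each record to the
    // output for its country; records for other countries are skipped.
    static void run(List<String> input, int countryIndex, Path outDir) throws IOException {
        try (NamedOutputs mos = new NamedOutputs(outDir)) {
            for (String line : input) {
                String[] f = line.split(",", -1);
                if (f.length <= countryIndex) {
                    continue;
                }
                switch (f[countryIndex]) {
                    case "us": mos.write("testustweets", line); break;
                    case "gb": mos.write("testgbtweets", line); break;
                    case "id": mos.write("testidtweets", line); break;
                    default: break;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("split-demo");
        run(List.of("1,us,hello", "2,gb,hiya", "3,id,halo", "4,us,howdy"), 1, out);
        System.out.println(Files.readAllLines(out.resolve("testustweets").resolve("part-m-00000")));
    }
}
```

With the real class, the driver would make the job map-only with `job.setNumReduceTasks(0)` and register each output with `MultipleOutputs.addNamedOutput(job, "us", TextOutputFormat.class, NullWritable.class, Text.class)` (named output names may contain only letters and digits). The mapper would create `new MultipleOutputs<>(context)` in `setup()`, call `mos.write("us", NullWritable.get(), value)` in `map()`, and call `mos.close()` in `cleanup()` so every writer is flushed.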