hadoop - How does Pig store results to multiple locations with a single map-only job?

- January 15, 2015

I'm a beginner with Pig and Hadoop. I'm trying to understand what's going on behind the scenes in a simple Pig script. I'm reading in data, splitting it into 3 new relations, and storing each in a different directory. The script runs on a pseudo-distributed Hadoop installation as 1 map-only job.

I have been trying to figure out how to implement this in plain Java map/reduce as a single map-only job. It's trivial to achieve the filtering/splitting, but I don't know how I'd get a map job to send different key/value pairs to different outputs. Come to think of it, I don't know how I'd send output to multiple places in 1 full map/reduce job either.

rawtweets = LOAD 'geotaggedtweets' USING PigStorage(',') AS (...);

SPLIT rawtweets INTO ustweets IF country == 'us', gbtweets IF country == 'gb', idtweets IF country == 'id';

STORE ustweets INTO 'testustweets' USING PigStorage(',');
STORE gbtweets INTO 'testgbtweets' USING PigStorage(',');
STORE idtweets INTO 'testidtweets' USING PigStorage(',');

Edit: Ugh... I've done it again. I seem unable to come up with answers to my questions until I've gone through the whole process of writing and submitting the question. The Hadoop class I'm looking for is MultipleOutputs.
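
For illustration, here is a minimal sketch of the routing idea in plain Java, with no Hadoop dependency so it can be run on its own. It only simulates what a map-only job would do: each input line is checked against the three SPLIT conditions and appended to the matching named output, and lines matching no condition are dropped. The class name SplitSketch, the output names "us", "gb" and "id", and the position of the country field (COUNTRY_FIELD) are assumptions made for the example; the real schema is elided in the script above. In an actual Hadoop mapper, the line marked in the comments is roughly where a call to MultipleOutputs.write(namedOutput, key, value) would go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitSketch {
    // Hypothetical position of the country field in each comma-separated line.
    static final int COUNTRY_FIELD = 0;

    // Returns the named output a line belongs to, or null if no SPLIT condition matches.
    static String route(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length <= COUNTRY_FIELD) {
            return null;
        }
        switch (fields[COUNTRY_FIELD]) {
            case "us": return "us";
            case "gb": return "gb";
            case "id": return "id";
            default:   return null; // unmatched records are dropped, as with Pig's SPLIT
        }
    }

    // Simulates the map phase: one pass over the input, several outputs, no reducer.
    static Map<String, List<String>> runMapOnly(List<String> lines) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        for (String line : lines) {
            String name = route(line);
            if (name != null) {
                // In a real Mapper, this is roughly where
                // MultipleOutputs.write(name, key, value) would be called.
                outputs.computeIfAbsent(name, k -> new ArrayList<>()).add(line);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("us,hello", "gb,cheers", "fr,salut", "us,howdy");
        System.out.println(runMapOnly(lines));
    }
}
```

In a real job, the named outputs would also have to be registered in the driver (MultipleOutputs.addNamedOutput) and the number of reduce tasks set to 0 to keep the job map-only; those steps are left out of this sketch.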
