
Here is my homework Pig script. Technically it works, but is it good?



Technically, this script does a word count.
But because of the way it uses GROUP, it does not seem like there would be any way to break the work up across separate mappers, so I'm not sure if this is a good Pig script.

---WORDCOUNT.PIG
-- load lines, seems about the same as USING TextLoader
LINES = LOAD 'wcdata.txt' AS (line:chararray);
-- TOKENIZE
TOKES = FOREACH LINES GENERATE TOKENIZE(line) AS linelist;
-- tuple of bag of tuple of chararray?? - must flatten out!

WL = FOREACH TOKES GENERATE FLATTEN(linelist) AS words;
-- wow that actually worked!

WG = GROUP WL BY words;

-- note: GROUP names the key field "group"
-- and names the (inner) bag after the relation that was grouped (WL)
WC = FOREACH WG GENERATE group AS word, COUNT(WL) AS ct;

1 Answer(s)



Hi Chip,

Here is the word count Pig script:

lines = LOAD 'file.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;

Here is the explanation:

The first statement loads each line of the file as a chararray. The second splits each line into words with the TOKENIZE function, which returns a bag of words; FLATTEN then un-nests that bag so that each word becomes its own record. In the third statement, the records are grouped by word so that the count can be computed, which is done in the fourth statement. DUMP then prints the result.
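On the question of mappers: as far as I understand Pig's MapReduce mode, the LOAD and the TOKENIZE/FLATTEN step run in the map tasks, GROUP is what triggers the shuffle, and the final FOREACH runs in the reducers. COUNT is an algebraic function, so Pig can normally compute partial counts in a combiner before the shuffle. The sketch below is one way to check this yourself. It assumes the input file is wcdata.txt from the question, and the PARALLEL value of 4 is only an illustrative choice, not a recommendation.

```pig
-- Sketch only: 'wcdata.txt' is the file name from the question,
-- and PARALLEL 4 is an arbitrary, illustrative reducer count.
lines = LOAD 'wcdata.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- DESCRIBE prints the schema of a relation, which shows what
-- FLATTEN produced (one word per record rather than a nested bag).
DESCRIBE words;

-- PARALLEL asks for a number of reduce tasks for this GROUP.
grouped = GROUP words BY word PARALLEL 4;
DESCRIBE grouped;

wordcount = FOREACH grouped GENERATE group AS word, COUNT(words) AS ct;

-- EXPLAIN prints the logical, physical and MapReduce plans, so you can
-- see which operators land in the map, combine and reduce phases.
EXPLAIN wordcount;
```

If the MapReduce plan from EXPLAIN shows a combine phase, the grouping is not forcing all the work through a single stage, and the script is a reasonable way to do a word count in Pig.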

Thanks
