Here is a Pig program for word count:
myinput = load '/sample.txt' as (line);
-- TOKENIZE splits the line into a field for each word.
-- flatten will take the collection of records returned by TOKENIZE and
-- produce a separate record for each one, calling the single field in the
-- record word.
words = foreach myinput generate flatten(TOKENIZE(line)) as word;
grpd = group words by word;
cntd = foreach grpd generate group, COUNT(words);
dump cntd;
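To see what each step of the script above does without running Pig, here is a rough plain-Python sketch of the same data flow. It is only an illustration: the function name `word_count` and the sample lines are made up for this example, and it splits on whitespace only, whereas Pig's TOKENIZE also breaks on a few punctuation characters.

```python
from collections import Counter

def word_count(lines):
    """Hypothetical helper that mimics the Pig script's steps."""
    # TOKENIZE + flatten: turn every line into one record per word.
    # Assumption: whitespace-only splitting, a simplification of TOKENIZE.
    words = [word for line in lines for word in line.split()]
    # group words by word + COUNT(words): tally occurrences of each word.
    return Counter(words)

# Made-up sample input standing in for the lines of /sample.txt.
sample = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(sample)

# dump cntd: print each (word, count) pair.
for word, n in sorted(counts.items()):
    print(word, n)
```

Each line becomes several word records, and the grouping step collapses equal words into a single (word, count) pair, which is what `dump cntd` prints in Pig.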
Keep updated with
www.facebook.com/coebda
If you need the raw data, comment here: