Monday 20 February 2017

Pig Exercise Part-II

Suppose we have space-separated data with four columns: language, website, pagecount and page_size (loaded in the script below as projectname, pagename, pagecount and pagesize).

    en google.com 50 100
    en yahoo.com 60 100
    us google.com 70 100
    en google.com 68 100

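and we want the total pagecount per website for the language en only, sorted in descending order, as output:

    google.com 118
    yahoo.com 60

Here google.com gets 50 + 68 = 118; the us row is filtered out and does not count.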


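-- Load the space-separated input and name the four columns
records = LOAD '/webcount' USING PigStorage(' ') AS (projectname:chararray, pagename:chararray, pagecount:int, pagesize:int);

-- Keep only the rows whose language is 'en'
filtered_records = FILTER records BY projectname == 'en';

-- Collect the remaining rows into one group per website
grouped_records = GROUP filtered_records BY pagename;

-- For each website, emit its name and the sum of its pagecount values
results = FOREACH grouped_records GENERATE group, SUM(filtered_records.pagecount);

-- Sort by the summed pagecount ($1 is the second field), largest first
sorted_result = ORDER results BY $1 DESC;

-- Write the result; without a USING clause the fields are tab-separated
STORE sorted_result INTO '/YOUROUTPUT';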
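
To check the result before storing it, the same pipeline can be run with DUMP in place of STORE. The following is only a sketch: the aliases website and total_pagecount are names made up here for readability and do not appear in the script above, and it assumes the input is still at '/webcount'.

```pig
-- Same steps as above, but with named output fields (hypothetical aliases)
records = LOAD '/webcount' USING PigStorage(' ') AS (projectname:chararray, pagename:chararray, pagecount:int, pagesize:int);
filtered_records = FILTER records BY projectname == 'en';
grouped_records = GROUP filtered_records BY pagename;
results = FOREACH grouped_records GENERATE group AS website, SUM(filtered_records.pagecount) AS total_pagecount;
sorted_result = ORDER results BY total_pagecount DESC;

-- Print the schema of the aggregated relation
DESCRIBE results;

-- Print the tuples to the console instead of writing them to a directory
DUMP sorted_result;
```

Naming the summed field lets ORDER refer to total_pagecount instead of the positional $1, which stays correct if more fields are added to the GENERATE clause later.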


