Suppose we have four-column data with the fields language, website, pagecount and page_size:
en google.com 50 100
en yahoo.com 60 100
us google.com 70 100
en google.com 68 100
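and we want the total pagecount per website for the 'en' rows only, sorted from highest to lowest:
google.com 118
yahoo.com 60
The following Pig script produces this output: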
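-- Load the space-delimited input (projectname is the language, pagename is the website).
records = LOAD '/webcount' USING PigStorage(' ') AS (projectname:chararray, pagename:chararray, pagecount:int, pagesize:int);
-- Keep only the 'en' rows.
filtered_records = FILTER records BY projectname == 'en';
-- Group by website and sum the page counts in each group.
grouped_records = GROUP filtered_records BY pagename;
results = FOREACH grouped_records GENERATE group, SUM(filtered_records.pagecount);
-- Sort by the summed count ($1), largest first, and write the result.
sorted_result = ORDER results BY $1 DESC;
STORE sorted_result INTO '/YOUROUTPUT';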
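To check the expected totals without a Hadoop cluster, the same filter, group, sum and sort steps can be mirrored in plain Python. This is only a rough sketch and not part of the Pig script: the function name total_pagecount and the in-memory rows list are made up for illustration, and the four sample rows are hard-coded instead of being read from '/webcount'.

```python
from collections import defaultdict

# Sample rows from the example: language, website, pagecount, page_size.
rows = [
    ("en", "google.com", 50, 100),
    ("en", "yahoo.com", 60, 100),
    ("us", "google.com", 70, 100),
    ("en", "google.com", 68, 100),
]


def total_pagecount(rows, language="en"):
    """Hypothetical helper mirroring FILTER, GROUP, SUM and ORDER from the Pig script."""
    totals = defaultdict(int)
    for lang, site, pagecount, _page_size in rows:
        if lang == language:          # FILTER ... BY projectname == 'en'
            totals[site] += pagecount  # GROUP BY pagename + SUM(pagecount)
    # ORDER ... BY $1 DESC: sort by the summed count, largest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


result = total_pagecount(rows)
for site, total in result:
    print(site, total)
```

The 'us' row for google.com is dropped by the filter, which is why google.com totals 50 + 68 = 118 and not 188.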
Keep updated with
www.facebook.com/coebda