Note the following about bags:
- A bag can have duplicate tuples.
- A bag can have tuples with differing numbers of fields. However, if Pig tries to access a field that does not exist, a null value is substituted.
- A bag can have tuples with fields that have different data types. However, for Pig to effectively process bags, the schema of the tuples within those bags should be the same. For example, if half of the tuples include chararray fields while the other half include float fields, only half of the tuples will participate in any kind of computation because the chararray fields will be converted to null.
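
The three rules above can be sketched in plain Python. This is only an illustrative model, not how Pig is implemented: Python tuples stand in for Pig tuples, a list stands in for a bag, `None` stands in for null, and the helper names `get_field` and `as_float` are hypothetical. Dropping nulls before summing is an assumption made for the illustration.

```python
# Plain-Python model of the bag rules; not Pig's actual implementation.
# Tuples stand in for Pig tuples, a list for a bag, None for Pig's null.

def get_field(tup, index):
    """Return the field at index, or None (null) if the tuple is too short."""
    return tup[index] if index < len(tup) else None


def as_float(value):
    """Cast to float, yielding None (null) when the value cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# A bag with a duplicate tuple, a short tuple, and mixed field types.
bag = [(1, 2.5), (1, 2.5), (3,), (4, "abc")]

# Accessing the second field: the short tuple contributes a null.
second_fields = [get_field(t, 1) for t in bag]

# Treating the field as a float: the non-numeric string becomes a null.
as_floats = [as_float(v) for v in second_fields]

# Only the non-null values take part in the computation.
usable = [v for v in as_floats if v is not None]
total = sum(usable)

print(second_fields)
print(as_floats)
print(total)
```

In this model only two of the four tuples contribute to the sum, which mirrors the point that tuples with mismatched schemas drop out of the computation.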
 
 
 Bags have two forms: outer bag (or relation) and inner bag.
 
Example: Outer Bag

In this example A is a relation or bag of tuples. You can think of this bag as an outer bag.

A = LOAD 'data' as (f1:int, f2:int, f3:int);
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)

Example: Inner Bag

Now, suppose we group relation A by the first field to form relation X. In this example X is a relation or bag of tuples.
 
 The tuples in relation X have two fields. The first field is type int. The second field is type bag; you can think of this bag as an inner bag.
X = GROUP A BY f1;
DUMP X;
(1,{(1,2,3)})
(4,{(4,2,1),(4,3,3)})
(8,{(8,3,4)})
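
To make the outer and inner bag relationship concrete, here is a small Python sketch that mimics what grouping by the first field produces. It is a model for illustration only, not Pig code: the function name `group_by_first_field` is hypothetical, and the keys are sorted purely so the result lines up with the listing above.

```python
from collections import defaultdict

# The outer bag (relation A) from the example, modeled as a list of tuples.
A = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3)]


def group_by_first_field(relation):
    """Collect tuples that share the same first field into an inner bag."""
    groups = defaultdict(list)
    for tup in relation:
        groups[tup[0]].append(tup)
    # Each result tuple has two fields: the group key and the inner bag.
    return [(key, groups[key]) for key in sorted(groups)]


X = group_by_first_field(A)
for key, inner_bag in X:
    print(key, inner_bag)
```

Each element of `X` is a two-field tuple whose second field is itself a bag, which is the inner bag described above.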
 