Monday 20 February 2017

Pig Exercise - Part I

Note the following about bags:
  • A bag can have duplicate tuples.

  • A bag can have tuples with differing numbers of fields. However, if Pig tries to access a field that does not exist, a null value is substituted.

  • A bag can have tuples with fields that have different data types. However, for Pig to effectively process bags, the schema of the tuples within those bags should be the same. For example, if half of the tuples include chararray fields while the other half include float fields, only half of the tuples will participate in any kind of computation, because the chararray fields will be converted to null.
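
The null substitution for a missing field can be sketched as follows. This is a minimal example, assuming a hypothetical tab-delimited file named 'ragged' (not part of the original exercise) whose first row has three fields and whose second row has only two. The aliases R and S are illustrative. The file is loaded without a schema so that fields are addressed by position.

```pig
-- Assumed contents of the hypothetical file 'ragged' (fields separated by tabs):
--   1  2  3
--   4  2
R = LOAD 'ragged';
-- $2 is the third field; it does not exist in the second row
S = FOREACH R GENERATE $0, $2;
DUMP S;
-- Expected: (1,3) for the first row and (4,) for the second,
-- where the empty position is the substituted null
```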


    Bags have two forms: outer bag (or relation) and inner bag.

    Example: Outer Bag
    In this example A is a relation or bag of tuples. You can think of this bag as an outer bag.
     
    A = LOAD 'data' AS (f1:int, f2:int, f3:int);
    DUMP A;
    (1,2,3)
    (4,2,1)
    (8,3,4)
    (4,3,3)
     
    Example: Inner Bag
    Now, suppose we group relation A by the first field to form relation X.
    In this example X is a relation or bag of tuples.

    The tuples in relation X have two fields. The first field is of type int. The second field is of type bag; you can think of this bag as an inner bag.

    X = GROUP A BY f1;
    DUMP X;
    (1,{(1,2,3)})
    (4,{(4,2,1),(4,3,3)})
    (8,{(8,3,4)})
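
One common way to work with an inner bag is to pass it to an aggregate function. The sketch below assumes the same 'data' file and relations A and X as above; the alias C and the field name n are illustrative and not part of the original exercise. After GROUP, the key is available as the field named group, and the inner bag takes the name of the grouped relation, A.

```pig
A = LOAD 'data' AS (f1:int, f2:int, f3:int);
X = GROUP A BY f1;
-- For each group, emit the key and the number of tuples in its inner bag
C = FOREACH X GENERATE group, COUNT(A) AS n;
DUMP C;
-- With the four tuples shown earlier this should give (row order may vary):
-- (1,1)
-- (4,2)
-- (8,1)
```

Note that COUNT skips tuples whose first field is null, which makes no difference here because every f1 value is present.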


     For more updates visit:
    www.facebook.com/coebda

    For raw data comment here.
