Hadoop GenericWritable sample usage

GenericWritable is another Hadoop feature that lets you pass values of different types to the reducer. It is a wrapper for Writable instances.

Suppose we have different input formats (see MultipleInput): one produces FirstClass values and another produces SecondClass values. (Note: you can have more than two.) If you want to handle both of them in your reducer under the same key, here is what you can do.

We reuse the code from the MultipleInput example.

1. Write a GenericWritable class MyGenericWritable

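import java.util.Arrays;

import org.apache.hadoop.io.GenericWritable;
import org.apache.hadoop.io.Writable;

public class MyGenericWritable extends GenericWritable {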

    private static Class<? extends Writable>[] CLASSES = null;

    static {
        CLASSES = (Class<? extends Writable>[]) new Class[] {
            FirstClass.class,
            SecondClass.class
            // add as many different classes as you want
        };
    }
    // Hadoop needs this no-argument constructor to create instances during deserialization
    public MyGenericWritable() {
    }

    public MyGenericWritable(Writable instance) {
        set(instance);
    }

    @Override
    protected Class<? extends Writable>[] getTypes() {
        return CLASSES;
    }

    @Override
    public String toString() {
        return "MyGenericWritable [getTypes()=" + Arrays.toString(getTypes()) + "]";
    }
}
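
If you are wondering what the wrapper does with that class list, here is a rough, stdlib-only sketch of the idea as I understand it: the wrapper writes a one-byte index into the registered class array, followed by the wrapped object's own bytes, and on the way back it uses that index to pick the class and create it through its no-argument constructor. This is not Hadoop's code; MiniWritable, First, Second, wrap and unwrap are made-up names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class TaggedWrapperSketch {

    /** Hypothetical stand-in for Hadoop's Writable interface. */
    interface MiniWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical value type carrying a string. */
    static class First implements MiniWritable {
        String name = "";
        public First() {}                       // needed for reflective creation
        First(String name) { this.name = name; }
        public void write(DataOutput out) throws IOException { out.writeUTF(name); }
        public void readFields(DataInput in) throws IOException { name = in.readUTF(); }
    }

    /** Hypothetical value type carrying an int. */
    static class Second implements MiniWritable {
        int count;
        public Second() {}                      // needed for reflective creation
        Second(int count) { this.count = count; }
        public void write(DataOutput out) throws IOException { out.writeInt(count); }
        public void readFields(DataInput in) throws IOException { count = in.readInt(); }
    }

    // Registered types; the array index doubles as the tag byte.
    static final Class<?>[] TYPES = { First.class, Second.class };

    /** Serializes the instance as: one tag byte, then the instance's own fields. */
    static byte[] wrap(MiniWritable instance) throws IOException {
        int tag = -1;
        for (int i = 0; i < TYPES.length; i++) {
            if (TYPES[i] == instance.getClass()) { tag = i; break; }
        }
        if (tag < 0) {
            throw new IllegalArgumentException("type not registered: " + instance.getClass());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(tag);
        instance.write(out);
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the tag byte, creates the matching class reflectively, then fills it in. */
    static MiniWritable unwrap(byte[] data) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int tag = in.readByte() & 0xff;
        MiniWritable instance =
            (MiniWritable) TYPES[tag].getDeclaredConstructor().newInstance();
        instance.readFields(in);
        return instance;
    }

    public static void main(String[] args) throws Exception {
        MiniWritable a = unwrap(wrap(new First("alice")));
        MiniWritable b = unwrap(wrap(new Second(42)));
        System.out.println(a instanceof First);     // prints "true"
        System.out.println(((Second) b).count);     // prints "42"
    }
}
```

In this sketch, passing a type that is missing from TYPES fails in wrap, and a class without a no-argument constructor fails at the reflective step in unwrap. That is why both the class list and the empty constructors are worth double-checking in the real thing.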

2. In your Mappers, wrap each output value in MyGenericWritable:

public static class FirstMap extends Mapper<Text, FirstClass, Text, MyGenericWritable> {
     public void map(Text key, FirstClass value, Context context) throws IOException, InterruptedException {
         System.out.println("FirstMap:" + key.toString() + " " + value.toString());
         context.write(key, new MyGenericWritable(value));
     }
}
public static class SecondMap extends Mapper<Text, SecondClass, Text, MyGenericWritable> {
     public void map(Text key, SecondClass value, Context context) throws IOException, InterruptedException {
         System.out.println("SecondMap:" + key.toString() + " " + value.toString());
         context.write(key, new MyGenericWritable(value));
     }
}

3. In your Reducer, unwrap each value and check its type:

public class Reduce extends Reducer<Text, MyGenericWritable, Text, Text> {
    public void reduce(Text key, Iterable<MyGenericWritable> values, Context context) throws IOException, InterruptedException {
        for (MyGenericWritable value : values) {
            Writable rawValue = value.get();
            if(rawValue instanceof FirstClass){
                FirstClass firstClass = (FirstClass)rawValue;
                //do something
            }
            if(rawValue instanceof SecondClass){
                SecondClass secondClass = (SecondClass)rawValue;
                //do something
            }
        }
    }
}

4. In your job configuration, change the map output value class to MyGenericWritable:

job.setMapOutputValueClass(MyGenericWritable.class);
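
For context, here is roughly where that line sits in a driver. Treat it as a sketch rather than the exact setup from the MultipleInput post: the class name GenericWritableDriver, the job name, the three path arguments, and the choice of SequenceFileInputFormat are all assumptions on my part, and FirstMap, SecondMap and Reduce are the classes from the steps above and must be visible from the driver.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver; needs the Hadoop libraries and the classes from this post on the classpath.
public class GenericWritableDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "generic-writable-sample");
        job.setJarByClass(GenericWritableDriver.class);

        // One input path per mapper; the input format here is an assumption.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                SequenceFileInputFormat.class, FirstMap.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                SequenceFileInputFormat.class, SecondMap.class);

        job.setReducerClass(Reduce.class);

        // Both mappers emit Text keys and MyGenericWritable values.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MyGenericWritable.class);

        // The reducer emits Text keys and Text values.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```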

Pretty simple, right?

8 thoughts on “Hadoop GenericWritable sample usage”

  1. lidoublewen

    I also use GenericWritable this way. Why do I get a java.lang.NoSuchMethodException in the reduce phase?

  2. Darya

    Hi there,
    Thank you for sharing this great code! I am getting an error while using it. The error is “The type of instance is: class org.apache.hadoop.io.Text, which is not registered.” at the end of the getTypes() method.
    Do you have any idea about this error?
    Best regards.
