GenericWritable is another Hadoop feature that lets you pass values of different types to the same reducer; it is a wrapper around Writable instances.
Suppose you have different input formats (see the MultipleInput post), one producing FirstClass values and another producing SecondClass values (note: you can have more than two), and you want to handle both in your reducer, grouped by the same key. Here is what you can do.
We reuse the code from the MultipleInput post; since that post does not show the value classes themselves, a minimal sketch of one follows below.
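For reference, here is a minimal sketch of what FirstClass could look like. The original code does not show its fields, so the single String field and its serialization below are assumptions for illustration; SecondClass would follow the same pattern, and any correct Writable implementation works with MyGenericWritable.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical value type produced by the first input format.
public class FirstClass implements Writable {

    private String value = "";

    // no-arg constructor for Hadoop's reflection-based instantiation
    public FirstClass() {
    }

    public FirstClass(String value) {
        this.value = value;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(value); // serialize the single field
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readUTF(); // deserialize in the same order as write()
    }

    @Override
    public String toString() {
        return "FirstClass [value=" + value + "]";
    }
}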
1. Write a GenericWritable subclass, MyGenericWritable:
import java.util.Arrays;

import org.apache.hadoop.io.GenericWritable;
import org.apache.hadoop.io.Writable;

public class MyGenericWritable extends GenericWritable {

    private static Class<? extends Writable>[] CLASSES = null;

    static {
        CLASSES = (Class<? extends Writable>[]) new Class[] {
            FirstClass.class,
            SecondClass.class
            // add as many different classes as you want
        };
    }

    // Hadoop instantiates Writables via reflection, so this no-arg constructor is required.
    public MyGenericWritable() {
    }

    public MyGenericWritable(Writable instance) {
        set(instance);
    }

    @Override
    protected Class<? extends Writable>[] getTypes() {
        return CLASSES;
    }

    @Override
    public String toString() {
        return "MyGenericWritable [getTypes()=" + Arrays.toString(getTypes()) + "]";
    }
}
2. In your Mappers:
public static class FirstMap extends Mapper<Text, FirstClass, Text, MyGenericWritable> {
    public void map(Text key, FirstClass value, Context context)
            throws IOException, InterruptedException {
        System.out.println("FirstMap:" + key.toString() + " " + value.toString());
        // wrap the concrete value so both mappers emit the same output type
        context.write(key, new MyGenericWritable(value));
    }
}
public static class SecondMap extends Mapper<Text, SecondClass, Text, MyGenericWritable> {
    public void map(Text key, SecondClass value, Context context)
            throws IOException, InterruptedException {
        System.out.println("SecondMap:" + key.toString() + " " + value.toString());
        context.write(key, new MyGenericWritable(value));
    }
}
3. In your Reducer, use it like the following:
public class Reduce extends Reducer<Text, MyGenericWritable, Text, Text> {
    public void reduce(Text key, Iterable<MyGenericWritable> values, Context context)
            throws IOException, InterruptedException {
        for (MyGenericWritable value : values) {
            Writable rawValue = value.get(); // unwrap the concrete Writable
            if (rawValue instanceof FirstClass) {
                FirstClass firstClass = (FirstClass) rawValue;
                // do something
            }
            if (rawValue instanceof SecondClass) {
                SecondClass secondClass = (SecondClass) rawValue;
                // do something
            }
        }
    }
}
4. In your job configuration, change the map output value class to MyGenericWritable:
job.setMapOutputValueClass(MyGenericWritable.class);
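For completeness, here is a minimal sketch of how the whole job could be wired together with MultipleInputs, assuming the mappers and reducer above are visible from the driver. FirstInputFormat, SecondInputFormat, and the Driver class itself are hypothetical stand-ins for the pieces from the MultipleInput post; the MultipleInputs.addInputPath calls and the job setters are the standard Hadoop MapReduce API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "generic writable example");
        job.setJarByClass(Driver.class);

        // Each input path gets its own InputFormat and Mapper;
        // FirstInputFormat/SecondInputFormat are hypothetical formats
        // from the MultipleInput post.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                FirstInputFormat.class, FirstMap.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                SecondInputFormat.class, SecondMap.class);

        job.setReducerClass(Reduce.class);

        // Both mappers emit MyGenericWritable, so the reducer sees one value type.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MyGenericWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}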
Pretty simple, right?
I used GenericWritable the same way. Why do I get a java.lang.NoSuchMethodException during the reduce phase?
Could you paste the exact exception?
Awesome post, buddy! You rock. Thanks for such a great example.
Nice one.
Thanks, I used it as a message class for Apache Giraph and it works great.
Glad to know it helps :)
Hi there,
Thank you for sharing this great code! I am getting an error while using it. The error is "The type of instance is: class org.apache.hadoop.io.Text, which is not registered." at the end of the getTypes() method.
Do you have any idea about this error?
Best regards.
I was able to resolve it. Your code worked perfectly.
Thanks!