Record Reader Input format



The maximum size of a record can be controlled with mapred.linerecordreader.maxlength:
a) What happens if a record is larger than this value?
b) What is the default size?
c) Will map throw an exception if the record size is greater than the configured value?

1 Answer(s)



Yes, the size of a record can be restricted using mapred.linerecordreader.maxlength (newer Hadoop releases name the property mapreduce.input.linerecordreader.line.maxlength and treat the old name as deprecated).
Before getting to the questions, let us look at the data flow in MapReduce:
File -> Split <-> InputFormat -> RecordReader -> Map -> Reduce
1. The InputFormat takes care of generating the splits and creating the RecordReader for each split.
2. The RecordReader then reads the split line by line and generates a key-value pair for each line; map is called once per generated pair.
3. While generating the value, the default RecordReader implementation (LineRecordReader) checks this property. At most that many bytes of a line are stored in the value, and the rest of the line is consumed and discarded. If the number of bytes consumed for the line reaches the limit, LineRecordReader drops that line altogether and moves on to the next one.

Now to answer your questions:
a. If a record is larger than the value set in mapred.linerecordreader.maxlength, the record reader skips that whole line (logging an INFO message of the form "Skipped line of size ...") and continues with the next line, so the over-long record never reaches map.
b. The default is Integer.MAX_VALUE.
c. Map won't throw an exception, because the record reader has already handled the over-long record before map is called; apart from the log message, this happens silently.
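
To make the length check concrete, here is a minimal, self-contained sketch in plain Java. It does not use Hadoop; the class MaxLineLengthSketch and the helper recordsReachingMap are hypothetical names made up for illustration. It assumes, based on my reading of the stock LineRecordReader, that a line is handed to map only when the bytes consumed for it (content plus line terminator) stay below the limit, and that every input line ends with a single "\n".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaxLineLengthSketch {

    // Hypothetical helper, not a Hadoop API: returns the lines that would be
    // passed on to map for a given limit, under the assumptions stated above.
    public static List<String> recordsReachingMap(List<String> lines, int maxLineLength) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // Bytes consumed for this line: its UTF-8 content plus one "\n" (assumed).
            int consumed = line.getBytes(StandardCharsets.UTF_8).length + 1;
            if (consumed < maxLineLength) {
                kept.add(line); // within the limit: the record reaches map
            } else {
                // Too long: the whole line is dropped and no exception is thrown.
                System.out.println("Skipped line of size " + consumed);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("short", "a much longer line than the limit", "ok");
        // With a small limit only the short lines survive.
        System.out.println(recordsReachingMap(lines, 10));
        // With the default (Integer.MAX_VALUE) every line survives.
        System.out.println(recordsReachingMap(lines, Integer.MAX_VALUE));
    }
}
```

In a real job the limit would be set on the job's Configuration instead, for example conf.setInt("mapred.linerecordreader.maxlength", 1024), where 1024 is just an arbitrary example value.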

Hope this helps.
