Hadoop has the concept of an input 'split', which may comprise one or more HDFS blocks.
May 13 2014 11:36 PM
A map task operates on an individual split, which resolves the issue of a line being split across two or more blocks.
The next question is: how does Hadoop ensure a complete line is read when a line may span multiple splits?
To handle this, Hadoop performs a remote read (taken care of in the RecordReader) until it reaches the end of the line (EOL).
If a map task gets a split that contains the start of a line but not its end, Hadoop continues reading into the next split until it reaches that line's EOL.
The next map task first seeks to the first EOL in its split (that partial line was already read by the previous map task) and then starts reading from the following line.
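The two rules above can be sketched in a small simulation (this is not Hadoop's actual Java code; the function name, the in-memory byte buffer, and the split boundaries are illustrative assumptions). A non-first split skips everything up to and including the first EOL, and every split keeps reading past its own end until it finishes the line it started:

```python
def read_lines_for_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the lines 'owned' by the byte range [start, end) of data.

    Mirrors the behavior described above:
    - A non-first split skips up to and including the first newline,
      because that partial line was already read by the previous split.
      (Searching from start - 1 handles the case where the split begins
      exactly at the start of a line.)
    - A split that starts a line keeps reading past its own end until it
      reaches that line's EOL (the "remote read" into the next split).
    """
    pos = start
    if start != 0:
        nl = data.find(b"\n", start - 1)
        if nl == -1:
            return []              # no line starts inside this split
        pos = nl + 1
    lines = []
    while pos < end:               # only lines *starting* before end are ours
        nl = data.find(b"\n", pos)
        if nl == -1:               # final line with no trailing newline
            lines.append(data[pos:])
            break
        lines.append(data[pos:nl])  # may read bytes beyond end
        pos = nl + 1
    return lines

data = b"alpha\nbravo charlie\ndelta\n"
splits = [(0, 10), (10, 20), (20, len(data))]  # boundaries cut mid-line on purpose
for s, e in splits:
    print((s, e), read_lines_for_split(data, s, e))
```

Running this shows each line emitted exactly once: the first split reads "bravo charlie" to completion even though it ends at byte 10, and the second split skips that same line because the first split already consumed it.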