Process unstructured data in HDFS

Let us say we have a 550 MB text file. We process this file in Hadoop with MapReduce, and the file is stored as 20 blocks on different nodes in HDFS. Can we, after a few days, reconstruct this file from HDFS exactly the way it was before it was loaded into HDFS?

4 Answer(s)


Copy the file from HDFS to a local directory and then use WinSCP or the web UI to download it to your system:
bin/hadoop fs -get /hdfs/source/path /localfs/destination/path
bin/hadoop fs -copyToLocal /hdfs/source/path /localfs/destination/path
Alternatively, point your web browser at the HDFS web UI (namenode_machine:50070), browse to the file you intend to copy, scroll down the page, and click the download link for the file.


Thank you, Abhijit. My question is: is it possible to reconstruct the file back to its original form if the data in the text document is broken into key-value pairs in HDFS?


Yes, you get it back in its original form. The file is only broken into chunks (blocks) in HDFS, and it is easily reconstructed using the commands Abhijit provided. MapReduce reads the data as key-value pairs, but that is just how the job consumes the input; it does not change what is stored in HDFS.

This is similar to using WinZip/7-Zip to split a 550 MB file into archive parts of, say, 64 MB each: using all the parts, you can easily reconstruct the original file.
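The split-and-reassemble analogy above can be sketched locally with plain coreutils, no Hadoop cluster needed. The file names and the 1 MB chunk size here are arbitrary choices for the demo; HDFS itself uses much larger blocks (64 MB or 128 MB by default):

```shell
# Create a sample file (5 MB of random data)
head -c 5242880 /dev/urandom > original.bin

# Split it into fixed-size 1 MB chunks, the way HDFS splits a file into blocks
split -b 1M original.bin chunk_

# Reassemble the chunks in order, as the HDFS client does on read
cat chunk_* > reconstructed.bin

# Verify the reconstruction is byte-identical to the original
cmp original.bin reconstructed.bin && echo "files are identical"
```

Because the chunks are stored unchanged and concatenated back in order, the reconstructed file is exactly the original, which is the same guarantee HDFS gives you.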


Hi Trinath,

You do not need to worry about reconstructing the file; HDFS does it for you. It is similar to how a file stored on your laptop is split into 4-32 KB blocks depending on the type of file system: when you read the file, the file system manager joins the blocks and hands the data back to you as a single file.