Sep 20 2015 09:54 PM
How to convert a set of data values in a given format stored in HDFS into new data values and/or a new data format, and write them into HDFS or Hive/HCatalog?
>> Write a MapReduce program that reads the existing data values and writes them out in the new data format.
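As a rough illustration of that answer, here is a minimal sketch of the per-record conversion written as a Hadoop Streaming mapper in Python (a Java Mapper would follow the same shape). The record layouts are assumptions made up for the example, since the question does not say what the old and new formats are: the old format is taken to be `id,name,MM/DD/YYYY` and the new one `id<TAB>NAME<TAB>YYYY-MM-DD`. The function names are hypothetical too.

```python
import sys
from datetime import datetime


def convert_record(line):
    """Convert one old-format record to the new format.

    Assumed old format: id,name,MM/DD/YYYY (comma-separated).
    Assumed new format: id<TAB>NAME<TAB>YYYY-MM-DD.
    """
    rec_id, name, old_date = line.rstrip("\n").split(",")
    new_date = datetime.strptime(old_date, "%m/%d/%Y").strftime("%Y-%m-%d")
    return "\t".join([rec_id, name.strip().upper(), new_date])


def run_mapper(lines, out):
    # Hadoop Streaming passes input records on stdin, one per line,
    # and collects whatever the mapper writes to stdout.
    for line in lines:
        if line.strip():  # skip blank lines
            out.write(convert_record(line) + "\n")


# In a real mapper script, finish with: run_mapper(sys.stdin, sys.stdout)
```

Run as a map-only job, this writes the converted records straight back to HDFS. From there a Hive external table can be pointed at the output directory.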
How to write data with compression?
>> To compress the data, use one of the Hadoop-supported compression formats such as Snappy or gzip. See http://comphadoop.weebly.com/ for more.
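To make that a bit more concrete, here is a small sketch that builds a Hadoop Streaming command line with output compression switched on. It assumes Hadoop 2.x property names (`mapreduce.output.fileoutputformat.compress` and `mapreduce.output.fileoutputformat.compress.codec`). The helper names, jar location, and paths are placeholders for illustration, and shipping the mapper script to the cluster is not shown. Snappy also needs the native libraries installed on the nodes.

```python
# Codec classes that ship with Hadoop.
CODECS = {
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}


def compression_options(codec="snappy"):
    """Return the -D generic options that turn on compressed job output."""
    return [
        "-D", "mapreduce.output.fileoutputformat.compress=true",
        "-D", "mapreduce.output.fileoutputformat.compress.codec=" + CODECS[codec],
    ]


def streaming_command(streaming_jar, input_path, output_path, mapper,
                      codec="snappy"):
    """Build the argument list for a map-only streaming job (paths are placeholders)."""
    # Generic -D options have to come before the streaming-specific options.
    return (["hadoop", "jar", streaming_jar]
            + compression_options(codec)
            + ["-input", input_path, "-output", output_path,
               "-mapper", mapper, "-numReduceTasks", "0"])
```

For example, `streaming_command("hadoop-streaming.jar", "/data/old", "/data/new", "convert.py", codec="gzip")` gives a list that could be passed to `subprocess.call`. Keep in mind that gzip files are not splittable, so one large gzip file ends up in a single map task.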
How to convert data from one set of values to another?
>> It is not clear which tool you are using: Pig, Hive, or MapReduce?
How to purge bad records from a data set, e.g., null values?
>> It depends on the tool you are using. With MapReduce, use the Reporter to capture the bad records, then either report them or drop them from the output.
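One possible way to do this, sketched as a Hadoop Streaming mapper in Python: bad records are dropped from the output and counted. In Streaming, a line of the form `reporter:counter:<group>,<counter>,<amount>` written to stderr updates a job counter, which plays the role of the Reporter in the Java API. The field count, separator, null spellings, and counter names below are all assumptions chosen for the example.

```python
import sys

# Assumed spellings of a null value; "\N" is Hive's default null marker in text files.
NULL_MARKERS = {"", "null", "NULL", "\\N"}


def counter_update(group, counter, amount=1):
    """Format a Hadoop Streaming counter update (to be written to stderr)."""
    return "reporter:counter:%s,%s,%d\n" % (group, counter, amount)


def clean_record(line, expected_fields=3, sep=","):
    """Return the record if every field is present and non-null, else None."""
    record = line.rstrip("\n")
    fields = [f.strip() for f in record.split(sep)]
    if len(fields) != expected_fields or any(f in NULL_MARKERS for f in fields):
        return None
    return record


def run_mapper(lines, out, err):
    for line in lines:
        record = clean_record(line)
        if record is None:
            # Count the bad record and leave it out of the job output.
            err.write(counter_update("DataQuality", "BadRecords"))
        else:
            out.write(record + "\n")


# In a real mapper script, finish with: run_mapper(sys.stdin, sys.stdout, sys.stderr)
```

The counter total shows up in the job summary, so you can see how many records were purged without having to keep them. If you need the bad records themselves, write them to a separate side file instead of only counting them.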