File Format and Data Manipulation in HDFS




Hi Trinath,
How to convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS or Hive/HCatalog?
>> Write a MapReduce program that reads the existing data values and writes them out in the new data format.
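The reply suggests a MapReduce program (usually written in Java). As a lighter-weight sketch of the same idea, here is a Hadoop Streaming mapper in Python; this swaps in Streaming rather than the Java API. It assumes a hypothetical input layout of `id,date,amount` with `MM/DD/YYYY` dates and rewrites each record as tab-separated text with ISO dates, skipping lines it cannot parse.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper sketch: reformat CSV records as tab-separated text.

Assumed (hypothetical) input: id,date,amount with dates written MM/DD/YYYY.
Output: id<TAB>YYYY-MM-DD<TAB>amount (two decimals).
"""
import csv
import sys
from datetime import datetime


def convert_record(line):
    """Return the converted line, or None if the input line cannot be parsed."""
    try:
        record_id, raw_date, amount = next(csv.reader([line]))
        iso_date = datetime.strptime(raw_date, "%m/%d/%Y").strftime("%Y-%m-%d")
        return "\t".join([record_id, iso_date, f"{float(amount):.2f}"])
    except (ValueError, StopIteration):
        # Wrong field count, bad date, or non-numeric amount: skip the record.
        return None


def main():
    # Hadoop Streaming feeds input records on stdin and collects stdout as output.
    for line in sys.stdin:
        converted = convert_record(line.rstrip("\n"))
        if converted is not None:
            print(converted)


if __name__ == "__main__":
    main()
```

A map-only job could run it with something like `hadoop jar /path/to/hadoop-streaming.jar -files convert.py -mapper "python3 convert.py" -numReduceTasks 0 -input <in> -output <out>`; the jar's location depends on your Hadoop distribution, and `convert.py` is a hypothetical file name.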
How to write data with compression?
>> To compress data, use one of the Hadoop-supported compression formats, such as Snappy or gzip. Check out more at http://comphadoop.weebly.com/
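Two common routes, sketched under assumptions. For MapReduce job output, compression is switched on with job properties such as `-D mapreduce.output.fileoutputformat.compress=true -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec` (Hadoop 2 property names; Snappy needs the native library on the cluster). For files you load yourself, you can gzip them before `hdfs dfs -put`, because Hadoop's text input format picks the codec from the `.gz` extension. The Python helper below covers only that second route. Keep in mind that a plain gzip file cannot be split, so one large `.gz` file is read by a single mapper.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path=None):
    """Gzip-compress src_path into dst_path (default: src_path + '.gz') and return dst_path.

    The .gz extension matters: Hadoop chooses the decompression codec from it.
    """
    dst_path = dst_path or src_path + ".gz"
    # Stream in chunks so large files are not loaded into memory at once.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

For example, `gzip_file("events.csv")` would produce `events.csv.gz` (a hypothetical file), which you could then upload with `hdfs dfs -put`.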
How to convert data from one set of values to another?
>> It's not clear which tool you are using: Pig, Hive, or MapReduce?
How to purge bad records from a data set, e.g., null values?
>> It depends on which tool you are using. If you are using MapReduce, use the Reporter to capture the bad records, then report them or discard them.
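A sketch of the same idea in Hadoop Streaming (Python rather than the Java `Reporter` API the reply mentions): a streaming task can increment a job counter by writing a line of the form `reporter:counter:<group>,<counter>,<amount>` to stderr. The mapper assumes hypothetical tab-separated records with exactly three fields and treats empty, `null`, `NULL`, or `\N` values as bad; the counter group and name are made up. "Discarding" here means not emitting the record; the input files in HDFS are left untouched.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper sketch: drop bad records and count them.

Assumptions (hypothetical): tab-separated records with exactly EXPECTED_FIELDS
fields; a field counts as null if it is empty or one of NULL_MARKERS.
"""
import sys

EXPECTED_FIELDS = 3
NULL_MARKERS = {"", "null", "NULL", "\\N"}
# Streaming tasks update a job counter by writing this line format to stderr.
BAD_RECORD_COUNTER = "reporter:counter:DataQuality,BadRecords,1\n"


def is_bad(fields):
    """A record is bad if it has the wrong width or any null-like field."""
    return len(fields) != EXPECTED_FIELDS or any(f.strip() in NULL_MARKERS for f in fields)


def filter_records(lines, report=sys.stderr.write):
    """Yield good records (without newline); call report() once per bad record."""
    for line in lines:
        record = line.rstrip("\n")
        if is_bad(record.split("\t")):
            report(BAD_RECORD_COUNTER)
        else:
            yield record


def main():
    for record in filter_records(sys.stdin):
        print(record)


if __name__ == "__main__":
    main()
```

The `BadRecords` total then appears with the job's counters, so you can see how many records were purged. Writing one counter line per bad record is simple; batching the counts per task is an alternative if bad records are frequent.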


Hi,

Thanks for answering my questions.

1. How to convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS or Hive/HCatalog?
>> Write a MapReduce program that reads the existing data values and writes them out in the new data format.
** How can we do this in Pig or Hive?

2. How to convert data from one set of values to another?
>> It's not clear which tool you are using: Pig, Hive, or MapReduce?
** The tools I am using are Pig and Hive.

3. How to purge bad records from a data set, e.g., null values?
>> It depends on which tool you are using. If you are using MapReduce, use the Reporter to capture the bad records, then report them or discard them.
** I am using Pig and Hive. Could you suggest ways to purge bad records with these tools?
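For the Pig and Hive follow-ups, one approach that works in both tools is a streaming script: Hive can pipe rows through it with `SELECT TRANSFORM (...) USING '...' AS (...)` after `ADD FILE`, and Pig with its `STREAM ... THROUGH` operator; both send rows to the script as tab-separated text on stdin. This is a sketch under assumptions, not the thread's method: it expects hypothetical two-column rows (`id`, `status`), uses a made-up status lookup table, and relies on Hive's default `\N` marker for NULL. It relabels the status (question 2) and drops rows whose id is NULL or malformed (question 3). For simple cases, HiveQL's `CASE` expressions and `WHERE col IS NOT NULL`, or Pig's `FILTER ... BY col IS NOT NULL`, do the same without an external script.

```python
#!/usr/bin/env python3
"""Streaming script sketch for Hive TRANSFORM or Pig STREAM.

Assumptions (hypothetical): each input row is "id<TAB>status"; STATUS_LABELS
is a made-up lookup table; NULL arrives as Hive's default "\\N" marker.
"""
import sys

STATUS_LABELS = {"A": "active", "I": "inactive", "D": "deleted"}
HIVE_NULL = "\\N"


def transform_row(line):
    """Return the converted row, or None to purge a bad row (wrong width or NULL id)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2 or fields[0] in ("", HIVE_NULL):
        return None
    record_id, status = fields
    # Unknown or NULL statuses map to NULL rather than failing the job.
    return f"{record_id}\t{STATUS_LABELS.get(status, HIVE_NULL)}"


def main():
    for line in sys.stdin:
        row = transform_row(line)
        if row is not None:
            print(row)


if __name__ == "__main__":
    main()
```

In Hive this might look like `ADD FILE transform_status.py;` followed by `SELECT TRANSFORM (id, status) USING 'python3 transform_status.py' AS (id, status_label) FROM orders;` (file, table, and column names are hypothetical, and `python3` must be available on the worker nodes). In Pig, `STREAM` combined with `DEFINE ... SHIP(...)` does the equivalent; check how your Pig version serializes nulls, since the `\N` check above follows Hive's convention.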