
File Format and Data manipulation in HDFS



Hi,

Could you please briefly explain the following.

How to convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS or Hive/HCatalog?
How to write data with compression?
How to convert data from one set of values to another (e.g., Postal Address using an external library)?
How to purge bad records from a data set, e.g., null values?
How to perform deduplication and merge data in HDFS?
How to denormalize data from multiple disparate data sets?

2 Answer(s)



Hi Trinath,
How to convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS or Hive/HCatalog?
>> Write a MapReduce program that reads the old data values and writes them out in the new data format.
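As a rough illustration of that idea, here is a minimal sketch of a map-only conversion step in the style of a Hadoop Streaming mapper, written in Python. It assumes the old format is comma-separated text and the new format is one JSON object per line; the column names in FIELDS and the sample rows are made up for the example and are not from the question.

```python
import csv
import io
import json

# Hypothetical column names; replace with the real schema of your data.
FIELDS = ["id", "name", "city"]

def convert_line(line, fields=FIELDS):
    """Convert one CSV line into a JSON string keyed by column name."""
    row = next(csv.reader([line]))
    return json.dumps(dict(zip(fields, row)))

def run_mapper(in_stream, out_stream):
    """Map-only step: read old-format lines, write new-format lines."""
    for line in in_stream:
        line = line.rstrip("\n")
        if line:  # skip empty lines
            out_stream.write(convert_line(line) + "\n")

# Local demo on in-memory sample data (made up for illustration).
sample = io.StringIO('1,Asha,Pune\n2,"Rao, K",Goa\n')
converted = io.StringIO()
run_mapper(sample, converted)
print(converted.getvalue(), end="")
```

When used as a streaming mapper, the same function would be called as run_mapper(sys.stdin, sys.stdout) instead of on the in-memory sample.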
How to write data with compression?
>> For compression, use one of the Hadoop-supported compression formats, such as Snappy or gzip. You can read more at http://comphadoop.weebly.com/
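In a MapReduce job, output compression is normally switched on through job configuration rather than in your own code. In Hadoop 2 the relevant properties are mapreduce.output.fileoutputformat.compress and mapreduce.output.fileoutputformat.compress.codec (for example org.apache.hadoop.io.compress.GzipCodec), but please check the names against your version. The Python sketch below only shows the effect of the gzip codec locally, on made-up sample records and a temporary file path chosen for the example.

```python
import gzip
import os
import tempfile

def write_gzip_lines(path, lines):
    """Write text records to a gzip-compressed file, one record per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")

def read_gzip_lines(path):
    """Read the records back, decompressing transparently."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

# Made-up, highly repetitive sample data, so it compresses well.
records = ["1,Asha,Pune"] * 1000
path = os.path.join(tempfile.mkdtemp(), "part-00000.gz")
write_gzip_lines(path, records)

raw_size = sum(len(r) + 1 for r in records)  # +1 for each newline
print("raw bytes:", raw_size)
print("compressed bytes:", os.path.getsize(path))
```

Keep in mind that a gzip file cannot be split across mappers, so for large files a splittable setup (for example Snappy inside a container format such as SequenceFile) is often preferred.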
How to convert data from one set of values to another
>> It is not clear which tool you are using: Pig, Hive, or MapReduce?
How to purge bad records from a data set, e.g., null values?
>> It depends on the tool you are using. If you are using MapReduce, use the Reporter to capture the bad records, then report them or delete them.
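To make the "capture and report" idea concrete, here is a minimal sketch in Python, again in Hadoop Streaming style, where a counter is updated by writing a reporter:counter:<group>,<counter>,<amount> line to stderr. The expected field count, the set of tokens treated as null, and the counter names DataQuality and BadRecords are assumptions for the example, not something from the question.

```python
import io
import sys

EXPECTED_FIELDS = 3                        # hypothetical schema width
NULL_TOKENS = {"", "null", "NULL", "\\N"}  # assumed spellings of "no value"

def is_bad(fields):
    """A record is bad if it has the wrong width or any null-like field."""
    if len(fields) != EXPECTED_FIELDS:
        return True
    return any(f.strip() in NULL_TOKENS for f in fields)

def purge(in_stream, out_stream, err_stream=sys.stderr):
    """Copy good records to out_stream; count bad ones on err_stream."""
    kept = dropped = 0
    for line in in_stream:
        line = line.rstrip("\n")
        if is_bad(line.split(",")):
            dropped += 1
            # Hadoop Streaming reads counter updates from stderr.
            err_stream.write("reporter:counter:DataQuality,BadRecords,1\n")
        else:
            kept += 1
            out_stream.write(line + "\n")
    return kept, dropped

# Local demo on made-up data: three of the five records are bad.
data = io.StringIO("1,Asha,Pune\n2,,Delhi\n3,Ravi\n4,null,Goa\n5,Mira,Agra\n")
good, counters = io.StringIO(), io.StringIO()
kept, dropped = purge(data, good, counters)
print("kept:", kept, "dropped:", dropped)
```

The simple split on commas does not handle quoted fields; a real job would use a proper parser for its input format.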



Hi,

Thanks for answering my questions.

1. How to convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS or Hive/HCatalog?
>> Write a MapReduce program which will take older data values and write to new data format.
** How can we do this in Pig or Hive?

2. How to convert data from one set of values to another
>> Not clear on what tool you are using Pig/Hive/Mapreduce?
** The tools I am using are Pig and Hive.

3. How to purge bad records from a data set, e.g., null values?
>> Depends on what tool you are using, if you are using Mapreduce, using the reporter capture the bad records, report them or delete them
** I am using Pig and Hive. Could you suggest ways to purge bad records using these tools?
