I am trying to create a Hive partitioned table from the weather dataset shared during the course. The weather data is organized in files, one per month, and each file is named after its month:
201201hourly.txt, 201202hourly.txt, 201203hourly.txt, 201204hourly.txt.
What is the correct approach to partitioning this data in a Hive table, given that it is already organized into files the way we want?
- Can we partition the weather data table using these file names? Is there anything built into Hive that supports partitioning by input directory or file name?
- Is there a way to partition based on a value derived from one of the columns in a file?
- Example: the weather data stores dates in the format "20120101" (where the 5th and 6th characters represent the month). Can I partition based on those 2 characters in the column?
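To make the two derivations above concrete, here is a minimal sketch in plain Python (not Hive) of the values I would like to use as the partition key. The function names are hypothetical, and I am assuming every file follows the "YYYYMMhourly.txt" pattern and every date is an 8-digit "YYYYMMDD" string.

```python
import re


def month_from_filename(name: str) -> str:
    """Return the 'YYYYMM' prefix of a file name such as '201201hourly.txt'."""
    # Assumption: all files follow the <YYYYMM>hourly.txt naming pattern.
    match = re.fullmatch(r"(\d{6})hourly\.txt", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.group(1)


def month_from_date(date_value: str) -> str:
    """Return the 5th and 6th characters (the month) of a 'YYYYMMDD' string."""
    # Assumption: the date column is always exactly 8 digits.
    if len(date_value) != 8 or not date_value.isdigit():
        raise ValueError(f"unexpected date value: {date_value!r}")
    return date_value[4:6]
```

For example, month_from_filename("201201hourly.txt") gives "201201" and month_from_date("20120101") gives "01". What I am asking is whether Hive can derive either of these values itself when partitioning.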
One option is to transform the data to add another column (for the month) and then partition and load the data into the Hive table, but that would be a resource-heavy process.
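The transformation I have in mind would look roughly like the sketch below: rewrite every row with the month appended as an extra column before loading. This is only an illustration in plain Python. The function name is hypothetical, and the comma delimiter and the position of the date column are assumptions about the file layout, not something I have confirmed.

```python
from typing import Iterable, Iterator


def add_month_column(
    rows: Iterable[str], date_index: int, delimiter: str = ","
) -> Iterator[str]:
    """Yield each row with the month (5th and 6th chars of the date) appended."""
    for row in rows:
        line = row.rstrip("\n")
        fields = line.split(delimiter)
        # Assumption: fields[date_index] holds a 'YYYYMMDD' date string.
        month = fields[date_index][4:6]
        yield f"{line}{delimiter}{month}"
```

With a made-up row such as "03011,20120101,0053" and date_index=1, this yields "03011,20120101,0053,01". Every row of every file has to be read and rewritten this way, which is why I would like to avoid it if Hive can use the existing file layout directly.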