Hive Partitioning based on file name or value derived from one of the columns


I am trying to create a Hive partitioned table with the weather dataset shared during the course. The weather data is organized in files, and each file is named after a month.


What is the correct approach to partitioning this data (in a Hive table) when it is already organized in files the way we want?

- Can we partition the weather data table using these file names? Is there something built into Hive that supports partitioning by input directory or file name?

- Is there a way to partition based on a value derived from one of the columns in a file?

          - For example, the weather data has dates in the format "20120101" (where the 5th and 6th characters represent the month). Can I partition based on those 2 characters of the column?
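Both questions come down to deriving a month value, either from the date column or from the file name. The Python sketch below shows that derivation only; it is not a Hive feature. The yyyyMMdd format comes from the question, but the file-name pattern (a yyyyMM run such as `weather_201201.txt`) is a hypothetical assumption, so the regex would need adjusting to the real names.

```python
import re


def month_from_date(date_str: str) -> str:
    """Return the month ("01".."12") from a yyyyMMdd string such as "20120101"."""
    if len(date_str) != 8 or not date_str.isdigit():
        raise ValueError(f"expected yyyyMMdd, got {date_str!r}")
    # The 5th and 6th characters (0-based slice 4:6) hold the month.
    return date_str[4:6]


def month_from_filename(name: str) -> str:
    """Return the month from a file name containing yyyyMM.

    Hypothetical naming pattern, e.g. "weather_201201.txt"; adapt as needed.
    """
    match = re.search(r"(\d{4})(\d{2})", name)
    if match is None:
        raise ValueError(f"no yyyyMM found in {name!r}")
    return match.group(2)


print(month_from_date("20120101"))                # 01
print(month_from_filename("weather_201203.txt"))  # 03
```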

One option is to transform the data to add another column (for the month) and then partition and load the data into the Hive table, but that would be a resource-heavy process.

1 Answer(s)


Hi Pankaj,

Hive supports partitioning, but it requires explicit values for the partition columns, such as date, city, or department. With partitions, it is easy to query only a portion of the data.
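To illustrate why querying a portion of the data becomes cheap: Hive stores each partition of a table in its own subdirectory named `column=value` (for example `month=01`). The Python sketch below only models that layout in memory. The row shape (date first, a reading second) is a hypothetical assumption, and grouping in a dict stands in for Hive's directories.

```python
from collections import defaultdict


def partition_by_month(rows, date_col=0):
    """Group rows under Hive-style partition keys of the form 'month=MM'."""
    partitions = defaultdict(list)
    for row in rows:
        month = row[date_col][4:6]  # yyyyMMdd -> MM
        partitions[f"month={month}"].append(row)
    return dict(partitions)


# Hypothetical sample rows: [date, reading]
rows = [
    ["20120101", "5.1"],
    ["20120215", "7.3"],
    ["20120131", "4.8"],
]
parts = partition_by_month(rows)

# A filter on month = '01' only needs the rows under one key,
# much as Hive reads only the month=01 partition directory.
print(parts["month=01"])  # [['20120101', '5.1'], ['20120131', '4.8']]
print(sorted(parts))      # ['month=01', 'month=02']
```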

You need to run a non-Hive script to create an extra column containing only the month, that is, the 5th and 6th characters of the date string.
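A script like that could look roughly like the Python sketch below, which copies delimited rows and appends the month as a new last column. The comma delimiter and the date sitting in the first column are assumptions about the weather files, exposed as parameters so they can be changed.

```python
import csv
import io


def add_month_column(src, dst, date_col=0, delimiter=","):
    """Copy delimited rows from src to dst, appending the month (MM) as a new column.

    Assumes (hypothetically) that each row has a yyyyMMdd date in column date_col.
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow(row + [row[date_col][4:6]])


src = io.StringIO("20120101,5.1\n20120215,7.3\n")
dst = io.StringIO()
add_month_column(src, dst)
print(dst.getvalue(), end="")
# 20120101,5.1,01
# 20120215,7.3,02
```

For real files, pass handles opened with `open(path, newline="")` for reading and `open(path, "w", newline="")` for writing instead of the `io.StringIO` objects. The output then has the extra month column ready for loading.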

Hope this helps.