How to manage the number of connections from MapReduce to a database?

How can I keep a database from crashing when many MapReduce jobs persist data to it in parallel? Is there a way to manage DB connections at the MapReduce/HDFS end?

3 Answer(s)


Hi Deepya,
You can create a backup of the database.
But databases are built to take care of multiple queries.
Let's take a scenario where two jobs are created to update values in the same table. In this case, the database will allow the first job to write, while the other can only read the values until that write finishes.
So it is the built-in locking mechanism of the database that helps protect it from damage.
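
To make the two-job scenario concrete, here is a small sketch using SQLite from Python's standard library. It is only an illustration: SQLite locks the whole database file, while other databases may lock per table or per row, so the exact behaviour on your database may differ. The function name and the table are made up for the example.

```python
import os
import sqlite3
import tempfile


def demo_single_writer():
    """Show one connection writing while a second can read but not write."""
    # Hypothetical throwaway database file, used only for this demo.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None means we issue BEGIN/COMMIT ourselves.
    first = sqlite3.connect(path, isolation_level=None)
    # timeout=0 makes the second connection fail at once instead of waiting.
    second = sqlite3.connect(path, isolation_level=None, timeout=0)

    first.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
    first.execute("INSERT INTO t (v) VALUES (1)")

    # The first job takes the write lock and changes the row (not committed yet).
    first.execute("BEGIN IMMEDIATE")
    first.execute("UPDATE t SET v = 2")

    # The second job can still read; it sees the last committed value.
    seen_while_locked = second.execute("SELECT v FROM t").fetchall()[0][0]

    # The second job cannot write while the first holds the write lock.
    try:
        second.execute("UPDATE t SET v = 3")
        second_write_blocked = False
    except sqlite3.OperationalError:
        second_write_blocked = True

    first.execute("COMMIT")
    seen_after_commit = second.execute("SELECT v FROM t").fetchall()[0][0]

    first.close()
    second.close()
    return seen_while_locked, second_write_blocked, seen_after_commit
```

With SQLite's default journal mode, the second connection reads the old value, its write is rejected with a "database is locked" error, and after the commit it reads the new value.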
But to be safe, you can back up the database regularly, using an ETL tool or the built-in backup option if the database supports one.

Hope this helps.


Thanks for your quick reply, Abhijit. The problem here is the large number of connections to the database, since many jobs are running in parallel, and they have to run in parallel.


If you only read from the database, it won't affect the database or cause a failure.
It is like the same text file being read by 50-100 users at the same time.
The problem only comes up when you are writing to the database.
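
Coming back to the number of connections: one common way to keep writers from overwhelming a database is to cap how many connections a process may hold at once. Below is a rough sketch of that idea, not a MapReduce-specific recipe. The class `BoundedConnections` and the function `run_writers` are hypothetical names, threads stand in for parallel tasks, and in-memory SQLite stands in for the real database. Note that this only limits connections inside one process; across a cluster the total is roughly the per-process limit times the number of tasks running at the same time, so that number would have to be limited as well.

```python
import sqlite3
import threading
from contextlib import contextmanager


class BoundedConnections:
    """Hypothetical helper: allow at most max_connections open at a time."""

    def __init__(self, connect, max_connections):
        self._connect = connect  # zero-argument function returning a connection
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self.open_now = 0
        self.peak = 0  # highest number of simultaneously open connections

    @contextmanager
    def connection(self):
        self._slots.acquire()  # wait here until a slot is free
        try:
            conn = self._connect()
            with self._lock:
                self.open_now += 1
                self.peak = max(self.peak, self.open_now)
            try:
                yield conn
            finally:
                conn.close()
                with self._lock:
                    self.open_now -= 1
        finally:
            self._slots.release()


def run_writers(num_tasks=20, max_connections=3):
    """Run num_tasks writer threads through the gate; return the peak count."""
    # In-memory SQLite is an assumption standing in for the real database.
    gate = BoundedConnections(lambda: sqlite3.connect(":memory:"), max_connections)

    def task(i):
        with gate.connection() as conn:
            conn.execute("CREATE TABLE t (v INTEGER)")
            conn.execute("INSERT INTO t VALUES (?)", (i,))
            conn.commit()

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.peak
```

All 20 tasks still run in parallel, but `run_writers(20, 3)` never has more than 3 connections open at once; the extra tasks simply wait for a free slot.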