Details
Improvement
Status: Closed
Minor
Resolution: Fixed
1.4.2
None
Description
Sqoop always tries to achieve the best possible throughput with exports, which is not desirable in all cases. Sometimes we need to export large data sets with Sqoop to a live relational database (MySQL in our case), that is, a database under high load serving random queries from the users of our product.
While data consistency issues during the export can be easily solved with a staging table, there is still a problem: the performance impact caused by the heavy export.
First, the MySQL resources consumed by the import can degrade the performance of the live product, both on the master and on the slaves. Second, even if the servers can handle the import with no significant performance impact (mysqlimport should be relatively "cheap"), importing big tables (GB+) can cause serious replication lag in the cluster, which puts data consistency at risk.
My suggestion is quite simple: extend the existing "checkpoint" feature of the MySQL exports (the export process is restarted after every N bytes written) with a new config value that makes the thread sleep for a given number of milliseconds at each checkpoint. With a low enough byte count limit, this can be a simple yet powerful throttling mechanism.
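The idea can be sketched roughly as follows. This is only an illustration, not Sqoop's actual code: the class `ThrottledWriter` and the parameters `checkpointBytes` and `sleepMs` are hypothetical names, and a plain `OutputStream` stands in for the pipe to mysqlimport.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch: a writer that pauses at every checkpoint. */
public class ThrottledWriter {
    private final OutputStream out;      // stand-in for the pipe to mysqlimport
    private final long checkpointBytes;  // assumed config: bytes between checkpoints
    private final long sleepMs;          // assumed config: pause at each checkpoint
    private long bytesSinceCheckpoint = 0;
    private int checkpoints = 0;

    public ThrottledWriter(OutputStream out, long checkpointBytes, long sleepMs) {
        this.out = out;
        this.checkpointBytes = checkpointBytes;
        this.sleepMs = sleepMs;
    }

    /** Writes one record and triggers a checkpoint once the byte limit is reached. */
    public void write(byte[] record) throws IOException, InterruptedException {
        out.write(record);
        bytesSinceCheckpoint += record.length;
        if (checkpointBytes > 0 && bytesSinceCheckpoint >= checkpointBytes) {
            checkpoint();
        }
    }

    private void checkpoint() throws IOException, InterruptedException {
        out.flush();
        // In the real export, the import process would be restarted here.
        bytesSinceCheckpoint = 0;
        checkpoints++;
        if (sleepMs > 0) {
            Thread.sleep(sleepMs); // the proposed throttle: idle before continuing
        }
    }

    public int getCheckpoints() {
        return checkpoints;
    }
}
```

Under these assumptions, each checkpoint interval takes at least `sleepMs`, so the writer's throughput is bounded above by roughly `checkpointBytes / sleepMs`. For example, a 1 MiB checkpoint with a 500 ms sleep caps a single writer at about 2 MiB/s.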