Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Description
Currently, we assign the remaining parfor parallelism conservatively to the operations of the parfor body. Consider, for example, a K-means or MSVM scenario with 10 runs or 10 classes, respectively. On a box with 16 HW threads, we assign k=10 to the parfor and floor(16/10) = 1 to the remaining operations, which leaves 6 of the 16 threads unused. Since it is usually a good idea to slightly over-provision CPU in order to get full utilization (due to barriers at the end of each operation), we should tune this to round(16/10) = 2, which provides performance improvements of about 15% in the above examples.
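
The arithmetic behind this change can be sketched as follows. This is only an illustration, not the optimizer code: the function name `body_parallelism`, its parameters, and the lower bound of 1 are assumptions made for the example. Rounding is done half-up via `floor(x + 0.5)` because Python's built-in `round()` rounds halves to the nearest even number.

```python
import math


def body_parallelism(total_threads: int, parfor_k: int, over_provision: bool = True) -> int:
    """Degree of parallelism for operations in the parfor body (hypothetical helper).

    total_threads:  number of HW threads on the box
    parfor_k:       degree of parallelism already assigned to the parfor itself
    over_provision: True uses round(), False uses the conservative floor()
    """
    ratio = total_threads / parfor_k
    if over_provision:
        # Round half up; slightly over-provisions CPU when the ratio is >= x.5.
        k = math.floor(ratio + 0.5)
    else:
        # Conservative assignment: never exceeds the available threads.
        k = math.floor(ratio)
    # Assumption for this sketch: every operation gets at least one thread.
    return max(1, k)


if __name__ == "__main__":
    threads, k = 16, 10
    for label, flag in (("floor", False), ("round", True)):
        inner = body_parallelism(threads, k, over_provision=flag)
        print(f"{label}: k_inner={inner}, nominal threads in use={k * inner} of {threads}")
```

With 16 threads and k=10, the conservative variant nominally uses 10 threads, while the rounded variant nominally uses 20, which is the slight over-provisioning described above.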