TaskscheDuler module is used to interact with DagscheDuler and is responsible for the specific scheduling and operation of the task. The task scheduling module is based on two Trait: TaskscheDuler and SchedulerBackend.
Taskscheduler: Define the external interface (Submittasks, etc.) for the task scheduling module for DagscheDuler calls.
- TaskschedulerIMPL is the specific implementation of TaskscheDuler to complete the scheduling of resources and tasks.
- SCHEDULERBACKEND encapsulates various Backend to interact with the underlying resource scheduling system, and cooperate with TaskscheDulerIMPL to achieve the resource allocation required for task execution.
- schedulabuilder is responsible for the scheduling of Taskset.
- tasksetManager is responsible for a task scheduling in a taskset.
schedulerbackend method Reviveoffers
- 1, filter out the living Executors
- 2, create a new workeroffer object, call scheduler.Resourceofoffer () allocate resources to each Executor;
- 3, allocate task to Executor, execute your own Lauchtasks (), send the distributed task to Launchtasks information;
TASK allocation algorithm ResourceOffers
- Remove Node in the blacklist
- The available Executor is used for the shuffle, and the load balancing is achieved as much as possible
- After removing the sortedtasksets from the rootpool
- For each taskSet, start from the best [localized level] traversing, call ResourceOffersingletaskSet () method
- Localization level classification:
- Process_local: Localization of the process, RDD’s Partition and Task enters an Executor, the fastest speed
- node_local: node localization, RDD’s partition and task are not in an Executor, that is, not in the same process, but on a worker
- no_pref: indifferent localization level
- RACK_LOCAL: Localization of the rack, at least the partition of RDD and Task on a rack
- Any: Any localization level
- Localization level classification: