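kube-scheduler is responsible for scheduling Pods in Kubernetes. Its main flow is: fetch a pod that has not been scheduled yet, select the nodes that meet the pod's requirements, score those nodes, and finally use the highest-scoring node as the scheduling result. kube-scheduler therefore has two kinds of algorithms: those that filter nodes, called predicates, and those that score nodes, called priorities. This analysis describes how kube-scheduler manages these algorithms.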
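The filter-then-score flow described above can be sketched in a few lines of Go. This is only an illustration: the `Node`, `Pod`, `Predicate`, and `Priority` types and the `schedule`, `fitsCPU`, and `mostFreeCPU` functions are hypothetical simplifications, not the scheduler's real types or signatures.

```go
package main

import "fmt"

// Node and Pod are hypothetical, heavily simplified stand-ins.
type Node struct {
	Name    string
	FreeCPU int // free CPU in millicores
}

type Pod struct {
	Name   string
	CPUReq int // requested CPU in millicores
}

// Predicate reports whether the pod fits on the node (filtering).
type Predicate func(pod Pod, node Node) bool

// Priority returns a score for the node; higher is better (scoring).
type Priority func(pod Pod, node Node) int

// schedule drops every node rejected by any predicate, sums the priority
// scores of the remaining nodes, and returns the name of the best one.
// The boolean is false when no node passes the predicates.
func schedule(pod Pod, nodes []Node, preds []Predicate, prios []Priority) (string, bool) {
	best, bestScore, found := "", 0, false
	for _, n := range nodes {
		fits := true
		for _, p := range preds {
			if !p(pod, n) {
				fits = false
				break
			}
		}
		if !fits {
			continue
		}
		score := 0
		for _, pr := range prios {
			score += pr(pod, n)
		}
		// On a tie the first node seen keeps the lead.
		if !found || score > bestScore {
			best, bestScore, found = n.Name, score, true
		}
	}
	return best, found
}

func fitsCPU(pod Pod, node Node) bool    { return node.FreeCPU >= pod.CPUReq }
func mostFreeCPU(pod Pod, node Node) int { return node.FreeCPU - pod.CPUReq }

func main() {
	nodes := []Node{{"node-a", 500}, {"node-b", 2000}, {"node-c", 1000}}
	pod := Pod{"web", 800}
	name, ok := schedule(pod, nodes, []Predicate{fitsCPU}, []Priority{mostFreeCPU})
	fmt.Println(name, ok) // node-b true
}
```

Here node-a is filtered out because it has too little free CPU, and node-b beats node-c on score.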
algorithmprovider
The entry point of kube-scheduler, /plugin/cmd/kube-scheduler/app/server.go, contains the following import:
    _ "k8s.io/kubernetes/plugin/pkg/scheduler/algorithmprovider"
When a package is imported, Go automatically runs that package's init() functions. So let's look at the init() function for algorithmprovider, which is defined in /plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go:
    func init() {
        // Register the factories that produce predicate and priority metadata.
        factory.RegisterPredicateMetadataProducerFactory(
            func(args factory.PluginFactoryArgs) algorithm.MetadataProducer {
                return predicates.NewPredicateMetadataFactory(args.PodLister)
            })
        factory.RegisterPriorityMetadataProducerFactory(
            func(args factory.PluginFactoryArgs) algorithm.MetadataProducer {
                return priorities.PriorityMetadata
            })

        // Register two providers. ClusterAutoscalerProvider uses the default
        // priorities with LeastRequestedPriority swapped for MostRequestedPriority.
        factory.RegisterAlgorithmProvider(factory.DefaultProvider, defaultPredicates(), defaultPriorities())
        factory.RegisterAlgorithmProvider(ClusterAutoscalerProvider, defaultPredicates(),
            copyAndReplace(defaultPriorities(), "LeastRequestedPriority", "MostRequestedPriority"))

        // Register additional predicates by name.
        // PodFitsPorts and PodFitsHostPorts both map to predicates.PodFitsHostPorts.
        factory.RegisterFitPredicate("PodFitsPorts", predicates.PodFitsHostPorts)
        factory.RegisterFitPredicate("PodFitsHostPorts", predicates.PodFitsHostPorts)
        factory.RegisterFitPredicate("PodFitsResources", predicates.PodFitsResources)
        factory.RegisterFitPredicate("HostName", predicates.PodFitsHost)
        factory.RegisterFitPredicate("MatchNodeSelector", predicates.PodSelectorMatches)

        factory.RegisterGetEquivalencePodFunction(GetEquivalencePod)

        // Register additional priorities by name.
        factory.RegisterPriorityConfigFactory(
            "ServiceSpreadingPriority",
            factory.PriorityConfigFactory{
                Function: func(args factory.PluginFactoryArgs) algorithm.PriorityFunction {
                    return priorities.NewSelectorSpreadPriority(args.ServiceLister, algorithm.EmptyControllerLister{}, algorithm.EmptyReplicaSetLister{})
                },
                Weight: 1,
            },
        )
        factory.RegisterPriorityFunction2("EqualPriority", scheduler.EqualPriorityMap, nil, 1)
        factory.RegisterPriorityFunction2("ImageLocalityPriority", priorities.ImageLocalityPriorityMap, nil, 1)
        factory.RegisterPriorityFunction2("MostRequestedPriority", priorities.MostRequestedPriorityMap, nil, 1)
    }
init() registers the DefaultProvider and ClusterAutoscalerProvider providers by calling RegisterAlgorithmProvider(), registers predicate functions with RegisterFitPredicate(), and registers priority functions with RegisterPriorityFunction2() and RegisterPriorityConfigFactory().
Together with defaultPredicates() and defaultPriorities(), the following algorithm functions are registered in kube-scheduler:
predicates: NoVolumeZoneConflict, MaxEBSVolumeCount, MaxGCEPDVolumeCount, MatchInterPodAffinity, NoDiskConflict, GeneralPredicates, PodToleratesNodeTaints, CheckNodeMemoryPressure, CheckNodeDiskPressure, PodFitsPorts, PodFitsHostPorts, PodFitsResources, HostName, MatchNodeSelector.
priorities: SelectorSpreadPriority, InterPodAffinityPriority, LeastRequestedPriority, BalancedResourceAllocation, NodePreferAvoidPodsPriority, NodeAffinityPriority, TaintTolerationPriority, ServiceSpreadingPriority, EqualPriority, ImageLocalityPriority, MostRequestedPriority.
What each of these algorithms does will be covered one by one in later posts.
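The self-registration idea behind init() can be shown with a single-file sketch. In kube-scheduler the init() lives in an imported package and is triggered by the blank import; here, as a simplifying assumption, everything sits in package main. The `registry` map and the `register` and `registeredNames` helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a hypothetical package-level map, playing the role that a
// map such as fitPredicateMap plays in the scheduler. Package-level
// variables are initialized before any init() function runs.
var registry = map[string]func() string{}

// register stores f under name and returns the name.
func register(name string, f func() string) string {
	registry[name] = f
	return name
}

// init runs automatically before main, so the registry is already
// populated by the time main starts.
func init() {
	register("PodFitsResources", func() string { return "checks resource requests" })
	register("HostName", func() string { return "checks the requested host name" })
}

// registeredNames returns the registered names in sorted order.
func registeredNames() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(registeredNames()) // [HostName PodFitsResources]
}
```

Nothing in main calls register directly; the entries exist purely as a side effect of init().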
First, look at this call:
    factory.RegisterAlgorithmProvider(factory.DefaultProvider, defaultPredicates(), defaultPriorities())
This call registers DefaultProvider. Here is defaultPredicates():
    func defaultPredicates() sets.String {
        return sets.NewString(
            factory.RegisterFitPredicateFactory(
                "NoVolumeZoneConflict",
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    return predicates.NewVolumeZonePredicate(args.PVInfo, args.PVCInfo)
                },
            ),
            factory.RegisterFitPredicateFactory(
                "MaxEBSVolumeCount",
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    maxVols := getMaxVols(aws.DefaultMaxEBSVolumes)
                    return predicates.NewMaxPDVolumeCountPredicate(predicates.EBSVolumeFilter, maxVols, args.PVInfo, args.PVCInfo)
                },
            ),
            factory.RegisterFitPredicateFactory(
                "MaxGCEPDVolumeCount",
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    maxVols := getMaxVols(DefaultMaxGCEPDVolumes)
                    return predicates.NewMaxPDVolumeCountPredicate(predicates.GCEPDVolumeFilter, maxVols, args.PVInfo, args.PVCInfo)
                },
            ),
            factory.RegisterFitPredicateFactory(
                "MatchInterPodAffinity",
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    return predicates.NewPodAffinityPredicate(args.NodeInfo, args.PodLister, args.FailureDomains)
                },
            ),
            factory.RegisterFitPredicate("NoDiskConflict", predicates.NoDiskConflict),
            factory.RegisterFitPredicate("GeneralPredicates", predicates.GeneralPredicates),
            factory.RegisterFitPredicate("PodToleratesNodeTaints", predicates.PodToleratesNodeTaints),
            factory.RegisterFitPredicate("CheckNodeMemoryPressure", predicates.CheckNodeMemoryPressurePredicate),
            factory.RegisterFitPredicate("CheckNodeDiskPressure", predicates.CheckNodeDiskPressurePredicate),
        )
    }
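As shown, defaultPredicates() registers the predicate functions by calling RegisterFitPredicateFactory() or RegisterFitPredicate(). Each of these calls returns the name it registered, so defaultPredicates() returns the set of registered predicate names.
Next, defaultPriorities():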
    func defaultPriorities() sets.String {
        return sets.NewString(
            factory.RegisterPriorityConfigFactory(
                "SelectorSpreadPriority",
                factory.PriorityConfigFactory{
                    Function: func(args factory.PluginFactoryArgs) algorithm.PriorityFunction {
                        return priorities.NewSelectorSpreadPriority(args.ServiceLister, args.ControllerLister, args.ReplicaSetLister)
                    },
                    Weight: 1,
                },
            ),
            factory.RegisterPriorityConfigFactory(
                "InterPodAffinityPriority",
                factory.PriorityConfigFactory{
                    Function: func(args factory.PluginFactoryArgs) algorithm.PriorityFunction {
                        return priorities.NewInterPodAffinityPriority(args.NodeInfo, args.NodeLister, args.PodLister, args.HardPodAffinitySymmetricWeight, args.FailureDomains)
                    },
                    Weight: 1,
                },
            ),
            factory.RegisterPriorityFunction2("LeastRequestedPriority", priorities.LeastRequestedPriorityMap, nil, 1),
            factory.RegisterPriorityFunction2("BalancedResourceAllocation", priorities.BalancedResourceAllocationMap, nil, 1),
            factory.RegisterPriorityFunction2("NodePreferAvoidPodsPriority", priorities.CalculateNodePreferAvoidPodsPriorityMap, nil, 10000),
            factory.RegisterPriorityFunction2("NodeAffinityPriority", priorities.CalculateNodeAffinityPriorityMap, priorities.CalculateNodeAffinityPriorityReduce, 1),
            factory.RegisterPriorityFunction2("TaintTolerationPriority", priorities.ComputeTaintTolerationPriorityMap, priorities.ComputeTaintTolerationPriorityReduce, 1),
        )
    }
defaultPriorities() registers the priority functions through RegisterPriorityConfigFactory() or RegisterPriorityFunction2(), and likewise returns the set of registered priority names.
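Because the providers only hold sets of names, ClusterAutoscalerProvider can reuse the default priority set with a single name swapped. The body of copyAndReplace() is not shown in this article, so the sketch below is an assumption inferred from its name and arguments; `stringSet` is a hypothetical, minimal stand-in for sets.String.

```go
package main

import (
	"fmt"
	"sort"
)

// stringSet is a hypothetical minimal replacement for sets.String.
type stringSet map[string]struct{}

func newStringSet(items ...string) stringSet {
	s := stringSet{}
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// sorted returns the members in sorted order, for stable printing.
func (s stringSet) sorted() []string {
	out := make([]string, 0, len(s))
	for k := range s {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// copyAndReplace returns a copy of set in which replaceWhat, if present,
// is swapped for replaceWith. The input set is left unmodified.
// Assumed behavior; the real function body is not part of the article.
func copyAndReplace(set stringSet, replaceWhat, replaceWith string) stringSet {
	result := newStringSet()
	for k := range set {
		result[k] = struct{}{}
	}
	if _, ok := result[replaceWhat]; ok {
		delete(result, replaceWhat)
		result[replaceWith] = struct{}{}
	}
	return result
}

func main() {
	defaults := newStringSet("LeastRequestedPriority", "BalancedResourceAllocation")
	autoscaler := copyAndReplace(defaults, "LeastRequestedPriority", "MostRequestedPriority")
	fmt.Println(defaults.sorted())   // [BalancedResourceAllocation LeastRequestedPriority]
	fmt.Println(autoscaler.sorted()) // [BalancedResourceAllocation MostRequestedPriority]
}
```

Copying before replacing matters here: the default set is shared, and mutating it in place would also change DefaultProvider.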
Registration management
Algorithm registration is managed through three shared package-level maps, defined in /plugin/pkg/scheduler/factory/plugins.go:
    fitPredicateMap      = make(map[string]FitPredicateFactory)
    priorityFunctionMap  = make(map[string]PriorityConfigFactory)
    algorithmProviderMap = make(map[string]AlgorithmProviderConfig)
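fitPredicateMap maps an algorithm name to its predicate function (stored as a FitPredicateFactory); priorityFunctionMap maps an algorithm name to its priority function (stored as a PriorityConfigFactory); algorithmProviderMap maps a provider name to an AlgorithmProviderConfig.
AlgorithmProviderConfig is defined as follows: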
    type AlgorithmProviderConfig struct {
        FitPredicateKeys     sets.String
        PriorityFunctionKeys sets.String
    }
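AlgorithmProviderConfig contains FitPredicateKeys and PriorityFunctionKeys, which record the names of the algorithms that belong to a given provider name. This makes the relationship between the three global variables clear: algorithmProviderMap resolves a provider name to sets of algorithm names, and each algorithm name can then be looked up in fitPredicateMap or priorityFunctionMap to find the corresponding algorithm function.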
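The two-step lookup can be sketched as follows. The types are deliberately simplified assumptions: the maps here store plain functions rather than factories, the key sets are slices, and `resolve` is a hypothetical helper rather than a function from plugins.go.

```go
package main

import "fmt"

// Simplified, hypothetical function types.
type predicateFn func(node string) bool
type priorityFn func(node string) int

// providerConfig mirrors the shape of AlgorithmProviderConfig, using
// slices instead of sets.String to stay self-contained.
type providerConfig struct {
	FitPredicateKeys     []string
	PriorityFunctionKeys []string
}

var (
	predicateMap = map[string]predicateFn{
		"HostName":         func(node string) bool { return node != "" },
		"PodFitsResources": func(node string) bool { return true },
	}
	priorityMap = map[string]priorityFn{
		"EqualPriority": func(node string) int { return 1 },
	}
	providerMap = map[string]providerConfig{
		"DefaultProvider": {
			FitPredicateKeys:     []string{"HostName", "PodFitsResources"},
			PriorityFunctionKeys: []string{"EqualPriority"},
		},
	}
)

// resolve first maps the provider name to its key sets, then maps each
// key to the registered function.
func resolve(provider string) ([]predicateFn, []priorityFn, error) {
	cfg, ok := providerMap[provider]
	if !ok {
		return nil, nil, fmt.Errorf("provider %q is not registered", provider)
	}
	preds := make([]predicateFn, 0, len(cfg.FitPredicateKeys))
	for _, key := range cfg.FitPredicateKeys {
		fn, ok := predicateMap[key]
		if !ok {
			return nil, nil, fmt.Errorf("predicate %q is not registered", key)
		}
		preds = append(preds, fn)
	}
	prios := make([]priorityFn, 0, len(cfg.PriorityFunctionKeys))
	for _, key := range cfg.PriorityFunctionKeys {
		fn, ok := priorityMap[key]
		if !ok {
			return nil, nil, fmt.Errorf("priority %q is not registered", key)
		}
		prios = append(prios, fn)
	}
	return preds, prios, nil
}

func main() {
	preds, prios, err := resolve("DefaultProvider")
	fmt.Println(len(preds), len(prios), err) // 2 1 <nil>
	_, _, err = resolve("NoSuchProvider")
	fmt.Println(err) // provider "NoSuchProvider" is not registered
}
```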
Let's first look at how entries are registered in fitPredicateMap. The relevant functions are defined in /plugin/pkg/scheduler/factory/plugins.go:
    func RegisterFitPredicate(name string, predicate algorithm.FitPredicate) string {
        return RegisterFitPredicateFactory(name, func(PluginFactoryArgs) algorithm.FitPredicate { return predicate })
    }

    func RegisterFitPredicateFactory(name string, predicateFactory FitPredicateFactory) string {
        schedulerFactoryMutex.Lock()
        defer schedulerFactoryMutex.Unlock()
        validateAlgorithmNameOrDie(name)
        fitPredicateMap[name] = predicateFactory
        return name
    }
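Predicate functions can be registered in fitPredicateMap with either RegisterFitPredicate() or RegisterFitPredicateFactory(). RegisterFitPredicate() simply wraps the given predicate in a factory that ignores its arguments and then delegates to RegisterFitPredicateFactory().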
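The difference between the two entry points is easiest to see side by side. The sketch below is a simplified model: `FactoryArgs` with its `BlockedNode` field is a hypothetical stand-in for PluginFactoryArgs, the predicate signature is reduced to two strings, and the mutex and name validation are omitted.

```go
package main

import "fmt"

// FitPredicate is a simplified predicate signature (assumption).
type FitPredicate func(pod, node string) bool

// FactoryArgs is a hypothetical stand-in for PluginFactoryArgs.
type FactoryArgs struct {
	BlockedNode string
}

// FitPredicateFactory builds a predicate from runtime arguments.
type FitPredicateFactory func(args FactoryArgs) FitPredicate

var fitPredicateMap = map[string]FitPredicateFactory{}

// registerFitPredicateFactory stores the factory under name.
func registerFitPredicateFactory(name string, f FitPredicateFactory) string {
	fitPredicateMap[name] = f
	return name
}

// registerFitPredicate adapts a plain predicate into a factory that
// ignores its arguments, so both kinds share one map.
func registerFitPredicate(name string, p FitPredicate) string {
	return registerFitPredicateFactory(name, func(FactoryArgs) FitPredicate { return p })
}

func alwaysFits(pod, node string) bool { return true }

func main() {
	registerFitPredicate("AlwaysFits", alwaysFits)
	registerFitPredicateFactory("NotBlocked", func(args FactoryArgs) FitPredicate {
		// This predicate depends on a value only known at build time.
		return func(pod, node string) bool { return node != args.BlockedNode }
	})

	args := FactoryArgs{BlockedNode: "node-a"}
	for _, name := range []string{"AlwaysFits", "NotBlocked"} {
		p := fitPredicateMap[name](args)
		fmt.Println(name, p("web", "node-a"), p("web", "node-b"))
	}
	// AlwaysFits true true
	// NotBlocked false true
}
```

Storing factories rather than functions lets predicates that need listers or other runtime state be constructed later, once those arguments exist.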
Next, registration in priorityFunctionMap. The relevant functions are also defined in /plugin/pkg/scheduler/factory/plugins.go:
    func RegisterPriorityFunction2(
        name string,
        mapFunction algorithm.PriorityMapFunction,
        reduceFunction algorithm.PriorityReduceFunction,
        weight int) string {
        return RegisterPriorityConfigFactory(name, PriorityConfigFactory{
            MapReduceFunction: func(PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
                return mapFunction, reduceFunction
            },
            Weight: weight,
        })
    }

    func RegisterPriorityConfigFactory(name string, pcf PriorityConfigFactory) string {
        schedulerFactoryMutex.Lock()
        defer schedulerFactoryMutex.Unlock()
        validateAlgorithmNameOrDie(name)
        priorityFunctionMap[name] = pcf
        return name
    }
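Priority functions can be registered in priorityFunctionMap with either RegisterPriorityFunction2() or RegisterPriorityConfigFactory(). RegisterPriorityFunction2() takes a map function, an optional reduce function (nil is allowed), and a weight, and wraps them in a PriorityConfigFactory.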
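How the scheduler consumes the map function, reduce function, and weight is not shown in this article. One plausible reading, offered here only as an assumption, is that the map function scores each node on its own, the optional reduce function post-processes the whole score list, and the weight scales the result before summing. All types and names below (`hostScore`, `priorityConfig`, `totalScores`, `normalizeTo10`) are hypothetical.

```go
package main

import "fmt"

// hostScore pairs a node name with a score (hypothetical type).
type hostScore struct {
	Host  string
	Score int
}

type mapFn func(node string) int
type reduceFn func(scores []hostScore) // may be nil

type priorityConfig struct {
	Map    mapFn
	Reduce reduceFn
	Weight int
}

// totalScores runs every priority over every node: map per node, then
// the optional reduce over the whole list, then a weighted sum.
func totalScores(nodes []string, configs []priorityConfig) map[string]int {
	total := make(map[string]int, len(nodes))
	for _, c := range configs {
		scores := make([]hostScore, len(nodes))
		for i, n := range nodes {
			scores[i] = hostScore{Host: n, Score: c.Map(n)}
		}
		if c.Reduce != nil {
			c.Reduce(scores)
		}
		for _, s := range scores {
			total[s.Host] += s.Score * c.Weight
		}
	}
	return total
}

// normalizeTo10 rescales scores in place so the highest becomes 10.
func normalizeTo10(scores []hostScore) {
	highest := 0
	for _, s := range scores {
		if s.Score > highest {
			highest = s.Score
		}
	}
	if highest == 0 {
		return
	}
	for i := range scores {
		scores[i].Score = scores[i].Score * 10 / highest
	}
}

func main() {
	free := map[string]int{"node-a": 50, "node-b": 200}
	configs := []priorityConfig{
		{Map: func(n string) int { return free[n] }, Reduce: normalizeTo10, Weight: 1},
		{Map: func(n string) int { return 1 }, Reduce: nil, Weight: 2},
	}
	total := totalScores([]string{"node-a", "node-b"}, configs)
	fmt.Println(total["node-a"], total["node-b"]) // 4 12
}
```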
Finally, registration in algorithmProviderMap. The relevant function is defined in /plugin/pkg/scheduler/factory/plugins.go:
    func RegisterAlgorithmProvider(name string, predicateKeys, priorityKeys sets.String) string {
        schedulerFactoryMutex.Lock()
        defer schedulerFactoryMutex.Unlock()
        validateAlgorithmNameOrDie(name)
        algorithmProviderMap[name] = AlgorithmProviderConfig{
            FitPredicateKeys:     predicateKeys,
            PriorityFunctionKeys: priorityKeys,
        }
        return name
    }
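All three Register functions share the same shape: take the lock, validate the name, write to the map, return the name. A minimal sketch of that pattern follows. The `validName` pattern is an assumption (the rule applied by validateAlgorithmNameOrDie() is not shown in this article), and unlike the original this version returns an error instead of terminating the process.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// providerConfig mirrors AlgorithmProviderConfig, using slices instead
// of sets.String to stay self-contained.
type providerConfig struct {
	FitPredicateKeys     []string
	PriorityFunctionKeys []string
}

var (
	mu        sync.Mutex
	providers = map[string]providerConfig{}
	// validName is an assumed validation rule, for illustration only.
	validName = regexp.MustCompile(`^[a-zA-Z0-9][-a-zA-Z0-9]*$`)
)

// registerProvider validates the name and stores the config under the
// lock, so concurrent registrations cannot corrupt the map.
func registerProvider(name string, predicateKeys, priorityKeys []string) error {
	if !validName.MatchString(name) {
		return fmt.Errorf("invalid provider name %q", name)
	}
	mu.Lock()
	defer mu.Unlock()
	providers[name] = providerConfig{
		FitPredicateKeys:     predicateKeys,
		PriorityFunctionKeys: priorityKeys,
	}
	return nil
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := fmt.Sprintf("Provider%d", i)
			if err := registerProvider(name, []string{"HostName"}, []string{"EqualPriority"}); err != nil {
				fmt.Println(err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(len(providers))                                 // 4
	fmt.Println(registerProvider("bad name!", nil, nil) != nil) // true
}
```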
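Summary
During initialization, kube-scheduler uses three shared variables, fitPredicateMap, priorityFunctionMap, and algorithmProviderMap, to manage its algorithm functions. Every algorithm function it needs is registered by category in fitPredicateMap or priorityFunctionMap, and algorithmProviderMap groups the registered names under provider names.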