Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      From the mailing list:

      General idea:
      Add an elastic scaling monitor and coordinator, i.e. a whirr process that would be running on some or all of the nodes that:

      • would collect load metrics (both generic and specific to each application)
      • would feed them through an elastic decision making engine (also specific to each application as it depends on the specific metrics)
      • would then act on those decisions by either expanding or contracting the cluster.

      Some specifics:

      • it need not be completely distributed, i.e. it can have a specific assigned node that will monitor/coordinate, but this node must not be fixed: it could/should change if the previous coordinator leaves the cluster.
      • each application would define the set of metrics that it emits and use a local monitor process to feed them to the coordinator.
      • the monitor process should emit some standard metrics (Disk I/O, CPU Load, Net I/O, memory)
      • the coordinator would have a pluggable decision engine policy, also defined by the application, that would consume metrics and make a decision (see the sketch after this list).
      • whirr would take care of requesting/releasing nodes and adding/removing them from the relevant services.
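
      As a rough illustration of the pluggable pieces above (none of these types exist in Whirr; all names are made up), the contract could look something like this in Java:

        import java.util.Map;

        /** Hypothetical sketch; not existing Whirr APIs. */
        public final class ElasticScalingSketch {

          /** Implemented by the generic monitor and by application-specific metrics gatherers. */
          interface MetricsSource {
            Map<String, Double> collect();  // e.g. "cpu.load" -> 0.85, "hbase.reqs.per.sec" -> 1200.0
          }

          enum ScalingAction { EXPAND, CONTRACT, NONE }

          /** Pluggable decision engine: consumes metrics keyed by node id and makes a decision. */
          interface ScalingPolicy {
            ScalingAction decide(Map<String, Map<String, Double>> metricsByNodeId);
          }

          private ElasticScalingSketch() {}
        }

      Whirr itself would then translate EXPAND/CONTRACT decisions into the node requests/releases and service membership changes mentioned in the last bullet.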

      Some implementation ideas:

      • it could run on top of ZooKeeper. zk is already a requirement for several services and would allow coordinator state to be stored reliably, so that another node can pick up if the previous coordinator leaves the cluster.
      • it could use Avro to serialize/deserialize metrics data (see the example after this list).
      • it should be optional, i.e. simply another service that the whirr cli starts
      • it would also be nice to have a monitor/coordinator web page that would display metrics and cluster status in an aggregated view.
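
      For the Avro point above, a metric sample could be serialized roughly as follows (illustrative only; the record layout is an assumption, not a decided format):

        import java.io.ByteArrayOutputStream;
        import java.io.IOException;

        import org.apache.avro.Schema;
        import org.apache.avro.generic.GenericData;
        import org.apache.avro.generic.GenericDatumWriter;
        import org.apache.avro.generic.GenericRecord;
        import org.apache.avro.io.BinaryEncoder;
        import org.apache.avro.io.EncoderFactory;

        /** Illustrative only: one way a monitor could encode a metric sample with Avro. */
        public class MetricSampleCodec {

          // Hypothetical record layout: which node, which metric, the value, and a timestamp.
          private static final Schema SCHEMA = new Schema.Parser().parse(
              "{\"type\":\"record\",\"name\":\"MetricSample\",\"fields\":["
              + "{\"name\":\"nodeId\",\"type\":\"string\"},"
              + "{\"name\":\"metric\",\"type\":\"string\"},"
              + "{\"name\":\"value\",\"type\":\"double\"},"
              + "{\"name\":\"timestampMs\",\"type\":\"long\"}]}");

          public static byte[] encode(String nodeId, String metric, double value, long timestampMs)
              throws IOException {
            GenericRecord sample = new GenericData.Record(SCHEMA);
            sample.put("nodeId", nodeId);
            sample.put("metric", metric);
            sample.put("value", value);
            sample.put("timestampMs", timestampMs);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(SCHEMA).write(sample, encoder);
            encoder.flush();
            return out.toByteArray();
          }
        }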

          Activity

            dr-alves David Alves added a comment -

            Some additional thoughts:

            • Use Sigar for generic metrics gathering (it recently changed to the Apache License 2.0).
            • The whirr cli starts a monitor process on all the cluster nodes.
            • The monitor process is passed the addresses of the zk ensemble.
            • The monitor process uses zk's ephemeral nodes to elect a leader that will be the coordinator (see the sketch below).
            • Each monitor will gather generic metrics from Sigar and will optionally accept application-specific metrics gatherers (e.g. reqs/sec in HBase, or current jobs in Hadoop).
            • Each monitor will feed these metrics, tagged with its own unique id, to the elected coordinator (through Avro).
            • The coordinator will gather the metrics and feed them to a scaling rule engine.
            • The rule engine may be generic, i.e. based only on load, but can receive (be replaced by? be composed with?) application-specific rule engines.
            • There should be configurable and default hard limits for scaling (min, max) so that coordinator bugs cannot cause clusters to wrongly scale to huge numbers.
            • All non-volatile coordinator state is to be kept in zk.
            • When a coordinator node leaves the cluster another one is elected and picks up where the previous one left off (losing some metrics is probably not a problem).
            • Maybe in the future there could be some simple DDL for rules, but for now keep things simple: rule engines receive metrics and programmatically start/stop nodes using whirr's cli.
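
            As a minimal sketch of the ephemeral-node election above (the /whirr/coordinator path and the class are made up for illustration; watches and re-election are glossed over), each monitor could register an EPHEMERAL_SEQUENTIAL znode and treat the lowest sequence number as the coordinator:

              import java.util.Collections;
              import java.util.List;

              import org.apache.zookeeper.CreateMode;
              import org.apache.zookeeper.KeeperException;
              import org.apache.zookeeper.ZooDefs;
              import org.apache.zookeeper.ZooKeeper;

              /** Sketch only, not Whirr code. Assumes the election parent path already exists. */
              public class CoordinatorElection {

                private static final String ELECTION_PATH = "/whirr/coordinator";  // hypothetical path

                private final ZooKeeper zk;
                private String myZnode;

                public CoordinatorElection(ZooKeeper zk) {
                  this.zk = zk;
                }

                /** Registers this monitor and reports whether it is currently the coordinator. */
                public boolean enter(byte[] monitorId) throws KeeperException, InterruptedException {
                  myZnode = zk.create(ELECTION_PATH + "/monitor-", monitorId,
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
                  return isCoordinator();
                }

                /** The monitor whose ephemeral znode has the smallest sequence number leads. */
                public boolean isCoordinator() throws KeeperException, InterruptedException {
                  List<String> children = zk.getChildren(ELECTION_PATH, false);
                  Collections.sort(children);
                  return myZnode.endsWith(children.get(0));
                }
              }

            In a full implementation each monitor would also watch the znode preceding its own, so that exactly one candidate is notified when the current coordinator's ephemeral node disappears.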

            savu.andrei Andrei Savu added a comment -

            Sounds great! I also need to be able to monitor a running cluster for the research work I'm doing. David, let me know when you start working on this - maybe we can split some of the tasks.

            savu.andrei Andrei Savu added a comment -

            David, did you consider using Ganglia for monitoring? I vote for using an existing system even if the proposed architecture sounds cool. I don't see why we should write a new tool from scratch. Plus we can reuse existing monitoring scripts:

            http://wiki.apache.org/hadoop/GangliaMetrics
            https://github.com/andreisavu/zookeeper-monitoring/tree/master/ganglia

            dr-alves David Alves added a comment -

            I'm also for using whatever already exists. I've used both Ganglia and the Hadoop metrics collection system in the past: Ganglia as a metrics gatherer/display, and the Hadoop system for metrics collection/publishing.
            I was thinking of using the Hadoop metrics gathering system (even though I always end up having to hack it a bit); it integrates well with Hadoop and HBase, and provides a lot of functionality out of the box, namely publishing to Ganglia (see the configuration example below).
            Regarding Ganglia, I see it as an out-of-the-box solution for the monitoring web page, but not as a solution for the distributed coordinator with pluggable scaling rules.
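
            For reference, publishing Hadoop metrics to Ganglia is mostly a matter of configuration in hadoop-metrics.properties, roughly along these lines (host and period are placeholders; Ganglia 3.1+ needs GangliaContext31 instead):

              # send HDFS and MapReduce metrics to a gmond every 10 seconds
              dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
              dfs.period=10
              dfs.servers=<gmond-host>:8649
              mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
              mapred.period=10
              mapred.servers=<gmond-host>:8649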

            savu.andrei Andrei Savu added a comment -

            Indeed. Ganglia is not a solution for scaling, but the coordinator could easily fetch all the needed data from it (there is a Python plugin in Ganglia/contrib that shows how this can be done). This should let us focus on the important part: in this case, the scaling coordinator and the policies.
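
            As a sketch of that approach (not an agreed design): gmond serves its current cluster state as XML over its TCP port, 8649 by default, so the coordinator could simply poll it and pick out the METRIC elements. The class below is made up for illustration:

              import java.io.InputStream;
              import java.net.Socket;

              import javax.xml.parsers.DocumentBuilderFactory;

              import org.w3c.dom.Document;
              import org.w3c.dom.Element;
              import org.w3c.dom.NodeList;

              /** Sketch only: poll a gmond and print every metric it reports. */
              public class GangliaPoller {

                public static void dump(String gmondHost) throws Exception {
                  try (Socket socket = new Socket(gmondHost, 8649);   // gmond's default xml port
                       InputStream in = socket.getInputStream()) {
                    Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().parse(in);
                    NodeList metrics = doc.getElementsByTagName("METRIC");
                    for (int i = 0; i < metrics.getLength(); i++) {
                      Element m = (Element) metrics.item(i);
                      System.out.println(m.getAttribute("NAME") + " = " + m.getAttribute("VAL"));
                    }
                  }
                }
              }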

            dr-alves David Alves added a comment -

            Ah, ok, I get your point.
            My thoughts:
            My particular use case requires that the coordinator be able to move from one node to another when the previous coordinator leaves the cluster. This is not only for fault tolerance; it is actually a functional requirement.
            The way I had thought about it, metrics producers would learn (through zk) when another coordinator was up and send the metrics there over a point-to-point connection (sketched below). AFAIK, the only way in Ganglia for metrics-producing nodes not to need to know where the metrics consumer is is through multicast, which would not work in EC2 (I don't know about the other cloud providers).
            That being said, I had not thought about Ganglia as the metrics display sub-system. I think it would be rather easy to make gmetad start when a new coordinator starts and have the coordinator publish the metrics there (both per node and aggregated), giving a nice view of cluster status almost for free.
            What do you think?
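
            A sketch of that producer-side discovery (reusing the hypothetical /whirr/coordinator election path from the earlier comment; the idea that the leader stores its host:port as znode data is also an assumption):

              import java.util.Collections;
              import java.util.List;

              import org.apache.zookeeper.KeeperException;
              import org.apache.zookeeper.WatchedEvent;
              import org.apache.zookeeper.Watcher;
              import org.apache.zookeeper.ZooKeeper;

              /** Sketch only: each metrics producer tracks the current coordinator's address. */
              public class CoordinatorLocator implements Watcher {

                private static final String ELECTION_PATH = "/whirr/coordinator";  // hypothetical path

                private final ZooKeeper zk;
                private volatile String coordinatorAddress;  // e.g. "10.0.0.5:9999"

                public CoordinatorLocator(ZooKeeper zk) throws KeeperException, InterruptedException {
                  this.zk = zk;
                  refresh();
                }

                /** Resolves the current coordinator and leaves a watch for the next membership change. */
                private void refresh() throws KeeperException, InterruptedException {
                  List<String> candidates = zk.getChildren(ELECTION_PATH, this);
                  if (candidates.isEmpty()) {
                    coordinatorAddress = null;
                    return;
                  }
                  Collections.sort(candidates);
                  byte[] data = zk.getData(ELECTION_PATH + "/" + candidates.get(0), false, null);
                  coordinatorAddress = new String(data, java.nio.charset.StandardCharsets.UTF_8);
                }

                @Override
                public void process(WatchedEvent event) {
                  try {
                    refresh();  // the coordinator changed: producers reconnect to the new address
                  } catch (Exception e) {
                    // a real implementation would retry with backoff here
                  }
                }

                public String getCoordinatorAddress() {
                  return coordinatorAddress;
                }
              }

            Whenever the watch fires, producers would reopen their Avro connection to whatever address the new coordinator advertises.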

            savu.andrei Andrei Savu added a comment -

            I understand. Keep me in the loop - maybe we can work together.


            People

              Assignee: Unassigned
              Reporter: David Alves (dr-alves)
              Votes: 2
              Watchers: 4