    Patch docker-compose timing issues in export/rules-engine (#1550) · b160daec
    Trevor Conn authored
    Fix #1544
    
    The primary issue is observed when one runs "docker-compose up" after
    pruning local docker volumes.
    
    The following problematic symptoms have been observed. The requisite
    code changes in this PR are enumerated below. I hesitate to properly
    call this a "fix" because I think more analysis is warranted for a long
    term solution. However, for the purpose of stabilizing the release, this
    should suffice.
    
    - Observed that population of configuration data in Consul and
      subsequent creation of DBs/Collections and credentials in Mongo were
      delayed when creating volumes for the first time. This led to timing
      issues in the following areas:
    - I observed conditions where the config-seed was still populating
      Consul for a given service at the same time that the service was
      reading its configuration from Consul. Thus services would come
      up and read their configuration from Consul when that config was only
      partially populated. In Delhi this would not be a problem because the
      listenForConfigChanges() function would listen for a change anywhere
      in the service's config in Consul. This would cause the entirety of
      the ConfigurationStruct to be populated when the last key was written.
      In Edinburgh, we only listen for changes to the "Writable" section,
      which is written first.
      -- This is why initializeConfiguration() has been modified to look at
         the last value of the service's configuration within Consul to see
         if it has been populated. If it is not, then we throw an error --
         forcing a retry one second later. If it is populated, the whole
         config has been written to Consul and we can proceed.
    - All services write logging information when they are bootstrapping.
      If the service is configured to log remotely, those calls go to
      support-logging. Because of the observed latency (up to 5 seconds) in
      populating configuration and Mongo with new volumes, any service trying
      to send its startup messages to support-logging would throw an error.
      This led to many, many errors in the startup process. Once
      support-logging came up, the remote logging would work. Understand that every
      service must receive its configuration information from Consul and,
      where applicable, establish DB connectivity before it activates its
      API handlers. If a service calls support-logging (or any other service)
      before the API is enabled, the caller will receive a "connection
      refused".
      -- This was also mitigated by the code change above.
    - In the case where a service is delayed in its startup, it will not
      register its endpoint with Consul. Export-client does not require a
      database connection and so it will come up before export-distro. It
      was observed that the rules-engine registers with export-client in
      order to receive events. As a result of that, export-client attempts
      to POST a notification to export-distro. Because export-distro had not
      registered itself with Consul at the time export-client was being
      configured, the endpoint information would be blank, leading to an
      error like this (some fields omitted):
        level=ERROR app=edgex-export-client source=registration.go:310
        msg="error from distro: Put http://:0/api/v1/notify/registrations:
             dial tcp :0: connect: connection refused"
      Notice the blank host and the zero port.
      -- The code change to address this can be found in
         internal/pkg/startup/endpoint.go
      -- This only works due to the non-HA mode in which we deploy from
         docker-compose. Right now, what's in the docker configuration is
         always the same as what is (or will be) in Consul.
      -- Note also that there is a circular dependency between distro/client
         w/r/t how they call each other: the result of a notification sent
         to distro in the above case is that distro simply calls back to
         export-client.
    
    Signed-off-by: Trevor Conn <trevor_conn@dell.com>