Trevor Conn authored
Fix #1544

The primary issue is observed when one runs "docker-compose up" after pruning local docker volumes. The following problematic symptoms have been observed. The requisite code changes in this PR are enumerated below. I hesitate to call this a proper "fix" because I think more analysis is warranted for a long-term solution. However, for the purpose of stabilizing the release, this should suffice.

- Observed that population of configuration data in Consul and subsequent creation of DBs/Collections and credentials in Mongo were delayed when creating volumes for the first time. This led to timing issues in the following areas.

- I observed conditions where the config-seed was populating Consul for a given service at the same time that the service was reading its configuration from Consul. Thus services would come up and read their configuration from Consul when that config was only partially populated. In Delhi this would not be a problem because the listenForConfigChanges() function would listen for a change anywhere in the service's config in Consul. This would cause the entirety of the ConfigurationStruct to be populated when the last key was written. In Edinburgh, we only listen for changes to the "Writable" section, which is written first.
  -- This is why initializeConfiguration() has been modified to look at the last value of the service's configuration within Consul to see if it has been populated. If it is not, we throw an error, forcing a retry one second later. If it is populated, the whole config has been written to Consul and we can proceed.

- All services write logging information when they are bootstrapping. If the service is configured to log remotely, those calls go to support-logging. Because of the observed latency (up to 5 seconds) in populating configuration and Mongo with new volumes, any service trying to send its startup messages to support-logging would throw an error.
  This led to many, many errors in the startup process. Once support-logging came up, the remote logging would work. Understand that every service must receive its configuration information from Consul and, where applicable, establish DB connectivity before it activates its API handlers. If a service calls support-logging (or any other service) before the API is enabled, the caller will receive a "connection refused".
  -- This was also mitigated by the code change above.

- When a service is delayed in its startup, it will not yet have registered its endpoint with Consul. Export-client does not require a database connection and so it will come up before export-distro. It was observed that the rules-engine registers with export-client in order to receive events. As a result, export-client attempts to POST a notification to export-distro. Because export-distro had not registered itself with Consul at the time export-client was being configured, the endpoint information would be blank, leading to an error like this (some fields omitted):

  level=ERROR app=edgex-export-client source=registration.go:310 msg="error from distro: Put http://:0/api/v1/notify/registrations : dial tcp :0: connect: connection refused"

  Notice the blank host and the port of 0 in the URL.
  -- The code change to address this can be found in internal/pkg/startup/endpoint.go
  -- This only works due to the non-HA mode in which we deploy from docker-compose. Right now, what's in the docker configuration is always the same as what is (or will be) in Consul.
  -- Note also that there is a circular dependency between distro/client w/r/t how they call each other, because the result of a notification sent to distro in the above case is that distro simply calls back to export-client.

Signed-off-by: Trevor Conn <trevor_conn@dell.com>