Timeline:
Start Time: 9/14/23 9:26 PM PT
End Time: 9/18/23 3:30 PM PT
Customer Impact:
A cache service used by Hearsay's infrastructure became unavailable. As a result, Hearsay's publishing and crawl services could not read from or write to the cache at full capacity, which caused increased latency and a partial outage.
Because the cache was degraded, some posts reached the "Sent" state but were never updated with the "Success" or "Failed" network response. Posts in this group that had in fact been published successfully were later found by our crawlers and incorrectly saved as native posts. Posts that failed to publish during this time were rescheduled and reposted.
Resolution:
Hearsay Engineering created a new cache cluster service, which restored our services to full operation and resolved the issue. Crawling activities that had depended on the previous cache service have since completed successfully.
Future Improvements:
Additional monitoring will be implemented for this type of failure so that it is detected promptly going forward. This will help us spot unusual patterns or issues early and fix them quickly.
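As an illustration only, one form this monitoring could take is a check for posts that remain in the "Sent" state without ever receiving a "Success" or "Failed" response. The sketch below is a minimal Python example under stated assumptions: the Post record, the function names, the 15-minute wait, and the 5% alert threshold are all hypothetical and do not describe Hearsay's actual systems or the monitoring that will be deployed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Post:
    """Hypothetical post record; field names are assumptions for this sketch."""
    post_id: str
    state: str          # "Sent", "Success", or "Failed"
    sent_at: datetime   # when the post entered the "Sent" state


def find_stuck_posts(posts: List[Post], now: datetime,
                     max_wait: timedelta = timedelta(minutes=15)) -> List[Post]:
    """Return posts still in "Sent" longer than max_wait (assumed threshold)."""
    return [p for p in posts if p.state == "Sent" and now - p.sent_at > max_wait]


def should_alert(posts: List[Post], now: datetime,
                 max_wait: timedelta = timedelta(minutes=15),
                 threshold: float = 0.05) -> bool:
    """Alert when the share of stuck posts reaches the assumed threshold."""
    if not posts:
        return False
    stuck = find_stuck_posts(posts, now, max_wait)
    return len(stuck) / len(posts) >= threshold
```

A check like this would run on a schedule against recently published posts. A rising share of stuck posts would point to a problem in the status-update path, such as the cache unavailability described above, before crawlers begin misclassifying those posts.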