SysEleven Status and Incidents https://syseleven-status.de Get all incidents by feed http://www.rssboard.org/rss-specification python-feedgen https://www.syseleven.de/wp-content/uploads/2020/10/SysEleven_XL_Logo_quer_RGB.png SysEleven Status and Incidents https://syseleven-status.de de Sat, 22 Feb 2025 08:42:58 +0000 INCIDENT: PVC Failure of a hardware node, region BKI <p>Affected Components: <strong>Hardware node failure, region XXX</strong></p> <p>Incident Start: <strong>2024-10-01 09:15 UTC+02:00 (CEST)</strong> Incident End: <strong>2024-10-01 09:36 UTC+02:00 (CEST)</strong></p> <hr /> <p>Description:</p> <ul> <li>Malfunction of a hardware node</li> <li>Restart of the hardware node is necessary </li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>During the period of restart, there will be a short interruption in the availability of the systems.</li> <li>Affected customers were notified via E-Mail.</li> <li>Please check the affected systems for their full functionality.</li> </ul> 529 Tue, 01 Oct 2024 05:15:00 +0000 INCIDENT: SysEleven STACK Network performance issues, region DBL <p>Affected Components: <strong>SysEleven Stack Network, region DBL</strong></p> <p>Incident Start: <strong>2024-10-01 14:17 UTC+02:00 (CEST)</strong></p> <p>Incident End: <strong>2024-10-01 14:34 UTC+02:00 (CEST)</strong></p> <hr /> <p>Description:</p> <p>At the moment, we are facing issues with the Network in Region DBL.</p> <hr /> <p>Customer Impact:</p> <ul> <li>Network performance issue</li> </ul> <hr /> <p><strong>Update: 2024-10-01 14:34 UTC+02:00 (CEST)</strong></p> <p>The incident is over, and all services are operational.</p> <hr /> 530 Tue, 01 Oct 2024 10:17:00 +0000 INCIDENT: SysEleven STACK issues in region HAM1 <p>Affected Components: <strong>SysEleven Stack, region HAM1</strong></p> <p>Incident Start: <strong>2024-10-10 23:55</strong> Incident End: <strong>2024-10-11 00:00</strong></p> <hr /> <p>Description:</p> <ul> <li>Occurring errors were investigated. 
</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Connectivity was restricted</li> </ul> <hr /> <p><strong>Update: 01:00</strong></p> <ul> <li>We are observing further short-term issues with connectivity in the HAM1 region. The provider is aware of the issues and is currently investigating the situation.</li> </ul> <hr /> <p><strong>Update: 01:30</strong></p> <ul> <li>The network provider is proceeding with network maintenance until 05:00. We are on standby.</li> </ul> 532 Thu, 10 Oct 2024 19:55:00 +0000 INCIDENT: SysEleven STACK API issues <p>Affected Components: <strong>SysEleven Stack API</strong></p> <p>Incident Start: <strong>2024-10-30 09:30 CET</strong></p> <p>Incident End: <strong>2024-10-30 12:05 CET</strong></p> <hr /> <p>Description:</p> <ul> <li>Accessibility of the SysEleven Stack API is not ensured.</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Spawning new virtual machines (VMs) or changing existing resources is not possible.</li> </ul> <hr /> <p><strong>Update: 10:40</strong></p> <p>We are still investigating the situation and are in contact with our external network provider to further analyze the problems.</p> <hr /> <p><strong>Update: 11:30</strong></p> <p>The issue has been identified. We are waiting for our external network provider to fix the situation.</p> <hr /> <p><strong>Update: 12:05</strong></p> <p>The issue has been resolved.</p> 534 Wed, 30 Oct 2024 07:30:00 +0000 INCIDENT: SysEleven STACK Object Storage issues, region DBL <p>Affected Components: <strong>SysEleven Stack Object Storage, region DBL</strong></p> <p>Incident Start: <strong>2024-11-12 17:45 UTC+01:00 (CET)</strong></p> <p>Incident End: <strong>2024-11-12 18:40 UTC+01:00 (CET)</strong></p> <hr /> <p>Description:</p> <p>At the moment we are facing issues with the Object Storage in Region DBL.</p> <hr /> <p>Customer Impact:</p> <ul> <li>Writing or reading of objects may be restricted.</li> </ul> <hr /> <p>Update: <strong>2024-11-12 18:40 UTC+01:00 (CET)</strong></p>
<p>We mitigated the problem and are investigating further.</p> 537 Tue, 12 Nov 2024 15:45:00 +0000 INCIDENT: SysEleven STACK issues in region CBK <p>Affected Components: <strong>SysEleven Stack, region CBK</strong></p> <p>Incident Start: <strong>2024-11-12 22:35</strong> Incident End: <strong>2024-11-13 00:00</strong></p> <hr /> <p>Description:</p> <ul> <li>Occurring errors are currently being investigated.</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Connectivity is restricted</li> </ul> <hr /> <p><strong>Update: 23:08</strong></p> <p>The announced maintenance is having a bigger impact than expected. We are investigating the situation.</p> <hr /> <p><strong>Update: 23:45</strong></p> <p>We were able to pin down the root cause and prepare a fix to mitigate the problems.</p> <hr /> <p><strong>Update: 12:00</strong></p> <p>The network problems were mitigated. If you still encounter issues, please contact us!</p> 538 Tue, 12 Nov 2024 20:35:00 +0000 INCIDENT: Partial outage of MetaKube Control Plane Services in region FES <p>Affected Components: <strong>MetaKube Control Plane Services, region FES</strong></p> <p>Incident Start: <strong>2024-11-18 11:30 UTC+01:00 (CET)</strong></p> <hr /> <p>Description:</p> <p>Infrastructure hosting the MetaKube Control Plane Services has problems.</p> <hr /> <p>Customer Impact:</p> <ul> <li>MetaKube Control Plane might be slow or not answering</li> </ul> <hr /> <p><strong>UPDATE 2024-11-18 12:30 UTC+01:00 (CET)</strong></p> <p>We have identified networking problems as the cause and are currently working to resolve them.</p> <p><strong>UPDATE 2024-11-18 13:27 UTC+01:00 (CET)</strong></p> <p>We have increased the conntrack table size on hardware nodes to avoid networking problems.</p> <p>We continue to have issues with overloaded pods, which we are working on.</p> <p><strong>UPDATE 2024-11-18 14:00 UTC+01:00 (CET)</strong></p> <p>We managed to get the overloaded pods running by isolating them on dedicated nodes and raising the resource 
limits. This stopped other issues as well.</p> <p>We still need to investigate what caused the overloading of certain pods.</p> <p>Incident is over.</p> 541 Mon, 18 Nov 2024 09:30:00 +0000 INCIDENT: Partial degradation of SysEleven IAM services <p>Affected Components: <strong>SysEleven IAM, regions DUS and HAM</strong></p> <p>Incident Start: <strong>2024-11-28 12:00 UTC+01:00 (CET)</strong></p> <hr /> <p>Description:</p> <ul> <li>We're currently investigating a service degradation in the SysEleven IAM. Inviting users to an organization is currently not possible.</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Inviting users to an organization is currently not possible.</li> </ul> <hr /> <p><strong>UPDATE 2024-11-28 13:10 UTC+01:00 (CET)</strong></p> <p>The issue has been resolved and inviting users to organizations is possible again</p> 543 Thu, 28 Nov 2024 10:00:00 +0000 INCIDENT: major outage of metakube control plane services in ham1 <p>Affected Components: <strong>metakube control plane services, region ham1</strong></p> <p>Incident Start: <strong>2024-11-29 11:00 UTC+01:00 (CET)</strong></p> <p>Incident End: <strong>2024-11-29 13:40 UTC+01:00 (CET)</strong></p> <hr /> <p>Description:</p> <p>The metakube control plane services in ham1 can't be reached currently due to slow i/o</p> <hr /> <p>Customer Impact:</p> <ul> <li>Metakube services e.g. 
clusters in ham1 can't be reached</li> </ul> <hr /> <p>Customer Actions:</p> <ul> <li>Please inform us if you notice any irregularities</li> </ul> <hr /> <p>Update 13:23</p> <p>The situation improved.</p> 544 Fri, 29 Nov 2024 09:00:00 +0000 INCIDENT: SysEleven STACK Storage issues, region HAM1 <p>Affected Components: <strong>SysEleven Stack, Storage, region HAM1</strong></p> <p>Incident Start: <strong>2024-11-29 11:00 UTC+01:00 (CET)</strong></p> <p>Incident End: <strong>2024-11-29 13:40 UTC+01:00 (CET)</strong></p> <hr /> <p>Description:</p> <p>We are facing issues with the distributed file system, a core component of the SysEleven Stack.</p> <hr /> <p>Customer Impact:</p> <ul> <li>Starting virtual machines (VMs) is partially not possible.</li> <li>Write access to volumes (VM disks) may be restricted.</li> </ul> <hr /> <p>Update 13:23</p> <p>The situation improved, storage access latencies are back to normal.</p> 545 Fri, 29 Nov 2024 09:00:00 +0000 INCIDENT: SysEleven STACK API issues, region FES <p>Affected Components: <strong>SysEleven Stack API, region FES</strong></p> <p>Incident Start: <strong>2024-12-04 19:07 UTC+01:00 (CET)</strong></p> <p>Incident End: <strong>2024-12-04 20:10 UTC+01:00 (CET)</strong></p> <hr /> <p>Description:</p> <ul> <li>Accessibility of the SysEleven Stack API is not ensured.</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Requests on OpenStack API may return an error code</li> <li>Spawning new virtual machines (VMs) or changing existing resources may fail</li> </ul> <hr /> <p><strong>Update: 2024-12-04 20:00 UTC+01:00 (CET)</strong></p> <p>We identified the likely root cause and a fix is being applied.</p> <hr /> <p><strong>Update: 2024-12-04 20:10 UTC+01:00 (CET)</strong></p> <p>OpenStack API is now working again as expected.</p> 546 Wed, 04 Dec 2024 17:07:00 +0000 INCIDENT: SysEleven STACK issues in region dus2 <p>Affected Components: <strong>SysEleven Stack, region dus2</strong></p> <p>Incident Start: <strong>2024-12-06 17:45 UTC+01:00 (CET)</strong></p>
<hr /> <p>Description:</p> <ul> <li>We are seeing some network connectivity issues in the region and are investigating.</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Connectivity is degraded for MetaKube Services in dus2 </li> <li>Connectivity to Database as a Service in dus2 is degraded</li> </ul> <hr /> <p><strong>Update: 2024-12-06 18:45 UTC+01:00 (CET)</strong></p> <ul> <li>We identified an issue with one of our gateways and are working on a fix</li> </ul> 547 Fri, 06 Dec 2024 15:45:00 +0000 INCIDENT: Partial outage of Control Planes in Regions FES, DBL, CBK <p>Affected Components: <strong>Control Planes in Regions FES, DBL, CBK</strong></p> <p>Incident Start: <strong>2024-12-09 18:30 UTC+01:00 (CET)</strong></p> <hr /> <p>Description:</p> <p>Some cluster control planes are not reachable.</p> <hr /> <p>Update (22:20 CET):</p> <p>We have found a way to mitigate the issues temporarily and are starting to apply the fix.</p> <p>Update (22:50 CET):</p> <p>We applied the fix everywhere. We don't see any broken clusters anymore.</p> <hr /> <p>Update <strong>2024-12-10 12:30 UTC+01:00 (CET)</strong>:</p> <p>We identified the root cause: An unintended side-effect of an upgrade to a kube-proxy setting changed the proxy mode. 
This created iptables rules which were not cleaned up and through which traffic was dropped.</p> <p>We are taking measures to prevent this in the future.</p> 548 Mon, 09 Dec 2024 16:30:00 +0000 INCIDENT: SysEleven STACK network issues in region DBL <p>Affected Components: <strong>SysEleven Stack, region DBL</strong></p> <p>Incident Start: <strong>2024-12-13 14:09 UTC+01:00 (CET)</strong></p> <p>Incident End: <strong>2024-12-13 16:09 UTC+01:00 (CET)</strong></p> <hr /> <p>Description:</p> <ul> <li>The DBL network has performance issues</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Network performance degradation </li> </ul> <hr /> <p><strong>Update: 2024-12-13 15:17 UTC+01:00 (CET)</strong></p> <p>Situation is back to normal.</p> <hr /> <p><strong>Update: 2024-12-13 15:57 UTC+01:00 (CET)</strong></p> <p>We notice performance degradation again.</p> <hr /> <p><strong>Update: 2024-12-13 16:09 UTC+01:00 (CET)</strong></p> <p>Situation is back to normal.</p> 549 Fri, 13 Dec 2024 12:09:00 +0000 INCIDENT: SysEleven STACK issues in region DBL <p>Affected Components: <strong>SysEleven Stack, region DBL</strong></p> <p>Incident Start: <strong>2024-12-18 00:47</strong> Incident End: <strong>2024-12-18 01:27</strong></p> <hr /> <p>Description:</p> <ul> <li>Occurring errors are currently being investigated.</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Connectivity is restricted</li> </ul> <hr /> <p>Update: <strong>2024-12-18 01:27</strong></p> <ul> <li>We observed problems with cross-region connectivity outgoing from the DBL region between 00:35 and 01:10. At the moment traffic seems to have normalized again; we are still investigating.</li> </ul> <hr /> <p>Update: <strong>2024-12-18 02:00</strong></p> <ul> <li>The device causing the network issues was identified; the root cause will be investigated further</li> </ul> 550 Tue, 17 Dec 2024 22:40:00 +0000 INCIDENT: SysEleven STACK issues in region CBK <p>Affected Components: <strong>SysEleven Stack, region CBK</strong></p> 
<p>Incident Start: <strong>2025-01-06 08:18</strong></p> <p>Incident End: <strong>2025-01-06 08:50</strong></p> <hr /> <p>Description:</p> <ul> <li>Occurring errors are currently being investigated.</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Connectivity is restricted</li> </ul> <hr /> 551 Mon, 06 Jan 2025 08:35:12 +0000 INCIDENT: SysEleven STACK issues in region CBK <p>Affected Components: <strong>SysEleven Stack, region CBK</strong></p> <p>Incident Start: <strong>2025-01-06 10:00</strong></p> <p>Incident End: <strong>2025-01-06 10:30</strong></p> <hr /> <p>Description:</p> <ul> <li>Occurring errors are currently being investigated.</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Connectivity is restricted</li> </ul> <hr /> 552 Mon, 06 Jan 2025 10:14:32 +0000 INCIDENT: SysEleven STACK issues in region CBK <p>Affected Components: <strong>SysEleven Stack, region CBK</strong></p> <p>Incident Start: <strong>2025-01-07 09:16</strong></p> <p>Incident End: <strong>2025-01-07 09:45</strong></p> <hr /> <p>Description:</p> <ul> <li>Occurring errors are currently being investigated.</li> </ul> <hr /> <p>Customer Impact:</p> <ul> <li>Connectivity is restricted</li> </ul> <hr /> <p>Update: 09:45</p> <p>The situation has stabilized again. We are currently investigating.</p> 553 Tue, 07 Jan 2025 09:25:38 +0000 INCIDENT: minor outage of Database as a Service <p>Affected Components: Database as a Service, all regions</p> <p>Incident Start: <strong>2025-01-21 10:30 UTC+01:00 (CET)</strong> Incident End: <strong>2025-01-21 14:45 UTC+01:00 (CET)</strong></p> <hr /> <p>Description:</p> <p>Database as a Service is currently only available via the API and Terraform; the UI is not working</p> <hr /> <p>Update <strong>2025-01-21 14:45</strong></p> <p>We fixed the underlying issue, so Database as a Service is working again in the UI as well</p> 554 Tue, 21 Jan 2025 14:41:37 +0000 INCIDENT: MetaKube Control Plane issues, region FES <p>Affected Components: 
<strong>MetaKube Control Planes, region FES</strong></p> <p>Incident Start: <strong>2025-02-11 06:00 UTC+01:00 (CET)</strong></p> <p><strong>State: Resolved</strong></p> <hr /> <p>Description:</p> <ul> <li>Accessibility of the MetaKube API is not ensured.</li> <li>After a scheduled maintenance to the network in FES, the MetaKube control cluster (which is hosting the customer control planes) has problems reaching DNS. This is causing issues for the customer control planes.</li> <li>This also affects Database as a Service and Observability as a Service</li> <li>All times below are CET</li> </ul> <p><strong>UPDATE 2025-02-13 12:20</strong> - We consider all service disruptions of the incident to be mitigated - Although we are not expecting any more service disruptions, we are still watching all systems closely</p> <p><strong>Previous updates in chronological order</strong></p> <p><strong>UPDATE 2025-02-11 10:00</strong></p> <ul> <li>Still investigating the DNS issue. We sent out a notifier to all potentially affected customers.</li> </ul> <hr /> <p><strong>UPDATE 2025-02-11 11:15</strong> - We have used the time since the last update to narrow down the root cause of the incident. We excluded some possibilities but did not find the root cause. We are now preparing a partial rollback to downgrade the SDN again.</p> <hr /> <p><strong>UPDATE 2025-02-11 12:05</strong> - We completed a partial OVN/SDN downgrade, however this has not yet resolved the incident. - We are investigating further</p> <hr /> <p><strong>UPDATE 2025-02-11 12:25</strong> - We’re exploring further downgrade approaches (previous rollbacks were, as announced, partial) and are in parallel investigating further.</p> <hr /> <p><strong>UPDATE 2025-02-11 13:00</strong> - As we originally updated the SDN due to a critical security gap, it was decided that we will not perform a full OVN/SDN rollback to the initial state. 
- We have now activated several teams who will be developing and evaluating different solutions until 1:30 pm. An update on how we proceed will follow then.</p> <hr /> <p><strong>UPDATE 2025-02-11 13:50</strong> - Our teams will continue developing and evaluating solutions in breakout sessions, as there are further leads but no breakthrough yet. - In parallel we are preparing a failover for IAM and Alloy to DUS/HAM</p> <hr /> <p><strong>UPDATE 2025-02-11 15:30</strong> - Our teams' investigation into SDN traffic loss is ongoing - Our teams continue developing and evaluating solutions and possible workarounds - In parallel we are evaluating a rebuild of the SDN (software defined network)</p> <hr /> <p><strong>UPDATE 2025-02-11 17:47</strong> - Some services are still not functional - We are still working hard to resolve the issues but we will roll back the update of the SDN if no progress is made. - The planned maintenance period begins today, 11 February 2025, at 23:00 and is expected to last until around 06:00 CET on 12 February 2025, during which time there may be repeated interruptions to services. - The maintenance is also announced via notifier. You will also be informed of the end of the maintenance via notifier.</p> <hr /> <p><strong>UPDATE 2025-02-11 20:15</strong> - IAM, Alloy and Observability as a Service are restored to full functionality - The Database as a Service API has also been restored, but the API still has some issues which are related to the wider SDN problem. The Databases themselves were at no point affected by the incident.</p> <hr /> <p><strong>UPDATE 2025-02-11 23:00</strong> - The situation with the API improved. We will continue watching it.</p> <hr /> <p><strong>UPDATE 2025-02-12 10:15</strong> - The previous maintenance work did not achieve the desired success. - Workarounds have been implemented, so operations should be able to continue without disruptions. 
- Maintenance work will continue during the upcoming night to fully resolve the incident.</p> <hr /> <p><strong>UPDATE 2025-02-12 14:55</strong> - We scheduled another maintenance window for tonight, February 12th, from 11:00 PM to 6:00 AM the following day. - During this maintenance window, there may be brief interruptions or limited availability of certain services. - The goal is the complete resolution of the incident caused by Monday's update. - An RfO will be available in our Helpdesk after the incident is mitigated.</p> <hr /> <p><strong>UPDATE 2025-02-13 12:20</strong> - We consider all service disruptions of the incident to be mitigated - Although we are not expecting any more service disruptions, we are still watching all systems closely</p> <hr /> <p><strong>UPDATE 2025-02-13 15:30</strong> - Incident is resolved</p> 555 Tue, 11 Feb 2025 08:04:01 +0000