SysEleven Status and Incidents
https://syseleven-status.de
SysEleven (support@syseleven.de)
Get all incidents by feed

[529] INCIDENT: PVC Failure of a hardware node, region BKI

<p>Affected Components: <strong>Hardware node failure, region XXX</strong></p>
<p>Incident Start: <strong>2024-10-01 09:15 UTC+02:00 (CEST)</strong>
Incident End: <strong>2024-10-01 09:36 UTC+02:00 (CEST)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Malfunction of a hardware node</li>
<li>Restart of the hardware node is necessary </li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>During the restart there will be a short interruption in the availability of the affected systems.</li>
<li>Affected customers were notified via e-mail.</li>
<li>Please verify that the affected systems are fully functional.</li>
</ul>
<p>Published: 2024-10-01T05:15:00+00:00</p>

[530] INCIDENT: SysEleven STACK Network performance issues, region DBL

<p>Affected Components: <strong>SysEleven Stack Network, region DBL</strong></p>
<p>Incident Start: <strong>2024-10-01 14:17 UTC+02:00 (CEST)</strong></p>
<p>Incident End: <strong>2024-10-01 14:34 UTC+02:00 (CEST)</strong></p>
<hr />
<p>Description:</p>
<p>At the moment, we are facing issues with the Network in Region DBL.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Network performance issue</li>
</ul>
<hr />
<p><strong>Update: 2024-10-01 14:34 UTC+02:00 (CEST)</strong></p>
<p>The incident is over, and all services are operational.</p>
<hr />
<p>Published: 2024-10-01T10:17:00+00:00</p>

[532] INCIDENT: SysEleven STACK issues in region HAM1

<p>Affected Components: <strong>SysEleven Stack, region HAM1</strong></p>
<p>Incident Start: <strong>2024-10-10 23:55</strong>
Incident End: <strong>2024-10-11 00:00</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>The errors that occurred were investigated.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity was restricted</li>
</ul>
<hr />
<p><strong>Update: 01:00</strong></p>
<ul>
<li>We are observing further short-term connectivity issues in the HAM1 region; the provider is aware of the issues and is currently investigating the situation</li>
</ul>
<hr />
<p><strong>Update: 01:30</strong></p>
<ul>
<li>The network provider is carrying out network maintenance until 05:00; we are on standby</li>
</ul>
<p>Published: 2024-10-10T19:55:00+00:00</p>

[534] INCIDENT: SysEleven STACK API issues

<p>Affected Components: <strong>SysEleven Stack API</strong></p>
<p>Incident Start: <strong>2024-10-30 09:30 CET</strong></p>
<p>Incident End: <strong>2024-10-30 12:05 CET</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Accessibility of the SysEleven Stack API is not ensured.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Spawning new virtual machines (VMs) or changing existing resources is not possible.</li>
</ul>
<hr />
<p><strong>Update: 10:40</strong></p>
<p>We are still investigating the situation and are in contact with our external network provider to further analyze the problems.</p>
<hr />
<p><strong>Update: 11:30</strong></p>
<p>The issue has been identified; we are waiting for our external network provider to resolve the situation.</p>
<hr />
<p><strong>Update: 12:05</strong></p>
<p>The issue has been resolved.</p>
<p>Published: 2024-10-30T07:30:00+00:00</p>

[537] INCIDENT: SysEleven STACK Object Storage issues, region DBL

<p>Affected Components: <strong>SysEleven Stack Object Storage, region DBL</strong></p>
<p>Incident Start: <strong>2024-11-12 17:45 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-11-12 18:40 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>At the moment we are facing issues with the Object Storage in Region DBL.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Writing or reading of objects may be restricted (see the probe sketch below).</li>
</ul>
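<p>For illustration only, a minimal read/write probe against an OpenStack object store using the openstacksdk Python library; the cloud name "dbl" and the container/object names are placeholder assumptions for this sketch, not SysEleven-provided values.</p>
<pre><code>import openstack

def probe_object_store(cloud_name: str = "dbl", container: str = "healthcheck") -> bool:
    """Write a small test object and read it back to verify object storage access."""
    conn = openstack.connect(cloud=cloud_name)  # credentials come from clouds.yaml
    conn.object_store.create_container(name=container)
    conn.object_store.upload_object(container=container, name="probe.txt", data=b"ok")
    return conn.object_store.download_object("probe.txt", container=container) == b"ok"

if __name__ == "__main__":
    print("object storage reachable:", probe_object_store())
</code></pre>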
<hr />
<p>Update: <strong>2024-11-12 18:40 UTC+01:00 (CET)</strong></p>
<p>We mitigated the problem and are investigating further.</p>
<p>Published: 2024-11-12T15:45:00+00:00</p>

[538] INCIDENT: SysEleven STACK issues in region CBK

<p>Affected Components: <strong>SysEleven Stack, region CBK</strong></p>
<p>Incident Start: <strong>2024-11-12 22:35</strong>
Incident End: <strong>2024-11-13 00:00</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>The errors that occurred are currently being investigated.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity is restricted</li>
</ul>
<hr />
<p><strong>Update: 23:08</strong></p>
<p>The announced maintenance is having a bigger impact than expected; we are investigating the situation.</p>
<hr />
<p><strong>Update: 23:45</strong></p>
<p>We were able to pin down the root cause and are preparing a fix to mitigate the problems.</p>
<hr />
<p><strong>Update: 00:00</strong></p>
<p>The network problems were mitigated. If you still encounter issues, please contact us!</p>
<p>Published: 2024-11-12T20:35:00+00:00</p>

[541] INCIDENT: Partial outage of MetaKube Control Plane Services in region FES

<p>Affected Components: <strong>MetaKube Control Plane Services, region FES</strong></p>
<p>Incident Start: <strong>2024-11-18 11:30 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>The infrastructure hosting the MetaKube Control Plane Services is experiencing problems.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>MetaKube Control Plane might be slow or not answering</li>
</ul>
<hr />
<p><strong>UPDATE 2024-11-18 12:30 UTC+01:00 (CET)</strong></p>
<p>We have identified networking problems as the cause and are currently working to resolve them.</p>
<p><strong>UPDATE 2024-11-18 13:27 UTC+01:00 (CET)</strong></p>
<p>We have increased conntrack table size on hardware nodes to avoid networking problems.</p>
<p>We continue to have issues with overloaded pods which we are working on.</p>
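<p>For context, a hedged sketch of what inspecting and raising a conntrack table limit on a Linux hardware node can look like. The procfs path is the standard Linux location for this setting; the mechanism and any concrete values shown are assumptions for illustration, not the exact change that was applied.</p>
<pre><code># Illustrative sketch only, not the actual remediation script.
# Writing this file is equivalent to: sysctl -w net.netfilter.nf_conntrack_max=VALUE (requires root)
CONNTRACK_MAX = "/proc/sys/net/netfilter/nf_conntrack_max"

def read_limit(path: str = CONNTRACK_MAX) -> int:
    with open(path) as f:
        return int(f.read().strip())

def raise_limit(new_value: int, path: str = CONNTRACK_MAX) -> None:
    with open(path, "w") as f:
        f.write(str(new_value))

if __name__ == "__main__":
    current = read_limit()
    print(f"current nf_conntrack_max: {current}")
    # Example with a hypothetical target value:
    # raise_limit(current * 2)
</code></pre>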
<p><strong>UPDATE 2024-11-18 14:00 UTC+01:00 (CET)</strong></p>
<p>We managed to get the overloaded pods running by isolating them on dedicated nodes and raising the resource limits. This stopped other issues as well.</p>
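<p>As a hedged illustration of that kind of mitigation (not the exact change made here): a Deployment can be pinned to dedicated nodes via a nodeSelector and given higher resource limits with a strategic-merge patch through the Kubernetes Python client. All names, labels and limit values below are placeholder assumptions.</p>
<pre><code># Hedged sketch: pin a workload to dedicated nodes and raise its resource limits.
from kubernetes import client, config

def isolate_and_raise_limits(deployment: str, namespace: str, container: str) -> None:
    config.load_kube_config()
    apps = client.AppsV1Api()
    patch = {
        "spec": {"template": {"spec": {
            # Placeholder label; the dedicated nodes would carry this label.
            "nodeSelector": {"workload": "isolated"},
            # Containers are merged by name in a strategic-merge patch.
            "containers": [{
                "name": container,
                "resources": {"limits": {"cpu": "4", "memory": "8Gi"}},
            }],
        }}}
    }
    apps.patch_namespaced_deployment(name=deployment, namespace=namespace, body=patch)

if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    isolate_and_raise_limits("example-component", "example-namespace", "example-container")
</code></pre>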
<p>We still need to investigate what caused the overloading of certain pods.</p>
<p>The incident is over.</p>
<p>Published: 2024-11-18T09:30:00+00:00</p>

[543] INCIDENT: Partial degradation of SysEleven IAM services

<p>Affected Components: <strong>SysEleven IAM, regions DUS and HAM</strong></p>
<p>Incident Start: <strong>2024-11-28 12:00 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>We're currently investigating a service degradation in the SysEleven IAM. Inviting users to an organization is currently not possible.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Inviting users to an organization is currently not possible.</li>
</ul>
<hr />
<p><strong>UPDATE 2024-11-28 13:10 UTC+01:00 (CET)</strong></p>
<p>The issue has been resolved and inviting users to organizations is possible again.</p>
<p>Published: 2024-11-28T10:00:00+00:00</p>

[544] INCIDENT: Major outage of MetaKube Control Plane Services in region HAM1

<p>Affected Components: <strong>MetaKube Control Plane Services, region HAM1</strong></p>
<p>Incident Start: <strong>2024-11-29 11:00 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-11-29 13:40 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>The MetaKube control plane services in HAM1 currently can't be reached due to slow I/O.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>MetaKube services, e.g. clusters in HAM1, can't be reached</li>
</ul>
<hr />
<p>Customer Actions:</p>
<ul>
<li>Please inform us if you notice any irregularities</li>
</ul>
<hr />
<p>Update 13:23</p>
<p>The situation improved.</p>
<p>Published: 2024-11-29T09:00:00+00:00</p>

[545] INCIDENT: SysEleven STACK Storage issues, region HAM1

<p>Affected Components: <strong>SysEleven Stack, Storage, region HAM1</strong></p>
<p>Incident Start: <strong>2024-11-29 11:00 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-11-29 13:40 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>We are facing issues with the distributed file system, a core component of the SysEleven Stack.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Starting virtual machines (VMs) is partially not possible.</li>
<li>Write access to volumes (VM disks) may be restricted.</li>
</ul>
<hr />
<p>Update 13:23</p>
<p>The situation improved; storage access latencies are back to normal.</p>
<p>Published: 2024-11-29T09:00:00+00:00</p>

[546] INCIDENT: SysEleven STACK API issues, region FES

<p>Affected Components: <strong>SysEleven Stack API, region FES</strong></p>
<p>Incident Start: <strong>2024-12-04 19:07 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-12-04 20:10 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Accessibility of the SysEleven Stack API is not ensured.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Requests to the OpenStack API may return an error code</li>
<li>Spawning new virtual machines (VMs) or changing existing resources may fail (see the retry sketch below)</li>
</ul>
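<p>For illustration only: a minimal client-side retry sketch using the openstacksdk Python library, which can smooth over transient API errors like the ones described above. The cloud name "fes" and the retry parameters are assumptions for this example, not SysEleven-provided values.</p>
<pre><code>import time

import openstack
from openstack import exceptions

def list_servers_with_retry(cloud_name: str = "fes", attempts: int = 5, delay: float = 10.0):
    """List compute servers, retrying on transient HTTP errors from the API."""
    conn = openstack.connect(cloud=cloud_name)  # credentials come from clouds.yaml
    for attempt in range(1, attempts + 1):
        try:
            return list(conn.compute.servers())
        except exceptions.HttpException as exc:
            # Give up after the last attempt, otherwise back off and retry.
            if attempt == attempts:
                raise
            print(f"API error ({exc}), retrying in {delay}s ...")
            time.sleep(delay)

if __name__ == "__main__":
    for server in list_servers_with_retry():
        print(server.name, server.status)
</code></pre>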
<hr />
<p><strong>Update: 2024-12-04 20:00 UTC+01:00 (CET)</strong></p>
<p>We identified the likely root cause and a fix is being applied.</p>
<hr />
<p><strong>Update: 2024-12-04 20:10 UTC+01:00 (CET)</strong></p>
<p>The OpenStack API is now working again as expected.</p>
<p>Published: 2024-12-04T17:07:00+00:00</p>

[547] INCIDENT: SysEleven STACK issues in region dus2

<p>Affected Components: <strong>SysEleven Stack, region dus2</strong></p>
<p>Incident Start: <strong>2024-12-06 17:45 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>We are seeing some network connectivity issues in the region and are investigating.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity is degraded for MetaKube Services in dus2 </li>
<li>Connectivity to Database as a Service in dus2 is degraded</li>
</ul>
<hr />
<p><strong>Update: 2024-12-06 18:45 UTC+01:00 (CET)</strong></p>
<ul>
<li>We identified an issue with one of our gateways and are working on a fix</li>
</ul>
<p>Published: 2024-12-06T15:45:00+00:00</p>

[548] INCIDENT: Partial outage of Control Planes in Regions FES, DBL, CBK

<p>Affected Components: <strong>Control Planes in Regions FES, DBL, CBK</strong></p>
<p>Incident Start: <strong>2024-12-09 18:30 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>Some cluster control planes are not reachable.</p>
<hr />
<p>Update (22:20 CET):</p>
<p>We have found a way to mitigate the issues temporarily and are starting to apply the fix.</p>
<p>Update (22:50 CET):</p>
<p>We applied the fix everywhere. We don't see any broken clusters anymore.</p>
<hr />
<p>Update <strong>2024-12-10 12:30 UTC+01:00 (CET)</strong>:</p>
<p>We identified the root cause: an unintended side effect of an upgrade to a kube-proxy setting changed the proxy mode. This created iptables rules that were not cleaned up and that dropped traffic.</p>
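<p>As a hedged illustration of the kind of setting involved: in a kubeadm-style cluster the configured kube-proxy mode can be read from the "kube-proxy" ConfigMap in "kube-system". The ConfigMap and key names are assumptions for this example and may not match how MetaKube stores the configuration.</p>
<pre><code># Hedged sketch: print the kube-proxy mode configured in the cluster.
from kubernetes import client, config

def current_proxy_mode() -> str:
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    core = client.CoreV1Api()
    cm = core.read_namespaced_config_map("kube-proxy", "kube-system")
    # The KubeProxyConfiguration is stored as YAML under the "config.conf" key.
    for line in cm.data.get("config.conf", "").splitlines():
        if line.strip().startswith("mode:"):
            value = line.split(":", 1)[1].strip().strip('"')
            return value or "iptables (empty value means the default)"
    return "unknown"

if __name__ == "__main__":
    print("kube-proxy mode:", current_proxy_mode())
</code></pre>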
<p>We are taking measures to prevent this in the future.</p>
<p>Published: 2024-12-09T16:30:00+00:00</p>

[549] INCIDENT: SysEleven STACK network issues in region DBL

<p>Affected Components: <strong>SysEleven Stack, region DBL</strong></p>
<p>Incident Start: <strong>2024-12-13 14:09 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-12-13 16:09 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>The DBL network has a performance issue</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Network performance degradation </li>
</ul>
<hr />
<p><strong>Update: 2024-12-13 15:17 UTC+01:00 (CET)</strong></p>
<p>Situation is back to normal.</p>
<hr />
<p><strong>Update: 2024-12-13 15:57 UTC+01:00 (CET)</strong></p>
<p>We notice performance degradation again.</p>
<hr />
<p><strong>Update: 2024-12-13 16:09 UTC+01:00 (CET)</strong></p>
<p>Situation is back to normal.</p>
<p>Published: 2024-12-13T12:09:00+00:00</p>

[550] INCIDENT: SysEleven STACK issues in region DBL

<p>Affected Components: <strong>SysEleven Stack, region DBL</strong></p>
<p>Incident Start: <strong>2024-12-18 00:47</strong>
Incident End: <strong>2024-12-18 01:27</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>The errors that occurred are currently being investigated.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity is restricted</li>
</ul>
<hr />
<p>Update: <strong>2024-12-18 01:27</strong></p>
<ul>
<li>We observed problems with cross-region connectivity outgoing from the DBL region between 00:35 and 01:10. Traffic currently appears to have normalized again; we are still investigating</li>
</ul>
<hr />
<p>Update: <strong>2024-12-18 02:00</strong></p>
<ul>
<li>The device causing the network issues was identified; the root cause will be investigated further</li>
</ul>
<p>Published: 2024-12-17T22:40:00+00:00</p>

[551] INCIDENT: SysEleven STACK issues in region CBK

<p>Affected Components: <strong>SysEleven Stack, region CBK</strong></p>
<p>Incident Start: <strong>2025-01-06 08:18</strong></p>
<p>Incident End: <strong>2025-01-06 08:50</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>The errors that occurred are currently being investigated.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity is restricted</li>
</ul>
<hr />
<p>Published: 2025-01-06T08:35:12+00:00</p>

[552] INCIDENT: SysEleven STACK issues in region CBK

<p>Affected Components: <strong>SysEleven Stack, region CBK</strong></p>
<p>Incident Start: <strong>2025-01-06 10:00</strong></p>
<p>Incident End: <strong>2025-01-06 10:30</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>The errors that occurred are currently being investigated.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity is restricted</li>
</ul>
<hr />
<p>Published: 2025-01-06T10:14:32+00:00</p>

[553] INCIDENT: SysEleven STACK issues in region CBK

<p>Affected Components: <strong>SysEleven Stack, region CBK</strong></p>
<p>Incident Start: <strong>2025-01-07 09:16</strong></p>
<p>Incident End: <strong>2025-01-07 09:45</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>The errors that occurred are currently being investigated.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity is restricted</li>
</ul>
<hr />
<p>Update: 09:45</p>
<p>The situation stabilized again; we are still investigating.</p>
<p>Published: 2025-01-07T09:25:38+00:00</p>

[554] INCIDENT: Minor outage of Database as a Service

<p>Affected Components: <strong>Database as a Service, all regions</strong></p>
<p>Incident Start: <strong>2025-01-21 10:30 UTC+01:00 (CET)</strong>
Incident End: <strong>2025-01-21 14:45 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>Database as a Service is currently only available via the API and Terraform; the UI is not working.</p>
<hr />
<p>Update <strong>2025-01-21 14:45</strong></p>
<p>We fixed the underlying issue; Database as a Service is therefore working again in the UI as well.</p>
<p>Published: 2025-01-21T14:41:37+00:00</p>

[555] INCIDENT: MetaKube Control Plane issues, region FES

<p>Affected Components: <strong>MetaKube Control Planes, region FES</strong></p>
<p>Incident Start: <strong>2025-02-11 06:00 UTC+01:00 (CET)</strong></p>
<p><strong>State: Resolved</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Accessibility of the MetaKube API is not ensured.</li>
<li>After scheduled maintenance on the network in FES, the MetaKube control cluster (which hosts the customer control planes) has problems reaching DNS. This causes issues for the customer control planes (see the resolution-check sketch below).</li>
<li>This also affects Database as a Service and Observability as a Service</li>
<li>All times below are CET</li>
</ul>
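<p>For illustration only: a minimal sketch of the kind of check that distinguishes "DNS is unreachable" from "the target service is down". The hostnames are placeholders, not actual MetaKube endpoints.</p>
<pre><code>import socket

def can_resolve(hostname: str) -> bool:
    """Return True if the hostname resolves via the configured DNS resolvers."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False

if __name__ == "__main__":
    # Placeholder names for illustration; replace with the endpoints you depend on.
    for host in ("kubernetes.default.svc.cluster.local", "example.com"):
        print(host, "resolves" if can_resolve(host) else "does NOT resolve")
</code></pre>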
<p><strong>UPDATE 2025-02-13 12:20</strong>
- We consider all service disruptions of the incident to be mitigated
- Although we are not expecting any more service disruptions, we are still watching all systems closely</p>
<p><strong>Previous updates in chronological order</strong></p>
<p><strong>UPDATE 2025-02-11 10:00</strong></p>
<ul>
<li>Still investigating the DNS issue. We sent out a notifier to all potentially affected customers.</li>
</ul>
<hr />
<p><strong>UPDATE 2025-02-11 11:15</strong>
- We have used the time since the last update to narrow down the root cause of the incident. We excluded some possibilities but did not find the root cause. We are now preparing a partial rollback to downgrade the SDN again.</p>
<hr />
<p><strong>UPDATE 2025-02-11 12:05</strong>
- We completed a partial OVN/SDN downgrade; however, this has not yet resolved the incident.
- We are investigating further</p>
<hr />
<p><strong>UPDATE 2025-02-11 12:25</strong>
- We’re exploring further downgrade approaches (previous rollbacks were, as announced, partial) and are in parallel investigating further.</p>
<hr />
<p><strong>UPDATE 2025-02-11 13:00</strong>
- As we originally updated the SDN due to a critical security gap, it was decided that we will not perform a full OVN/SDN rollback to the initial state.
- We have now activated several teams that will be developing and evaluating different solutions until 13:30. An update on how we proceed will follow then.</p>
<hr />
<p><strong>UPDATE 2025-02-11 13:50</strong>
- Our teams will continue developing and evaluating solutions in breakout sessions, as there are further leads but no breakthrough yet.
- In parallel, we are preparing a failover for IAM and Alloy to DUS/HAM</p>
<hr />
<p><strong>UPDATE 2025-02-11 15:30</strong>
- Our teams' investigation into the SDN traffic loss is ongoing
- Our teams continue developing and evaluating solutions and possible workarounds
- In parallel we are evaluating a rebuild of the SDN (software defined network)</p>
<hr />
<p><strong>UPDATE 2025-02-11 17:47</strong>
- Part of the services are still not functional
- We are still working hard to resolve the issues but we will roll back the update of the SDN if no progress is made.
- The planned maintenance period begins today, 11 February 2025, at 23:00 and is expected to last until around 06:00 CET on 12 February 2025, during which time there may be repeated interruptions to services.
- The maintenance is also announced via the notifier. You will also be informed via the notifier when the maintenance has ended.</p>
<hr />
<p><strong>UPDATE 2025-02-11 20:15</strong>
- IAM, Alloy and Observability as a Service are restored to full functionality
- The Database as a Service API has also been restored, but the API still has some issues which are related to the wider SDN problem. The Databases themselves were at no point affected by the incident.</p>
<hr />
<p><strong>UPDATE 2025-02-11 23:00</strong>
- The situation with the API improved. We will continue watching it.</p>
<hr />
<p><strong>UPDATE 2025-02-12 10:15</strong>
- The previous maintenance work did not achieve the desired success.
- Workarounds have been implemented, so operations should be able to continue without disruptions.
- Maintenance work will continue during the upcoming night to fully resolve the incident.</p>
<hr />
<p><strong>UPDATE 2025-02-12 14:55</strong>
- We have scheduled another maintenance window for tonight, February 12th, from 23:00 to 06:00 the following day.
- During this maintenance window, there may be brief interruptions or limited availability of certain services.
- The goal is the complete resolution of the incident caused by Monday's update.
- An RfO will be available in our Helpdesk after the incident is mitigated.</p>
<hr />
<p><strong>UPDATE 2025-02-13 12:20</strong>
- We consider all service disruptions of the incident to be mitigated
- Although we are not expecting any more service disruptions, we are still watching all systems closely</p>
<hr />
<p><strong>UPDATE 2025-02-13 15:30</strong>
- The incident is resolved.</p>
<p>Published: 2025-02-11T08:04:01+00:00</p>