https://syseleven-status.deSysEleven Status and Incidents2024-12-22T06:19:49.131041+00:00SysElevensupport@syseleven.depython-feedgenhttps://www.syseleven.de/wp-content/uploads/2020/10/SysEleven_XL_Logo_quer_RGB.pngGet all incidents by feed520Cloudflare related network problems2024-12-22T06:19:49.231277+00:00<p>Affected Components: Setups that route traffic through Cloudflare experience issues</p>
<p>Investigation Start: <strong>2024-09-02 13:20 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>We are investigating network problems reported by customers that route traffic through Cloudflare.</p>
<hr />
<p>Customer Impact:</p>
<p>Customers that are routing traffic to SysEleven networks through Cloudflare experience high retransmission rates and traffic loss.</p>
<hr />
<p><strong>Update: 2024-09-02 14:20 UTC+01:00 (CET)</strong></p>
<p>We are still investigating.</p>
<hr />
<p><strong>Update: 2024-09-02 14:49 UTC+01:00 (CET)</strong></p>
<p>We are observing significant improvements in the retransmit rates on some affected load balancers, but will continue to observe and analyse the issue.</p>
<hr />
<p><strong>Update: 2024-09-02 16:25 UTC+01:00 (CET)</strong></p>
<p>Since the problem seems to be solved for most customers, we are declaring the incident resolved. Investigation into the cause of the problems will continue.</p>2024-09-02T10:20:00+00:00522INCIDENT: SysEleven STACK API issues, region HAM12024-12-22T06:19:49.229987+00:00<p>Affected Components: <strong>SysEleven Stack API, region HAM1 and DUS2</strong></p>
<p>Incident Start: <strong>2024-09-04 15:20 UTC+02:00 (CEST)</strong></p>
<p>Incident End: <strong>2024-09-04 16:30 UTC+02:00 (CEST)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>New VMs could not be created</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Spawning new virtual machines (VMs) was not possible.</li>
</ul>
<hr />
<p><strong>Update: 16:20</strong></p>
<p>We rolled out a configuration change and are observing.</p>
<p><strong>Update: 16:30</strong></p>
<p>We see that VMs can be created again in our automated tests. We are closing the incident.</p>2024-09-04T11:20:00+00:00523INCIDENT: SysEleven STACK Designate API issues2024-12-22T06:19:49.227984+00:00<p>Affected Components: <strong>SysEleven Stack Designate API</strong></p>
<p>Incident Start: <strong>2024-09-17 20:42 UTC+02:00 (CEST)</strong></p>
<p>Incident End: <strong>2024-09-17 22:15 UTC+02:00 (CEST)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Managing DNS zones and records through the Designate API may fail.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Loss of control over DNS resources.</li>
</ul>
<hr />
<p><strong>Update: 2024-09-17 21:20 UTC+02:00 (CEST)</strong></p>
<p>At 21:15, we reverted a configuration change that was rolled out earlier today, which fixed the issue. We are watching the situation.</p>
<hr />
<p><strong>Update: 2024-09-17 22:15 UTC+02:00 (CEST)</strong></p>
<p>Underlying issues that were persistent, but not impacting the API, are now fully resolved. Incident is over.</p>
<hr />2024-09-17T16:42:00+00:00525INCIDENT: SysEleven STACK API issues, region DBL2024-12-22T06:19:49.226887+00:00<p>Affected Components: <strong>SysEleven Stack API, region DBL</strong></p>
<p>Incident Start: <strong>2024-09-19 08:23</strong>
Incident End: <strong>2024-09-19 08:35</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Loss of control over volume and compute services</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Creating new virtual machines (VMs) or changing existing resources is not possible.</li>
</ul>
<hr />
<p>Update: <strong>2024-09-19 08:35</strong></p>
<p>Some Nova services also appeared to be affected. We investigated the situation and brought the services back up.</p>2024-09-19T04:23:00+00:00526INCIDENT: SysEleven STACK Block Storage issues, region CBK2024-12-22T06:19:49.225669+00:00<p>Affected Components: <strong>SysEleven Stack Block Storage, region CBK</strong></p>
<p>Incident Start: <strong>2024-09-24 10:55 UTC+02:00 (CEST)</strong></p>
<hr />
<p>Description:</p>
<p>At the moment we are facing issues with the Block Storage in Region CBK.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Volumes may be unavailable</li>
<li>Instances booted from a volume may be unavailable</li>
</ul>
<hr />
<p><strong>Update: 2024-09-24 11:00 UTC+02:00 (CEST)</strong></p>
<p>The issue has been identified and we are fixing it.</p>
<hr />
<p><strong>Update: 2024-09-24 12:15 UTC+02:00 (CEST)</strong></p>
<p>The problem is fixed. Instances using volumes had to be restarted.</p>2024-09-24T06:55:00+00:00529INCIDENT: PVC Failure of a hardware node, region BKI2024-12-22T06:19:49.224564+00:00<p>Affected Components: <strong>Hardware node failure, region BKI</strong></p>
<p>Incident Start: <strong>2024-10-01 09:15 UTC+02:00 (CEST)</strong>
Incident End: <strong>2024-10-01 09:36 UTC+02:00 (CEST)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Malfunction of a hardware node</li>
<li>Restart of the hardware node is necessary </li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>During the period of restart, there will be a short interruption in the availability of the systems.</li>
<li>Affected customers were notified via email.</li>
<li>Please check the affected systems for their full functionality.</li>
</ul>2024-10-01T05:15:00+00:00530INCIDENT: SysEleven STACK Network performance issues, region DBL2024-12-22T06:19:49.223363+00:00<p>Affected Components: <strong>SysEleven Stack Network, region DBL</strong></p>
<p>Incident Start: <strong>2024-10-01 14:17 UTC+02:00 (CEST)</strong></p>
<p>Incident End: <strong>2024-10-01 14:34 UTC+02:00 (CEST)</strong></p>
<hr />
<p>Description:</p>
<p>At the moment, we are facing issues with the Network in Region DBL.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Network performance issue</li>
</ul>
<hr />
<p><strong>Update: 2024-10-01 14:34 UTC+02:00 (CEST)</strong></p>
<p>The incident is over, and all services are operational.</p>
<hr />2024-10-01T10:17:00+00:00532INCIDENT: SysEleven STACK issues in region HAM12024-12-22T06:19:49.221870+00:00<p>Affected Components: <strong>SysEleven Stack, region HAM1</strong></p>
<p>Incident Start: <strong>2024-10-10 23:55</strong>
Incident End: <strong>2024-10-11 00:00</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Occurring errors were investigated. </li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity was restricted</li>
</ul>
<hr />
<p><strong>Update: 01:00</strong></p>
<ul>
<li>We are observing further short-term connectivity issues in the HAM1 region. The provider is aware of the issues and is currently investigating.</li>
</ul>
<hr />
<p><strong>Update: 01:30</strong></p>
<ul>
<li>The network provider is carrying out network maintenance until 05:00. We are on standby.</li>
</ul>2024-10-10T19:55:00+00:00534INCIDENT: SysEleven STACK API issues2024-12-22T06:19:49.220378+00:00<p>Affected Components: <strong>SysEleven Stack API</strong></p>
<p>Incident Start: <strong>2024-10-30 09:30 CET</strong></p>
<p>Incident End: <strong>2024-10-30 12:05 CET</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Accessibility of the SysEleven Stack API is not ensured.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Spawning new virtual machines (VMs) or changing existing resources is not possible.</li>
</ul>
<hr />
<p><strong>Update: 10:40</strong></p>
<p>We are still investigating the situation and are in contact with our external network provider to further analyze the problems.</p>
<hr />
<p><strong>Update: 11:30</strong></p>
<p>The issue has been identified. We are waiting for our external network provider to fix it.</p>
<hr />
<p><strong>Update: 12:05</strong></p>
<p>The issue has been resolved.</p>2024-10-30T07:30:00+00:00537INCIDENT: SysEleven STACK Object Storage issues, region DBL2024-12-22T06:19:49.219251+00:00<p>Affected Components: <strong>SysEleven Stack Object Storage, region DBL</strong></p>
<p>Incident Start: <strong>2024-11-12 17:45 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-11-12 18:40 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>At the moment we are facing issues with the Object Storage in Region DBL.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Writing or reading of objects may be restricted.</li>
</ul>
<hr />
<p>Update: <strong>2024-11-12 18:40 UTC+01:00 (CET)</strong></p>
<p>We mitigated the problem and are investigating further.</p>2024-11-12T15:45:00+00:00538INCIDENT: SysEleven STACK issues in region CBK2024-12-22T06:19:49.217779+00:00<p>Affected Components: <strong>SysEleven Stack, region CBK</strong></p>
<p>Incident Start: <strong>2024-11-12 22:35</strong>
Incident End: <strong>2024-11-13 00:00</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Occurring errors are currently being investigated.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity is restricted</li>
</ul>
<hr />
<p><strong>Update: 23:08</strong></p>
<p>The announced maintenance is having a bigger impact than expected. We are investigating the situation.</p>
<hr />
<p><strong>Update: 23:45</strong></p>
<p>We were able to pin down the root cause and prepare a fix to mitigate the problems.</p>
<hr />
<p><strong>Update: 12:00</strong></p>
<p>The network problems were mitigated. If you still encounter issues, please contact us!</p>2024-11-12T20:35:00+00:00541INCIDENT: Partial outage of MetaKube Control Plane Services in region FES2024-12-22T06:19:49.216109+00:00<p>Affected Components: <strong>MetaKube Control Plane Services, region FES</strong></p>
<p>Incident Start: <strong>2024-11-18 11:30 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>Infrastructure hosting the MetaKube Control Plane Services has problems.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>MetaKube Control Plane might be slow or not answering</li>
</ul>
<hr />
<p><strong>UPDATE 2024-11-18 12:30 UTC+01:00 (CET)</strong></p>
<p>We have identified networking problems as the cause and are currently working to resolve them.</p>
<p><strong>UPDATE 2024-11-18 13:27 UTC+01:00 (CET)</strong></p>
<p>We have increased conntrack table size on hardware nodes to avoid networking problems.</p>
<p>We continue to have issues with overloaded pods which we are working on.</p>
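<p>For readers who want to check conntrack table pressure on their own Linux hosts, the following is a minimal sketch, not the tooling used during this incident. It assumes the standard procfs counters <code>nf_conntrack_count</code> and <code>nf_conntrack_max</code> are present (they exist only when the conntrack module is loaded); the 80% alert threshold and the function names are hypothetical.</p>

```python
from pathlib import Path

# Standard Linux procfs location; present only when nf_conntrack is loaded.
CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(count: int, maximum: int) -> float:
    """Return the fraction of the conntrack table in use (0.0 to 1.0)."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return count / maximum


def needs_attention(count: int, maximum: int, threshold: float = 0.8) -> bool:
    """True when usage reaches the alert threshold (0.8 is an assumed value)."""
    return conntrack_usage(count, maximum) >= threshold


def read_conntrack(directory: Path = CONNTRACK_DIR) -> tuple[int, int]:
    """Read the current entry count and the table limit from procfs."""
    count = int((directory / "nf_conntrack_count").read_text().strip())
    maximum = int((directory / "nf_conntrack_max").read_text().strip())
    return count, maximum


if __name__ == "__main__":
    try:
        count, maximum = read_conntrack()
    except OSError:
        # Counters are missing on hosts without conntrack (or on non-Linux).
        print("conntrack counters not available on this host")
    else:
        usage = conntrack_usage(count, maximum)
        print(f"{count}/{maximum} entries ({usage:.1%})")
```

<p>When the table is full, new connections are typically dropped, so a host that stays close to its limit is a candidate for a larger table.</p>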
<p><strong>UPDATE 2024-11-18 14:00 UTC+01:00 (CET)</strong></p>
<p>We managed to get the overloaded pods running by isolating them on dedicated nodes and raising the resource limits. This stopped other issues as well.</p>
<p>We still need to investigate what caused the overloading of certain pods.</p>
<p>Incident is over.</p>2024-11-18T09:30:00+00:00543INCIDENT: Partial degradation of SysEleven IAM services2024-12-22T06:19:49.214844+00:00<p>Affected Components: <strong>SysEleven IAM, regions DUS and HAM</strong></p>
<p>Incident Start: <strong>2024-11-28 12:00 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>We're currently investigating a service degradation in the SysEleven IAM. Inviting users to an organization is currently not possible.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Inviting users to an organization is currently not possible.</li>
</ul>
<hr />
<p><strong>UPDATE 2024-11-28 13:10 UTC+01:00 (CET)</strong></p>
<p>The issue has been resolved and inviting users to organizations is possible again.</p>2024-11-28T10:00:00+00:00544INCIDENT: Major outage of MetaKube Control Plane Services in HAM12024-12-22T06:19:49.213594+00:00<p>Affected Components: <strong>MetaKube Control Plane Services, region HAM1</strong></p>
<p>Incident Start: <strong>2024-11-29 11:00 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-11-29 13:40 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>The MetaKube control plane services in HAM1 currently cannot be reached due to slow I/O.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>MetaKube services, e.g. clusters in HAM1, cannot be reached</li>
</ul>
<hr />
<p>Customer Actions:</p>
<ul>
<li>Please inform us if you notice any irregularities</li>
</ul>
<hr />
<p>Update 13:23</p>
<p>The situation improved.</p>2024-11-29T09:00:00+00:00545INCIDENT: SysEleven STACK Storage issues, region HAM12024-12-22T06:19:49.212507+00:00<p>Affected Components: <strong>SysEleven Stack, Storage, region HAM1</strong></p>
<p>Incident Start: <strong>2024-11-29 11:00 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-11-29 13:40 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>We are facing issues with the distributed file system, a core component of the SysEleven Stack.</p>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Starting virtual machines (VMs) is partially not possible.</li>
<li>Write access to volumes (VM disks) may be restricted.</li>
</ul>
<hr />
<p>Update 13:23</p>
<p>The situation improved, storage access latencies are back to normal.</p>2024-11-29T09:00:00+00:00546INCIDENT: SysEleven STACK API issues, region FES2024-12-22T06:19:49.211138+00:00<p>Affected Components: <strong>SysEleven Stack API, region FES</strong></p>
<p>Incident Start: <strong>2024-12-04 19:07 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-12-04 20:10 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Accessibility of the SysEleven Stack API is not ensured.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Requests on OpenStack API may return an error code</li>
<li>Spawning new virtual machines (VMs) or changing existing resources may fail</li>
</ul>
<hr />
<p><strong>Update: 2024-12-04 20:00 UTC+01:00 (CET)</strong></p>
<p>We identified the likely root cause and a fix is being applied.</p>
<hr />
<p><strong>Update: 2024-12-04 20:10 UTC+01:00 (CET)</strong></p>
<p>OpenStack API is now working again as expected.</p>2024-12-04T17:07:00+00:00547INCIDENT: SysEleven STACK issues in region dus22024-12-22T06:19:49.209972+00:00<p>Affected Components: <strong>SysEleven Stack, region dus2</strong></p>
<p>Incident Start: <strong>2024-12-06 17:45 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>We are seeing some network connectivity issues in the region and are investigating.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity is degraded for MetaKube Services in dus2 </li>
<li>Connectivity to Database as a Service in dus2 is degraded</li>
</ul>
<hr />
<p><strong>Update: 2024-12-06 18:45 UTC+01:00 (CET)</strong></p>
<ul>
<li>We identified an issue with one of our gateways and are working on a fix</li>
</ul>2024-12-06T15:45:00+00:00548INCIDENT: Partial outage of Control Planes in Regions FES, DBL, CBK2024-12-22T06:19:49.208428+00:00<p>Affected Components: <strong>Control Planes in Regions FES, DBL, CBK</strong></p>
<p>Incident Start: <strong>2024-12-09 18:30 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<p>Some cluster control planes are not reachable.</p>
<hr />
<p>Update (22:20 CET):</p>
<p>We have found a way to mitigate the issues temporarily and are starting to apply the fix.</p>
<p>Update (22:50 CET):</p>
<p>We applied the fix everywhere. We don't see any broken clusters anymore.</p>
<hr />
<p>Update <strong>2024-12-10 12:30 UTC+01:00 (CET)</strong>:</p>
<p>We identified the root cause: An unintended side-effect of an upgrade to a kube-proxy setting changed the proxy mode. This created iptables rules which were not cleaned up and through which traffic was dropped.</p>
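<p>The update above does not say which proxy modes or which rules were involved. Purely as an illustration of how leftover rules can be spotted, the sketch below scans text in <code>iptables-save</code> format for chains with a given name prefix. It assumes the per-service chain prefixes <code>KUBE-SVC-</code> and <code>KUBE-SEP-</code> that kube-proxy creates in iptables mode; the function name and the sample ruleset are hypothetical.</p>

```python
def chains_with_prefix(iptables_save_output: str, prefixes: tuple[str, ...]) -> list[str]:
    """List chains declared in iptables-save output whose names start with a prefix."""
    found = []
    for line in iptables_save_output.splitlines():
        # Chain declarations look like ":CHAIN-NAME POLICY [packets:bytes]".
        if not line.startswith(":"):
            continue
        parts = line[1:].split()
        if parts and parts[0].startswith(prefixes):
            found.append(parts[0])
    return found


# Hypothetical sample in iptables-save format, not data from the incident.
SAMPLE = """\
*nat
:PREROUTING ACCEPT [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-EXAMPLE0000000001 - [0:0]
:KUBE-SEP-EXAMPLE0000000001 - [0:0]
-A KUBE-SVC-EXAMPLE0000000001 -j KUBE-SEP-EXAMPLE0000000001
COMMIT
"""

if __name__ == "__main__":
    leftovers = chains_with_prefix(SAMPLE, ("KUBE-SVC-", "KUBE-SEP-"))
    print(f"{len(leftovers)} per-service chains found: {leftovers}")
```

<p>On a node that is no longer expected to run the iptables proxy mode, a non-empty result for these prefixes would be a hint that rules from the earlier mode are still in place.</p>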
<p>We are taking measures to prevent this in the future.</p>2024-12-09T16:30:00+00:00549INCIDENT: SysEleven STACK network issues in region DBL2024-12-22T06:19:49.206925+00:00<p>Affected Components: <strong>SysEleven Stack, region DBL</strong></p>
<p>Incident Start: <strong>2024-12-13 14:09 UTC+01:00 (CET)</strong></p>
<p>Incident End: <strong>2024-12-13 16:09 UTC+01:00 (CET)</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>The DBL network has performance issues</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Network performance degradation </li>
</ul>
<hr />
<p><strong>Update: 2024-12-13 15:17 UTC+01:00 (CET)</strong></p>
<p>Situation is back to normal.</p>
<hr />
<p><strong>Update: 2024-12-13 15:57 UTC+01:00 (CET)</strong></p>
<p>We notice performance degradation again.</p>
<hr />
<p><strong>Update: 2024-12-13 16:09 UTC+01:00 (CET)</strong></p>
<p>Situation is back to normal.</p>2024-12-13T12:09:00+00:00550INCIDENT: SysEleven STACK issues in region DBL2024-12-22T06:19:49.205033+00:00<p>Affected Components: <strong>SysEleven Stack, region DBL</strong></p>
<p>Incident Start: <strong>2024-12-18 00:47</strong>
Incident End: <strong>2024-12-18 01:27</strong></p>
<hr />
<p>Description:</p>
<ul>
<li>Occurring errors are currently being investigated.</li>
</ul>
<hr />
<p>Customer Impact:</p>
<ul>
<li>Connectivity is restricted</li>
</ul>
<hr />
<p>Update: <strong>2024-12-18 01:27</strong></p>
<ul>
<li>We saw problems with cross-region connectivity outgoing from the DBL region between 00:35 and 01:10. Traffic seems to have normalized for now, but we are still investigating.</li>
</ul>
<hr />
<p>Update: <strong>2024-12-18 02:00</strong></p>
<ul>
<li>The device causing the network issues was identified. The root cause will be investigated further.</li>
</ul>2024-12-17T22:40:00+00:00