A huge hardware bug affects all Intel x86 CPUs. A software patch for Linux is ready. We are testing it and will start deploying it in the next few hours.
By tomorrow at the latest, a new kernel will be available for all VPS, PCI and Baremetal customers. We will upgrade all the images for Public Cloud, Private Cloud and VPS.
We will need to restart all Public Cloud/VPS hosts. We want to start on Saturday, with minimal impact for customers. We are looking for the best scenario.
All Shared Hosting hosts will be upgraded with no downtime.
Malicious code usurps properties of CPU branch prediction features to speculatively run code
Mitigation:
OS & VMM updates
+ Firmware Updates for CPU
Meltdown, Variant 3: rogue data cache load (CVE-2017-5754)
Access memory controlled by the OS while running a malicious application
Mitigation:
OS updates
Variants 1 and 3 are easy to fix: a kernel upgrade is enough.
Variant 2: it’s the kernel upgrade + the firmware upgrade for the CPU, i.e. the microcode for each CPU model. Microcode for new CPUs is already developed, but it will take 2-3 weeks to get the firmware for old CPUs.
Variant 2: Branch Target Injection
Mitigation 1: Microcode patch BIOS to introduce new feature + kernel patch to use it
Mitigation 2: Patch compilers to avoid any indirect jump and use a static trampoline (aka retpoline)
gcc has a pending patch to introduce this feature
Testing 4.14.11, the latest stable, on different envs & a large pool of baremetals. At this time, all the flags are green (except NVIDIA’s drivers). We will deploy this version directly on netboot & use it as the native OVH kernel for reinstallation, so that you are safe against Meltdown.
Baremetal: for Windows Server, we will be ready with 2016 and 2012r2 (US and FR only) for reinstallation in 2-3 hours.
Cloud Desktop:
We have coded the upgrade of the host: all tests were successful.
It is a cold upgrade, so we need to reboot the host. To update the Desktops, we will use a GPO; automated tests are planned tonight on some test desktops.
Cloud Desktop Infrastructure:
The update will be done by PCC. To update the Admin VMs, we will install a WSUS.
Private Cloud: We already have 500 hosts patched, with a secure build (6.0). Tests are OK. We are testing the Linux kernel; the upgrade is coded. The host upgrade is in testing (for 5.5, 6.0, 6.5). The Windows upgrade is in development and will be finished tomorrow.
pCC: Our priority is customer infrastructure: ESXi to patch, VMs, mainly Windows VMs (backup server). We expect no downtime on customer infrastructure: the VMs will be moved to another host while the host reboots. Then we will focus on management infrastructure.
pCC: Still waiting for Zerto to update their VRA appliances.
We might require 3 reboots: 1 to secure, 1 to update the BIOS, and, later on, 1 with a VMware hotfix to integrate guest OS updates.
pCI: Testing 4.14.11. We started with the hosts for Metrics, Ceph aaS, OpenData and Plesk aaS. They will confirm to us that they don't see any impact on their use cases.
The 2nd mitigation of Variant 2 is "retpoline". It needs a modification of the compiler (gcc) and a recompilation of all software. It will be the way to go in the long run, but recompiling the planet will take months. It does NOT need the microcode update, so it will be the answer for unpatchable BIOSes.
Baremetal:
We have put into production:
4.14.11 as the native OVH kernel for reinstallation
4.14.11 in rescue-pro
4.14.11 available through netboot too, with a special description.
Baremetal:
Windows Server 2016 and 2012r2 are also “KBized” and available through our installation wizard.
Note: the customer must set a flag in the registry to enable the mitigation.
The 1st mitigation of Variant 2 is the new microcode update AND a kernel update. BOTH are needed.
The microcode introduces a new MSR, and the kernel must be updated to use it through the IBRS patches.
pCI: we have confirmation that KVM is immune to guests reading HV or other guest memory via variant 3 (aka Meltdown). KVM is NOT "impacted" by Meltdown. So, right now, a guest VM can read neither another VM's memory nor the host's memory.
baremetal: we are deploying the netboot kernel that includes the microcode for all CPUs. It will activate new flags in /proc/. Once the kernel can use the new flags, you are protected against variant 2.
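To see whether such flags have shown up on a host, something like the sketch below could be used. The flag names in ASSUMED_FLAGS are an assumption (they differ between kernel versions and vendor backports), and the helper names are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: these flag names vary by kernel version and vendor backport.
ASSUMED_FLAGS = {"ibrs", "ibpb", "spec_ctrl"}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def variant2_flags_present(cpuinfo_text: str) -> set[str]:
    """Return the subset of ASSUMED_FLAGS that the CPU advertises."""
    return cpu_flags(cpuinfo_text) & ASSUMED_FLAGS


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(sorted(variant2_flags_present(text)))
```

An empty result only means none of the assumed names were found; it does not prove the host is unprotected.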
pCI: a KVM patch that exposes the new flags to VPS/PCI is coming.
baremetal:
variant 2, mitigation 1:
Example of the microcode loaded before the kernel. No BIOS upgrade needed.
Waiting for the kernel with IBRS.
shared hosting:
The upgrade of the kernel to 4.14.11 with KPTI is in progress. It will take 24h to reboot all of « mutu ». This will protect against variant 3.
As soon as we have the kernel with IBRS, we will upgrade « mutu » again to protect against variant 2.
Q: How do you know if your baremetal has the latest microcode?
A: # rdmsr 0x00000048 has to work
Here is an example on the same server: E5-2689v4 not patched, rdmsr returns an error.
E5-2689v4 patched, no error from rdmsr.
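The rdmsr check above can also be scripted. A minimal sketch, assuming Linux with the `msr` kernel module loaded and root privileges; `read_msr` and `dev_template` are hypothetical names used for illustration.

```python
import os
import struct

IA32_SPEC_CTRL = 0x48  # the MSR number used in the rdmsr command above


def read_msr(msr: int, cpu: int = 0, dev_template: str = "/dev/cpu/{cpu}/msr"):
    """Return the 64-bit value of an MSR, or None if it cannot be read."""
    path = dev_template.format(cpu=cpu)
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None  # msr module not loaded, not root, or not Linux
    try:
        raw = os.pread(fd, 8, msr)  # the file offset selects the MSR
        return struct.unpack("<Q", raw)[0]
    except (OSError, struct.error):
        return None  # the read failed, e.g. the CPU does not implement this MSR
    finally:
        os.close(fd)


if __name__ == "__main__":
    value = read_msr(IA32_SPEC_CTRL)
    print("MSR 0x48 readable" if value is not None else "MSR 0x48 not readable")
```

Note that None is returned both when the device file is unavailable and when the CPU rejects the read, so None alone does not prove the microcode is missing.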
At 10pm, 10% of WebHosting will be on 4.14.11, which protects against Variant 3. We will then stop and check stability for 12 hours before continuing the upgrade tomorrow morning.
pCC: the upgrade plan to fix Variants 1, 2 and 3 is in progress.
It will be done in 3 phases:
- Phase 1: Security updates on the customer side
- Phase 2: Security updates on the OVH side
- Phase 3: Functionality patch for ESXi
Baremetal:
We are currently deploying 4.14.12. It will be done promptly. It fixes Variant 3.
Variant 2, Mitigation 1
We have a smart strategy to load the microcode via 2 methods (UEFI and initram), without a BIOS flash, as a first iteration. We will put it into production tomorrow.
pCI & VPS 2016:
Variant 3: no impact on KVM
Variant 2, mitigation 1
Microcode is packaged; testing of qemu with the ibsr_enabled patch is ongoing. Waiting for the kernel patch with IBRS to be merged & tested.
Only then will we start the pCI upgrade, rebooting each host.
VPS 2014:
Variant 3 (OVH): not affected.
Variant 3 (customers): the Virtuozzo team is still integrating KPTI into the OpenVZ kernel.
Variant 2: the physical host update will be rolled out via pCC.
The teams have worked hard over the last few days on this new « bug ». We now know what there is to know, have started deploying the protections, and are preparing the next moves.
The situation is under control :)
Time to write the docs to help our customers protect their services at #Ovh.
A first, incomplete, documentation to help our customers understand: 1) the general information 2) what customers have to do if they use our services, depending on the service 3) what #Ovh is doing, depending on the product
shared hosting:
Yesterday, we deployed the new 4.14.11 on 10% of the infra.
Today: no kernel panic, no noticeable impact on performance, no random reboot, no application errors. Customer feedback: none
Decision: GO to proceed on ALL servers. At 10pm all done.
Windows Server
vSphere
Debian
Red Hat Enterprise
Red Hat OpenStack
CentOS
Fedora
SUSE OpenStack
SUSE Enterprise
SUSE CaaS
Gentoo
Slackware
SmartOS
CloudLinux
Ubuntu
OpenSuse
Archlinux
OpenVZ
1/5 #OVHcloud
Given the scarcity of IPv4 addresses, and therefore their rising cost, we will not be able to keep additional IPv4 addresses free.
The IPv4 price per month will be an average over the purchases we have made in the last 10 years. ipxo.com/blog/ipv4-pric…
Use IPv6, it is free.
2/5 - new IPv4: 0e, then 1.49e/mo
- old IPv4: we will take 3 years to move the price from 0.00e/mo to 1.49e/mo
- those who use IPv4 to send email in volume will contribute to the costs of the SOC that fights spam and works on the reputation of our IPv4 addresses: 0.99e/IPv4/mo
3/5 We will apply the new prices to new IPv4 around Sep 2022.
We plan to start the price change on old IPv4:
- Jan 2023: 0.24e/ip
- July 2023: 0.49e/ip
- Jan 2024: 0.74e/ip
- July 2024: 0.99e/ip
- Jan 2025: 1.24e/ip
- July 2025: 1.49e/ip
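For budgeting, the planned schedule for old IPv4 can be written as a lookup. This is only a sketch: the thread gives months, so the first day of each month is an assumption, the schedule itself is announced as a plan, and `old_ipv4_price` is a hypothetical helper name.

```python
from datetime import date

# (effective date, EUR per IP per month), copied from the planned schedule.
# ASSUMPTION: each step starts on the 1st of the month; the thread only gives months.
OLD_IPV4_STEPS = [
    (date(2023, 1, 1), 0.24),
    (date(2023, 7, 1), 0.49),
    (date(2024, 1, 1), 0.74),
    (date(2024, 7, 1), 0.99),
    (date(2025, 1, 1), 1.24),
    (date(2025, 7, 1), 1.49),
]


def old_ipv4_price(on: date) -> float:
    """Return the planned monthly price of one old IPv4 on a given date."""
    price = 0.0  # free before the first step
    for start, eur in OLD_IPV4_STEPS:
        if on >= start:
            price = eur
    return price


if __name__ == "__main__":
    # Planned monthly cost of a block of 16 old IPv4 in March 2024.
    print(16 * old_ipv4_price(date(2024, 3, 15)))
```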
1/4 Once again, Cloud experts who lack basic knowledge of the Cloud business model (BM).
The short answer: CAPEX is proportional to revenue growth and is paid back within 3 years.
The long answer (assuming the 25B$ is a real infrastructure investment):
2/4 15% of revenue goes to maintaining revenue (at the same level) by deploying new HW and retiring the old. So 60B$ x 15% = 9B$ is "revenue maintenance" CAPEX, which prevents the revenue erosion caused by aging HW whose market value declines.
3/4 Then, 25-9 = 16B$ goes to revenue growth. With a ratio of 15:1-20:1 this will generate
Year-1 6B$
Year-2 12B$
Year-3 12B$
then 12B$ per year, provided you invest 15% (the same again) to maintain that revenue, so 1.8B$ per year.
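The arithmetic in this thread can be checked with a few lines. A sketch only: the 15% rate and the 60B$, 25B$ and 12B$ figures come from the thread, while the function names are hypothetical.

```python
MAINTAIN_RATE = 0.15  # share of revenue reinvested to keep revenue flat (from the thread)


def maintenance_capex(revenue_bn: float) -> float:
    """CAPEX (in B$) needed just to maintain a given yearly revenue."""
    return round(revenue_bn * MAINTAIN_RATE, 2)


def growth_capex(revenue_bn: float, total_capex_bn: float) -> float:
    """CAPEX (in B$) left for revenue growth once maintenance is covered."""
    return round(total_capex_bn - maintenance_capex(revenue_bn), 2)


if __name__ == "__main__":
    print(maintenance_capex(60))   # maintenance share of a 60B$ revenue
    print(growth_capex(60, 25))    # what is left of 25B$ for growth
    print(maintenance_capex(12))   # yearly upkeep of the 12B$ of new revenue
```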
We started the execution of our plan for @Shadow_France :
Step 1: Temporarily, and in the US only, we are suspending Shadow Ultra and Shadow Infinite. Customers can choose to be downgraded to Boost or to cancel their subscription :(
We are sorry to start the new Shadow with bad news 😭 Thread ⬇️
Why ? Why ? Why ?
The current Ultra and Infinite run on RTX 5K and RTX 6K.
In the US, RTX 5k/6k were deployed with @2crsi in 2020. During the Chapter 11 process, we decided NOT to sign this contract. This is why @2crsi started taking back their HW from the current DC in the US.
In Europe, Ultra and Infinite run on @OVHcloud We continue this contract. We will NOT suspend Ultra and Infinite in Europe.
As already announced, we’ve started to work on the new platform with a new CPU, a new GPU and new storage, and we’re working on new offers!
We have a major incident on SBG2. A fire broke out in the building. Firefighters were immediately on the scene but could not control the fire in SBG2. The whole site has been isolated, which impacts all services in SBG1-4. We recommend activating your Disaster Recovery Plan.
Update 5:20pm. Everybody is safe.
The fire destroyed SBG2. Part of SBG1 is destroyed. Firefighters are protecting SBG3. No impact on SBG4.
Update 7:20am
The fire is out. Firefighters continue to cool the buildings with water.
We don’t have access to the site. That is why SBG1, SBG3 and SBG4 won’t be restarted today.
After 20 years of OVHcloud, 24/7/365, I have decided to give myself 3 months to fulfil an old dream: start a band and produce a "concept album". If COVID-19 doesn’t ruin everything, we will release it by the end of this year. And if people like it, the band will do a "world tour" (lol).
I composed the music and the melodies, then came up with the stories we tell, but I did not write the lyrics. 12 tracks, 45 min, about the things that make up a life.
Huge thanks to Bruno Cheno for agreeing to handle the production, and more :)
D-11:
After a 14-day break (at a rate of 11 meetings at OVHcloud and 4 hours of guitar per day), we are entering the "last mile". 10 hours of guitar per day and 6 hours of band rehearsals every day. We have to be ready for the studio recordings, which start in 11 days.
Just FYI: the new HPC SDDC and vSAN offers have gone into PROD. As you can see, hosts with a lot of CPU and RAM are particularly attractive in terms of price per core / price per GB of RAM. ovhcloud.com/fr/enterprise/…
If you consolidate your infrastructure on big servers, you cut your costs by 2x to 4x vs using small hosts.
Example: PRE 48 vs PRE 768:
you get 16x more RAM for 4x the price.
This comes from the fact that, after 4-5 years of work, we have evolved the BM to be more aggressive on big servers.
This weekend I wrote a blog post on the subject. It will be published early next week. Here we finally see the first results of the work done over these long years.