Octave Klaba
OVHcloud.Jezby.Shadow.Qwant.Poweend.MFF.married.1+2 daughters.

Jan 3, 2018, 46 tweets

#Ovh Weekly News: week 1

Happy New Year ! :)

A huge hardware BUG affects all Intel x86 CPUs. A software patch for Linux is ready. We are testing it and will start deploying it in the next few hours.

By tomorrow at the latest, a new kernel will be offered to all VPS, PCI and Baremetal customers. We will upgrade all the images for Public Cloud, Private Cloud and VPS.

We will need to restart all the Public Cloud/VPS hosts. We want to start on Saturday, with minimal impact on customers. We are looking for the best scenario.

All the Shared Hosting hosts will be upgraded with no downtime.

Spectre, Variant 1: bounds check bypass (CVE-2017-5753)

Use existing code with access to secrets by making it speculatively execute memory operations

Mitigation:
OS & VMM updates

Spectre, Variant 2: branch target injection (CVE-2017-5715)

Malicious code usurps properties of CPU branch prediction features to speculatively run code

Mitigation:
OS & VMM updates
+ Firmware Updates for CPU

Meltdown, Variant 3: rogue data cache load (CVE-2017-5754)

Access memory controlled by the OS while running a malicious application

Mitigation:
OS updates
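The three entries above can be encoded as a small lookup table. This helps when triaging which fix a given CVE needs. The sketch below is a hypothetical illustration: it transcribes only the mitigations listed in this thread, and for Variant 2 it lists only mitigation 1 (microcode + OS), not retpoline. The names `Variant`, `needs_firmware` and `variant_for_cve` are made up for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Variant:
    number: int
    family: str                   # "Spectre" or "Meltdown"
    name: str
    cve: str
    mitigations: Tuple[str, ...]  # as listed in the thread


# Transcribed from the three entries above (Variant 2: mitigation 1 only).
VARIANTS = {
    1: Variant(1, "Spectre", "bounds check bypass", "CVE-2017-5753",
               ("OS & VMM updates",)),
    2: Variant(2, "Spectre", "branch target injection", "CVE-2017-5715",
               ("OS & VMM updates", "CPU firmware (microcode) updates")),
    3: Variant(3, "Meltdown", "rogue data cache load", "CVE-2017-5754",
               ("OS updates",)),
}


def needs_firmware(variant: int) -> bool:
    """True if the listed mitigation requires a CPU firmware/microcode update."""
    return any("firmware" in m for m in VARIANTS[variant].mitigations)


def variant_for_cve(cve: str) -> int:
    """Return the variant number for a CVE identifier, or raise KeyError."""
    for v in VARIANTS.values():
        if v.cve == cve:
            return v.number
    raise KeyError(cve)
```

For example, `[n for n in VARIANTS if needs_firmware(n)]` selects only Variant 2. This matches the point that Variants 1 and 3 need only a kernel upgrade.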

Variants 1 and 3 are easy to fix: just the kernel upgrade.

Variant 2: it’s the kernel upgrade + the CPU firmware upgrade, i.e. the microcode for each CPU model. Microcode for recent CPUs has already been developed, but it will take 2-3 weeks to get the firmware for older CPUs.

Variant 2: Branch Target Injection

Mitigation 1: Microcode patch (BIOS) to introduce a new feature + kernel patch to use it
Mitigation 2: Patch compilers to avoid indirect jumps and use a static trampoline instead (aka retpoline)

gcc has a pending patch to introduce this feature.

Testing 4.14.11 (latest stable) on different environments & a large pool of baremetal servers. At this time, all the flags are green (except NVIDIA’s drivers). We will deploy this version directly on netboot & use it as the native OVH kernel for reinstallation, to be safe against Meltdown.

Baremetal: for Windows Server, 2016 and 2012r2 (US and FR only) will be ready for reinstallation in 2-3 hours.

Cloud Desktop:
We have coded the upgrade of the host: all tests were successful.
It is a cold upgrade: we need to reboot the host. To update the Desktops, we will use a GPO; automated tests are planned tonight on some test desktops.

Cloud Desktop Infrastructure:
The update will be done by PCC. To update the Admin VMs, we will install a WSUS server.

Private Cloud: We already have 500 hosts patched, with a secure build (6.0). Tests are OK. We are testing the Linux kernel; the upgrade is coded. The host upgrade is in testing (for 5.5, 6.0, 6.5). The Windows upgrade is in development and will be finished tomorrow.

pCC: Our priorities are focused on customer infrastructures: ESXi hosts to patch, VMs, mainly Windows VMs (backup server). We expect no downtime on customer infrastructure: the VMs will be moved to another host while each host is rebooted. Then we will focus on management infrastructures.

pCC: Still waiting for Zerto to update their VRA appliances.

We might require 3 reboots: 1 to secure, 1 to update the BIOS, and, later on, 1 with a VMware hotfix to integrate guest OS updates.

pCI: Testing 4.14.11. We started with the hosts for Metrics, Ceph aaS, OpenData and Plesk aaS. These teams will confirm whether they see any impact on their use cases.

Meltdown, Spectre bug impacting x86-64 CPU - #OVH fully mobilised ovh.co.uk/news/articles/…

Meltdown/Spectre vulnerabilities affecting x86-64 CPUs: #OVH fully mobilised ovh.com/fr/blog/vulner…

The 2nd mitigation of Variant 2 is "retpoline": it needs a modified compiler (gcc) and the recompilation of all software. It will be the way to go in the long run, but recompiling the planet will take months. It does NOT need the microcode update, so it will be the answer for unpatchable BIOSes.

Baremetal:
We have put into production:
4.14.11 as the native OVH kernel for reinstallation
4.14.11 in rescue-pro
4.14.11 also available through netboot, with a special description

Baremetal:
Windows Server 2016 and 2012r2 are also “KBized” and available through our installation wizard.

Note: the customer must set a flag in the registry to enable the mitigation.

The 1st mitigation of Variant 2 is the new microcode update AND a kernel update. BOTH are needed.

The microcode introduces a new MSR, and the kernel must be updated to use it through the IBRS patches.

pCI: we have confirmation that KVM is immune to guests reading hypervisor (HV) or other guests' memory via Variant 3 (aka Meltdown). KVM is NOT "impacted" by Meltdown. So, right now, a guest VM cannot read another VM's memory, nor the host's memory.

baremetal: we are deploying the netboot kernel that includes the microcode for all CPUs. It will activate new flags in /proc/. Once the kernel can use the new flags, you are protected against Variant 2.
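The new flags and the loaded microcode revision are visible in /proc/cpuinfo. The sketch below parses that text. The set of flag names is an assumption: upstream kernels and distribution backports have used names such as `spec_ctrl`, `ibrs`, `ibpb` and `stibp`, so adjust the set to your kernel. The helper names are hypothetical. A flag only counts if every logical CPU reports it.

```python
# ASSUMPTION: speculation-control flag names vary between kernels/backports.
SPEC_CTRL_FLAGS = {"spec_ctrl", "ibrs", "ibpb", "stibp"}


def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical CPU."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a CPU block
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def spectre_v2_flags(text):
    """Speculation-control flags advertised by every CPU (intersection)."""
    cpus = parse_cpuinfo(text)
    if not cpus:
        return set()
    common = set(SPEC_CTRL_FLAGS)
    for cpu in cpus:
        common &= set(cpu.get("flags", "").split())
    return common


def microcode_revisions(text):
    """Microcode revisions reported across CPUs (normally a single value)."""
    return {int(cpu["microcode"], 16)
            for cpu in parse_cpuinfo(text) if "microcode" in cpu}
```

Typical use is `spectre_v2_flags(open("/proc/cpuinfo").read())`. An empty result means the kernel does not (yet) expose the new flags.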

pCI: a KVM patch that exposes the new flags to VPS/PCI guests is coming.

baremetal:
variant 2, mitigation 1:
Example of the microcode loaded before the kernel. No BIOS upgrade needed.
Waiting for the kernel with IBRS.

shared hosting:
Upgrade of the kernel to 4.14.11 with KPTI in progress. It will take 24h to reboot all « mutu » (shared hosting) servers. This will protect against Variant 3.

As soon as we have the kernel with IBRS, we will upgrade « mutu » again to protect against Variant 2.

Q: How do you know if your baremetal server has the latest microcode?
A: # rdmsr 0x00000048 must run without error.

Here is an example on the same server:
E5-2689v4 not patched: rdmsr returns errors.
E5-2689v4 patched: no error on rdmsr.
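The same check as `rdmsr 0x00000048` can be scripted without msr-tools. On Linux, the `msr` driver (`modprobe msr`, root required) exposes each MSR as an 8-byte read at an offset equal to the MSR index. Reading an MSR the CPU does not implement fails with EIO, which is the error the unpatched server shows. MSR 0x48 is IA32_SPEC_CTRL, one of the registers the Variant 2 microcode adds. The function names below are hypothetical. `dev_template` is a parameter added only so the sketch can be exercised against an ordinary file.

```python
import os
import struct

IA32_SPEC_CTRL = 0x48  # the MSR queried by `rdmsr 0x00000048`


def read_msr(msr, cpu=0, dev_template="/dev/cpu/{cpu}/msr"):
    """Return the 64-bit value of `msr` on `cpu`, or None if unreadable."""
    try:
        fd = os.open(dev_template.format(cpu=cpu), os.O_RDONLY)
    except OSError:
        return None          # no msr driver, no permission, or no such CPU
    try:
        raw = os.pread(fd, 8, msr)   # offset == MSR index
    except OSError:
        return None          # EIO: MSR not implemented (old microcode)
    finally:
        os.close(fd)
    if len(raw) != 8:
        return None
    return struct.unpack("<Q", raw)[0]   # x86 is little-endian


def has_spec_ctrl(cpu=0, dev_template="/dev/cpu/{cpu}/msr"):
    """True if IA32_SPEC_CTRL can be read, i.e. the new microcode is loaded."""
    return read_msr(IA32_SPEC_CTRL, cpu, dev_template) is not None
```

Run as root, `has_spec_ctrl()` should return False on the unpatched E5-2689v4 and True on the patched one.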

At 10pm, 10% of WebHosting will be on 4.14.11, which protects against Variant 3. We will then stop and check stability for 12 hours before starting the upgrade tomorrow morning.

Details: travaux.ovh.net/?do=details&id…

pCC: the upgrade plan to fix Variants 1, 2 and 3 is in progress.

It will be done in 3 phases:
- Phase 1: Security updates on the customer side
- Phase 2: Security updates on the OVH side
- Phase 3: Functionality patch for ESXi

Details (fr/en): travaux.ovh.net/?do=details&id…

Desktop/VDI:
We benchmarked the hosts with the fix for Windows Server 2016 and ESXi 6.0. No issues.

We are going to update all our hosts next week (starting on Tuesday at 6 am). This will protect against Variants 1, 2 and 3.

Details (fr/en): travaux.ovh.net/?do=details&id…

Baremetal:
We are currently deploying 4.14.12. It will be done shortly. It fixes Variant 3.

Variant 2, Mitigation 1
We have a smart strategy to load the microcode via 2 methods (UEFI and initramfs) without a BIOS flash, as a first iteration. We will put it into production tomorrow.
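For the initramfs method, the generic Linux mechanism is "early microcode loading". The kernel scans an uncompressed cpio archive, placed in front of the regular initramfs, for `kernel/x86/microcode/GenuineIntel.bin`. The thread does not say how OVH builds its images, so this is only a sketch of that generic mechanism. The microcode blob is assumed to come from Intel's microcode package, and the function names are hypothetical. The archive uses the cpio "newc" format: a 110-byte ASCII header, then the NUL-terminated name, then the data, each padded to 4 bytes.

```python
MICROCODE_PATH = "kernel/x86/microcode/GenuineIntel.bin"


def _newc_entry(name, mode, data=b"", nlink=1, ino=0):
    """Encode one cpio 'newc' entry (magic 070701, 13 hex fields)."""
    name_bytes = name.encode() + b"\x00"      # namesize counts the NUL
    fields = [
        ino, mode, 0, 0, nlink, 0,            # ino, mode, uid, gid, nlink, mtime
        len(data), 0, 0, 0, 0,                # filesize, dev/rdev major+minor
        len(name_bytes), 0,                   # namesize, check (unused)
    ]
    entry = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    entry += b"\x00" * (-len(entry) % 4)      # align header+name to 4 bytes
    entry += data + b"\x00" * (-len(data) % 4)  # align file data to 4 bytes
    return entry


def build_early_microcode_cpio(blob):
    """Return an uncompressed cpio holding `blob` at MICROCODE_PATH."""
    parts, ino = [], 1
    for d in ("kernel", "kernel/x86", "kernel/x86/microcode"):
        parts.append(_newc_entry(d, 0o040755, nlink=2, ino=ino))
        ino += 1
    parts.append(_newc_entry(MICROCODE_PATH, 0o100644, blob, ino=ino))
    parts.append(_newc_entry("TRAILER!!!", 0))  # end-of-archive marker
    return b"".join(parts)
```

The resulting bytes are then concatenated in front of the existing image (early cpio first, compressed initramfs second). Because the CPU is updated from there at boot, no BIOS flash is needed.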

pCI & vps 2016:
Variant 3: no impact on KVM

Variant 2, mitigation 1
Microcode is packaged; testing of qemu with the ibrs_enabled patch is ongoing. Waiting for the kernel patch with IBRS to be merged and tested.

Only then will we start the pCI upgrade, rebooting each host.

VPS 2014:
Variant 3 (OVH side): not vulnerable.

Variant 3 (customers): the Virtuozzo team is still integrating KPTI into the OpenVZ kernel.

Variant 2: the physical host updates will be rolled out via pCC.

The teams worked hard over the last few days on this new « bug ». Now we know what needs to be known; we are starting to deploy the protections and preparing the next moves.

The situation is under control :)

Time to create the docs to help our customers protect their services at #Ovh.

A first, not yet complete, documentation to help our customers understand:
1) the general information
2) what customers have to do if they are using our services, depending on the service
3) what #Ovh is doing, depending on the product

docs.ovh.com/fr/dedicated/i…

It will be improved over the next days.

shared hosting:
Yesterday, we deployed the new 4.14.11 on 10% of the infrastructure.
Today: no kernel panics, no noticeable impact on performance, no random reboots, no application errors. Customer feedback: none.

Decision: GO to proceed on ALL servers. All done by 10pm.
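The shared-hosting rollout follows a canary pattern: upgrade 10% of servers, observe for 12 hours, then decide GO/NO-GO on the criteria listed above. Here is a minimal sketch of that decision. The 10% and 12-hour values come from the thread. The performance threshold is a hypothetical number, since the thread only says "no noticeable impact". `CanaryReport`, `split_canary` and `decide` are illustrative names.

```python
from dataclasses import dataclass

CANARY_PERCENT = 10      # "10% of WebHosting"
SOAK_HOURS = 12          # "check during 12 hours the stability"
MAX_PERF_REGRESSION_PCT = 5.0   # HYPOTHETICAL threshold, not from the thread


@dataclass
class CanaryReport:
    kernel_panics: int
    random_reboots: int
    application_errors: int
    customer_reports: int
    perf_regression_pct: float
    hours_observed: float


def split_canary(hosts, percent=CANARY_PERCENT):
    """Return (canary, remaining); rounds up so a non-empty fleet gets >= 1."""
    if not hosts:
        return [], []
    n = max(1, (len(hosts) * percent + 99) // 100)   # integer ceiling
    return hosts[:n], hosts[n:]


def decide(report):
    """GO only if the soak period elapsed and every criterion is clean."""
    clean = (
        report.kernel_panics == 0
        and report.random_reboots == 0
        and report.application_errors == 0
        and report.customer_reports == 0
        and report.perf_regression_pct <= MAX_PERF_REGRESSION_PCT
    )
    return "GO" if clean and report.hours_observed >= SOAK_HOURS else "NO-GO"
```

Requiring the full soak period in addition to clean metrics prevents an early GO before slow-burn problems such as random reboots have had time to appear.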

More:
travaux.ovh.net/?do=details&id…

shared hosting:
variant 3: protected :)

Summary to understand « what to do », « when », « how to protect your service » = f(the product you use at #Ovh)

Details per Product
docs.ovh.com/fr/dedicated/i…

Details per OS:
docs.ovh.com/fr/dedicated/m…

#Meltdown ? #Spectre ?

.. it’s clear .. easy ..

Is your OS already patched? #Meltdown #Spectre

Check it now:
docs.ovh.com/fr/dedicated/m…

Windows Server
vSphere
Debian
Red Hat Enterprise
Red Hat OpenStack
CentOS
Fedora
SUSE OpenStack
SUSE Enterprise
SUSE CaaS
Gentoo
Slackware
SmartOS
CloudLinux
Ubuntu
OpenSuse
Archlinux
OpenVZ
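Whatever the OS, a recent Linux kernel can report its own status for these issues under `/sys/devices/system/cpu/vulnerabilities/`. Each file holds a string such as "Not affected", "Vulnerable" or "Mitigation: PTI". This interface appeared in Linux 4.15, shortly after this thread, and was backported to some stable branches, so the files may be missing on the kernels discussed above. The sketch treats a missing file as "not confirmed". The function names are hypothetical.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")
ISSUES = ("meltdown", "spectre_v1", "spectre_v2")


def read_status(base=VULN_DIR):
    """Map each issue to the kernel's status string, or None if unavailable."""
    status = {}
    for name in ISSUES:
        try:
            status[name] = (Path(base) / name).read_text().strip()
        except OSError:
            status[name] = None      # older kernel: interface not present
    return status


def not_confirmed_mitigated(status):
    """Issues reported as vulnerable, or not reported at all."""
    return [name for name, s in status.items()
            if s is None or s.startswith("Vulnerable")]
```

On a kernel with KPTI but without Variant 2 protection, `not_confirmed_mitigated(read_status())` would be expected to list `spectre_v2` (and possibly `spectre_v1`), but not `meltdown`.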
