commit 58fb9b51d6c5061d772740965eb1ca2fc68aae7b
Author: Ben Hutchings
Date:   Wed Oct 3 04:10:10 2018 +0100

    Linux 3.16.59

commit 3c270e64a394ea5e52be9e371f5676fa974f6deb
Author: Kees Cook
Date:   Fri Jul 7 11:57:29 2017 -0700

    exec: Limit arg stack to at most 75% of _STK_LIM

    commit da029c11e6b12f321f36dac8771e833b65cec962 upstream.

    To avoid pathological stack usage or the need to special-case setuid
    execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).

    Signed-off-by: Kees Cook
    Signed-off-by: Linus Torvalds
    [bwh: Backported to 3.16: replaced code is slightly different]
    Signed-off-by: Ben Hutchings
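The check itself is tiny; a minimal sketch of the limit described above
(illustrative only, not the literal 3.16 diff — "size" stands for the
argv/envp byte count that get_arg_page() accounts):

    /* Cap argv/envp strings at 75% of _STK_LIM (i.e. 4.5MB of the 6MB
     * stack limit), so a huge argument list cannot eat the whole stack. */
    unsigned long limit = _STK_LIM / 4 * 3;

    if (size > limit)
        goto fail;      /* fail the exec instead of overflowing */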
commit aba97ce870f92835fa3385861f850e3e992dc42a
Author: Vincent Pelletier
Date:   Sun Sep 9 04:09:26 2018 +0000

    scsi: target: iscsi: Use hex2bin instead of a re-implementation

    commit 1816494330a83f2a064499d8ed2797045641f92c upstream.

    This change has the following effects, in order of decreasing
    importance:

    1) Prevent a stack buffer overflow

    2) Do not append an unnecessary NULL to an anyway binary buffer, which
       is writing one byte past client_digest when caller is:
       chap_string_to_hex(client_digest, chap_r, strlen(chap_r));

    The latter was found by KASAN (see below) when the input value has the
    expected size (32 hex chars), and further analysis revealed a stack
    buffer overflow can happen when the network-received value is longer,
    allowing an unauthenticated remote attacker to smash up to 17 bytes
    after the destination buffer (16 bytes attacker-controlled and one
    null).

    As switching to hex2bin requires specifying the destination buffer
    length, and does not internally append any null, it solves both
    issues.

    This addresses CVE-2018-14633.

    Beyond this:

    - Validate received value length and check hex2bin accepted the input,
      to log this rejection reason instead of just failing authentication.

    - Only log received CHAP_R and CHAP_C values once they passed sanity
      checks.

    ==================================================================
    BUG: KASAN: stack-out-of-bounds in chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
    Write of size 1 at addr ffff8801090ef7c8 by task kworker/0:0/1021

    CPU: 0 PID: 1021 Comm: kworker/0:0 Tainted: G O 4.17.8kasan.sess.connops+ #2
    Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 05/19/2014
    Workqueue: events iscsi_target_do_login_rx [iscsi_target_mod]
    Call Trace:
     dump_stack+0x71/0xac
     print_address_description+0x65/0x22e
     ? chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
     kasan_report.cold.6+0x241/0x2fd
     chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
     chap_server_compute_md5.isra.2+0x2cb/0x860 [iscsi_target_mod]
     ? chap_binaryhex_to_asciihex.constprop.5+0x50/0x50 [iscsi_target_mod]
     ? ftrace_caller_op_ptr+0xe/0xe
     ? __orc_find+0x6f/0xc0
     ? unwind_next_frame+0x231/0x850
     ? kthread+0x1a0/0x1c0
     ? ret_from_fork+0x35/0x40
     ? ret_from_fork+0x35/0x40
     ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
     ? deref_stack_reg+0xd0/0xd0
     ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
     ? is_module_text_address+0xa/0x11
     ? kernel_text_address+0x4c/0x110
     ? __save_stack_trace+0x82/0x100
     ? ret_from_fork+0x35/0x40
     ? save_stack+0x8c/0xb0
     ? 0xffffffffc1660000
     ? iscsi_target_do_login+0x155/0x8d0 [iscsi_target_mod]
     ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
     ? process_one_work+0x35c/0x640
     ? worker_thread+0x66/0x5d0
     ? kthread+0x1a0/0x1c0
     ? ret_from_fork+0x35/0x40
     ? iscsi_update_param_value+0x80/0x80 [iscsi_target_mod]
     ? iscsit_release_cmd+0x170/0x170 [iscsi_target_mod]
     chap_main_loop+0x172/0x570 [iscsi_target_mod]
     ? chap_server_compute_md5.isra.2+0x860/0x860 [iscsi_target_mod]
     ? rx_data+0xd6/0x120 [iscsi_target_mod]
     ? iscsit_print_session_params+0xd0/0xd0 [iscsi_target_mod]
     ? cyc2ns_read_begin.part.2+0x90/0x90
     ? _raw_spin_lock_irqsave+0x25/0x50
     ? memcmp+0x45/0x70
     iscsi_target_do_login+0x875/0x8d0 [iscsi_target_mod]
     ? iscsi_target_check_first_request.isra.5+0x1a0/0x1a0 [iscsi_target_mod]
     ? del_timer+0xe0/0xe0
     ? memset+0x1f/0x40
     ? flush_sigqueue+0x29/0xd0
     iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
     ? iscsi_target_nego_release+0x80/0x80 [iscsi_target_mod]
     ? iscsi_target_restore_sock_callbacks+0x130/0x130 [iscsi_target_mod]
     process_one_work+0x35c/0x640
     worker_thread+0x66/0x5d0
     ? flush_rcu_work+0x40/0x40
     kthread+0x1a0/0x1c0
     ? kthread_bind+0x30/0x30
     ret_from_fork+0x35/0x40

    The buggy address belongs to the page:
    page:ffffea0004243bc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
    flags: 0x17fffc000000000()
    raw: 017fffc000000000 0000000000000000 0000000000000000 00000000ffffffff
    raw: ffffea0004243c20 ffffea0004243ba0 0000000000000000 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     ffff8801090ef680: f2 f2 f2 f2 f2 f2 f2 01 f2 f2 f2 f2 f2 f2 f2 00
     ffff8801090ef700: f2 f2 f2 f2 f2 f2 f2 00 02 f2 f2 f2 f2 f2 f2 00
    >ffff8801090ef780: 00 f2 f2 f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2 f2 00
                                                  ^
     ffff8801090ef800: 00 f2 f2 f2 f2 f2 f2 00 00 00 00 02 f2 f2 f2 f2
     ffff8801090ef880: f2 f2 f2 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00
    ==================================================================

    Signed-off-by: Vincent Pelletier
    Reviewed-by: Mike Christie
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Ben Hutchings
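For illustration, the safe pattern the commit describes, using the
kernel's hex2bin() (variable names follow the commit text; the exact
driver code differs):

    unsigned char client_digest[MD5_SIGNATURE_SIZE];    /* 16 bytes */

    /* Reject anything that is not exactly 32 hex characters... */
    if (strlen(chap_r) != MD5_SIGNATURE_SIZE * 2)
        return -EINVAL;
    /* ...then let hex2bin() bound the write and validate the digits;
     * unlike the removed helper it writes exactly 16 bytes and does
     * not append a NULL. */
    if (hex2bin(client_digest, chap_r, MD5_SIGNATURE_SIZE) < 0)
        return -EINVAL;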
commit e44ab03f41ba55e181f4ed64e546feac8f8e69dc
Author: Daniel Rosenberg
Date:   Mon Jul 2 16:59:37 2018 -0700

    HID: debug: check length before copy_to_user()

    commit 717adfdaf14704fd3ec7fa2c04520c0723247eac upstream.

    If our length is greater than the size of the buffer, we overflow the
    buffer.

    Signed-off-by: Daniel Rosenberg
    Reviewed-by: Benjamin Tissoires
    Signed-off-by: Jiri Kosina
    Signed-off-by: Ben Hutchings

commit 3141e0750231be243bd4cd0fa6eebeb6a1578537
Author: Andy Whitcroft
Date:   Thu Sep 20 09:09:48 2018 -0600

    floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl

    commit 65eea8edc315589d6c993cf12dbb5d0e9ef1fe4e upstream.

    The final field of a floppy_struct is the field "name", which is a
    pointer to a string in kernel memory.  The kernel pointer should not
    be copied to user memory.  The FDGETPRM ioctl copies a floppy_struct
    to user memory, including this "name" field.  This pointer cannot be
    used by the user and it will leak a kernel address to user-space,
    which will reveal the location of kernel code and data and undermine
    KASLR protection.

    Model this code after the compat ioctl, which copies the returned data
    to a previously cleared temporary structure on the stack (excluding
    the name pointer) and copies out to userspace from there.  As we
    already have an inparam union with an appropriate member, and that
    memory is already cleared even for read-only calls, make use of that
    as a temporary store.

    Based on an initial patch by Brian Belleville.

    CVE-2018-7755

    Signed-off-by: Andy Whitcroft

    Broke up long line.

    Signed-off-by: Jens Axboe
    Signed-off-by: Ben Hutchings
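The shape of the fix, sketched (a cleared temporary on the stack; "g" is
a hypothetical pointer to the kernel's floppy_struct and "argp" the user
pointer, both illustrative):

    struct floppy_struct tmp;

    memset(&tmp, 0, sizeof(tmp));           /* no uninitialized bytes */
    /* Copy the geometry up to, but not including, the name pointer. */
    memcpy(&tmp, g, offsetof(struct floppy_struct, name));
    if (copy_to_user(argp, &tmp, sizeof(tmp)))
        return -EFAULT;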
commit 46b57f819163e3a84ff00b31485ee0638dbf1fdc
Author: Tyler Hicks
Date:   Tue Sep 4 15:24:05 2018 +0000

    irda: Only insert new objects into the global database via setsockopt

    The irda_setsockopt() function conditionally allocates memory for a
    new self->ias_object or, in some cases, reuses the existing
    self->ias_object.  Existing objects were incorrectly reinserted into
    the LM_IAS database, which corrupted the doubly linked list used for
    the hashbin implementation of the LM_IAS database.  When combined with
    a memory leak in irda_bind(), this issue could be leveraged to create
    a use-after-free vulnerability in the hashbin list.  This patch fixes
    the issue by only inserting newly allocated objects into the database.

    CVE-2018-6555

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Tyler Hicks
    Reviewed-by: Seth Arnold
    Reviewed-by: Stefan Bader
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Ben Hutchings

commit af8f681e48239817afb290f4e8ee3ca094f513e6
Author: Tyler Hicks
Date:   Tue Sep 4 15:24:04 2018 +0000

    irda: Fix memory leak caused by repeated binds of irda socket

    The irda_bind() function allocates memory for self->ias_obj without
    checking to see if the socket is already bound.  A userspace process
    could repeatedly bind the socket, have each new object added into the
    LM-IAS database, and lose the reference to the old object assigned to
    the socket, to exhaust memory resources.  This patch errors out of the
    bind operation when self->ias_obj is already assigned.

    CVE-2018-6554

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Tyler Hicks
    Reviewed-by: Seth Arnold
    Reviewed-by: Stefan Bader
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Ben Hutchings
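The bind-side fix reduces to an early check; roughly (a sketch of the
logic described above, not the literal diff):

    /* In irda_bind(): a second bind must not allocate a second IAS
     * object, or the reference to the first one is lost forever. */
    if (self->ias_obj) {
        err = -EINVAL;
        goto out;
    }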
commit 35560b4505e40d45ad6a261fce98afa0993cbb6e
Author: Andy Lutomirski
Date:   Mon Jun 5 07:40:25 2017 -0700

    mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP

    commit 5dd0b16cdaff9b94da06074d5888b03235c0bf17 upstream.

    This fixes CONFIG_SMP=n, CONFIG_DEBUG_TLBFLUSH=y without introducing
    further #ifdef soup.  Caught by a Kbuild bot randconfig build.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: ce4a4e565f52 ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code")
    Link: http://lkml.kernel.org/r/76da9a3cc4415996f2ad2c905b93414add322021.1496673616.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Ben Hutchings

commit 951331123cff1cc0eddf2bc15c3d81664be24e22
Author: Markus Trippelsdorf
Date:   Thu Dec 15 13:45:13 2016 +0100

    x86/tools: Fix gcc-7 warning in relocs.c

    commit 7ebb916782949621ff6819acf373a06902df7679 upstream.

    gcc-7 warns:

      In file included from arch/x86/tools/relocs_64.c:17:0:
      arch/x86/tools/relocs.c: In function ‘process_64’:
      arch/x86/tools/relocs.c:953:2: warning: argument 1 null where non-null expected [-Wnonnull]
        qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from arch/x86/tools/relocs.h:6:0,
                       from arch/x86/tools/relocs_64.c:1:
      /usr/include/stdlib.h:741:13: note: in a call to function ‘qsort’ declared here
        extern void qsort

    This happens because relocs16 is not used for ELF_BITS == 64, so there
    is no point in trying to sort it.

    Make the sort_relocs(&relocs16) call 32bit only.

    Signed-off-by: Markus Trippelsdorf
    Link: http://lkml.kernel.org/r/20161215124513.GA289@x4
    Signed-off-by: Thomas Gleixner
    [bwh: Backported to 3.16: Also make sort_relocs(&relocs64)
     conditional, which was done upstream in commit 6d24c5f72dfb "x86-64:
     Handle PC-relative relocations on per-CPU data".]
    Signed-off-by: Ben Hutchings

commit 30170db328324e249143de296a3a56f0a2371493
Author: Finn Thain
Date:   Sat Dec 31 19:56:26 2016 -0500

    via-cuda: Use spinlock_irq_save/restore instead of enable/disable_irq

    commit ac39452e942af6a212e8f89e8a36b71354323845 upstream.

    The cuda_start() function uses spinlock_irq_save/restore for mutual
    exclusion.  Let's have cuda_poll() do the same when polling the VIA
    interrupt.

    The benefit to disabling local irqs when the interrupt is being polled
    is that the interrupt handler now has the same timing properties
    regardless of whether it is invoked normally or from cuda_poll().

    This driver was written back when local irqs remained enabled during
    execution of interrupt handlers, and cuda_poll() was probably trying
    to achieve the same effect by use of enable/disable_irq.

    Tested-by: Stan Johnson
    Signed-off-by: Finn Thain
    Signed-off-by: Michael Ellerman
    [bwh: Backported to 3.16: adjust context]
    Signed-off-by: Ben Hutchings
commit b579b8f2128ee2b9e9393b6a18297bb79080ef34
Author: Vlastimil Babka
Date:   Thu Aug 23 16:21:29 2018 +0200

    x86/speculation/l1tf: Suggest what to do on systems with too much RAM

    commit 6a012288d6906fee1dbc244050ade1dafe4a9c8d upstream.

    Two users have reported [1] that they have an "extremely unlikely"
    system with more than MAX_PA/2 memory and L1TF mitigation is not
    effective.  Make the warning more helpful by suggesting the proper
    mem=X kernel boot parameter to make it effective, and a link to the
    L1TF document to help decide if the mitigation is worth the unusable
    RAM.

    [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536

    Suggested-by: Michal Hocko
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: "H . Peter Anvin"
    Cc: Linus Torvalds
    Cc: Andi Kleen
    Cc: Dave Hansen
    Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1254@suse.cz
    Signed-off-by: Ben Hutchings

commit e2ec50b1272c238735ebf48a25020676818aef79
Author: Andi Kleen
Date:   Tue Aug 7 15:09:38 2018 -0700

    x86/mm/kmmio: Make the tracer robust against L1TF

    commit 1063711b57393c1999248cccb57bebfaf16739e7 upstream.

    The mmio tracer sets io mapping PTEs and PMDs to non present when
    enabled without inverting the address bits, which makes the PTE entry
    vulnerable for L1TF.

    Make it use the right low level macros to actually invert the address
    bits to protect against L1TF.

    In principle this could be avoided because MMIO tracing is not likely
    to be enabled on production machines, but the fix is straightforward
    and for consistency's sake it's better to get rid of the open-coded
    PTE manipulation.

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ben Hutchings

commit 308ad2888759478ed11dc989e8538c621046b811
Author: Andi Kleen
Date:   Tue Aug 7 15:09:39 2018 -0700

    x86/mm/pat: Make set_memory_np() L1TF safe

    commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream.

    set_memory_np() is used to mark kernel mappings not present, but it
    has its own open-coded mechanism which does not have the L1TF
    protection of inverting the address bits.

    Replace the open-coded PTE manipulation with the L1TF protecting low
    level PTE routines.

    Passes the CPA self test.

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    [bwh: Backported to 3.16:
     - cpa->pfn is actually a physical address here and needs to be
       shifted to produce a PFN
     - Adjust context]
    Signed-off-by: Ben Hutchings
commit 9c8c0995084eb87c0f634a16d4a05406f3d3a16f
Author: Andi Kleen
Date:   Tue Aug 7 15:09:37 2018 -0700

    x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert

    commit 0768f91530ff46683e0b372df14fd79fe8d156e5 upstream.

    Some cases in THP like:

      - MADV_FREE
      - mprotect
      - split

    mark the PMD non-present temporarily to prevent races.  The window for
    an L1TF attack in these contexts is very small, but it wants to be
    fixed for correctness' sake.

    Use the proper low level functions for pmd/pud_mknotpresent() to
    address this.

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    [bwh: Backported to 3.16:
     - Drop change to pud_mknotpresent()
     - pmd_mknotpresent() does not touch _PAGE_NONE]
    Signed-off-by: Ben Hutchings

commit a3129ae750bdf515e6bd437e08d2ad73dcb2476c
Author: Ben Hutchings
Date:   Sat Sep 29 16:31:50 2018 +0100

    x86/speculation/l1tf: Protect NUMA-balance entries against L1TF

    NUMA balancing has its own functions that manipulate the PRESENT flag
    in PTEs and PMDs.  These were not affected by the changes in commit
    6b28baca9b1f "x86/speculation/l1tf: Protect PROT_NONE PTEs against
    speculation".

    This is not a problem upstream because NUMA balancing was changed to
    use {pte,pmd}_modify() in Linux 4.0.

    Override the generic implementations for x86 with implementations
    that do the same inversion as {pte,pmd}_modify().

    Signed-off-by: Ben Hutchings
    Cc: x86@kernel.org
    Cc: Mel Gorman

commit 1fb3a36c0e333e2810c9aa4715c0e2080dad5a19
Author: Sean Christopherson
Date:   Fri Aug 17 10:27:36 2018 -0700

    x86/speculation/l1tf: Exempt zeroed PTEs from inversion

    commit f19f5c49bbc3ffcc9126cc245fc1b24cc29f4a37 upstream.

    It turns out that we should *not* invert all not-present mappings,
    because the all-zeroes case is obviously special.

    clear_page() does not undergo the XOR logic to invert the address
    bits, i.e. PTE, PMD and PUD entries that have not been individually
    written will have val=0 and so will trigger __pte_needs_invert().  As
    a result, {pte,pmd,pud}_pfn() will return the wrong PFN value, i.e.
    all ones (adjusted by the max PFN mask) instead of zero.  A zeroed
    entry is ok because the page at physical address 0 is reserved early
    in boot specifically to mitigate L1TF, so explicitly exempt them from
    the inversion when reading the PFN.

    Manifested as an unexpected mprotect(..., PROT_NONE) failure when
    called on a VMA that has VM_PFNMAP and was mmap'd to as something
    other than PROT_NONE but never used.  mprotect() sends the PROT_NONE
    request down prot_none_walk(), which walks the PTEs to check the
    PFNs.  prot_none_pte_entry() gets the bogus PFN from pte_pfn() and
    returns -EACCES because it thinks mprotect() is trying to adjust a
    high MMIO address.

    [ This is a very modified version of Sean's original patch, but all
      credit goes to Sean for doing this and also pointing out that
      sometimes the __pte_needs_invert() function only gets the
      protection bits, not the full eventual pte.  But zero remains
      special even in just protection bits, so that's ok.   - Linus ]

    Fixes: f22cc87f6c1f ("x86/speculation/l1tf: Invert all not present mappings")
    Signed-off-by: Sean Christopherson
    Acked-by: Andi Kleen
    Cc: Thomas Gleixner
    Cc: Josh Poimboeuf
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Dave Hansen
    Cc: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds
    Signed-off-by: Ben Hutchings
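The resulting predicate is small; roughly as in the upstream helper
(shown for illustration):

    /* Invert only entries that are non-present *and* non-zero: an
     * all-zero entry (fresh from clear_page()) must keep PFN 0, which
     * is reserved early in boot precisely for L1TF. */
    static inline bool __pte_needs_invert(u64 val)
    {
        return val && !(val & _PAGE_PRESENT);
    }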
commit 13948a9f275c7d945589737c6f29241b97930630
Author: Andi Kleen
Date:   Tue Aug 7 15:09:36 2018 -0700

    x86/speculation/l1tf: Invert all not present mappings

    commit f22cc87f6c1f771b57c407555cfefd811cdd9507 upstream.

    For kernel mappings PAGE_PROTNONE is not necessarily set for a non
    present mapping, but the inversion logic explicitly checks for
    !PRESENT and PROT_NONE.

    Remove the PROT_NONE check and make the inversion unconditional for
    all not present mappings.

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ben Hutchings

commit b3dc998f1ca71c91f0b0e077a360405f0550a511
Author: Jiri Kosina
Date:   Sat Jul 14 21:56:13 2018 +0200

    x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures

    commit 6c26fcd2abfe0a56bbd95271fce02df2896cfd24 upstream.

    pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of
    the !__ASSEMBLY__ section in include/asm-generic/pgtable.h, which
    confuses the assembler on archs that don't have
    __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g. ia64) and breaks the build:

      include/asm-generic/pgtable.h: Assembler messages:
      include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
      include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
      include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
      include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
      arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle

    Move those two static inlines into the !__ASSEMBLY__ section so that
    they don't confuse the asm build pass.

    Fixes: 42e4089c7890 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
    Signed-off-by: Jiri Kosina
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman
    [groeck: Context changes]
    Signed-off-by: Guenter Roeck
    Signed-off-by: Ben Hutchings

commit c11f6523a7298a7e0b1a2b767e292a69639b997d
Author: Michal Hocko
Date:   Wed Jun 27 17:46:50 2018 +0200

    x86/speculation/l1tf: Fix up pte->pfn conversion for PAE

    commit e14d7dfb41f5807a0c1c26a13f2b8ef16af24935 upstream.

    Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for
    CONFIG_PAE because phys_addr_t is wider than unsigned long and so the
    pte_val resp. shift left would get truncated.  Fix this up by using
    proper types.

    [dwmw2: Backport to 4.9]

    Fixes: 6b28baca9b1f ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation")
    Reported-by: Jan Beulich
    Signed-off-by: Michal Hocko
    Signed-off-by: Thomas Gleixner
    Acked-by: Vlastimil Babka
    Signed-off-by: David Woodhouse
    Signed-off-by: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman
    [bwh: Backported to 3.16: Adjust context.  Also restore the fix to
     pfn_pud().]
    Signed-off-by: Ben Hutchings
commit 6097e7f43a8b9109626e3ffef3e280119febb2f9
Author: Vlastimil Babka
Date:   Thu Aug 23 15:44:18 2018 +0200

    x86/speculation/l1tf: Fix off-by-one error when warning that system has too much RAM

    commit b0a182f875689647b014bc01d36b340217792852 upstream.

    Two users have reported [1] that they have an "extremely unlikely"
    system with more than MAX_PA/2 memory and L1TF mitigation is not
    effective.

    In fact it's a CPU with a 36-bit phys limit (64GB) and 32GB memory,
    but due to holes in the e820 map, the main region is almost 500MB over
    the 32GB limit:

      [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000081effffff] usable

    Suggestions to use 'mem=32G' to enable the L1TF mitigation while
    losing the 500MB revealed that there's an off-by-one error in the
    check in l1tf_select_mitigation().

    l1tf_pfn_limit() returns the last usable pfn (inclusive) and the range
    check in the mitigation path does not take this into account.

    Instead of amending the range check, make l1tf_pfn_limit() return the
    first PFN which is over the limit, which is less error prone.  Adjust
    the other users accordingly.

    [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536

    Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
    Reported-by: George Anchev
    Reported-by: Christopher Snowhill
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Thomas Gleixner
    Cc: "H . Peter Anvin"
    Cc: Linus Torvalds
    Cc: Andi Kleen
    Cc: Dave Hansen
    Cc: Michal Hocko
    Link: https://lkml.kernel.org/r/20180823134418.17008-1-vbabka@suse.cz
    Signed-off-by: Ben Hutchings

commit 7076c25fd00163ce2189a88a31877cc1376b0be5
Author: Vlastimil Babka
Date:   Mon Aug 20 11:58:35 2018 +0200

    x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit

    commit 9df9516940a61d29aedf4d91b483ca6597e7d480 upstream.

    On 32bit PAE kernels on 64bit hardware with enough physical bits,
    l1tf_pfn_limit() will overflow unsigned long.  This in turn affects
    max_swapfile_size() and can lead to swapon returning -EINVAL.  This
    has been observed in a 32bit guest with 42 bits physical address size,
    where max_swapfile_size() overflows exactly to 1 << 32, thus zero, and
    produces the following warning to dmesg:

      [    6.396845] Truncating oversized swap area, only using 0k out of 2047996k

    Fix this by using unsigned long long instead.

    Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Reported-by: Dominique Leuenberger
    Reported-by: Adrian Schroeter
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Thomas Gleixner
    Acked-by: Andi Kleen
    Acked-by: Michal Hocko
    Cc: "H . Peter Anvin"
    Cc: Linus Torvalds
    Cc: Dave Hansen
    Cc: Michal Hocko
    Link: https://lkml.kernel.org/r/20180820095835.5298-1-vbabka@suse.cz
    Signed-off-by: Ben Hutchings
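After both fixes the helper computes the first PFN over the limit in a
64-bit type; roughly (sketched from the upstream versions, the 3.16
backport may differ in detail):

    /* First PFN above MAX_PA/2; unsigned long long so that the shift
     * cannot overflow unsigned long on 32bit PAE. */
    static inline unsigned long long l1tf_pfn_limit(void)
    {
        return BIT_ULL(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT);
    }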
commit d6d4b0323639065a2360b50392aa05ba3fdd5dff
Author: Vlastimil Babka
Date:   Fri Jun 22 17:39:33 2018 +0200

    x86/speculation/l1tf: Protect PAE swap entries against L1TF

    commit 0d0f6249058834ffe1ceaad0bb31464af66f6e7a upstream.

    The PAE 3-level paging code currently doesn't mitigate L1TF by
    flipping the offset bits, and uses the high PTE word, thus bits 32-36
    for type, 37-63 for offset.  The lower word is zeroed, thus systems
    with less than 4GB memory are safe.  With 4GB to 128GB the swap type
    selects the memory locations vulnerable to L1TF; with even more
    memory, also the swap offset influences the address.  This might be a
    problem with 32bit PAE guests running on large 64bit hosts.

    By continuing to keep the whole swap entry in either high or low 32bit
    word of the PTE we would limit the swap size too much.  Thus this
    patch uses the whole PAE PTE with the same layout as the 64bit
    version does.  The macros just become a bit tricky since they assume
    the arch-dependent swp_entry_t to be 32bit.

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Thomas Gleixner
    Acked-by: Michal Hocko
    [bwh: Backported to 3.16: CONFIG_PGTABLE_LEVELS is not defined; use
     other config symbols in the condition.]
    Signed-off-by: Ben Hutchings

commit 90ff407c74170af5be4ccf02baa1ee89fef36976
Author: Vlastimil Babka
Date:   Thu Jun 21 12:36:29 2018 +0200

    x86/speculation/l1tf: Extend 64bit swap file size limit

    commit 1a7ed1ba4bba6c075d5ad61bb75e3fbc870840d6 upstream.

    The previous patch has limited swap file size so that large offsets
    cannot clear bits above MAX_PA/2 in the pte and interfere with L1TF
    mitigation.

    It assumed that offsets are encoded starting with bit 12, same as pfn.
    But on x86_64, offsets are encoded starting with bit 9.

    Thus the limit can be raised by 3 bits.  That means 16TB with 42bit
    MAX_PA and 256TB with 46bit MAX_PA.

    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ben Hutchings

commit f168a77cc3d6289f9ac07b381f0ab5c3b0d8b6db
Author: Konrad Rzeszutek Wilk
Date:   Wed Jun 20 16:42:57 2018 -0400

    x86/bugs: Move the l1tf function and define pr_fmt properly

    commit 56563f53d3066afa9e63d6c997bf67e76a8b05c0 upstream.

    The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt
    which was defined as "Spectre V2 : ".

    Move the function to be past SSBD and also define the pr_fmt.

    Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Thomas Gleixner
    Signed-off-by: David Woodhouse
    Signed-off-by: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Ben Hutchings

commit df54183ac4eccbbae95afedf7ed1643ac46ac88a
Author: Vlastimil Babka
Date:   Tue Aug 14 20:50:47 2018 +0200

    x86/init: fix build with CONFIG_SWAP=n

    commit 792adb90fa724ce07c0171cbc96b9215af4b1045 upstream.

    The introduction of generic_max_swapfile_size and arch-specific
    versions has broken linking on x86 with CONFIG_SWAP=n due to an
    undefined reference to 'generic_max_swapfile_size'.  Fix it by
    compiling the x86-specific max_swapfile_size() only with
    CONFIG_SWAP=y.

    Reported-by: Tomas Pruzina
    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Linus Torvalds
    Signed-off-by: Ben Hutchings
commit cf957b8f323ad8237f5c685eea8415a3086b1d33
Author: Andi Kleen
Date:   Wed Jun 13 15:48:28 2018 -0700

    x86/speculation/l1tf: Limit swap file size to MAX_PA/2

    commit 377eeaa8e11fe815b1d07c81c4a0e2843a8c15eb upstream.

    For the L1TF workaround it's necessary to limit the swap file size to
    below MAX_PA/2, so that the higher bits of the swap offset inverted
    never point to valid memory.

    Add a mechanism for the architecture to override the swap file size
    check in swapfile.c and add an x86-specific max swapfile check
    function that enforces that limit.

    The check is only enabled if the CPU is vulnerable to L1TF.

    In VMs with 42bit MAX_PA the typical limit is 2TB now, on a native
    system with 46bit PA it is 32TB.  The limit is only per individual
    swap file, so it's always possible to exceed these limits with
    multiple swap files or partitions.

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Acked-by: Michal Hocko
    Acked-by: Dave Hansen
    [bwh: Backported to 3.16: adjust context]
    Signed-off-by: Ben Hutchings
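The override mechanism looks roughly like this (a sketch combining this
patch with the later fixes above; X86_BUG_L1TF stands in for whatever
bug bit the 3.16 backport assigned):

    /* x86 override of the __weak max_swapfile_size() in mm/swapfile.c. */
    unsigned long max_swapfile_size(void)
    {
        unsigned long pages = generic_max_swapfile_size();

        if (boot_cpu_has_bug(X86_BUG_L1TF))
            /* Limit the swap file size to MAX_PA/2 for L1TF. */
            pages = min_t(unsigned long long, l1tf_pfn_limit(), pages);
        return pages;
    }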
commit 74a430116636754bf42e7ab08fdd9629bd00ffc1
Author: Andi Kleen
Date:   Wed Jun 13 15:48:27 2018 -0700

    x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings

    commit 42e4089c7890725fcd329999252dc489b72f2921 upstream.

    For L1TF PROT_NONE mappings are protected by inverting the PFN in the
    page table entry.  This sets the high bits in the CPU's address space,
    thus making sure an unmapped entry does not point to valid cached
    memory.

    Some server system BIOSes put the MMIO mappings high up in the
    physical address space.  If such a high mapping was mapped to
    unprivileged users they could attack low memory by setting such a
    mapping to PROT_NONE.  This could happen through a special device
    driver which is not access protected.  Normal /dev/mem is of course
    access protected.

    To avoid this, forbid PROT_NONE mappings or mprotect for high MMIO
    mappings.

    Valid page mappings are allowed because the system is then unsafe
    anyway.

    It's not expected that users commonly use PROT_NONE on MMIO.  But to
    minimize any impact this is only enforced if the mapping actually
    refers to a high MMIO address (defined as the MAX_PA-1 bit being
    set), and also skip the check for root.

    For mmaps this is straightforward and can be handled in vm_insert_pfn
    and in remap_pfn_range().

    For mprotect it's a bit trickier.  At the point where the actual PTEs
    are accessed a lot of state has been changed and it would be
    difficult to undo on an error.  Since this is an uncommon case, use a
    separate early page table walk pass for MMIO PROT_NONE mappings that
    checks for this condition early.  For non-MMIO and non-PROT_NONE
    there are no changes.

    [dwmw2: Backport to 4.9]
    [groeck: Backport to 4.4]

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Acked-by: Dave Hansen
    Signed-off-by: David Woodhouse
    Signed-off-by: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Ben Hutchings

commit fc8f00f523b74d55c6c7e056aee799b412ff25f7
Author: Naoya Horiguchi
Date:   Wed Feb 11 15:27:37 2015 -0800

    pagewalk: improve vma handling

    commit fafaa4264eba49fd10695c193a82760558d093f4 upstream.

    The current implementation of the page table walker has a fundamental
    problem in vma handling, which started when we tried to handle
    vma(VM_HUGETLB).  Because it's done in the pgd loop, considering vma
    boundaries makes the code complicated and bug-prone.

    From the user's viewpoint, a user checks some vma-related condition
    to determine whether it really does the page walk over the vma.

    In order to solve these, this patch moves the vma check outside the
    pgd loop and introduces a new callback ->test_walk().

    Signed-off-by: Naoya Horiguchi
    Acked-by: Kirill A. Shutemov
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Cyrill Gorcunov
    Cc: Dave Hansen
    Cc: Pavel Emelyanov
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    [bwh: Backported to 3.16 as dependency of L1TF mitigation]
    Signed-off-by: Ben Hutchings
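A sketch of how a walker uses the new hook (callback names here are
illustrative, not from the patch):

    /* ->test_walk() runs once per VMA, before the page-table loop:
     * return 0 to walk this VMA, 1 to skip it, negative to abort. */
    static int my_test_walk(unsigned long addr, unsigned long next,
                            struct mm_walk *walk)
    {
        return (walk->vma->vm_flags & VM_PFNMAP) ? 0 : 1;
    }

    struct mm_walk walk = {
        .test_walk = my_test_walk,
        .pte_entry = my_pte_entry,  /* per-PTE callback, as before */
        .mm        = current->mm,
    };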
commit 71875653e47c6f128e76105566043f949f896460
Author: Naoya Horiguchi
Date:   Wed Feb 11 15:27:34 2015 -0800

    mm/pagewalk: remove pgd_entry() and pud_entry()

    commit 0b1fbfe50006c41014cc25660c0e735d21c34939 upstream.

    Currently no user of the page table walker sets ->pgd_entry() or
    ->pud_entry(), so checking their existence in each loop is just
    wasting CPU cycles.  So let's remove it to reduce overhead.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Cyrill Gorcunov
    Cc: Dave Hansen
    Cc: Kirill A. Shutemov
    Cc: Pavel Emelyanov
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    [bwh: Backported to 3.16 as dependency of L1TF mitigation]
    Signed-off-by: Ben Hutchings

commit f941fa5197ec5bce8c08978444a8f221f52e975a
Author: Dave Airlie
Date:   Mon Oct 24 15:37:48 2016 +1000

    drm/drivers: add support for using the arch wc mapping API.

    commit 7cf321d118a825c1541b43ca45294126fd474efa upstream.

    This fixes a regression in all these drivers since the cache mode
    tracking was fixed for mixed mappings.  It uses the new arch API to
    add the VRAM range to the PAT mapping tracking tables.

    Fixes: 87744ab3832 (mm: fix cache mode tracking in vm_insert_mixed())
    Reviewed-by: Christian König
    Signed-off-by: Dave Airlie
    [bwh: Backported to 3.16:
     - Drop changes in amdgpu
     - In nouveau, use struct nouveau_device * and
       nv_device_resource_{start,len}()
     - Adjust context]
    Signed-off-by: Ben Hutchings

commit 82b70f2ac8284a8b4c6dced6694b6c80750f8fb0
Author: Dave Airlie
Date:   Mon Oct 24 15:27:59 2016 +1000

    x86/io: add interface to reserve io memtype for a resource range. (v1.1)

    commit 8ef4227615e158faa4ee85a1d6466782f7e22f2f upstream.

    A recent change to the mm code in:

      87744ab3832b mm: fix cache mode tracking in vm_insert_mixed()

    started enforcing checking the memory type against the registered
    list for mixed pfn insertion mappings.  It happens that the drm
    drivers for a number of gpus relied on this being broken.  Currently
    the driver only inserted VRAM mappings into the tracking table when
    they came from the kernel, and userspace mappings never landed in the
    table.  This led to a regression where all the mappings end up as UC
    instead of WC now.

    I've considered a number of solutions, but since this needs to be
    fixed in fixes and not next, and some of the solutions were going to
    introduce overhead that hadn't been there before, I didn't consider
    them viable at this stage.  These mainly concerned hooking into the
    TTM io reserve APIs, but these APIs have a bunch of fast paths I
    didn't want to unwind to add this to.

    The solution I've decided on is to add a new API like the
    arch_phys_wc APIs (these would have worked but wc_del didn't take a
    range), and use them from the drivers to add a WC compatible mapping
    to the table for all VRAM on those GPUs.  This means we can then
    create userspace mappings that won't get degraded to UC.

    v1.1: use CONFIG_X86_PAT + add some comments in io.h

    Cc: Toshi Kani
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Andy Lutomirski
    Cc: Denys Vlasenko
    Cc: Brian Gerst
    Cc: x86@kernel.org
    Cc: mcgrof@suse.com
    Cc: Dan Williams
    Acked-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Dave Airlie
    [bwh: Backported to 3.16: Memory types have type unsigned long, and
     the constant is named _PAGE_CACHE_WC instead of _PAGE_CACHE_MODE_WC.]
    Signed-off-by: Ben Hutchings
commit fdf8ea2ba6e301feda08aa2b66e3d2b5b69f470c
Author: Dan Williams
Date:   Fri Oct 7 17:00:18 2016 -0700

    mm: fix cache mode tracking in vm_insert_mixed()

    commit 87744ab3832b83ba71b931f86f9cfdb000d07da5 upstream.

    vm_insert_mixed(), unlike vm_insert_pfn_prot() and
    vmf_insert_pfn_pmd(), fails to check the pgprot_t it uses for the
    mapping against the one recorded in the memtype tracking tree.  Add
    the missing call to track_pfn_insert() to preclude cases where
    incompatible aliased mappings are established for a given physical
    address range.

    [groeck: Backport to v4.4.y]

    Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Cc: David Airlie
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Ben Hutchings

commit e65f91b4392875a2e04b344c8dfb683d04fa4552
Author: Andy Lutomirski
Date:   Tue Dec 29 20:12:20 2015 -0800

    mm: Add vm_insert_pfn_prot()

    commit 1745cbc5d0dee0749a6bc0ea8e872c5db0074061 upstream.

    The x86 vvar vma contains pages with differing cacheability flags.
    x86 currently implements this by manually inserting all the ptes
    using (io_)remap_pfn_range when the vma is set up.

    x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up the
    mappings as needed.  The correct API to use to insert a pfn in .fault
    is vm_insert_pfn(), but vm_insert_pfn() can't override the vma's
    cache mode, and the HPET page in particular needs to be uncached
    despite the fact that the rest of the VMA is cached.

    Add vm_insert_pfn_prot() to support varying cacheability within the
    same non-COW VMA in a more sane manner.

    x86 could alternatively use multiple VMAs, but that's messy, would
    break CRIU, and would create unnecessary VMAs that would waste
    memory.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Kees Cook
    Acked-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Quentin Casasnovas
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/d2938d1eb37be7a5e4f86182db646551f11e45aa.1451446564.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Ben Hutchings
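Typical intended use, sketched from the commit description (a 3.16-style
.fault handler; "hpet_pfn" is a hypothetical PFN, not from the patch):

    static int vvar_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
    {
        /* One uncached page inside an otherwise write-back VMA. */
        if (vm_insert_pfn_prot(vma, (unsigned long)vmf->virtual_address,
                               hpet_pfn,
                               pgprot_noncached(vma->vm_page_prot)))
            return VM_FAULT_SIGBUS;
        return VM_FAULT_NOPAGE;
    }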
commit 299147ddcbd447d97e80088d05aff0fa62af34c2
Author: Andi Kleen
Date:   Wed Jun 13 15:48:26 2018 -0700

    x86/speculation/l1tf: Add sysfs reporting for l1tf

    commit 17dbca119312b4e8173d4e25ff64262119fcef38 upstream.

    L1TF core kernel workarounds are cheap and normally always enabled.
    However they still should be reported in sysfs if the system is
    vulnerable or mitigated.  Add the necessary CPU feature/bug bits.

    - Extend the existing checks for Meltdown to determine if the system
      is vulnerable.  All CPUs which are not vulnerable to Meltdown are
      also not vulnerable to L1TF.

    - Check for 32bit non-PAE and emit a warning, as there is no practical
      way of mitigation due to the limited physical address bits.

    - If the system has more than MAX_PA/2 physical memory the invert page
      workarounds don't protect the system against the L1TF attack
      anymore, because an inverted physical address will also point to
      valid memory.  Print a warning in this case and report that the
      system is vulnerable.

    Add a function which returns the PFN limit for the L1TF mitigation,
    which will be used in follow-up patches for sanity and range checks.

    [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]
    [ dwmw2: Backport to 4.9 (cpufeatures.h, E820) ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Acked-by: Dave Hansen
    Signed-off-by: David Woodhouse
    Signed-off-by: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman
    [bwh: Backported to 3.16:
     - Assign the next available bits from feature word 7 and bug word 0
     - CONFIG_PGTABLE_LEVELS is not defined; use other config symbols in
       the condition
     - Adjust context]
    Signed-off-by: Ben Hutchings

commit 251377474f8c66ec70e5b3883fee13db791e21a4
Author: Andi Kleen
Date:   Wed Jun 13 15:48:25 2018 -0700

    x86/speculation/l1tf: Make sure the first page is always reserved

    commit 10a70416e1f067f6c4efda6ffd8ea96002ac4223 upstream.

    The L1TF workaround doesn't make any attempt to mitigate speculative
    accesses to the first physical page for zeroed PTEs.  Normally it
    only contains some data from the early real mode BIOS.

    It's not entirely clear that the first page is reserved in all
    configurations, so add an extra reservation call to make sure it is
    really reserved.  In most configurations (e.g. with the standard
    reservations) it's likely a nop.

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Acked-by: Dave Hansen
    Signed-off-by: Ben Hutchings

commit a21dcddca6930a1edb043712e8562365ccf96dba
Author: Andi Kleen
Date:   Wed Jun 13 15:48:24 2018 -0700

    x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation

    commit 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c upstream.

    When PTEs are set to PROT_NONE the kernel just clears the Present bit
    and preserves the PFN, which creates attack surface for L1TF
    speculation attacks.

    This is important inside guests, because L1TF speculation bypasses
    physical page remapping.  While the host has its own mitigations
    preventing leaking data from other VMs into the guest, this would
    still risk leaking the wrong page inside the current guest.

    This uses the same technique as Linus' swap entry patch: while an
    entry is in PROTNONE state invert the complete PFN part of it.  This
    ensures that the highest bit will point to non-existing memory.

    The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE
    and pte/pmd/pud_pfn undo it.

    This assumes that no code path touches the PFN part of a PTE directly
    without using these primitives.

    This doesn't handle the case that MMIO is on the top of the CPU
    physical memory.  If such an MMIO region was exposed by an
    unprivileged driver for mmap it would be possible to attack some real
    memory.  However this situation is all rather unlikely.

    For 32bit non-PAE the inversion is not done because there are really
    not enough bits to protect anything.

    Q: Why does the guest need to be protected when the HyperVisor
       already has L1TF mitigations?

    A: Here's an example:

       Physical pages 1 2 get mapped into a guest as
         GPA 1 -> PA 2
         GPA 2 -> PA 1
       through EPT.

       The L1TF speculation ignores the EPT remapping.  Now the guest
       kernel maps GPA 1 to process A and GPA 2 to process B, and they
       belong to different users and should be isolated.

       A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping
       and gets read access to the underlying physical page.  Which in
       this case points to PA 2, so it can read process B's data, if it
       happened to be in L1, so isolation inside the guest is broken.

       There's nothing the hypervisor can do about this.  This mitigation
       has to be done in the guest itself.

    [ tglx: Massaged changelog ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Acked-by: Dave Hansen
    [bwh: Backported to 3.16:
     - s/check_pgprot/massage_pgprot/
     - Keep using PTE_PFN_MASK to extract PFN from pmd_pfn() and
       pud_pfn(), as we don't need to worry about the PAT bit being set
       here]
    Signed-off-by: Ben Hutchings
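The inversion machinery reduces to an XOR with an all-ones mask when the
entry needs inverting; roughly (simplified from the upstream helpers;
the 3.16 version differs in detail, e.g. it keeps using PTE_PFN_MASK):

    static inline u64 protnone_mask(u64 val)
    {
        return __pte_needs_invert(val) ? ~0ull : 0;
    }

    static inline unsigned long pte_pfn(pte_t pte)
    {
        phys_addr_t pfn = pte_val(pte);

        pfn ^= protnone_mask(pfn);      /* undo the inversion on read */
        return (pfn & PTE_PFN_MASK) >> PAGE_SHIFT;
    }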
commit 4047db3e1e4f552e2df5d6ff75715b9d504fc690
Author: Ben Hutchings
Date:   Fri Sep 28 01:15:29 2018 +0100

    x86: mm: Add PUD functions

    These are extracted from commit a00cc7d9dd93 "mm, x86: add support
    for PUD-sized transparent hugepages" and will be used by later
    patches.

    Signed-off-by: Ben Hutchings

commit 5fe3e72fa4229fa457d6a6a31d104aff23edd8bd
Author: Linus Torvalds
Date:   Wed Jun 13 15:48:23 2018 -0700

    x86/speculation/l1tf: Protect swap entries against L1TF

    commit 2f22b4cd45b67b3496f4aa4c7180a1271c6452f6 upstream.

    With L1 terminal fault the CPU speculates into unmapped PTEs, and the
    resulting side effects allow reading the memory the PTE is pointing
    to, if its values are still in the L1 cache.

    For swapped out pages Linux uses unmapped PTEs and stores a swap entry
    into them.

    To protect against L1TF it must be ensured that the swap entry is not
    pointing to valid memory, which requires setting higher bits (between
    bit 36 and bit 45) that are inside the CPU's physical address space,
    but outside any real memory.

    To do this invert the offset to make sure the higher bits are always
    set, as long as the swap file is not too big.

    Note there is no workaround for 32bit !PAE, or on systems which have
    more than MAX_PA/2 worth of memory.  The latter case is very unlikely
    to happen on real systems.

    [ AK: Updated description and minor tweaks.  Split out from the
      original patch ]

    Signed-off-by: Linus Torvalds
    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Tested-by: Andi Kleen
    Reviewed-by: Josh Poimboeuf
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Acked-by: Dave Hansen
    [bwh: Backported to 3.16: Bit 9 may be reserved for PAGE_BIT_NUMA
     here]
    Signed-off-by: Ben Hutchings
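The encoding side, roughly (simplified from the upstream macros; the
3.16 shifts differ because bit 9 may be reserved for PAGE_BIT_NUMA):

    /* Store the offset bit-inverted, so the high PTE bits of a swap
     * entry are set unless the swap file is enormous; reading a swap
     * entry inverts the offset back. */
    #define __swp_entry(type, offset) ((swp_entry_t) { \
        (~(unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
        | ((unsigned long)(type) << (64 - SWP_TYPE_BITS)) })

    #define __swp_offset(x) \
        (~(x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)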
commit 3e2303b14e5560ef760b7934a8c4c9ad1cdf246f
Author: Linus Torvalds
Date:   Wed Jun 13 15:48:22 2018 -0700

    x86/speculation/l1tf: Change order of offset/type in swap entry

    commit bcd11afa7adad8d720e7ba5ef58bdcd9775cf45f upstream.

    If pages are swapped out, the swap entry is stored in the
    corresponding PTE, which has the Present bit cleared.  CPUs
    vulnerable to L1TF speculate on PTE entries which have the present
    bit set and would treat the swap entry as a physical address (PFN).
    To mitigate that the upper bits of the PTE must be set so the PTE
    points to non-existent memory.

    The swap entry stores the type and the offset of a swapped out page
    in the PTE.  type is stored in bit 9-13 and offset in bit 14-63.  The
    hardware ignores the bits beyond the physical address space limit, so
    to make the mitigation effective it's required to start 'offset' at
    the lowest possible bit so that even large swap offsets do not reach
    into the physical address space limit bits.

    Move offset to bit 9-58 and type to bit 59-63, which are the bits
    that hardware generally doesn't care about.

    That, in turn, means that if you are on a desktop chip with only 40
    bits of physical addressing, now that the offset starts at bit 9,
    there needs to be 30 bits of offset actually *in use* until bit 39
    ends up being set, which means when inverted it will again point into
    existing memory.

    So that's 4 terabyte of swap space (because the offset is counted in
    pages, so 30 bits of offset is 42 bits of actual coverage).  With
    bigger physical addressing, that obviously grows further, until the
    limit of the offset is hit (at 50 bits of offset - 62 bits of actual
    swap file coverage).

    This is a preparatory change for the actual swap entry inversion to
    protect against L1TF.

    [ AK: Updated description and minor tweaks.  Split into two parts ]
    [ tglx: Massaged changelog ]

    Signed-off-by: Linus Torvalds
    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Tested-by: Andi Kleen
    Reviewed-by: Josh Poimboeuf
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Acked-by: Dave Hansen
    [bwh: Backported to 3.16: Bit 9 may be reserved for PAGE_BIT_NUMA
     here]
    Signed-off-by: Ben Hutchings

commit 74ea955c1a1c01dad917e11283899e13d87f56a8
Author: Naoya Horiguchi
Date:   Fri Sep 8 16:10:46 2017 -0700

    mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1

    commit eee4818baac0f2b37848fdf90e4b16430dc536ac upstream.

    _PAGE_PSE is used to distinguish between a truly non-present
    (_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and
    should be treated as present.

    But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which
    would cause confusion between one of those PMDs undergoing a THP
    split, and a soft-dirty PMD.  Dropping the _PAGE_PSE check in
    pmd_present() does not work well, because it can hurt optimization of
    tlb handling in thp split.

    Thus, we need to move the bit.

    In the current kernel, bits 1-4 are not used in non-present format
    since commit 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE
    to work around erratum").  So let's move _PAGE_SWP_SOFT_DIRTY to bit
    1.  Bit 7 is used as reserved (always clear), so please don't use it
    for another purpose.

    Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Zi Yan
    Acked-by: Dave Hansen
    Cc: "H. Peter Anvin"
    Cc: Anshuman Khandual
    Cc: David Nellans
    Cc: Ingo Molnar
    Cc: Kirill A. Shutemov
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    [bwh: Backported to 3.16: Bit 9 may be reserved for PAGE_BIT_NUMA
     here]
    Signed-off-by: Ben Hutchings

commit d6bf6cc66a06ade0ba19d645d52278dfec1f0d12
Author: Dave Hansen
Date:   Thu Jul 7 17:19:11 2016 -0700

    x86/mm: Move swap offset/type up in PTE to work around erratum

    commit 00839ee3b299303c6a5e26a0a2485427a3afcbbf upstream.

    This erratum can result in Accessed/Dirty getting set by the hardware
    when we do not expect them to be (on !Present PTEs).

    Instead of trying to fix them up after this happens, we just allow
    the bits to get set and try to ignore them.  We do this by shifting
    the layout of the bits we use for swap offset/type in our 64-bit
    PTEs.

    It looks like this:

      bitnrs: |     ...     | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
      names:  |     ...     |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
      before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
       after: | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|

    Note that D was already a don't care (X) even before.  We just move
    TYPE up and turn its old spot (which could be hit by the A bit) into
    all don't cares.

    We take 5 bits away from the offset, but that still leaves us with
    50 bits which lets us index into a 62-bit swapfile (4 EiB).  I think
    that's probably fine for the moment.  We could theoretically reclaim
    5 of the bits (1, 2, 3, 4, 7) but it doesn't gain us anything.

    Signed-off-by: Dave Hansen
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: dave.hansen@intel.com
    Cc: linux-mm@kvack.org
    Cc: mhocko@suse.com
    Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar
    [bwh: Backported to 3.16: Bit 9 may be reserved for PAGE_BIT_NUMA,
     which no longer exists upstream.  Adjust the bit numbers accordingly,
     incorporating commit ace7fab7a6cd "x86/mm: Fix swap entry comment and
     macro".]
    Signed-off-by: Ben Hutchings
Shutemov Date: Tue Feb 10 14:10:47 2015 -0800 microblaze: drop _PAGE_FILE and pte_file()-related helpers commit 937fa39fb22fea1c1d8ca9e5f31c452b91ac7239 upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Cc: Michal Simek Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 4bc8d1908e1f3d90b41f6db08bed85301972319d Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:45 2015 -0800 metag: drop _PAGE_FILE and pte_file()-related helpers commit 22f9bf3950f20d24198791685f2dccac2c4ef38a upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Cc: James Hogan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 54d5752f05211fb4af8bfd148a0aa7fb7d41e70b Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:41 2015 -0800 m68k: drop _PAGE_FILE and pte_file()-related helpers commit 1eeda0abf4425c91e7ce3ca32f1908c3a51bf84e upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Cc: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 93aeece684b22447f8df5acc60bee2b5db89faf6 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:39 2015 -0800 m32r: drop _PAGE_FILE and pte_file()-related helpers commit 406b16e26d0996516c8d1641008a7d326bf282d6 upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 7115166ebb95bcd09e32f46a957ce6f5acb28203 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:36 2015 -0800 ia64: drop _PAGE_FILE and pte_file()-related helpers commit 636a002b704e0a36cefb5f4cf0293fab858fc46c upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. This patch also increases the number of bits available for the swap offset. Signed-off-by: Kirill A. Shutemov Cc: Tony Luck Cc: Fenghua Yu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit e73e6781f80e31d4991deba4c922959d14b455e2 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:33 2015 -0800 hexagon: drop _PAGE_FILE and pte_file()-related helpers commit d99f95e6522db22192c331c75de182023a49fbcc upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. This patch also increases the number of bits available for the swap offset. Signed-off-by: Kirill A. Shutemov Cc: Richard Kuo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 4b3b8304e6364740f24f8752b8b3c342c935dd87 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:31 2015 -0800 frv: drop _PAGE_FILE and pte_file()-related helpers commit ca5bfa7b390017f053d7581bc701518b87bc3d43 upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. This patch also increases the number of bits available for the swap offset. Signed-off-by: Kirill A. Shutemov Cc: David Howells Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit b0bad22f41942d5b19ac51ecaf39cdd6aedc04c3 Author: Kirill A.
Shutemov Date: Tue Feb 10 14:10:28 2015 -0800 cris: drop _PAGE_FILE and pte_file()-related helpers commit 103f3d9a26df944f4c29de190d72dfbf913c71af upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Cc: Mikael Starvik Cc: Jesper Nilsson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit d53d67ff422d595b0842c4266850522eee1c3c16 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:25 2015 -0800 c6x: drop pte_file() commit f5b45de9b00eb53d11ada85c61e4ea1c31ab8218 upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Acked-by: Mark Salter Cc: Aurelien Jacquiot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit e2b130e53da6d1acaae0765382e4acc52864684e Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:23 2015 -0800 blackfin: drop pte_file() commit 2bc6ff14d46745a7728ed4ed90c5e0edca91f52e upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Cc: Steven Miao Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 05b880fec74d1f492700dd45d82a11d010b4f564 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:20 2015 -0800 avr32: drop _PAGE_FILE and pte_file()-related helpers commit 7a7d2db4b8b3505a3195178619ffcc80985c4be1 upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Cc: Haavard Skinnemoen Acked-by: Hans-Christian Egtvedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 107d6e0f42c8fddaaf0784a5860f60171e7b4a25 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:17 2015 -0800 arm: drop L_PTE_FILE and pte_file()-related helpers commit b007ea798f5c568d3f464d37288220ef570f062c upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. This patch also adjusts __SWP_TYPE_SHIFT, effectively increasing the maximum size of a swap file to 128G. Signed-off-by: Kirill A. Shutemov Cc: Russell King Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 15dc111e16f0015c3d979c0e400044aa35c02a30 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:15 2015 -0800 arm64: drop PTE_FILE and pte_file()-related helpers commit 9b3e661e58b90b0c2d5c2168c23408f1e59e9e35 upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. This patch also adjusts __SWP_TYPE_SHIFT and increases the number of bits available for the swap offset. Signed-off-by: Kirill A. Shutemov Acked-by: Catalin Marinas Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 5445c123757a283a090c29367bb6a5e478ab7f15 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:12 2015 -0800 arc: drop _PAGE_FILE and pte_file()-related helpers commit 18747151308f9e0fb63766057957617ec4afa190 upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Acked-by: Vineet Gupta Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit a3cb67bad1f1fa3c6c31086f0349bce2bb3be1ed Author: Kirill A.
Shutemov Date: Tue Feb 10 14:10:09 2015 -0800 alpha: drop _PAGE_FILE and pte_file()-related helpers commit b816157a5366550c5ee29a6431ba1abb88721266 upstream. We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 53df494aaa7e2fef9d6f3b272d42e8272327f119 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:07 2015 -0800 asm-generic: drop unused pte_file* helpers commit 5064c8e19dc215afae8ffae95570e7f22062d49c upstream. All users are gone. Signed-off-by: Kirill A. Shutemov Cc: Arnd Bergmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 38ae375257cd4d518a7c9d4fdd7b8c96c4b55c1f Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:04 2015 -0800 mm: remove rest usage of VM_NONLINEAR and pte_file() commit 0661a33611fca12570cba48d9344ce68834ee86c upstream. One bit in ->vm_flags is unused now! Signed-off-by: Kirill A. Shutemov Cc: Dan Carpenter Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: Drop changes in mm/debug.c] Signed-off-by: Ben Hutchings
commit c9222f12a5d9e547e3b7d72983a5951fb945a5a3 Author: Kirill A. Shutemov Date: Tue Feb 10 14:10:02 2015 -0800 mm: replace vma->sharead.linear with vma->shared commit ac51b934f3912582d3c897c6c4d09b32ea57b2c7 upstream. After removing vma->shared.nonlinear we have only one member of the vma->shared union left, which doesn't make much sense. This patch drops the union and moves struct vma->shared.linear to vma->shared. Signed-off-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit e4c2e72564fbefbe691e445d68c86967ec559bb2 Author: Kirill A. Shutemov Date: Tue Feb 10 14:09:59 2015 -0800 rmap: drop support of non-linear mappings commit 27ba0644ea9dfe6e7693abc85837b60e40583b96 upstream. We don't create non-linear mappings anymore. Let's drop the code which handles them in rmap. Signed-off-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: - Deleted code is slightly different - Adjust context] Signed-off-by: Ben Hutchings
commit 9b32e3ea687fa273d4cacdac9f9bc6753904220e Author: Kirill A. Shutemov Date: Tue Feb 10 14:09:57 2015 -0800 proc: drop handling non-linear mappings commit 1da4b35b001481df99a6dcab12d5d39a876f7056 upstream. We have code to handle non-linear mappings for /proc/PID/{smaps,clear_refs}, which is unused now. Let's drop it. Signed-off-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: - Deleted code is slightly different - Adjust context] Signed-off-by: Ben Hutchings
commit ae347f225e91a960c61f7470a3b21bcb4ea5fcfa Author: Kirill A. Shutemov Date: Tue Feb 10 14:09:54 2015 -0800 mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub commit d83a08db5ba6072caa658745881f4baa9bad6a08 upstream. Nobody uses it anymore. [akpm@linux-foundation.org: fix filemap_xip.c] Signed-off-by: Kirill A. Shutemov Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: - Deleted code is slightly different - Adjust context] Signed-off-by: Ben Hutchings
commit b8f97f75196eb1a607813dd51bb8476b832de3da Author: Kirill A.
Shutemov Date: Tue Feb 10 14:09:51 2015 -0800 mm: drop support of non-linear mapping from fault codepath commit 9b4bdd2ffab9557ac43af7dff02e7dab1c8c58bd upstream. We don't create non-linear mappings anymore. Let's drop the code which handles them on page fault. Signed-off-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: - Deleted code is slightly different - Adjust context] Signed-off-by: Ben Hutchings
commit f0de3ca478c66bab1d79609fd38c82d59ba71938 Author: Kirill A. Shutemov Date: Tue Feb 10 14:09:49 2015 -0800 mm: drop support of non-linear mapping from unmap/zap codepath commit 8a5f14a23177061ec11daeaa3d09d0765d785c47 upstream. We have had remap_file_pages(2) emulation in the -mm tree for a few release cycles and we plan to have it mainline in v3.20. This patchset removes the rest of the VM_NONLINEAR infrastructure. Patches 1-8 take care of the generic code. They are pretty straightforward and can be applied without the other patches. The remaining patches remove pte_file()-related stuff from architecture-specific code. This usually frees up one bit in the non-present pte. I've tried to reuse that bit for the swap offset where I was able to figure out how to do that. For obvious reasons I cannot test all that arch-specific code and would like to see acks from maintainers. In total, remap_file_pages(2) required about 1.4K lines of not-so-trivial kernel code. That's too much for functionality nobody uses. Tested-by: Felipe Balbi This patch (of 38): We don't create non-linear mappings anymore. Let's drop the code which handles them on unmap/zap. Signed-off-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings
commit 83051c4473eba07294eb73f9246c86b2c35bfda1 Author: Kirill A. Shutemov Date: Wed Feb 17 13:11:15 2016 -0800 mm: fix regression in remap_file_pages() emulation commit 48f7df329474b49d83d0dffec1b6186647f11976 upstream. Grazvydas Ignotas has reported a regression in remap_file_pages() emulation. Testcase:

  #define _GNU_SOURCE
  #include <sys/mman.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <assert.h>

  #define SIZE (4096 * 3)

  int main(int argc, char **argv)
  {
          unsigned long *p;
          long i;

          p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED) {
                  perror("mmap");
                  return -1;
          }

          for (i = 0; i < SIZE / 4096; i++)
                  p[i * 4096 / sizeof(*p)] = i;

          if (remap_file_pages(p, 4096, 0, 1, 0)) {
                  perror("remap_file_pages");
                  return -1;
          }

          if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) {
                  perror("remap_file_pages");
                  return -1;
          }

          assert(p[0] == 1);

          munmap(p, SIZE);
          return 0;
  }

The second remap_file_pages() fails with -EINVAL. The reason is that the remap_file_pages() emulation assumes that the target VMA covers the whole area we want to overmap. That assumption is broken by the first remap_file_pages() call: it splits the area into two VMAs. The solution is to check the next adjacent VMAs and see whether they map the same file with the same flags. Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation") Signed-off-by: Kirill A. Shutemov Reported-by: Grazvydas Ignotas Tested-by: Grazvydas Ignotas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings
commit 097f98edde717ce09f217d8a285fe357dcd29fd1 Author: Kirill A. Shutemov Date: Tue Feb 10 14:09:46 2015 -0800 mm: replace remap_file_pages() syscall with emulation commit c8d78c1823f46519473949d33f0d1d33fe21ea16 upstream.
remap_file_pages(2) was invented to make it possible to efficiently map parts of a huge file into a limited 32-bit virtual address space, such as in database workloads. Non-linear mappings are a pain to support and it seems there are no legitimate use-cases nowadays since 64-bit systems are widely available. Let's drop it and get rid of all this special-cased code. The patch replaces the syscall with emulation which creates a new VMA on each remap_file_pages(), unless it can be merged with an adjacent one. I didn't find *any* real code that uses remap_file_pages(2) to test the emulation impact on. I've checked Debian code search and the source of all packages in ALT Linux. No real users: libc wrappers, mentions in strace, gdb, valgrind and this kind of stuff. There are a few basic tests in LTP for the syscall. They work just fine with the emulation. To test the performance impact, I've written a small test case which demonstrates a pretty much worst-case scenario: map a 4G shmfs file, write the pgoff of the page to the beginning of every page, remap the pages in reverse order, read every page. The test creates 1 million VMAs if the emulation is in use, so I had to set vm.max_map_count to 1100000 to avoid -ENOMEM.

Before: 23.3 ( +- 4.31% ) seconds
After: 43.9 ( +- 0.85% ) seconds
Slowdown: 1.88x

I believe we can live with that. Test case:

  #define _GNU_SOURCE
  #include <sys/mman.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <assert.h>

  #define MB (1024UL * 1024)
  #define SIZE (4096 * MB)

  int main(int argc, char **argv)
  {
          unsigned long *p;
          long i, pass;

          for (pass = 0; pass < 10; pass++) {
                  p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                  if (p == MAP_FAILED) {
                          perror("mmap");
                          return -1;
                  }

                  for (i = 0; i < SIZE / 4096; i++)
                          p[i * 4096 / sizeof(*p)] = i;

                  for (i = 0; i < SIZE / 4096; i++) {
                          if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
                                               0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
                                  perror("remap_file_pages");
                                  return -1;
                          }
                  }

                  for (i = SIZE / 4096 - 1; i >= 0; i--)
                          assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);

                  munmap(p, SIZE);
          }

          return 0;
  }

[akpm@linux-foundation.org: fix spello] [sasha.levin@oracle.com: initialize populate before usage] [sasha.levin@oracle.com: grab file ref to prevent race while mmapping] Signed-off-by: "Kirill A. Shutemov" Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Dave Jones Cc: Linus Torvalds Cc: Armin Rigo Signed-off-by: Sasha Levin Cc: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: - Deleted code is slightly different - Adjust context] Signed-off-by: Ben Hutchings
commit 26ec5b3af4fc556073fc16be8f7d2102783422ab Author: Ben Hutchings Date: Thu Sep 27 21:31:31 2018 +0100 x86/cpufeatures: Show KAISER in cpuinfo I noticed that in the upstream kernel PTI is exposed in /proc/cpuinfo and in the 4.4 and 4.9 stable branches KAISER is exposed similarly, but for some reason the backport to 3.16 hid it. Change this to match the other branches. Signed-off-by: Ben Hutchings
commit d566d0f9645aadeaaf76a1502bf3c9bc2dc0e2ab Author: Juergen Gross Date: Thu Jun 21 10:43:31 2018 +0200 x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths commit 74899d92e66663dc7671a8017b3146dcd4735f3b upstream. Commit: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") ... added speculative_store_bypass_ht_init() to the per-CPU initialization sequence. speculative_store_bypass_ht_init() needs to be called on each CPU for PV guests, too.
Reported-by: Brian Woods Tested-by: Brian Woods Signed-off-by: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: boris.ostrovsky@oracle.com Cc: xen-devel@lists.xenproject.org Fixes: 1f50ddb4f4189243c05926b842dc1a0332195f31 ("x86/speculation: Handle HT correctly on AMD") Link: https://lore.kernel.org/lkml/20180621084331.21228-1-jgross@suse.com Signed-off-by: Ingo Molnar [bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings
commit ca5ab05a1e24188c04ab2782c4df4b9cc0a49591 Author: Thomas Gleixner Date: Sun Aug 12 20:41:45 2018 +0200 KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled commit 024d83cadc6b2af027e473720f3c3da97496c318 upstream. Mikhail reported the following lockdep splat:

  WARNING: possible irq lock inversion dependency detected
  CPU 0/KVM/10284 just changed the state of lock:
  000000000d538a88 (&st->lock){+...}, at: speculative_store_bypass_update+0x10b/0x170
  but this lock was taken by another, HARDIRQ-safe lock in the past:
  (&(&sighand->siglock)->rlock){-.-.}
  and interrupts could create inverse lock ordering between them.

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&st->lock);
                                local_irq_disable();
                                lock(&(&sighand->siglock)->rlock);
                                lock(&st->lock);
   <Interrupt>
     lock(&(&sighand->siglock)->rlock);

   *** DEADLOCK ***

The code path which connects those locks is:

  speculative_store_bypass_update()
  ssb_prctl_set()
  do_seccomp()
  do_syscall_64()

In svm_vcpu_run() speculative_store_bypass_update() is called with interrupts enabled via x86_virt_spec_ctrl_set_guest/host(). This is actually a false positive, because GIF=0 so interrupts are disabled even if IF=1; however, we can easily move the invocations of x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to cure it, and it's a good idea to keep the GIF=0/IF=1 area as small and self-contained as possible. Fixes: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") Reported-by: Mikhail Gavrilov Signed-off-by: Thomas Gleixner Tested-by: Mikhail Gavrilov Cc: Joerg Roedel Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Matthew Wilcox Cc: Borislav Petkov Cc: Konrad Rzeszutek Wilk Cc: Tom Lendacky Cc: kvm@vger.kernel.org Cc: x86@kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings
commit 9fa8aec27b80d28da9fc308f51229bc8ee25ee52 Author: Konrad Rzeszutek Wilk Date: Mon May 21 17:54:49 2018 -0400 KVM/VMX: Expose SSBD properly to guests commit 0aa48468d00959c8a37cd3ac727284f4f7359151 upstream. X86_FEATURE_SSBD is a synthetic CPU feature - that is, its bit location has no relevance to the real CPUID 0x7.EBX[31] bit position. For that we need the new CPU feature name. Fixes: 52817587e706 ("x86/cpufeatures: Disentangle SSBD enumeration") Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Cc: kvm@vger.kernel.org Cc: "Radim Krčmář" Cc: "H. Peter Anvin" Cc: Paolo Bonzini Link: https://lkml.kernel.org/r/20180521215449.26423-2-konrad.wilk@oracle.com [bwh: Backported to 3.16: - Fix guest_cpuid_has_spec_ctrl() as well - Adjust context] Signed-off-by: Ben Hutchings
commit 5a9cbccff42fdecd30daaf8e88d4779cce055ac7 Author: Konrad Rzeszutek Wilk Date: Wed May 16 23:18:09 2018 -0400 x86/bugs: Rename SSBD_NO to SSB_NO commit 240da953fcc6a9008c92fae5b1f727ee5ed167ab upstream. The "336996 Speculative Execution Side Channel Mitigations" from May defines this as SSB_NO, hence let's sync up.
Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings
commit dde241727d8213c0f29102642a6be2629df4c596 Author: Tom Lendacky Date: Thu May 10 22:06:39 2018 +0200 KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD commit bc226f07dcd3c9ef0b7f6236fe356ea4a9cb4769 upstream. Expose the new virtualized architectural mechanism, VIRT_SSBD, for using speculative store bypass disable (SSBD) under SVM. This will allow guests to use SSBD on hardware that uses non-architectural mechanisms for enabling SSBD. [ tglx: Folded the migration fixup from Paolo Bonzini ] Signed-off-by: Tom Lendacky Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: - There is no SMM support or cpu_has_high_real_mode_segbase operation - Adjust context] Signed-off-by: Ben Hutchings
commit 8963b10319ec195059f8a65c049303f84cb02d38 Author: Thomas Gleixner Date: Thu May 10 20:42:48 2018 +0200 x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG commit 47c61b3955cf712cadfc25635bf9bc174af030ea upstream. Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to x86_virt_spec_ctrl(). If either X86_FEATURE_LS_CFG_SSBD or X86_FEATURE_VIRT_SPEC_CTRL is set then use the new guest_virt_spec_ctrl argument to check whether the state must be modified on the host. The update reuses speculative_store_bypass_update() so the ZEN-specific sibling coordination can be reused. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings
commit fbb7b98887d4fe5e556b2146857b9c43b6c469f3 Author: Thomas Gleixner Date: Sat May 12 20:10:00 2018 +0200 x86/bugs: Rework spec_ctrl base and mask logic commit be6fcb5478e95bb1c91f489121238deb3abca46a upstream. x86_spec_ctrl_mask is intended to mask out bits from an MSR_SPEC_CTRL value which are not to be modified. However the implementation is not really used and the bitmask was inverted to make a check easier, which was removed in "x86/bugs: Remove x86_spec_ctrl_set()". Aside from that, it is missing the STIBP bit if that is supported by the platform, so if the mask were used in x86_virt_spec_ctrl() it would prevent a guest from setting STIBP. Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to sanitize the value which is supplied by the guest. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings
commit 4e99bb051d3e60dbb323c5562375c96f56d56ec4 Author: Thomas Gleixner Date: Sat May 12 20:53:14 2018 +0200 x86/bugs: Remove x86_spec_ctrl_set() commit 4b59bdb569453a60b752b274ca61f009e37f4dae upstream. x86_spec_ctrl_set() is only used in bugs.c and the extra mask checks there provide no real value as both call sites can just write x86_spec_ctrl_base to MSR_SPEC_CTRL. x86_spec_ctrl_base is valid and does not need any extra masking or checking. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings
commit 9433d17d1407cb6b858a7f3d9bd5b21e5e69d5da Author: Thomas Gleixner Date: Sat May 12 20:49:16 2018 +0200 x86/bugs: Expose x86_spec_ctrl_base directly commit fa8ac4988249c38476f6ad678a4848a736373403 upstream.
x86_spec_ctrl_base is the system-wide default value for the SPEC_CTRL MSR. x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to prevent modification of that variable. Though the variable is read-only after init and globally visible already. Remove the function and export the variable instead. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings
commit f486422f0959ab05b8e4f694e8f31b590e7554ad Author: Borislav Petkov Date: Sat May 12 00:14:51 2018 +0200 x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host} commit cc69b34989210f067b2c51d5539b5f96ebcc3a01 upstream. The function bodies are very similar and are going to grow more almost-identical code. Add a bool arg to determine whether SPEC_CTRL is being set for the guest or restored to the host. No functional changes. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings
commit dbfa4250cef087f7b7809f5031301d1e78135145 Author: Thomas Gleixner Date: Thu May 10 20:31:44 2018 +0200 x86/speculation: Rework speculative_store_bypass_update() commit 0270be3e34efb05a88bc4c422572ece038ef3608 upstream. The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse speculative_store_bypass_update() to avoid code duplication. Add an argument for supplying a thread info (TIF) value and create a wrapper speculative_store_bypass_update_current() which is used at the existing call site. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings
commit 9ed451be3e8c5f0e23537925d00483d08f2f3ca1 Author: Tom Lendacky Date: Thu May 17 17:09:18 2018 +0200 x86/speculation: Add virtualized speculative store bypass disable support commit 11fb0683493b2da112cd64c9dada221b52463bf7 upstream. Some AMD processors only support a non-architectural means of enabling speculative store bypass disable (SSBD). To allow a simplified view of this to a guest, an architectural definition has been created through a new CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a hypervisor can virtualize the existence of this definition and provide an architectural method for using SSBD to a guest. Add the new CPUID feature, the new MSR and update the existing SSBD support to use this MSR when present. Signed-off-by: Tom Lendacky Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: - This CPUID word is feature word 11 - Adjust filenames, context] Signed-off-by: Ben Hutchings
commit 70d5392690143d20185c144003e4acdb92203eac Author: Thomas Gleixner Date: Wed May 9 23:01:01 2018 +0200 x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL commit ccbcd2674472a978b48c91c1fbfb66c0ff959f24 upstream. AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care about the bit position of the SSBD bit and thus facilitate migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host. Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR.
Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings
commit af108cc9145299e70235a574e65b5b34ed13ef9c Author: Thomas Gleixner Date: Wed May 9 21:53:09 2018 +0200 x86/speculation: Handle HT correctly on AMD commit 1f50ddb4f4189243c05926b842dc1a0332195f31 upstream. The AMD64_LS_CFG MSR is a per-core MSR on Family 17H CPUs. That means when hyperthreading is enabled the SSBD bit toggle needs to take both cores into account. Otherwise the following situation can happen:

  CPU0          CPU1
  disable SSB
                disable SSB
                enable SSB  <- Enables it for the Core, i.e. for CPU0 as well

So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled again. On Intel the SSBD control is per core as well, but the synchronization logic is implemented behind the per-thread SPEC_CTRL MSR. It works like this:

  CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL

i.e. if one of the threads enables a mitigation then this affects both, and the mitigation is only disabled in the core when both threads disabled it. Add the necessary synchronization logic for AMD family 17H. Unfortunately that requires a spinlock to serialize the access to the MSR, but the locks are only shared between siblings. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: - s/topology_sibling_cpumask/topology_thread_cpumask/ - Adjust context] Signed-off-by: Ben Hutchings
commit 340cfe03f30081d124632fe7408065b1bb54adeb Author: Thomas Gleixner Date: Thu May 10 16:26:00 2018 +0200 x86/cpufeatures: Add FEATURE_ZEN commit d1035d971829dcf80e8686ccde26f94b0a069472 upstream. Add a ZEN feature bit so family-dependent static_cpu_has() optimizations can be built for ZEN. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: - Use the next available bit number in CPU feature word 7 - Adjust filename, context] Signed-off-by: Ben Hutchings
commit 8a12c48125b5752e5f5a2aa55cd218d455a119c2 Author: Thomas Gleixner Date: Thu May 10 20:21:36 2018 +0200 x86/cpufeatures: Disentangle SSBD enumeration commit 52817587e706686fcdb27f14c1b000c92f266c96 upstream. The SSBD enumeration is, similarly to the other bits, magically shared between Intel and AMD, though the mechanisms are different. Make X86_FEATURE_SSBD synthetic and set it depending on the vendor-specific features or family-dependent setup. Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is controlled via MSR_SPEC_CTRL and fix up the usage sites. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: - Use the next available bit number in CPU feature word 7 - Adjust filename, context] Signed-off-by: Ben Hutchings
commit afd515a46ecc30dd613d70657ddcaf16939ad2d7 Author: Thomas Gleixner Date: Thu May 10 19:13:18 2018 +0200 x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS commit 7eb8956a7fec3c1f0abc2a5517dada99ccc8a961 upstream.
The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on Intel and implied by IBRS or STIBP support on AMD. That's just confusing, and in case an AMD CPU does not support IBRS because the underlying problem has been fixed, but has another valid bit in the SPEC_CTRL MSR, the whole thing falls apart. Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the availability on both Intel and AMD. While at it, replace the boot_cpu_has() checks with static_cpu_has() where possible. This prevents late microcode loading from exposing SPEC_CTRL, but late loading is already very limited as it does not reevaluate the mitigation options and other bits and pieces. Having static_cpu_has() is the simplest and least fragile solution. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: - Use the next available bit number in CPU feature word 7 - Adjust filename, context] Signed-off-by: Ben Hutchings
commit f93d91a5b5345998092d9db00c8be9683207cbea Author: Borislav Petkov Date: Wed May 2 18:15:14 2018 +0200 x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP commit e7c587da125291db39ddf1f49b18e5970adbac17 upstream. Intel and AMD have different CPUID bits for these, hence use synthetic bits which get set for the respective vendor in init_speculation_control(), so that debacles like the one the commit message of c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") talks about don't happen anymore. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Jörg Otte Cc: Linus Torvalds Cc: "Kirill A. Shutemov" Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: - Use the next available bit numbers in CPU feature word 7 - Adjust filename, context] Signed-off-by: Ben Hutchings
commit 39efb5c8a13f65a2a14a97c0b9f43f65d3e7633c Author: Thomas Gleixner Date: Fri May 11 15:21:01 2018 +0200 KVM: SVM: Move spec control call after restore of GS commit 15e6c22fd8e5a42c5ed6d487b7c9fe44c2517765 upstream. svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current' to determine the host SSBD state of the thread. 'current' is GS-based, but the host GS is not yet restored and the access causes a triple fault. Move the call after the host GS restore. Fixes: 885f82bfbc6f ("x86/process: Allow runtime control of Speculative Store Bypass") Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Paolo Bonzini Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings
commit 67d57c4c51a8e9849cdb79ec795b499b760efc83 Author: Jim Mattson Date: Sun May 13 17:33:57 2018 -0400 x86/cpu: Make alternative_msr_write work for 32-bit code commit 5f2b745f5e1304f438f9b2cd03ebc8120b6e0d3b upstream. Cast val and (val >> 32) to (u32), so that they fit in a general-purpose register in both 32-bit and 64-bit code.
[ tglx: Made it u32 instead of uintptr_t ] Fixes: c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") Signed-off-by: Jim Mattson Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Acked-by: Linus Torvalds Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings
commit 3b5fdffe9114e3c0cc07e5d8e6d6f89bb0d07847 Author: Konrad Rzeszutek Wilk Date: Fri May 11 16:50:35 2018 -0400 x86/bugs: Fix the parameters alignment and missing void commit ffed645e3be0e32f8e9ab068d257aee8d0fe8eec upstream. Fixes: 7bb4d366c ("x86/bugs: Make cpu_show_common() static") Fixes: 24f7fc83b ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation") Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings
commit 02a92f36ec3af60e14ce516028b5ff82ca472a40 Author: Jiri Kosina Date: Thu May 10 22:47:32 2018 +0200 x86/bugs: Make cpu_show_common() static commit 7bb4d366cba992904bffa4820d24e70a3de93e76 upstream. cpu_show_common() is not used outside of arch/x86/kernel/cpu/bugs.c, so make it static. Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings
commit eca4fd360aeea799e1f65a58e97f4f9ce89df4fe Author: Jiri Kosina Date: Thu May 10 22:47:18 2018 +0200 x86/bugs: Fix __ssb_select_mitigation() return type commit d66d8ff3d21667b41eddbe86b35ab411e40d8c5f upstream. __ssb_select_mitigation() returns one of the members of enum ssb_mitigation, not ssb_mitigation_cmd; fix the prototype to reflect that. Fixes: 24f7fc83b9204 ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation") Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings
commit 704a0bcfa8751fa8d3381a2c375a941a3643ca6f Author: Borislav Petkov Date: Tue May 8 15:43:45 2018 +0200 Documentation/spec_ctrl: Do some minor cleanups commit dd0792699c4058e63c0715d9a7c2d40226fcdddc upstream. Fix some typos, improve formulations, end sentences with a full stop. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings
commit 6be901da18db5d686b293a16688adb8eecdc7fc3 Author: Konrad Rzeszutek Wilk Date: Wed May 9 21:41:38 2018 +0200 proc: Use underscores for SSBD in 'status' commit e96f46ee8587607a828f783daa6eb5b44d25004d upstream. The style for the 'status' file is CamelCase or this_one. Fixes: fae1fa0fc ("proc: Provide details on speculation flaw mitigations") Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings
commit 9ba2801ac0e44bbe21a5aba4a2e599f9a0792e9a Author: Konrad Rzeszutek Wilk Date: Wed May 9 21:41:38 2018 +0200 x86/bugs: Rename _RDS to _SSBD commit 9f65fb29374ee37856dbad847b4e121aab72b510 upstream. Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2] as SSBD (Speculative Store Bypass Disable). Hence changing it. It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name is going to be. Following the rename it would be SSBD_NO, but that rolls out to Speculative Store Bypass Disable No. Also fixed the missing space in X86_FEATURE_AMD_SSBD.
[ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: - Update guest_cpuid_has_spec_ctrl() rather than vmx_{get,set}_msr() - Update _TIF_WORK_MASK and _TIF_ALLWORK_MASK - Adjust filenames, context] Signed-off-by: Ben Hutchings
commit f8983158cf25740f5fed217344c446928b521f06 Author: Kees Cook Date: Thu May 3 14:37:54 2018 -0700 x86/speculation: Make "seccomp" the default mode for Speculative Store Bypass commit f21b53b20c754021935ea43364dbf53778eeba32 upstream. Unless explicitly opted out of, anything running under seccomp will have SSB mitigations enabled. Choosing the "prctl" mode will disable this. [ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ] Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings
commit 34be01c449e2f06bf019979efde3bbf9c5b45c82 Author: Thomas Gleixner Date: Fri May 4 15:12:06 2018 +0200 seccomp: Move speculation migitation control to arch code commit 8bf37d8c067bb7eb8e7c381bdadf9bd89182b6bc upstream. The mitigation control is simpler to implement in architecture code as it avoids the extra function call to check the mode. Aside from that, having an explicit seccomp-enabled mode in the architecture mitigations would require even more workarounds. Move it into architecture code and provide a weak function in the seccomp code. Remove the 'which' argument as this allows the architecture to decide which mitigations are relevant for seccomp. Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings
commit 92856049f6e54b124805b3335c84c79937934655 Author: Kees Cook Date: Thu May 3 14:56:12 2018 -0700 seccomp: Add filter flag to opt-out of SSB mitigation commit 00a02d0c502a06d15e07b857f8ff921e3e402675 upstream. If a seccomp user is not interested in Speculative Store Bypass mitigation by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when adding filters. Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: - We don't support SECCOMP_FILTER_FLAG_TSYNC or SECCOMP_FILTER_FLAG_LOG - Drop selftest changes] Signed-off-by: Ben Hutchings
commit 9e48fb3d80a91074ba3f7e0219edde53ec03ef92 Author: Thomas Gleixner Date: Fri May 4 09:40:03 2018 +0200 seccomp: Use PR_SPEC_FORCE_DISABLE commit b849a812f7eb92e96d1c8239b06581b2cfd8b275 upstream. Use PR_SPEC_FORCE_DISABLE in seccomp() because seccomp does not allow restrictions to be widened. Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings
commit 0b51874e298c685fd20d68f075c70b63b54633f2 Author: Thomas Gleixner Date: Thu May 3 22:09:15 2018 +0200 prctl: Add force disable speculation commit 356e4bfff2c5489e016fdb925adbf12a1e3950ee upstream. For certain use cases it is desired to enforce mitigations so they cannot be undone afterwards. That's important for loader stubs which want to prevent a child from disabling the mitigation again. Will also be used for seccomp(). The extra preservation of the prctl state for SSB is a preparatory step for eBPF dynamic speculation control. Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings
commit 284aa1550489336c3e5fd7b7ea3269b6ad96fe01 Author: Kees Cook Date: Tue May 1 15:07:31 2018 -0700 seccomp: Enable speculation flaw mitigations commit 5c3070890d06ff82eecb808d02d2ca39169533ef upstream.
When speculation flaw mitigations are opt-in (via prctl), using seccomp will automatically opt in to these protections, since using seccomp indicates at least some level of sandboxing is desired. Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: - Apply to current task - Adjust context] Signed-off-by: Ben Hutchings
commit c0f77718114bfdc56711dbaa411825839b7a190e Author: Kees Cook Date: Tue May 1 15:31:45 2018 -0700 proc: Provide details on speculation flaw mitigations commit fae1fa0fc6cca8beee3ab8ed71d54f9a78fa3f64 upstream. As done with seccomp and no_new_privs, also show the speculation flaw mitigation state in /proc/$pid/status. Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings
commit 6b065a1d23b27cc826b1f86b01c800d9a86efb71 Author: Kees Cook Date: Tue May 1 15:19:04 2018 -0700 nospec: Allow getting/setting on non-current task commit 7bbf1373e228840bb0295a2ca26d548ef37f448e upstream. Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than current. This is needed both for /proc/$pid/status queries and for seccomp (since thread-syncing can trigger seccomp in non-current threads). Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings
commit 4781d92a0499db5f1840bf1570960e68c4872c3d Author: Thomas Gleixner Date: Sun Apr 29 15:26:40 2018 +0200 x86/speculation: Add prctl for Speculative Store Bypass mitigation commit a73ec77ee17ec556fe7f165d00314cb7c047b1ac upstream. Add prctl-based control for the Speculative Store Bypass mitigation and make it the default mitigation for Intel and AMD. Andi Kleen provided the following rationale (slightly redacted):

There are multiple levels of impact of Speculative Store Bypass:
1) JITed sandbox. It cannot invoke system calls, but can do PRIME+PROBE and may have call interfaces to other code.
2) Native code process. No protection inside the process at this level.
3) Kernel.
4) Between processes.

The prctl tries to protect against case (1) doing attacks. If the untrusted code can do random system calls then control is already lost in a much worse way. So there needs to be system call protection in some way (using a JIT not allowing them or seccomp). Or rather, if the process can subvert its environment somehow to do the prctl it can already execute arbitrary code, which is much worse than SSB. To put it differently, the point of the prctl is to not allow JITed code to read data it shouldn't read from its JITed sandbox. If it has already escaped its sandbox then it can already read everything it wants in its address space, and do much worse. The ability to control Speculative Store Bypass allows the protection to be enabled selectively without affecting overall system performance. Based on an initial patch from Tim Chen. Completely rewritten. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings
commit acdf4971010ee1f01b7e8986a9ef11b8c88b85c8 Author: Thomas Gleixner Date: Sun Apr 29 15:21:42 2018 +0200 x86/process: Allow runtime control of Speculative Store Bypass commit 885f82bfbc6fefb6664ea27965c3ab9ac4194b8c upstream. The Speculative Store Bypass vulnerability can be mitigated with the Reduced Data Speculation (RDS) feature. To allow finer-grained control of this eventually expensive mitigation a per-task mitigation control is required. Add a new TIF_RDS flag and put it into the group of TIF flags which are evaluated for mismatch in switch_to().
If these bits differ in the previous and the next task, then the slow-path function __switch_to_xtra() is invoked. Implement the TIF_RDS-dependent mitigation control in the slow path. If the prctl for controlling Speculative Store Bypass is disabled or no task uses the prctl then there is no overhead in the switch_to() fast path. Update the KVM-related speculation control functions to take TIF_RDS into account as well. Based on a patch from Tim Chen. Completely rewritten. Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Reviewed-by: Konrad Rzeszutek Wilk [bwh: Backported to 3.16: - Exclude _TIF_RDS from _TIF_WORK_MASK and _TIF_ALLWORK_MASK - Adjust filename, context] Signed-off-by: Ben Hutchings
commit 883b3421d7b9c027a34888e690d6aadba1943aac Author: Thomas Gleixner Date: Sun Apr 29 15:20:11 2018 +0200 prctl: Add speculation control prctls commit b617cfc858161140d69cc0b5cc211996b557a1c7 upstream. Add two new prctls to control aspects of speculation-related vulnerabilities and their mitigations to provide finer-grained control over performance-impacting mitigations. PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature which is selected with arg2 of prctl(2). The return value uses bits 0-2 with the following meaning:

Bit  Define           Description
0    PR_SPEC_PRCTL    Mitigation can be controlled per task by PR_SET_SPECULATION_CTRL
1    PR_SPEC_ENABLE   The speculation feature is enabled, mitigation is disabled
2    PR_SPEC_DISABLE  The speculation feature is disabled, mitigation is enabled

If all bits are 0 the CPU is not affected by the speculation misfeature. If PR_SPEC_PRCTL is set, then per-task control of the mitigation is available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation misfeature will fail. PR_SET_SPECULATION_CTRL allows control of the speculation misfeature, which is selected by arg2 of prctl(2), per task. arg3 is used to hand in the control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE. The common return values are:

EINVAL  prctl is not implemented by the architecture or the unused prctl() arguments are not 0
ENODEV  arg2 is selecting a speculation misfeature which is not supported

PR_SET_SPECULATION_CTRL has these additional return values:

ERANGE  arg3 is incorrect, i.e. it's neither PR_SPEC_ENABLE nor PR_SPEC_DISABLE
ENXIO   prctl control of the selected speculation misfeature is disabled

The first supported controllable speculation misfeature is PR_SPEC_STORE_BYPASS. Add the define so this can be shared between architectures. Based on an initial patch from Tim Chen and mostly rewritten. Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Reviewed-by: Konrad Rzeszutek Wilk [bwh: Backported to 3.16: - Add the documentation directly under Documentation/ since there is no userspace-api subdirectory or reST index - Adjust context] Signed-off-by: Ben Hutchings
commit 601100765f6ac070a677669da1d9d96bbe3ec9dd Author: Thomas Gleixner Date: Sun Apr 29 15:01:37 2018 +0200 x86/speculation: Create spec-ctrl.h to avoid include hell commit 28a2775217b17208811fa43a9e96bd1fdf417b86 upstream. Having everything in nospec-branch.h creates a hell of dependencies when adding the prctl-based switching mechanism. Move everything which is not required in nospec-branch.h to spec-ctrl.h and fix up the includes in the relevant files.
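[Illustration, not part of the upstream commits: the PR_{GET,SET}_SPECULATION_CTRL interface described in the prctl commit above can be driven from user space roughly as follows. This is a minimal sketch; the PR_* constants are the ones upstream defines in <linux/prctl.h>, duplicated here only so the example is self-contained and to be treated as assumptions when building against older headers.

  #include <stdio.h>
  #include <sys/prctl.h>

  /* Assumed values, matching upstream <linux/prctl.h>. */
  #ifndef PR_SET_SPECULATION_CTRL
  #define PR_GET_SPECULATION_CTRL 52
  #define PR_SET_SPECULATION_CTRL 53
  #define PR_SPEC_STORE_BYPASS    0
  #define PR_SPEC_PRCTL           (1UL << 0)
  #define PR_SPEC_ENABLE          (1UL << 1)
  #define PR_SPEC_DISABLE         (1UL << 2)
  #endif

  int main(void)
  {
          /* Query the Speculative Store Bypass state of this task. */
          long state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);

          if (state < 0) {
                  perror("PR_GET_SPECULATION_CTRL"); /* EINVAL/ENODEV, see above */
                  return 1;
          }
          /* If per-task control is available, enable the mitigation
             (i.e. disable the speculation misfeature) for this task. */
          if ((state & PR_SPEC_PRCTL) &&
              prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
                    PR_SPEC_DISABLE, 0, 0)) {
                  perror("PR_SET_SPECULATION_CTRL");
                  return 1;
          }
          return 0;
  }
]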
Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Ingo Molnar Signed-off-by: Ben Hutchings
commit 2218ec3989560033dfda6026f8de6f1efaac0438 Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:25 2018 -0400 x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest commit da39556f66f5cfe8f9c989206974f1cb16ca5d7c upstream. Expose the CPUID.7.EDX[31] bit to the guest, and also guard against various combinations of SPEC_CTRL MSR values. The handling of the MSR (to take into account the host value of SPEC_CTRL Bit(2)) is taken care of in the patch: KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar [dwmw2: Handle 4.9 guest CPUID differences, rename guest_cpu_has_ibrs() → guest_cpu_has_spec_ctrl()] Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings
commit 29f1c2ac73cd880aee2578fbfedf7f4179ca172f Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:24 2018 -0400 x86/bugs/AMD: Add support to disable RDS on Fam[15,16,17]h if requested commit 764f3c21588a059cd783c6ba0734d4db2d72822d upstream. AMD does not need the Speculative Store Bypass mitigation to be enabled. The parameters for this are already available and it can be done via MSR C001_1020. Each family uses a different bit in that MSR for this. [ tglx: Expose the bit mask via a variable and move the actual MSR fiddling into the bugs code as that's the right thing to do and also required to prepare for dynamic enable/disable ] Suggested-by: Borislav Petkov Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar [bwh: Backported to 3.16: - Renumber the feature bit - We don't have __ro_after_init - Adjust filename, context] Signed-off-by: Ben Hutchings
commit 4d9f99b5b77826d916216f40304c497c7057eb42 Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:23 2018 -0400 x86/bugs: Whitelist allowed SPEC_CTRL MSR values commit 1115a859f33276fe8afb31c60cf9d8e657872558 upstream. Intel and AMD SPEC_CTRL (0x48) MSR semantics may differ in the future (or in fact use different MSRs for the same functionality). As such a run-time mechanism is required to whitelist the appropriate MSR values. [ tglx: Made the variable __ro_after_init ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar [bwh: Backported to 3.16: - We don't have __ro_after_init - Adjust context] Signed-off-by: Ben Hutchings
commit 0e21a7d7e2a39539fb2771099ce5faa408805f03 Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:22 2018 -0400 x86/bugs/intel: Set proper CPU features and setup RDS commit 772439717dbf703b39990be58d8d4e3e4ad0598a upstream. Intel CPUs expose methods to:

- detect whether the RDS capability is available via CPUID.7.0.EDX[31]
- enable RDS by setting bit 2 of the SPEC_CTRL MSR (0x48)
- learn via MSR_IA32_ARCH_CAPABILITIES Bit(4) that there is no need to enable RDS

With that in mind, if spec_store_bypass_disable=[auto,on] is selected, set the SPEC_CTRL MSR at boot time to enable RDS if the platform requires it. Note that this does not fix the KVM case where the SPEC_CTRL is exposed to guests which can muck with it; see the patch titled: KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS.
And for the firmware (IBRS to be set), see the patch titled: x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits [ tglx: Disentangled it from the intel implementation and kept the call order ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar [bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings
commit 1cd7b5bcb30c69d05964cbf20d7410e82083b95b Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:21 2018 -0400 x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation commit 24f7fc83b9204d20f878c57cb77d261ae825e033 upstream. Contemporary high-performance processors use a common industry-wide optimization known as "Speculative Store Bypass" in which loads from addresses to which a recent store has occurred may (speculatively) see an older value. Intel refers to this feature as "Memory Disambiguation", which is part of their "Smart Memory Access" capability. Memory Disambiguation can expose a cache side-channel attack against such speculatively read values. An attacker can create exploit code that allows them to read memory outside of a sandbox environment (for example, malicious JavaScript in a web page), or to perform more complex attacks against code running within the same privilege level, e.g. via the stack. As a first step to mitigate against such attacks, provide two boot command line control knobs:

 nospec_store_bypass_disable
 spec_store_bypass_disable=[off,auto,on]

By default affected x86 processors will power on with Speculative Store Bypass enabled. Hence the provided kernel parameters are written from the point of view of whether to enable a mitigation or not. The parameters are as follows:

 - auto - Kernel detects whether your CPU model contains an implementation of Speculative Store Bypass and picks the most appropriate mitigation.
 - on - disable Speculative Store Bypass
 - off - enable Speculative Store Bypass

[ tglx: Reordered the checks so that the whole evaluation is not done when the CPU does not support RDS ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar [bwh: Backported to 3.16: - Renumber the feature bit - Adjust filenames, context] Signed-off-by: Ben Hutchings
commit 45f1e691477f26a60dc1a0d3661d58b99848dfdb Author: Konrad Rzeszutek Wilk Date: Sat Apr 28 22:34:17 2018 +0200 x86/cpufeatures: Add X86_FEATURE_RDS commit 0cc5fa00b0a88dad140b4e5c2cead9951ad36822 upstream. Add the CPU feature bit CPUID.7.0.EDX[31] which indicates whether the CPU supports Reduced Data Speculation. [ tglx: Split it out from a later patch ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar [bwh: Backported to 3.16: - This CPUID word is feature word 10 - Adjust filename] Signed-off-by: Ben Hutchings
commit 65d362d590594c95c2f33ba96b314dba6d1b97ab Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:20 2018 -0400 x86/bugs: Expose /sys/../spec_store_bypass commit c456442cd3a59eeb1d60293c26cbe2ff2c4e42cf upstream. Add the sysfs file for the new vulnerability. It does not do much except show the word 'Vulnerable' for recent x86 cores. Intel cores prior to family 6 are known not to be vulnerable, and so are some Atoms and some Xeon Phi. It assumes that older Cyrix, Centaur, etc. cores are immune.
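[Illustration, not part of the upstream commit: with this patch applied, the state can be queried by reading /sys/devices/system/cpu/vulnerabilities/spec_store_bypass, which reports strings such as 'Not affected' or 'Vulnerable', and, once the mitigation patches later in this series are in place, a 'Mitigation: ...' line describing the active mitigation.]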
Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar [bwh: Backported to 3.16: - Renumber X86_BUG_SPEC_STORE_BYPASS - Adjust filename, context] Signed-off-by: Ben Hutchings
commit df53e5d9c5c7debdfb99e84caa3903dfc47cbd4e Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:19 2018 -0400 x86/bugs, KVM: Support the combination of guest and host IBRS commit 5cf687548705412da47c9cec342fd952d71ed3d5 upstream. A guest may modify the SPEC_CTRL MSR from the value used by the kernel. Since the kernel doesn't use IBRS, this means a value of zero is what is needed in the host. But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to the other bits as reserved, so the kernel should respect the boot-time SPEC_CTRL value and use that. This allows dealing with future extensions to the SPEC_CTRL interface, if any at all. Note: This uses wrmsrl() instead of native_wrmsrl(). It does not make any difference as paravirt will over-write the callq *0xfff.. with the wrmsrl assembler code. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar Signed-off-by: Ben Hutchings
commit 2ba071eb39d6691d6d29eec434448766dcdc2f5d Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:18 2018 -0400 x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits commit 1b86883ccb8d5d9506529d42dbe1a5257cb30b18 upstream. The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all the other bits as reserved. The Intel SDM glossary defines reserved as implementation specific - aka unknown. As such, at bootup this must be taken into account and proper masking applied for the bits in use. A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511 [ tglx: Made x86_spec_ctrl_base __ro_after_init ] Suggested-by: Jon Masters Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar [bwh: Backported to 3.16: - We don't have __ro_after_init - Adjust context] Signed-off-by: Ben Hutchings
commit b72dd55897316db40e0bc2aaa3ac8493ea4ae751 Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:17 2018 -0400 x86/bugs: Concentrate bug reporting into a separate function commit d1059518b4789cabe34bb4b714d07e6089c82ca1 upstream. Those SysFS functions have a similar preamble; as such, make common code to handle them. Suggested-by: Borislav Petkov Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar [bwh: Backported to 3.16: s/X86_FEATURE_PTI/X86_FEATURE_KAISER/] Signed-off-by: Ben Hutchings
commit cacee1eef8ec25cdf5d4f44827b3badfb6818bef Author: Konrad Rzeszutek Wilk Date: Wed Apr 25 22:04:16 2018 -0400 x86/bugs: Concentrate bug detection into a separate function commit 4a28bfe3267b68e22c663ac26185aa16c9b879ef upstream. Combine the various pieces of logic which go through all those x86_cpu_id matching structures into one function. Suggested-by: Borislav Petkov Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Ingo Molnar [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings
commit 4172af7e06994104deeb53e344f53cf4173ce144 Author: Linus Torvalds Date: Tue May 1 15:55:51 2018 +0200 x86/nospec: Simplify alternative_msr_write() commit 1aa7a5735a41418d8e01fa7c9565eb2657e2ea3f upstream.
The macro is not type safe and I did look for why that "g" constraint for the asm doesn't work: it's because the asm is more fundamentally wrong. It does

  movl %[val], %%eax

but "val" isn't a 32-bit value, so then gcc will pass it in a register, and generate code like

  movl %rsi, %eax

and gas will complain about a nonsensical 'mov' instruction (it's moving a 64-bit register to a 32-bit one). Passing it through memory will just hide the real bug - gcc still thinks the memory location is 64-bit, but the "movl" will only load the first 32 bits and it all happens to work because x86 is little-endian. Convert it to a type-safe inline function with a little trick which hands the feature into the ALTERNATIVE macro. Signed-off-by: Linus Torvalds Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Signed-off-by: Ben Hutchings
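[Illustration, not part of the upstream commit text: combining this conversion with the (u32) casts from the "Make alternative_msr_write work for 32-bit code" fix above, the resulting type-safe helper looks essentially like this (upstream keeps it in arch/x86/include/asm/nospec-branch.h; the 3.16 backport uses a different filename). It is kernel code and relies on the kernel's ALTERNATIVE macro and u32/u64 types:

  static inline void alternative_msr_write(unsigned int msr, u64 val,
                                           unsigned int feature)
  {
          /* The %c[feature] operand is the trick mentioned above: it hands
             the CPU feature bit into the ALTERNATIVE macro, so the wrmsr
             is only patched in when the feature is present. */
          asm volatile(ALTERNATIVE("", "wrmsr", %c[feature])
                  : : "c" (msr),
                      "a" ((u32)val),
                      "d" ((u32)(val >> 32)),
                      [feature] "i" (feature)
                  : "memory");
  }
]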