commit 59377679473491963a599bfd51cc9877492312ee
Author: Greg Kroah-Hartman
Date:   Sat Jul 1 13:12:42 2023 +0200

    Linux 6.4.1

    Link: https://lore.kernel.org/r/20230629184151.888604958@linuxfoundation.org
    Tested-by: Ronald Warsow
    Link: https://lore.kernel.org/r/20230630055626.202608973@linuxfoundation.org
    Link: https://lore.kernel.org/r/20230630072101.040486316@linuxfoundation.org
    Tested-by: Ron Economos
    Tested-by: Jon Hunter
    Tested-by: Ronald Warsow
    Tested-by: Rudi Heitbaum
    Tested-by: Linux Kernel Functional Testing
    Signed-off-by: Greg Kroah-Hartman

commit 2aad4f30f4e4702fc3cfff67bc1ed5aa85510aec
Author: Linus Torvalds
Date:   Fri Jun 30 18:24:49 2023 -0700

    xtensa: fix NOMMU build with lock_mm_and_find_vma() conversion

    commit d85a143b69abb4d7544227e26d12c4c7735ab27d upstream.

    It turns out that xtensa has a really odd configuration situation: you
    can do a no-MMU config, but still have the page fault code enabled.
    Which doesn't sound all that sensible, but it turns out that xtensa can
    have protection faults even without the MMU, and we have this:

      config PFAULT
            bool "Handle protection faults" if EXPERT && !MMU
            default y
            help
              Handle protection faults. MMU configurations must enable it.
              noMMU configurations may disable it if used memory map never
              generates protection faults or faults are always fatal.

              If unsure, say Y.

    which completely violated my expectations of the page fault handling.

    End result: Guenter reports that the xtensa no-MMU builds all fail with

      arch/xtensa/mm/fault.c: In function ‘do_page_fault’:
      arch/xtensa/mm/fault.c:133:8: error: implicit declaration of function ‘lock_mm_and_find_vma’

    because I never exposed the new lock_mm_and_find_vma() function for the
    no-MMU case.  Doing so is simple enough, and fixes the problem.

    Reported-and-tested-by: Guenter Roeck
    Fixes: a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman
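
For context on the xtensa fix above: on no-MMU there is no stack expansion
to worry about, so exposing lock_mm_and_find_vma() reduces to a locked VMA
lookup.  A minimal sketch of that shape (an approximation, not the verbatim
patch):

    struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
                                                unsigned long addr,
                                                struct pt_regs *regs)
    {
            struct vm_area_struct *vma;

            mmap_read_lock(mm);
            vma = vma_lookup(mm, addr);     /* no stack expansion on no-MMU */
            if (!vma)
                    mmap_read_unlock(mm);   /* NULL return: lock not held */
            return vma;
    }
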
commit 87d780e048bda5cda800f8ffc5199ecdd53b1412
Author: Linus Torvalds
Date:   Thu Jun 29 23:34:29 2023 -0700

    csky: fix up lock_mm_and_find_vma() conversion

    commit e55e5df193d247a38a5e1ac65a5316a0adcc22fa upstream.

    As already mentioned in my merge message for the 'expand-stack' branch,
    we have something like 24 different versions of the page fault path for
    all our different architectures, all just _slightly_ different due to
    various historical reasons (usually related to exactly when they
    branched off the original i386 version, and the details of the other
    architectures they had in their history).

    And a few of them had some silly mistake in the conversion.

    Most of the architectures call the faulting address 'address' in the
    fault path.  But not all.  Some just call it 'addr'.  And if you end up
    doing a bit too much copy-and-paste, you end up with the wrong version
    in the places that do it differently.

    In this case it was csky.

    Fixes: a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
    Reported-by: Guenter Roeck
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 0d98e5325f1f423be19c6124905133951cc17198
Author: Linus Torvalds
Date:   Thu Jun 29 23:04:57 2023 -0700

    parisc: fix expand_stack() conversion

    commit ea3f8272876f2958463992f6736ab690fde7fa9c upstream.

    In commit 8d7071af8907 ("mm: always expand the stack with the mmap write
    lock held") I tried to deal with the remaining odd page fault handling
    cases.  The oddest one is ia64, which has stacks that grow both up and
    down.  And because ia64 was _so_ odd, I asked people to verify the end
    result.

    But a close second oddity is parisc, which is the only one that has a
    main stack growing up (our "CONFIG_STACK_GROWSUP" config option).  But
    it looked obvious enough that I didn't worry about it.

    I should have worried a bit more.  Not because it was particularly
    complex, but because I just used the wrong variable name.  The previous
    vma isn't called "prev", it's called "prev_vma".  Blush.

    Fixes: 8d7071af8907 ("mm: always expand the stack with the mmap write lock held")
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 23d1e960cd12b38000908839c6b2ff494223149e
Author: Linus Torvalds
Date:   Thu Jun 29 20:41:24 2023 -0700

    sparc32: fix lock_mm_and_find_vma() conversion

    commit 0b26eadbf200abf6c97c6d870286c73219cdac65 upstream.

    The sparc32 conversion to lock_mm_and_find_vma() in commit a050ba1e7422
    ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
    missed the fact that we didn't actually have a 'regs' pointer available
    in the 'force_user_fault()' case.

    It's there in the regular page fault path ("do_sparc_fault()"), but not
    the window underflow/overflow paths.

    Which is all fine - we can just pass in a NULL pointer.  The register
    state is only used to avoid deadlock with kernel faults, which is not
    the case for any of these register window faults.

    Reported-by: Stephen Rothwell
    Fixes: a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
    Signed-off-by: Linus Torvalds
    Cc: Naresh Kamboju
    Signed-off-by: Greg Kroah-Hartman

commit 7a11f6e08edf13234cac84ff16f13f8204bc9374
Author: Ricardo Cañuelo
Date:   Thu May 25 14:18:11 2023 +0200

    Revert "thermal/drivers/mediatek: Use devm_of_iomap to avoid resource leak in mtk_thermal_probe"

    commit 86edac7d3888c715fe3a81bd61f3617ecfe2e1dd upstream.

    This reverts commit f05c7b7d9ea9477fcc388476c6f4ade8c66d2d26.

    That change was causing a regression in the generic-adc-thermal-probed
    bootrr test as reported in the kernelci-results list [1].

    A proper rework will take longer, so revert it for now.

    [1] https://groups.io/g/kernelci-results/message/42660

    Fixes: f05c7b7d9ea9 ("thermal/drivers/mediatek: Use devm_of_iomap to avoid resource leak in mtk_thermal_probe")
    Signed-off-by: Ricardo Cañuelo
    Suggested-by: AngeloGioacchino Del Regno
    Reviewed-by: AngeloGioacchino Del Regno
    Signed-off-by: Daniel Lezcano
    Link: https://lore.kernel.org/r/20230525121811.3360268-1-ricardo.canuelo@collabora.com
    Signed-off-by: Greg Kroah-Hartman

commit e6d864166aaf6e4040a3614640262cb796153544
Author: Mike Hommey
Date:   Sun Jun 18 08:09:57 2023 +0900

    HID: logitech-hidpp: add HIDPP_QUIRK_DELAYED_INIT for the T651.

    commit 5fe251112646d8626818ea90f7af325bab243efa upstream.

    commit 498ba2069035 ("HID: logitech-hidpp: Don't restart communication
    if not necessary") put restarting communication behind that flag, and
    this was apparently necessary on the T651, but the flag was not set
    for it.

    Fixes: 498ba2069035 ("HID: logitech-hidpp: Don't restart communication if not necessary")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Hommey
    Link: https://lore.kernel.org/r/20230617230957.6mx73th4blv7owqk@glandium.org
    Signed-off-by: Benjamin Tissoires
    Signed-off-by: Greg Kroah-Hartman

commit 05b47034e2488c2924e5c032e20a1979d012b5b5
Author: Ludvig Michaelsson
Date:   Wed Jun 21 13:17:43 2023 +0200

    HID: hidraw: fix data race on device refcount

    commit 944ee77dc6ec7b0afd8ec70ffc418b238c92f12b upstream.

    The hidraw_open() function increments the hidraw device reference
    counter.  The counter has no dedicated synchronization mechanism,
    resulting in a potential data race when concurrently opening a device.

    The race is a regression introduced by commit 8590222e4b02 ("HID:
    hidraw: Replace hidraw device table mutex with a rwsem").  While
    minors_rwsem is intended to protect the hidraw_table itself, by
    instead acquiring the lock for writing, the reference counter is also
    protected.  This is symmetrical to hidraw_release().

    Link: https://github.com/systemd/systemd/issues/27947
    Fixes: 8590222e4b02 ("HID: hidraw: Replace hidraw device table mutex with a rwsem")
    Cc: stable@vger.kernel.org
    Signed-off-by: Ludvig Michaelsson
    Link: https://lore.kernel.org/r/20230621-hidraw-race-v1-1-a58e6ac69bab@yubico.com
    Signed-off-by: Benjamin Tissoires
    Signed-off-by: Greg Kroah-Hartman
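
The hidraw race and its fix can be pictured with a sketch of the open path
(simplified from the commit description, not the driver source): taking
minors_rwsem for writing means the plain-int reference count is only ever
modified under an exclusive lock, matching hidraw_release().

    static int hidraw_open(struct inode *inode, struct file *file)
    {
            unsigned int minor = iminor(inode);
            struct hidraw *dev;
            int err = 0;

            down_write(&minors_rwsem);      /* was down_read(): the table was
                                             * protected, but dev->open++
                                             * below raced with itself      */
            dev = hidraw_table[minor];
            if (!dev) {
                    err = -ENODEV;
                    goto out;
            }
            dev->open++;                    /* now serialized by the same
                                             * exclusive lock as release    */
            /* rest of the open path elided */
    out:
            up_write(&minors_rwsem);
            return err;
    }
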
commit 14fdcf965dc569f2c879fc2d234d95bc2711d6c2
Author: Zhang Shurong
Date:   Sun Jun 25 00:16:49 2023 +0800

    fbdev: fix potential OOB read in fast_imageblit()

    commit c2d22806aecb24e2de55c30a06e5d6eb297d161d upstream.

    There is a potential OOB read at fast_imageblit, for
    "colortab[(*src >> 4)]" can become a negative value due to
    "const char *s = image->data, *src".  This change makes sure the index
    for colortab always positive or zero.

    Similar commit:
    https://patchwork.kernel.org/patch/11746067

    Potential bug report:
    https://groups.google.com/g/syzkaller-bugs/c/9ubBXKeKXf4/m/k-QXy4UgAAAJ

    Signed-off-by: Zhang Shurong
    Cc: stable@vger.kernel.org
    Signed-off-by: Helge Deller
    Signed-off-by: Greg Kroah-Hartman
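
The fbdev problem is a classic signed-char pitfall; a minimal illustration
with hypothetical names (not the driver code itself): when image data is
read through a plain char pointer, bytes >= 0x80 sign-extend on platforms
where char is signed, and the table index goes negative.

    static u32 colortab[16];

    void example(const void *image_data)
    {
            const char *src = image_data;   /* plain char: may be signed   */
            /* byte 0xA5 promotes to -91, and -91 >> 4 == -6: OOB read     */
            u32 bad = colortab[*src >> 4];

            const u8 *usrc = image_data;    /* unsigned view: 0..255       */
            u32 ok  = colortab[*usrc >> 4]; /* index is always 0..15       */
    }
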
commit 00d5932e09d74ba442aee1f93db59a2d53bb5afd
Author: Hugh Dickins
Date:   Wed Jun 28 21:31:35 2023 -0700

    mm/khugepaged: fix regression in collapse_file()

    commit e8c716bc6812202ccf4ce0f0bad3428b794fb39c upstream.

    There is no xas_pause(&xas) in collapse_file()'s main loop, at the
    points where it does xas_unlock_irq(&xas) and then continues.

    That would explain why, once two weeks ago and twice yesterday, I have
    hit the VM_BUG_ON_PAGE(page != xas_load(&xas), page) since
    "mm/khugepaged: fix iteration in collapse_file" removed the
    xas_set(&xas, index) just before it: xas.xa_node could be left pointing
    to a stale node, if there was concurrent activity on the file which
    transformed its xarray.

    I tried inserting xas_pause()s, but then even bootup crashed on that
    VM_BUG_ON_PAGE(): there appears to be a subtle "nextness" implicit in
    xas_pause().

    xas_next() and xas_pause() are good for use in simple loops, but not in
    this one: xas_set() worked well until now, so use xas_set(&xas, index)
    explicitly at the head of the loop; and change that VM_BUG_ON_PAGE()
    not to need its own xas_set(), and not to interfere with the xa_state
    (which would probably stop the crashes from xas_pause(), but I trust
    that less).

    The user-visible effects of this bug (if VM_BUG_ONs are configured out)
    would be data loss and data leak - potentially - though in practice I
    expect it is more likely that a subsequent check (e.g. on mapping or on
    nr_none) would notice an inconsistency, and just abandon the collapse.

    Link: https://lore.kernel.org/linux-mm/f18e4b64-3f88-a8ab-56cc-d1f5f9c58d4@google.com/
    Fixes: c8a8f3b4a95a ("mm/khugepaged: fix iteration in collapse_file")
    Signed-off-by: Hugh Dickins
    Cc: stable@kernel.org
    Cc: Andrew Morton
    Cc: Matthew Wilcox
    Cc: David Stevens
    Cc: Peter Xu
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit f450d0307644f0efd91ed99f613069302ff52e52
Author: Linus Torvalds
Date:   Sun Jun 25 14:02:25 2023 -0700

    gup: add warning if some caller would seem to want stack expansion

    commit a425ac5365f6cb3cc47bf83e6bff0213c10445f7 upstream.

    It feels very unlikely that anybody would want to do a GUP in an
    unmapped area under the stack pointer, but real users sometimes do some
    really strange things.  So add a (temporary) warning for the case where
    a GUP fails and expanding the stack might have made it work.

    It's trivial to do the expansion in the caller as part of getting the
    mm lock in the first place - see __access_remote_vm() for ptrace, for
    example - it's just that it's unnecessarily painful to do it deep in
    the guts of the GUP lookup when we might have to drop and re-take the
    lock.

    I doubt anybody actually does anything quite this strange, but let's be
    proactive: adding these warnings is simple, and will make debugging it
    much easier if they trigger.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit d0198363f9108e4adb2511e607ba91e44779e8b1
Author: Jason Gerecke
Date:   Thu Jun 8 14:38:28 2023 -0700

    HID: wacom: Use ktime_t rather than int when dealing with timestamps

    commit 9a6c0e28e215535b2938c61ded54603b4e5814c5 upstream.

    Code which interacts with timestamps needs to use the ktime_t type
    returned by functions like ktime_get.  The int type does not offer
    enough space to store these values, and attempting to use it is a
    recipe for problems.  In this particular case, overflows would occur
    when calculating/storing timestamps leading to incorrect values being
    reported to userspace.  In some cases these bad timestamps cause input
    handling in userspace to appear hung.

    Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/901
    Fixes: 17d793f3ed53 ("HID: wacom: insert timestamp to packed Bluetooth (BT) events")
    CC: stable@vger.kernel.org
    Signed-off-by: Jason Gerecke
    Reviewed-by: Benjamin Tissoires
    Link: https://lore.kernel.org/r/20230608213828.2108-1-jason.gerecke@wacom.com
    Signed-off-by: Benjamin Tissoires
    Signed-off-by: Greg Kroah-Hartman
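
The wacom issue above is the general "narrow type for ktime" bug class; a
minimal illustration (not the driver change itself; "prev" stands for an
earlier saved ktime_t): ktime_t is a 64-bit nanosecond count, so an int
overflows after about 2.1 seconds' worth of nanoseconds.

    ktime_t prev;                           /* hypothetical: set earlier   */

    void sample(void)
    {
            int bad = ktime_get();          /* s64 silently truncated to
                                             * 32 bits: wraps every ~2.1 s */
            ktime_t now = ktime_get();      /* correct: keep all 64 bits   */
            s64 ms = ktime_ms_delta(now, prev);
    }
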
commit fb32951c89030c5f9944ca8aa10301d7eb733b49
Author: Linus Torvalds
Date:   Sat Jun 24 13:45:51 2023 -0700

    mm: always expand the stack with the mmap write lock held

    commit 8d7071af890768438c14db6172cc8f9f4d04e184 upstream.

    This finishes the job of always holding the mmap write lock when
    extending the user stack vma, and removes the 'write_locked' argument
    from the vm helper functions again.

    For some cases, we just avoid expanding the stack at all: drivers and
    page pinning really shouldn't be extending any stacks.  Let's see if
    any strange users really wanted that.

    It's worth noting that architectures that weren't converted to the new
    lock_mm_and_find_vma() helper function are left using the legacy
    "expand_stack()" function, but it has been changed to drop the
    mmap_lock and take it for writing while expanding the vma.  This makes
    it fairly straightforward to convert the remaining architectures.

    As a result of dropping and re-taking the lock, the calling conventions
    for this function have also changed, since the old vma may no longer be
    valid.  So it will now return the new vma if successful, and NULL - and
    the lock dropped - if the area could not be extended.

    Tested-by: Vegard Nossum
    Tested-by: John Paul Adrian Glaubitz # ia64
    Tested-by: Frank Scheiner # ia64
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman
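
Under the new convention described above, a legacy caller looks roughly
like this sketch (not lifted from a particular architecture;
bad_area_nosemaphore() is a hypothetical stand-in for the arch error path):

    mmap_read_lock(mm);
    vma = find_vma(mm, address);
    if (!vma || vma->vm_start > address) {
            vma = expand_stack(mm, address);        /* drops the read lock,
                                                     * takes it for writing,
                                                     * expands, downgrades
                                                     * on success          */
            if (!vma) {
                    /* mmap_lock is ALREADY dropped here: do not unlock */
                    bad_area_nosemaphore();         /* hypothetical */
                    return;
            }
    }
    /* vma covers address; mmap_lock is held for read again */
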
commit af099fa739b8408ea3b3c854a3098a6fd94855cb
Author: Linus Torvalds
Date:   Mon Jun 19 11:34:15 2023 -0700

    execve: expand new process stack manually ahead of time

    commit f313c51d26aa87e69633c9b46efb37a930faca71 upstream.

    This is a small step towards a model where GUP itself would not expand
    the stack, and any user that needs GUP to not look up existing
    mappings, but actually expand on them, would have to do so manually
    before-hand, and with the mm lock held for writing.

    It turns out that execve() already did almost exactly that, except it
    didn't take the mm lock at all (it's single-threaded so no locking
    technically needed, but it could cause lockdep errors).  And it only
    did it for the CONFIG_STACK_GROWSUP case, since in that case GUP has
    obviously never expanded the stack downwards.

    So just make that CONFIG_STACK_GROWSUP case do the right thing with
    locking, and enable it generally.  This will eventually help GUP, and
    in the meantime avoids a special case and the lockdep issue.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit b2d6752dbfe74d9eed81fe5cb608232e87c823f4
Author: Liam R. Howlett
Date:   Fri Jun 16 15:58:54 2023 -0700

    mm: make find_extend_vma() fail if write lock not held

    commit f440fa1ac955e2898893f9301568435eb5cdfc4b upstream.

    Make calls to extend_vma() and find_extend_vma() fail if the write lock
    is required.

    To avoid making this a flag-day event, this still allows the old
    read-locking case for the trivial situations, and passes in a flag to
    say "is it write-locked".  That way write-lockers can say "yes, I'm
    being careful", and legacy users will continue to work in all the
    common cases until they have been fully converted to the new world
    order.

    Co-Developed-by: Matthew Wilcox (Oracle)
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Liam R. Howlett
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit accf6d0c5832e3a880e6798fc4cc76d49573e7f4
Author: Linus Torvalds
Date:   Sat Jun 24 11:17:05 2023 -0700

    powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()

    commit 2cd76c50d0b41cec5c87abfcdf25b236a2793fb6 upstream.

    This is one of the simple cases, except there's no pt_regs pointer.
    Which is fine, as lock_mm_and_find_vma() is set up to work fine with a
    NULL pt_regs.

    Powerpc already enabled LOCK_MM_AND_FIND_VMA for the main CPU faulting,
    so we can just use the helper without any extra work.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 203cfe05efc84b6d888fdc37862c53c6ee64a006
Author: Linus Torvalds
Date:   Sat Jun 24 10:55:38 2023 -0700

    mm/fault: convert remaining simple cases to lock_mm_and_find_vma()

    commit a050ba1e7422f2cc60ff8bfde3f96d34d00cb585 upstream.

    This does the simple pattern conversion of alpha, arc, csky, hexagon,
    loongarch, nios2, sh, sparc32, and xtensa to the
    lock_mm_and_find_vma() helper.  They all have the regular fault
    handling pattern without odd special cases.

    The remaining architectures all have something that keeps us from a
    straightforward conversion: ia64 and parisc have stacks that can grow
    both up as well as down (and ia64 has special address region checks).

    And m68k, microblaze, openrisc, sparc64, and um end up having extra
    rules about only expanding the stack down a limited amount below the
    user space stack pointer.  That is something that x86 used to do too
    (long long ago), and it probably could just be skipped, but it still
    makes the conversion less than trivial.

    Note that this conversion was done manually and with the exception of
    alpha without any build testing, because I have a fairly limited cross-
    building environment.  The cases are all simple, and I went through the
    changes several times, but...

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 4e3fb74f605054555693031083248903dc2796d6
Author: Ben Hutchings
Date:   Thu Jun 22 21:24:30 2023 +0200

    arm/mm: Convert to using lock_mm_and_find_vma()

    commit 8b35ca3e45e35a26a21427f35d4093606e93ad0a upstream.

    arm has an additional check for address < FIRST_USER_ADDRESS before
    expanding the stack.  Since FIRST_USER_ADDRESS is defined everywhere
    (generally as 0), move that check to the generic expand_downwards().

    Signed-off-by: Ben Hutchings
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 7e99b9821acc954f407da09b31a4a95d35dd147d
Author: Ben Hutchings
Date:   Thu Jun 22 20:18:18 2023 +0200

    riscv/mm: Convert to using lock_mm_and_find_vma()

    commit 7267ef7b0b77f4ed23b7b3c87d8eca7bd9c2d007 upstream.

    Signed-off-by: Ben Hutchings
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 929eb6b2a6901d5c7ef13a6aab0a07c31a603a05
Author: Ben Hutchings
Date:   Thu Jun 22 18:47:40 2023 +0200

    mips/mm: Convert to using lock_mm_and_find_vma()

    commit 4bce37a68ff884e821a02a731897a8119e0c37b7 upstream.

    Signed-off-by: Ben Hutchings
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit b6f36565369c66459dbdde7199e8fa89cbb5f54b
Author: Michael Ellerman
Date:   Fri Jun 16 15:51:29 2023 +1000

    powerpc/mm: Convert to using lock_mm_and_find_vma()

    commit e6fe228c4ffafdfc970cf6d46883a1f481baf7ea upstream.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 7a1383601b7ced8976854706f22630f4bc3473bf
Author: Linus Torvalds
Date:   Thu Jun 15 17:11:44 2023 -0700

    arm64/mm: Convert to using lock_mm_and_find_vma()

    commit ae870a68b5d13d67cf4f18d47bb01ee3fee40acb upstream.

    This converts arm64 to use the new page fault helper.  It was very
    straightforward, but still needed a fix for the "obvious" conversion I
    initially did.  Thanks to Suren for the fix and testing.

    Fixed-and-tested-by: Suren Baghdasaryan
    Unnecessary-code-removal-by: Liam R. Howlett
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit d939d8c154f1dcc8ff6818e564a518b382a32ef5
Author: Linus Torvalds
Date:   Thu Jun 15 16:17:48 2023 -0700

    mm: make the page fault mmap locking killable

    commit eda0047296a16d65a7f2bc60a408f70d178b2014 upstream.

    This is done as a separate patch from introducing the new
    lock_mm_and_find_vma() helper, because while it's an obvious change,
    it's not what x86 used to do in this area.

    We already abort the page fault on fatal signals anyway, so why should
    we wait for the mmap lock only to then abort later?  With the new
    helper function that returns without the lock held on failure anyway,
    this is particularly easy and straightforward.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman
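
The killable locking change above is small in spirit; sketched, inside the
helper (mmap_read_lock_killable() returns non-zero when interrupted by a
fatal signal):

    if (mmap_read_lock_killable(mm))        /* fails with -EINTR on a fatal
                                             * signal instead of blocking  */
            return NULL;                    /* abort the fault right away  */
    vma = find_vma(mm, address);
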
commit b11fa3d22ac0fbc0bfaa740b3b3669d43ec48503
Author: Linus Torvalds
Date:   Thu Jun 15 15:17:36 2023 -0700

    mm: introduce new 'lock_mm_and_find_vma()' page fault helper

    commit c2508ec5a58db67093f4fb8bf89a9a7c53a109e9 upstream.

    .. and make x86 use it.

    This basically extracts the existing x86 "find and expand faulting vma"
    code, but extends it to also take the mmap lock for writing in case we
    actually do need to expand the vma.

    We've historically short-circuited that case, and have some rather ugly
    special logic to serialize the stack segment expansion (since we only
    hold the mmap lock for reading) that doesn't match the normal VM
    locking.

    That slight violation of locking worked well, right up until it didn't:
    the maple tree code really does want proper locking even for simple
    extension of an existing vma.

    So extract the code for "look up the vma of the fault" from x86, fix it
    up to do the necessary write locking, and make it available as a helper
    function for other architectures that can use the common helper.

    Note: I say "common helper", but it really only handles the normal
    stack-grows-down case.  Which is all architectures except for PA-RISC
    and IA64.  So some rare architectures can't use the helper, but if they
    care they'll just need to open-code this logic.

    It's also worth pointing out that this code really would like to have
    an optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
    read-lock (for the common case) to taking the write lock (for having to
    extend the vma) in the normal single-threaded situation where there is
    no other locking activity.

    But that _is_ all the very uncommon special case, so while it would be
    nice to have such an operation, it probably doesn't matter in reality.
    I did put in the skeleton code for such a possible future expansion,
    even if it only acts as pseudo-documentation for what we're doing.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman
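
A converted architecture's fault path then follows one common shape.  A
sketch of the contract (retry handling trimmed; bad_area_nosemaphore() is
a stand-in for the arch's error path): the helper returns the faulting vma
with mmap_lock held for read, or NULL with no lock held at all.

    vma = lock_mm_and_find_vma(mm, address, regs);
    if (unlikely(!vma)) {
            bad_area_nosemaphore(regs, address);    /* no lock held here */
            return;
    }

    fault = handle_mm_fault(vma, address, flags, regs);
    mmap_read_unlock(mm);                   /* helper left it read-locked */
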
commit f5fcf6555a2a4f32947d17b92b173837cc652891
Author: Peng Zhang
Date:   Sat May 6 10:47:52 2023 +0800

    maple_tree: fix potential out-of-bounds access in mas_wr_end_piv()

    commit cd00dd2585c4158e81fdfac0bbcc0446afbad26d upstream.

    Check the write offset end bounds before using it as the offset into
    the pivot array.  This avoids a possible out-of-bounds access on the
    pivot array if the write extends to the last slot in the node, in which
    case the node maximum should be used as the end pivot.

    akpm: this doesn't affect any current callers, but new users of
    mapletree may encounter this problem if backported into earlier
    kernels, so let's fix it in -stable kernels in case of this.

    Link: https://lkml.kernel.org/r/20230506024752.2550-1-zhangpeng.00@bytedance.com
    Fixes: 54a611b60590 ("Maple Tree: add new data structure")
    Signed-off-by: Peng Zhang
    Reviewed-by: Liam R. Howlett
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

commit b6e1ef3cd6d8504fe9107cf1e150d8027834612b
Author: Oliver Hartkopp
Date:   Wed Jun 7 09:27:08 2023 +0200

    can: isotp: isotp_sendmsg(): fix return error fix on TX path

    commit e38910c0072b541a91954682c8b074a93e57c09b upstream.

    With commit d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return
    error on FC timeout on TX path") the missing correct return value in
    the case of a protocol error was introduced.

    But the way the error value has been read and sent to the user space
    does not follow the common scheme to clear the error after reading
    which is provided by the sock_error() function.  This leads to an
    error report at the following write() attempt although everything
    should be working.

    Fixes: d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path")
    Reported-by: Carsten Schmidt
    Signed-off-by: Oliver Hartkopp
    Link: https://lore.kernel.org/all/20230607072708.38809-1-socketcan@hartkopp.net
    Cc: stable@vger.kernel.org
    Signed-off-by: Marc Kleine-Budde
    Signed-off-by: Greg Kroah-Hartman
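
The point of the isotp fix is the read-and-clear semantics of
sock_error(); in sketch form (sk being the socket's struct sock):

    int err = sock_error(sk);       /* atomically reads sk->sk_err and
                                     * resets it to 0 (xchg inside), so a
                                     * one-off protocol error is reported
                                     * once, not on every later write()   */
    if (err)
            return err;
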
commit 3f2719a1c382b1287ac2cc5f27c6c10703bf0c78
Author: Wyes Karny
Date:   Mon Jun 12 11:36:10 2023 +0000

    cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenated

    commit f4aad639302a07454dcb23b408dcadf8a9efb031 upstream.

    amd-pstate passive mode driver is hyphenated.  So make amd-pstate
    active mode driver consistent with that rename "amd_pstate_epp" to
    "amd-pstate-epp".

    Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
    Cc: All applicable
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Wyes Karny
    Acked-by: Huang Rui
    Reviewed-by: Perry Yuan
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

commit 9e97e46e32104d204e739c29c41615f87cca15ff
Author: Thomas Gleixner
Date:   Thu Jun 15 22:33:57 2023 +0200

    x86/smp: Cure kexec() vs. mwait_play_dead() breakage

    commit d7893093a7417527c0d73c9832244e65c9d0114f upstream.

    TLDR: It's a mess.

    When kexec() is executed on a system with offline CPUs, which are
    parked in mwait_play_dead() it can end up in a triple fault during the
    bootup of the kexec kernel or cause hard to diagnose data corruption.

    The reason is that kexec() eventually overwrites the previous kernel's
    text, page tables, data and stack.  If it writes to the cache line
    which is monitored by a previously offlined CPU, MWAIT resumes
    execution and ends up executing the wrong text, dereferencing
    overwritten page tables or corrupting the kexec kernels data.

    Cure this by bringing the offlined CPUs out of MWAIT into HLT.

    Write to the monitored cache line of each offline CPU, which makes
    MWAIT resume execution.  The written control word tells the offlined
    CPUs to issue HLT, which does not have the MWAIT problem.

    That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs
    as those make it come out of HLT.

    A follow up change will put them into INIT, which protects at least
    against NMI and SMI.

    Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
    Reported-by: Ashok Raj
    Signed-off-by: Thomas Gleixner
    Tested-by: Ashok Raj
    Reviewed-by: Ashok Raj
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

commit cc37b1184236b86addfb7051c6e2ff2dccb9a81a
Author: Thomas Gleixner
Date:   Thu Jun 15 22:33:55 2023 +0200

    x86/smp: Use dedicated cache-line for mwait_play_dead()

    commit f9c9987bf52f4e42e940ae217333ebb5a4c3b506 upstream.

    Monitoring idletask::thread_info::flags in mwait_play_dead() has been
    an obvious choice as all what is needed is a cache line which is not
    written by other CPUs.

    But there is a use case where a "dead" CPU needs to be brought out of
    MWAIT: kexec().

    This is required as kexec() can overwrite text, pagetables, stacks and
    the monitored cacheline of the original kernel.  The latter causes
    MWAIT to resume execution which obviously causes havoc on the kexec
    kernel which results usually in triple faults.

    Use a dedicated per CPU storage to prepare for that.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ashok Raj
    Reviewed-by: Borislav Petkov (AMD)
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

commit 4726d74f697f87c13481a22f893c8f04c6050910
Author: Thomas Gleixner
Date:   Thu Jun 15 22:33:54 2023 +0200

    x86/smp: Remove pointless wmb()s from native_stop_other_cpus()

    commit 2affa6d6db28855e6340b060b809c23477aa546e upstream.

    The wmb()s before sending the IPIs are not synchronizing anything.

    If at all then the apic IPI functions have to provide or act as
    appropriate barriers.

    Remove these cargo cult barriers which have no explanation of what
    they are synchronizing.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Borislav Petkov (AMD)
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20230615193330.378358382@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

commit 8b1b43c4210088b33a79a1758e307d660febd8c6
Author: Tony Battersby
Date:   Thu Jun 15 22:33:52 2023 +0200

    x86/smp: Dont access non-existing CPUID leaf

    commit 9b040453d4440659f33dc6f0aa26af418ebfe70b upstream.

    stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally.
    Intel CPUs return the content of the highest supported leaf when a
    non-existing leaf is read, while AMD CPUs return all zeros for
    unsupported leafs.

    So the result of the test on Intel CPUs is lottery.

    While harmless it's incorrect and causes the conditional wbinvd() to be
    issued where not required.

    Check whether the leaf is supported before reading it.

    [ tglx: Adjusted changelog ]

    Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
    Signed-off-by: Tony Battersby
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Mario Limonciello
    Reviewed-by: Borislav Petkov (AMD)
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
    Link: https://lore.kernel.org/r/20230615193330.322186388@linutronix.de
    Signed-off-by: Greg Kroah-Hartman
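
The shape of that leaf check, sketched (0x8000001f EAX bit 0 is the SME
feature bit; the exact guard in the patch may differ): without the first
test, Intel parts would return data from the highest supported leaf, so
bit 0 is effectively random.

    if (cpuid_eax(0x80000000) >= 0x8000001f &&  /* does the leaf exist?  */
        (cpuid_eax(0x8000001f) & BIT(0)))       /* is SME actually set?  */
            native_wbinvd();
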
commit f9abe01d5d68ff870213ef0cbec69b89f70096b3
Author: Thomas Gleixner
Date:   Wed Apr 26 18:37:00 2023 +0200

    x86/smp: Make stop_other_cpus() more robust

    commit 1f5e7eb7868e42227ac426c96d437117e6e06e8e upstream.

    Tony reported intermittent lockups on poweroff.  His analysis
    identified the wbinvd() in stop_this_cpu() as the culprit.  This was
    added to ensure that on SME enabled machines a kexec() does not leave
    any stale data in the caches when switching from encrypted to
    non-encrypted mode or vice versa.

    That wbinvd() is conditional on the SME feature bit which is read
    directly from CPUID.  But that readout does not check whether the CPUID
    leaf is available or not.  If it's not available the CPU will return
    the value of the highest supported leaf instead.  Depending on the
    content the "SME" bit might be set or not.

    That's incorrect but harmless.  Making the CPUID readout conditional
    makes the observed hangs go away, but it does not fix the underlying
    problem:

      CPU0                                  CPU1

       stop_other_cpus()
         send_IPIs(REBOOT);                 stop_this_cpu()
         while (num_online_cpus() > 1);       set_online(false);
         proceed... -> hang
                                              wbinvd()

    WBINVD is an expensive operation and if multiple CPUs issue it at the
    same time the resulting delays are even larger.

    But CPU0 already observed num_online_cpus() going down to 1 and
    proceeds which causes the system to hang.

    This issue exists independent of WBINVD, but the delays caused by
    WBINVD make it more prominent.

    Make this more robust by adding a cpumask which is initialized to the
    online CPU mask before sending the IPIs and CPUs clear their bit in
    stop_this_cpu() after the WBINVD completed.  Check for that cpumask to
    become empty in stop_other_cpus() instead of watching
    num_online_cpus().

    The cpumask cannot plug all holes either, but it's better than a raw
    counter and allows to restrict the NMI fallback IPI to be sent only to
    the CPUs which have not reported within the timeout window.

    Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
    Reported-by: Tony Battersby
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Borislav Petkov (AMD)
    Reviewed-by: Ashok Raj
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
    Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
    Signed-off-by: Greg Kroah-Hartman

commit 9a500542a1dce6eeaecaa18cadedfce5dfdf6b3e
Author: Borislav Petkov (AMD)
Date:   Tue May 2 19:53:50 2023 +0200

    x86/microcode/AMD: Load late on both threads too

    commit a32b0f0db3f396f1c9be2fe621e77c09ec3d8e7d upstream.

    Do the same as early loading - load on both threads.

    Signed-off-by: Borislav Petkov (AMD)
    Cc:
    Link: https://lore.kernel.org/r/20230605141332.25948-1-bp@alien8.de
    Signed-off-by: Greg Kroah-Hartman
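
Referring back to "x86/smp: Make stop_other_cpus() more robust" above, the
cpumask scheme it describes looks roughly like this sketch
(send_reboot_ipi() and the timeout loop are simplified stand-ins, not the
actual code):

    static cpumask_t cpus_stop_mask;

    /* stop_other_cpus(), sending side: */
    cpumask_copy(&cpus_stop_mask, cpu_online_mask);
    cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);
    send_reboot_ipi();                      /* hypothetical wrapper        */
    while (!cpumask_empty(&cpus_stop_mask) && timeout--)
            udelay(1);
    /* the NMI fallback then targets only CPUs still in cpus_stop_mask    */

    /* stop_this_cpu(), per-CPU side, after the conditional WBINVD:       */
    cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);
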