commit 567f0ad8b0b773cc886906514b6c1ccbd1335a41 Author: Greg Kroah-Hartman Date: Tue Feb 4 18:18:03 2020 +0000 Linux 5.5.2 commit 165387a9c90152f35976d82feca6eff5f0d5ac02 Author: Josef Bacik Date: Fri Jan 31 09:31:05 2020 -0500 btrfs: do not zero f_bavail if we have available space commit d55966c4279bfc6a0cf0b32bf13f5df228a1eeb6 upstream. There was some logic added a while ago to clear out f_bavail in statfs() if we did not have enough free metadata space to satisfy our global reserve. This was incorrect at the time, however didn't really pose a problem for normal file systems because we would often allocate chunks if we got this low on free metadata space, and thus wouldn't really hit this case unless we were actually full. Fast forward to today and now we are much better about not allocating metadata chunks all of the time. Couple this with d792b0f19711 ("btrfs: always reserve our entire size for the global reserve") which now means we'll easily have a larger global reserve than our free space, we are now more likely to trip over this while still having plenty of space. Fix this by skipping this logic if the global rsv's space_info is not full. space_info->full is 0 unless we've attempted to allocate a chunk for that space_info and that has failed. If this happens then the space for the global reserve is definitely sacred and we need to report b_avail == 0, but before then we can just use our calculated b_avail. Reported-by: Martin Steigerwald Fixes: ca8a51b3a979 ("btrfs: statfs: report zero available if metadata are exhausted") CC: stable@vger.kernel.org # 4.5+ Reviewed-by: Qu Wenruo Tested-By: Martin Steigerwald Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 5c41180d719de814d686536e335651299da5a4b3 Author: Michal Koutný Date: Thu Jan 9 16:05:59 2020 +0100 cgroup: Prevent double killing of css when enabling threaded cgroup commit 3bc0bb36fa30e95ca829e9cf480e1ef7f7638333 upstream. The test_cgcore_no_internal_process_constraint_on_threads selftest when running with subsystem controlling noise triggers two warnings: > [ 597.443115] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3131 cgroup_apply_control_enable+0xe0/0x3f0 > [ 597.443413] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3177 cgroup_apply_control_disable+0xa6/0x160 Both stem from a call to cgroup_type_write. The first warning was also triggered by syzkaller. When we're switching cgroup to threaded mode shortly after a subsystem was disabled on it, we can see the respective subsystem css dying there. The warning in cgroup_apply_control_enable is harmless in this case since we're not adding new subsys anyway. The warning in cgroup_apply_control_disable indicates an attempt to kill css of recently disabled subsystem repeatedly. The commit prevents these situations by making cgroup_type_write wait for all dying csses to go away before re-applying subtree controls. When at it, the locations of WARN_ON_ONCE calls are moved so that warning is triggered only when we are about to misuse the dying css. Reported-by: syzbot+5493b2a54d31d6aea629@syzkaller.appspotmail.com Reported-by: Christian Brauner Signed-off-by: Michal Koutný Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman commit 6d0a10479ec1713a4e8df897273209a0f791ca49 Author: Dan Carpenter Date: Wed Jan 15 20:49:04 2020 +0300 Bluetooth: Fix race condition in hci_release_sock() commit 11eb85ec42dc8c7a7ec519b90ccf2eeae9409de8 upstream. Syzbot managed to trigger a use after free "KASAN: use-after-free Write in hci_sock_bind". I have reviewed the code manually and one possibly cause I have found is that we are not holding lock_sock(sk) when we do the hci_dev_put(hdev) in hci_sock_release(). My theory is that the bind and the release are racing against each other which results in this use after free. Reported-by: syzbot+eba992608adf3d796bcc@syzkaller.appspotmail.com Signed-off-by: Dan Carpenter Signed-off-by: Johan Hedberg Signed-off-by: Greg Kroah-Hartman commit 6374fd5b64df1e447fb00888bb0a629f4ed6d20a Author: Zhenzhong Duan Date: Mon Jan 13 11:48:42 2020 +0800 ttyprintk: fix a potential deadlock in interrupt context issue commit 9a655c77ff8fc65699a3f98e237db563b37c439b upstream. tpk_write()/tpk_close() could be interrupted when holding a mutex, then in timer handler tpk_write() may be called again trying to acquire same mutex, lead to deadlock. Google syzbot reported this issue with CONFIG_DEBUG_ATOMIC_SLEEP enabled: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/1 1 lock held by swapper/1/0: ... Call Trace: dump_stack+0x197/0x210 ___might_sleep.cold+0x1fb/0x23e __might_sleep+0x95/0x190 __mutex_lock+0xc5/0x13c0 mutex_lock_nested+0x16/0x20 tpk_write+0x5d/0x340 resync_tnc+0x1b6/0x320 call_timer_fn+0x1ac/0x780 run_timer_softirq+0x6c3/0x1790 __do_softirq+0x262/0x98c irq_exit+0x19b/0x1e0 smp_apic_timer_interrupt+0x1a3/0x610 apic_timer_interrupt+0xf/0x20 See link https://syzkaller.appspot.com/bug?extid=2eeef62ee31f9460ad65 for more details. Fix it by using spinlock in process context instead of mutex and having interrupt disabled in critical section. Reported-by: syzbot+2eeef62ee31f9460ad65@syzkaller.appspotmail.com Signed-off-by: Zhenzhong Duan Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Link: https://lore.kernel.org/r/20200113034842.435-1-zhenzhong.duan@gmail.com Signed-off-by: Greg Kroah-Hartman commit 9db441e545d5dd91a97726ea0e9776fe80d8eba4 Author: Tetsuo Handa Date: Thu Jan 2 12:53:49 2020 +0900 tomoyo: Use atomic_t for statistics counter commit a8772fad0172aeae339144598b809fd8d4823331 upstream. syzbot is reporting that there is a race at tomoyo_stat_update() [1]. Although it is acceptable to fail to track exact number of times policy was updated, convert to atomic_t because this is not a hot path. [1] https://syzkaller.appspot.com/bug?id=a4d7b973972eeed410596e6604580e0133b0fc04 Reported-by: syzbot Signed-off-by: Tetsuo Handa Signed-off-by: Greg Kroah-Hartman commit a5491f38c7299b4b27d83d88f208ec05e73e0e3d Author: Hans Verkuil Date: Tue Nov 12 10:22:28 2019 +0100 media: dvb-usb/dvb-usb-urb.c: initialize actlen to 0 commit 569bc8d6a6a50acb5fcf07fb10b8d2d461fdbf93 upstream. This fixes a syzbot failure since actlen could be uninitialized, but it was still used. Syzbot link: https://syzkaller.appspot.com/bug?extid=6bf9606ee955b646c0e1 Reported-and-tested-by: syzbot+6bf9606ee955b646c0e1@syzkaller.appspotmail.com Signed-off-by: Hans Verkuil Acked-by: Sean Young Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit d7c47ec04e3171c13df53bc88cf0bc364820f48d Author: Hans Verkuil Date: Tue Nov 12 10:22:24 2019 +0100 media: gspca: zero usb_buf commit de89d0864f66c2a1b75becfdd6bf3793c07ce870 upstream. Allocate gspca_dev->usb_buf with kzalloc instead of kmalloc to ensure it is property zeroed. This fixes various syzbot errors about uninitialized data. Syzbot links: https://syzkaller.appspot.com/bug?extid=32310fc2aea76898d074 https://syzkaller.appspot.com/bug?extid=99706d6390be1ac542a2 https://syzkaller.appspot.com/bug?extid=64437af5c781a7f0e08e Reported-and-tested-by: syzbot+32310fc2aea76898d074@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+99706d6390be1ac542a2@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+64437af5c781a7f0e08e@syzkaller.appspotmail.com Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 78765f883b67f186118a788be8de2f7b1a973f3e Author: Sean Young Date: Sun Nov 10 11:25:13 2019 +0100 media: vp7045: do not read uninitialized values if usb transfer fails commit 26cff637121d8bb866ebd6515c430ac890e6ec80 upstream. It is not a fatal error if reading the mac address or the remote control decoder state fails. Reported-by: syzbot+ec869945d3dde5f33b43@syzkaller.appspotmail.com Signed-off-by: Sean Young Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 003f1a9418de314cecda868bc2a5048b01ce5d8d Author: Sean Young Date: Sun Nov 10 11:15:37 2019 +0100 media: af9005: uninitialized variable printked commit 51d0c99b391f0cac61ad7b827c26f549ee55672c upstream. If usb_bulk_msg() fails, actual_length can be uninitialized. Reported-by: syzbot+9d42b7773d2fecd983ab@syzkaller.appspotmail.com Signed-off-by: Sean Young Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit e787387e148d8bfcb84608d3eccd6259040870b7 Author: Sean Young Date: Sun Nov 10 11:04:40 2019 +0100 media: digitv: don't continue if remote control state can't be read commit eecc70d22ae51225de1ef629c1159f7116476b2e upstream. This results in an uninitialized variable read. Reported-by: syzbot+6bf9606ee955b646c0e1@syzkaller.appspotmail.com Signed-off-by: Sean Young Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 8f5561c734b36c0b3089cbb381d8eecbe75ef72c Author: Jan Kara Date: Thu Dec 12 11:30:03 2019 +0100 reiserfs: Fix memory leak of journal device string commit 5474ca7da6f34fa95e82edc747d5faa19cbdfb5c upstream. When a filesystem is mounted with jdev mount option, we store the journal device name in an allocated string in superblock. However we fail to ever free that string. Fix it. Reported-by: syzbot+1c6756baf4b16b94d2a6@syzkaller.appspotmail.com Fixes: c3aa077648e1 ("reiserfs: Properly display mount options in /proc/mounts") CC: stable@vger.kernel.org Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit bc92426ff68b4dfab30920739f8b80a1614fa762 Author: Dan Carpenter Date: Thu Jan 30 22:11:07 2020 -0800 mm/mempolicy.c: fix out of bounds write in mpol_parse_str() commit c7a91bc7c2e17e0a9c8b9745a2cb118891218fd1 upstream. What we are trying to do is change the '=' character to a NUL terminator and then at the end of the function we restore it back to an '='. The problem is there are two error paths where we jump to the end of the function before we have replaced the '=' with NUL. We end up putting the '=' in the wrong place (possibly one element before the start of the buffer). Link: http://lkml.kernel.org/r/20200115055426.vdjwvry44nfug7yy@kili.mountain Reported-by: syzbot+e64a13c5369a194d67df@syzkaller.appspotmail.com Fixes: 095f1fc4ebf3 ("mempolicy: rework shmem mpol parsing and display") Signed-off-by: Dan Carpenter Acked-by: Vlastimil Babka Dmitry Vyukov Cc: Michal Hocko Cc: Dan Carpenter Cc: Lee Schermerhorn Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 17f3667192ec7dbda7593dcf5a379b26cb31fc65 Author: Dirk Behme Date: Tue Jan 21 16:54:39 2020 +0100 arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean' commit d7bbd6c1b01cb5dd13c245d4586a83145c1d5f52 upstream. Since v4.3-rc1 commit 0723c05fb75e44 ("arm64: enable more compressed Image formats"), it is possible to build Image.{bz2,lz4,lzma,lzo} AArch64 images. However, the commit missed adding support for removing those images on 'make ARCH=arm64 (dist)clean'. Fix this by adding them to the target list. Make sure to match the order of the recipes in the makefile. Cc: stable@vger.kernel.org # v4.3+ Fixes: 0723c05fb75e44 ("arm64: enable more compressed Image formats") Signed-off-by: Dirk Behme Signed-off-by: Eugeniu Rosca Reviewed-by: Masahiro Yamada Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit efc469175fc69710e27928733372e6c57456bfe6 Author: Vitaly Chikunov Date: Tue Dec 24 20:20:29 2019 +0300 tools lib: Fix builds when glibc contains strlcpy() commit 6c4798d3f08b81c2c52936b10e0fa872590c96ae upstream. Disable a couple of compilation warnings (which are treated as errors) on strlcpy() definition and declaration, allowing users to compile perf and kernel (objtool) when: 1. glibc have strlcpy() (such as in ALT Linux since 2004) objtool and perf build fails with this (in gcc): In file included from exec-cmd.c:3: tools/include/linux/string.h:20:15: error: redundant redeclaration of ‘strlcpy’ [-Werror=redundant-decls] 20 | extern size_t strlcpy(char *dest, const char *src, size_t size); 2. clang ignores `-Wredundant-decls', but produces another warning when building perf: CC util/string.o ../lib/string.c:99:8: error: attribute declaration must precede definition [-Werror,-Wignored-attributes] size_t __weak strlcpy(char *dest, const char *src, size_t size) ../../tools/include/linux/compiler.h:66:34: note: expanded from macro '__weak' # define __weak __attribute__((weak)) /usr/include/bits/string_fortified.h:151:8: note: previous definition is here __NTH (strlcpy (char *__restrict __dest, const char *__restrict __src, Committer notes: The #pragma GCC diagnostic directive was introduced in gcc 4.6, so check for that as well. Fixes: ce99091 ("perf tools: Move strlcpy() from perf to tools/lib/string.c") Fixes: 0215d59 ("tools lib: Reinstate strlcpy() header guard with __UCLIBC__") Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=118481 Signed-off-by: Vitaly Chikunov Reviewed-by: Dmitry Levin Cc: Dmitry Levin Cc: Josh Poimboeuf Cc: kbuild test robot Cc: Peter Zijlstra Cc: stable@vger.kernel.org Cc: Vineet Gupta Link: http://lore.kernel.org/lkml/20191224172029.19690-1-vt@altlinux.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit ef1c4ec84f971ef7edca96f633df93db20699f04 Author: Chanwoo Choi Date: Tue Nov 5 18:18:03 2019 +0900 PM / devfreq: Add new name attribute for sysfs commit 2fee1a7cc6b1ce6634bb0f025be2c94a58dfa34d upstream. The commit 4585fbcb5331 ("PM / devfreq: Modify the device name as devfreq(X) for sysfs") changed the node name to devfreq(x). After this commit, it is not possible to get the device name through /sys/class/devfreq/devfreq(X)/*. Add new name attribute in order to get device name. Cc: stable@vger.kernel.org Fixes: 4585fbcb5331 ("PM / devfreq: Modify the device name as devfreq(X) for sysfs") Signed-off-by: Chanwoo Choi Signed-off-by: Greg Kroah-Hartman commit 3efca0c1421c30849857aeb8a948902f33c460e6 Author: Andres Freund Date: Wed Jan 8 20:30:30 2020 -0800 perf c2c: Fix return type for histogram sorting comparision functions commit c1c8013ec34d7163431d18367808ea40b2e305f8 upstream. Commit 722ddfde366f ("perf tools: Fix time sorting") changed - correctly so - hist_entry__sort to return int64. Unfortunately several of the builtin-c2c.c comparison routines only happened to work due the cast caused by the wrong return type. This causes meaningless ordering of both the cacheline list, and the cacheline details page. E.g a simple: perf c2c record -a sleep 3 perf c2c report will result in cacheline table like ================================================= Shared Data Cache Line Table ================================================= # # ------- Cacheline ---------- Total Tot - LLC Load Hitm - - Store Reference - - Load Dram - LLC Total - Core Load Hit - - LLC Load Hit - # Index Address Node PA cnt records Hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2 Llc Rmt # ..... .............. .... ...... ....... ...... ..... ..... ... .... ..... ...... ...... .... ...... ..... ..... ..... ... .... ....... 0 0x7f0d27ffba00 N/A 0 52 0.12% 13 6 7 12 12 0 0 7 14 40 4 16 0 0 0 1 0x7f0d27ff61c0 N/A 0 6353 14.04% 1475 801 674 779 779 0 0 718 1392 5574 1299 1967 0 115 0 2 0x7f0d26d3ec80 N/A 0 71 0.15% 16 4 12 13 13 0 0 12 24 58 1 20 0 9 0 3 0x7f0d26d3ec00 N/A 0 98 0.22% 23 17 6 19 19 0 0 6 12 79 0 40 0 10 0 i.e. with the list not being ordered by Total Hitm. Fixes: 722ddfde366f ("perf tools: Fix time sorting") Signed-off-by: Andres Freund Tested-by: Michael Petlan Acked-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Namhyung Kim Cc: Peter Zijlstra Cc: stable@vger.kernel.org # v3.16+ Link: http://lore.kernel.org/lkml/20200109043030.233746-1-andres@anarazel.de Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 943c9bb5bf1aa5756225f501a84b16a85f3e4927 Author: Andy Shevchenko Date: Thu Jan 30 22:11:01 2020 -0800 lib/test_bitmap: correct test data offsets for 32-bit commit 69334ca530da80c1563ac6a3bd32afa40884ccd3 upstream. On 32-bit platform the size of long is only 32 bits which makes wrong offset in the array of 64 bit size. Calculate offset based on BITS_PER_LONG. Link: http://lkml.kernel.org/r/20200109103601.45929-1-andriy.shevchenko@linux.intel.com Fixes: 30544ed5de43 ("lib/bitmap: introduce bitmap_replace() helper") Signed-off-by: Andy Shevchenko Reported-by: Guenter Roeck Cc: Rasmus Villemoes Cc: Yury Norov Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 917c9c80663a1aa4ea5e9703c73641256c1d2488 Author: Andreas Gruenbacher Date: Sun Dec 8 13:12:49 2019 +0000 gfs2: Another gfs2_find_jhead fix commit eed0f953b90e86e765197a1dad06bb48aedc27fe upstream. On filesystems with a block size smaller than the page size, gfs2_find_jhead can split a page across two bios (for example, when blocks are not allocated consecutively). When that happens, the first bio that completes will unlock the page in its bi_end_io handler even though the page hasn't been read completely yet. Fix that by using a chained bio for the rest of the page. While at it, clean up the sector calculation logic in gfs2_log_alloc_bio. In gfs2_find_jhead, simplify the disk block and offset calculation logic and fix a variable name. Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Andreas Gruenbacher Signed-off-by: Greg Kroah-Hartman commit 286129763464fb60338b731d4dc843e2a1272936 Author: David Michael Date: Sun Jan 26 17:31:58 2020 -0500 KVM: PPC: Book3S PR: Fix -Werror=return-type build failure [ Upstream commit fd24a8624eb29d3b6b7df68096ce0321b19b03c6 ] Fixes: 3a167beac07c ("kvm: powerpc: Add kvmppc_ops callback") Signed-off-by: David Michael Signed-off-by: Paul Mackerras Signed-off-by: Sasha Levin commit 1f15e25407508fa9f70cd0ab16fa23ed54758ea9 Author: Xiaochen Shen Date: Thu Jan 9 00:28:04 2020 +0800 x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup [ Upstream commit 074fadee59ee7a9d2b216e9854bd4efb5dad679f ] There is a race condition in the following scenario which results in an use-after-free issue when reading a monitoring file and deleting the parent ctrl_mon group concurrently: Thread 1 calls atomic_inc() to take refcount of rdtgrp and then calls kernfs_break_active_protection() to drop the active reference of kernfs node in rdtgroup_kn_lock_live(). In Thread 2, kernfs_remove() is a blocking routine. It waits on all sub kernfs nodes to drop the active reference when removing all subtree kernfs nodes recursively. Thread 2 could block on kernfs_remove() until Thread 1 calls kernfs_break_active_protection(). Only after kernfs_remove() completes the refcount of rdtgrp could be trusted. Before Thread 1 calls atomic_inc() and kernfs_break_active_protection(), Thread 2 could call kfree() when the refcount of rdtgrp (sentry) is 0 instead of 1 due to the race. In Thread 1, in rdtgroup_kn_unlock(), referring to earlier rdtgrp memory (rdtgrp->waitcount) which was already freed in Thread 2 results in use-after-free issue. Thread 1 (rdtgroup_mondata_show) Thread 2 (rdtgroup_rmdir) -------------------------------- ------------------------- rdtgroup_kn_lock_live /* * kn active protection until * kernfs_break_active_protection(kn) */ rdtgrp = kernfs_to_rdtgroup(kn) rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_rmdir_ctrl free_all_child_rdtgrp /* * sentry->waitcount should be 1 * but is 0 now due to the race. */ kfree(sentry)*[1] /* * Only after kernfs_remove() * completes, the refcount of * rdtgrp could be trusted. */ atomic_inc(&rdtgrp->waitcount) /* kn->active-- */ kernfs_break_active_protection(kn) rdtgroup_ctrl_remove rdtgrp->flags = RDT_DELETED /* * Blocking routine, wait for * all sub kernfs nodes to drop * active reference in * kernfs_break_active_protection. */ kernfs_remove(rdtgrp->kn) rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test( &rdtgrp->waitcount) && (flags & RDT_DELETED) kernfs_unbreak_active_protection(kn) kfree(rdtgrp) mutex_lock mon_event_read rdtgroup_kn_unlock mutex_unlock /* * Use-after-free: refer to earlier rdtgrp * memory which was freed in [1]. */ atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) /* kn->active++ */ kernfs_unbreak_active_protection(kn) kfree(rdtgrp) Fix it by moving free_all_child_rdtgrp() to after kernfs_remove() in rdtgroup_rmdir_ctrl() to ensure it has the accurate refcount of rdtgrp. Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support") Suggested-by: Reinette Chatre Signed-off-by: Xiaochen Shen Signed-off-by: Borislav Petkov Reviewed-by: Reinette Chatre Reviewed-by: Tony Luck Acked-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1578500886-21771-3-git-send-email-xiaochen.shen@intel.com Signed-off-by: Sasha Levin commit b5a4949897ebada8132d54c93208989482503103 Author: Xiaochen Shen Date: Thu Jan 9 00:28:03 2020 +0800 x86/resctrl: Fix use-after-free when deleting resource groups [ Upstream commit b8511ccc75c033f6d54188ea4df7bf1e85778740 ] A resource group (rdtgrp) contains a reference count (rdtgrp->waitcount) that indicates how many waiters expect this rdtgrp to exist. Waiters could be waiting on rdtgroup_mutex or some work sitting on a task's workqueue for when the task returns from kernel mode or exits. The deletion of a rdtgrp is intended to have two phases: (1) while holding rdtgroup_mutex the necessary cleanup is done and rdtgrp->flags is set to RDT_DELETED, (2) after releasing the rdtgroup_mutex, the rdtgrp structure is freed only if there are no waiters and its flag is set to RDT_DELETED. Upon gaining access to rdtgroup_mutex or rdtgrp, a waiter is required to check for the RDT_DELETED flag. When unmounting the resctrl file system or deleting ctrl_mon groups, all of the subdirectories are removed and the data structure of rdtgrp is forcibly freed without checking rdtgrp->waitcount. If at this point there was a waiter on rdtgrp then a use-after-free issue occurs when the waiter starts running and accesses the rdtgrp structure it was waiting on. See kfree() calls in [1], [2] and [3] in these two call paths in following scenarios: (1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp() There are several scenarios that result in use-after-free issue in following: Scenario 1: ----------- In Thread 1, rdtgroup_tasks_write() adds a task_work callback move_myself(). If move_myself() is scheduled to execute after Thread 2 rdt_kill_sb() is finished, referring to earlier rdtgrp memory (rdtgrp->waitcount) which was already freed in Thread 2 results in use-after-free issue. Thread 1 (rdtgroup_tasks_write) Thread 2 (rdt_kill_sb) ------------------------------- ---------------------- rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_move_task __rdtgroup_move_task /* * Take an extra refcount, so rdtgrp cannot be freed * before the call back move_myself has been invoked */ atomic_inc(&rdtgrp->waitcount) /* Callback move_myself will be scheduled for later */ task_work_add(move_myself) rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) mutex_lock rmdir_all_sub /* * sentry and rdtgrp are freed * without checking refcount */ free_all_child_rdtgrp kfree(sentry)*[1] kfree(rdtgrp)*[2] mutex_unlock /* * Callback is scheduled to execute * after rdt_kill_sb is finished */ move_myself /* * Use-after-free: refer to earlier rdtgrp * memory which was freed in [1] or [2]. */ atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) kfree(rdtgrp) Scenario 2: ----------- In Thread 1, rdtgroup_tasks_write() adds a task_work callback move_myself(). If move_myself() is scheduled to execute after Thread 2 rdtgroup_rmdir() is finished, referring to earlier rdtgrp memory (rdtgrp->waitcount) which was already freed in Thread 2 results in use-after-free issue. Thread 1 (rdtgroup_tasks_write) Thread 2 (rdtgroup_rmdir) ------------------------------- ------------------------- rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_move_task __rdtgroup_move_task /* * Take an extra refcount, so rdtgrp cannot be freed * before the call back move_myself has been invoked */ atomic_inc(&rdtgrp->waitcount) /* Callback move_myself will be scheduled for later */ task_work_add(move_myself) rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_rmdir_ctrl free_all_child_rdtgrp /* * sentry is freed without * checking refcount */ kfree(sentry)*[3] rdtgroup_ctrl_remove rdtgrp->flags = RDT_DELETED rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test( &rdtgrp->waitcount) && (flags & RDT_DELETED) kfree(rdtgrp) /* * Callback is scheduled to execute * after rdt_kill_sb is finished */ move_myself /* * Use-after-free: refer to earlier rdtgrp * memory which was freed in [3]. */ atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) kfree(rdtgrp) If CONFIG_DEBUG_SLAB=y, Slab corruption on kmalloc-2k can be observed like following. Note that "0x6b" is POISON_FREE after kfree(). The corrupted bits "0x6a", "0x64" at offset 0x424 correspond to waitcount member of struct rdtgroup which was freed: Slab corruption (Not tainted): kmalloc-2k start=ffff9504c5b0d000, len=2048 420: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkjkkkkkkkkkkk Single bit error detected. Probably bad RAM. Run memtest86+ or a similar memory test tool. Next obj: start=ffff9504c5b0d800, len=2048 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Slab corruption (Not tainted): kmalloc-2k start=ffff9504c58ab800, len=2048 420: 6b 6b 6b 6b 64 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkdkkkkkkkkkkk Prev obj: start=ffff9504c58ab000, len=2048 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Fix this by taking reference count (waitcount) of rdtgrp into account in the two call paths that currently do not do so. Instead of always freeing the resource group it will only be freed if there are no waiters on it. If there are waiters, the resource group will have its flags set to RDT_DELETED. It will be left to the waiter to free the resource group when it starts running and finding that it was the last waiter and the resource group has been removed (rdtgrp->flags & RDT_DELETED) since. (1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp() Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support") Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system") Suggested-by: Reinette Chatre Signed-off-by: Xiaochen Shen Signed-off-by: Borislav Petkov Reviewed-by: Reinette Chatre Reviewed-by: Tony Luck Acked-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1578500886-21771-2-git-send-email-xiaochen.shen@intel.com Signed-off-by: Sasha Levin commit aec861a60b8cd50f251d929af01b8ae2a463503a Author: Xiaochen Shen Date: Thu Jan 9 00:28:05 2020 +0800 x86/resctrl: Fix a deadlock due to inaccurate reference [ Upstream commit 334b0f4e9b1b4a1d475f803419d202f6c5e4d18e ] There is a race condition which results in a deadlock when rmdir and mkdir execute concurrently: $ ls /sys/fs/resctrl/c1/mon_groups/m1/ cpus cpus_list mon_data tasks Thread 1: rmdir /sys/fs/resctrl/c1 Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1 3 locks held by mkdir/48649: #0: (sb_writers#17){.+.+}, at: [] mnt_want_write+0x20/0x50 #1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [] filename_create+0x7b/0x170 #2: (rdtgroup_mutex){+.+.}, at: [] rdtgroup_kn_lock_live+0x3d/0x70 4 locks held by rmdir/48652: #0: (sb_writers#17){.+.+}, at: [] mnt_want_write+0x20/0x50 #1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [] do_rmdir+0x13f/0x1e0 #2: (&type->i_mutex_dir_key#8){++++}, at: [] vfs_rmdir+0x4d/0x120 #3: (rdtgroup_mutex){+.+.}, at: [] rdtgroup_kn_lock_live+0x3d/0x70 Thread 1 is deleting control group "c1". Holding rdtgroup_mutex, kernfs_remove() removes all kernfs nodes under directory "c1" recursively, then waits for sub kernfs node "mon_groups" to drop active reference. Thread 2 is trying to create a subdirectory "m1" in the "mon_groups" directory. The wrapper kernfs_iop_mkdir() takes an active reference to the "mon_groups" directory but the code drops the active reference to the parent directory "c1" instead. As a result, Thread 1 is blocked on waiting for active reference to drop and never release rdtgroup_mutex, while Thread 2 is also blocked on trying to get rdtgroup_mutex. Thread 1 (rdtgroup_rmdir) Thread 2 (rdtgroup_mkdir) (rmdir /sys/fs/resctrl/c1) (mkdir /sys/fs/resctrl/c1/mon_groups/m1) ------------------------- ------------------------- kernfs_iop_mkdir /* * kn: "m1", parent_kn: "mon_groups", * prgrp_kn: parent_kn->parent: "c1", * * "mon_groups", parent_kn->active++: 1 */ kernfs_get_active(parent_kn) kernfs_iop_rmdir /* "c1", kn->active++ */ kernfs_get_active(kn) rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) /* "c1", kn->active-- */ kernfs_break_active_protection(kn) mutex_lock rdtgroup_rmdir_ctrl free_all_child_rdtgrp sentry->flags = RDT_DELETED rdtgroup_ctrl_remove rdtgrp->flags = RDT_DELETED kernfs_get(kn) kernfs_remove(rdtgrp->kn) __kernfs_remove /* "mon_groups", sub_kn */ atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active) kernfs_drain(sub_kn) /* * sub_kn->active == KN_DEACTIVATED_BIAS + 1, * waiting on sub_kn->active to drop, but it * never drops in Thread 2 which is blocked * on getting rdtgroup_mutex. */ Thread 1 hangs here ----> wait_event(sub_kn->active == KN_DEACTIVATED_BIAS) ... rdtgroup_mkdir rdtgroup_mkdir_mon(parent_kn, prgrp_kn) mkdir_rdt_prepare(parent_kn, prgrp_kn) rdtgroup_kn_lock_live(prgrp_kn) atomic_inc(&rdtgrp->waitcount) /* * "c1", prgrp_kn->active-- * * The active reference on "c1" is * dropped, but not matching the * actual active reference taken * on "mon_groups", thus causing * Thread 1 to wait forever while * holding rdtgroup_mutex. */ kernfs_break_active_protection( prgrp_kn) /* * Trying to get rdtgroup_mutex * which is held by Thread 1. */ Thread 2 hangs here ----> mutex_lock ... The problem is that the creation of a subdirectory in the "mon_groups" directory incorrectly releases the active protection of its parent directory instead of itself before it starts waiting for rdtgroup_mutex. This is triggered by the rdtgroup_mkdir() flow calling rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the parent control group ("c1") as argument. It should be called with kernfs node "mon_groups" instead. What is currently missing is that the kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp. Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() instead. And then it operates on the same rdtgroup structure but handles the active reference of kernfs node "mon_groups" to prevent deadlock. The same changes are also made to the "mon_data" directories. This results in some unused function parameters that will be cleaned up in follow-up patch as the focus here is on the fix only in support of backporting efforts. Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring") Suggested-by: Reinette Chatre Signed-off-by: Xiaochen Shen Signed-off-by: Borislav Petkov Reviewed-by: Reinette Chatre Reviewed-by: Tony Luck Acked-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com Signed-off-by: Sasha Levin commit b9bbd35c4ca93de6aa649b46e02e95e425c9fdd3 Author: Ronnie Sahlberg Date: Fri Jan 31 05:52:51 2020 +1000 cifs: fix soft mounts hanging in the reconnect code commit c54849ddd832ae0a45cab16bcd1ed2db7da090d7 upstream. RHBZ: 1795429 In recent DFS updates we have a new variable controlling how many times we will retry to reconnect the share. If DFS is not used, then this variable is initialized to 0 in: static inline int dfs_cache_get_nr_tgts(const struct dfs_cache_tgt_list *tl) { return tl ? tl->tl_numtgts : 0; } This means that in the reconnect loop in smb2_reconnect() we will immediately wrap retries to -1 and never actually get to pass this conditional: if (--retries) continue; The effect is that we no longer reach the point where we fail the commands with -EHOSTDOWN and basically the kernel threads are virtually hung and unkillable. Fixes: a3a53b7603798fd8 (cifs: Add support for failover in smb2_reconnect()) Signed-off-by: Ronnie Sahlberg Signed-off-by: Steve French Reviewed-by: Paulo Alcantara (SUSE) CC: Stable Signed-off-by: Greg Kroah-Hartman commit d76341d93dedbcf6ed5a08dfc8bce82d3e9a772b Author: Al Viro Date: Sat Feb 1 16:26:45 2020 +0000 vfs: fix do_last() regression commit 6404674acd596de41fd3ad5f267b4525494a891a upstream. Brown paperbag time: fetching ->i_uid/->i_mode really should've been done from nd->inode. I even suggested that, but the reason for that has slipped through the cracks and I went for dir->d_inode instead - made for more "obvious" patch. Analysis: - at the entry into do_last() and all the way to step_into(): dir (aka nd->path.dentry) is known not to have been freed; so's nd->inode and it's equal to dir->d_inode unless we are already doomed to -ECHILD. inode of the file to get opened is not known. - after step_into(): inode of the file to get opened is known; dir might be pointing to freed memory/be negative/etc. - at the call of may_create_in_sticky(): guaranteed to be out of RCU mode; inode of the file to get opened is known and pinned; dir might be garbage. The last was the reason for the original patch. Except that at the do_last() entry we can be in RCU mode and it is possible that nd->path.dentry->d_inode has already changed under us. In that case we are going to fail with -ECHILD, but we need to be careful; nd->inode is pointing to valid struct inode and it's the same as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we should use that. Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com Wearing-brown-paperbag: Al Viro Cc: stable@kernel.org Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late") Signed-off-by: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman