[spam][crazy] Diagnosing an Explosive Failure

k gmkarl at gmail.com
Wed Jan 5 01:44:34 PST 2022


# build vmem-wip of mimalloc
git clone https://github.com/xloem/mimalloc
mkdir mimalloc/build
cd mimalloc/build
cmake ..
make -j
cd ../..

# use it to build pytorch
git clone https://github.com/pytorch/pytorch
cd pytorch
LD_PRELOAD=../../mimalloc/build/libmimalloc.so VMEM_PREFIX=vmem \
    python3 setup.py build

I've tried the above two or three times on a Raspberry Pi running
64-bit Ubuntu.  Each time, a SIGKILL is sent immediately to all
running processes.  My PTYs close.  GDM restarts its session.  It is
quite surprising that a software bug run by a non-root user would
cause that.

I'm still presently pretty inhibited around root-process and
live-kernel debugging, so my approach would be to narrow down the
build process and the code executed using manual bisection of source
lines.

The likely culprit is the libseccomp filter I added to the code as a
stop-gap to keep fork() from corrupting memory:
+      // disable fork and clone, which would write to the same vmem file
+      void * fctx = seccomp_init(SCMP_ACT_ALLOW);
+      if (fctx) {
+        seccomp_rule_add(fctx, SCMP_ACT_ERRNO(ENOMEM), SCMP_SYS(fork), 0);
+        seccomp_rule_add(fctx, SCMP_ACT_ERRNO(ENOMEM), SCMP_SYS(clone), 0);
+        seccomp_load(fctx);
+        seccomp_release(fctx);
+      }

I'm not sure whether disabling clone() is actually needed as well; I
just saw somewhere that the two were similar calls.

The explosive error doesn't happen on the small test case I wrote.
Only on the large build.
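
One unverified guess at a mechanism that would fit these symptoms: with
the filter loaded, fork() fails and returns -1.  If some program in the
build keeps that -1 as a "child pid" and later passes it to kill(), then
kill(-1, SIGKILL) signals every process the user is allowed to signal,
which needs no root.  I have not confirmed that anything in the build
does this.  The sketch below only shows the kind of guard that avoids
it; kill_child_checked is a hypothetical helper, not code from mimalloc
or pytorch.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (an assumption for illustration, not taken from
 * mimalloc or pytorch): only signal a pid that names one specific
 * process.
 *
 * kill() gives special meaning to non-positive pids:
 *   pid == -1  -> every process the caller may signal
 *   pid ==  0  -> the caller's own process group
 *   pid <  -1  -> the process group -pid
 * A pid left at -1 by a failed fork() therefore must never reach kill().
 */
int kill_child_checked(pid_t pid, int sig)
{
    if (pid <= 0) {
        errno = EINVAL;   /* refuse instead of signalling many processes */
        return -1;
    }
    return kill(pid, sig);
}
```

A caller would check the return value of fork() first and only ever hand
a positive pid to this helper.  Grepping the tools that the build runs
for kill() calls on an unchecked pid might be a cheaper first step than
bisecting source lines.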

I may not debug this immediately as I have other things going on
today, but it seemed pleasant I guess to add it to the list.  It's not
often you encounter a software bug that resets your entire system
interface.

