Get Started

          Tracking down a glibc regression

          We’d just upgraded our glibc package from 2.32 to 2.33, when we noticed some peculiar behavior. Glibc 2.32 had a number of minor security issues and needed to be patched or upgraded. Instead of back-porting the patches to 2.32 we decided to upgrade directly to 3.33. After completing the upgrade and rebooting, a program was failing to start. Reverting glibc to 2.32 got things working again and confirmed that the glibc upgrade was the source of the trouble, but we wanted to understand why things were broken with the newer version.

          The first thing I reached for was strace. Strace shows the active system calls that a program is making. Often when a program is hanging, strace will show that it is blocking in a call to open a tcp connection or lock a file. If you want to learn more about strace, Julia Evans has a great series of posts on it.  In this case strace showed that the broken program was not stuck anywhere, but very busy calling pselect6 and recvfrom:

          pselect6(4, [3], NULL, NULL, {0, 500000000}, 0) = 1 (in [3], left {0, 499999521}) recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 222 pselect6(4, [3], NULL, NULL, {0, 500000000}, 0) = 1 (in [3], left {0, 499999516}) recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 514 pselect6(4, [3], NULL, NULL, {0, 500000000}, 0) = 1 (in [3], left {0, 499999529})

          When glibc was downgraded, strace showed it was using select instead of pselect6 and would eventually timeout:

          select(4, [3], NULL, NULL, {0, 13272}) = 1 (in [3], left {0, 13234}) recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 151 select(4, [3], NULL, NULL, {0, 13234}) = 1 (in [3], left {0, 13232}) recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 151 select(4, [3], NULL, NULL, {0, 13232}) = 1 (in [3], left {0, 12712}) recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 148 select(4, [3], NULL, NULL, {0, 12712}) = 1 (in [3], left {0, 12710}) recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 148 select(4, [3], NULL, NULL, {0, 12710}) = 0 (Timeout)

          So at this point we knew:

          • Newer glibc is using pselect6 instead of select
          • The select timeout argument is decreasing and eventually times out
          • The pselect6 timeout is not decreasing and it never times out.

          Select is a libc function and syscall for implementing asynchronous IO.  There are more modern alternatives like epoll or io_uring, but for many years it was the standard for handling IO between different sources, or to read from a single source with a timeout.  One of the arguments to select is a timeout.  This is used to tell select how long to wait before returning that nothing was ready.

          If you read the man page for select, it has the following to say about the timeout argument:

          select() may update the timeout argument to indicate how much time was left. pselect() does not change this argument. On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 permits either behavior.) This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns.

          And if you look at the standard, it confirms that the timeout argument may be modified, but isn’t required to be.

          Upon successful completion, the select() function may modify the object pointed to by the timeout argument.

          The program in question was relying on the timeout value changing after the call to select. It was running select in a loop and re-using the timeout value to act as an overall deadline for the loop. But why was it broken now? There didn’t appear to be anything in the release notes for glibc to indicate that they had changed this behavior.  A minimal test program was written to call select and output what the resulting timeout was. The test program uses fork to create a parent and child process. The parent calls select with a 4 second timeout, while the child pauses for 2 seconds and then writes something to the parent. We know we have the old behavior if the timeout option changes from 4 to 2.  If the timeout value doesn’t change, we know we are seeing the new behavior.

          #include <stdio.h> #include <sys/time.h> #include <sys/types.h> #include <unistd.h> #include <string.h> #include <stdlib.h> int main(void) { fd_set rfds; struct timeval tv; int retval; int fd1[2]; pid_t p; if (pipe(fd1)==-1) { perror("pipe"); return 1; } p = fork(); if (p < 0) { perror("fork"); return 1; } // Parent process: Call select on the pipe with a 4 second timeout. if (p > 0) { FD_ZERO(&rfds); FD_SET(fd1[0], &rfds); /* Wait up to four seconds. */ tv.tv_sec = 4; tv.tv_usec = 0; printf("tv before: %ld.%06ld\n", tv.tv_sec, tv.tv_usec); retval = select(fd1[0]+1, &rfds, NULL, NULL, &tv); if (retval == -1) perror("select()"); else if (retval) printf("Data is available now.\n"); else printf("No data within four seconds.\n"); printf("tv after : %ld.%06ld\n", tv.tv_sec, tv.tv_usec); int broken = tv.tv_sec >= 3; printf("select: %s\n", broken ? "BROKEN" : "OK"); return broken; } else { // child process: Pause for 2 seconds, then write something to the parent. sleep(2); write(fd1[1], "Hello", strlen("Hello")); close(fd1[1]); exit(0); } }

          On 2.32 this outputs:

          tv before: 4.000000 Data is available now. tv after : 2.000002 select: OK

          But on 2.33 it outputs:

          tv before: 4.000000 Data is available now. tv after : 4.000000 select: BROKEN

          Because the value of the timeout argument is not guaranteed to be updated after the call to select either of these outcomes is valid. This means it’s not truly broken behavior, but the change is still unexpected. The timeout argument is no longer being modified to reflect the amount of time not slept.  Any program expecting this behavior may go into an infinite loop or otherwise malfunction.  At this point we know exactly what changed, but we still don’t know when or why it was changed.

          Git has a feature called bisect that can be used to narrow down exactly when this behavior changed.  Given a known good version and a known bad version, git will use binary search across the commit history to find the commit that introduced the problem.  In order for this process to work we need a way to run the test program against an arbitrary version of glibc.  After working out how to do this once manually, the process was turned into a script:

          #!/bin/sh rm -fr /tmp/glibc-build /tmp/lib /tmp/usr || exit 1 mkdir -p /tmp/glibc-build || exit 1 cd /tmp/glibc-build || exit 1 /tmp/src/glibc/configure \ --prefix=/tmp/usr \ --disable-profile \ libc_cv_slibdir=/tmp/lib > /dev/null || exit 125 make -j 8 > /dev/null || exit 125 make install > /dev/null || exit 125 SYSROOT=/tmp/usr cd ~ gcc \ -L${SYSROOT}/tmp/lib \ -I${SYSROOT}/include \ --sysroot=${SYSROOT} \ -Wl,-rpath=/tmp/lib/ \ -Wl,--dynamic-linker=/tmp/lib/ld-linux-x86-64.so.2 \ -o select_test select_test.c || exit 1 exec ./select_test

          This script will first build glibc from /tmp/glibc and install it into /tmp/usr and /tmp/lib. If the build succeeds, it will build and run the select test program. If the build fails, we make sure to exit with the status 125, which tells git bisect to skip this revision.  The required flags for gcc were pieced together from information at https://sourceware.org/glibc/wiki/Testing/Builds.

          With this script in place, we can have git bisect run it for us and use it to narrow down the commit that changed the select behavior. First we tell it that 2.33 was “bad” and 2.31 was “good,” then we start the automatic process using the above script:

          git checkout glibc-2.33 git bisect bad git checkout glibc-2.31 git bisect good git bisect run ~/bisect_glibc.sh

          Because git bisect uses binary search it takes a small number of iterations to search through hundreds of commits. After a short wait the following commit was found:

          commit 2433d39b69743f100f972e7886f91a2e21795ef0 (refs/bisect/bad) Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Mon Jul 6 16:06:51 2020 -0300 linux: Add time64 select support The syscall __NR_pselect6_time64 (32-bit) or __NR_pselect6 (64-bit) is used as default. For architectures with __ASSUME_TIME64_SYSCALLS the 32-bit fallback uses __NR_select/__NR__newselect or __NR_pselect6 (it should cover the microblaze case where older kernels do not provide __NR_pselect6).

          As part of the testing I ran the script against the latest commit in glibc and that showed the old select behavior again. Was this already fixed? The commit log showed a number of things related to select and timeouts, including bug numbers:

          $ git log master --pretty=format:"%h%x09%ad%x09%s"|grep select|head -n 30 cedbf6d5f3 Thu Apr 8 07:39:32 2021 -0300 linux: always update select timeout (BZ #27706) 9d7c5cc38e Wed Mar 31 13:53:34 2021 -0300 linux: Normalize and return timeout on select (BZ #27651) 49a40ba18e Fri Apr 9 10:05:13 2021 -0300 libsupport: Add support_select_normalizes_timeout 5628f103f5 Fri Apr 9 10:02:30 2021 -0300 libsupport: Add support_select_modifies_timeout 1b53b5d970 Thu Mar 25 16:57:45 2021 -0300 misc: Fix tst-select timeout handling (BZ#27648) 7759be2593 Tue Mar 23 17:29:26 2021 +0100 S390: Also check vector support in memmove ifunc-selector [BZ #27511] bff3019afc Sat Mar 13 23:34:21 2021 +0100 tst: Provide test for select

          Git bisect can be used to find the commit that restored the old behavior as well, the logic just needs to be flipped:

          git checkout glibc-2.33 git bisect old git checkout master git bisect new git bisect run ~/bisect_glibc_backwards.sh

          Where the backwards version swaps out:

          exec ./select_test

          For:

          exit ! ./select_test

          That bisection results in confirming that the recent commits from only a few weeks ago are what restored the previous behavior:

          commit 9d7c5cc38e58fb0923e88901f87174a511b61552 (refs/bisect/new) Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Wed Mar 31 13:53:34 2021 -0300 linux: Normalize and return timeout on select (BZ #27651) The commit 2433d39b697, which added time64 support to select, changed the function to use __NR_pselect6 (or __NR_pelect6_time64) on all architectures. However, on architectures where the symbol was implemented with __NR_select the kernel normalizes the passed timeout instead of return EINVAL. For instance, the input timeval { 0, 5000000 } is interpreted as { 5, 0 }. And as indicated by BZ #27651, this semantic seems to be expected and changing it results in some performance issues (most likely the program does not check the return code and keeps issuing select with unormalized tv_usec argument).

          By using a little Dockerfile we can quickly test various other distributions to see which ones might have had their behavior changed recently.  To test fedora 34, we can use the following Dockerfile (other distributions are similar):

          FROM fedora:34 RUN /lib64/lib.so.6 | grep version #running libc outputs the version RUN dnf -y install gcc ADD ./select_test.c . RUN gcc -o select_test select_test.c RUN ./select_test

          This results in the following:

          Distro libc Timeout value
          Fedora 33 Glibc 2.32 Time remaining
          Fedora 34 Glibc 2.33 Unchanged
          Ubuntu 20.04 Glibc 2.31 Time remaining
          Ubuntu 20.10 Glibc 2.32 Time remaining
          Ubuntu 21.04 Glibc 2.33 Unchanged
          Debian unstable Glibc 2.31 Time remaining
          Archlinux latest Glibc 2.33 Unchanged
          Alpine musl Unchanged
          OS X   Unchanged

          Even though the old behavior will soon be restored, linux distributions are shipping glibc 2.33. The latest versions of Fedora, Ubuntu, and Arch are using Glibc 2.33 and show the different select behavior. Platforms that are not based on glibc also do not modify the timeout argument.

          This issue shows that relying on behavior that isn’t explicitly guaranteed will eventually break things in ways that might not immediately be obvious. Ultimately any programs that have this issue need to be patched to follow the standard and not rely on the implementation-defined behavior of the timeout argument. 

           

          Search

            Recent Posts