We’d just upgraded our glibc package from 2.32 to 2.33, when we noticed some peculiar behavior. Glibc 2.32 had a number of minor security issues and needed to be patched or upgraded. Instead of back-porting the patches to 2.32 we decided to upgrade directly to 3.33. After completing the upgrade and rebooting, a program was failing to start. Reverting glibc to 2.32 got things working again and confirmed that the glibc upgrade was the source of the trouble, but we wanted to understand why things were broken with the newer version.
The first thing I reached for was strace. Strace shows the active system calls that a program is making. Often when a program is hanging, strace will show that it is blocking in a call to open a tcp connection or lock a file. If you want to learn more about strace, Julia Evans has a great series of posts on it. In this case strace showed that the broken program was not stuck anywhere, but very busy calling pselect6 and recvfrom:
pselect6(4, [3], NULL, NULL, {0, 500000000}, 0) = 1 (in [3], left {0, 499999521})
recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 222
pselect6(4, [3], NULL, NULL, {0, 500000000}, 0) = 1 (in [3], left {0, 499999516})
recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 514
pselect6(4, [3], NULL, NULL, {0, 500000000}, 0) = 1 (in [3], left {0, 499999529})
When glibc was downgraded, strace showed it was using select instead of pselect6 and would eventually timeout:
select(4, [3], NULL, NULL, {0, 13272}) = 1 (in [3], left {0, 13234})
recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 151
select(4, [3], NULL, NULL, {0, 13234}) = 1 (in [3], left {0, 13232})
recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 151
select(4, [3], NULL, NULL, {0, 13232}) = 1 (in [3], left {0, 12712})
recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 148
select(4, [3], NULL, NULL, {0, 12712}) = 1 (in [3], left {0, 12710})
recvfrom(3, "..."..., 1514, 0, NULL, NULL) = 148
select(4, [3], NULL, NULL, {0, 12710}) = 0 (Timeout)
So at this point we knew:
Select is a libc function and syscall for implementing asynchronous IO. There are more modern alternatives like epoll or io_uring, but for many years it was the standard for handling IO between different sources, or to read from a single source with a timeout. One of the arguments to select is a timeout. This is used to tell select how long to wait before returning that nothing was ready.
If you read the man page for select, it has the following to say about the timeout argument:
select() may update the timeout argument to indicate how
much time was left. pselect() does not change this argument.
On Linux, select() modifies timeout to reflect the amount
of time not slept; most other implementations do not do this.
(POSIX.1-2001 permits either behavior.) This causes problems
both when Linux code which reads timeout is ported to
other operating systems, and when code is ported to Linux
that reuses a struct timeval for multiple select()s in a
loop without reinitializing it. Consider timeout to be
undefined after select() returns.
And if you look at the standard, it confirms that the timeout argument may be modified, but isn’t required to be.
Upon successful completion, the select() function may
modify the object pointed to by the timeout argument.
The program in question was relying on the timeout value changing after the call to select. It was running select in a loop and re-using the timeout value to act as an overall deadline for the loop. But why was it broken now? There didn’t appear to be anything in the release notes for glibc to indicate that they had changed this behavior. A minimal test program was written to call select and output what the resulting timeout was. The test program uses fork to create a parent and child process. The parent calls select with a 4 second timeout, while the child pauses for 2 seconds and then writes something to the parent. We know we have the old behavior if the timeout option changes from 4 to 2. If the timeout value doesn’t change, we know we are seeing the new behavior.
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
int main(void) {
fd_set rfds;
struct timeval tv;
int retval;
int fd1[2];
pid_t p;
if (pipe(fd1)==-1) {
perror("pipe");
return 1;
}
p = fork();
if (p < 0) {
perror("fork");
return 1;
}
// Parent process: Call select on the pipe with a 4 second timeout.
if (p > 0) {
FD_ZERO(&rfds);
FD_SET(fd1[0], &rfds);
/* Wait up to four seconds. */
tv.tv_sec = 4;
tv.tv_usec = 0;
printf("tv before: %ld.%06ld\n", tv.tv_sec, tv.tv_usec);
retval = select(fd1[0]+1, &rfds, NULL, NULL, &tv);
if (retval == -1)
perror("select()");
else if (retval)
printf("Data is available now.\n");
else
printf("No data within four seconds.\n");
printf("tv after : %ld.%06ld\n", tv.tv_sec, tv.tv_usec);
int broken = tv.tv_sec >= 3;
printf("select: %s\n", broken ? "BROKEN" : "OK");
return broken;
} else { // child process: Pause for 2 seconds, then write something to the parent.
sleep(2);
write(fd1[1], "Hello", strlen("Hello"));
close(fd1[1]);
exit(0);
}
}
On 2.32 this outputs:
tv before: 4.000000
Data is available now.
tv after : 2.000002
select: OK
But on 2.33 it outputs:
tv before: 4.000000
Data is available now.
tv after : 4.000000
select: BROKEN
Because the value of the timeout argument is not guaranteed to be updated after the call to select either of these outcomes is valid. This means it’s not truly broken behavior, but the change is still unexpected. The timeout argument is no longer being modified to reflect the amount of time not slept. Any program expecting this behavior may go into an infinite loop or otherwise malfunction. At this point we know exactly what changed, but we still don’t know when or why it was changed.
Git has a feature called bisect that can be used to narrow down exactly when this behavior changed. Given a known good version and a known bad version, git will use binary search across the commit history to find the commit that introduced the problem. In order for this process to work we need a way to run the test program against an arbitrary version of glibc. After working out how to do this once manually, the process was turned into a script:
#!/bin/sh
rm -fr /tmp/glibc-build /tmp/lib /tmp/usr || exit 1
mkdir -p /tmp/glibc-build || exit 1
cd /tmp/glibc-build || exit 1
/tmp/src/glibc/configure \
--prefix=/tmp/usr \
--disable-profile \
libc_cv_slibdir=/tmp/lib > /dev/null || exit 125
make -j 8 > /dev/null || exit 125
make install > /dev/null || exit 125
SYSROOT=/tmp/usr
cd ~
gcc \
-L${SYSROOT}/tmp/lib \
-I${SYSROOT}/include \
--sysroot=${SYSROOT} \
-Wl,-rpath=/tmp/lib/ \
-Wl,--dynamic-linker=/tmp/lib/ld-linux-x86-64.so.2 \
-o select_test select_test.c || exit 1
exec ./select_test
This script will first build glibc from /tmp/glibc and install it into /tmp/usr and /tmp/lib. If the build succeeds, it will build and run the select test program. If the build fails, we make sure to exit with the status 125, which tells git bisect to skip this revision. The required flags for gcc were pieced together from information at https://sourceware.org/glibc/wiki/Testing/Builds.
With this script in place, we can have git bisect run it for us and use it to narrow down the commit that changed the select behavior. First we tell it that 2.33 was “bad” and 2.31 was “good,” then we start the automatic process using the above script:
git checkout glibc-2.33
git bisect bad
git checkout glibc-2.31
git bisect good
git bisect run ~/bisect_glibc.sh
Because git bisect uses binary search it takes a small number of iterations to search through hundreds of commits. After a short wait the following commit was found:
commit 2433d39b69743f100f972e7886f91a2e21795ef0 (refs/bisect/bad)
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Mon Jul 6 16:06:51 2020 -0300
linux: Add time64 select support
The syscall __NR_pselect6_time64 (32-bit) or __NR_pselect6 (64-bit)
is used as default. For architectures with __ASSUME_TIME64_SYSCALLS
the 32-bit fallback uses __NR_select/__NR__newselect or __NR_pselect6
(it should cover the microblaze case where older kernels do not
provide __NR_pselect6).
As part of the testing I ran the script against the latest commit in glibc and that showed the old select behavior again. Was this already fixed? The commit log showed a number of things related to select and timeouts, including bug numbers:
$ git log master --pretty=format:"%h%x09%ad%x09%s"|grep select|head -n 30
cedbf6d5f3 Thu Apr 8 07:39:32 2021 -0300 linux: always update select timeout (BZ #27706)
9d7c5cc38e Wed Mar 31 13:53:34 2021 -0300 linux: Normalize and return timeout on select (BZ #27651)
49a40ba18e Fri Apr 9 10:05:13 2021 -0300 libsupport: Add support_select_normalizes_timeout
5628f103f5 Fri Apr 9 10:02:30 2021 -0300 libsupport: Add support_select_modifies_timeout
1b53b5d970 Thu Mar 25 16:57:45 2021 -0300 misc: Fix tst-select timeout handling (BZ#27648)
7759be2593 Tue Mar 23 17:29:26 2021 +0100 S390: Also check vector support in memmove ifunc-selector [BZ #27511]
bff3019afc Sat Mar 13 23:34:21 2021 +0100 tst: Provide test for select
Git bisect can be used to find the commit that restored the old behavior as well, the logic just needs to be flipped:
git checkout glibc-2.33
git bisect old
git checkout master
git bisect new
git bisect run ~/bisect_glibc_backwards.sh
Where the backwards version swaps out:
exec ./select_test
For:
exit ! ./select_test
That bisection results in confirming that the recent commits from only a few weeks ago are what restored the previous behavior:
commit 9d7c5cc38e58fb0923e88901f87174a511b61552 (refs/bisect/new)
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Wed Mar 31 13:53:34 2021 -0300
linux: Normalize and return timeout on select (BZ #27651)
The commit 2433d39b697, which added time64 support to select, changed
the function to use __NR_pselect6 (or __NR_pelect6_time64) on all
architectures. However, on architectures where the symbol was
implemented with __NR_select the kernel normalizes the passed timeout
instead of return EINVAL. For instance, the input timeval
{ 0, 5000000 } is interpreted as { 5, 0 }.
And as indicated by BZ #27651, this semantic seems to be expected
and changing it results in some performance issues (most likely
the program does not check the return code and keeps issuing
select with unormalized tv_usec argument).
By using a little Dockerfile we can quickly test various other distributions to see which ones might have had their behavior changed recently. To test fedora 34, we can use the following Dockerfile (other distributions are similar):
FROM fedora:34
RUN /lib64/lib.so.6 | grep version #running libc outputs the version
RUN dnf -y install gcc
ADD ./select_test.c .
RUN gcc -o select_test select_test.c
RUN ./select_test
This results in the following:
Distro | libc | Timeout value |
Fedora 33 | Glibc 2.32 | Time remaining |
Fedora 34 | Glibc 2.33 | Unchanged |
Ubuntu 20.04 | Glibc 2.31 | Time remaining |
Ubuntu 20.10 | Glibc 2.32 | Time remaining |
Ubuntu 21.04 | Glibc 2.33 | Unchanged |
Debian unstable | Glibc 2.31 | Time remaining |
Archlinux latest | Glibc 2.33 | Unchanged |
Alpine | musl | Unchanged |
OS X | Unchanged |
Even though the old behavior will soon be restored, linux distributions are shipping glibc 2.33. The latest versions of Fedora, Ubuntu, and Arch are using Glibc 2.33 and show the different select behavior. Platforms that are not based on glibc also do not modify the timeout argument.
This issue shows that relying on behavior that isn’t explicitly guaranteed will eventually break things in ways that might not immediately be obvious. Ultimately any programs that have this issue need to be patched to follow the standard and not rely on the implementation-defined behavior of the timeout argument.