
View Full Version : [server] 24.04 considerably slower than 20.04



Doug S
January 9th, 2024, 06:33 PM
My main test server is still 20.04, as I skipped 22.04 on real hardware. Doing some testing for the irqbalance thread (https://ubuntuforums.org/showthread.php?t=2493441) necessitated 24.04 on real hardware, versus a VM, so I added 24.04 server to my main test server as a dual boot. I noticed the test ran a lot slower. Here is some data.

The kernel is: 6.7.0-060700-generic #202401072033 SMP PREEMPT_DYNAMIC Sun Jan 7 20:43:59 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux.
The test is 6 pairs of token-passing rings, doing pretty much no work before passing the token along. The purpose of this test is to have the system prefer shallow idle states and to be a challenge for the teo (Timer Events Oriented) idle governor, because it is not a time-based test; i.e. the menu idle governor should do better for this test. The irqbalance service is disabled for this work (which is why I am starting a new thread for this). Results (30M loops):
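For anyone wanting to check or switch between the menu and teo governors themselves, the standard kernel cpuidle sysfs interface can be used (a sketch; switching requires root, and the available governors depend on the kernel build):

```shell
#!/bin/sh
# Show the active and available cpuidle governors via the standard sysfs paths.
d=/sys/devices/system/cpu/cpuidle
if [ -r "$d/current_governor" ]; then
    echo "current:   $(cat "$d/current_governor")"
    echo "available: $(cat "$d/available_governors")"
    # To switch (as root): echo teo > $d/current_governor
else
    echo "cpuidle sysfs not present (VM or cpuidle disabled?)"
fi
```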



OS : idle gov: uSec/loop: % : R time : U time : S time :
20.04 : menu : 4.66 : : 2:20.150 : 1:09.556 : 11:36.771
24.04 : menu : 5.24 : +12.44 : 2:36.818 : 0:12.027 : 2:20.262
: : : : : :
20.04 : teo : 5.15 : : 2:34.527 : 0:42.682 : 6:00.137
24.04 : teo : 6.02 : +16.89 : 3:00.884 : 1:02.892 : 10:55.273


Note that I think there is some time sample aliasing going on in the User and System times. Only the Real time is useful, in my opinion.
There is no throttling involved. Processor package power is in the high 90s to low 100s of watts. The CPU frequency is maxed out at 4.8 GHz. The CPU frequency scaling driver is intel_pstate and the governor is powersave. HWP (HardWare P-state control, also known as Intel Speed Shift) was enabled. (I might try HWP disabled later, along with some other tests. I'll edit this post with new data/findings.)
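A quick way to check the driver and HWP state mentioned above (a sketch; the intel_pstate status file is the standard sysfs interface, and HWP is disabled at boot with the intel_pstate=no_hwp kernel parameter):

```shell
#!/bin/sh
# Report the intel_pstate driver status and any intel_pstate boot parameters.
if [ -r /sys/devices/system/cpu/intel_pstate/status ]; then
    echo "intel_pstate status: $(cat /sys/devices/system/cpu/intel_pstate/status)"
else
    echo "intel_pstate not loaded (not an Intel CPU, or another driver in use)"
fi
# no_hwp, if set, shows up on the kernel command line:
grep -o 'intel_pstate=[^ ]*' /proc/cmdline || echo "no intel_pstate boot parameters set"
```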

MAFoElffen
January 10th, 2024, 05:03 AM
I am very curious to see how this turns out... Watching with anticipation.

Doug S
January 11th, 2024, 10:48 PM
For unknown reasons, I cannot edit and add this to my original post. I get the dreaded, and so very frustrating, "forbidden" message.

The above was a "FAST" test. Adding "MEDIUM" and "SLOW" data (there is no use including R,U,S times):



MEDIUM:
OS : idle gov: uSec/loop: % :
20.04 : menu : 12.79 : :
24.04 : menu : 13.56 : +6.02 :
: : : :
20.04 : teo : 13.31 : :
24.04 : teo : 14.11 : +6.01 :

SLOW:
OS : idle gov: uSec/loop: % :
20.04 : menu : 3055.01 : :
24.04 : menu : 3131.58 : +2.51 :
: : : :
20.04 : teo : 2718.25 : :
24.04 : teo : 3022.99 : +11.21 :


I do not understand what is different between the two tests using the exact same kernel.

Doug S
January 12th, 2024, 12:26 AM
Adding HWP disabled results:



no-hwp:
OS Gov test t(secs) %
24.04 teo fast 179.25 16.18%
20.04 teo fast 154.29
24.04 teo medium 178.6 4.19%
20.04 teo medium 171.41
24.04 teo slow 192.11 -0.11%
20.04 teo slow 192.32

24.04 menu fast 158.03 12.80%
20.04 menu fast 140.1
24.04 menu medium 173.99 3.84%
20.04 menu medium 167.55
24.04 menu slow 195.07 -0.36%
20.04 menu slow 195.78

The difference between 20.04 and 24.04 is system time; I just cannot figure out why.

Doug S
January 14th, 2024, 11:46 PM
The Intel adrestia test is for wakeup latency information. I like it just as a good way to dwell on certain preferred idle states, depending on the test options.
This test was done with: HWP disabled; performance CPU frequency scaling governor; kernel 6.7 generic (250 Hertz).
Threads: 500.
Loops: 2e6
arrival time: 20 uSec (but it is implemented by a sleep call, so the overhead results in longer actual arrival times.)
service time: 2000 uSec (but this is actually not implemented in the code, as far as I can tell.)

TEO idle governor:


OS and ? uSec uSec
20-teo ave 2456 90th percentile: 2536
24-teo-nosnap ave 2891 17.71% 90th percentile: 2972 17.19%
24-teo ave 2949 20.07% 90th percentile: 3046 20.11%


MENU idle governor:


OS and ? uSec uSec
20-menu ave 2360 90th percentile: 2474
24-menu-nosnap ave 2847 20.64% 90th percentile: 2954 19.40%
24-menu ave 2866 21.44% 90th percentile: 2968 19.97%

Aside from my inability to format the data properly, notice the roughly 20% degradation between 20.04 and 24.04.
When I compared running services between 20.04 and 24.04, I saw snapd, hence I ran another 24.04 test with it disabled.

And for processor energy:

MENU:
20.04 66,539.94 Joules
24.04 72,032.04 Joules (snapd disabled) +8%
24.04 72,908.09 Joules +10%

TEO:
20.04 65,692.04 Joules
24.04 71,571.98 Joules (snapd disabled) +9%
24.04 72,504.09 Joules +10%

MAFoElffen
January 15th, 2024, 02:49 AM
I don't understand why it is taking more power and running slower... Hmmm. (???)

You said that was using the same kernel for all the tests?

Doug S
January 15th, 2024, 07:37 AM
I don't understand why it is taking more power and running slower... Hmmm. (???)

You said that was using the same kernel for all the tests?

Yes, the exact same kernel.

An idle check (20 minutes each) shows idle power consumption is the same (~1.4 watts), and idle states selections are similar between 20.04 and 24.04.

EDIT: Previous claims of idle differences were due to operator error.

IanW
January 15th, 2024, 09:58 AM
You might be seeing the same performance drop Linus has been seeing:-
https://www.phoronix.com/news/Torvalds-Perf-Regression-Fix

Never mind, that was on 6.8, not 6.7.

jbicha
January 15th, 2024, 06:24 PM
Just to be clear, are you using the same kernel on both Ubuntu 20.04 LTS as 24.04 LTS ?

However, the kernel 6.7.0-060700-generic is not the Ubuntu kernel so I think a comparison of the actual Ubuntu kernel would be more interesting to Ubuntu developers.

Doug S
January 16th, 2024, 01:59 AM
Just to be clear, are you using the same kernel on both Ubuntu 20.04 LTS as 24.04 LTS ?

However, the kernel 6.7.0-060700-generic is not the Ubuntu kernel so I think a comparison of the actual Ubuntu kernel would be more interesting to Ubuntu developers.
Yes, the same kernel on both Ubuntu 20.04 LTS and 24.04 LTS.
And yes, I pretty much only ever use mainline kernels. Perhaps at some later date I'll try an Ubuntu kernel.

As a sanity check I redid the adrestia test, double and triple checking that I booted the correct kernel:



menu 2nd test times in uSec (24.04 snapd service disabled)
OS ave % 90th %
20.04 2378 2518
24.04 2845 19.64% 2933 16.48%

teo 2nd test times in uSec (24.04 snapd service disabled)
OS ave % 90th% %
20.04 2434 2512
24.04 2990 22.84% 3076 22.45%

jbicha
January 16th, 2024, 03:12 AM
Generally, developers don't read this forum. If you want developers to see your post, a better place is https://discourse.ubuntu.com/

But your tests would be more interesting if they used Ubuntu kernels. I don't know if the Ubuntu 24.04 LTS kernel would boot on 20.04 LTS; I don't have any need myself to be trying alternative kernels.

Doug S
January 18th, 2024, 01:46 AM
Generally, developers don't read this forum. If you want developers to see your post, a better place is https://discourse.ubuntu.com/

But your tests would be more interesting if they used Ubuntu kernels. I don't know if the Ubuntu 24.04 LTS kernel would boot on 20.04 LTS; I don't have any need myself to be trying alternative kernels.

Hi Jeremy,

Thanks for chiming in on this thread.

I am aware that developers don't read this forum. For now, I am just looking for input/suggestions from my friends herein. I'll look towards escalation at some point, once I am able to bound the issue a little better. That being said, this one is driving me a bit nuts.

Doug S
January 20th, 2024, 05:11 PM
Status update:

I compared, line by line, the dmesg outputs booting the same kernel, 6.7.0-060700-generic, on both 20.04 and 24.04 and did not see anything (I could easily have missed something).

I compared the loaded module lists and see 4 differences: cfg80211; dmi_sysfs; qrtr; and uas now has 2 references whereas it used to have 1.
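One simple way to do that module comparison is to save a sorted list of module names on each boot and diff the saved files (a sketch; the file naming here is just illustrative):

```shell
#!/bin/sh
# Save a sorted list of loaded module names for the running boot.
# /proc/modules column 1 is the module name.
out=/tmp/modules-$(uname -r).txt
awk '{print $1}' /proc/modules | sort > "$out"
echo "saved $(wc -l < "$out") module names to $out"
# After booting the other release:
#   diff /tmp/modules-<20.04-boot>.txt /tmp/modules-<24.04-boot>.txt
```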

Tried to look for some different default configurations, such as scheduler settings or whatever. Haven't found anything, but I am not sure of everywhere to look.

Since the execution differences appeared to be in system call code, I made a test that just does a bunch of system calls to "times". It did not reproduce the problem: 24.04 actually ran just a little faster than 20.04, by 1.2%.

Continuing...

MAFoElffen
January 20th, 2024, 08:23 PM
@Doug S ---

Configuring and sampling kernel stats on 22.04.3 with 6.7.060700 now for that "other"... You know I also have Dev 24.04 on that same hardware. If you send me what you are running for tests with this, I can send you that also...

Doug S
January 21st, 2024, 05:32 PM
@Doug S ---

Configuring and sampling kernel stats on 22.04.3 with 6.7.060700 now for that "other"... You know I also have Dev 24.04 on that same hardware. If you send me what you are running for tests with this, I can send you that also...

Hi Mike,

I would be very grateful if you would try the test on your computer. I am still picking away at things in an attempt to create a relatively simple test. I have achieved a 25% throughput difference between 20.04 and 24.04, but that was with a fairly complicated test involving 11 token-passing ring pairs. My current tests involve a much simpler single token-passing pair using regular pipes or named pipes. For 1 CPU, there is no performance difference. For 2 CPUs on the same core, there is about a 9% performance difference. For 2 CPUs on different cores, there is about a 15% performance difference.

Give me another day or two to try to make some test that others could run.
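For reference, the same-core versus cross-core placement above can be forced with taskset (a sketch: the CPU numbers are machine specific, and pingpong is the program posted later in this thread; check lscpu -e or /sys/devices/system/cpu/cpu0/topology/thread_siblings_list to see which logical CPUs share a core):

```shell
#!/bin/sh
# Pin one token-passing pair to chosen CPUs. Same-core vs different-core
# is selected purely by which CPU numbers are given to taskset.
mkfifo /dev/shm/pong0 /dev/shm/pong1 2>/dev/null
taskset -c 0 ./pingpong /dev/shm/pong0 /dev/shm/pong1 0 30000000 &
taskset -c 1 ./pingpong /dev/shm/pong1 /dev/shm/pong0 0 30000000 1 &
wait
```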

Doug S
January 21st, 2024, 07:57 PM
Previously, I was avoiding running out of CPU capacity. However, if I allow overload, then idle state selection decisions are eliminated as a variable, since no CPU goes idle at all for the duration of the test. Processor package power is about 114 watts for this test, and I had to raise my processor throttling temperature limit by 5 degrees, from 75 to 80 degrees C.

EDIT:


Ubuntu 20.04.6 LTS
doug@s19:~$ dpkg -l | grep firm
ii amd64-microcode 3.20191218.1ubuntu1.2 amd64 Processor microcode firmware for AMD CPUs
ii intel-microcode 3.20230808.0ubuntu0.20.04.1 amd64 Processor microcode firmware for Intel CPUs
ii ipxe-qemu 1.0.0+git-20190109.133f4c4-0ubuntu3.2 all PXE boot firmware - ROM images for qemu
ii ipxe-qemu-256k-compat-efi-roms 1.0.0+git-20150424.a25a16d-0ubuntu4 all PXE boot firmware - Compat EFI ROM images for qemu
ii linux-firmware 1.187.39 all Firmware for Linux kernel drivers
ii ovmf 0~20191122.bd85bf54-2ubuntu3.4 all UEFI firmware for 64-bit x86 virtual machines

Ubuntu Noble Numbat (development branch)
doug@s19:~$ dpkg -l | grep firm
ii amd64-microcode 3.20231019.1ubuntu1 amd64 Processor microcode firmware for AMD CPUs
ii firmware-sof-signed 2.2.6-1ubuntu4 all Intel SOF firmware - signed
ii intel-microcode 3.20231114.1 amd64 Processor microcode firmware for Intel CPUs
ii ipxe-qemu 1.21.1+git-20220113.fbbdc3926-0ubuntu1 all PXE boot firmware - ROM images for qemu
ii ipxe-qemu-256k-compat-efi-roms 1.0.0+git-20150424.a25a16d-0ubuntu4 all PXE boot firmware - Compat EFI ROM images for qemu
ii linux-firmware 20230919.git3672ccab-0ubuntu2.2 amd64 Firmware for Linux kernel drivers
ii ovmf 2023.11-4 all UEFI firmware for 64-bit x86 virtual machines
While the intel-microcode shows different versions, I know the processor uCode itself is the same for both boots.
SOF is Sound Open Firmware, so not relevant.

EDIT 2: I do have a git clone of the linux-firmware repository and do update some files when the mainline kernel install complains.

EDIT 3: The /lib/firmware directories. Not sure how to compare the actual files that matter to my system:


doug@s19:/media/nvme/home/doug/idle/perf/results/20-24-compare/firmware$ ls -l
total 172
-rw-rw-r-- 1 doug doug 59567 Jan 21 22:52 20-04.txt
-rw-rw-r-- 1 doug doug 111304 Jan 21 22:55 24-04.txt
doug@s19:/media/nvme/home/doug/idle/perf/results/20-24-compare/firmware$ head *.txt
==> 20-04.txt <==
/lib/firmware:
1a98-INTEL-EDK2-2-tplg.bin
3com
a300_pfp.fw
a300_pm4.fw
acenic
adaptec
advansys
agere_ap_fw.bin
agere_sta_fw.bin

==> 24-04.txt <==
/lib/firmware:
1a98-INTEL-EDK2-2-tplg.bin.zst
3com
acenic
adaptec
advansys
agere_ap_fw.bin.zst
agere_sta_fw.bin.zst
amd
amdgpu

MAFoElffen
January 22nd, 2024, 06:40 AM
You have mail... (PM's)

All that is missing is Noble on 6.7.0. It doesn't want to boot at the moment on 'that kernel'. I'll look at that tomorrow.

I did the query you asked for, which was every 2 seconds, for about 2 minutes each.

Doug S
January 22nd, 2024, 10:26 PM
On the 20.04 system, I replaced /usr/lib/firmware with my git clone of the current master. The test on 20.04 actually ran a little faster, on average. Conclusion: The performance difference between 20.04 and 24.04 is not due to firmware.

EDIT: There are approximately an extra 500 interrupts per second for the 24.04 test versus the 20.04 case.

EDIT 2: A 100 second trace of interrupts while the test was running:



24.04:
334 tasklet_entry
334 tasklet_exit
1204 irq_handler_entry
1204 irq_handler_exit
2525 timer_expire_entry
31787 softirq_entry
31787 softirq_exit
31787 softirq_raise
302037 local_timer_entry
302136 hrtimer_expire_entry

20.04:
250 tasklet_entry
250 tasklet_exit
600 irq_handler_entry
600 irq_handler_exit
2877 timer_expire_entry
31262 softirq_exit
31262 softirq_raise
31263 softirq_entry
300778 local_timer_entry
300840 hrtimer_expire_entry

The irq_handler differences are mostly i915 interrupts (545), which do not occur on 20.04. However, they only account for 0.21% of CPU 0 usage; i.e. no smoking gun. I have yet to look at other ISR times.
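For anyone wanting to reproduce counts like the ones above, one hedged way is trace-cmd (the record step needs root and kernel tracing support; "sleep 100" sets the capture window, and the event name is assumed to be the 4th whitespace field of each report line, which can vary with report formatting):

```shell
#!/bin/sh
# Record irq and timer trace events for 100 seconds, then count by event name.
trace-cmd record -e irq -e timer sleep 100
trace-cmd report | awk '{print $4}' | sort | uniq -c | sort -n
```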

Doug S
January 23rd, 2024, 05:55 PM
@Mike

Here is what I am currently doing as the test.
Note: very hacky stuff, don't judge.
I have modified things to assume the program and script are in the same directory.

The ping pong c program:



/******************************************************
/*
/* pingpong.c Smythies 2022.10.21
/* Useing stdin and stdout redirection for this
/* program is a problem. The program doesn't start
/* execution until there is something in the
/* stdin redirected queue, so trying to start
/* things via the last flag doesn't work.
/* Try treating the incoming and outgoing named
/* as files opened herein. This will also allow
/* timeout management as a future edit.
/*
/* pingpong.c Smythies 2022.10.20
/* Use the new "last" flag to also start the
/* token passing.
/*
/* pingpong.c Smythies 2022.10.19
/* If the delay between the last read of the
/* first token and the write from the last place
/* in the chain of stuff is large enough then the
/* first instance of the program might have terminated
/* and shutdown the read pipe, resulting in a SIGPIPE
/* signal. With no handler it causes the program to
/* terminate.
/* Add an optional command line parameter to indicate if
/* this instance of the program is the last one and
/* therefore it should not attempt to pass along the
/* last token.
/*
/* pingpong.c Smythies 2021.10.26
/* Everything works great as long as the number
/* of stops in the token passing ring is small
/* enough. However, synchronization issues
/* develop if the number of stops gets big enough.
/* Introduce a synchronizing step, after which
/* there should not be any EOF return codes.
/*
/* pingpong.c Smythies 2021.10.24
/* Print loop number and error code upon error
/* exit. Exit on 1st error. Was 3rd.
/*
/* pingpong.c Smythies 2021.10.23
/* Change to using CLOCK_MONOTONIC_RAW instead of
/* gettimeofday, as it doesn't have any
/* adjustments.
/* Change to nanoseconds.
/*
/* pingpong.c Smythies 2021.07.31
/* Add write error check.
/*
/* pingpong.c Smythies 2021.07.24
/* Exit after a few errors.
/*
/* pingpong.c Smythies 2021.07.23
/* Add execution time.
/*
/* pingpong.c Smythies 2020.12.07
/* Add an outer loop counter command line option.
/* Make it optional, so as not to break my existing
/* scripts.
/*
/* pingpong.c Smythies 2020.06.21
/* The original code is from Alexander.
/* (See: https://marc.info/?l=linux-kernel&m=159137588213540&w=2)
/* But, it seems to get out of sync in my application.
/* Start this history header.
/* I can only think of some error return.
/* Add some error checking, I guess.
/*
/******************************************************/

#include <sys/time.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <limits.h>
#include <errno.h>
#include <string.h>
//#include <signal.h>
//#include <sys/wait.h>
//#include <linux/unistd.h>

#define MAX_ERRORS 2
/* Arbitrary */
#define SYNC_LOOPS 3

unsigned long long stamp(void){
struct timespec tv;

clock_gettime(CLOCK_MONOTONIC_RAW,&tv);

return (unsigned long long)tv.tv_sec * 1000000000 + tv.tv_nsec;
} /* endprocedure */

int main(int argc, char **argv){
unsigned long long tend, tstart;
long i, j, k, n, m;
long eof_count = 0;
int error_count = 0;
int err, inf, outf, errvalue;
int last = 0;
char c = '\n';
char *infile, *outfile;

// fprintf(stderr, "begin...\n");

switch(argc){
case 4:
infile = argv[1];
outfile = argv[2];
n = atol(argv[3]);
m = LONG_MAX;
break;
case 5:
infile = argv[1];
outfile = argv[2];
n = atol(argv[3]);
m = atol(argv[4]);
break;
case 6:
infile = argv[1];
outfile = argv[2];
n = atol(argv[3]);
m = atol(argv[4]);
last = atoi(argv[5]);
break;
default:
printf("%s : Usage: pingpong infifo outfifo inner_loop [optional outer_loop [optional last flag]]\n", argv[0]);
return -1;
} /* endcase */

// printf(" infile: %s ; outfile: %s ; %d\n", infile, outfile, last);

if(last != 1){ // for all but the last, create the named pipe outfile
err = mkfifo(outfile, 0666);
if ((err != 0) && (errno != EEXIST)){ // file already exists is OK
errvalue = errno;
printf("Cannot create output fifo file: %s ; %d ; %s\n", outfile, err, strerror(errvalue));
return -1;
} /* endif */
} else { // for the last we open the write first, read should already be open.
if ((outf = open(outfile, O_WRONLY)) == -1){
errvalue = errno;
printf("Cannot open last output fifo file: %s ; %d ; %s\n", outfile, outf, strerror(errvalue));
return -1;
} /* endif */
} /* endif */

if ((inf = open(infile, O_RDONLY)) == -1){
errvalue = errno;
printf("Cannot open input fifo file: %s ; %d ; %s\n", outfile, inf, strerror(errvalue));
return -1;
} /* endif */

if(last != 1){ // for all but the last, now we open the write
// if ((outf = open(outfile, O_WRONLY | O_NONBLOCK)) == -1){
if ((outf = open(outfile, O_WRONLY)) == -1){
errvalue = errno;
printf("Cannot open not last output fifo file: %s ; %d ; %s\n", outfile, outf, strerror(errvalue));
return -1;
} /* endif */
} /* endif */

if(last == 1){ // the last chain initiates the token passing
// usleep(999999);
err = write(outf, &c, 1);
if(err != 1){
fprintf(stderr, "pingpong write error on startup, aborting. %d %d %d\n", last, err, outf);
return -1;
} /* endif */
} /* endif */

// printf("flag 4: inf: %d ; outf: %d ; %d \n", inf, outf, last);

/* make sure we are synchronized. EOF (0 return code) can occur until we are */

j = SYNC_LOOPS;
while(j > 0) { // for SYNC_LOOP successful loops do:
err = read(inf, &c, 1);
if(err == 1){
j--; // don't decrement for EOF.
for (i = n; i; i--){ // we also attempt to sync in time for later T start
k = i;
k++; /* busy work */
} /* endfor */
err = write(outf, &c, 1);
if(err != 1){ // and then pass the token along to the next pipeline step.
fprintf(stderr, "pingpong sync step: write error or timeout to named pipe. (error code: %d ; loops left: %ld ; last: %d)\n", err, j, last);
return -1;
} /* endif */
} else {
if(err < 0){
fprintf(stderr, "pingpong sync step: read error or timeout from named pipe. (error code: %d ; loops left: %ld ; last: %d)\n", err, j, last);
return -1;
} else {
eof_count++; // does the loop counter need to be reset??
} /* endif */
} /* endif */
} /* endwhile */

// printf(" infile: %s ; outfile: %s ; last: %d; eof_count %ld\n", infile, outfile, last, eof_count);

/* now we are synchronized, or so I claim. Get on with the real work. EOF is an error now.*/

j = m;
tstart = stamp(); /* only start the timer once synchronized */
while(j > 0) { // for outer_loop times do:
err = read(inf, &c, 1);
if(err == 1){
for (i = n; i; i--){ // for each token, do a packet of work.
k = i;
k++; /* busy work */
} /* endfor */
err = write(outf, &c, 1);
if(err != 1){ // and then pass the token along to the next pipeline step.
fprintf(stderr, "pingpong write error or timeout to named pipe. (error code: %d ; loops left: %ld ; EOFs: %ld ; last: %d)\n", err, j, eof_count, last);
error_count++;
if(error_count >= MAX_ERRORS) return -1;
} /* endif */
} else {
error_count++;
fprintf(stderr, "pingpong read error or timeout from named pipe. (error code: %d ; loops left: %ld ; EOFs: %ld ; last: %d)\n", err, j, eof_count, last);
if(error_count >= MAX_ERRORS) return -1;
} /* endif */
// if(j <= 3) fprintf(stderr, "Loop: %ld ; EOFs: %ld\n", j, eof_count);
j--;
} /* endwhile */
tend = stamp(); // the timed portion is done

/* Now we do one token pass to flush. The previous write pipe may have already been terminated, so EOF read response is O.K. */

err = read(inf, &c, 1);
if(err == 1){
if(last != 1){ // last in the chain does not pass along the last token
err = write(outf, &c, 1);
if(err != 1){ // and then pass the token along to the next pipeline step.
fprintf(stderr, "pingpong flush loop: write error or timeout to named pipe. (error code: %d ; EOFs: %ld ; last: %d)\n", err, eof_count, last);
} /* endif */
} /* endif */
} else {
fprintf(stderr, "pingpong flush loop: read error or timeout from named pipe. (error code: %d ; EOFs: %ld ; last: %d)\n", err, eof_count, last);
} /* endif */

fprintf(stderr,"%.4f usecs/loop. EOFs: %ld\n",(double)(tend-tstart)/((double) m * 1000.0), eof_count);
close(outf);
close(inf);
return -1;
// return 0;
} /* endprogram */


The script:



#! /bin/dash
#
# ping-pong-many-parallel Smythies 2024.01.23
# assume the ping pong program is local.
#
# ping-pong-many-parallel Smythies 2022.10.23
# update required to reflect changes to program
#
# ping-pong-many-parallel Smythies 2022.10.09
# Launch parallel ping-pong pairs.

# because I always forget from last time
killall pingpong

# If it does not already exist, then create the first named pipe.

COUNTER=0
POINTER1=0
POINTER2=1
while [ $COUNTER -lt $3 ];
do
if [ -p /dev/shm/pong$POINTER1 ]
then
rm /dev/shm/pong$POINTER1
fi
mkfifo /dev/shm/pong$POINTER1

POINTER1=$(($POINTER1+1000))
POINTER2=$(($POINTER2+1000))
COUNTER=$(($COUNTER+1))
done

COUNTER=0
POINTER1=0
POINTER2=1
while [ $COUNTER -lt $3 ];
do
./pingpong /dev/shm/pong$POINTER1 /dev/shm/pong$POINTER2 $1 $2 &
./pingpong /dev/shm/pong$POINTER2 /dev/shm/pong$POINTER1 $1 $2 1 &

POINTER1=$(($POINTER1+1000))
POINTER2=$(($POINTER2+1000))
COUNTER=$(($COUNTER+1))
done


Create some directory and put those two files there. Make the script executable and compile the C program (note: use the older OS for the compile):



doug@s19:~/idle/self-contained-test$ ls -l
total 16
-rw-rw-r-- 1 doug doug 8874 Jan 23 08:03 pingpong.c
-rwxr-xr-x 1 doug doug 980 Jan 23 08:03 ping-pong-many-parallel
doug@s19:~/idle/self-contained-test$ cc pingpong.c -o pingpong
doug@s19:~/idle/self-contained-test$ ls -l
total 36
-rwxrwxr-x 1 doug doug 17304 Jan 23 08:04 pingpong
-rw-rw-r-- 1 doug doug 8874 Jan 23 08:03 pingpong.c
-rwxr-xr-x 1 doug doug 980 Jan 23 08:03 ping-pong-many-parallel


This uses a lot of energy and creates a lot of heat while running, so be sure your thermal and power limit throttling protections are working properly. That being said, we want this test to run without any throttling involved, so as to not influence the results. This includes number-of-active-cores throttling, so you might have to limit your max CPU frequency to below the active-cores limit. I normally run with thermal throttling set to 75 degrees, but set it to 80 degrees for this.

The system should otherwise be fairly idle for this test. I use 3 terminals: one for test execution; one running "top -d 15", where I can be sure there is no idle time; and one running "sudo /home/doug/kernel/linux/tools/power/x86/turbostat/turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt,CorWatt --interval 15", monitoring power, temperature, and CPU frequency, where any throttling will show. The low sample rate of the 2 monitoring terminals is to reduce their influence on the test.

Note that there will be a little bit of idle time as the test finishes, since some pairs finish before others and the load reduces. The test needs to run for at least a few minutes to reduce any influence from startup and wind-down. You might need to adjust the number of pairs because you have more CPUs and cores than me, and to increase the number of loops because your processors are faster than mine.

Example test run: I use 20 pairs and 30,000,000 loops and no work per token stop, because we are trying to maximize system time and minimize user time. I also use the performance CPU frequency scaling governor.

Step 1:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
doug@s19:~/idle/self-contained-test$ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu10/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu11/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu8/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu9/cpufreq/scaling_governor:powersave
doug@s19:~/idle/self-contained-test$ echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
[sudo] password for doug:
performance
doug@s19:~/idle/self-contained-test$ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu10/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu11/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu8/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu9/cpufreq/scaling_governor:performance
Step 2: Launch the 2 monitoring tasks in their terminals (not shown, yet) and wait for a couple of reference samples.
Step 3: Launch the test:


doug@s19:~/idle/self-contained-test$ ./ping-pong-many-parallel 0 30000000 20
pingpong: no process found <<<< This is normal
doug@s19:~/idle/self-contained-test$

Observe the monitoring terminals: first the top window, for no idle time and mostly system time:


top - 08:42:07 up 16:14, 3 users, load average: 22.54, 8.42, 3.07
Tasks: 264 total, 25 running, 239 sleeping, 0 stopped, 0 zombie
%Cpu0 : 7.7 us, 92.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 7.2 us, 92.8 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 7.1 us, 92.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 8.1 us, 91.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 7.8 us, 92.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 8.5 us, 91.5 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 7.7 us, 92.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 7.9 us, 92.1 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 7.9 us, 92.1 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 7.8 us, 92.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 8.0 us, 92.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 8.1 us, 91.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 31927.3 total, 27810.3 free, 382.2 used, 3734.8 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 31076.6 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3622 doug 20 0 2364 1024 1024 S 36.5 0.0 0:35.32 pingpong
3623 doug 20 0 2364 1024 1024 S 36.4 0.0 0:35.35 pingpong
3640 doug 20 0 2364 1024 1024 R 34.4 0.0 0:35.40 pingpong
3641 doug 20 0 2364 1024 1024 S 34.4 0.0 0:35.47 pingpong
3626 doug 20 0 2364 896 896 R 34.0 0.0 0:35.52 pingpong
3627 doug 20 0 2364 896 896 R 33.8 0.0 0:35.51 pingpong
3628 doug 20 0 2364 896 896 R 33.6 0.0 0:33.14 pingpong
3619 doug 20 0 2364 1024 1024 R 33.6 0.0 0:38.24 pingpong
3629 doug 20 0 2364 1024 1024 S 33.6 0.0 0:33.18 pingpong
3618 doug 20 0 2364 896 896 S 33.5 0.0 0:38.10 pingpong
3614 doug 20 0 2364 896 896 R 33.4 0.0 0:35.99 pingpong
3615 doug 20 0 2364 896 896 S 33.3 0.0 0:35.93 pingpong
3653 doug 20 0 2364 1024 1024 R 31.8 0.0 0:33.53 pingpong
3652 doug 20 0 2364 1024 1024 R 31.6 0.0 0:33.41 pingpong
3650 doug 20 0 2364 1024 1024 R 31.4 0.0 0:34.81 pingpong
3651 doug 20 0 2364 1024 1024 R 31.2 0.0 0:34.68 pingpong
3638 doug 20 0 2364 1024 1024 S 30.6 0.0 0:33.73 pingpong
3639 doug 20 0 2364 1024 1024 S 30.6 0.0 0:33.71 pingpong
3644 doug 20 0 2364 896 896 R 30.2 0.0 0:34.66 pingpong
3645 doug 20 0 2364 1024 1024 R 30.0 0.0 0:34.61 pingpong
3620 doug 20 0 2364 896 896 R 29.8 0.0 0:38.04 pingpong
3621 doug 20 0 2364 896 896 R 29.8 0.0 0:38.03 pingpong
3616 doug 20 0 2364 1024 1024 S 29.0 0.0 0:33.86 pingpong
3617 doug 20 0 2364 1024 1024 S 29.0 0.0 0:33.79 pingpong
3637 doug 20 0 2364 896 896 R 28.4 0.0 0:32.42 pingpong
3636 doug 20 0 2364 1024 1024 R 28.3 0.0 0:32.31 pingpong
3646 doug 20 0 2364 1024 1024 R 27.4 0.0 0:33.42 pingpong
3647 doug 20 0 2364 896 896 S 27.4 0.0 0:33.35 pingpong
...

and the turbostat terminal, for no throttling and a consistent CPU frequency. This is from after the test completed.
Note: from our PMs, you know to execute your turbostat binary from wherever it is, bypassing the Ubuntu dependency wrapper.


doug@s19:~/idle/perf/results/q243$ sudo /home/doug/kernel/linux/tools/power/x86/turbostat/turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt,CorWatt --interval 15
[sudo] password for doug:
Busy% Bzy_MHz IRQ PkgTmp PkgWatt CorWatt GFXWatt RAMWatt
0.05 4669 982 36 1.59 0.93 0.00 1.33
0.05 4696 845 36 1.58 0.93 0.00 1.33
0.06 4628 1069 36 1.59 0.93 0.00 1.33
0.05 4646 881 36 1.57 0.91 0.00 1.33
46.68 4799 25113 66 52.49 51.83 0.00 1.34
99.76 4800 51467 67 110.69 110.04 0.00 1.33
99.76 4800 53242 69 111.13 110.47 0.00 1.33
99.76 4800 52871 70 111.50 110.85 0.00 1.33
99.76 4800 54558 72 112.39 111.73 0.00 1.33
99.76 4800 52502 73 112.64 111.97 0.00 1.33
99.76 4800 53247 73 112.84 112.18 0.00 1.33
99.76 4800 53043 73 113.05 112.39 0.00 1.33
99.76 4800 53467 74 113.21 112.55 0.00 1.33
99.76 4800 52729 74 113.31 112.65 0.00 1.33
99.76 4800 53662 73 113.24 112.59 0.00 1.33
99.76 4800 52669 74 113.34 112.68 0.00 1.33
99.76 4800 53368 74 112.99 112.32 0.00 1.33
99.76 4800 53080 74 113.12 112.47 0.00 1.33
99.73 4800 51977 74 113.12 112.46 0.00 1.33
92.03 4800 1164504 67 106.09 105.42 0.00 1.33
9.38 4799 17895 44 18.32 17.65 0.00 1.33
0.01 4100 375 43 2.03 1.37 0.00 1.33
0.05 4661 1047 43 2.23 1.57 0.00 1.33

And, eventually, the test results:


doug@s19:~/idle/self-contained-test$ ./ping-pong-many-parallel 0 30000000 20
pingpong: no process found
doug@s19:~/idle/self-contained-test$ 6.9971 usecs/loop. EOFs: 0
6.9971 usecs/loop. EOFs: 0
7.0961 usecs/loop. EOFs: 0
7.0961 usecs/loop. EOFs: 0
7.2167 usecs/loop. EOFs: 0
7.2167 usecs/loop. EOFs: 0
7.3631 usecs/loop. EOFs: 0
7.3631 usecs/loop. EOFs: 0
7.4195 usecs/loop. EOFs: 0
7.4195 usecs/loop. EOFs: 0
7.4453 usecs/loop. EOFs: 0
7.4453 usecs/loop. EOFs: 0
7.4599 usecs/loop. EOFs: 0
7.4599 usecs/loop. EOFs: 0
7.4695 usecs/loop. EOFs: 0
7.4695 usecs/loop. EOFs: 0
7.4712 usecs/loop. EOFs: 0
7.4712 usecs/loop. EOFs: 0
7.5009 usecs/loop. EOFs: 0
7.5009 usecs/loop. EOFs: 0
7.5324 usecs/loop. EOFs: 0
7.5324 usecs/loop. EOFs: 0
7.6344 usecs/loop. EOFs: 0
7.6344 usecs/loop. EOFs: 0
7.6577 usecs/loop. EOFs: 0
7.6577 usecs/loop. EOFs: 0
7.6735 usecs/loop. EOFs: 0
7.6735 usecs/loop. EOFs: 0
7.6763 usecs/loop. EOFs: 0
7.6763 usecs/loop. EOFs: 0
7.7355 usecs/loop. EOFs: 0
7.7355 usecs/loop. EOFs: 0
7.7581 usecs/loop. EOFs: 0
7.7581 usecs/loop. EOFs: 0
7.8000 usecs/loop. EOFs: 0
7.8000 usecs/loop. EOFs: 0
7.8477 usecs/loop. EOFs: 0
7.8477 usecs/loop. EOFs: 0
7.8972 usecs/loop. EOFs: 0
7.8972 usecs/loop. EOFs: 0

Doug S
January 23rd, 2024, 09:10 PM
I did a test with 40 ping pong pairs. Also, just for completeness, I tried a copy of the git master firmware as the firmware for 24.04 as a second test; while it makes a difference, I do not know if it exceeds normal run-to-run variation. IRQs averaged 3042 per second for all tests.
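The system-wide IRQ rate quoted above can be sampled without turbostat by reading the aggregate counter on the "intr" line of /proc/stat twice (a quick sketch, using a hypothetical 5-second interval):

```shell
# Sketch: approximate the average IRQs/second over a short interval.
interval=5
a=$(awk '/^intr /{print $2}' /proc/stat)   # total interrupts since boot
sleep "$interval"
b=$(awk '/^intr /{print $2}' /proc/stat)
echo "IRQs/sec: $(( (b - a) / interval ))"
```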

Doug S
January 24th, 2024, 11:53 PM
For the program listed a couple of posts above, I used the default compile options, which link the libraries dynamically. Those libraries differ between 20.04 and 24.04, so on 20.04 I recompiled specifying static linking to eliminate that test variable (observe how much bigger the executable is, because it bundles the needed libraries):


doug@s19:~/idle/self-contained-test$ cc -static pingpong.c -o pingpong
doug@s19:~/idle/self-contained-test$ ls -l
total 892
-rwxrwxr-x 1 doug doug 876240 Jan 24 14:50 pingpong
-rw-rw-r-- 1 doug doug 8874 Jan 23 08:03 pingpong.c
-rwxrwxr-x 1 doug doug 17304 Jan 23 08:04 pingpong-dyn
-rwxr-xr-x 1 doug doug 980 Jan 23 08:03 ping-pong-many-parallel

EDIT: And finally, as earlier requested, kernel 6.6.0-14-generic. The average difference was 19.40%.
293338

Doug S
January 27th, 2024, 02:20 AM
This thread is about differences between 20.04 and 24.04. So what about 22.04.3 LTS?
293357

Doug S
January 29th, 2024, 01:12 AM
Generally, developers don't read this forum. If you want developers to see your post, a better place is https://discourse.ubuntu.com/

But your tests would be more interesting if they used Ubuntu kernels. I don't know if the Ubuntu 24.04 LTS kernel would boot on 20.04 LTS; I don't have any need myself to be trying alternative kernels.

Hi Jeremy, or anybody:

What category and tag would you suggest I use for creating a topic on discourse? There doesn't seem to be something like a "development" category. I suppose "discussion" could be the tag. Maybe the "Uncategorized" category?

jbicha
January 29th, 2024, 01:24 AM
For this topic, I suggest the Kernel (https://discourse.ubuntu.com/c/kernel/108) category.

MAFoElffen
January 29th, 2024, 02:00 AM
This thread is about differences between 20.04 and 24.04. So what about 22.04.3 LTS?


The stats I gave you were from 22.04.3 with 6.5.0 and 6.7.0. But those were the early commands you had me run.

Since then, I uninstalled 6.7.0 from Mainline, as it breaks ZFS, and when it updates, drops my ZFS-On-Root Install from the Grub Boot menus. LOL

Still getting those other stats for you between 22.04 and 24.04.

Yes, I am a plus 1 to choose topic "Kernel"

Doug S
January 29th, 2024, 02:37 AM
For this topic, I suggest the Kernel (https://discourse.ubuntu.com/c/kernel/108) category.

Okay, thanks.

Please see https://discourse.ubuntu.com/t/24-04-considerably-slower-than-20-04-or-22-04-for-some-high-system-percetnage-usage-cases/41987

Doug S
January 29th, 2024, 02:40 AM
The stats I gave you were from 22.04.3 with 6.5.0 and 6.7.0. But those were the early commands you had me run.

Since then, I uninstalled 6.7.0 from Mainline, as it breaks ZFS, and when it updates, drops my ZFS-On-Root Install from the Grub Boot menus. LOL

Still getting those other stats for you between 22.04 and 24.04.

Yes, I am a plus 1 to choose topic "Kernel"

Thanks. I am curious what you get. I now suggest 40 ping pong pairs:


./ping-pong-many-parallel 0 30000000 40

Doug S
February 13th, 2024, 02:01 AM
Mike or anybody willing to do the test described in post #19 above (https://ubuntuforums.org/showthread.php?t=2494238&page=2&p=14176562#post14176562): It can now be done on a single 24.04 system, with the only variable between test runs being this addition to whatever is already on the grub command line:


systemd.unified_cgroup_hierarchy=0
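After adding that parameter (e.g. to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then running update-grub and rebooting), you can confirm which cgroup hierarchy is actually active before each test run. A quick check:

```shell
# Sketch: report the active cgroup hierarchy.
# cgroup2fs => unified v2 hierarchy (the newer default);
# tmpfs     => legacy/hybrid v1 mount, which is what
#              systemd.unified_cgroup_hierarchy=0 selects.
stat -fc %T /sys/fs/cgroup/
```

This makes it easy to verify that the only variable between the two runs really is the cgroup hierarchy version.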