HOWTO: A simple driver for the Linux CPUFreq framework

Background

So I've got this oldish Toshiba laptop with a Pentium-120 processor. The laptop gets quite hot during normal operation, and the fan can be noisy. Fortunately, there is a way to change the CPU speed on the fly, as well as toggle the L2 cache and system fan on and off. This is accomplished through a userspace program called 'toshset', which uses the 'toshiba' driver through /dev/toshiba to enter the SMM mode of the laptop and execute BIOS code (known as the SCI and HCI) to fiddle with the appropriate setting. (The Toshiba SCI/HCI details were reverse engineered by Jonathan Buzzard from the Windows driver.)

What is SMM? (off-topic yet interesting)

SMM is a special mode of the processor triggered by an external interrupt on the #SMI pin. In the Toshiba laptop hardware, this pin is asserted by a read from I/O port $B2. Inside SMM, the CPU has access to a reserved area called SMRAM, usually 32KB in size and located at $38000-$3FFFF or $A8000-$AFFFF. At boot time, the BIOS loads SMBASE+$8000 with the program (the SMI handler) that is to be run when SMM is entered, and then cloaks SMRAM from system software using the system core logic. When SMM is entered, the CPU state (excluding FPU and test registers) is written out from $3FFFF down to $3FE00, and it is restored from that area when SMM is exited. The SMI handler code can only be recovered by disassembling the BIOS or by snooping the CPU's address and data bus after #SMIACT is asserted in response to #SMI, so under normal circumstances the SMM code can only be “black-box” analyzed by reverse engineering the system software that uses whatever API the SMM program provides in a particular machine. However, some system chipsets allow SMRAM to be re-enabled after it has been enabled for the handler installation and then cloaked. On these systems, a program can recover or modify the SMI handler code.

The code stored in SMRAM is quite literally the most arcane and buried embedded software within a PC. It can also be the source of much hidden latency, since an SMI causes the processor to write its entire internal state to SMRAM, flush its cache as the first action of the SMI handler (in write-back mode, this can take thousands of cycles), execute the SMI handler, and restore its internal state and exit SMM once an RSM instruction is encountered in the SMI handler. Unfortunately, there is no way to know how long the processor will stay in SMM once entered, because the length of the code it executes cannot be determined externally. It can also be quite difficult to discover all of the sources of SMIs, since an SMI could come from any chipset component and could be invoked by software or by purely external hardware events, depending on the specific design of the system. Also, when SMM is entered, the CPU pipeline is flushed; only pending I/O and HALT instructions can be restarted.

APM event

Since this laptop propagates power status events through the APM BIOS, my first “optimization” was to put a script in /etc/apm/event.d. When the power cord is plugged in, it sets the machine to maximum performance settings, and when the power cord is removed, it sets the machine to minimal performance settings.

#!/bin/sh

# Place this script in /etc/apm/event.d to automatically manage CPU
# power consumption in response to APM events.
# Debian: requires toshset and powermgmt-base packages

set -e

TOSHSET=/usr/bin/toshset
ON_AC_POWER=/usr/bin/on_ac_power

[ -x "${TOSHSET}" ] || exit 0
[ -x "${ON_AC_POWER}" ] || exit 0

cpu_fast() {
    logger "CPU going to performance settings"
    ${TOSHSET} -bs user
    ${TOSHSET} -cpu fast
    ${TOSHSET} -cpucache on
    ${TOSHSET} -lcd bright
    ${TOSHSET} -fan on
    ${TOSHSET} -d 30
    if [ -f /var/run/xbattbar.pid ]; then
        # A stale PID must not abort the script under 'set -e'.
        kill -TERM "$(cat /var/run/xbattbar.pid)" || true
        rm -f /var/run/xbattbar.pid
    fi
    exit 0
}

cpu_slow() {
    logger "CPU going to power-saving settings"
    ${TOSHSET} -bs user
    ${TOSHSET} -cpu slow
    ${TOSHSET} -cpucache off
    ${TOSHSET} -lcd semi
    ${TOSHSET} -fan off
    ${TOSHSET} -d 3
    # Displaying the battery bar requires 'local:' to be entered into
    # /etc/X0.hosts - a security risk on a multiuser system but probably okay
    # for a portable workstation.
    if [ -x /usr/X11R6/bin/xbattbar ]; then
        DISPLAY=:0 /usr/X11R6/bin/xbattbar >/dev/null 2>&1 &
        echo $! > /var/run/xbattbar.pid;
    fi
    exit 0
}

# Query the power source once; any non-zero status is treated as battery.
if ${ON_AC_POWER}; then
    cpu_fast
else
    cpu_slow
fi

apmiser

Then, I wanted a solution to manage the CPU speed based on load, because the system gets quite hot while sitting on {couch|lap}, and is idle or nearly idle most of the time anyway.

IBM Thinkpad laptops have a similar configuration interface called SMAPI BIOS (not to be confused with SMBIOS, the System Management BIOS extensions) which also uses SMM to configure the laptop. The program 'tpctl' and the /dev/thinkpad driver were written in similar fashion to the Toshiba driver.

There exists a daemon program called 'apmiser' for use with the IBM Thinkpad, which, despite its name, has nothing to do with APM. apmiser uses the information in /proc/stat to calculate the current CPU usage and calls the 'tpctl' program to change the CPU speed on the fly based on the system load. It is trivial to modify apmiser to call toshset instead of tpctl. Unfortunately, apmiser is written in Perl and so has a large memory footprint. (Translating it to C should reduce the memory usage.)
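To illustrate the /proc/stat calculation, here is a minimal sketch in plain sh. It is not taken from apmiser; the function names cpu_counters and cpu_busy_percent are made up for this example, and it assumes the first line of /proc/stat has the usual layout (the label "cpu" followed by user, nice, system and idle counters, possibly with more fields after them). Only the fourth counter is treated as idle time.

```shell
#!/bin/sh
# Sketch: estimate CPU utilisation from two samples of the first line of
# /proc/stat. Function names are hypothetical, not apmiser's.

# Print "busy total" (in jiffies) for one "cpu ..." line.
cpu_counters() {
    set -- $1          # deliberate word splitting of the sample line
    shift              # drop the "cpu" label
    idle=$4            # fields: user nice system idle [more...]
    total=0
    for v in "$@"; do
        total=$(( total + v ))
    done
    echo "$(( total - idle )) $total"
}

# Print the integer busy percentage between an older and a newer sample.
cpu_busy_percent() {
    first=$(cpu_counters "$1")
    second=$(cpu_counters "$2")
    busy=$(( ${second% *} - ${first% *} ))
    total=$(( ${second#* } - ${first#* } ))
    if [ "$total" -le 0 ]; then
        echo 0         # no time elapsed between the samples
    else
        echo $(( 100 * busy / total ))
    fi
}

# Usage:
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   cpu_busy_percent "$a" "$b"
```

A daemon would take a sample every second or so and compare the result against a threshold before calling toshset.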

Cron/load average

I decided I wanted to try a non-daemon approach: a program that is run once per minute from cron and checks the 1 minute load average to decide whether the CPU should be sped up, and the 5 minute load average to decide whether it should be slowed down. It also manages the fan based on the 1 minute load.

This was a first try:

#!/bin/sh

LOAD=` uptime | sed 's/.*\(load.*\).*/\1/g' | sed 's/,//g' `

AVGOK=` echo $LOAD | awk '{ print ($3 /dev/null
        /usr/bin/toshset -cpu fast
        /usr/bin/toshset -cpucache on;
else
        /usr/bin/fan -f >/dev/null;
        if [ $FIVEMINIDLE -eq 1 ]; then
                /usr/bin/toshset -cpu slow
                /usr/bin/toshset -cpucache off
        fi
fi

This version, when run from cron once per minute, turned out to cause a “hiccup” every minute because of the shell starting up and the several calls to toshset.
This was a rewrite in C for speed:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
        float min1, min5;
        FILE *f;
        float user1, user5;

        if (argc < 3) user5 = 1.00;
        else if (sscanf(argv[2], "%f", &user5) != 1) {
                puts("Error in 5 minute argument");
                exit(EXIT_FAILURE);
        }
        if (argc < 2) user1 = 0.50;
        else if (sscanf(argv[1], "%f", &user1) != 1) {
                puts("Error in 1 minute argument");
                exit(EXIT_FAILURE);
        }

        f = fopen("/proc/loadavg", "r");
        if (f == NULL)
        {
                perror("fopen");
                exit(EXIT_FAILURE);
        }

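        if (fscanf(f, "%f %f", &min1, &min5) != 2)
        {
                /* a parse failure does not set errno, so perror() would mislead */
                fputs("Could not parse /proc/loadavg\n", stderr);
                fclose(f);
                exit(EXIT_FAILURE);
        }
        fclose(f);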

        if (! (min1 < user1))
        {
                system("/usr/bin/toshset -fan on -cpu fast -cpucache on > /dev/null");
        }
        else
        {
                system("/usr/bin/toshset -fan off >/dev/null;");
                if (min5 < user5) {
                        system("/usr/bin/toshset -cpu slow -cpucache off");
                }
        }
        exit(EXIT_SUCCESS);
}

However, I still wasn't happy with this, because the response to a change in load could take up to a minute, and it was still quite a “heavy” program with its several toshset invocations.

cpufreq

TODO

/sys/devices/system/cpu/cpu0/cpufreq:
scaling_governor: performance, powersave, conservative, etc (cpufreq_powersave.ko, cpufreq_conservative.ko)
scaling_cur_freq:
conservative/*
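
As a rough illustration of how these sysfs files are used, the sketch below reads and switches the governor from sh. The helper names get_governor and set_governor and the CPUFREQ_DIR variable are made up for this example; scaling_governor and scaling_available_governors are the standard cpufreq sysfs files, and the sketch assumes they exist on the running kernel. Writing to scaling_governor needs root.

```shell
#!/bin/sh
# Sketch: query and select a cpufreq governor through sysfs.
# CPUFREQ_DIR is a hypothetical override hook (handy for dry runs).
: "${CPUFREQ_DIR:=/sys/devices/system/cpu/cpu0/cpufreq}"

# Print the governor currently in use.
get_governor() {
    cat "${CPUFREQ_DIR}/scaling_governor"
}

# Switch to governor $1, but only if the kernel lists it as available.
set_governor() {
    case " $(cat "${CPUFREQ_DIR}/scaling_available_governors") " in
        *" $1 "*)
            echo "$1" > "${CPUFREQ_DIR}/scaling_governor"
            ;;
        *)
            echo "governor '$1' is not available" >&2
            return 1
            ;;
    esac
}

# Usage (as root):
#   set_governor conservative && get_governor
```

A governor built as a module (for example cpufreq_conservative.ko) only shows up in scaling_available_governors once the module has been loaded.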

toshiba_freq.ko depends on freq_table.ko

/sys/module/cpufreq/parameters/debug: set to 1 to enable debug output of cpufreq-core and freq-table, 2 for debug output of driver, 4 for governors. This only produces output if CONFIG_CPU_FREQ_DEBUG=y. It is a bit field, so if you want all 3 outputs for example, set it to 7.
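
Since the debug parameter is a bit field, the value to write can be built by OR-ing the bits together. A small sketch, using the bit values given above (1 for the core and freq-table, 2 for the driver, 4 for governors); the function name cpufreq_debug_mask and its keyword arguments are made up for this example:

```shell
#!/bin/sh
# Sketch: build the cpufreq debug bit mask from symbolic names.
# The function name and the keywords are hypothetical.
cpufreq_debug_mask() {
    mask=0
    for part in "$@"; do
        case "$part" in
            core)     mask=$(( mask | 1 )) ;;   # cpufreq-core and freq-table
            driver)   mask=$(( mask | 2 )) ;;   # the low-level driver
            governor) mask=$(( mask | 4 )) ;;   # governors
            *)
                echo "unknown debug source: $part" >&2
                return 1
                ;;
        esac
    done
    echo "$mask"
}

# Usage (as root, on a kernel built with CONFIG_CPU_FREQ_DEBUG=y):
#   cpufreq_debug_mask core driver > /sys/module/cpufreq/parameters/debug
```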
