Ubuntu Forums ubuntu.com - launchpad.net - ubuntu help  

Go Back   Ubuntu Forums > The Ubuntu Forum Community > Forum Archive > Main Support Categories > Hardware & Laptops
Register Reset Password Forum Help Forum Council Search Today's Posts Mark Forums Read

Hello, Unregistered You are browsing a READ only archive of the main support categories pre 4/21/2008. You will not be able to post or reply any threads in this section.

Hardware & Laptops
Problems with hardware & laptops not being detected or supported during or after install.

 
 
Thread Tools Display Modes
Old July 20th, 2007   #1
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Lightbulb HowTo: Debug Suspend, Resume, Hibernate issues

Like many users, hackers, and developers I've been dealing with various issues on a variety of PCs when trying to get suspend-to-RAM (aka STR or S2RAM) and hibernate (aka suspend-to-disk, STD or S2disk) working satisfactorily.

There are many many forum and blog postings around detailing various fixes and workarounds but there is little material that helps inquisitive hackers understand the process so they can intelligently debug and hopefully fix the problems for themselves and others.

This thread is intended to change that. It is intended to collect information about how the kernel handles suspend and resume, what code paths are followed, and how to debug problems methodically.

Please don't post replies asking for help - start another thread in the forums and if I see it and can help I will.

Please do post replies if you have techniques or tips on debugging suspend/resume/hibernate.

There are several issues to address that may well help you in solving suspend issues.

1. Confusing Suspend and Hibernate

Due to an unfortunate choice of naming, many people believe suspend is the same thing as hibernate - it isn't.

Hibernate is when an image of the PC's memory is saved to disk, usually to the swap partition or dedicated hibernate file, and then all power removed. The PC then cold-boots as normal. Whilst booting the standard boot-strap code looks for the resume disk image and if it finds it performs the required actions to write it back to RAM and then pick-up execution where it left off.

The battery can safely be removed whilst hibernated. Hibernate and resume will take a minute or two depending on the memory-size of the PC and what is in use.

It is sometimes called suspend-to-disk or STD, but the use of the word suspend can cause endless confusion when discussing highly technical issues even with experienced developers. Therefore in this thread always refer to it as hibernate.

Suspend is when all hardware devices including the CPU are put into a low-power state or turned off, device configuration saved into memory, but memory (RAM) remains powered and self-refreshed.

Technically, suspending a PC has two variations: standby and memory. Standby puts devices into a low-power state from which they can instantly resume whereas memory effectively disables the devices and turns them off.

Suspend also has the advantage that it can be resumed by WAKE events generated by hardware such as a modem, network interface or other capable device.

Obviously the battery (or AC adaptor) must remain connected to maintain power to RAM. Suspend and resume should take about 10 seconds each - it may take longer if the devices being suspended, or their drivers, are not well-written or if external events delay them (e.g. network interface releasing its IP address).


2. Multiple Packages

As well as the kernel's suspend code there is support for swsusp, which is a variation of hibernate.

There are also several userspace packages that have been developed to try and work around the problems in known drivers to ensure successful hibernation, the most popular of which is TuxOnIce (previously known as Suspend2).

Ubuntu Feisty and Gutsy have userspace scripts for power management, as well as gnome-power-manager (aka g-p-m). If you are using Gnome then gnome-power-manager will handle hibernate, suspend and resume. If you use the command-line the script /etc/acpi/sleep.sh is used.

3. Debugging Tools

The most powerful tool we have is our intelligence. Debugging most suspend/resume issues is a case of repeating the same procedure over and over again; each time making minor changes and observing the results.

There are some tweaks we can make to scripts, kernel build parameters, and for the confident even altering kernel source-code itself, but in the end it comes down to reading and interpreting what we can see.

The best place to look is the PC's kernel log: /var/log/kern.log. Within the kernel source code programmers often include debug messages at critical points that are written to the kernel log. Reading the log helps us understand what paths in the code the kernel is executing.

These messages are printed using calls to the printk() function and variations of it such as ACPI_FUNCTION_TRACE() and others.

By relating the messages in kern.log to where they occur in the source code we get an understanding of what the kernel was doing and why.

Linus Torvalds provided a simple yet effective tool within the kernel to make debugging of badly behaved drivers and devices easier. This is also described in the Ubuntu Wiki Debugging Kernel Suspend

OpenSUSE have an excellent S2ram debugging article. S2ram is part of the suspend package.

A very useful resource for understanding all Linux source-code issues is the Linux Cross Reference site. It indexes every identifier in every source file for all the recent versions of Linux and makes searching and following the code easy by simple hyperlink clicking.

When debugging it is best to use the sleep.sh shell script with a minor tweak so we can watch what is happening. Adding the bash shell directive set -x forces the commands the script executes to be printed to the terminal - this helps understand the process and know where it is at any point. Undue delays or error reports can quickly be matched to their cause.

4. Getting Started

Start an Xterm session from Applications -> Accessories -> Terminal.

Almost everything needs to be done with superuser permissions so lets start by switching to the superuser (root):
Code:
$ sudo su
Password:
Type your password at the prompt (don't get confused and think it wants the root password - for security the root account password is disabled)

Now lets add that tweak to sleep.sh:
Code:
$ cd /etc/acpi
$ gedit sleep.sh

(gedit:7510): GnomeUI-WARNING **: While connecting to session manager:
Authentication Rejected, reason : None of the authentication protocols specified are supported and host-based authentication failed.
(don't worry about the gedit warning, its expected.)

Add the line "set -x" as the second line so it looks like this and then save and close the file.
Code:
#!/bin/bash
set -x
. /etc/default/acpi-support
To make it easier to view the log file start the log-viewer: System -> Administration -> System Log

In the side-pane select kern.log. If it isn't listed open it using File -> Open...

To increase the readable width you might find hiding the side-pane useful: View -> Side Pane.

Now, in the message-pane scroll to the end. If you've not changed anything new messages to the log will automatically be added as you watch. Whilst that is okay for casual viewing and monitoring, when you are studying the messages you may well find that an annoyance. You can turn it off using View -> Monitor.

5. Testing

Return to the Terminal window with the command-line and try putting the PC to "sleep" aka suspend:
Code:
$ ./sleep.sh force
+ . /etc/default/acpi-support
++ ACPI_SLEEP=true
++ ACPI_HIBERNATE=true
++ ACPI_SLEEP_MODE=mem
++ MODULES=
++ MODULES_WHITELIST=
You'll see the commands scroll by as the script executes. At some point the screen will be turned off and, if things work as they should, the PC will suspend. You'll know because the power-light will flash slowly. To resume press the power button (or designated wake-up button).

If you're very lucky the screen will come back on and you'll be returned to the point where you left off. You'll see in the Terminal window the resume script executing (it is called at the end of sleep.sh). Once the script has completed you can switch to the log viewer and scroll back to view the messages the kernel generated.

If the screen backlight hasn't been switched on then of course you're staring at a dark screen with no obvious way to recovery. This is a common problem and there are a variety of solutions. We'll look at some of those in a later article.
If you've got additional PCs on your network install ssh server on this PC and after resume try to log into it from another PC. This assumes the network has successfully resumed, of course. Often it is best to use a wired network for this kind of debugging since it is far more reliable than wireless. I'll deal with how to set-up and use remote debugging in another article.

The point at which the kernel resumed is marked by the message "Back to C!" and will be surrounded by LATE suspend and EARLY resume messages:
Code:
kernel: [35005.068000] Intel ICH 0000:00:1f.5: LATE suspend
kernel: [35005.068000] pci 0000:00:1f.3: LATE suspend
kernel: [35005.068000] pci 0000:00:1f.0: LATE suspend
kernel: [35005.068000] pci 0000:00:1e.0: LATE suspend
kernel: [35005.068000] i810_smbus 0000:00:02.0: LATE suspend
kernel: [35005.068000] agpgart-intel 0000:00:00.0: LATE suspend
kernel: [35005.068000] Back to C!
kernel: [35008.068000] agpgart-intel 0000:00:00.0: EARLY resume
kernel: [35008.068000] i810_smbus 0000:00:02.0: EARLY resume
kernel: [35008.068000] pci 0000:00:1e.0: EARLY resume
kernel: [35008.068000] pci 0000:00:1f.0: EARLY resume
kernel: [35008.068000] ata_piix 0000:00:1f.1: EARLY resume
kernel: [35008.068000] uhci_hcd 0000:00:1f.2: EARLY resume
kernel: [35008.068000] pci 0000:00:1f.3: EARLY resume
kernel: [35008.068000] uhci_hcd 0000:00:1f.4: EARLY resume
kernel: [35008.068000] Intel ICH 0000:00:1f.5: EARLY resume
The eagle-eyed will notice in the log extract above the kernel time signatures (the numbers between the square brackets) don't change. If the PC were working correctly there would be additional messages at the point the PC resumed:
Code:
kernel: [35020.000000] Intel machine check architecture supported.
kernel: [35020.000000] Intel machine check reporting enabled on CPU#0.
kernel: [35020.000000] Back to C!
The issue I'm debugging here is that the PC appears to suspend but immediately resumes. As you'll see from my investigation it turns out that what is actually happening is that suspend is failing at the point where the kernel is trying to put the PC to sleep.

To debug further we need to read and understand the path through the source-code and compare the kernel debug messages in the code with those in the log-file. From that comparison we can determine with a fair degree of accuracy what code wasn't executed that should have been. That allows us to narrow down and eventually pin-point the cause and hopefully develop a fix.

Therefore lets step back and take a look at the source-code so we can understand what should be going on.
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA

Last edited by IntuitiveNipple; July 22nd, 2007 at 05:35 AM..
IntuitiveNipple is offline  
Old July 20th, 2007   #2
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Lightbulb Understanding the Kernel Suspend Process

6. Understanding the Kernel Suspend Process

When the kernel is asked to suspend the PC it has to stop all user processes first. Once it has done that it iterates through all devices requesting they prepare to suspend, and then in a second iteration asks them to suspend fully. It saves device configurations then calls ACPI (Advanced Control and Power Interface) functions to handle putting the PC to sleep.

Here's an overview of the key functions called up to the point the PC suspends:
  • enter_state()
    • suspend_prepare()
    • suspend_enter()
      • device_power_down()
      • acpi_pm_enter()
        • do_suspend_lowlevel()
        • acpi_enter_sleep_state()

And here's the relevant source-code of those functions.

kernel/power/main.c::enter_state()
Code:
 pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]);
	if ((error = suspend_prepare(state)))
		goto Unlock;

	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
	error = suspend_enter(state);

	pr_debug("PM: Finishing wakeup.\n");
	suspend_finish(state);
You'll notice those debug messages in the log-file. They are good sync-points between source-code and logs for following the progress of the code.

This is the ACPI power-management structure containing the addresses of the functions to be called for each of the four operations valid prepare enter and finish:

drivers/acpi/sleep/main.c
Code:
static struct pm_ops acpi_pm_ops = {
	.valid = acpi_pm_state_valid,
	.prepare = acpi_pm_prepare,
	.enter = acpi_pm_enter,
	.finish = acpi_pm_finish,
};
And this is the code that calls the acpi_pm_enter() function using pm_ops->enter(state):

kernel/power/main.c::suspend_enter()
Code:
	if ((error = device_power_down(PMSG_SUSPEND))) {
		printk(KERN_ERR "Some devices failed to power down\n");
		goto Done;
	}
	error = pm_ops->enter(state);
	device_power_up();
acpi_pm_enter() decides which type of suspend/hibernate operation to perform:

drivers/acpi/sleep/main.c::acpi_pm_enter()
Code:
	acpi_enable_wakeup_device(acpi_state);
	switch (pm_state) {
	case PM_SUSPEND_STANDBY:
		barrier();
		status = acpi_enter_sleep_state(acpi_state);
		break;

	case PM_SUSPEND_MEM:
		do_suspend_lowlevel();
		break;

	case PM_SUSPEND_DISK:
		if (acpi_pm_ops.pm_disk_mode == PM_DISK_PLATFORM)
			status = acpi_enter_sleep_state(acpi_state);
		break;
	case PM_SUSPEND_MAX:
		acpi_power_off();
		break;

	default:
		return -EINVAL;
	}

	/* ACPI 3.0 specs (P62) says that it's the responsabilty
	 * of the OSPM to clear the status bit [ implying that the
	 * POWER_BUTTON event should not reach userspace ]
	 */
	if (ACPI_SUCCESS(status) && (acpi_state == ACPI_STATE_S3))
		acpi_clear_event(ACPI_EVENT_POWER_BUTTON);

	local_irq_restore(flags);
	printk(KERN_DEBUG "Back to C!\n");
You'll see the all-important "Back to C!" message above. You know at this point the PC has begun the resume process, so the code just prior to it must be where the PC finally goes to sleep.

This architecture-specific assembly-language code sets things up correctly for the CPU type before calling acpi_enter_sleep_state():

arch/i386/kernel/acpi/wakeup.S::do_suspend_lowlevel()
Code:
ENTRY(do_suspend_lowlevel)
	call	save_processor_state
	call	save_registers
	pushl	$3
	call	acpi_enter_sleep_state
	addl	$4, %esp

#	In case of S3 failure, we'll emerge here.  Jump
# 	to ret_point to recover
	jmp	ret_point
	.p2align 4,,7
ret_point:
	call	restore_registers
	call	restore_processor_state
	ret
Finally, these writes to registers defined in ACPI tables in the BIOS cause the PC to go to sleep.

drivers/acpi/hardware/hwsleep.c::acpi_enter_sleep_state()
Code:
	ACPI_DEBUG_PRINT((ACPI_DB_INIT,
			  "Entering sleep state [S%d]\n", sleep_state));

	/*
	 * We split the writes of SLP_TYP and SLP_EN to workaround
	 * poorly implemented hardware.
	 */

	/* Write #1: fill in SLP_TYP data */

	status = acpi_hw_register_write(ACPI_MTX_DO_NOT_LOCK,
					ACPI_REGISTER_PM1A_CONTROL,
					PM1Acontrol);
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}

	status = acpi_hw_register_write(ACPI_MTX_DO_NOT_LOCK,
					ACPI_REGISTER_PM1B_CONTROL,
					PM1Bcontrol);
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA

Last edited by IntuitiveNipple; July 20th, 2007 at 10:51 PM..
IntuitiveNipple is offline  
Old July 20th, 2007   #3
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Post Digging Deeper into Suspend

In the particular scenario I'm debugging here I need to know which form of suspend is being used. By correlating a message in the log-file written by kernel/power/main.c::enter_state():
Code:
kernel [35005.068000] PM: Entering mem sleep
with the switch case statements in drivers/acpi/sleep/main.c::acpi_pm_enter():
Code:
	switch (pm_state) {
	case PM_SUSPEND_STANDBY:
		barrier();
		status = acpi_enter_sleep_state(acpi_state);
		break;

	case PM_SUSPEND_MEM:
		do_suspend_lowlevel();
		break;

	case PM_SUSPEND_DISK:
		if (acpi_pm_ops.pm_disk_mode == PM_DISK_PLATFORM)
			status = acpi_enter_sleep_state(acpi_state);
		break;
	case PM_SUSPEND_MAX:
		acpi_power_off();
		break;
We can tell that PM_SUSPEND_MEM was the code-path followed (suspending to memory according to the log messages), and do_suspend_lowlevel() should be called.

In do_suspend_lowlevel() the call to acpi_enter_sleep_state() will only return if the sleep attempt fails. If it succeeds the assembly-language wakeup code in wakeup.S will be executed.

As the log-file doesn't have the message "Entering sleep state..." printed from acpi_enter_sleep_state() immediately before going to sleep we know that the failure was at some point prior to that. It is likely that one of the tests in acpi_enter_sleep_state() failed and it returned early so we need to examine that code in more detail from the beginning.

drivers/acpi/hardware/hwsleep.c::acpi_enter_sleep_state()
Code:
acpi_status asmlinkage acpi_enter_sleep_state(u8 sleep_state)
{
	u32 PM1Acontrol;
	u32 PM1Bcontrol;
	struct acpi_bit_register_info *sleep_type_reg_info;
	struct acpi_bit_register_info *sleep_enable_reg_info;
	u32 in_value;
	acpi_status status;

	ACPI_FUNCTION_TRACE(acpi_enter_sleep_state);

	if ((acpi_gbl_sleep_type_a > ACPI_SLEEP_TYPE_MAX) ||
	    (acpi_gbl_sleep_type_b > ACPI_SLEEP_TYPE_MAX)) {
		ACPI_ERROR((AE_INFO, "Sleep values out of range: A=%X B=%X",
			    acpi_gbl_sleep_type_a, acpi_gbl_sleep_type_b));
		return_ACPI_STATUS(AE_AML_OPERAND_VALUE);
	}

	sleep_type_reg_info =
	    acpi_hw_get_bit_register_info(ACPI_BITREG_SLEEP_TYPE_A);
	sleep_enable_reg_info =
	    acpi_hw_get_bit_register_info(ACPI_BITREG_SLEEP_ENABLE);

	/* Clear wake status */

	status =
	    acpi_set_register(ACPI_BITREG_WAKE_STATUS, 1, ACPI_MTX_DO_NOT_LOCK);
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}

	/* Clear all fixed and general purpose status bits */

	status = acpi_hw_clear_acpi_status(ACPI_MTX_DO_NOT_LOCK);
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}

	/*
	 * 1) Disable/Clear all GPEs
	 * 2) Enable all wakeup GPEs
	 */
	status = acpi_hw_disable_all_gpes();
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}
	acpi_gbl_system_awake_and_running = FALSE;

	status = acpi_hw_enable_all_wakeup_gpes();
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}

	/* Get current value of PM1A control */

	status = acpi_hw_register_read(ACPI_MTX_DO_NOT_LOCK,
				       ACPI_REGISTER_PM1_CONTROL, &PM1Acontrol);
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}
	ACPI_DEBUG_PRINT((ACPI_DB_INIT,
			  "Entering sleep state [S%d]\n", sleep_state));
There are six points at which this code could return early via return_ACPI_STATUS(). If we delve into the macro definition using Linux Cross Reference we find that return_ACPI_STATUS calls acpi_ut_status_exit which in turn calls acpi_ut_debug_print() which, if certain debug control variables have the correct values, will print additional debug information to the kernel log.

Unfortunately for us the standard kernel package is not compiled with these debug messages enabled.
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA

Last edited by IntuitiveNipple; July 20th, 2007 at 11:40 PM.. Reason: Adding additional details
IntuitiveNipple is offline  
Old July 20th, 2007   #4
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Lightbulb Configuring and Building a Debug Kernel

To enable the additional debugging we must build a custom kernel with "CONFIG_ACPI_DEBUG=y" defined. The standard Ubuntu kernels aren't built with this debugging option set because it would generate a lot of unwanted and confusing messages in the log and slow the PC down.

I'll deal with how to configure and build a custom kernel later but you can get the essentials from the Ubuntu developers Wiki Kernel/Compile page. I recently corrected that page to clarify how to build the kernel from git since that is the method used by the Ubuntu kernel developers - I recommend you use that method if you want to have a go.

Assuming you have the kernel sources installed and are going to build them you will need to edit the kernel .config file. This file contains directives that tell GNU make which options and modules to compile.

If you are not using the git source you can copy your current kernel config from /boot/config-2.6.20-16-generic into the source directory, renaming it to ".config" in the process, then open it in a text editor.

If you're using git source change into the debian/config/i386/ directory and open the config file in a text editor.

Find the commented-out CONFIG_ACPI_DEBUG and change it so it becomes:
Code:
CONFIG_ACPI_DEBUG=y
Save the file.

Now use the appropriate build method to start building the kernel.

If you're using the git method you only need to build the flavour you need for testing - this will save you a lot of time!
Code:
$ debian/rules updateconfigs
$ AUTOBUILD=1 fakeroot debian/rules binary-debs flavours=generic
Now go to bed, go to work, or find a good book to read. Building the kernel the first time will take up to 3 hours or more (depending on the speed of your PC). Subsequent builds will generally be faster because of caching (if you've installed ccache) or unchanged dependencies.
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA

Last edited by IntuitiveNipple; July 21st, 2007 at 05:59 AM..
IntuitiveNipple is offline  
Old July 22nd, 2007   #5
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Exclamation Results using an ACPI_DEBUG kernel

After building and installing a kernel with CONFIG_ACPI_DEBUG=y I gathered additional results which have, instead of clarifying the situation, actually made it more complicated.

The debug kernel was installed:
Code:
$ sudo dpkg -i linux-image-2.6.20-16-ref-generic_2.6.20-16.30_i386.deb
$ sudo dpkg -i linux-headers-2.6.20-16-ref_2.6.20-16.30_i386.deb
After rebooting into the kernel a manual suspend is attempted:
Code:
$ sudo /etc/acpi/sleep.sh force
The messages in kern.log at the point the PC should have gone to sleep show:
Code:
kernel: [  220.284000] agpgart-intel 0000:00:00.0: LATE suspend
kernel: [  220.284000]  hwsleep-0285 [03] enter_sleep_state     : Entering sleep state [S3]
kernel: [  220.284000] Back to C!
kernel: [  223.284000] agpgart-intel 0000:00:00.0: EARLY resume
The earlier guess that the sleep code wasn't being executed proves incorrect. The code is indeed getting to the sleep code as we can see from "Entering Sleep state [S3]".

What options are left?
  • the acpi_hw_register_write() operations are failing
  • the location or values being written by the sleep code are incorrect
  • the PC BIOS ACPI tables are incorrect
  • the PC is ignoring the sleep signal
  • some device is causing an immediate wakeup
So now we have to eliminate these options one-by-one until we have identified the culprit.

The list is ordered according to ease of checking, so we'll start with checking that the two acpi_hw_register_write() operations are not failing by increasing the amount of ACPI debug information written to the log-file.
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA

Last edited by IntuitiveNipple; July 22nd, 2007 at 05:40 AM..
IntuitiveNipple is offline  
Old July 22nd, 2007   #6
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Lightbulb Increasing ACPI debug message verbosity

If we've compiled with CONFIG_ACPI_DEBUG=y ACPI debug message verbosity (how much is reported) can be controlled by two settings which assume reasonably sane default values to avoid swamping the logs with information.
  • acpi_dbg_layer
  • acpi_dbg_level
These can be over-ridden with values set on the kernel command-line at boot-time, or more usefully for us, by writing values to the acpi driver directly via the /proc/ file-system:

Code:
$ cd /proc/acpi
$ ls debug*
debug_layer debug_level

$ cat debug_layer
Description                     Hex        SET
ACPI_UTILITIES                  0x00000001[*]
ACPI_HARDWARE                   0x00000002[*]
ACPI_EVENTS                     0x00000004[*]
ACPI_TABLES                     0x00000008[*]
ACPI_NAMESPACE                  0x00000010[*]
ACPI_PARSER                     0x00000020[*]
ACPI_DISPATCHER                 0x00000040[*]
ACPI_EXECUTER                   0x00000080[*]
ACPI_RESOURCES                  0x00000100[*]
ACPI_CA_DEBUGGER                0x00000200[*]
ACPI_OS_SERVICES                0x00000400[*]
ACPI_CA_DISASSEMBLER            0x00000800[*]
ACPI_COMPILER                   0x00001000[*]
ACPI_TOOLS                      0x00002000[*]
ACPI_ALL_DRIVERS                0xFFFF0000[*]
--
debug_layer = 0xFFFF3FFF (* = enabled, - = partial)

$ cat debug_level
Description                     Hex        SET
ACPI_LV_ERROR                   0x00000001[*]
ACPI_LV_WARN                    0x00000002[*]
ACPI_LV_INIT                    0x00000004[*]
ACPI_LV_DEBUG_OBJECT            0x00000008[*]
ACPI_LV_INFO                    0x00000010 [ ]
ACPI_LV_INIT_NAMES              0x00000020 [ ]
ACPI_LV_PARSE                   0x00000040 [ ]
ACPI_LV_LOAD                    0x00000080 [ ]
ACPI_LV_DISPATCH                0x00000100 [ ]
ACPI_LV_EXEC                    0x00000200 [ ]
ACPI_LV_NAMES                   0x00000400 [ ]
ACPI_LV_OPREGION                0x00000800 [ ]
ACPI_LV_BFIELD                  0x00001000 [ ]
ACPI_LV_TABLES                  0x00002000 [ ]
ACPI_LV_VALUES                  0x00004000 [ ]
ACPI_LV_OBJECTS                 0x00008000 [ ]
ACPI_LV_RESOURCES               0x00010000 [ ]
ACPI_LV_USER_REQUESTS           0x00020000 [ ]
ACPI_LV_PACKAGE                 0x00040000 [ ]
ACPI_LV_ALLOCATIONS             0x00100000 [ ]
ACPI_LV_FUNCTIONS               0x00200000 [ ]
ACPI_LV_OPTIMIZATIONS           0x00400000 [ ]
ACPI_LV_MUTEX                   0x01000000 [ ]
The total flag-value is determined by bit-wise OR-ing together of the individual flag values (declared in include/acpi/acoutput.h).

We could just set them all by using the value 0xFFFFFFFF - it would result in a lot of output which could be a problem if the time-delays added by having to generate the messages affects the problem we're debugging. If the issue were timing-related the additional time taken to generate these messages could mask the problem, so always beware.

You've seen how to read the ACPI debug flags using cat; this is how to write to them:
Code:
$ sudo su
Password: 

$ echo "0x0020000F" > debug_level
The flag-value is calculated by adding the hex-value of each flag we want together. This command would retain the current flags ERROR, WARN, INIT, DEBUG_OBJECT (0x0000000F) and add FUNCTIONS (0x00200000). This will report entry into ACPI functions and the values returned.

Unfortunately when testing it also seems to prevent the system reporting anything after the "Preparing to sleep" messages until the system is resuming, so it makes things worse.

What we need to do is experiment with the flags and read the source-code to determine which flags will help.
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA

Last edited by IntuitiveNipple; July 22nd, 2007 at 07:39 AM..
IntuitiveNipple is offline  
Old July 22nd, 2007   #7
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Post Debug Suspend by Adding Code to the Kernel

In the case I'm currently debugging it is clear that using the regular ACPI debug messages might not be enough. Therefore I added a series of log-messages to the last few instructions before the PC should sleep.

drivers/acpi/hardware/hwsleep.c::acpi_enter_sleep_state()
Code:
	/* Insert SLP_TYP bits */

	PM1Acontrol |=
	    (acpi_gbl_sleep_type_a << sleep_type_reg_info->bit_position);
	PM1Bcontrol |=
	    (acpi_gbl_sleep_type_b << sleep_type_reg_info->bit_position);

	/*
	 * We split the writes of SLP_TYP and SLP_EN to workaround
	 * poorly implemented hardware.
	 */

	/* Write #1: fill in SLP_TYP data */

#define DEBUG_RESUME
 
#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Writing PM1A (SLP_TYP data) %x\n", PM1Acontrol));
#endif

	status = acpi_hw_register_write(ACPI_MTX_DO_NOT_LOCK,
					ACPI_REGISTER_PM1A_CONTROL,
					PM1Acontrol);
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}

#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Writing PM1B (SLP_TYP data) %x\n", PM1Bcontrol));
#endif

  status = acpi_hw_register_write(ACPI_MTX_DO_NOT_LOCK,
					ACPI_REGISTER_PM1B_CONTROL,
					PM1Bcontrol);
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}

	/* Insert SLP_ENABLE bit */

	PM1Acontrol |= sleep_enable_reg_info->access_bit_mask;
	PM1Bcontrol |= sleep_enable_reg_info->access_bit_mask;

	/* Write #2: SLP_TYP + SLP_EN */

	ACPI_FLUSH_CPU_CACHE();

#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Writing PM1A (SLP_TYP + SLP_EN) %x\n", PM1Acontrol));
#endif
 
	status = acpi_hw_register_write(ACPI_MTX_DO_NOT_LOCK,
					ACPI_REGISTER_PM1A_CONTROL,
					PM1Acontrol);
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}

#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Writing PM1B (SLP_TYP + SLP_EN) %x\n", PM1Bcontrol));
#endif
 
	status = acpi_hw_register_write(ACPI_MTX_DO_NOT_LOCK,
					ACPI_REGISTER_PM1B_CONTROL,
					PM1Bcontrol);
	if (ACPI_FAILURE(status)) {
		return_ACPI_STATUS(status);
	}

#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Just done sleep S3\n"));
#endif
You'll see I've added ACPI_DEBUG_PRINT() before each call to acpi_hw_register_write(). The code is conditional (wrapped in #ifdef preprocessor blocks) and won't be included by the compiler unless DEBUG_RESUME is defined. To make things simple here I've included that definition in the source, but usually you'd use the compiler directive EXTRA_CFLAGS when doing the build. It would be added to the ACPI hardware Makefile

drivers/acpi/hardware/Makefile
Code:
EXTRA_CFLAGS += $(ACPI_CFLAGS) -DDEBUG_RESUME
After building and installing the kernel with this code in, doing a suspend reports:
Code:
kernel: [  242.116000] agpgart-intel 0000:00:00.0: LATE suspend
kernel: [  242.116000]  hwsleep-0285 [03] enter_sleep_state     : Entering sleep state [S3]
kernel: [  242.116000]  hwsleep-0310 [03] enter_sleep_state     : Writing PM1A (SLP_TYP data) 1401
kernel: [  242.116000]  hwsleep-0321 [03] enter_sleep_state     : Writing PM1B (SLP_TYP data) 1401
kernel: [  242.116000] Back to C!
kernel: [  245.116000] agpgart-intel 0000:00:00.0: EARLY resume
From this we can see that the second call to acpi_hw_register_write() fails in some way because it never reaches "Writing PM1A (SLP_TYP + SLP_EN)".

Because I'm still not seeing the error-value being reported via return_ACPI_STATUS() I'm going to add another bit of code to do that:
Code:
  status = acpi_hw_register_write(ACPI_MTX_DO_NOT_LOCK,
					ACPI_REGISTER_PM1B_CONTROL,
					PM1Bcontrol);
	if (ACPI_FAILURE(status)) {
#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Failed writing PM1B (SLP_TYP data). error-code 0x%x\n", status));
#endif

		return_ACPI_STATUS(status);
	}
I'll rebuild the kernel and test again.
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA
IntuitiveNipple is offline  
Old July 22nd, 2007   #8
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Post Premature Partial ACPI Suspend?

The issue becomes ever more interesting. I added debug messages either side of the suspect acpi_hw_register_write() and both are activated and the return status from the call is AE_OK (0), which is supposed to be success (which explains why the message after ACPI_FAILURE() didn't print:
Code:
kernel: [  328.044000]  hwsleep-0285 [03] enter_sleep_state     : Entering sleep state [S3]
kernel: [  328.044000]  hwsleep-0308 [03] enter_sleep_state     : Writing PM1A (SLP_TYP data) 1401
kernel: [  328.044000]  hwsleep-0319 [03] enter_sleep_state     : Writing PM1B (SLP_TYP data) 1401
kernel: [  328.044000]  hwsleep-0326 [03] enter_sleep_state     : Just written PM1B (SLP_TYP data). status=0x0
kernel: [  328.044000] Back to C!
Here's the updated code:
drivers/acpi/hardware/hwsleep.c::acpi_enter_sleep_state()
Code:
#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Writing PM1B (SLP_TYP data) %x\n", PM1Bcontrol));
#endif

  status = acpi_hw_register_write(ACPI_MTX_DO_NOT_LOCK,
					ACPI_REGISTER_PM1B_CONTROL,
					PM1Bcontrol);
#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Just written PM1B (SLP_TYP data). status=0x%x\n", status));
#endif
It seems as if the write to the register is prematurely putting the PC into partial sleep mode.
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA
IntuitiveNipple is offline  
Old July 23rd, 2007   #9
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Post ACPI_FLUSH_CPU_CACHE() is the culprit - but why?

Adding additional debug messages shows that the issue occurs when ACPI_FLUSH_CPU_CACHE() is called:
Code:
kernel: [  251.532000]  hwsleep-0285 [03] enter_sleep_state     : Entering sleep state [S3]
kernel: [  251.532000]  hwsleep-0308 [03] enter_sleep_state     : Writing PM1A (SLP_TYP data) 1401
kernel: [  251.532000]  hwsleep-0319 [03] enter_sleep_state     : Writing PM1B (SLP_TYP data) 1401
kernel: [  251.532000]  hwsleep-0326 [03] enter_sleep_state     : Just written PM1B (SLP_TYP data). status=0x0
kernel: [  251.532000]  hwsleep-0343 [03] enter_sleep_state     : About to flush CPU cache
kernel: [  251.532000] Back to C!
So this is where the brains of ACPI suspend gurus need to be brought in to try and determine why this is happening, and how to fix it.

The additional debug messages:
drivers/acpi/hardware/hwsleep.c::acpi_enter_sleep_state()
Code:
	/* Write #2: SLP_TYP + SLP_EN */
#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "About to flush CPU cache\n"));
#endif

	ACPI_FLUSH_CPU_CACHE();

#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Writing PM1A (SLP_TYP + SLP_EN) %x\n", PM1Acontrol));
#endif
The ACPI_FLUSH_CPU_CACHE() macro expands to:
include/asm-i386/system.h
Code:
#define wbinvd() \
         __asm__ __volatile__ ("wbinvd": : :"memory")
wbinvd is the Intel CPU instruction "Write Back and Invalidate Cache".

Quote:
Writes back all modified cache lines in the processor’s internal cache to main memory
and invalidates (flushes) the internal caches. The instruction then issues a special-
function bus cycle that directs external caches to also write back modified data and
another bus cycle to indicate that the external caches should be invalidated.

After executing this instruction, the processor does not wait for the external caches
to complete their write-back and flushing operations before proceeding with instruc-
tion execution. It is the responsibility of hardware to respond to the cache write-back
and flush signals.

The WBINVD instruction is a privileged instruction. When the processor is running in
protected mode, the CPL of a program or procedure must be 0 to execute this
instruction. This instruction is also a serializing instruction (see “Serializing Instruc-
tions” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 3A).
The next step is to move the call to ACPI_FLUSH_CPU_CACHE() to before the acpi_hw_register_write() calls begin and see if it changes the behaviour.
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA

Last edited by IntuitiveNipple; July 23rd, 2007 at 12:56 PM..
IntuitiveNipple is offline  
Old July 23rd, 2007   #10
IntuitiveNipple
Skinny Soy Caramel Ubuntu
 
Join Date: Nov 2006
Location: España
Beans: 668
Ubuntu Development Release
Post Intel WBINVD instruction the cause?

With ACPI_FLUSH_CPU_CACHE() moved to be prior to the sleep register writes acpi_enter_sleep_state() the problem still occurs. It is beginning to feel like some kind of bug with the cache.
Code:
kernel: [  352.232000]  hwsleep-0285 [03] enter_sleep_state     : Entering sleep state [S3]
kernel: [  352.232000]  hwsleep-0301 [03] enter_sleep_state     : About to flush CPU cache
kernel: [  352.232000] Back to C!
The revised source code:

drivers/acpi/hardware/hwsleep.c::acpi_enter_sleep_state()
Code:
	ACPI_DEBUG_PRINT((ACPI_DB_INIT,
			  "Entering sleep state [S%d]\n", sleep_state));

	/* Clear SLP_EN and SLP_TYP fields */

	PM1Acontrol &= ~(sleep_type_reg_info->access_bit_mask |
			 sleep_enable_reg_info->access_bit_mask);
	PM1Bcontrol = PM1Acontrol;

	/* Insert SLP_TYP bits */

	PM1Acontrol |=
	    (acpi_gbl_sleep_type_a << sleep_type_reg_info->bit_position);
	PM1Bcontrol |=
	    (acpi_gbl_sleep_type_b << sleep_type_reg_info->bit_position);

#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "About to flush CPU cache\n"));
#endif

	ACPI_FLUSH_CPU_CACHE();

#ifdef DEBUG_RESUME
 ACPI_DEBUG_PRINT((ACPI_DB_INIT, "Done flush CPU cache\n"));
#endif
The next test is to do without the flush.
__________________
TJ.
"The only intuitive interface is the nipple. After that it's all learned"
So try my Ubuntu PPA
IntuitiveNipple is offline  
 

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT -4. The time now is 05:01 PM.


vBulletin ©2000 - 2010, Jelsoft Enterprises Ltd. Ubuntu Logo, Ubuntu and Canonical © Canonical Ltd. Tango Icons © Tango Desktop Project. lingonberry