This chapter describes the programmable I/O (PIO) architecture: first the PIO address, then the flow of PIO operations.
On SGI Altix 3000 systems, the PIO address that the device driver and CPU encounter is different from the PCI-X addresses that are initialized on the base address registers (BARs). PCI-X host bridge adapters on SGI Altix 3000 systems can generate only single address cycles for PIO read and write operations. This limits the size of the PCI address on a PCI bus to 32 bits.
PCI-X host adapters on SGI Altix 3000 systems provide a set of device registers that help maintain PIO attributes. Two of the most common PIO attributes are as follows:
| DEV_IO_MEM | Enables device memory or I/O space. When set, the request generated on the PCI bus is for the PCI memory resource. Otherwise, the request is generated for the PCI I/O resource. |
| DEV_OFF | Specifies PCI-X address offset bits. These 12 bits replace bits 31 to 20 of the PIO address that the PCI-X host bridge adapter obtains from the CPU. |
The diagram in Figure 7-1 provides the breakdown of a “mapped” PIO address as seen by the device driver on the CPU. Note that this address is quite different from the addresses that are initialized on the BARs.
The following sections describe the flow of PIO operation to local and remote PCI-X devices.
The flow of PIO to a local PCI-X device is depicted in Figure 7-2. Following the figure is an explanation of the numbered components.
The CPU sees that the address is “uncached” and forwards it on the front side bus (FSB).
The SHub receives the request and determines from the NASID (bits 48 to 38) that it is targeted to itself.
The SHub forwards the request to the attached PCI-X host bridge adapter via the Xtown2 link.
The PCI-X host bridge adapter receives the request, parses the addresses, and places the modified PIO address (PCI-X bus address) on the PCI-X bus.
The PCI-X device with the matching BARs responds to the request.
The flow of PIO to a remote PCI-X device is depicted in Figure 7-3. Following the figure is an explanation of the numbered components.
The CPU places the PIO mapped address on the FSB.
The local SHub receives the request and determines from the NASID that it is not targeted to itself.
The local SHub forwards the request via the NUMAlink to the targeted remote SHub.
The PCI-X host bridge adapter receives the request, parses the addresses, and places the modified PIO address (PCI-X bus address) on the PCI-X bus.
The PCI-X device with the matching BARs responds to the request.
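The routing decision that the SHub makes in both flows can be sketched in C. This is only an illustration, assuming what the text states: that the NASID occupies bits 48 to 38 of the PIO address. The names `pio_nasid` and `pio_is_local` are hypothetical and are not part of any SGI or Linux interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption taken from the text: the NASID occupies bits 48..38 (11 bits). */
#define PIO_NASID_SHIFT 38
#define PIO_NASID_MASK  0x7ffULL

/* Hypothetical helper: extract the target node ID from a PIO address. */
static uint64_t pio_nasid(uint64_t pio_addr)
{
    return (pio_addr >> PIO_NASID_SHIFT) & PIO_NASID_MASK;
}

/*
 * Hypothetical helper: returns true when the SHub whose node ID is my_nasid
 * is the target (local flow), and false when the request would have to be
 * forwarded over NUMAlink to another SHub (remote flow).
 */
static bool pio_is_local(uint64_t pio_addr, uint64_t my_nasid)
{
    assert(my_nasid <= PIO_NASID_MASK);   /* a NASID must fit in 11 bits */
    return pio_nasid(pio_addr) == my_nasid;
}
```

For an address whose NASID field is 0, `pio_is_local(addr, 0)` is true and `pio_is_local(addr, 1)` is false, which corresponds to the local and remote flows respectively.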
The PIO address that the CPU issues does not look anything like the PCI address on the targeted base address register (BAR). Consider the following example:
Example 7-1. Address Translation
A PCI-X device has requested an I/O resource of 512 bytes. It is connected via NASID (node ID) 0x0, and it is on that node's local PCI bus (widget identifier) 0xe.
At boot time the system has initialized the BARs to 0x1fff_0001. Given the previous information, the "mapped" PIO address looks like the following to the device driver and the CPU:
0xc000_0008_0e4f_0000
The diagram in Figure 7-4 provides the breakdown of an address that the CPU issues. This is the address you get from the pci_dev structure.
The targeted PCI-X host bridge adapter gets the PIO address as the following:
0x0e4f_0000
The device register contents of 0x11ff specify the following:
DEV_IO_MEM == IO
DEV_OFF == 0x1ff
The relevant device register is the one identified by PCI bus widgets 0xe and 0x4. With this information from the device register for this address, the PCI-X host bridge adapter places the following PCI-X bus address on widget 0xe as a PCI I/O transaction (read or write operation):
0x1fff_0000
The PCI-X host bridge adapter strips the 0xe4 (from 0x0e4f_0000 to form 0xf_0000) and prepends 0x1ff (from the device register DEV_OFF value) to 0xf_0000 to make the PCI-X bus address 0x1fff_0000. This value matches the value as initialized in the BAR.
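The arithmetic in this example can be checked with a short C sketch. It models only the rewrite described above, in which the 12 DEV_OFF bits replace bits 31 to 20 of the address the bridge receives. The function name `pio_to_pci_bus_addr` and the macro names are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* From the text: DEV_OFF supplies 12 bits that replace bits 31..20. */
#define DEV_OFF_SHIFT 20
#define DEV_OFF_MASK  0xfffU
#define PIO_LOW_MASK  0x000fffffU   /* bits 19..0 pass through unchanged */

/* Hypothetical model of the bridge's address rewrite; not an SGI interface. */
static uint32_t pio_to_pci_bus_addr(uint32_t bridge_addr, uint32_t dev_off)
{
    assert(dev_off <= DEV_OFF_MASK);   /* DEV_OFF is a 12-bit field */
    return (dev_off << DEV_OFF_SHIFT) | (bridge_addr & PIO_LOW_MASK);
}
```

With the values from Example 7-1, `pio_to_pci_bus_addr(0x0e4f0000, 0x1ff)` keeps the low 20 bits (0xf_0000) and places 0x1ff in bits 31 to 20, giving 0x1fff_0000.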
Note: Reading the BARs to obtain an address for PIO does not work on SGI Altix 3000 systems. It might or might not work on other systems. Most importantly, it makes your code nonportable.
Device drivers on SGI Altix 3000 systems must use the PCI resource routines described in the following sections to obtain either the I/O or memory PIO addresses that are initialized by the platform. Device drivers must not read or use the BARs directly.
Linux provides the following PCI resource interfaces to obtain the PCI I/O resource address.
To retrieve the start address of an I/O resource:
pci_resource_start(dev,bar)
To retrieve the end address of an I/O resource:
pci_resource_end(dev,bar)
To obtain the length of an I/O resource:
pci_resource_len(dev,bar)
For example:
reg_base = pci_resource_start(pdev, 0);
reg_len = pci_resource_len(pdev, 0);
flags = pci_resource_flags(pdev, 0);
if (flags & IORESOURCE_IO) {
        /* This is an I/O resource. */
}
Linux provides the following PCI resource interfaces to obtain PCI memory resource addresses.
To retrieve the start address of a memory resource:
pci_resource_start(dev,bar)
To retrieve the end address of a memory resource:
pci_resource_end(dev,bar)
To obtain the length of a memory resource:
pci_resource_len(dev,bar)
For example:
reg_base = pci_resource_start(pdev, 0);
reg_len = pci_resource_len(pdev, 0);
flags = pci_resource_flags(pdev, 0);
if (flags & IORESOURCE_MEM) {
        /* This is a memory resource. */
}
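The relationship between the start, end, length, and flags values can be illustrated outside the kernel with a small model. This sketch assumes that a resource is an inclusive address range, so its length is end - start + 1. The flag values mirror those in linux/ioport.h; a real driver includes that header and calls the pci_resource_*() routines rather than using anything below. All `model_` and `MODEL_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror linux/ioport.h; a real driver includes that header instead. */
#define MODEL_IORESOURCE_IO  0x00000100UL
#define MODEL_IORESOURCE_MEM 0x00000200UL

/* Hypothetical stand-in for the start/end/flags values kept for each BAR. */
struct model_resource {
    uint64_t      start;
    uint64_t      end;     /* inclusive */
    unsigned long flags;
};

enum model_space { MODEL_SPACE_IO, MODEL_SPACE_MEM, MODEL_SPACE_NONE };

/* Length of an inclusive range. */
static uint64_t model_resource_len(const struct model_resource *r)
{
    assert(r->end >= r->start);
    return r->end - r->start + 1;
}

/* Classify the resource the same way the flag tests in the examples do. */
static enum model_space model_resource_space(const struct model_resource *r)
{
    if (r->flags & MODEL_IORESOURCE_IO)
        return MODEL_SPACE_IO;
    if (r->flags & MODEL_IORESOURCE_MEM)
        return MODEL_SPACE_MEM;
    return MODEL_SPACE_NONE;
}
```

For a 512-byte I/O resource whose mapped start address is 0xc000_0008_0e4f_0000, the end address in this model is 0xc000_0008_0e4f_01ff and the length is 512.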
If the device driver provides the ability to memory-map the memory resource address into user space, the pgprot_noncached() macro must be used to set appropriate caching attributes on the corresponding virtual memory area.
For example:
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pci.h>
#include <asm/page.h>

static int
my_dev_mmap(struct file *filp, struct vm_area_struct *vma)
{
        unsigned long my_dev_page;
        struct pci_dev *dev;

        /* Determine dev by methods specific to your driver, then... */
        /* Check validity of input arguments, then... */

        my_dev_page = pci_resource_start(dev, 0) + MY_DEV_PAGE_OFFSET;
        vma->vm_flags |= VM_IO | VM_RESERVED;
        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
#if defined(CONFIG_IA64) || defined(CONFIG_IA64_GENERIC)
        /* On IA-64, strip the region bits from the mapped address. */
        my_dev_page = REGION_OFFSET(my_dev_page);
#endif
        return io_remap_page_range(vma, vma->vm_start, my_dev_page,
                                   MY_DEV_PAGE_LEN, vma->vm_page_prot);
}
It is strongly recommended that device drivers call the following PCI-X resource reservation routines to ensure that no other drivers are currently using a resource by mistake.
To reserve a PCI I/O resource region:
request_region(start,n,name)
To reserve a PCI memory resource region:
request_mem_region(start,n,name)
To release the PCI I/O resource region:
release_region(start,n)
To release the PCI memory resource region:
release_mem_region(start,n)
For example:
if (!request_region(reg_base, reg_len, "any_id"))
        return -EBUSY;  /* The region is already reserved. */
.....
release_region(reg_base, reg_len);
You should reference PCI-X I/O resource addresses by using the following macros.
Single byte access macros:
inb(address);
outb(value, address);
Single word access macros:
inw(address);
outw(value, address);
Single long access macros:
inl(address);
outl(value, address);
Multiple byte access macros:
insb(address, value_address, byte_count);
outsb(address, value_address, byte_count);
Multiple word access macros:
insw(address, value_address, word_count);
outsw(address, value_address, word_count);
Multiple long access macros:
insl(address, value_address, long_count);
outsl(address, value_address, long_count);
Note: On SGI Altix 3000 systems, PCI-X I/O resource addresses are mapped addresses and can be referenced without using any of the macros in the preceding list. Even so, it is recommended that you use these macros so that your code is portable.
Do not dereference PCI-X memory resource addresses directly. Access them through the following platform-independent macros.
Single byte access macros:
readb(address);
writeb(value, address);
Single word access macros:
readw(address);
writew(value, address);
Single long access macros (4 bytes):
readl(address);
writel(value, address);
Single unsigned long access macros (8 bytes):
readq(address);
writeq(value, address);
PIO write operations on SGI Altix 3000 systems can be cached in the various system components prior to actual arrival at the device. These PIO write operations are called “posted” operations. To explicitly flush these write operations, the device driver is required to perform a PIO read operation (also known as a “PIO flush”) after the last significant PIO write operation.
The need to perform PIO flushes becomes apparent when you consider a multithreaded driver. Multithreaded drivers use a memory lock for synchronization, as shown in the example sequence in Table 7-1.
| Time | CPU 0 | CPU 1 |
|---|---|---|
| n | (1) Grab lock (This CPU wins the race for the lock) | (1) Grab lock (This CPU must wait, as CPU 0 has the lock) |
| n + 1 | (2) PIO write of 0xa to device x | (2) Waiting |
| n + 2 | (3) Release lock (but no guarantee that #2 has completed) | (3) Receive lock |
| n + 3 | (4) No activity | (4) PIO write of 0xb to device x |
| n + 4 | (5) Device can receive 0xb before 0xa | |
To avoid releasing the memory lock before the PIO write has completed, drivers for SGI Altix 3000 systems can issue an additional operation (a read operation to the same controller, called a PIO flush) to force the data to be delivered to the device before the memory lock is released and a second thread can issue a read operation. The sequence shown in Table 7-2 illustrates the correct usage.
Table 7-2. Correct Memory Lock Usage
| Time | CPU 0 | CPU 1 |
|---|---|---|
| n | (1) Grab lock (This CPU wins the race for the lock) | (1) Grab lock (This CPU must wait, as CPU 0 has the lock) |
| n + 1 | (2) PIO write of 0xa to device x | (2) Waiting |
| n + 2 | (3) PIO read to the same controller (4) Device receives 0xa | (3) Waiting |
| n + 3 | (5) Release lock | (4) Receive lock |
| n + 4 | (6) No activity | (5) PIO write of 0xb to device x (6) PIO read to the same controller (7) Device receives 0xb |
Without the PIO flush (as in Table 7-1), the fact that CPU 0 issued the PIO write at n + 1 does not guarantee that the device will have received the data (0xa) before n + 3. Similarly, nothing guarantees that the PIO write from CPU 1 at n + 3 does not arrive at the device before the operation that was issued by CPU 0 at n + 1.
Following is a more concrete example from a hypothetical device driver:
...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: val = readl(my_status);
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: val = readl(my_status);
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
...
In the case above, the device may receive newval2 before it receives newval, which could cause problems. Following is a fix for the problem:
...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: val = readl(my_status);
CPU A: ...
CPU A: writel(newval, ring_ptr);
(***The following line fixes the previous problem***)
CPU A: (void)readl(safe_register); /* maybe a config register? */
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: val = readl(my_status);
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: (void)readl(safe_register); /* maybe a config register? */
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
Here, the read operations from safe_register cause the I/O chipset to flush any pending write operations before actually posting the read operation to the chipset, thus preventing possible data corruption.
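The effect of a PIO flush can be illustrated with a small user-space model. This is a sketch only, not a description of the actual hardware: it assumes a single queue of posted writes in front of one device register, and every name in it (`model_fabric`, `model_pio_write`, `model_pio_read`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_POSTED 8

/* Hypothetical model: one device register behind a write-posting fabric. */
struct model_fabric {
    uint32_t device_reg;                /* what the device has actually seen */
    uint32_t posted[MODEL_MAX_POSTED];  /* writes still in flight */
    size_t   nposted;
};

/* A PIO write returns once it is posted; the device is not yet updated. */
static void model_pio_write(struct model_fabric *f, uint32_t val)
{
    assert(f->nposted < MODEL_MAX_POSTED);
    f->posted[f->nposted++] = val;
}

/* A PIO read completes only after earlier posted writes are delivered. */
static uint32_t model_pio_read(struct model_fabric *f)
{
    for (size_t i = 0; i < f->nposted; i++)
        f->device_reg = f->posted[i];
    f->nposted = 0;
    return f->device_reg;
}
```

In this model, `device_reg` is unchanged after `model_pio_write()` returns and takes the written value only once `model_pio_read()` has drained the queue. Because the model has a single queue, it cannot show reordering between CPUs; it shows only why a driver that releases its lock right after a write has no assurance that the device has seen the data.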
For more information, see Appendix A, “Memory Operation Ordering on SGI Altix Systems”.
SGI Altix system hardware can buffer DMA write data. These buffers are flushed only when the device generates an interrupt.
The PCI specification requires that any bridge that can buffer DMA write data must ensure that these posted buffers are flushed whenever a PIO read is issued to the device. Because SGI Altix hardware does not implement this requirement, all of the PIO read macros (for example, inX() and readX()) have been enhanced to perform a DMA write flush before returning to the caller. On some devices and device drivers, this enhancement can cause a small performance degradation. For this reason, a set of “fast” PIO calls is available. These calls do not perform any DMA write buffer flushing. For devices that do not depend on a PIO read to flush posted write DMA buffers, you can use the following set of interfaces:
sn_inb_fast (unsigned long port)
sn_inw_fast (unsigned long port)
sn_inl_fast (unsigned long port)
sn_readb_fast (void *addr)
sn_readw_fast (void *addr)
sn_readl_fast (void *addr)
These calls are defined in the include/asm-ia64/sn/sn2/io.h file.
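The difference between the flushing and “fast” reads can be sketched with a user-space model. It is an illustration under simplifying assumptions (one buffered DMA write, one device register) and does not reproduce the actual sn2 implementation. All `model_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a bridge holding one buffered DMA write. */
struct model_bridge {
    uint32_t status_reg;    /* device register returned by PIO reads */
    uint32_t host_memory;   /* DMA target in system memory */
    uint32_t buffered_dma;  /* DMA write data not yet in host memory */
    bool     dma_pending;
};

/* Deliver any buffered DMA write data to host memory. */
static void model_flush_dma(struct model_bridge *b)
{
    assert(b);   /* caller must pass a valid bridge */
    if (b->dma_pending) {
        b->host_memory = b->buffered_dma;
        b->dma_pending = false;
    }
}

/* Models the enhanced read macros: flush DMA write data before returning. */
static uint32_t model_readl(struct model_bridge *b)
{
    uint32_t val = b->status_reg;

    model_flush_dma(b);
    return val;
}

/* Models the "fast" variants: return the register value without flushing. */
static uint32_t model_readl_fast(const struct model_bridge *b)
{
    return b->status_reg;
}
```

In this model both reads return the same register value, but only `model_readl()` leaves `host_memory` up to date. A driver that inspects DMA data immediately after a register read therefore needs the flushing form.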