System Calls (Part I)

Syscalls are the way an application program interacts with the underlying operating system: everything from displaying a character on the screen to reading a file from the hard drive generates syscalls. In this post we’ll analyse syscalls in detail, with examples on different CPU architectures and operating systems.

Overview

As defined on Wikipedia (System Calls), a system call is a software mechanism used by an application program to “request” services from an Operating System (OS).

An Operating System Service is a “function” (or a set of functions) offered by the OS to User Application Programs, such as:

  • Create a new Process (in the UNIX/Linux world this is usually the fork() function, in the Windows world it is NtCreateProcess(), etc.)
  • Create/Open a file
  • “Talk” to another process
  • Write characters on the screen or change the state of pixels on the screen

Operating Systems offer many such functions and, as OS generations evolve, the nature of those functions changes too. For example, successive releases of Windows have tended to incorporate more graphics routines into the kernel, and Linux now offers what’s called NAPI (New API), a modification to the standard device driver packet processing framework designed to improve the performance of high-speed networking.

Note:
  • In the Windows world “system calls” are called System Service Calls (as described in the “Microsoft Windows Internals” book by Russinovich and Solomon).
  • In the RISC OS world “system calls” are called SWIs, SoftWare Interrupts (as described in the “RISC OS Programmer’s Reference Manual Volume 1”, RISC OS Ltd/Castle Technologies Ltd).
  • In the BSD/Linux world system calls are System Calls 😉

How do System Calls work?

The kernel of an Operating System runs in its own protected address space, which Application Programs cannot access directly. So when an Application Program requests a System Service through a system call, the system needs to switch the CPU context from User Mode to Kernel Mode in order to execute the service. User Mode is the CPU context in which Application Programs are executed, and it has fewer privileges. Kernel Mode (also known as Supervisor Mode) is the CPU context in which the Kernel is executed; it has full privileges and so can access the kernel address space.

Before this switch takes place, the library version of the system call (the one actually called by the Application Program, usually provided by a Dynamic Link Library or by a static library linked at compilation time) places the arguments in the relevant registers and the system call number in special processor “transfer registers”. It then executes a special processor instruction (also called a trap) that causes the processor to switch mode, and the OS kernel starts to execute the desired service.

To better understand this sequence of actions, let’s have a look at a very simple read() LIBRARY function. We look at the disassembled code to see exactly what the library version of the syscall does: it is related to the syscall, but it is different code, executed in user mode!

.global _read

_read:
 movl $0x6, %eax    # system call number of read on this system
 int $0x80          # trap into the kernel
 ret                # return to the caller

The library function does very little: it moves the value 6 into the CPU register %eax and executes the x86 trap instruction int $0x80. The kernel uses the value in %eax to vector (that is, to determine) which system call is being invoked. The number itself is OS-specific: on Linux for 32-bit x86, for example, read is system call number 3.

When the int instruction is executed the hardware takes over (in the sense that the CPU performs a set of automatic actions). On Intel x86 this usually means moving from a CPL (Current Privilege Level) of 3 (user mode) to CPL 0 (kernel mode).

Then the hardware transfers control to the trap vectors of the system. For the hardware to know what code to run when a particular trap occurs, the OS must tell it where the handler code is located; this is usually done while the operating system boots.

At this point the trap handler routine of the kernel is executed (each operating system may go through a different stack of function calls to set everything up for the requested system service), and the trap handler routine calls the correct system service routine.

Once the system service routine has run, it generally returns an integer status value to the trap handler routine (stored back into the transfer registers), possibly together with result values, which the kernel places into the relevant CPU registers.

Control is then generally passed back to the trap handler, which may consist of a single function or of a stack of functions (depending on the OS) that restores the CPU state and sets the CPL back to 3 (user mode). Because the trap handler restores the CPU state as it was when the int instruction raised the trap, execution continues in the library syscall function, at the instruction following int.

Then, generally, the library syscall function retrieves the syscall result values from the relevant registers and passes them to the Application Program.

During this sequence of actions the Application Program will “freeze temporarily” (the correct term is Suspend) until the kernel is done executing the system call. We will use the term Process from now on to identify the Application Program.

Note:
  • In the RISC OS world a Process is called a Task.
  • In the Windows world an Application Program is called a User Process.

The duration of this suspension depends on the nature of the syscall and on the architecture of the Kernel and of the Process. Some examples:

  • Single-threaded Kernels (like MS-DOS, RISC OS, etc.) will always suspend the Process (and all the other Processes in concurrent execution) until the syscall is completed.
  • Multi-threaded Kernels on a single-CPU system (like Windows NT, XP, etc. running on an old Pentium CPU) will suspend the calling process until the syscall is completed, and all the other processes in concurrent execution until the kernel thread gets re-scheduled.
  • Multi-threaded Kernels on a multi-CPU system (like modern Linux, XP or Windows 7 running on modern CPUs such as multicore ARMs or recent Intel CPUs) will suspend the calling process until the kernel thread gets re-scheduled; whether other processes are suspended too depends on the number of CPUs available to run all the concurrent threads.
  • A multi-threaded Process on a multi-threaded kernel will have the calling thread suspended (and possibly all the other threads that depend on the calling thread) until the syscall is completed.

When the syscall requires hardware intervention, for example when the Process tries to open a file on a disk, other elements come into the equation, such as the IRQs used to control the device and the device itself (how the device manipulates the data, how it accesses memory, its latency and so on).

It is very important to understand the operating system architecture and the way certain devices work when developing applications that require LOW LATENCY and HIGH PERFORMANCE while using different devices to manipulate data.

Architectural Notes

Intel and AMD Processors

On the Intel x86 32-bit architecture, the old method to cause a trap was to raise interrupt 0x80 (0x stands for hexadecimal notation).

  • On Linux, BSD and BeOS we use the ASM instruction INT (Interrupt) with vector 0x80 in x86 assembler.
  • On Windows:
    • On x86 processors older than the Pentium II, Windows uses the instruction int 0x2e, which results in the trap described above.
    • On x86 processors from the Pentium II onwards, Windows uses a special instruction called sysenter. Intel defined this instruction specifically for a “fast System Service Dispatcher” (this is the name of the System Call mechanism in the Windows world).
      • Note: Linux, from kernel 2.6.x, also uses the sysenter and sysexit instructions for system calls, apparently for performance reasons.
    • On 32-bit AMD processors from the K6 onwards, Windows uses a special instruction called syscall, which works similarly to sysenter.
    • On the x86_64 architecture Windows always uses syscall.
    • On the IA64 architecture, Windows uses a special instruction called epc (Enter Privileged Code).
    • The EAX register is used to pass the System Service Call number to the kernel, and the EDX register points to the list of parameters, or the caller arguments are stored on the stack.

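Note: On Linux no stack is used to pass parameters to the kernel with int 0x80, so you should use (in this order) the ebx, ecx, edx, esi and edi registers. FreeBSD, by default, instead expects the arguments to be pushed on the stack, C style (see the FreeBSD link at the end of this post).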


ARM Processors

On the ARM architecture (with RISC OS) there are some variants of this mechanism, so let’s see the ARM/RISC OS characteristics here:

  • We always refer to the ASM instruction SWI (SoftWare Interrupt).
  • We have 24 bits to identify the RISC OS service routine we want to call (although in C and BBC BASIC we have SWI names, such as OS_WriteS, to identify OS routines easily).
  • To pass values (arguments) to the System Service we can use the processor registers from R0 to R9 (R0 and R9 included); if we use BBC BASIC, only R0 to R7 (both included).
  • When a SWI instruction is executed, the processor enters Supervisor Mode, the CPSR (Current Program Status Register) is saved into spsr_SVC (the Saved Program Status Register of Supervisor Mode), and the return address is stored in lr_SVC (the Supervisor Mode Link Register, r14_svc). While in Supervisor Mode the banked registers r13_svc and r14_svc take the place of R13 (the stack pointer) and R14. If you call a SWI while already in Supervisor Mode you must first save lr_SVC and spsr_SVC, so that the original values of the link register and the SPSR are not lost.

PowerPC Processors

On PowerPC Architecture we have this:

  • We use the sc (System Call) ASM instruction to call the kernel.
    • The sc instruction causes a system call interrupt. What follows is IBM’s official description of sc:
      • The effective address (EA) of the instruction following the sc instruction is placed into the Save Restore Register 0 (SRR0).
      • Bits 0, 5-9, and 16-31 of the Machine State Register (MSR) are placed into the corresponding bits of Save Restore Register 1 (SRR1).
      • Bits 1-4 and 10-15 of SRR1 are set to undefined values.
      • The sc instruction serves as both a basic and an extended mnemonic. In the extended form, the LEV field (a field located between bits 20 and 26 of the sc opcode) is omitted and assumed to be 0.
      • sc is specific to PowerPC; the POWER family uses the svc (SuperVisor Call) instruction. Both sc and svc have the same opcode.

Sparc Processors

On Sparc Architecture we have:

  • OpenSolaris supports 3 different Software Traps (this is the OpenSolaris name for the software interrupts used to call kernel routines):
    • 0x00 Used for system calls from binaries running in SunOS 4.x binary compatibility mode.
    • 0x08 Used for 32-bit (ILP32) binaries running on a 64-bit (LP64) kernel.
    • 0x40 Used for 64-bit (LP64) binaries running on a 64-bit (LP64) kernel.
  • The ASM instruction in this case is TA (Trap Always).

Real world Assembler code examples to explain concepts

What follows is an example (in Assembler Language) of how to use System Calls for each architecture previously described.

As you can see from the code, each assembler program just loads the string “Hello, world!” into the appropriate registers (because it is a string, we load only a pointer to it!) and then calls the right OS Service routine, which displays the string on the screen. The program is launched in user address space, and the syscall process starts when it performs a kernel call.

#
# GENERIC Intel x86 (ia32)
#
# This file assembles to form hello world program run
# from the command line with no arguments.
#
.data
msg:
.string "Hello, world!\n"
len = . - msg              # length of the string
.text
.global _start             # we must export the entry point to the
                           # ELF linker or loader. They conventionally
                           # recognise _start as their entry point.
                           # Use ld -e foo to override the default.
_start:                    # write the string to stdout
movl $len,%edx             # third argument: message length
movl $msg,%ecx             # second argument: pointer to
                           # message to write
movl $1,%ebx               # first argument: file handle (stdout)
movl $4,%eax               # system call number (sys_write)
int $0x80                  # <=== call kernel
                           # and exit
movl $0,%ebx               # first argument: exit code
movl $1,%eax               # system call number (sys_exit)
int $0x80                  # <=== call kernel

;
; ARM s.HelloW
;
; This file assembles to form a hello world program run
; from the command line with no arguments.
;
        AREA |main|, CODE, READONLY
                                ; Use the GET directive to
                                ; include a list of SWI
                                ; names as if typed here
        GET ^.AsmHdrs.h.SWINames
                                ; Now for the actual program code
        ENTRY                   ; ObjAsm entry point directive
        SWI OS_GetEnv           ; <=== call kernel
        MOV r13, r1             ; set up stack pointer
        SWI OS_WriteS           ; <=== call kernel
        = "Hello, World!",13,10,0
        ALIGN
        SWI OS_Exit             ; <=== call kernel
        END

#
# PowerPC (PPC32)
#
# This file assembles to form hello world program run
# from the command line with no arguments.
#
.data
msg:
.string "Hello, world!\n"
len = . - msg                   # length of the string
.text
.global _start
_start:                         # write the string to stdout
li 0,4                          # syscall number (sys_write)
li 3,1                          # first argument: file descriptor
                                # (stdout)
                                # second argument: pointer to
                                # message to write
lis 4,msg@ha                    # load top 16 bits of &msg
addi 4,4,msg@l                  # load bottom 16 bits
li 5,len                        # third argument: message length
sc                              # <=== call kernel
                                # and exit
li 0,1                          # syscall number (sys_exit)
li 3,1                          # first argument: exit code
sc                              # <=== call kernel

!
! Sparc Assembler
! (because I don't have a Sparc on which to try
! this code I used the excellent Bruce Ediger's code!)
!
! Bruce Ediger
! This code works on NetBSD for Sparc
! Solaris for Sparc
! SunOS 4.x
.text
.align 4
.global start
start:
mov 1,%o0                           ! file descriptor 1 (stdout)
set string,%o1                      ! address of the string
mov 14,%o2                          ! number of bytes in string
mov 4,%g1                           ! write(2) system call - 
                                    ! write(1, string, 14);
ta 0                                ! <=== call the kernel
mov 0,%o0                           ! exit code 0
mov 1,%g1                           ! _exit(2) system call -
                                    ! exit(0);
ta 0                                ! <=== call the kernel
.align 4
string:
.ascii "Hello, World!\n"

Links:
1) A list of Windows system calls (NT/2000/XP/2003/Vista/2008/7/8): http://j00ru.vexillium.org/ntapi/

2) RISC OS Module programming from official RISC OS PRM Manual, full chapter, lots of details: http://www.riscos.com/support/developers/prm/modules.html

3) A cool interactive Linux syscall table: http://syscalls.kernelgrok.com/

4) FreeBSD Syscall calling conventions and extra info: http://www.freebsd.org/doc/en/books/developers-handbook/x86-system-calls.html

