Syscalls are the way a software application program interacts with the underlying Operating System: everything from displaying a character on the screen to reading a file from the hard drive generates syscalls. In this post we’ll analyse syscalls in detail, with examples on different CPU architectures and Operating Systems.
As defined on Wikipedia (System Calls), a system call is a software mechanism used by an application program to “request” services from an Operating System (OS).
An Operating System Service is a “function” (or a set of functions) offered by the OS to User Application Programs, such as:
- Create a new Process (in the UNIX/Linux world this is usually the fork() function, in the Windows world NtCreateProcess(), etc.)
- Create/Open a file
- “Talk” to another process
- Write characters on the screen or change the state of pixels on the screen
Operating Systems offer many such functions and, as OS generations improve, the nature of those functions changes too. For example, each new release of Windows tends to incorporate more graphics routines into the kernel, while Linux now offers NAPI (New API), a modification to the standard device driver packet processing framework designed to improve the performance of high-speed networking.
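As a rough illustration of these services, here is a minimal Python sketch, assuming a POSIX system (Linux, BSD, macOS). Python’s os module exposes thin wrappers around the corresponding library functions, so each call below ends up issuing one or more syscalls. The function name use_os_services and the messages are made up for this example.

```python
import os
import tempfile


def use_os_services() -> dict:
    """Exercise a few OS services (POSIX only): files, processes, pipes."""
    results = {}

    # Create/open a file, write to it, then read the data back.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"hi")
    os.close(fd)
    fd = os.open(path, os.O_RDONLY)
    results["file"] = os.read(fd, 2)
    os.close(fd)
    os.unlink(path)

    # Create a new process and "talk" to it through a pipe.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: send one message and leave immediately.
        os.close(read_fd)
        os.write(write_fd, b"hello from child")
        os._exit(0)
    # Parent process: read the child's message and collect its exit code.
    os.close(write_fd)
    results["ipc"] = os.read(read_fd, 64)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    results["child_exit"] = os.WEXITSTATUS(status)
    return results
```

Running the script under a tracer such as strace on Linux should show the individual syscalls behind each of these calls.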
How do System Calls work?
Because Operating System Kernels normally work in a separate address space that Application Programs cannot access directly, when an Application Program calls a System Service through a system call the system needs to switch the CPU context from User Mode (the CPU context in which Application Programs are executed, with fewer privileges) to Kernel Mode (also known as Supervisor Mode, the CPU context in which the Kernel is executed; it has full privileges and so can access the kernel address space) in order to execute the system service.
Before this switch happens, the library version of the system call (the one actually called by the Application Program, usually reached through a Dynamic Link Library or a static library linked at compilation time) places the arguments in the relevant registers and the system call number in special processor “transfer registers”. It then executes a special processor instruction (also called a trap) that causes the processor to switch mode, and the OS kernel starts to execute the desired service.
To better understand this sequence of actions, let’s have a look at a very simple read() LIBRARY function. We look at the disassembled code to see what the library version of the syscall does: it is related to the syscall, but it is different code, executed in user mode!
    .global _read
    _read:
        movl $0x6, %eax
        int  $0x80
        ret
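The library function does very little: it moves the value 6 into the CPU register %eax and executes the x86 trap instruction int $0x80. The kernel uses the value in %eax to vector (determine) which system call is being invoked. The number here is only illustrative, since system call numbers are OS-specific (on 32-bit x86 Linux, for example, read is system call 3).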
When the int instruction is executed the hardware takes over (in the sense that the CPU performs a set of automatic actions). On Intel x86 this usually means moving from a CPL (Current Privilege Level) of 3 (user mode) to CPL 0 (kernel mode).
Then the hardware transfers control to the trap vectors of the system. For the hardware to know what code to run when a particular trap occurs, the OS must tell the hardware where that code is located; this usually happens when the operating system boots.
At this point the trap handler routine of the kernel is executed (each operating system may have a different stack of function calls to set everything up before executing the requested system service), and the trap handler routine calls the correct system service routine.
Once the system service routine has been executed, it will generally return an integer value to the trap handler routine (stored back into the transfer registers) and, where applicable, result values that the kernel places into the relevant CPU registers.
At this point control is generally passed back to the trap handler, which may consist of a single function or of a stack of functions (depending on the OS) that restore the CPU state and set the CPL back to 3 (user mode). Because the trap handler restores the CPU state to what it was when the trap was raised by the int instruction, execution continues in the library syscall function at the instruction after the int instruction.
Then, generally, the library syscall function retrieves the syscall result values from the relevant registers and passes them to the Application Program.
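The whole round trip described above can be sketched as a toy model in Python. This is only a simulation under stated assumptions: the Cpu class, the SYSCALL_TABLE and the fake sys_read service are hypothetical names invented here, the number 6 simply mirrors the illustrative stub above, and a real kernel does all of this in hardware and low-level code, not in Python.

```python
from dataclasses import dataclass, field

USER_MODE = 3     # CPL 3, as in the text
KERNEL_MODE = 0   # CPL 0, as in the text
SYS_READ = 6      # illustrative number, same as in the stub above

FAKE_DEVICE_DATA = b"hello from the kernel"  # stand-in for a real device


@dataclass
class Cpu:
    """Hypothetical CPU model: a privilege level and a few registers."""
    cpl: int = USER_MODE
    regs: dict = field(default_factory=lambda: dict(eax=0, ebx=0, ecx=0, edx=0))


# ---- kernel side ----
def sys_read(cpu, fd, buf, count):
    """Fake system service: copy up to count bytes into buf."""
    assert cpu.cpl == KERNEL_MODE, "services only run in kernel mode"
    data = FAKE_DEVICE_DATA[:count]
    buf[:len(data)] = data
    return len(data)


SYSCALL_TABLE = {SYS_READ: sys_read}


def trap_handler(cpu):
    """Use the number in eax to pick the service, store the result in eax."""
    service = SYSCALL_TABLE.get(cpu.regs["eax"])
    if service is None:
        cpu.regs["eax"] = -1  # unknown system call
    else:
        cpu.regs["eax"] = service(
            cpu, cpu.regs["ebx"], cpu.regs["ecx"], cpu.regs["edx"])


def int_0x80(cpu):
    """Model of the trap: switch to CPL 0, run the handler, switch back."""
    saved_cpl = cpu.cpl
    cpu.cpl = KERNEL_MODE
    try:
        trap_handler(cpu)
    finally:
        cpu.cpl = saved_cpl


# ---- user side ----
def read(cpu, fd, buf, count):
    """Library wrapper: load the registers, trap, hand back the result.

    In this model ecx holds the buffer object itself instead of a pointer.
    """
    cpu.regs.update(eax=SYS_READ, ebx=fd, ecx=buf, edx=count)
    int_0x80(cpu)
    return cpu.regs["eax"]
```

For example, with cpu = Cpu() and buf = bytearray(8), calling read(cpu, 0, buf, 5) fills the first five bytes of buf and leaves the CPU back at CPL 3.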
During this sequence of actions the Application Program “freezes temporarily” (the correct term is Suspend) until the kernel is done executing the system call. We will use the term Process from now on to identify the Application Program.
How long this suspension lasts depends on the nature of the syscall and on the architecture of the Kernel and of the Process. Some examples:
- Single-threaded Kernels (like MS-DOS, RISC OS, etc.) always suspend the Process (and all the other Processes in concurrent execution) until the syscall is completed.
- Multi-threaded Kernels on a single-CPU system (like Windows NT, XP, etc. running on an old Pentium CPU) suspend the calling process until the syscall is completed, and all the other processes in concurrent execution until the kernel thread gets re-scheduled.
- Multi-threaded Kernels on a multiple-CPU system (like modern Linux, XP or Windows 7 running on modern CPUs such as multicore ARMs or modern Intel CPUs) suspend the calling process until the kernel thread gets re-scheduled; whether other processes are suspended depends on the number of CPUs available to run all the concurrent threads.
- Multi-threaded Processes on multi-threaded kernels have the calling thread suspended (and possibly any other threads that depend on the calling thread) until the syscall is completed.
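The last case can be observed with a small Python sketch. It assumes nothing beyond the standard library; the function name blocking_read_demo and the event strings are invented for this example. One thread blocks in a read() on an empty pipe while the main thread keeps running, and the reader only resumes once data has been written.

```python
import os
import threading


def blocking_read_demo() -> list:
    """Show that only the thread inside a blocking syscall is suspended."""
    read_fd, write_fd = os.pipe()
    events = []
    about_to_read = threading.Event()

    def reader():
        events.append("reader: calling read()")
        about_to_read.set()
        # The pipe is empty, so this thread cannot get past read()
        # until the main thread writes something.
        data = os.read(read_fd, 16)
        events.append("reader: read() returned " + data.decode())

    thread = threading.Thread(target=reader)
    thread.start()
    about_to_read.wait()
    # The reader is stuck at read(), yet the main thread keeps working.
    events.append("main: still running")
    os.write(write_fd, b"ping")  # this lets the reader's read() complete
    thread.join()
    os.close(read_fd)
    os.close(write_fd)
    return events
```

The returned list records the order of events: the reader announces its read(), the main thread carries on, and only after the write does the reader report its result.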
When the syscall requires hardware intervention, for example when the Process tries to open a file on a disk, other elements enter the equation: the IRQs used to control the device and the device itself (how the device manipulates the data, how it accesses memory, its latency and so on).
It is very important to understand operating system architecture and the way certain devices work when developing applications that require LOW LATENCY and HIGH PERFORMANCE while using different devices to manipulate data.
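A first, very rough way to get a feeling for this cost is to time a cheap call that goes through the kernel. The sketch below is only an approximation under stated assumptions: the helper name average_call_ns is invented here, the measurement includes the overhead of the Python interpreter, and os.getppid is used as the sample call on the assumption that it is not answered from a user-space cache.

```python
import os
import time


def average_call_ns(fn, repeats: int = 100_000) -> float:
    """Return the average wall-clock time of one fn() call, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(repeats):
        fn()
    return (time.perf_counter_ns() - start) / repeats
```

Comparing average_call_ns(os.getppid) with average_call_ns(lambda: None) gives an idea of how much of the time is spent beyond plain function-call overhead; the actual numbers depend heavily on the CPU, the OS and the Python version.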
Intel and AMD Processors
On the Intel x86 32-bit architecture the old method of causing a trap was, usually, to raise the software interrupt 0x80 (the 0x prefix denotes hexadecimal notation).
Note: on Linux no stack is used to pass parameters to the kernel, so you should use (in this order) the ebx, ecx, edx, esi and edi registers. Other systems differ: FreeBSD, for instance, passes the parameters on the stack by default (see the FreeBSD link at the end of this post).
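The register convention in the note can be written down as a tiny Python model. It is just a sketch: marshal_syscall is a hypothetical helper invented here, and it only builds a dictionary describing which register would hold which value; it does not touch any real register.

```python
# Argument registers in the order given in the note above.
LINUX_I386_ARG_REGS = ("ebx", "ecx", "edx", "esi", "edi")


def marshal_syscall(number: int, *args: int) -> dict:
    """Return the register layout for an int 0x80 system call."""
    if len(args) > len(LINUX_I386_ARG_REGS):
        raise ValueError("too many arguments for this convention")
    regs = {"eax": number}  # eax carries the system call number
    regs.update(zip(LINUX_I386_ARG_REGS, args))
    return regs
```

For example, marshal_syscall(4, 1, 0x1000, 14) describes a sys_write of 14 bytes located at the (made-up) address 0x1000 to file handle 1: eax=4, ebx=1, ecx=0x1000, edx=14.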
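On ARM Architecture (with RISC OS) there are some variants of this mechanism: the trap instruction is SWI (SoftWare Interrupt) and the requested service is named by the SWI itself (OS_WriteS, OS_Exit and so on), as the ARM example below shows.
On PowerPC Architecture the trap instruction is sc: the system call number goes in register r0 and the arguments in r3, r4, r5 and so on, as the PowerPC example below shows.
On Sparc Architecture the trap instruction is ta (trap always): the system call number goes in %g1 and the arguments in %o0, %o1, %o2 and so on, as the Sparc example below shows.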
Real-world Assembler code examples to explain the concepts
What follows is an example (in Assembler Language) of how to use System Calls on each of the architectures described above.
As you can see from the code, each assembler program just loads the string “Hello, world!” into the appropriate registers (because it is a string, we load only the pointer to it!) and then calls the right OS Service routine to display the string on the screen. The program is launched in user address space, and the syscall process starts when it performs a kernel call.
    #
    # GENERIC Intel x86 (ia32)
    #
    # This file assembles to form a hello world program run
    # from the command line with no arguments.
    #
        .data
    msg:
        .ascii  "Hello, world!\n"   # .ascii adds no trailing NUL byte
        len = . - msg               # length of the string

        .text
                                    # we must export the entry point
                                    # to the ELF linker or
        .global _start              # loader. They conventionally
                                    # recognise _start as their
                                    # entry point. Use ld -e foo to
                                    # override the default.
    _start:
    # write the string to stdout
        movl    $len,%edx           # third argument: message length
        movl    $msg,%ecx           # second argument: pointer to
                                    # message to write
        movl    $1,%ebx             # first argument: file handle (stdout)
        movl    $4,%eax             # system call number (sys_write)
        int     $0x80               # <=== call kernel

    # and exit
        movl    $0,%ebx             # first argument: exit code
        movl    $1,%eax             # system call number (sys_exit)
        int     $0x80               # <=== call kernel
    ;
    ; ARM s.HelloW
    ;
    ; This file assembles to form a hello world program run
    ; from the command line with no arguments.
    ;
            AREA    |main|, CODE, READONLY

            ; Use the GET directive to
            ; include a list of SWI
            ; names as if typed here
            GET     ^.AsmHdrs.h.SWINames

            ; Now for the actual program code
            ENTRY                   ; ObjAsm entry point directive
            SWI     OS_GetEnv       ; <=== call kernel
            MOV     r13, r1         ; set up stack pointer
            SWI     OS_WriteS       ; <=== call kernel
            =       "Hello, World!",13,10,0
            ALIGN
            SWI     OS_Exit         ; <=== call kernel
            END
    #
    # PowerPC (PPC32)
    #
    # This file assembles to form a hello world program run
    # from the command line with no arguments.
    #
        .data
    msg:
        .ascii  "Hello, world!\n"   # .ascii adds no trailing NUL byte
        len = . - msg               # length of the string

        .text
        .global _start
    _start:
    # write the string to stdout
        li      0,4                 # syscall number (sys_write)
        li      3,1                 # first argument: file descriptor
                                    # (stdout)
                                    # second argument: pointer to
                                    # message to write
        lis     4,msg@ha            # load top 16 bits of &msg
        addi    4,4,msg@l           # load bottom 16 bits
        li      5,len               # third argument: message length
        sc                          # <=== call kernel

    # and exit
        li      0,1                 # syscall number (sys_exit)
        li      3,1                 # first argument: exit code
        sc                          # <=== call kernel
    !
    ! Sparc Assembler
    ! (because I don't have a Sparc on which to try
    ! this code I used Bruce Ediger's excellent code!)
    !
    ! Bruce Ediger
    ! This code works on NetBSD for Sparc
    !                    Solaris for Sparc
    !                    SunOS 4.x
        .text
        .align 4
        .global start
    start:
        mov     1,%o0               ! stdout (file descriptor 1)
        set     string,%o1          ! address of the string
        mov     14,%o2              ! number of bytes in string
        mov     4,%g1               ! write(2) system call -
                                    ! write(1, string, 14);
        ta      0                   ! <=== call the kernel

        mov     0,%o0               ! exit code 0
        mov     1,%g1               ! _exit(2) system call -
                                    ! exit(0);
        ta      0                   ! <=== call the kernel

        .align 4
    string:
        .ascii  "Hello, World!\n"
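For comparison, the same write can be expressed from a high-level language. The Python sketch below is not part of the original examples: the names MESSAGE, STDOUT_FD and hello are invented here, and os.write is used on the assumption that it maps more or less directly onto the write service that the assembler programs call by hand.

```python
import os

MESSAGE = b"Hello, world!\n"
STDOUT_FD = 1  # file handle of stdout, as in the x86 and PowerPC examples


def hello(fd: int = STDOUT_FD) -> int:
    """Write the message to fd and return the number of bytes written."""
    # The three arguments of the write call are still there: the file
    # handle, the message, and (implicitly) its length.
    return os.write(fd, MESSAGE)
```

Calling hello() followed by sys.exit(0) mirrors the two kernel calls of the assembler versions; the library and the interpreter take care of loading the registers and executing the trap instruction.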
1) A list of Windows System calls (NT/2000/XP/2003/Vista/2008/7/8): http://j00ru.vexillium.org/ntapi/
2) RISC OS Module programming, the full chapter from the official RISC OS PRM Manual, with lots of details: http://www.riscos.com/support/developers/prm/modules.html
3) A cool interactive table of Linux Syscalls: http://syscalls.kernelgrok.com/
4) FreeBSD Syscall calling conventions and extra info: http://www.freebsd.org/doc/en/books/developers-handbook/x86-system-calls.html