diff --git a/man/io_uring.7 b/man/io_uring.7 new file mode 100644 index 0000000000000000000000000000000000000000..33621bffab46a6a062ca2824e4980cd331ab4fe4 --- /dev/null +++ b/man/io_uring.7 @@ -0,0 +1,781 @@ +.\" Copyright (C) 2020 Shuveb Hussain +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" + +.TH io_uring 7 2020-07-26 "Linux" "Linux Programmer's Manual" +.SH NAME +io_uring \- Asynchronous I/O facility +.SH SYNOPSIS +.nf +.B "#include " +.fi +.PP +.SH DESCRIPTION +.PP +.B io_uring +is a Linux-specific API for asynchronous I/O. +It allows the user to submit one or more I/O requests, +which are processed asynchronously without blocking the calling process. +.B io_uring +gets its name from ring buffers which are shared between user space and +kernel space. This arrangement allows for efficient I/O, +while avoiding the overhead of copying buffers between them, +where possible. +This interface makes +.B io_uring +different from other UNIX I/O APIs, +wherein, +rather than just communicate between kernel and user space with system calls, +ring buffers are used as the main mode of communication. +This arrangement has various performance benefits which are discussed in a +separate section below. +This man page uses the terms shared buffers, shared ring buffers and +queues interchangeably. +.PP +The general programming model you need to follow for +.B io_uring +is outlined below +.IP \(bu +Set up shared buffers with +.BR io_uring_setup (2) +and +.BR mmap (2), +mapping into user space shared buffers for the submission queue (SQ) and the +completion queue (CQ). +You place I/O requests you want to make on the SQ, +while the kernel places the results of those operations on the CQ. +.IP \(bu +For every I/O request you need to make (like to read a file, write a file, +accept a socket connection, etc), you create a submission queue entry, +or SQE, +describe the I/O operation you need to get done and add it to the tail of +the submission queue (SQ). +Each I/O operation is, +in essence, +the equivalent of a system call you would have made otherwise, +if you were not using +.BR io_uring . +You can add more than one SQE to the queue depending on the number of +operations you want to request. +.IP \(bu +After you add one or more SQEs, +you need to call +.BR io_uring_enter (2) +to tell the kernel to dequeue your I/O requests off the SQ and begin +processing them. +.IP \(bu +For each SQE you submit, +once it is done processing the request, +the kernel places a completion queue event or CQE at the tail of the +completion queue or CQ. +The kernel places exactly one matching CQE in the CQ for every SQE you +submit on the SQ. +After you retrieve a CQE, +minimally, +you might be interested in checking the +.I res +field of the CQE structure, +which corresponds to the return value of the system +call's equivalent, +had you used it directly without using +.BR io_uring . +For instance, +a read operation under +.BR io_uring , +started with the +.BR IORING_OP_READ +operation, issues the equivalent of the +.BR read (2) +system call. In practice, it mixes the semantics of +.BR pread (2) +and +.BR preadv2 (2) +in that it takes an explicit offset, and supports using -1 for the offset to +indicate that the current file position should be used instead of passing in +an explicit offset. See the opcode documentation for more details. Given that +io_uring is an async interface, +.I errno +is never used for passing back error information. 
Instead, +.I res +will contain what the equivalent system call would have returned in case +of success, and in case of error +.I res +will contain +.I -errno . +For example, if the normal read system call would have returned -1 and set +.I errno +to +.B EINVAL , +then +.I res +would contain +.B -EINVAL . +If the normal system call would have returned a read size of 1024, then +.I res +would contain 1024. +.IP \(bu +Optionally, +.BR io_uring_enter (2) +can also wait for a specified number of requests to be processed by the kernel +before it returns. +If you specified a certain number of completions to wait for, +the kernel would have placed at least those many number of CQEs on the CQ, +which you can then readily read, +right after the return from +.BR io_uring_enter (2). +.IP \(bu +It is important to remember that I/O requests submitted to the kernel can +complete in any order. +It is not necessary for the kernel to process one request after another, +in the order you placed them. +Given that the interface is a ring, +the requests are attempted in order, +however that doesn't imply any sort of ordering on their completion. +When more than one request is in flight, +it is not possible to determine which one will complete first. +When you dequeue CQEs off the CQ, +you should always check which submitted request it corresponds to. +The most common method for doing so is utilizing the +.I user_data +field in the request, which is passed back on the completion side. +.PP +Adding to and reading from the queues: +.IP \(bu +You add SQEs to the tail of the SQ. +The kernel reads SQEs off the head of the queue. +.IP \(bu +The kernel adds CQEs to the tail of the CQ. +You read CQEs off the head of the queue. +.SS Submission queue polling +One of the goals of +.B io_uring +is to provide a means for efficient I/O. +To this end, +.B io_uring +supports a polling mode that lets you avoid the call to +.BR io_uring_enter (2), +which you use to inform the kernel that you have queued SQEs on to the SQ. +With SQ Polling, +.B io_uring +starts a kernel thread that polls the submission queue for any I/O +requests you submit by adding SQEs. +With SQ Polling enabled, +there is no need for you to call +.BR io_uring_enter (2), +letting you avoid the overhead of system calls. +A designated kernel thread dequeues SQEs off the SQ as you add them and +dispatches them for asynchronous processing. +.SS Setting up io_uring +.PP +The main steps in setting up +.B io_uring +consist of mapping in the shared buffers with +.BR mmap (2) +calls. +In the example program included in this man page, +the function +.BR app_setup_uring () +sets up +.B io_uring +with a QUEUE_DEPTH deep submission queue. +Pay attention to the 2 +.BR mmap (2) +calls that set up the shared submission and completion queues. +If your kernel is older than version 5.4, +three +.BR mmap(2) +calls are required. +.PP +.SS Submitting I/O requests +The process of submitting a request consists of describing the I/O +operation you need to get done using an +.B io_uring_sqe +structure instance. +These details describe the equivalent system call and its parameters. +Because the range of I/O operations Linux supports are very varied and the +.B io_uring_sqe +structure needs to be able to describe them, +it has several fields, +some packed into unions for space efficiency. 
+Here is a simplified version of struct +.B io_uring_sqe +with some of the most often used fields: +.PP +.in +4n +.EX +struct io_uring_sqe { + __u8 opcode; /* type of operation for this sqe */ + __s32 fd; /* file descriptor to do IO on */ + __u64 off; /* offset into file */ + __u64 addr; /* pointer to buffer or iovecs */ + __u32 len; /* buffer size or number of iovecs */ + __u64 user_data; /* data to be passed back at completion time */ + __u8 flags; /* IOSQE_ flags */ + ... +}; +.EE +.in + +Here is struct +.B io_uring_sqe +in full: + +.in +4n +.EX +struct io_uring_sqe { + __u8 opcode; /* type of operation for this sqe */ + __u8 flags; /* IOSQE_ flags */ + __u16 ioprio; /* ioprio for the request */ + __s32 fd; /* file descriptor to do IO on */ + union { + __u64 off; /* offset into file */ + __u64 addr2; + }; + union { + __u64 addr; /* pointer to buffer or iovecs */ + __u64 splice_off_in; + }; + __u32 len; /* buffer size or number of iovecs */ + union { + __kernel_rwf_t rw_flags; + __u32 fsync_flags; + __u16 poll_events; /* compatibility */ + __u32 poll32_events; /* word-reversed for BE */ + __u32 sync_range_flags; + __u32 msg_flags; + __u32 timeout_flags; + __u32 accept_flags; + __u32 cancel_flags; + __u32 open_flags; + __u32 statx_flags; + __u32 fadvise_advice; + __u32 splice_flags; + }; + __u64 user_data; /* data to be passed back at completion time */ + union { + struct { + /* pack this to avoid bogus arm OABI complaints */ + union { + /* index into fixed buffers, if used */ + __u16 buf_index; + /* for grouped buffer selection */ + __u16 buf_group; + } __attribute__((packed)); + /* personality to use, if used */ + __u16 personality; + __s32 splice_fd_in; + }; + __u64 __pad2[3]; + }; +}; +.EE +.in +.PP +To submit an I/O request to +.BR io_uring , +you need to acquire a submission queue entry (SQE) from the submission +queue (SQ), +fill it up with details of the operation you want to submit and call +.BR io_uring_enter (2). +There are helper functions of the form io_uring_prep_X to enable proper +setup of the SQE. If you want to avoid calling +.BR io_uring_enter (2), +you have the option of setting up Submission Queue Polling. +.PP +SQEs are added to the tail of the submission queue. +The kernel picks up SQEs off the head of the SQ. +The general algorithm to get the next available SQE and update the tail is +as follows. +.PP +.in +4n +.EX +struct io_uring_sqe *sqe; +unsigned tail, index; +tail = *sqring->tail; +index = tail & (*sqring->ring_mask); +sqe = &sqring->sqes[index]; +/* fill up details about this I/O request */ +describe_io(sqe); +/* fill the sqe index into the SQ ring array */ +sqring->array[index] = index; +tail++; +atomic_store_release(sqring->tail, tail); +.EE +.in +.PP +To get the index of an entry, +the application must mask the current tail index with the size mask of the +ring. +This holds true for both SQs and CQs. +Once the SQE is acquired, +the necessary fields are filled in, +describing the request. +While the CQ ring directly indexes the shared array of CQEs, +the submission side has an indirection array between them. +The submission side ring buffer is an index into this array, +which in turn contains the index into the SQEs. +.PP +The following code snippet demonstrates how a read operation, +an equivalent of a +.BR preadv2 (2) +system call is described by filling up an SQE with the necessary +parameters. +.PP +.in +4n +.EX +struct iovec iovecs[16]; + ... 
+sqe->opcode = IORING_OP_READV; +sqe->fd = fd; +sqe->addr = (unsigned long) iovecs; +sqe->len = 16; +sqe->off = offset; +sqe->flags = 0; +.EE +.in +.TP +.B Memory ordering +Modern compilers and CPUs freely reorder reads and writes without +affecting the program's outcome to optimize performance. +Some aspects of this need to be kept in mind on SMP systems since +.B io_uring +involves buffers shared between kernel and user space. +These buffers are both visible and modifiable from kernel and user space. +As heads and tails belonging to these shared buffers are updated by kernel +and user space, +changes need to be coherently visible on either side, +irrespective of whether a CPU switch took place after the kernel-user mode +switch happened. +We use memory barriers to enforce this coherency. +Being significantly large subjects on their own, +memory barriers are out of scope for further discussion on this man page. +.TP +.B Letting the kernel know about I/O submissions +Once you place one or more SQEs on to the SQ, +you need to let the kernel know that you've done so. +You can do this by calling the +.BR io_uring_enter (2) +system call. +This system call is also capable of waiting for a specified count of +events to complete. +This way, +you can be sure to find completion events in the completion queue without +having to poll it for events later. +.SS Reading completion events +Similar to the submission queue (SQ), +the completion queue (CQ) is a shared buffer between the kernel and user +space. +Whereas you placed submission queue entries on the tail of the SQ and the +kernel read off the head, +when it comes to the CQ, +the kernel places completion queue events or CQEs on the tail of the CQ and +you read off its head. +.PP +Submission is flexible (and thus a bit more complicated) since it needs to +be able to encode different types of system calls that take various +parameters. +Completion, +on the other hand is simpler since we're looking only for a return value +back from the kernel. +This is easily understood by looking at the completion queue event +structure, +struct +.BR io_uring_cqe : +.PP +.in +4n +.EX +struct io_uring_cqe { + __u64 user_data; /* sqe->data submission passed back */ + __s32 res; /* result code for this event */ + __u32 flags; +}; +.EE +.in +.PP +Here, +.I user_data +is custom data that is passed unchanged from submission to completion. +That is, +from SQEs to CQEs. +This field can be used to set context, +uniquely identifying submissions that got completed. +Given that I/O requests can complete in any order, +this field can be used to correlate a submission with a completion. +.I res +is the result from the system call that was performed as part of the +submission; +its return value. + +The +.I flags +field carries request-specific information. As of the 6.0 kernel, the following +flags are defined: + +.TP +.B IORING_CQE_F_BUFFER +If set, the upper 16 bits of the flags field carries the buffer ID that was +chosen for this request. The request must have been issued with +.B IOSQE_BUFFER_SELECT +set, and used with a request type that supports buffer selection. Additionally, +buffers must have been provided upfront either via the +.B IORING_OP_PROVIDE_BUFFERS +or the +.B IORING_REGISTER_PBUF_RING +methods. +.TP +.B IORING_CQE_F_MORE +If set, the application should expect more completions from the request. This +is used for requests that can generate multiple completions, such as multi-shot +requests, receive, or accept. 
+.TP +.B IORING_CQE_F_SOCK_NONEMPTY +If set, upon receiving the data from the socket in the current request, the +socket still had data left on completion of this request. +.TP +.B IORING_CQE_F_NOTIF +Set for notification CQEs, as seen with the zero-copy networking send and +receive support. +.PP +The general sequence to read completion events off the completion queue is +as follows: +.PP +.in +4n +.EX +unsigned head; +head = *cqring->head; +if (head != atomic_load_acquire(cqring->tail)) { + struct io_uring_cqe *cqe; + unsigned index; + index = head & (cqring->mask); + cqe = &cqring->cqes[index]; + /* process completed CQE */ + process_cqe(cqe); + /* CQE consumption complete */ + head++; +} +atomic_store_release(cqring->head, head); +.EE +.in +.PP +It helps to be reminded that the kernel adds CQEs to the tail of the CQ, +while you need to dequeue them off the head. +To get the index of an entry at the head, +the application must mask the current head index with the size mask of the +ring. +Once the CQE has been consumed or processed, +the head needs to be updated to reflect the consumption of the CQE. +Attention should be paid to the read and write barriers to ensure +successful read and update of the head. +.SS io_uring performance +Because of the shared ring buffers between kernel and user space, +.B io_uring +can be a zero-copy system. +Copying buffers to and from becomes necessary when system calls that +transfer data between kernel and user space are involved. +But since the bulk of the communication in +.B io_uring +is via buffers shared between the kernel and user space, +this huge performance overhead is completely avoided. +.PP +While system calls may not seem like a significant overhead, +in high performance applications, +making a lot of them will begin to matter. +While workarounds the operating system has in place to deal with Spectre +and Meltdown are ideally best done away with, +unfortunately, +some of these workarounds are around the system call interface, +making system calls not as cheap as before on affected hardware. +While newer hardware should not need these workarounds, +hardware with these vulnerabilities can be expected to be in the wild for a +long time. +While using synchronous programming interfaces or even when using +asynchronous programming interfaces under Linux, +there is at least one system call involved in the submission of each +request. +In +.BR io_uring , +on the other hand, +you can batch several requests in one go, +simply by queueing up multiple SQEs, +each describing an I/O operation you want and make a single call to +.BR io_uring_enter (2). +This is possible due to +.BR io_uring 's +shared buffers based design. +.PP +While this batching in itself can avoid the overhead associated with +potentially multiple and frequent system calls, +you can reduce even this overhead further with Submission Queue Polling, +by having the kernel poll and pick up your SQEs for processing as you add +them to the submission queue. This avoids the +.BR io_uring_enter (2) +call you need to make to tell the kernel to pick SQEs up. +For high-performance applications, +this means even fewer system call overheads. +.SH CONFORMING TO +.B io_uring +is Linux-specific. +.SH EXAMPLES +The following example uses +.B io_uring +to copy stdin to stdout. +Using shell redirection, +you should be able to copy files with this example. +Because it uses a queue depth of only one, +this example processes I/O requests one after the other. 
+It is purposefully kept this way to aid understanding.
+In real-world scenarios, however,
+you'll want to have a larger queue depth to parallelize I/O request
+processing so as to gain the kind of performance benefits
+.B io_uring
+provides with its asynchronous processing of requests.
+.PP
+.EX
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <sys/uio.h>
+#include <linux/fs.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdatomic.h>
+
+#include <linux/io_uring.h>
+
+#define QUEUE_DEPTH 1
+#define BLOCK_SZ 1024
+
+/* Macros for barriers needed by io_uring */
+#define io_uring_smp_store_release(p, v)                        \\
+    atomic_store_explicit((_Atomic typeof(*(p)) *)(p), (v),     \\
+                          memory_order_release)
+#define io_uring_smp_load_acquire(p)                            \\
+    atomic_load_explicit((_Atomic typeof(*(p)) *)(p),           \\
+                         memory_order_acquire)
+
+int ring_fd;
+unsigned *sring_tail, *sring_mask, *sring_array,
+         *cring_head, *cring_tail, *cring_mask;
+struct io_uring_sqe *sqes;
+struct io_uring_cqe *cqes;
+char buff[BLOCK_SZ];
+off_t offset;
+
+/*
+ * System call wrappers provided since glibc does not yet
+ * provide wrappers for io_uring system calls.
+* */
+
+int io_uring_setup(unsigned entries, struct io_uring_params *p)
+{
+    return (int) syscall(__NR_io_uring_setup, entries, p);
+}
+
+int io_uring_enter(int ring_fd, unsigned int to_submit,
+                   unsigned int min_complete, unsigned int flags)
+{
+    return (int) syscall(__NR_io_uring_enter, ring_fd, to_submit,
+                         min_complete, flags, NULL, 0);
+}
+
+int app_setup_uring(void) {
+    struct io_uring_params p;
+    void *sq_ptr, *cq_ptr;
+
+    /* See io_uring_setup(2) for io_uring_params.flags you can set */
+    memset(&p, 0, sizeof(p));
+    ring_fd = io_uring_setup(QUEUE_DEPTH, &p);
+    if (ring_fd < 0) {
+        perror("io_uring_setup");
+        return 1;
+    }
+
+    /*
+     * io_uring communication happens via 2 shared kernel-user space ring
+     * buffers, which can be jointly mapped with a single mmap() call in
+     * kernels >= 5.4.
+     */
+
+    int sring_sz = p.sq_off.array + p.sq_entries * sizeof(unsigned);
+    int cring_sz = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
+
+    /* Rather than check for kernel version, the recommended way is to
+     * check the features field of the io_uring_params structure, which is a
+     * bitmask. If IORING_FEAT_SINGLE_MMAP is set, we can do away with the
+     * second mmap() call to map in the completion ring separately.
+     */
+    if (p.features & IORING_FEAT_SINGLE_MMAP) {
+        if (cring_sz > sring_sz)
+            sring_sz = cring_sz;
+        cring_sz = sring_sz;
+    }
+
+    /* Map in the submission and completion queue ring buffers.
+     * Kernels < 5.4 only map in the submission queue, though.
+ */ + sq_ptr = mmap(0, sring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, + ring_fd, IORING_OFF_SQ_RING); + if (sq_ptr == MAP_FAILED) { + perror("mmap"); + return 1; + } + + if (p.features & IORING_FEAT_SINGLE_MMAP) { + cq_ptr = sq_ptr; + } else { + /* Map in the completion queue ring buffer in older kernels separately */ + cq_ptr = mmap(0, cring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, + ring_fd, IORING_OFF_CQ_RING); + if (cq_ptr == MAP_FAILED) { + perror("mmap"); + return 1; + } + } + /* Save useful fields for later easy reference */ + sring_tail = sq_ptr + p.sq_off.tail; + sring_mask = sq_ptr + p.sq_off.ring_mask; + sring_array = sq_ptr + p.sq_off.array; + + /* Map in the submission queue entries array */ + sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe), + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, + ring_fd, IORING_OFF_SQES); + if (sqes == MAP_FAILED) { + perror("mmap"); + return 1; + } + + /* Save useful fields for later easy reference */ + cring_head = cq_ptr + p.cq_off.head; + cring_tail = cq_ptr + p.cq_off.tail; + cring_mask = cq_ptr + p.cq_off.ring_mask; + cqes = cq_ptr + p.cq_off.cqes; + + return 0; +} + +/* +* Read from completion queue. +* In this function, we read completion events from the completion queue. +* We dequeue the CQE, update and head and return the result of the operation. +* */ + +int read_from_cq() { + struct io_uring_cqe *cqe; + unsigned head; + + /* Read barrier */ + head = io_uring_smp_load_acquire(cring_head); + /* + * Remember, this is a ring buffer. If head == tail, it means that the + * buffer is empty. + * */ + if (head == *cring_tail) + return -1; + + /* Get the entry */ + cqe = &cqes[head & (*cring_mask)]; + if (cqe->res < 0) + fprintf(stderr, "Error: %s\\n", strerror(abs(cqe->res))); + + head++; + + /* Write barrier so that update to the head are made visible */ + io_uring_smp_store_release(cring_head, head); + + return cqe->res; +} + +/* +* Submit a read or a write request to the submission queue. +* */ + +int submit_to_sq(int fd, int op) { + unsigned index, tail; + + /* Add our submission queue entry to the tail of the SQE ring buffer */ + tail = *sring_tail; + index = tail & *sring_mask; + struct io_uring_sqe *sqe = &sqes[index]; + /* Fill in the parameters required for the read or write operation */ + sqe->opcode = op; + sqe->fd = fd; + sqe->addr = (unsigned long) buff; + if (op == IORING_OP_READ) { + memset(buff, 0, sizeof(buff)); + sqe->len = BLOCK_SZ; + } + else { + sqe->len = strlen(buff); + } + sqe->off = offset; + + sring_array[index] = index; + tail++; + + /* Update the tail */ + io_uring_smp_store_release(sring_tail, tail); + + /* + * Tell the kernel we have submitted events with the io_uring_enter() + * system call. We also pass in the IOURING_ENTER_GETEVENTS flag which + * causes the io_uring_enter() call to wait until min_complete + * (the 3rd param) events complete. + * */ + int ret = io_uring_enter(ring_fd, 1,1, + IORING_ENTER_GETEVENTS); + if(ret < 0) { + perror("io_uring_enter"); + return -1; + } + + return ret; +} + +int main(int argc, char *argv[]) { + int res; + + /* Setup io_uring for use */ + if(app_setup_uring()) { + fprintf(stderr, "Unable to setup uring!\\n"); + return 1; + } + + /* + * A while loop that reads from stdin and writes to stdout. + * Breaks on EOF. 
+ */ + while (1) { + /* Initiate read from stdin and wait for it to complete */ + submit_to_sq(STDIN_FILENO, IORING_OP_READ); + /* Read completion queue entry */ + res = read_from_cq(); + if (res > 0) { + /* Read successful. Write to stdout. */ + submit_to_sq(STDOUT_FILENO, IORING_OP_WRITE); + read_from_cq(); + } else if (res == 0) { + /* reached EOF */ + break; + } + else if (res < 0) { + /* Error reading file */ + fprintf(stderr, "Error: %s\\n", strerror(abs(res))); + break; + } + offset += res; + } + + return 0; +} +.EE +.SH SEE ALSO +.BR io_uring_enter (2) +.BR io_uring_register (2) +.BR io_uring_setup (2) diff --git a/man/io_uring_buf_ring_add.3 b/man/io_uring_buf_ring_add.3 new file mode 100644 index 0000000000000000000000000000000000000000..9d8283baa2384ea78f7bee086f57b26ff99980a0 --- /dev/null +++ b/man/io_uring_buf_ring_add.3 @@ -0,0 +1,53 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_buf_ring_add 3 "May 18, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_buf_ring_add \- add buffers to a shared buffer ring +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_buf_ring_add(struct io_uring_buf_ring *" br ", +.BI " void *" addr ", +.BI " unsigned int " len ", +.BI " unsigned short " bid ", +.BI " int " mask ", +.BI " int " buf_offset ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_buf_ring_add (3) +adds a new buffer to the shared buffer ring +.IR br . +The buffer address is indicated by +.I addr +and is of +.I len +bytes of length. +.I bid +is the buffer ID, which will be returned in the CQE. +.I mask +is the size mask of the ring, available from +.BR io_uring_buf_ring_mask (3) . +.I buf_offset +is the offset to insert at from the current tail. If just one buffer is provided +before the ring tail is committed with +.BR io_uring_buf_ring_advance (3) +or +.BR io_uring_buf_ring_cq_advance (3), +then +.I buf_offset +should be 0. If buffers are provided in a loop before being committed, the +.I buf_offset +must be incremented by one for each buffer added. + +.SH RETURN VALUE +None +.SH SEE ALSO +.BR io_uring_register_buf_ring (3), +.BR io_uring_buf_ring_mask (3), +.BR io_uring_buf_ring_advance (3), +.BR io_uring_buf_ring_cq_advance (3) diff --git a/man/io_uring_buf_ring_advance.3 b/man/io_uring_buf_ring_advance.3 new file mode 100644 index 0000000000000000000000000000000000000000..f2dc90b5d0a3c921b6508f4ed4f6d2f1f1275548 --- /dev/null +++ b/man/io_uring_buf_ring_advance.3 @@ -0,0 +1,31 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_buf_ring_advance 3 "May 18, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_buf_ring_advance \- advance index of provided buffer in buffer ring +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_buf_ring_advance(struct io_uring_buf_ring *" br ", +.BI " int " count ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_buf_ring_advance (3) +commits +.I count +previously added buffers to the shared buffer ring +.IR br , +making them visible to the kernel and hence consumable. This passes ownership +of the buffer to the ring. 
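+.PP
+As a minimal usage sketch (the names
+.IR br ,
+.IR bufs ,
+.IR nr_bufs ,
+and
+.I buf_len
+are illustrative placeholders, not part of the API), buffers can be added in
+a loop and then committed in one go:
+.PP
+.in +4n
+.EX
+/* br was registered with nr_bufs entries; bufs[] holds
+ * nr_bufs application buffers of buf_len bytes each. */
+int mask = io_uring_buf_ring_mask(nr_bufs);
+
+for (int i = 0; i < nr_bufs; i++)
+    io_uring_buf_ring_add(br, bufs[i], buf_len, i, mask, i);
+
+/* make all nr_bufs buffers visible to the kernel at once */
+io_uring_buf_ring_advance(br, nr_bufs);
+.EE
+.in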
+
+.SH RETURN VALUE
+None
+.SH SEE ALSO
+.BR io_uring_register_buf_ring (3),
+.BR io_uring_buf_ring_add (3),
+.BR io_uring_buf_ring_cq_advance (3)
diff --git a/man/io_uring_buf_ring_cq_advance.3 b/man/io_uring_buf_ring_cq_advance.3
new file mode 100644
index 0000000000000000000000000000000000000000..4967a845d98b61074252b32bef3df991857ce3d8
--- /dev/null
+++ b/man/io_uring_buf_ring_cq_advance.3
@@ -0,0 +1,41 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_buf_ring_cq_advance 3 "May 18, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_buf_ring_cq_advance \- advance index of provided buffer and CQ ring
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_buf_ring_cq_advance(struct io_uring *" ring ",
+.BI "                                  struct io_uring_buf_ring *" br ",
+.BI "                                  int " count ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_buf_ring_cq_advance (3)
+function commits
+.I count
+previously added buffers to the shared buffer ring
+.IR br ,
+making them visible to the kernel and hence consumable. This passes ownership
+of the buffer to the ring. At the same time, it advances the CQ ring of
+.I ring
+by
+.I count
+amount. This effectively bundles an
+.BR io_uring_buf_ring_advance (3)
+call and an
+.BR io_uring_cq_advance (3)
+call into one operation. Since updating either ring index entails a store memory
+barrier, doing both at once is more efficient.
+
+.SH RETURN VALUE
+None
+.SH SEE ALSO
+.BR io_uring_register_buf_ring (3),
+.BR io_uring_buf_ring_add (3),
+.BR io_uring_buf_ring_advance (3)
diff --git a/man/io_uring_buf_ring_init.3 b/man/io_uring_buf_ring_init.3
new file mode 100644
index 0000000000000000000000000000000000000000..50cf69a7bbde9c26b1f9edf94dd69a0177711e1d
--- /dev/null
+++ b/man/io_uring_buf_ring_init.3
@@ -0,0 +1,30 @@
+.\" Copyright (C) 2022 Dylan Yudaken
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_buf_ring_init 3 "June 13, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_buf_ring_init \- Initialise a buffer ring
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_buf_ring_init(struct io_uring_buf_ring *" br ");"
+.fi
+.SH DESCRIPTION
+.PP
+.BR io_uring_buf_ring_init (3)
+initialises
+.IR br
+so that it is ready to be used. It may be called after
+.BR io_uring_register_buf_ring (3)
+but must be called before the buffer ring is used in any other way.
+
+.SH RETURN VALUE
+None
+
+.SH SEE ALSO
+.BR io_uring_register_buf_ring (3),
+.BR io_uring_buf_ring_add (3),
+.BR io_uring_buf_ring_advance (3),
+.BR io_uring_buf_ring_cq_advance (3)
diff --git a/man/io_uring_buf_ring_mask.3 b/man/io_uring_buf_ring_mask.3
new file mode 100644
index 0000000000000000000000000000000000000000..9160663053fdfb755bd706158bb469c859905327
--- /dev/null
+++ b/man/io_uring_buf_ring_mask.3
@@ -0,0 +1,27 @@
+.\" Copyright (C) 2022 Dylan Yudaken
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_buf_ring_mask 3 "June 13, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_buf_ring_mask \- Calculate buffer ring mask size
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_buf_ring_mask(__u32 " ring_entries ");"
+.fi
+.SH DESCRIPTION
+.PP
+.BR io_uring_buf_ring_mask (3)
+calculates the appropriate size mask for a buffer ring.
+.IR ring_entries
+is the ring entries as specified in
+.BR io_uring_register_buf_ring (3).
+
+.SH RETURN VALUE
+Size mask for the buffer ring.
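+.SH EXAMPLE
+A brief sketch of typical use: the mask is computed once for the ring and
+passed to every
+.BR io_uring_buf_ring_add (3)
+call (the
+.IR br ,
+.IR addr ,
+.IR len ,
+and
+.I N
+identifiers are illustrative placeholders):
+.PP
+.in +4n
+.EX
+/* N must match the ring_entries the ring was registered with */
+int mask = io_uring_buf_ring_mask(N);
+
+io_uring_buf_ring_add(br, addr, len, 0, mask, 0);
+io_uring_buf_ring_advance(br, 1);
+.EE
+.in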
+ +.SH SEE ALSO +.BR io_uring_register_buf_ring (3), +.BR io_uring_buf_ring_add (3) diff --git a/man/io_uring_cq_advance.3 b/man/io_uring_cq_advance.3 new file mode 100644 index 0000000000000000000000000000000000000000..fae257223cf482171957dfbed6a921e93ca0e360 --- /dev/null +++ b/man/io_uring_cq_advance.3 @@ -0,0 +1,49 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_cq_advance 3 "January 25, 2022" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_cq_advance \- mark one or more io_uring completion events as consumed +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_cq_advance(struct io_uring *" ring "," +.BI " unsigned " nr ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_cq_advance (3) +function marks +.I nr +IO completions belonging to the +.I ring +param as consumed. + +After the caller has submitted a request with +.BR io_uring_submit (3), +the application can retrieve the completion with +.BR io_uring_wait_cqe (3), +.BR io_uring_peek_cqe (3), +or any of the other CQE retrieval helpers, and mark it as consumed with +.BR io_uring_cqe_seen (3). + +The function +.BR io_uring_cqe_seen (3) +calls the function +.BR io_uring_cq_advance (3). + +Completions must be marked as seen, so their slot can get reused. Failure to do +so will result in the same completion being returned on the next invocation. + +.SH RETURN VALUE +None +.SH SEE ALSO +.BR io_uring_submit (3), +.BR io_uring_wait_cqe (3), +.BR io_uring_peek_cqe (3), +.BR io_uring_wait_cqes (3), +.BR io_uring_wait_cqe_timeout (3), +.BR io_uring_cqe_seen (3) diff --git a/man/io_uring_cq_has_overflow.3 b/man/io_uring_cq_has_overflow.3 new file mode 100644 index 0000000000000000000000000000000000000000..e5b352a4f86a36a05e6e1cfc0204d3fb5f336878 --- /dev/null +++ b/man/io_uring_cq_has_overflow.3 @@ -0,0 +1,25 @@ +.\" Copyright (C) 2022 Dylan Yudaken +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_cq_has_overflow 3 "September 5, 2022" "liburing-2.3" "liburing Manual" +.SH NAME +io_uring_cq_has_overflow \- returns if there are overflow entries waiting to move to the CQ ring +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "bool io_uring_cq_has_overflow(const struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_cq_has_overflow (3) +function informs the application if CQ entries have overflowed and are waiting to be flushed to +the CQ ring. For example using +.BR io_uring_get_events (3) +. +.SH RETURN VALUE +True if there are CQ entries waiting to be flushed to the CQ ring. +.SH SEE ALSO +.BR io_uring_get_events (3) diff --git a/man/io_uring_cq_ready.3 b/man/io_uring_cq_ready.3 new file mode 100644 index 0000000000000000000000000000000000000000..641828a840d3f047138d87d17dc3eb8198b81c51 --- /dev/null +++ b/man/io_uring_cq_ready.3 @@ -0,0 +1,26 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_cq_ready 3 "January 25, 2022" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_cq_ready \- returns number of unconsumed ready entries in the CQ ring +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "unsigned io_uring_cq_ready(const struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_cq_ready (3) +function returns the number of unconsumed entries that are ready belonging to the +.I ring +param. + +.SH RETURN VALUE +Returns the number of unconsumed ready entries in the CQ ring. 
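+.SH EXAMPLE
+One possible pattern, sketched with
+.I ring
+as the application's ring and an application-defined
+.BR handle_cqe ()
+helper as a placeholder: check how many completions are ready and consume
+them without entering the kernel.
+.PP
+.in +4n
+.EX
+struct io_uring_cqe *cqe;
+unsigned i, ready;
+
+ready = io_uring_cq_ready(ring);
+for (i = 0; i < ready; i++) {
+    if (io_uring_peek_cqe(ring, &cqe) != 0)
+        break;
+    handle_cqe(cqe);              /* application-defined */
+    io_uring_cqe_seen(ring, cqe);
+}
+.EE
+.in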
+.SH SEE ALSO +.BR io_uring_submit (3), +.BR io_uring_wait_cqe (3) diff --git a/man/io_uring_cqe_get_data.3 b/man/io_uring_cqe_get_data.3 new file mode 100644 index 0000000000000000000000000000000000000000..4cbb32cd864e12c28fcfe2e4c48adf5f2fbe86ff --- /dev/null +++ b/man/io_uring_cqe_get_data.3 @@ -0,0 +1,53 @@ +.\" Copyright (C) 2021 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_cqe_get_data 3 "November 15, 2021" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_cqe_get_data \- get user data for completion event +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void *io_uring_cqe_get_data(struct io_uring_cqe *" cqe ");" +.BI " +.BI "__u64 io_uring_cqe_get_data64(struct io_uring_cqe *" cqe ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_cqe_get_data (3) +function returns the user_data with the completion queue entry +.IR cqe +as a data pointer. + +The +.BR io_uring_cqe_get_data64 (3) +function returns the user_data with the completion queue entry +.IR cqe +as a 64-bit data value. + +After the caller has received a completion queue entry (CQE) with +.BR io_uring_wait_cqe (3), +the application can call +.BR io_uring_cqe_get_data (3) +or +.BR io_uring_cqe_get_data64 (3) +function to retrieve the +.I user_data +value. This requires that +.I user_data +has been set earlier with the function +.BR io_uring_sqe_set_data (3) +or +.BR io_uring_sqe_set_data64 (3). + +.SH RETURN VALUE +If the +.I user_data +value has been set before submitting the request, it will be returned. +Otherwise the functions returns NULL. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_sqe_set_data (3), +.BR io_uring_sqe_submit (3) diff --git a/man/io_uring_cqe_get_data64.3 b/man/io_uring_cqe_get_data64.3 new file mode 120000 index 0000000000000000000000000000000000000000..51991c2a145ee1c450b25a4cb7ee3f5ba71ce191 --- /dev/null +++ b/man/io_uring_cqe_get_data64.3 @@ -0,0 +1 @@ +io_uring_cqe_get_data.3 \ No newline at end of file diff --git a/man/io_uring_cqe_seen.3 b/man/io_uring_cqe_seen.3 new file mode 100644 index 0000000000000000000000000000000000000000..d2f2984601652b298f5103887047b81a2786cfac --- /dev/null +++ b/man/io_uring_cqe_seen.3 @@ -0,0 +1,42 @@ +.\" Copyright (C) 2021 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_cqe_seen 3 "November 15, 2021" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_cqe_seen \- mark io_uring completion event as consumed +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_cqe_seen(struct io_uring *" ring "," +.BI " struct io_uring_cqe *" cqe ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_cqe_seen (3) +function marks the IO completion +.I cqe +belonging to the +.I ring +param as consumed. + +After the caller has submitted a request with +.BR io_uring_submit (3), +the application can retrieve the completion with +.BR io_uring_wait_cqe (3), +.BR io_uring_peek_cqe (3), +or any of the other CQE retrieval helpers, and mark it as consumed with +.BR io_uring_cqe_seen (3). + +Completions must be marked as completed so their slot can get reused. 
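+.PP
+A minimal sketch of the wait-and-consume cycle, with
+.I ring
+already set up and
+.BR handle_completion ()
+standing in for application logic (both are placeholders):
+.PP
+.in +4n
+.EX
+struct io_uring_cqe *cqe;
+
+if (io_uring_wait_cqe(ring, &cqe) == 0) {
+    handle_completion(io_uring_cqe_get_data(cqe), cqe->res);
+    /* hand the CQ slot back so it can be reused */
+    io_uring_cqe_seen(ring, cqe);
+}
+.EE
+.in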
+.SH RETURN VALUE
+None
+.SH SEE ALSO
+.BR io_uring_submit (3),
+.BR io_uring_wait_cqe (3),
+.BR io_uring_peek_cqe (3),
+.BR io_uring_wait_cqes (3),
+.BR io_uring_wait_cqe_timeout (3),
+.BR io_uring_cqe_seen (3)
diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
new file mode 100644
index 0000000000000000000000000000000000000000..b337ab822307be318b672ba1338fcf39b6634c20
--- /dev/null
+++ b/man/io_uring_enter.2
@@ -0,0 +1,1700 @@
+.\" Copyright (C) 2019 Jens Axboe
+.\" Copyright (C) 2019 Red Hat, Inc.
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_enter 2 2019-01-22 "Linux" "Linux Programmer's Manual"
+.SH NAME
+io_uring_enter \- initiate and/or complete asynchronous I/O
+.SH SYNOPSIS
+.nf
+.BR "#include <linux/io_uring.h>"
+.PP
+.BI "int io_uring_enter(unsigned int " fd ", unsigned int " to_submit ,
+.BI "                   unsigned int " min_complete ", unsigned int " flags ,
+.BI "                   sigset_t *" sig );
+.PP
+.BI "int io_uring_enter2(unsigned int " fd ", unsigned int " to_submit ,
+.BI "                    unsigned int " min_complete ", unsigned int " flags ,
+.BI "                    sigset_t *" sig ", size_t " sz );
+.fi
+.PP
+.SH DESCRIPTION
+.PP
+.BR io_uring_enter (2)
+is used to initiate and complete I/O using the shared submission and
+completion queues set up by a call to
+.BR io_uring_setup (2).
+A single call can both submit new I/O and wait for completions of I/O
+initiated by this call or previous calls to
+.BR io_uring_enter (2).
+
+.I fd
+is the file descriptor returned by
+.BR io_uring_setup (2).
+.I to_submit
+specifies the number of I/Os to submit from the submission queue.
+.I flags
+is a bitmask of the following values:
+.TP
+.B IORING_ENTER_GETEVENTS
+If this flag is set, then the system call will wait for the specified
+number of events in
+.I min_complete
+before returning. This flag can be set along with
+.I to_submit
+to both submit and complete events in a single system call.
+.TP
+.B IORING_ENTER_SQ_WAKEUP
+If the ring has been created with
+.B IORING_SETUP_SQPOLL,
+then this flag asks the kernel to wake up the SQ kernel thread to submit IO.
+.TP
+.B IORING_ENTER_SQ_WAIT
+If the ring has been created with
+.B IORING_SETUP_SQPOLL,
+then the application has no real insight into when the SQ kernel thread has
+consumed entries from the SQ ring. This can lead to a situation where the
+application can no longer get a free SQE entry to submit, without knowing
+when one becomes available as the SQ kernel thread consumes them. If
+the system call is used with this flag set, then it will wait until at least
+one entry is free in the SQ ring.
+.TP
+.B IORING_ENTER_EXT_ARG
+Since kernel 5.11, the system call's arguments have been modified to look like
+the following:
+
+.nf
+.BI "int io_uring_enter2(unsigned int " fd ", unsigned int " to_submit ,
+.BI "                    unsigned int " min_complete ", unsigned int " flags ,
+.BI "                    const void *" arg ", size_t " argsz );
+.fi
+
+which behaves just like the original definition by default. However, if
+.B IORING_ENTER_EXT_ARG
+is set, then instead of a
+.I sigset_t
+being passed in, a pointer to a
+.I struct io_uring_getevents_arg
+is used instead and
+.I argsz
+must be set to the size of this structure. The definition is as follows:
+
+.nf
+.BI "struct io_uring_getevents_arg {
+.BI "    __u64 sigmask;
+.BI "    __u32 sigmask_sz;
+.BI "    __u32 pad;
+.BI "    __u64 ts;
+.BI "};
+.fi
+
+which allows passing in both a signal mask as well as a pointer to a
+.I struct __kernel_timespec
+timeout value. If
+.I ts
+is set to a valid pointer, then this time value indicates the timeout for
+waiting on events.
If an application is waiting on events and wishes to +stop waiting after a specified amount of time, then this can be accomplished +directly in version 5.11 and newer by using this feature. +.TP +.B IORING_ENTER_REGISTERED_RING +If the ring file descriptor has been registered through use of +.B IORING_REGISTER_RING_FDS, +then setting this flag will tell the kernel that the +.I ring_fd +passed in is the registered ring offset rather than a normal file descriptor. + +.PP +.PP +If the io_uring instance was configured for polling, by specifying +.B IORING_SETUP_IOPOLL +in the call to +.BR io_uring_setup (2), +then min_complete has a slightly different meaning. Passing a value +of 0 instructs the kernel to return any events which are already complete, +without blocking. If +.I min_complete +is a non-zero value, the kernel will still return immediately if any +completion events are available. If no event completions are +available, then the call will poll either until one or more +completions become available, or until the process has exceeded its +scheduler time slice. + +Note that, for interrupt driven I/O (where +.B IORING_SETUP_IOPOLL +was not specified in the call to +.BR io_uring_setup (2)), +an application may check the completion queue for event completions +without entering the kernel at all. +.PP +When the system call returns that a certain amount of SQEs have been +consumed and submitted, it's safe to reuse SQE entries in the ring. This is +true even if the actual IO submission had to be punted to async context, +which means that the SQE may in fact not have been submitted yet. If the +kernel requires later use of a particular SQE entry, it will have made a +private copy of it. + +.I sig +is a pointer to a signal mask (see +.BR sigprocmask (2)); +if +.I sig +is not NULL, +.BR io_uring_enter (2) +first replaces the current signal mask by the one pointed to by +.IR sig , +then waits for events to become available in the completion queue, and +then restores the original signal mask. The following +.BR io_uring_enter (2) +call: +.PP +.in +4n +.EX +ret = io_uring_enter(fd, 0, 1, IORING_ENTER_GETEVENTS, &sig); +.EE +.in +.PP +is equivalent to +.I atomically +executing the following calls: +.PP +.in +4n +.EX +pthread_sigmask(SIG_SETMASK, &sig, &orig); +ret = io_uring_enter(fd, 0, 1, IORING_ENTER_GETEVENTS, NULL); +pthread_sigmask(SIG_SETMASK, &orig, NULL); +.EE +.in +.PP +See the description of +.BR pselect (2) +for an explanation of why the +.I sig +parameter is necessary. 
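+.PP
+As a further sketch, batching works by queueing multiple SQEs and then issuing
+a single system call for all of them; here
+.I ring_fd
+and
+.I queued
+are placeholders for the ring file descriptor and the number of SQEs the
+application has added, and the call is made through
+.BR syscall (2)
+in the absence of a dedicated C library wrapper:
+.PP
+.in +4n
+.EX
+/* submit 'queued' SQEs and wait for at least one completion */
+ret = syscall(__NR_io_uring_enter, ring_fd, queued, 1,
+              IORING_ENTER_GETEVENTS, NULL, 0);
+.EE
+.in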
+ +Submission queue entries are represented using the following data +structure: +.PP +.in +4n +.EX +/* + * IO submission data structure (Submission Queue Entry) + */ +struct io_uring_sqe { + __u8 opcode; /* type of operation for this sqe */ + __u8 flags; /* IOSQE_ flags */ + __u16 ioprio; /* ioprio for the request */ + __s32 fd; /* file descriptor to do IO on */ + union { + __u64 off; /* offset into file */ + __u64 addr2; + }; + union { + __u64 addr; /* pointer to buffer or iovecs */ + __u64 splice_off_in; + } + __u32 len; /* buffer size or number of iovecs */ + union { + __kernel_rwf_t rw_flags; + __u32 fsync_flags; + __u16 poll_events; /* compatibility */ + __u32 poll32_events; /* word-reversed for BE */ + __u32 sync_range_flags; + __u32 msg_flags; + __u32 timeout_flags; + __u32 accept_flags; + __u32 cancel_flags; + __u32 open_flags; + __u32 statx_flags; + __u32 fadvise_advice; + __u32 splice_flags; + __u32 rename_flags; + __u32 unlink_flags; + __u32 hardlink_flags; + }; + __u64 user_data; /* data to be passed back at completion time */ + union { + struct { + /* index into fixed buffers, if used */ + union { + /* index into fixed buffers, if used */ + __u16 buf_index; + /* for grouped buffer selection */ + __u16 buf_group; + } + /* personality to use, if used */ + __u16 personality; + union { + __s32 splice_fd_in; + __u32 file_index; + }; + }; + __u64 __pad2[3]; + }; +}; +.EE +.in +.PP +The +.I opcode +describes the operation to be performed. It can be one of: +.TP +.B IORING_OP_NOP +Do not perform any I/O. This is useful for testing the performance of +the io_uring implementation itself. +.TP +.B IORING_OP_READV +.TP +.B IORING_OP_WRITEV +Vectored read and write operations, similar to +.BR preadv2 (2) +and +.BR pwritev2 (2). +If the file is not seekable, +.I off +must be set to zero or -1. + +.TP +.B IORING_OP_READ_FIXED +.TP +.B IORING_OP_WRITE_FIXED +Read from or write to pre-mapped buffers. See +.BR io_uring_register (2) +for details on how to setup a context for fixed reads and writes. + +.TP +.B IORING_OP_FSYNC +File sync. See also +.BR fsync (2). +Note that, while I/O is initiated in the order in which it appears in +the submission queue, completions are unordered. For example, an +application which places a write I/O followed by an fsync in the +submission queue cannot expect the fsync to apply to the write. The +two operations execute in parallel, so the fsync may complete before +the write is issued to the storage. The same is also true for +previously issued writes that have not completed prior to the fsync. + +.TP +.B IORING_OP_POLL_ADD +Poll the +.I fd +specified in the submission queue entry for the events +specified in the +.I poll_events +field. Unlike poll or epoll without +.BR EPOLLONESHOT , +by default this interface always works in one shot mode. That is, once the poll +operation is completed, it will have to be resubmitted. + +If +.B IORING_POLL_ADD_MULTI +is set in the SQE +.I len +field, then the poll will work in multi shot mode instead. That means it'll +repatedly trigger when the requested event becomes true, and hence multiple +CQEs can be generated from this single SQE. The CQE +.I flags +field will have +.B IORING_CQE_F_MORE +set on completion if the application should expect further CQE entries from +the original request. If this flag isn't set on completion, then the poll +request has been terminated and no further events will be generated. This mode +is available since 5.13. 
+ +If +.B IORING_POLL_UPDATE_EVENTS +is set in the SQE +.I len +field, then the request will update an existing poll request with the mask of +events passed in with this request. The lookup is based on the +.I user_data +field of the original SQE submitted, and this values is passed in the +.I addr +field of the SQE. This mode is available since 5.13. + +If +.B IORING_POLL_UPDATE_USER_DATA +is set in the SQE +.I len +field, then the request will update the +.I user_data +of an existing poll request based on the value passed in the +.I off +field. This mode is available since 5.13. + +This command works like +an async +.BR poll(2) +and the completion event result is the returned mask of events. For the +variants that update +.I user_data +or +.I events +, the completion result will be similar to +.B IORING_OP_POLL_REMOVE. + +.TP +.B IORING_OP_POLL_REMOVE +Remove an existing poll request. If found, the +.I res +field of the +.I "struct io_uring_cqe" +will contain 0. If not found, +.I res +will contain +.B -ENOENT, +or +.B -EALREADY +if the poll request was in the process of completing already. + +.TP +.B IORING_OP_EPOLL_CTL +Add, remove or modify entries in the interest list of +.BR epoll (7). +See +.BR epoll_ctl (2) +for details of the system call. +.I fd +holds the file descriptor that represents the epoll instance, +.I addr +holds the file descriptor to add, remove or modify, +.I len +holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform and, +.I off +holds a pointer to the +.I epoll_events +structure. Available since 5.6. + +.TP +.B IORING_OP_SYNC_FILE_RANGE +Issue the equivalent of a \fBsync_file_range\fR (2) on the file descriptor. The +.I fd +field is the file descriptor to sync, the +.I off +field holds the offset in bytes, the +.I len +field holds the length in bytes, and the +.I sync_range_flags +field holds the flags for the command. See also +.BR sync_file_range (2) +for the general description of the related system call. Available since 5.2. + +.TP +.B IORING_OP_SENDMSG +Issue the equivalent of a +.BR sendmsg(2) +system call. +.I fd +must be set to the socket file descriptor, +.I addr +must contain a pointer to the msghdr structure, and +.I msg_flags +holds the flags associated with the system call. See also +.BR sendmsg (2) +for the general description of the related system call. Available since 5.3. + +This command also supports the following modifiers in +.I ioprio: + +.PP +.in +12 +.B IORING_RECVSEND_POLL_FIRST +If set, io_uring will assume the socket is currently full and attempting to +send data will be unsuccessful. For this case, io_uring will arm internal +poll and trigger a send of the data when there is enough space available. +This initial send attempt can be wasteful for the case where the socket +is expected to be full, setting this flag will bypass the initial send +attempt and go straight to arming poll. If poll does indicate that data can +be sent, the operation will proceed. +.EE +.in +.PP + +.TP +.B IORING_OP_RECVMSG +Works just like IORING_OP_SENDMSG, except for +.BR recvmsg(2) +instead. See the description of IORING_OP_SENDMSG. Available since 5.3. + +This command also supports the following modifiers in +.I ioprio: + +.PP +.in +12 +.B IORING_RECVSEND_POLL_FIRST +If set, io_uring will assume the socket is currently empty and attempting to +receive data will be unsuccessful. For this case, io_uring will arm internal +poll and trigger a receive of the data when the socket has data to be read. 
+This initial receive attempt can be wasteful for the case where the socket +is expected to be empty, setting this flag will bypass the initial receive +attempt and go straight to arming poll. If poll does indicate that data is +ready to be received, the operation will proceed. +.EE +.in +.PP + +.TP +.B IORING_OP_SEND +Issue the equivalent of a +.BR send(2) +system call. +.I fd +must be set to the socket file descriptor, +.I addr +must contain a pointer to the buffer, +.I len +denotes the length of the buffer to send, and +.I msg_flags +holds the flags associated with the system call. See also +.BR send(2) +for the general description of the related system call. Available since 5.6. + +This command also supports the following modifiers in +.I ioprio: + +.PP +.in +12 +.B IORING_RECVSEND_POLL_FIRST +If set, io_uring will assume the socket is currently full and attempting to +send data will be unsuccessful. For this case, io_uring will arm internal +poll and trigger a send of the data when there is enough space available. +This initial send attempt can be wasteful for the case where the socket +is expected to be full, setting this flag will bypass the initial send +attempt and go straight to arming poll. If poll does indicate that data can +be sent, the operation will proceed. +.EE +.in +.PP + +.TP +.B IORING_OP_RECV +Works just like IORING_OP_SEND, except for +.BR recv(2) +instead. See the description of IORING_OP_SEND. Available since 5.6. + +This command also supports the following modifiers in +.I ioprio: + +.PP +.in +12 +.B IORING_RECVSEND_POLL_FIRST +If set, io_uring will assume the socket is currently empty and attempting to +receive data will be unsuccessful. For this case, io_uring will arm internal +poll and trigger a receive of the data when the socket has data to be read. +This initial receive attempt can be wasteful for the case where the socket +is expected to be empty, setting this flag will bypass the initial receive +attempt and go straight to arming poll. If poll does indicate that data is +ready to be received, the operation will proceed. +.EE +.in +.PP + +.TP +.B IORING_OP_TIMEOUT +This command will register a timeout operation. The +.I addr +field must contain a pointer to a struct timespec64 structure, +.I len +must contain 1 to signify one timespec64 structure, +.I timeout_flags +may contain IORING_TIMEOUT_ABS +for an absolute timeout value, or 0 for a relative timeout. +.I off +may contain a completion event count. A timeout +will trigger a wakeup event on the completion ring for anyone waiting for +events. A timeout condition is met when either the specified timeout expires, +or the specified number of events have completed. Either condition will +trigger the event. If set to 0, completed events are not counted, which +effectively acts like a timer. io_uring timeouts use the +.B CLOCK_MONOTONIC +clock source. The request will complete with +.I -ETIME +if the timeout got completed through expiration of the timer, or +.I 0 +if the timeout got completed through requests completing on their own. If +the timeout was canceled before it expired, the request will complete with +.I -ECANCELED. +Available since 5.4. + +Since 5.15, this command also supports the following modifiers in +.I timeout_flags: + +.PP +.in +12 +.B IORING_TIMEOUT_BOOTTIME +If set, then the clocksource used is +.I CLOCK_BOOTTIME +instead of +.I CLOCK_MONOTONIC. +This clocksource differs in that it includes time elapsed if the system was +suspend while having a timeout request in-flight. 
+ +.B IORING_TIMEOUT_REALTIME +If set, then the clocksource used is +.I CLOCK_REALTIME +instead of +.I CLOCK_MONOTONIC. +.EE +.in +.PP + +.TP +.B IORING_OP_TIMEOUT_REMOVE +If +.I timeout_flags are zero, then it attempts to remove an existing timeout +operation. +.I addr +must contain the +.I user_data +field of the previously issued timeout operation. If the specified timeout +request is found and canceled successfully, this request will terminate +with a result value of +.I 0 +If the timeout request was found but expiration was already in progress, +this request will terminate with a result value of +.I -EBUSY +If the timeout request wasn't found, the request will terminate with a result +value of +.I -ENOENT +Available since 5.5. + +If +.I timeout_flags +contain +.I IORING_TIMEOUT_UPDATE, +instead of removing an existing operation, it updates it. +.I addr +and return values are same as before. +.I addr2 +field must contain a pointer to a struct timespec64 structure. +.I timeout_flags +may also contain IORING_TIMEOUT_ABS, in which case the value given is an +absolute one, not a relative one. +Available since 5.11. + +.TP +.B IORING_OP_ACCEPT +Issue the equivalent of an +.BR accept4(2) +system call. +.I fd +must be set to the socket file descriptor, +.I addr +must contain the pointer to the sockaddr structure, and +.I addr2 +must contain a pointer to the socklen_t addrlen field. Flags can be passed using +the +.I accept_flags +field. See also +.BR accept4(2) +for the general description of the related system call. Available since 5.5. + +If the +.I file_index +field is set to a positive number, the file won't be installed into the +normal file table as usual but will be placed into the fixed file table at index +.I file_index - 1. +In this case, instead of returning a file descriptor, the result will contain +either 0 on success or an error. If the index points to a valid empty slot, the +installation is guaranteed to not fail. If there is already a file in the slot, +it will be replaced, similar to +.B IORING_OP_FILES_UPDATE. +Please note that only io_uring has access to such files and no other syscall +can use them. See +.B IOSQE_FIXED_FILE +and +.B IORING_REGISTER_FILES. + +Available since 5.5. + +.TP +.B IORING_OP_ASYNC_CANCEL +Attempt to cancel an already issued request. +.I addr +must contain the +.I user_data +field of the request that should be canceled. The cancelation request will +complete with one of the following results codes. If found, the +.I res +field of the cqe will contain 0. If not found, +.I res +will contain -ENOENT. If found and attempted canceled, the +.I res +field will contain -EALREADY. In this case, the request may or may not +terminate. In general, requests that are interruptible (like socket IO) will +get canceled, while disk IO requests cannot be canceled if already started. +Available since 5.5. + +.TP +.B IORING_OP_LINK_TIMEOUT +This request must be linked with another request through +.I IOSQE_IO_LINK +which is described below. Unlike +.I IORING_OP_TIMEOUT, +.I IORING_OP_LINK_TIMEOUT +acts on the linked request, not the completion queue. The format of the command +is otherwise like +.I IORING_OP_TIMEOUT, +except there's no completion event count as it's tied to a specific request. +If used, the timeout specified in the command will cancel the linked command, +unless the linked command completes before the timeout. 
The timeout will +complete with +.I -ETIME +if the timer expired and the linked request was attempted canceled, or +.I -ECANCELED +if the timer got canceled because of completion of the linked request. Like +.B IORING_OP_TIMEOUT +the clock source used is +.B CLOCK_MONOTONIC +Available since 5.5. + + +.TP +.B IORING_OP_CONNECT +Issue the equivalent of a +.BR connect(2) +system call. +.I fd +must be set to the socket file descriptor, +.I addr +must contain the const pointer to the sockaddr structure, and +.I off +must contain the socklen_t addrlen field. See also +.BR connect(2) +for the general description of the related system call. Available since 5.5. + +.TP +.B IORING_OP_FALLOCATE +Issue the equivalent of a +.BR fallocate(2) +system call. +.I fd +must be set to the file descriptor, +.I len +must contain the mode associated with the operation, +.I off +must contain the offset on which to operate, and +.I addr +must contain the length. See also +.BR fallocate(2) +for the general description of the related system call. Available since 5.6. + +.TP +.B IORING_OP_FADVISE +Issue the equivalent of a +.BR posix_fadvise(2) +system call. +.I fd +must be set to the file descriptor, +.I off +must contain the offset on which to operate, +.I len +must contain the length, and +.I fadvise_advice +must contain the advice associated with the operation. See also +.BR posix_fadvise(2) +for the general description of the related system call. Available since 5.6. + +.TP +.B IORING_OP_MADVISE +Issue the equivalent of a +.BR madvise(2) +system call. +.I addr +must contain the address to operate on, +.I len +must contain the length on which to operate, +and +.I fadvise_advice +must contain the advice associated with the operation. See also +.BR madvise(2) +for the general description of the related system call. Available since 5.6. + +.TP +.B IORING_OP_OPENAT +Issue the equivalent of a +.BR openat(2) +system call. +.I fd +is the +.I dirfd +argument, +.I addr +must contain a pointer to the +.I *pathname +argument, +.I open_flags +should contain any flags passed in, and +.I len +is access mode of the file. See also +.BR openat(2) +for the general description of the related system call. Available since 5.6. + +If the +.I file_index +field is set to a positive number, the file won't be installed into the +normal file table as usual but will be placed into the fixed file table at index +.I file_index - 1. +In this case, instead of returning a file descriptor, the result will contain +either 0 on success or an error. If the index points to a valid empty slot, the +installation is guaranteed to not fail. If there is already a file in the slot, +it will be replaced, similar to +.B IORING_OP_FILES_UPDATE. +Please note that only io_uring has access to such files and no other syscall +can use them. See +.B IOSQE_FIXED_FILE +and +.B IORING_REGISTER_FILES. + +Available since 5.15. + +.TP +.B IORING_OP_OPENAT2 +Issue the equivalent of a +.BR openat2(2) +system call. +.I fd +is the +.I dirfd +argument, +.I addr +must contain a pointer to the +.I *pathname +argument, +.I len +should contain the size of the open_how structure, and +.I off +should be set to the address of the open_how structure. See also +.BR openat2(2) +for the general description of the related system call. Available since 5.6. + +If the +.I file_index +field is set to a positive number, the file won't be installed into the +normal file table as usual but will be placed into the fixed file table at index +.I file_index - 1. 
+In this case, instead of returning a file descriptor, the result will contain +either 0 on success or an error. If the index points to a valid empty slot, the +installation is guaranteed to not fail. If there is already a file in the slot, +it will be replaced, similar to +.B IORING_OP_FILES_UPDATE. +Please note that only io_uring has access to such files and no other syscall +can use them. See +.B IOSQE_FIXED_FILE +and +.B IORING_REGISTER_FILES. + +Available since 5.15. + +.TP +.B IORING_OP_CLOSE +Issue the equivalent of a +.BR close(2) +system call. +.I fd +is the file descriptor to be closed. See also +.BR close(2) +for the general description of the related system call. Available since 5.6. +If the +.I file_index +field is set to a positive number, this command can be used to close files +that were direct opened through +.B IORING_OP_OPENAT +, +.B IORING_OP_OPENAT2 +, or +.B IORING_OP_ACCEPT +using the io_uring specific direct descriptors. Note that only one of the +descriptor fields may be set. The direct close feature is available since +the 5.15 kernel, where direct descriptors were introduced. + +.TP +.B IORING_OP_STATX +Issue the equivalent of a +.BR statx(2) +system call. +.I fd +is the +.I dirfd +argument, +.I addr +must contain a pointer to the +.I *pathname +string, +.I statx_flags +is the +.I flags +argument, +.I len +should be the +.I mask +argument, and +.I off +must contain a pointer to the +.I statxbuf +to be filled in. See also +.BR statx(2) +for the general description of the related system call. Available since 5.6. + +.TP +.B IORING_OP_READ +.TP +.B IORING_OP_WRITE +Issue the equivalent of a +.BR pread(2) +or +.BR pwrite(2) +system call. +.I fd +is the file descriptor to be operated on, +.I addr +contains the buffer in question, +.I len +contains the length of the IO operation, and +.I offs +contains the read or write offset. If +.I fd +does not refer to a seekable file, +.I off +must be set to zero or -1. If +.I offs +is set to +.B -1 +, the offset will use (and advance) the file position, like the +.BR read(2) +and +.BR write(2) +system calls. These are non-vectored versions of the +.B IORING_OP_READV +and +.B IORING_OP_WRITEV +opcodes. See also +.BR read(2) +and +.BR write(2) +for the general description of the related system call. Available since 5.6. + +.TP +.B IORING_OP_SPLICE +Issue the equivalent of a +.BR splice(2) +system call. +.I splice_fd_in +is the file descriptor to read from, +.I splice_off_in +is an offset to read from, +.I fd +is the file descriptor to write to, +.I off +is an offset from which to start writing to. A sentinel value of +.B -1 +is used to pass the equivalent of a NULL for the offsets to +.BR splice(2). +.I len +contains the number of bytes to copy. +.I splice_flags +contains a bit mask for the flag field associated with the system call. +Please note that one of the file descriptors must refer to a pipe. +See also +.BR splice(2) +for the general description of the related system call. Available since 5.7. + +.TP +.B IORING_OP_TEE +Issue the equivalent of a +.BR tee(2) +system call. +.I splice_fd_in +is the file descriptor to read from, +.I fd +is the file descriptor to write to, +.I len +contains the number of bytes to copy, and +.I splice_flags +contains a bit mask for the flag field associated with the system call. +Please note that both of the file descriptors must refer to a pipe. +See also +.BR tee(2) +for the general description of the related system call. Available since 5.8. 
+

.TP
.B IORING_OP_FILES_UPDATE
This command is an alternative to using
.B IORING_REGISTER_FILES_UPDATE
except it works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same.
.I addr
must contain a pointer to the array of file descriptors,
.I len
must contain the length of the array, and
.I off
must contain the offset at which to operate. Note that the array of file
descriptors pointed to in
.I addr
must remain valid until this operation has completed. Available since 5.6.

.TP
.B IORING_OP_PROVIDE_BUFFERS
This command allows an application to register a group of buffers to be used
by commands that read/receive data. Using buffers in this manner can eliminate
the need for a separate poll + read, which provides a convenient point in
time to allocate a buffer for a given request. It's often infeasible to have
as many buffers available as pending reads or receives. With this feature, the
application can have its pool of buffers ready in the kernel, and when the
file or socket is ready to read/receive data, a buffer can be selected for the
operation.
.I fd
must contain the number of buffers to provide,
.I addr
must contain the starting address to add buffers from,
.I len
must contain the length of each buffer to add from the range,
.I buf_group
must contain the group ID of this range of buffers, and
.I off
must contain the starting buffer ID of this range of buffers. With that set,
the kernel adds buffers starting with the memory address in
.I addr,
each with a length of
.I len.
Hence the application should provide
.I len * fd
worth of memory in
.I addr.
Buffers are grouped by the group ID, and each buffer within this group will be
identical in size according to the above arguments. This allows the application
to provide different groups of buffers, and this is often used to have
differently sized buffers available depending on what the expectations are of
the individual request. When submitting a request that should use a provided
buffer, the
.B IOSQE_BUFFER_SELECT
flag must be set, and
.I buf_group
must be set to the desired buffer group ID where the buffer should be selected
from. Available since 5.7.

.TP
.B IORING_OP_REMOVE_BUFFERS
Remove buffers previously registered with
.B IORING_OP_PROVIDE_BUFFERS.
.I fd
must contain the number of buffers to remove, and
.I buf_group
must contain the buffer group ID from which to remove the buffers. Available
since 5.7.

.TP
.B IORING_OP_SHUTDOWN
Issue the equivalent of a
.BR shutdown(2)
system call.
.I fd
is the file descriptor to the socket being shut down, and
.I len
must be set to the
.I how
argument. No other fields should be set. Available since 5.11.

.TP
.B IORING_OP_RENAMEAT
Issue the equivalent of a
.BR renameat2(2)
system call.
.I fd
should be set to the
.I olddirfd,
.I addr
should be set to the
.I oldpath,
.I len
should be set to the
.I newdirfd,
.I addr2
should be set to the
.I newpath,
and finally
.I rename_flags
should be set to the
.I flags
passed in to
.BR renameat2(2).
Available since 5.11.

.TP
.B IORING_OP_UNLINKAT
Issue the equivalent of an
.BR unlinkat(2)
system call.
.I fd
should be set to the
.I dirfd,
.I addr
should be set to the
.I pathname,
and
.I unlink_flags
should be set to the
.I flags
being passed in to
.BR unlinkat(2).
Available since 5.11.
+

.TP
.B IORING_OP_MKDIRAT
Issue the equivalent of a
.BR mkdirat(2)
system call.
.I fd
should be set to the
.I dirfd,
.I addr
should be set to the
.I pathname,
and
.I len
should be set to the
.I mode
being passed in to
.BR mkdirat(2).
Available since 5.15.

.TP
.B IORING_OP_SYMLINKAT
Issue the equivalent of a
.BR symlinkat(2)
system call.
.I fd
should be set to the
.I newdirfd,
.I addr
should be set to the
.I target
and
.I addr2
should be set to the
.I linkpath
being passed in to
.BR symlinkat(2).
Available since 5.15.

.TP
.B IORING_OP_LINKAT
Issue the equivalent of a
.BR linkat(2)
system call.
.I fd
should be set to the
.I olddirfd,
.I addr
should be set to the
.I oldpath,
.I len
should be set to the
.I newdirfd,
.I addr2
should be set to the
.I newpath,
and
.I hardlink_flags
should be set to the
.I flags
being passed in to
.BR linkat(2).
Available since 5.15.

.TP
.B IORING_OP_MSG_RING
Send a message to an io_uring.
.I fd
must be set to a file descriptor of a ring that the application has access to,
.I len
can be set to any 32-bit value that the application wishes to pass on, and
.I off
should be set to any 64-bit value that the application wishes to send. On the
target ring, a CQE will be posted with the
.I res
field matching the
.I len
set, and a
.I user_data
field matching the
.I off
value being passed in. This request type can be used to either just wake or
interrupt anyone waiting for completions on the target ring, or it can be used
to pass messages via the two fields. Available since 5.18.

.TP
.B IORING_OP_SOCKET
Issue the equivalent of a
.BR socket(2)
system call.
.I fd
must contain the communication domain,
.I off
must contain the communication type,
.I len
must contain the protocol, and
.I rw_flags
is currently unused and must be set to zero. See also
.BR socket(2)
for the general description of the related system call. Available since 5.19.

If the
.I file_index
field is set to a positive number, the file won't be installed into the
normal file table as usual but will be placed into the fixed file table at index
.I file_index - 1.
In this case, instead of returning a file descriptor, the result will contain
either 0 on success or an error. If the index points to a valid empty slot, the
installation is guaranteed to not fail. If there is already a file in the slot,
it will be replaced, similar to
.B IORING_OP_FILES_UPDATE.
Please note that only io_uring has access to such files and no other syscall
can use them. See
.B IOSQE_FIXED_FILE
and
.B IORING_REGISTER_FILES.

Available since 5.19.

.TP
.B IORING_OP_SEND_ZC
Issue the zerocopy equivalent of a
.BR send(2)
system call. Similar to IORING_OP_SEND, but tries to avoid making intermediate
copies of data. Zerocopy execution is not guaranteed and may fall back to
copying. The request may also fail with
.B -EOPNOTSUPP ,
when a protocol doesn't support zerocopy, in which case users are recommended
to use copying sends instead.

The
.I flags
field of the first
.I "struct io_uring_cqe"
will usually contain
.B IORING_CQE_F_MORE ,
which means that there will be a second completion event / notification for
the request, with the
.I user_data
field set to the same value. The user must not modify the data buffer until the
notification is posted. The first cqe follows the usual rules and so its
.I res
field will contain the number of bytes sent or a negative error code.
The
notification's
.I res
field will be set to zero and the
.I flags
field will contain
.B IORING_CQE_F_NOTIF .
The two-step model is needed because the kernel may hold on to buffers for a
long time, e.g. waiting for a TCP ACK, and having a separate cqe for request
completions allows userspace to push more data without extra delays. Note that
notifications are only responsible for controlling the lifetime of the buffers,
and as such don't mean anything about whether the data has actually been sent
out or received by the other end. Even errored requests may generate a
notification, and the user must check for
.B IORING_CQE_F_MORE
rather than relying on the result.

.I fd
must be set to the socket file descriptor,
.I addr
must contain a pointer to the buffer,
.I len
denotes the length of the buffer to send, and
.I msg_flags
holds the flags associated with the system call. When
.I addr2
is non-zero it points to the address of the target with
.I addr_len
specifying its size, turning the request into a
.BR sendto(2)
system call equivalent.

Available since 6.0.

This command also supports the following modifiers in
.I ioprio:

.PP
.in +12
.B IORING_RECVSEND_POLL_FIRST
If set, io_uring will assume the socket is currently full and attempting to
send data will be unsuccessful. For this case, io_uring will arm internal
poll and trigger a send of the data when there is enough space available.
This initial send attempt can be wasteful for the case where the socket
is expected to be full; setting this flag will bypass the initial send
attempt and go straight to arming poll. If poll does indicate that data can
be sent, the operation will proceed.

.B IORING_RECVSEND_FIXED_BUF
If set, instructs io_uring to use a pre-mapped buffer. The
.I buf_index
field should contain an index into an array of fixed buffers. See
.BR io_uring_register (2)
for details on how to set up a context for fixed buffer I/O.
.EE
.in
.PP

.PP
The
.I flags
field is a bit mask. The supported flags are:
.TP
.B IOSQE_FIXED_FILE
When this flag is specified,
.I fd
is an index into the files array registered with the io_uring instance (see the
.B IORING_REGISTER_FILES
section of the
.BR io_uring_register (2)
man page). Note that this isn't always available for all commands. If used on
a command that doesn't support fixed files, the SQE will error with
.B -EBADF.
Available since 5.1.
.TP
.B IOSQE_IO_DRAIN
When this flag is specified, the SQE will not be started before previously
submitted SQEs have completed, and new SQEs will not be started before this
one completes. Available since 5.2.
.TP
.B IOSQE_IO_LINK
When this flag is specified, the SQE forms a link with the next SQE in the
submission ring. That next SQE will not be started before the previous request
completes. This, in effect, forms a chain of SQEs, which can be arbitrarily
long. The tail of the chain is denoted by the first SQE that does not have this
flag set. Chains are not supported across submission boundaries. Even if the
last SQE in a submission has this flag set, it will still terminate the current
chain. This flag has no effect on previous SQE submissions, nor does it impact
SQEs that are outside of the chain tail. This means that multiple chains can be
executing in parallel, or chains and individual SQEs. Only members inside the
chain are serialized. A chain of SQEs will be broken if any request in that
chain ends in error. io_uring considers any unexpected result an error.
This
means that, e.g., a short read will also terminate the remainder of the chain.
If a chain of SQE links is broken, the remaining unstarted part of the chain
will be terminated and completed with
.B -ECANCELED
as the error code. Available since 5.3.
.TP
.B IOSQE_IO_HARDLINK
Like IOSQE_IO_LINK, except the chain is not severed regardless of the
completion result of the linked request.
Note that the link will still sever if submission of the parent request fails;
hard links are only resilient in the presence of completion results for
requests that did submit correctly. IOSQE_IO_HARDLINK implies IOSQE_IO_LINK.
Available since 5.5.
.TP
.B IOSQE_ASYNC
Normal operation for io_uring is to try and issue an sqe as non-blocking first,
and if that fails, execute it in an async manner. To support more efficient
overlapped operation of requests that the application knows/assumes will
always (or most of the time) block, the application can ask for an sqe to be
issued async from the start. Available since 5.6.
.TP
.B IOSQE_BUFFER_SELECT
Used in conjunction with the
.B IORING_OP_PROVIDE_BUFFERS
command, which registers a pool of buffers to be used by commands that read
or receive data. When buffers are registered for this use case, and this
flag is set in the command, io_uring will grab a buffer from this pool when
the request is ready to receive or read data. If successful, the resulting CQE
will have
.B IORING_CQE_F_BUFFER
set in the flags part of the struct, and the upper
.B IORING_CQE_BUFFER_SHIFT
bits will contain the ID of the selected buffer. This allows the application
to know exactly which buffer was selected for the operation. If no buffers
are available and this flag is set, then the request will fail with
.B -ENOBUFS
as the error code. Once a buffer has been used, it is no longer available in
the kernel pool. The application must re-register the given buffer when
it is ready to recycle it (e.g. has completed using it). Available since 5.7.
.TP
.B IOSQE_CQE_SKIP_SUCCESS
Don't generate a CQE if the request completes successfully. If the request
fails, an appropriate CQE will be posted as usual and if there is no
.B IOSQE_IO_HARDLINK,
CQEs for all linked requests will be omitted. The notion of failure/success is
opcode specific and is the same as with breaking chains of
.B IOSQE_IO_LINK.
One special case is when the request has a linked timeout: the CQE generation
for the linked timeout is then decided solely by whether it has
.B IOSQE_CQE_SKIP_SUCCESS
set, regardless of whether it timed out or was canceled. In other words, if a
linked timeout has the flag set, it's guaranteed not to post a CQE.

The semantics are chosen to accommodate several use cases. First, when all but
the last request of a normal link without linked timeouts are marked with the
flag, only one CQE per link is posted. Additionally, it enables suppression of
CQEs in cases where the side effects of a successfully executed operation are
enough for userspace to know the state of the system. One such example would
be writing to a synchronization file.

This flag is incompatible with
.B IOSQE_IO_DRAIN.
Using both of them in a single ring is undefined behavior, even when they are
not used together in a single request. Currently, after the first request with
.B IOSQE_CQE_SKIP_SUCCESS,
all subsequent requests marked with drain will be failed at submission time.
Note that the error reporting is best effort only, and restrictions may change
in the future.

Available since 5.17.
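.PP
As an illustration of the linking described under
.B IOSQE_IO_LINK
above, the sketch below fills in two already-acquired SQE slots so that an
fsync is only started once the preceding write has completed. This is only a
sketch: ring setup, SQ slot management and tail updates are omitted, and the
helper name is made up for the example.
.PP
.in +4n
.EX
#include <stdint.h>
#include <string.h>
#include <linux/io_uring.h>

/* Chain a write and an fsync with IOSQE_IO_LINK. */
static void prep_linked_write_fsync(struct io_uring_sqe *write_sqe,
                                    struct io_uring_sqe *fsync_sqe,
                                    int fd, const void *buf,
                                    unsigned int len, uint64_t off)
{
    memset(write_sqe, 0, sizeof(*write_sqe));
    write_sqe->opcode = IORING_OP_WRITE;
    write_sqe->fd = fd;
    write_sqe->addr = (uint64_t) (uintptr_t) buf;
    write_sqe->len = len;
    write_sqe->off = off;
    write_sqe->user_data = 1;
    /* Do not start the next SQE until this one completes. */
    write_sqe->flags |= IOSQE_IO_LINK;

    memset(fsync_sqe, 0, sizeof(*fsync_sqe));
    fsync_sqe->opcode = IORING_OP_FSYNC;
    fsync_sqe->fd = fd;
    fsync_sqe->user_data = 2;
}
.EE
.in
.PP
If the write ends in error or a short transfer, the linked fsync is
completed with
.B -ECANCELED
as described above.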
+

.PP
.I ioprio
specifies the I/O priority. See
.BR ioprio_get (2)
for a description of Linux I/O priorities.

.I fd
specifies the file descriptor against which the operation will be
performed, with the exception noted above.

If the operation is one of
.B IORING_OP_READ_FIXED
or
.BR IORING_OP_WRITE_FIXED ,
.I addr
and
.I len
must fall within the buffer located at
.I buf_index
in the fixed buffer array. If the operation is either
.B IORING_OP_READV
or
.BR IORING_OP_WRITEV ,
then
.I addr
points to an iovec array of
.I len
entries.

.IR rw_flags ,
specified for read and write operations, contains a bitwise OR of
per-I/O flags, as described in the
.BR preadv2 (2)
man page.

The
.I fsync_flags
bit mask may contain either 0, for a normal file integrity sync, or
.B IORING_FSYNC_DATASYNC
to provide data sync only semantics. See the descriptions of
.B O_SYNC
and
.B O_DSYNC
in the
.BR open (2)
manual page for more information.

The bits that may be set in
.I poll_events
are defined in \fI<poll.h>\fP, and documented in
.BR poll (2).

.I user_data
is an application-supplied value that will be copied into
the completion queue entry (see below).
.I buf_index
is an index into an array of fixed buffers, and is only valid if fixed
buffers were registered.
.I personality
is the credentials ID to use for this operation. See
.BR io_uring_register (2)
for how to register personalities with io_uring. If set to 0, the current
personality of the submitting task is used.
.PP
Once the submission queue entry is initialized, I/O is submitted by
placing the index of the submission queue entry into the tail of the
submission queue. After one or more indexes are added to the queue,
and the queue tail is advanced, the
.BR io_uring_enter (2)
system call can be invoked to initiate the I/O.

Completions use the following data structure:
.PP
.in +4n
.EX
/*
 * IO completion data structure (Completion Queue Entry)
 */
struct io_uring_cqe {
    __u64 user_data; /* sqe->data submission passed back */
    __s32 res;       /* result code for this event */
    __u32 flags;
};
.EE
.in
.PP
.I user_data
is copied from the field of the same name in the submission queue
entry. The primary use case is to store data that the application
will need to access upon completion of this particular I/O. The
.I flags
field is used for certain commands, like
.B IORING_OP_POLL_ADD
or in conjunction with
.B IOSQE_BUFFER_SELECT
or
.BR IORING_OP_MSG_RING ;
see those entries for details.
.I res
is the operation-specific result, but io_uring-specific errors
(e.g. flags or opcode invalid) are returned through this field.
They are described in section
.B CQE ERRORS.
.PP
For read and write opcodes, the
return values match
.I errno
values documented in the
.BR preadv2 (2)
and
.BR pwritev2 (2)
man pages, with
.I res
holding the equivalent of
.I -errno
for error cases, or the transferred number of bytes in case the operation
is successful. Hence both error and success returns can be found in that
field in the CQE. For other request types, the return values are documented
in the matching man page for that type, or in the opcodes section above for
io_uring-specific opcodes.
.PP
.SH RETURN VALUE
.BR io_uring_enter (2)
returns the number of I/Os successfully consumed. This can be zero
if
.I to_submit
was zero or if the submission queue was empty.
Note that if the ring was
created with
.B IORING_SETUP_SQPOLL
specified, then the return value will generally be the same as
.I to_submit
as submission happens outside the context of the system call.

The errors related to a submission queue entry will be returned through a
completion queue entry (see section
.B CQE ERRORS),
rather than through the system call itself.

Errors that occur not on behalf of a submission queue entry are returned via the
system call directly. On such an error, a negative error code is returned. The
caller should not rely on the
.I errno
variable.
.PP
.SH ERRORS
These are the errors returned by the
.BR io_uring_enter (2)
system call.
.TP
.B EAGAIN
The kernel was unable to allocate memory for the request, or otherwise ran out
of resources to handle it. The application should wait for some completions and
try again.
.TP
.B EBADF
.I fd
is not a valid file descriptor.
.TP
.B EBADFD
.I fd
is a valid file descriptor, but the io_uring ring is not in the right state
(enabled). See
.BR io_uring_register (2)
for details on how to enable the ring.
.TP
.B EBADR
At least one CQE was dropped even with the
.B IORING_FEAT_NODROP
feature, and there are no otherwise available CQEs. This clears the error state
and so with no other changes the next call to
.BR io_uring_enter (2)
will not have this error. This error should be extremely rare and indicates the
machine is running critically low on memory. It may be reasonable for the
application to terminate unless it is able to safely handle any CQE
being lost.
.TP
.B EBUSY
If the
.B IORING_FEAT_NODROP
feature flag is set, then
.B EBUSY
will be returned if there were overflow entries, the
.B IORING_ENTER_GETEVENTS
flag is set, and not all of the overflow entries could be flushed to
the CQ ring.

Without
.B IORING_FEAT_NODROP
the application is attempting to overcommit the number of requests it can have
pending. The application should wait for some completions and try again. May
occur if the application tries to queue more requests than there is room for in
the CQ ring, or if the application attempts to wait for more events without
having reaped the ones already present in the CQ ring.
.TP
.B EINVAL
Some bits in the
.I flags
argument are invalid.
.TP
.B EFAULT
An invalid user space address was specified for the
.I sig
argument.
.TP
.B ENXIO
The io_uring instance is in the process of being torn down.
.TP
.B EOPNOTSUPP
.I fd
does not refer to an io_uring instance.
.TP
.B EINTR
The operation was interrupted by the delivery of a signal before it could
complete; see
.BR signal (7).
This can happen while waiting for events with
.B IORING_ENTER_GETEVENTS.

.SH CQE ERRORS
These io_uring-specific errors are returned as a negative value in the
.I res
field of the completion queue entry.
.TP
.B EACCES
The
.I flags
field or
.I opcode
in a submission queue entry is not allowed due to registered restrictions.
See
.BR io_uring_register (2)
for details on how restrictions work.
.TP
.B EBADF
The
.I fd
field in the submission queue entry is invalid, or the
.B IOSQE_FIXED_FILE
flag was set in the submission queue entry, but no files were registered
with the io_uring instance.
+.TP +.B EFAULT +buffer is outside of the process' accessible address space +.TP +.B EFAULT +.B IORING_OP_READ_FIXED +or +.B IORING_OP_WRITE_FIXED +was specified in the +.I opcode +field of the submission queue entry, but either buffers were not +registered for this io_uring instance, or the address range described +by +.I addr +and +.I len +does not fit within the buffer registered at +.IR buf_index . +.TP +.B EINVAL +The +.I flags +field or +.I opcode +in a submission queue entry is invalid. +.TP +.B EINVAL +The +.I buf_index +member of the submission queue entry is invalid. +.TP +.B EINVAL +The +.I personality +field in a submission queue entry is invalid. +.TP +.B EINVAL +.B IORING_OP_NOP +was specified in the submission queue entry, but the io_uring context +was setup for polling +.RB ( IORING_SETUP_IOPOLL +was specified in the call to io_uring_setup). +.TP +.B EINVAL +.B IORING_OP_READV +or +.B IORING_OP_WRITEV +was specified in the submission queue entry, but the io_uring instance +has fixed buffers registered. +.TP +.B EINVAL +.B IORING_OP_READ_FIXED +or +.B IORING_OP_WRITE_FIXED +was specified in the submission queue entry, and the +.I buf_index +is invalid. +.TP +.B EINVAL +.BR IORING_OP_READV , +.BR IORING_OP_WRITEV , +.BR IORING_OP_READ_FIXED , +.B IORING_OP_WRITE_FIXED +or +.B IORING_OP_FSYNC +was specified in the submission queue entry, but the io_uring instance +was configured for IOPOLLing, or any of +.IR addr , +.IR ioprio , +.IR off , +.IR len , +or +.I buf_index +was set in the submission queue entry. +.TP +.B EINVAL +.B IORING_OP_POLL_ADD +or +.B IORING_OP_POLL_REMOVE +was specified in the +.I opcode +field of the submission queue entry, but the io_uring instance was +configured for busy-wait polling +.RB ( IORING_SETUP_IOPOLL ), +or any of +.IR ioprio , +.IR off , +.IR len , +or +.I buf_index +was non-zero in the submission queue entry. +.TP +.B EINVAL +.B IORING_OP_POLL_ADD +was specified in the +.I opcode +field of the submission queue entry, and the +.I addr +field was non-zero. +.TP +.B EOPNOTSUPP +.I opcode +is valid, but not supported by this kernel. +.TP +.B EOPNOTSUPP +.B IOSQE_BUFFER_SELECT +was set in the +.I flags +field of the submission queue entry, but the +.I opcode +doesn't support buffer selection. diff --git a/man/io_uring_enter2.2 b/man/io_uring_enter2.2 new file mode 120000 index 0000000000000000000000000000000000000000..5566c093b8ee090f34f90c5c4de13a5b25d59930 --- /dev/null +++ b/man/io_uring_enter2.2 @@ -0,0 +1 @@ +io_uring_enter.2 \ No newline at end of file diff --git a/man/io_uring_free_probe.3 b/man/io_uring_free_probe.3 new file mode 100644 index 0000000000000000000000000000000000000000..960fda39ad6c0b88463a640dc89541c4509624df --- /dev/null +++ b/man/io_uring_free_probe.3 @@ -0,0 +1,27 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_free_probe 3 "January 25, 2022" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_free_probe \- free probe instance +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_free_probe(struct io_uring_probe *" probe ");" +.fi +.SH DESCRIPTION +.PP +The function +.BR io_uring_free_probe (3) +frees the +.I probe +instance allocated with the +.BR io_uring_get_probe (3) +function. 
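.PP
A short usage sketch (error handling kept minimal), pairing it with
.BR io_uring_get_probe (3)
and
.BR io_uring_opcode_supported (3):
.PP
.in +4n
.EX
#include <stdio.h>
#include <liburing.h>

int main(void)
{
    /* Query the running kernel for its supported opcodes. */
    struct io_uring_probe *probe = io_uring_get_probe();

    if (!probe)
        return 1;       /* kernel too old to support probe */
    if (io_uring_opcode_supported(probe, IORING_OP_OPENAT2))
        puts("IORING_OP_OPENAT2 is supported");

    /* Release the allocation made by io_uring_get_probe(3). */
    io_uring_free_probe(probe);
    return 0;
}
.EE
.in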
+ +.SH RETURN VALUE +None +.SH SEE ALSO +.BR io_uring_get_probe (3) diff --git a/man/io_uring_get_events.3 b/man/io_uring_get_events.3 new file mode 100644 index 0000000000000000000000000000000000000000..f2415423953c3c55781e4be8fabe02a0ba4abb6b --- /dev/null +++ b/man/io_uring_get_events.3 @@ -0,0 +1,33 @@ +.\" Copyright (C) 2022 Dylan Yudaken +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_get_events 3 "September 5, 2022" "liburing-2.3" "liburing Manual" +.SH NAME +io_uring_get_events \- Flush outstanding requests to CQE ring +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_get_events(struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_get_events (3) +function runs outstanding work and flushes completion events to the CQE ring. + +There can be events needing to be flushed if the ring was full and had overflowed. +Alternatively if the ring was setup with the +.BR IORING_SETUP_DEFER_TASKRUN +flag then this will process outstanding tasks, possibly resulting in more CQEs. + +.SH RETURN VALUE +On success +.BR io_uring_get_events (3) +returns 0. On failure it returns +.BR -errno . +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit_and_get_events (3), +.BR io_uring_cq_has_overflow (3) diff --git a/man/io_uring_get_probe.3 b/man/io_uring_get_probe.3 new file mode 100644 index 0000000000000000000000000000000000000000..353cc7314f4a6c35cf5ab290b435badd75922e09 --- /dev/null +++ b/man/io_uring_get_probe.3 @@ -0,0 +1,30 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_get_probe 3 "January 25, 2022" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_get_probe \- get probe instance +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "io_uring_probe *io_uring_get_probe(void);" +.fi +.SH DESCRIPTION +.PP +The function +.BR io_uring_get_probe (3) +returns an allocated io_uring_probe structure to the caller. The caller is +responsible for freeing the structure with the function +.BR io_uring_free_probe (3). + +.SH NOTES +Earlier versions of the Linux kernel do not support probe. If the kernel +doesn't support probe, this function will return NULL. + +.SH RETURN VALUE +On success it returns an allocated io_uring_probe structure, otherwise +it returns NULL. +.SH SEE ALSO +.BR io_uring_free_probe (3) diff --git a/man/io_uring_get_sqe.3 b/man/io_uring_get_sqe.3 new file mode 100644 index 0000000000000000000000000000000000000000..b257ebb394829a72f360cd80194ac96079c9c230 --- /dev/null +++ b/man/io_uring_get_sqe.3 @@ -0,0 +1,57 @@ +.\" Copyright (C) 2020 Jens Axboe +.\" Copyright (C) 2020 Red Hat, Inc. +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_get_sqe 3 "July 10, 2020" "liburing-0.7" "liburing Manual" +.SH NAME +io_uring_get_sqe \- get the next available submission queue entry from the +submission queue +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "struct io_uring_sqe *io_uring_get_sqe(struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_get_sqe (3) +function gets the next available submission queue entry from the submission +queue belonging to the +.I ring +param. + +On success +.BR io_uring_get_sqe (3) +returns a pointer to the submission queue entry. On failure NULL is returned. + +If a submission queue entry is returned, it should be filled out via one of the +prep functions such as +.BR io_uring_prep_read (3) +and submitted via +.BR io_uring_submit (3). 
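.PP
A minimal submission sketch, assuming an already initialized ring, an open
file descriptor and a suitably sized buffer supplied by the caller:
.PP
.in +4n
.EX
#include <liburing.h>

/* Queue one read of 'len' bytes from offset 0 of 'fd' into 'buf'. */
static int queue_read(struct io_uring *ring, int fd, void *buf, unsigned len)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    if (!sqe)
        return -1;      /* SQ ring is full, submit and retry later */

    io_uring_prep_read(sqe, fd, buf, len, 0);
    io_uring_sqe_set_data64(sqe, 0x1234);   /* tag for the completion */
    return io_uring_submit(ring);
}
.EE
.in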
+ +Note that neither +.BR io_uring_get_sqe +nor the prep functions set (or clear) the +.B user_data +field of the SQE. If the caller expects +.BR io_uring_cqe_get_data (3) +or +.BR io_uring_cqe_get_data64 (3) +to return valid data when reaping IO completions, either +.BR io_uring_sqe_set_data (3) +or +.BR io_uring_sqe_set_data64 (3) +.B MUST +have been called before submitting the request. + +.SH RETURN VALUE +.BR io_uring_get_sqe (3) +returns a pointer to the next submission queue event on success and NULL on +failure. If NULL is returned, the SQ ring is currently full and entries must +be submitted for processing before new ones can get allocated. +.SH SEE ALSO +.BR io_uring_submit (3), +.BR io_uring_sqe_set_data (3) diff --git a/man/io_uring_opcode_supported.3 b/man/io_uring_opcode_supported.3 new file mode 100644 index 0000000000000000000000000000000000000000..b981ed7d7dc6ca98c8ad80e715973e1dc2723174 --- /dev/null +++ b/man/io_uring_opcode_supported.3 @@ -0,0 +1,30 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_opcode_supported 3 "January 25, 2022" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_opcode_supported \- is op code supported? +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_opcode_supported(struct io_uring_probe *" probe "," +.BI " int " opcode ");" +.fi +.SH DESCRIPTION +.PP +The function +.BR io_uring_opcode_supported (3) +allows the caller to determine if the passed in +.I opcode +belonging to the +.I probe +param is supported. An instance of the io_uring_probe instance can be +obtained by calling the function +.BR io_uring_get_probe (3). + +.SH RETURN VALUE +On success it returns 1, otherwise it returns 0. +.SH SEE ALSO +.BR io_uring_get_probe (3) diff --git a/man/io_uring_peek_cqe.3 b/man/io_uring_peek_cqe.3 new file mode 100644 index 0000000000000000000000000000000000000000..a4ac2da2f9177ec4798e75b0fe240c098d63a0a6 --- /dev/null +++ b/man/io_uring_peek_cqe.3 @@ -0,0 +1,38 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_peek_cqe 3 "March 12, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_peek_cqe \- check if an io_uring completion event is available +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_peek_cqe(struct io_uring *" ring "," +.BI " struct io_uring_cqe **" cqe_ptr ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_peek_cqe (3) +function returns an IO completion from the queue belonging to the +.I ring +param, if one is readily available. On successful return, +.I cqe_ptr +param is filled with a valid CQE entry. + +This function does not enter the kernel to wait for an event, an event +is only returned if it's already available in the CQ ring. + +.SH RETURN VALUE +On success +.BR io_uring_peek_cqe (3) +returns +.B 0 +and the cqe_ptr parameter is filled in. On failure it returns +.BR -EAGAIN . 
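.PP
As an illustration, a sketch of a non-blocking loop (assuming an initialized
ring) that reaps whatever completions are already available:
.PP
.in +4n
.EX
#include <liburing.h>

/* Reap already-available completions without entering the kernel to wait. */
static unsigned drain_ready_cqes(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;
    unsigned reaped = 0;

    while (io_uring_peek_cqe(ring, &cqe) == 0) {
        /* cqe->res and io_uring_cqe_get_data64(cqe) identify the request. */
        io_uring_cqe_seen(ring, cqe);   /* mark the CQE as consumed */
        reaped++;
    }
    return reaped;
}
.EE
.in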
+.SH SEE ALSO +.BR io_uring_submit (3), +.BR io_uring_wait_cqes (3), +.BR io_uring_wait_cqe (3) diff --git a/man/io_uring_prep_accept.3 b/man/io_uring_prep_accept.3 new file mode 100644 index 0000000000000000000000000000000000000000..94edd464ef6cbd692f2e252f2ecec9106bc38db6 --- /dev/null +++ b/man/io_uring_prep_accept.3 @@ -0,0 +1,197 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_accept 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_accept \- prepare an accept request +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_accept(struct io_uring_sqe *" sqe "," +.BI " int " sockfd "," +.BI " struct sockaddr *" addr "," +.BI " socklen_t *" addrlen "," +.BI " int " flags ");" +.PP +.BI "void io_uring_prep_accept_direct(struct io_uring_sqe *" sqe "," +.BI " int " sockfd "," +.BI " struct sockaddr *" addr "," +.BI " socklen_t *" addrlen "," +.BI " int " flags "," +.BI " unsigned int " file_index ");" +.PP +.BI "void io_uring_prep_multishot_accept(struct io_uring_sqe *" sqe "," +.BI " int " sockfd "," +.BI " struct sockaddr *" addr "," +.BI " socklen_t *" addrlen "," +.BI " int " flags ");" +.PP +.BI "void io_uring_prep_multishot_accept_direct(struct io_uring_sqe *" sqe "," +.BI " int " sockfd "," +.BI " struct sockaddr *" addr "," +.BI " socklen_t *" addrlen "," +.BI " int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_accept (3) +function and its three variants prepare an accept request similar to +.BR accept4 (2). +The submission queue entry +.I sqe +is setup to use the file descriptor +.I sockfd +to start accepting a connection request described by the socket address at +.I addr +and of structure length +.I addrlen +and using modifier flags in +.IR flags . + +The three variants allow combining the direct file table and multishot features. + +Direct descriptors are io_uring private file descriptors. They +avoid some of the overhead associated with thread shared file tables and +can be used in any io_uring request that takes a file descriptor. +The two direct variants here create such direct descriptors. +Subsequent to their creation, they can be used by setting +.B IOSQE_FIXED_FILE +in the SQE +.I flags +member, and setting the SQE +.I fd +field to the direct descriptor value rather than the regular file +descriptor. Direct descriptors are managed like registered files. + +To use an accept direct variant, the application must first have registered +a file table of a desired size using +.BR io_uring_register_files (3) +or +.BR io_uring_register_files_sparse (3). +Once registered, +.BR io_uring_prep_accept_direct (3) +allows an entry in that table to be specifically selected through the +.I file_index +argument. +If the specified entry already contains a file, the file will first be removed +from the table and closed, consistent with the behavior of updating an +existing file with +.BR io_uring_register_files_update (3). +.I file_index +can also be set to +.B IORING_FILE_INDEX_ALLOC +for this variant and +an unused table index will be dynamically chosen and returned. +Likewise, +.B io_uring_prep_multishot_accept_direct +will have an unused table index dynamically chosen and returned for each connection accepted. 
+If both forms of direct selection will be employed, specific and dynamic, see +.BR io_uring_register_file_alloc_range (3) +for setting up the table so dynamically chosen entries are made against +a different range than that targetted by specific requests. + +Note that old kernels don't check the SQE +.I file_index +field meaning +applications cannot rely on a +.B -EINVAL +CQE +.I res +being returned when the kernel is too old because older kernels +may not recognize they are being asked to use a direct table slot. + +When a direct descriptor accept request asks for a table slot to be +dynamically chosen but there are no free entries, +.B -ENFILE +is returned as the CQE +.IR res . + +The multishot variants allow an application to issue +a single accept request, which will repeatedly trigger a CQE when a connection +request comes in. Like other multishot type requests, the application should +look at the CQE +.I flags +and see if +.B IORING_CQE_F_MORE +is set on completion as an indication of whether or not the accept request +will generate further CQEs. Note that for the multishot variants, setting +.B addr +and +.B addrlen +may not make a lot of sense, as the same value would be used for every +accepted connection. This means that the data written to +.B addr +may be overwritten by a new connection before the application has had time +to process a past connection. If the application knows that a new connection +cannot come in before a previous one has been processed, it may be used as +expected. The multishot variants are available since 5.19. + +See the man page +.BR accept4 (2) +for details of the accept function itself. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. + +.BR io_uring_prep_accept (3) +generates the installed file descriptor as its result. + +.BR io_uring_prep_accept_direct (3) +and +.I file_index +set to a specific direct descriptor +generates +.B 0 +on success. +The caller must remember which direct descriptor was picked for this request. + +.BR io_uring_prep_accept_direct (3) +and +.I file_index +set to +.B IORING_FILE_INDEX_ALLOC +generates the dynamically chosen direct descriptor. + +.BR io_uring_prep_multishot_accept (3) +generates the installed file descriptor in each result. + +.BR io_uring_prep_multishot_accept_direct (3), +generates the dynamically chosen direct descriptor in each result. + +Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it generates the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). 
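.PP
A sketch of the multishot flow, assuming an initialized ring, a listening
socket and a hypothetical application callback named
.I handle_connection
(error handling trimmed):
.PP
.in +4n
.EX
#include <liburing.h>

void handle_connection(int connfd);     /* application-defined */

/* Arm one multishot accept and reap connections as they arrive. */
void accept_loop(struct io_uring *ring, int listen_fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    struct io_uring_cqe *cqe;

    io_uring_prep_multishot_accept(sqe, listen_fd, NULL, NULL, 0);
    io_uring_submit(ring);

    for (;;) {
        if (io_uring_wait_cqe(ring, &cqe))
            break;
        /* On success, res holds the new connection's descriptor. */
        int connfd = cqe->res;
        int more = cqe->flags & IORING_CQE_F_MORE;

        io_uring_cqe_seen(ring, cqe);
        if (connfd >= 0)
            handle_connection(connfd);
        if (!more)
            break;      /* accept stopped and would need re-arming */
    }
}
.EE
.in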
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_register_files (3), +.BR io_uring_register_files_sparse (3), +.BR io_uring_register_file_alloc_range (3), +.BR io_uring_register (2), +.BR accept4 (2) diff --git a/man/io_uring_prep_accept_direct.3 b/man/io_uring_prep_accept_direct.3 new file mode 120000 index 0000000000000000000000000000000000000000..0404bf59f71a89d48a21cef15590ff558a668023 --- /dev/null +++ b/man/io_uring_prep_accept_direct.3 @@ -0,0 +1 @@ +io_uring_prep_accept.3 \ No newline at end of file diff --git a/man/io_uring_prep_cancel.3 b/man/io_uring_prep_cancel.3 new file mode 100644 index 0000000000000000000000000000000000000000..3c9f2df244fd3b6f7e1a95fb9dd3207f48878dbb --- /dev/null +++ b/man/io_uring_prep_cancel.3 @@ -0,0 +1,118 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_cancel 3 "March 12, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_cancel \- prepare a cancelation request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_cancel64(struct io_uring_sqe *" sqe "," +.BI " __u64 " user_data "," +.BI " int " flags ");" +.PP +.BI "void io_uring_prep_cancel(struct io_uring_sqe *" sqe "," +.BI " void *" user_data "," +.BI " int " flags ");" +.PP +.BI "void io_uring_prep_cancel_fd(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_cancel (3) +function prepares a cancelation request. The submission queue entry +.I sqe +is prepared to cancel an existing request identified by +.IR user_data . +For the +.I flags +argument, see below. + +.BR io_uring_prep_cancel64 (3) +is identical to +.BR io_uring_prep_cancel (3) , +except it takes a 64-bit integer rather than a pointer type. + +The cancelation request will attempt to find the previously issued request +identified by +.I user_data +and cancel it. The identifier is what the previously issued request has in +their +.I user_data +field in the SQE. + +The +.BR io_uring_prep_cancel_fd (3) +function prepares a cancelation request. The submission queue entry +.I sqe +is prepared to cancel an existing request that used the file descriptor +.IR fd . +For the +.I flags +argument, see below. + +The cancelation request will attempt to find the previously issued request +that used +.I fd +as the file descriptor and cancel it. + +By default, the first request matching the criteria given will be canceled. +This can be modified with any of the following flags passed in: +.TP +.B IORING_ASYNC_CANCEL_ALL +Cancel all requests that match the given criteria, rather than just canceling +the first one found. Available since 5.19. +.TP +.B IORING_ASYNC_CANCEL_FD +Match based on the file descriptor used in the original request rather than +the user_data. This is what +.BR io_uring_prep_cancel_fd (3) +sets up. Available since 5.19. +.TP +.B IORING_ASYNC_CANCEL_ANY +Match any request in the ring, regardless of user_data or file descriptor. +Can be used to cancel any pending request in the ring. Available since 5.19. +.P + +.SH RETURN VALUE +None +.SH ERRORS +These are the errors that are reported in the CQE +.I res +field. If no flags are used to cancel multiple requests, +.B 0 +is returned on success. If flags are used to match multiple requests, then +a positive value is returned indicating how many requests were found and +canceled. +.TP +.B -ENOENT +The request identified by +.I user_data +could not be located. 
This could be because it completed before the cancelation +request was issued, or if an invalid identifier is used. +.TP +.B -EINVAL +One of the fields set in the SQE was invalid. +.TP +.B -EALREADY +The execution state of the request has progressed far enough that cancelation +is no longer possible. This should normally mean that it will complete shortly, +either successfully, or interrupted due to the cancelation. +.SH NOTES +Although the cancelation request uses async request syntax, the kernel side of +the cancelation is always run synchronously. It is guaranteed that a CQE is +always generated by the time the cancel request has been submitted. If the +cancelation is successful, the completion for the request targeted for +cancelation will have been posted by the time submission returns. For +.B -EALREADY +it may take a bit of time to do so. For this case, the caller must wait for the +canceled request to post its completion event. +.SH SEE ALSO +.BR io_uring_prep_poll_remove (3), +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3) diff --git a/man/io_uring_prep_cancel64.3 b/man/io_uring_prep_cancel64.3 new file mode 120000 index 0000000000000000000000000000000000000000..347db090d1186ccbca6989242927caa4cb029703 --- /dev/null +++ b/man/io_uring_prep_cancel64.3 @@ -0,0 +1 @@ +io_uring_prep_cancel.3 \ No newline at end of file diff --git a/man/io_uring_prep_close.3 b/man/io_uring_prep_close.3 new file mode 100644 index 0000000000000000000000000000000000000000..94780f2c430d252df25c6df48f8fcc62e1003812 --- /dev/null +++ b/man/io_uring_prep_close.3 @@ -0,0 +1,59 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_close 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_close \- prepare a file descriptor close request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_close(struct io_uring_sqe *" sqe "," +.BI " int " fd ");" +.PP +.BI "void io_uring_prep_close_direct(struct io_uring_sqe *" sqe "," +.BI " unsigned " file_index ");" +.PP +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_close (3) +function prepares a close request. The submission queue entry +.I sqe +is setup to close the file descriptor indicated by +.IR fd . + +For a direct descriptor close request, the offset is specified by the +.I file_index +argument instead of the +.IR fd . +This is identical to unregistering the direct descriptor, and is provided as +a convenience. + +These functions prepare an async +.BR close (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. 
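.PP
For instance, a sketch (assuming an initialized ring with a registered file
table) that closes the direct descriptor held in a given fixed-file slot:
.PP
.in +4n
.EX
#include <liburing.h>

/* Close the direct descriptor stored in fixed file table slot 'index'. */
static int close_direct_slot(struct io_uring *ring, unsigned index)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    if (!sqe)
        return -1;
    io_uring_prep_close_direct(sqe, index);
    return io_uring_submit(ring);
}
.EE
.in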
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR close (2) diff --git a/man/io_uring_prep_close_direct.3 b/man/io_uring_prep_close_direct.3 new file mode 120000 index 0000000000000000000000000000000000000000..d9ce6a60fa9622ca0b57edb8a79503d190f66322 --- /dev/null +++ b/man/io_uring_prep_close_direct.3 @@ -0,0 +1 @@ +io_uring_prep_close.3 \ No newline at end of file diff --git a/man/io_uring_prep_connect.3 b/man/io_uring_prep_connect.3 new file mode 100644 index 0000000000000000000000000000000000000000..6a7c64a6da7e94e12bf783ea42c36def4007cb05 --- /dev/null +++ b/man/io_uring_prep_connect.3 @@ -0,0 +1,66 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_connect 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_connect \- prepare a connect request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_connect(struct io_uring_sqe *" sqe "," +.BI " int " sockfd "," +.BI " const struct sockaddr *" addr "," +.BI " socklen_t " addrlen ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_connect (3) +function prepares a connect request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I sockfd +to start connecting to the destination described by the socket address at +.I addr +and of structure length +.IR addrlen . + +This function prepares an async +.BR connect (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR connect (2) diff --git a/man/io_uring_prep_fadvise.3 b/man/io_uring_prep_fadvise.3 new file mode 100644 index 0000000000000000000000000000000000000000..a53ab255156b2275f5c7e8a0844f027743278a92 --- /dev/null +++ b/man/io_uring_prep_fadvise.3 @@ -0,0 +1,59 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_fadvise 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_fadvise \- prepare a fadvise request +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_fadvise(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " __u64 " offset "," +.BI " off_t " len "," +.BI " int " advice ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_fadvise (3) +function prepares an fadvise request. The submission queue entry +.I sqe +is setup to use the file descriptor pointed to by +.I fd +to start an fadvise operation at +.I offset +and of +.I len +length in bytes, giving it the advise located in +.IR advice . + +This function prepares an async +.BR posix_fadvise (2) +request. 
See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_register (2), +.BR posix_fadvise (2) diff --git a/man/io_uring_prep_fallocate.3 b/man/io_uring_prep_fallocate.3 new file mode 100644 index 0000000000000000000000000000000000000000..86e1d395fcc81bcf903cb1dd9c6dfd3a9b04b724 --- /dev/null +++ b/man/io_uring_prep_fallocate.3 @@ -0,0 +1,59 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_fallocate 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_fallocate \- prepare a fallocate request +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_fallocate(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " int " mode "," +.BI " off_t " offset "," +.BI " off_t " len ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_fallocate (3) +function prepares a fallocate request. The submission queue entry +.I sqe +is setup to use the file descriptor pointed to by +.I fd +to start a fallocate operation described by +.I mode +at offset +.I offset +and +.I len +length in bytes. + +This function prepares an async +.BR fallocate (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR fallocate (2) diff --git a/man/io_uring_prep_files_update.3 b/man/io_uring_prep_files_update.3 new file mode 100644 index 0000000000000000000000000000000000000000..bedb85e0debf53e395fa44a7830811af5677cbe2 --- /dev/null +++ b/man/io_uring_prep_files_update.3 @@ -0,0 +1,92 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_files_update 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_files_update \- prepare a registered file update request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_files_update(struct io_uring_sqe *" sqe "," +.BI " int *" fds "," +.BI " unsigned " nr_fds "," +.BI " int " offset ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_files_update (3) +function prepares a request for updating a number of previously registered file +descriptors. The submission queue entry +.I sqe +is setup to use the file descriptor array pointed to by +.I fds +and of +.I nr_fds +in length to update that amount of previously registered files starting at +offset +.IR offset . + +Once a previously registered file is updated with a new one, the existing +entry is updated and then removed from the table. This operation is equivalent to +first unregistering that entry and then inserting a new one, just bundled into +one combined operation. 
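.PP
For example, the following sketch (assuming an initialized ring with a
previously registered file table, and two freshly opened descriptors in
.IR fds )
replaces the two registered slots starting at offset 4:
.PP
.in +4n
.EX
#include <liburing.h>

/* Swap in 'fds[0]' and 'fds[1]' at registered file offsets 4 and 5. */
static int update_two_slots(struct io_uring *ring, int fds[2])
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    if (!sqe)
        return -1;
    /* 'fds' must remain valid until the request has been submitted. */
    io_uring_prep_files_update(sqe, fds, 2, 4);
    return io_uring_submit(ring);
}
.EE
.in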
+ +If +.I offset +is specified as IORING_FILE_INDEX_ALLOC, io_uring will allocate free direct +descriptors instead of having the application to pass, and store allocated +direct descriptors into +.I fds +array, +.I cqe->res +will return the number of direct descriptors allocated. + +.SH RETURN VALUE +None +.SH ERRORS +These are the errors that are reported in the CQE +.I res +field. On success, +.I res +will contain the number of successfully updated file descriptors. On error, +the following errors can occur. +.TP +.B -ENOMEM +The kernel was unable to allocate memory for the request. +.TP +.B -EINVAL +One of the fields set in the SQE was invalid. +.TP +.B -EFAULT +The kernel was unable to copy in the memory pointed to by +.IR fds . +.TP +.B -EBADF +On of the descriptors located in +.I fds +didn't refer to a valid file descriptor, or one of the file descriptors in +the array referred to an io_uring instance. +.TP +.B -EOVERFLOW +The product of +.I offset +and +.I nr_fds +exceed the valid amount or overflowed. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_register (2) diff --git a/man/io_uring_prep_fsync.3 b/man/io_uring_prep_fsync.3 new file mode 100644 index 0000000000000000000000000000000000000000..a3259a0cdef88cbb2a1c9bee2873595926b98bf4 --- /dev/null +++ b/man/io_uring_prep_fsync.3 @@ -0,0 +1,70 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_fsync 3 "March 12, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_fsync \- prepare an fsync request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_fsync(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " unsigned " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_fsync (3) +function prepares an fsync request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I fd +that should get synced, with the modifier flags indicated by the +.I flags +argument. + +This function prepares an fsync request. It can act either like an +.BR fsync (2) +operation, which is the default behavior. If +.B IORING_FSYNC_DATASYNC +is set in the +.I flags +argument, then it behaves like +.BR fdatasync (2). +If no range is specified, the +.I fd +will be synced from 0 to end-of-file. + +It's possible to specify a range to sync, if one is desired. If the +.I off +field of the SQE is set to non-zero, then that indicates the offset to +start syncing at. If +.I len +is set in the SQE, then that indicates the size in bytes to sync from the +offset. Note that these fields are not accepted by this helper, so they have +to be set manually in the SQE after calling this prep helper. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . 
+Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR fsync (2), +.BR fdatasync (2) diff --git a/man/io_uring_prep_link.3 b/man/io_uring_prep_link.3 new file mode 120000 index 0000000000000000000000000000000000000000..6d3059de30f4a4678a1a3e9d798baffaf9cfa257 --- /dev/null +++ b/man/io_uring_prep_link.3 @@ -0,0 +1 @@ +io_uring_prep_linkat.3 \ No newline at end of file diff --git a/man/io_uring_prep_linkat.3 b/man/io_uring_prep_linkat.3 new file mode 100644 index 0000000000000000000000000000000000000000..0949e3b42fb55ebc30effff93eb221ec3829ee37 --- /dev/null +++ b/man/io_uring_prep_linkat.3 @@ -0,0 +1,91 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_linkat 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_linkat \- prepare a linkat request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_linkat(struct io_uring_sqe *" sqe "," +.BI " int " olddirfd "," +.BI " const char *" oldpath "," +.BI " int " newdirfd "," +.BI " const char *" newpath "," +.BI " int " flags ");" +.PP +.BI "void io_uring_prep_link(struct io_uring_sqe *" sqe "," +.BI " const char *" oldpath "," +.BI " const char *" newpath "," +.BI " int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_linkat (3) +function prepares a linkat request. The submission queue entry +.I sqe +is setup to use the old directory file descriptor pointed to by +.I olddirfd +and old path pointed to by +.I oldpath +with the new directory file descriptor pointed to by +.I newdirfd +and the new path pointed to by +.I newpath +and using the specified flags in +.IR flags . + +The +.BR io_uring_prep_link (3) +function prepares a link request. The submission queue entry +.I sqe +is setup to use the old path pointed to by +.I oldpath +and the new path pointed to by +.IR newpath , +both relative to the current working directory and using the specified flags in +.IR flags . + +These functions prepare an async +.BR linkat (2) +or +.BR link (2) +request. See those man pages for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). 
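+.SH EXAMPLE
+The following is only an illustrative sketch, with placeholder path names,
+assuming an already initialized ring named
+.IR ring .
+It uses
+.B AT_FDCWD
+so both paths are resolved relative to the current working directory:
+.PP
+.nf
+#include <fcntl.h>
+#include <liburing.h>
+
+/* Queue "newname" as a hard link to "oldname" and wait for the result. */
+static int link_file(struct io_uring *ring)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int ret;
+
+    io_uring_prep_linkat(sqe, AT_FDCWD, "oldname", AT_FDCWD, "newname", 0);
+
+    ret = io_uring_submit(ring);
+    if (ret < 0)
+        return ret;
+    ret = io_uring_wait_cqe(ring, &cqe);
+    if (ret < 0)
+        return ret;
+    ret = cqe->res;            /* 0 on success, -errno on error */
+    io_uring_cqe_seen(ring, cqe);
+    return ret;
+}
+.fi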
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR linkat (2), +.BR link (2) diff --git a/man/io_uring_prep_madvise.3 b/man/io_uring_prep_madvise.3 new file mode 100644 index 0000000000000000000000000000000000000000..6c5f16bc150b7b8af4e40100b5ec0681c3f3dcd5 --- /dev/null +++ b/man/io_uring_prep_madvise.3 @@ -0,0 +1,56 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_madvise 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_madvise \- prepare a madvise request +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_madvise(struct io_uring_sqe *" sqe "," +.BI " void *" addr "," +.BI " off_t " len "," +.BI " int " advice ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_madvise (3) +function prepares an madvise request. The submission queue entry +.I sqe +is setup to start an madvise operation at the virtual address of +.I addr +and of +.I len +length in bytes, giving it the advise located in +.IR advice . + +This function prepares an async +.BR madvise (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_register (2), +.BR madvise (2) diff --git a/man/io_uring_prep_mkdir.3 b/man/io_uring_prep_mkdir.3 new file mode 120000 index 0000000000000000000000000000000000000000..b3412d1d2c41bd9ed1b932974d95325f65a028da --- /dev/null +++ b/man/io_uring_prep_mkdir.3 @@ -0,0 +1 @@ +io_uring_prep_mkdirat.3 \ No newline at end of file diff --git a/man/io_uring_prep_mkdirat.3 b/man/io_uring_prep_mkdirat.3 new file mode 100644 index 0000000000000000000000000000000000000000..a98b4e354078ffef3a94e0d737390748d211cf4b --- /dev/null +++ b/man/io_uring_prep_mkdirat.3 @@ -0,0 +1,83 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_mkdirat 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_mkdirat \- prepare an mkdirat request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_mkdirat(struct io_uring_sqe *" sqe "," +.BI " int " dirfd "," +.BI " const char *" path "," +.BI " mode_t " mode ");" +.PP +.BI "void io_uring_prep_mkdir(struct io_uring_sqe *" sqe "," +.BI " const char *" path "," +.BI " mode_t " mode ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_mkdirat (3) +function prepares a mkdirat request. The submission queue entry +.I sqe +is setup to use the directory file descriptor pointed to by +.I dirfd +to start a mkdirat operation on the path identified by +.I path +with the mode given in +.IR mode . + +The +.BR io_uring_prep_mkdir (3) +function prepares a mkdir request. The submission queue entry +.I sqe +is setup to use the current working directory to start a mkdir +operation on the path identified by +.I path +with the mode given in +.IR mode . + +These functions prepare an async +.BR mkdir (2) +or +.BR mkdirat (2) +request. See those man pages for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. 
See the related man page for
+details on possible values. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
+.SH NOTES
+As with any request that passes in data in a struct, that data must remain
+valid until the request has been successfully submitted. It need not remain
+valid until completion. Once a request has been submitted, the in-kernel
+state is stable. Very early kernels (5.4 and earlier) required state to be
+stable until the completion occurred. Applications can test for this
+behavior by inspecting the
+.B IORING_FEAT_SUBMIT_STABLE
+flag passed back from
+.BR io_uring_queue_init_params (3).
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR mkdirat (2),
+.BR mkdir (2)
diff --git a/man/io_uring_prep_msg_ring.3 b/man/io_uring_prep_msg_ring.3
new file mode 100644
index 0000000000000000000000000000000000000000..9cf3444d86de851ee64b81a127511e165b97b682
--- /dev/null
+++ b/man/io_uring_prep_msg_ring.3
@@ -0,0 +1,72 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_msg_ring 3 "March 10, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_prep_msg_ring \- send a message to another ring
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_prep_msg_ring(struct io_uring_sqe *" sqe ","
+.BI "                            int " fd ","
+.BI "                            unsigned int " len ","
+.BI "                            __u64 " data ","
+.BI "                            unsigned int " flags ");"
+.fi
+.SH DESCRIPTION
+.PP
+.BR io_uring_prep_msg_ring (3)
+prepares a request to send a CQE to an io_uring file descriptor. The submission
+queue entry
+.I sqe
+is setup to use the file descriptor
+.IR fd ,
+which must identify an io_uring context, to post a CQE on that ring where the
+target CQE
+.B res
+field will contain the value of
+.I len
+and the
+.B user_data
+field the value of
+.I data
+with the request modifier flags set by
+.IR flags .
+Currently there are no valid flag modifiers; this field must contain
+.BR 0 .
+
+The targeted ring may be any ring that the user has access to, even the ring
+itself. This request can be used for simple message passing to another ring,
+allowing 32+64 bits of data to be transferred through the
+.I len
+and
+.I data
+fields. The use case may be anything from simply waking up someone waiting
+on the targeted ring, to passing messages between the two rings.
+
+.SH RETURN VALUE
+None
+
+.SH ERRORS
+These are the errors that are reported in the CQE
+.I res
+field.
+.TP
+.B -ENOMEM
+The kernel was unable to allocate memory for the request.
+.TP
+.B -EINVAL
+One of the fields set in the SQE was invalid.
+.TP
+.B -EBADFD
+The descriptor passed in
+.I fd
+does not refer to an io_uring file descriptor.
+.TP
+.B -EOVERFLOW
+The kernel was unable to fill a CQE on the target ring. This can happen if
+the target CQ ring is in an overflow state and the kernel wasn't able to
+allocate memory for a new CQE entry.
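+.SH EXAMPLE
+The following is only an illustrative sketch. It assumes two rings that have
+already been set up, and posts a CQE carrying the value 42 in
+.I res
+and 0x1234 in
+.I user_data
+onto the target ring:
+.PP
+.nf
+#include <liburing.h>
+
+static int wake_other_ring(struct io_uring *src, struct io_uring *dst)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(src);
+
+    /* len -> target cqe->res, data -> target cqe->user_data, flags must be 0 */
+    io_uring_prep_msg_ring(sqe, dst->ring_fd, 42, 0x1234, 0);
+    return io_uring_submit(src);
+}
+.fi
+.PP
+The receiving side simply reaps the posted CQE from its own completion queue,
+for example with
+.BR io_uring_wait_cqe (3).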
diff --git a/man/io_uring_prep_multishot_accept.3 b/man/io_uring_prep_multishot_accept.3 new file mode 120000 index 0000000000000000000000000000000000000000..0404bf59f71a89d48a21cef15590ff558a668023 --- /dev/null +++ b/man/io_uring_prep_multishot_accept.3 @@ -0,0 +1 @@ +io_uring_prep_accept.3 \ No newline at end of file diff --git a/man/io_uring_prep_multishot_accept_direct.3 b/man/io_uring_prep_multishot_accept_direct.3 new file mode 120000 index 0000000000000000000000000000000000000000..0404bf59f71a89d48a21cef15590ff558a668023 --- /dev/null +++ b/man/io_uring_prep_multishot_accept_direct.3 @@ -0,0 +1 @@ +io_uring_prep_accept.3 \ No newline at end of file diff --git a/man/io_uring_prep_nop.3 b/man/io_uring_prep_nop.3 new file mode 100644 index 0000000000000000000000000000000000000000..81853d776391a2e42c74653ccd118543a3bf1f13 --- /dev/null +++ b/man/io_uring_prep_nop.3 @@ -0,0 +1,28 @@ +.\" Copyright (C) 2022 Samuel Williams +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_nop 3 "October 20, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_nop \- prepare a nop request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_nop(struct io_uring_sqe *" sqe ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_nop (3) +function prepares nop (no operation) request. The submission queue entry +.I sqe +does not require any additional setup. + +.SH RETURN VALUE +None +.SH ERRORS +None +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), diff --git a/man/io_uring_prep_openat.3 b/man/io_uring_prep_openat.3 new file mode 100644 index 0000000000000000000000000000000000000000..e8b4217d1a0f20c23e6c29ae76a02f9de3aab659 --- /dev/null +++ b/man/io_uring_prep_openat.3 @@ -0,0 +1,117 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_openat 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_openat \- prepare an openat request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_openat(struct io_uring_sqe *" sqe "," +.BI " int " dfd "," +.BI " const char *" path "," +.BI " int " flags "," +.BI " mode_t " mode ");" +.PP +.BI "void io_uring_prep_openat_direct(struct io_uring_sqe *" sqe "," +.BI " int " dfd "," +.BI " const char *" path "," +.BI " int " flags "," +.BI " mode_t " mode "," +.BI " unsigned " file_index ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_openat (3) +function prepares an openat request. The submission queue entry +.I sqe +is setup to use the directory file descriptor +.I dfd +to start opening a file described by +.I path +and using the open flags in +.I flags +and using the file mode bits specified in +.IR mode . + +For a direct descriptor open request, the offset is specified by the +.I file_index +argument. Direct descriptors are io_uring private file descriptors. They +avoid some of the overhead associated with thread shared file tables, and +can be used in any io_uring request that takes a file descriptor. To do so, +.B IOSQE_FIXED_FILE +must be set in the SQE +.I flags +member, and the SQE +.I fd +field should use the direct descriptor value rather than the regular file +descriptor. Direct descriptors are managed like registered files. + +If the direct variant is used, the application must first have registered +a file table using +.BR io_uring_register_files (3) +of the appropriate size. 
Once registered, a direct accept request may use any +entry in that table, as long as it is within the size of the registered table. +If a specified entry already contains a file, the file will first be removed +from the table and closed. It's consistent with the behavior of updating an +existing file with +.BR io_uring_register_files_update (3). +Note that old kernels don't check the SQE +.I file_index +field, which is not a problem for liburing helpers, but users of the raw +io_uring interface need to zero SQEs to avoid unexpected behavior. + +If +.B IORING_FILE_INDEX_ALLOC +is used as the +.I file_index +for a direct open, then io_uring will allocate a free direct descriptor in +the existing table. The allocated descriptor is returned in the CQE +.I res +field just like it would be for a non-direct open request. If no more entries +are available in the direct descriptor table, +.B -ENFILE +is returned instead. + +These functions prepare an async +.BR openat (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_register (2), +.BR openat (2) diff --git a/man/io_uring_prep_openat2.3 b/man/io_uring_prep_openat2.3 new file mode 100644 index 0000000000000000000000000000000000000000..338cf7eaae3fa7002099f363721651c6ba284a17 --- /dev/null +++ b/man/io_uring_prep_openat2.3 @@ -0,0 +1,117 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_openat2 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_openat2 \- prepare an openat2 request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_openat2(struct io_uring_sqe *" sqe "," +.BI " int " dfd "," +.BI " const char *" path "," +.BI " int " flags "," +.BI " struct open_how *" how ");" +.PP +.BI "void io_uring_prep_openat2_direct(struct io_uring_sqe *" sqe "," +.BI " int " dfd "," +.BI " const char *" path "," +.BI " int " flags "," +.BI " struct open_how *" how "," +.BI " unsigned " file_index ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_openat2 (3) +function prepares an openat2 request. The submission queue entry +.I sqe +is setup to use the directory file descriptor +.I dfd +to start opening a file described by +.I path +and using the open flags in +.I flags +and using the instructions on how to open the file given in +.IR how . + +For a direct descriptor open request, the offset is specified by the +.I file_index +argument. Direct descriptors are io_uring private file descriptors. 
They +avoid some of the overhead associated with thread shared file tables, and +can be used in any io_uring request that takes a file descriptor. To do so, +.B IOSQE_FIXED_FILE +must be set in the SQE +.I flags +member, and the SQE +.I fd +field should use the direct descriptor value rather than the regular file +descriptor. Direct descriptors are managed like registered files. + +If the direct variant is used, the application must first have registered +a file table using +.BR io_uring_register_files (3) +of the appropriate size. Once registered, a direct accept request may use any +entry in that table, as long as it is within the size of the registered table. +If a specified entry already contains a file, the file will first be removed +from the table and closed. It's consistent with the behavior of updating an +existing file with +.BR io_uring_register_files_update (3). +Note that old kernels don't check the SQE +.I file_index +field, which is not a problem for liburing helpers, but users of the raw +io_uring interface need to zero SQEs to avoid unexpected behavior. +If +.B IORING_FILE_INDEX_ALLOC +is used as the +.I file_index +for a direct open, then io_uring will allocate a free direct descriptor in +the existing table. The allocated descriptor is returned in the CQE +.I res +field just like it would be for a non-direct open request. If no more entries +are available in the direct descriptor table, +.B -ENFILE +is returned instead. + +These functions prepare an async +.BR openat2 (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). 
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR io_uring_register (2),
+.BR openat2 (2)
diff --git a/man/io_uring_prep_openat2_direct.3 b/man/io_uring_prep_openat2_direct.3
new file mode 120000
index 0000000000000000000000000000000000000000..2c0e6c9c3de4a1e40516750c21e12e1c5b67bea2
--- /dev/null
+++ b/man/io_uring_prep_openat2_direct.3
@@ -0,0 +1 @@
+io_uring_prep_openat2.3
\ No newline at end of file
diff --git a/man/io_uring_prep_openat_direct.3 b/man/io_uring_prep_openat_direct.3
new file mode 120000
index 0000000000000000000000000000000000000000..67f501e5ced1424be42ac8e5f0d3bb446460d7f2
--- /dev/null
+++ b/man/io_uring_prep_openat_direct.3
@@ -0,0 +1 @@
+io_uring_prep_openat.3
\ No newline at end of file
diff --git a/man/io_uring_prep_poll_add.3 b/man/io_uring_prep_poll_add.3
new file mode 100644
index 0000000000000000000000000000000000000000..cb6087857dafde44d8529a36570bf770a02e287d
--- /dev/null
+++ b/man/io_uring_prep_poll_add.3
@@ -0,0 +1,72 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_poll_add 3 "March 12, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_prep_poll_add \- prepare a poll request
+.SH SYNOPSIS
+.nf
+.B #include <poll.h>
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_prep_poll_add(struct io_uring_sqe *" sqe ","
+.BI "                            int " fd ","
+.BI "                            unsigned " poll_mask ");"
+.PP
+.BI "void io_uring_prep_poll_multishot(struct io_uring_sqe *" sqe ","
+.BI "                                  int " fd ","
+.BI "                                  unsigned " poll_mask ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_poll_add (3)
+function prepares a poll request. The submission queue entry
+.I sqe
+is setup to use the file descriptor
+.I fd
+that should get polled, with the events desired specified in the
+.I poll_mask
+argument.
+
+The default behavior is a single-shot poll request. When the specified event
+has triggered, a completion CQE is posted and no more events will be generated
+by the poll request.
+.BR io_uring_prep_poll_multishot (3)
+behaves identically in terms of events, but it persists across notifications
+and will repeatedly post notifications for the same registration. A CQE
+posted from a multishot poll request will have
+.B IORING_CQE_F_MORE
+set in the CQE
+.I flags
+member, indicating that the application should expect more completions from
+this request. If the multishot poll request gets terminated or experiences
+an error, this flag will not be set in the CQE. If this happens, the application
+should not expect further CQEs from the original request and must reissue a
+new one if it still wishes to get notifications on this file descriptor.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+The CQE
+.I res
+field will contain the result of the operation, which is a bitmask of the
+events notified. See the
+.BR poll (2)
+man page for details. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
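+.SH EXAMPLE
+As a minimal sketch, assuming an already initialized ring named
+.IR ring ,
+the following arms a single-shot poll for readability on a descriptor and
+waits for it to trigger:
+.PP
+.nf
+#include <poll.h>
+#include <liburing.h>
+
+static int wait_readable(struct io_uring *ring, int fd)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int ret;
+
+    io_uring_prep_poll_add(sqe, fd, POLLIN);
+
+    ret = io_uring_submit(ring);
+    if (ret < 0)
+        return ret;
+    ret = io_uring_wait_cqe(ring, &cqe);
+    if (ret < 0)
+        return ret;
+    ret = cqe->res;        /* mask of triggered events, or -errno */
+    io_uring_cqe_seen(ring, cqe);
+    return ret;
+}
+.fi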
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR poll (2),
+.BR epoll_ctl (2)
diff --git a/man/io_uring_prep_poll_multishot.3 b/man/io_uring_prep_poll_multishot.3
new file mode 120000
index 0000000000000000000000000000000000000000..ac8fb8fdb468402149081a9f430d10492cfa6156
--- /dev/null
+++ b/man/io_uring_prep_poll_multishot.3
@@ -0,0 +1 @@
+io_uring_prep_poll_add.3
\ No newline at end of file
diff --git a/man/io_uring_prep_poll_remove.3 b/man/io_uring_prep_poll_remove.3
new file mode 100644
index 0000000000000000000000000000000000000000..b6f4b2637240b92be8836a17e2e83d22a976cfdd
--- /dev/null
+++ b/man/io_uring_prep_poll_remove.3
@@ -0,0 +1,55 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_poll_remove 3 "March 12, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_prep_poll_remove \- prepare a poll deletion request
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_prep_poll_remove(struct io_uring_sqe *" sqe ","
+.BI "                               __u64 " user_data ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_poll_remove (3)
+function prepares a poll removal request. The submission queue entry
+.I sqe
+is setup to remove a poll request identified by
+.IR user_data .
+
+This request works like
+.BR io_uring_prep_cancel (3)
+except it only looks for poll requests. Apart from that, the behavior is
+identical. See that man page for specific details.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+These are the errors that are reported in the CQE
+.I res
+field. On success,
+.B 0
+is returned.
+.TP
+.B -ENOENT
+The request identified by
+.I user_data
+could not be located. This could be because it completed before the cancelation
+request was issued, or because an invalid identifier was used.
+.TP
+.B -EINVAL
+One of the fields set in the SQE was invalid.
+.TP
+.B -EALREADY
+The execution state of the request has progressed far enough that cancelation
+is no longer possible. This should normally mean that it will complete shortly,
+either successfully, or interrupted due to the cancelation.
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR io_uring_prep_cancel (3)
diff --git a/man/io_uring_prep_poll_update.3 b/man/io_uring_prep_poll_update.3
new file mode 100644
index 0000000000000000000000000000000000000000..11f6346718d8114e981e771d4a7a7498f53bb2b1
--- /dev/null
+++ b/man/io_uring_prep_poll_update.3
@@ -0,0 +1,89 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_poll_update 3 "March 12, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_prep_poll_update \- update an existing poll request
+.SH SYNOPSIS
+.nf
+.B #include <poll.h>
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_prep_poll_update(struct io_uring_sqe *" sqe ","
+.BI "                               __u64 " old_user_data ","
+.BI "                               __u64 " new_user_data ","
+.BI "                               unsigned " poll_mask ","
+.BI "                               unsigned " flags ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_poll_update (3)
+function prepares a poll update request. The submission queue entry
+.I sqe
+is setup to update a poll request identified by
+.IR old_user_data ,
+replacing it with the
+.I new_user_data
+information. The
+.I poll_mask
+argument contains the new mask to use for the poll request, and the
+.I flags
+argument contains modifier flags telling io_uring what fields to update.
+ +The +.I flags +modifier flags is a bitmask and may contain and OR'ed mask of: +.TP +.B IORING_POLL_UPDATE_EVENTS +If set, the poll update request will replace the existing events being waited +for with the ones specified in the +.I poll_mask +argument to the function. +.TP +.B IORING_POLL_UPDATE_USER_DATA +If set, the poll update request will update the existing user_data of the +request with the value passed in as the +.I new_user_data +argument. +.TP +.B IORING_POLL_ADD_MULTI +If set, this will change the poll request from a singleshot to a multishot +request. This must be used along with +.B IORING_POLL_UPDATE_EVENTS +as the event field must be updated to enable multishot. + +.SH RETURN VALUE +None +.SH ERRORS +These are the errors that are reported in the CQE +.I res +field. On success, +.B 0 +is returned. +.TP +.B -ENOENT +The request identified by +.I user_data +could not be located. This could be because it completed before the cancelation +request was issued, or if an invalid identifier is used. +.TP +.B -EINVAL +One of the fields set in the SQE was invalid. +.TP +.B -EALREADY +The execution state of the request has progressed far enough that cancelation +is no longer possible. This should normally mean that it will complete shortly, +either successfully, or interrupted due to the cancelation. +.TP +.B -ECANCELED +.B IORING_POLL_UPDATE_EVENTS +was set and an error occurred re-arming the poll request with the new mask. +The original poll request is terminated if this happens, and that termination +CQE will contain the reason for the error re-arming. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_prep_poll_add (3), +.BR io_uring_prep_poll_multishot (3) diff --git a/man/io_uring_prep_provide_buffers.3 b/man/io_uring_prep_provide_buffers.3 new file mode 100644 index 0000000000000000000000000000000000000000..f3dded9db8f66fb4bc49e79e8b3e7e590f8187d2 --- /dev/null +++ b/man/io_uring_prep_provide_buffers.3 @@ -0,0 +1,131 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_provide_buffers 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_provide_buffers \- prepare a provide buffers request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_provide_buffers(struct io_uring_sqe *" sqe "," +.BI " void *" addr "," +.BI " int " len "," +.BI " int " nr "," +.BI " int " bgid "," +.BI " int " bid ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_provide_buffers (3) +function prepares a request for providing the kernel with buffers. The +submission queue entry +.I sqe +is setup to consume +.I len +number of buffers starting at +.I addr +and identified by the buffer group ID of +.I bgid +and numbered sequentially starting at +.IR bid . + +This function sets up a request to provide buffers to the io_uring context +that can be used by read or receive operations. This is done by filling in +the SQE +.I buf_group +field and setting +.B IOSQE_BUFFER_SELECT +in the SQE +.I flags +member. If buffer selection is used for a request, no buffer should be provided +in the address field. Instead, the group ID is set to match one that was +previously provided to the kernel. The kernel will then select a buffer from +this group for the IO operation. On successful completion of the IO request, +the CQE +.I flags +field will have +.B IORING_CQE_F_BUFFER +set and the selected buffer ID will be indicated by the upper 16-bits of the +.I flags +field. 
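+
+As a rough sketch only (not a complete program), the following provides 8
+buffers of 4096 bytes each under buffer group ID 1, with buffer IDs starting
+at 0; it assumes an already initialized ring and a
+.I bufs
+allocation of at least 8 * 4096 bytes:
+.PP
+.nf
+#include <liburing.h>
+
+static int provide_bufs(struct io_uring *ring, void *bufs)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int ret;
+
+    /* addr, len per buffer, nr of buffers, bgid, starting bid */
+    io_uring_prep_provide_buffers(sqe, bufs, 4096, 8, 1, 0);
+
+    ret = io_uring_submit(ring);
+    if (ret < 0)
+        return ret;
+    ret = io_uring_wait_cqe(ring, &cqe);
+    if (ret < 0)
+        return ret;
+    ret = cqe->res;        /* number of buffers provided, or -errno */
+    io_uring_cqe_seen(ring, cqe);
+    return ret;
+}
+.fi
+.PP
+A later read or receive that sets
+.B IOSQE_BUFFER_SELECT
+with buffer group ID 1 can then recover the chosen buffer ID from the upper
+16 bits of its CQE
+.I flags
+field, as described above.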
+ +Different buffer group IDs can be used by the application to have different +sizes or types of buffers available. Once a buffer has been consumed for an +operation, it is no longer known to io_uring. It must be re-provided if so +desired or freed by the application if no longer needed. + +The buffer IDs are internally tracked from +.I bid +and sequentially ascending from that value. If +.B 16 +buffers are provided and start with an initial +.I bid +of 0, then the buffer IDs will range from +.BR 0..15 . +The application must be aware of this to make sense of the buffer ID passed +back in the CQE. + +Not all requests support buffer selection, as it only really makes sense for +requests that receive data from the kernel rather than write or provide data. +Currently, this mode of operation is supported for any file read or socket +receive request. Attempting to use +.B IOSQE_BUFFER_SELECT +with a command that doesn't support it will result in a CQE +.I res +error of +.BR -EINVAL . +Buffer selection will work with operations that take a +.B struct iovec +as its data destination, but only if 1 iovec is provided. +. +.SH RETURN VALUE +None +.SH ERRORS +These are the errors that are reported in the CQE +.I res +field. On success, +.I res +will contain the number of successfully provided buffers. On error, +the following errors can occur. +.TP +.B -ENOMEM +The kernel was unable to allocate memory for the request. +.TP +.B -EINVAL +One of the fields set in the SQE was invalid. +.TP +.B -E2BIG +The number of buffers provided was too big, or the +.I bid +was too big. A max value of +.B USHRT_MAX +buffers can be specified. +.TP +.B -EFAULT +Some of the user memory given was invalid for the application. +.TP +.B -EBADF +On of the descriptors located in +.I fds +didn't refer to a valid file descriptor, or one of the file descriptors in +the array referred to an io_uring instance. +.TP +.B -EOVERFLOW +The product of +.I len +and +.I nr +exceed the valid amount or overflowed, or the sum of +.I addr +and the length of buffers overflowed. +.TP +.B -EBUSY +Attempt to update a slot that is already used. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_register (2), +.BR io_uring_prep_remove_buffers (3) diff --git a/man/io_uring_prep_read.3 b/man/io_uring_prep_read.3 new file mode 100644 index 0000000000000000000000000000000000000000..a7636087013bec6858f7ec86f63a0eb06add113a --- /dev/null +++ b/man/io_uring_prep_read.3 @@ -0,0 +1,69 @@ +.\" Copyright (C) 2021 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_read 3 "November 15, 2021" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_prep_read \- prepare I/O read request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_read(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " void *" buf "," +.BI " unsigned " nbytes "," +.BI " __u64 " offset ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_read (3) +prepares an IO read request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I fd +to start reading +.I nbytes +into the buffer +.I buf +at the specified +.IR offset . + +On files that support seeking, if the offset is set to +.BR -1 , +the read operation commences at the file offset, and the file offset is +incremented by the number of bytes read. See +.BR read (2) +for more details. Note that for an async API, reading and updating the +current file offset may result in unpredictable behavior, unless access +to the file is serialized. 
It is not encouraged to use this feature if it is
+possible to provide the desired IO offset from the application or library.
+
+On files that are not capable of seeking, the offset must be 0 or -1.
+
+After the read has been prepared it can be submitted with one of the submit
+functions.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+The CQE
+.I res
+field will contain the result of the operation. See the related man page for
+details on possible values. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_prep_readv (3),
+.BR io_uring_prep_readv2 (3),
+.BR io_uring_submit (3)
diff --git a/man/io_uring_prep_read_fixed.3 b/man/io_uring_prep_read_fixed.3
new file mode 100644
index 0000000000000000000000000000000000000000..523685dbaa427a0e9e6e77cb5d0aaf0c6636a4dd
--- /dev/null
+++ b/man/io_uring_prep_read_fixed.3
@@ -0,0 +1,72 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_read_fixed 3 "February 13, 2022" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_prep_read_fixed \- prepare I/O read request with registered buffer
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_prep_read_fixed(struct io_uring_sqe *" sqe ","
+.BI "                              int " fd ","
+.BI "                              void *" buf ","
+.BI "                              unsigned " nbytes ","
+.BI "                              __u64 " offset ","
+.BI "                              int " buf_index ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_read_fixed (3)
+function prepares an IO read request with a previously registered IO buffer.
+The submission queue entry
+.I sqe
+is setup to use the file descriptor
+.I fd
+to start reading
+.I nbytes
+into the buffer
+.I buf
+at the specified
+.IR offset ,
+and with the buffer matching the registered index of
+.IR buf_index .
+
+This works just like
+.BR io_uring_prep_read (3)
+except it requires the use of buffers that have been registered with
+.BR io_uring_register_buffers (3).
+The
+.I buf
+and
+.I nbytes
+arguments must fall within a region specified by
+.I buf_index
+in the previously registered buffer. The buffer need not be aligned with
+the start of the registered buffer.
+
+After the read has been prepared it can be submitted with one of the submit
+functions.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+The CQE
+.I res
+field will contain the result of the operation. See the related man page for
+details on possible values. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
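+.SH EXAMPLE
+The following sketch is illustrative only. It registers a single 4096-byte
+buffer at index 0 and then reads from the start of
+.I fd
+into it, assuming an already initialized ring:
+.PP
+.nf
+#include <errno.h>
+#include <stdlib.h>
+#include <sys/uio.h>
+#include <liburing.h>
+
+static int read_registered(struct io_uring *ring, int fd)
+{
+    struct iovec iov = { .iov_base = malloc(4096), .iov_len = 4096 };
+    struct io_uring_sqe *sqe;
+    struct io_uring_cqe *cqe;
+    int ret;
+
+    if (!iov.iov_base)
+        return -ENOMEM;
+    /* buffer stays registered here; a real program would unregister/free it */
+    ret = io_uring_register_buffers(ring, &iov, 1);
+    if (ret)
+        return ret;
+
+    sqe = io_uring_get_sqe(ring);
+    io_uring_prep_read_fixed(sqe, fd, iov.iov_base, 4096, 0, 0);
+
+    ret = io_uring_submit(ring);
+    if (ret < 0)
+        return ret;
+    ret = io_uring_wait_cqe(ring, &cqe);
+    if (ret < 0)
+        return ret;
+    ret = cqe->res;        /* bytes read, or -errno */
+    io_uring_cqe_seen(ring, cqe);
+    return ret;
+}
+.fi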
+.SH SEE ALSO
+.BR io_uring_prep_read (3),
+.BR io_uring_register_buffers (3)
diff --git a/man/io_uring_prep_readv.3 b/man/io_uring_prep_readv.3
new file mode 100644
index 0000000000000000000000000000000000000000..031d70d3df202c00fe11ab58bcefd5f8658c774d
--- /dev/null
+++ b/man/io_uring_prep_readv.3
@@ -0,0 +1,85 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_readv 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_prep_readv \- prepare vector I/O read request
+.SH SYNOPSIS
+.nf
+.B #include <sys/uio.h>
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_prep_readv(struct io_uring_sqe *" sqe ","
+.BI "                         int " fd ","
+.BI "                         const struct iovec *" iovecs ","
+.BI "                         unsigned " nr_vecs ","
+.BI "                         __u64 " offset ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_readv (3)
+function prepares a vectored IO read request. The submission queue entry
+.I sqe
+is setup to use the file descriptor
+.I fd
+to start reading into the
+.I nr_vecs
+I/O vectors described by
+.I iovecs
+at the specified
+.IR offset .
+
+On files that support seeking, if the offset is set to
+.BR -1 ,
+the read operation commences at the file offset, and the file offset is
+incremented by the number of bytes read. See
+.BR read (2)
+for more details. Note that for an async API, reading and updating the
+current file offset may result in unpredictable behavior, unless access
+to the file is serialized. It is not encouraged to use this feature if it is
+possible to provide the desired IO offset from the application or library.
+
+On files that are not capable of seeking, the offset must be 0 or -1.
+
+After the read has been prepared it can be submitted with one of the submit
+functions.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+The CQE
+.I res
+field will contain the result of the operation. See the related man page for
+details on possible values. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
+.SH NOTES
+Unless an application explicitly needs to pass in more than one iovec, it is
+more efficient to use
+.BR io_uring_prep_read (3)
+rather than this function, as no state has to be maintained for a
+non-vectored IO request.
+As with any request that passes in data in a struct, that data must remain
+valid until the request has been successfully submitted. It need not remain
+valid until completion. Once a request has been submitted, the in-kernel
+state is stable. Very early kernels (5.4 and earlier) required state to be
+stable until the completion occurred. Applications can test for this
+behavior by inspecting the
+.B IORING_FEAT_SUBMIT_STABLE
+flag passed back from
+.BR io_uring_queue_init_params (3).
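+.SH EXAMPLE
+As a minimal, illustrative sketch assuming an already initialized ring, the
+following reads into two stack buffers from the start of
+.IR fd :
+.PP
+.nf
+#include <sys/uio.h>
+#include <liburing.h>
+
+static int readv_example(struct io_uring *ring, int fd)
+{
+    char a[512], b[512];
+    struct iovec iov[2] = {
+        { .iov_base = a, .iov_len = sizeof(a) },
+        { .iov_base = b, .iov_len = sizeof(b) },
+    };
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int ret;
+
+    io_uring_prep_readv(sqe, fd, iov, 2, 0);
+
+    /* iov must stay valid until submission, the data buffers until completion */
+    ret = io_uring_submit(ring);
+    if (ret < 0)
+        return ret;
+    ret = io_uring_wait_cqe(ring, &cqe);
+    if (ret < 0)
+        return ret;
+    ret = cqe->res;        /* total bytes read, or -errno */
+    io_uring_cqe_seen(ring, cqe);
+    return ret;
+}
+.fi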
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_prep_read (3), +.BR io_uring_prep_readv2 (3), +.BR io_uring_submit (3) diff --git a/man/io_uring_prep_readv2.3 b/man/io_uring_prep_readv2.3 new file mode 100644 index 0000000000000000000000000000000000000000..88a4bd4994f027f7bf2196d32f1e7b71584394f9 --- /dev/null +++ b/man/io_uring_prep_readv2.3 @@ -0,0 +1,111 @@ +.\" Copyright (C) 2021 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_readv2 3 "November 15, 2021" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_prep_readv2 \- prepare vector I/O read request with flags +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_readv2(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " const struct iovec *" iovecs "," +.BI " unsigned " nr_vecs "," +.BI " __u64 " offset "," +.BI " int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_readv2 (3) +prepares a vectored IO read request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I fd +to start reading +.I nr_vecs +into the +.I iovecs +array at the specified +.IR offset . +The behavior of the function can be controlled with the +.I flags +parameter. + +Supported values for +.I flags +are: +.TP +.B RWF_HIPRI +High priority request, poll if possible +.TP +.B RWF_DSYNC +per-IO O_DSYNC +.TP +.B RWF_SYNC +per-IO O_SYNC +.TP +.B RWF_NOWAIT +per-IO, return +.B -EAGAIN +if operation would block +.TP +.B RWF_APPEND +per-IO O_APPEND + +.P +On files that support seeking, if the offset is set to +.BR -1 , +the read operation commences at the file offset, and the file offset is +incremented by the number of bytes read. See +.BR read (2) +for more details. Note that for an async API, reading and updating the +current file offset may result in unpredictable behavior, unless access +to the file is serialized. It is not encouraged to use this feature, if it's +possible to provide the desired IO offset from the application or library. + +On files that are not capable of seeking, the offset must be 0 or -1. + +After the write has been prepared, it can be submitted with one of the submit +functions. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +Unless an application explicitly needs to pass in more than iovec, it is more +efficient to use +.BR io_uring_prep_read (3) +rather than this function, as no state has to be maintained for a +non-vectored IO request. +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). 
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_prep_read (3),
+.BR io_uring_prep_readv (3),
+.BR io_uring_submit (3)
diff --git a/man/io_uring_prep_recv.3 b/man/io_uring_prep_recv.3
new file mode 100644
index 0000000000000000000000000000000000000000..b3862369affa9dd9e83bd092f7973b60be484735
--- /dev/null
+++ b/man/io_uring_prep_recv.3
@@ -0,0 +1,105 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_recv 3 "March 12, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_prep_recv \- prepare a recv request
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_prep_recv(struct io_uring_sqe *" sqe ","
+.BI "                        int " sockfd ","
+.BI "                        void *" buf ","
+.BI "                        size_t " len ","
+.BI "                        int " flags ");"
+.PP
+.BI "void io_uring_prep_recv_multishot(struct io_uring_sqe *" sqe ","
+.BI "                                  int " sockfd ","
+.BI "                                  void *" buf ","
+.BI "                                  size_t " len ","
+.BI "                                  int " flags ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_recv (3)
+function prepares a recv request. The submission
+queue entry
+.I sqe
+is setup to use the file descriptor
+.I sockfd
+to start receiving the data into the destination buffer
+.I buf
+of size
+.I len
+and with modifier flags
+.IR flags .
+
+This function prepares an async
+.BR recv (2)
+request. See that man page for details on the arguments specified to this
+prep helper.
+
+The multishot version allows the application to issue a single receive request,
+which repeatedly posts a CQE when data is available. It requires the length to
+be 0, the
+.B IOSQE_BUFFER_SELECT
+flag to be set and no
+.B MSG_WAITALL
+flag to be set.
+Therefore each CQE will take a buffer out of a provided buffer pool for
+receiving. The application should check the flags of each CQE, regardless of
+its result. If a posted CQE does not have the
+.B IORING_CQE_F_MORE
+flag set then the multishot receive will be done and the application should
+issue a new request.
+Multishot variants are available since kernel 6.0.
+
+After calling this function, additional io_uring internal modifier flags
+may be set in the SQE
+.I ioprio
+field. The following flags are supported:
+.TP
+.B IORING_RECVSEND_POLL_FIRST
+If set, io_uring will assume the socket is currently empty and attempting to
+receive data will be unsuccessful. For this case, io_uring will arm internal
+poll and trigger a receive of the data when the socket has data to be read.
+This initial receive attempt can be wasteful for the case where the socket
+is expected to be empty; setting this flag will bypass the initial receive
+attempt and go straight to arming poll. If poll does indicate that data is
+ready to be received, the operation will proceed.
+
+This flag can be used with the CQE
+.B IORING_CQE_F_SOCK_NONEMPTY
+flag, which io_uring will set on CQEs after a
+.BR recv (2)
+or
+.BR recvmsg (2)
+operation. If set, the socket still had data to be read after the operation
+completed. Both these flags are available since kernel 5.19.
+.P
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+The CQE
+.I res
+field will contain the result of the operation. See the related man page for
+details on possible values. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
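+.SH EXAMPLE
+The sketch below is illustrative only and assumes a connected socket and an
+already initialized ring; it performs one single-shot receive into a
+caller-provided buffer:
+.PP
+.nf
+#include <stddef.h>
+#include <liburing.h>
+
+static int recv_once(struct io_uring *ring, int sockfd, void *buf, size_t len)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int ret;
+
+    io_uring_prep_recv(sqe, sockfd, buf, len, 0);
+
+    ret = io_uring_submit(ring);
+    if (ret < 0)
+        return ret;
+    ret = io_uring_wait_cqe(ring, &cqe);
+    if (ret < 0)
+        return ret;
+    ret = cqe->res;        /* bytes received, 0 on EOF, or -errno */
+    io_uring_cqe_seen(ring, cqe);
+    return ret;
+}
+.fi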
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR recv (2) diff --git a/man/io_uring_prep_recv_multishot.3 b/man/io_uring_prep_recv_multishot.3 new file mode 120000 index 0000000000000000000000000000000000000000..71fe277d6aeb7a7b82aafc49bf54d16ba0c6ed6d --- /dev/null +++ b/man/io_uring_prep_recv_multishot.3 @@ -0,0 +1 @@ +io_uring_prep_recv.3 \ No newline at end of file diff --git a/man/io_uring_prep_recvmsg.3 b/man/io_uring_prep_recvmsg.3 new file mode 100644 index 0000000000000000000000000000000000000000..65f324dfef9b5bb8f8dc5938e62dd367743c036e --- /dev/null +++ b/man/io_uring_prep_recvmsg.3 @@ -0,0 +1,124 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_recvmsg 3 "March 12, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_recvmsg \- prepare a recvmsg request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_recvmsg(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " struct msghdr *" msg "," +.BI " unsigned " flags ");" +.PP +.BI "void io_uring_prep_recvmsg_multishot(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " struct msghdr *" msg "," +.BI " unsigned " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_recvmsg (3) +function prepares a recvmsg request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I fd +to start receiving the data indicated by +.I msg +with the +.BR recvmsg (2) +defined flags in the +.I flags +argument. + +This function prepares an async +.BR recvmsg (2) +request. See that man page for details on the arguments specified to this +prep helper. + +The multishot version allows the application to issue a single receive request, +which repeatedly posts a CQE when data is available. It requires the +.B IOSQE_BUFFER_SELECT +flag to be set and no +.B MSG_WAITALL +flag to be set. +Therefore each CQE will take a buffer out of a provided buffer pool for receiving. +The application should check the flags of each CQE, regardless of it's result. +If a posted CQE does not have the +.B IORING_CQE_F_MORE +flag set then the multishot receive will be done and the application should issue a +new request. + +Unlike +.BR recvmsg (2) +, multishot recvmsg will prepend a +.I struct io_uring_recvmsg_out +which describes the layout of the rest of the buffer in combination with the initial +.I struct msghdr +submitted with the request. See +.B io_uring_recvmsg_out (3) +for more information on accessing the data. + +Multishot variants are available since kernel 6.0. + +After calling this function, additional io_uring internal modifier flags +may be set in the SQE +.I ioprio +field. The following flags are supported: +.TP +.B IORING_RECVSEND_POLL_FIRST +If set, io_uring will assume the socket is currently empty and attempting to +receive data will be unsuccessful. For this case, io_uring will arm internal +poll and trigger a receive of the data when the socket has data to be read. +This initial receive attempt can be wasteful for the case where the socket +is expected to be empty, setting this flag will bypass the initial receive +attempt and go straight to arming poll. If poll does indicate that data is +ready to be received, the operation will proceed. + +Can be used with the CQE +.B IORING_CQE_F_SOCK_NONEMPTY +flag, which io_uring will set on CQEs after a +.BR recv (2) +or +.BR recvmsg (2) +operation. If set, the socket still had data to be read after the operation +completed. 
Both these flags are available since 5.19. +.P + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR recvmsg (2) diff --git a/man/io_uring_prep_recvmsg_multishot.3 b/man/io_uring_prep_recvmsg_multishot.3 new file mode 120000 index 0000000000000000000000000000000000000000..cd9566f2c2bea1f4e5cf43735594948f1ec09ebf --- /dev/null +++ b/man/io_uring_prep_recvmsg_multishot.3 @@ -0,0 +1 @@ +io_uring_prep_recvmsg.3 \ No newline at end of file diff --git a/man/io_uring_prep_remove_buffers.3 b/man/io_uring_prep_remove_buffers.3 new file mode 100644 index 0000000000000000000000000000000000000000..cf4f22640829ac17d27c4ad52b2392b4a21810af --- /dev/null +++ b/man/io_uring_prep_remove_buffers.3 @@ -0,0 +1,52 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_remove_buffers 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_remove_buffers \- prepare a remove buffers request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_remove_buffers(struct io_uring_sqe *" sqe "," +.BI " int " nr "," +.BI " int " bgid ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_remove_buffers (3) +function prepares a request for removing previously supplied buffers. The +submission queue entry +.I sqe +is setup to remove +.I nr +number of buffers from the buffer group ID indicated by +.IR bgid . + +.SH RETURN VALUE +None +.SH ERRORS +These are the errors that are reported in the CQE +.I res +field. On success, +.I res +will contain the number of successfully removed buffers. On error, +the following errors can occur. +.TP +.B -ENOMEM +The kernel was unable to allocate memory for the request. +.TP +.B -EINVAL +One of the fields set in the SQE was invalid. +.TP +.B -ENOENT +No buffers exist at the specified +.I bgid +buffer group ID. 
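+.SH EXAMPLE
+As a brief sketch, assuming buffers were previously supplied under buffer
+group ID 1 with
+.BR io_uring_prep_provide_buffers (3)
+on an already initialized ring, the following asks the kernel to remove up
+to 8 of them:
+.PP
+.nf
+#include <liburing.h>
+
+static int remove_bufs(struct io_uring *ring)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+
+    io_uring_prep_remove_buffers(sqe, 8, 1);  /* nr = 8, bgid = 1 */
+    return io_uring_submit(ring);             /* reap the CQE as usual */
+}
+.fi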
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_register (2), +.BR io_uring_prep_provide_buffers (3) diff --git a/man/io_uring_prep_rename.3 b/man/io_uring_prep_rename.3 new file mode 120000 index 0000000000000000000000000000000000000000..785b55eb0d63de34a32aa6bf03ebfb1124f7289a --- /dev/null +++ b/man/io_uring_prep_rename.3 @@ -0,0 +1 @@ +io_uring_prep_renameat.3 \ No newline at end of file diff --git a/man/io_uring_prep_renameat.3 b/man/io_uring_prep_renameat.3 new file mode 100644 index 0000000000000000000000000000000000000000..08d4a46efb27d8c64b9efc4a2ca171facaf85a3b --- /dev/null +++ b/man/io_uring_prep_renameat.3 @@ -0,0 +1,96 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_renameat 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_renameat \- prepare a renameat request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_renameat(struct io_uring_sqe *" sqe "," +.BI " int " olddirfd "," +.BI " const char *" oldpath "," +.BI " int " newdirfd "," +.BI " const char *" newpath "," +.BI " unsigned int " flags ");" +.PP +.BI "void io_uring_prep_rename(struct io_uring_sqe *" sqe "," +.BI " const char *" oldpath "," +.BI " const char *" newpath "," +.BI " unsigned int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_renameat (3) +function prepares a renameat request. The submission queue entry +.I sqe +is setup to use the old directory file descriptor pointed to by +.I olddirfd +and old path pointed to by +.I oldpath +with the new directory file descriptor pointed to by +.I newdirfd +and the new path pointed to by +.I newpath +and using the specified flags in +.IR flags . + +The +.BR io_uring_prep_rename (3) +function prepares a rename request. The submission queue entry +.I sqe +is setup to use the old path pointed to by +.I oldpath +with the new path pointed to by +.IR newpath , +both relative to the current working directory and using the specified flags in +.IR flags . + +These functions prepare an async +.BR renameat2 (2) +or +.BR rename (2) +request. If +.I flags +is zero, then this call is similar to the +.BR renameat (2) +system call. See those man pages for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). 
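+.SH EXAMPLE
+As an illustrative sketch with placeholder path names, assuming an already
+initialized ring, the following renames a file relative to the current
+working directory:
+.PP
+.nf
+#include <fcntl.h>
+#include <liburing.h>
+
+static int rename_file(struct io_uring *ring)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int ret;
+
+    io_uring_prep_renameat(sqe, AT_FDCWD, "old.txt", AT_FDCWD, "new.txt", 0);
+
+    ret = io_uring_submit(ring);
+    if (ret < 0)
+        return ret;
+    ret = io_uring_wait_cqe(ring, &cqe);
+    if (ret < 0)
+        return ret;
+    ret = cqe->res;        /* 0 on success, -errno on error */
+    io_uring_cqe_seen(ring, cqe);
+    return ret;
+}
+.fi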
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR renameat (2), +.BR renameat2 (2), +.BR rename (2) diff --git a/man/io_uring_prep_send.3 b/man/io_uring_prep_send.3 new file mode 100644 index 0000000000000000000000000000000000000000..6f6bed03092c341abe62280b5364341d8d51dd4e --- /dev/null +++ b/man/io_uring_prep_send.3 @@ -0,0 +1,57 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_send 3 "March 12, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_send \- prepare a send request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_send(struct io_uring_sqe *" sqe "," +.BI " int " sockfd "," +.BI " const void *" buf "," +.BI " size_t " len "," +.BI " int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_send (3) +function prepares a send request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I sockfd +to start sending the data from +.I buf +of size +.I len +bytes and with modifier flags +.IR flags . + +This function prepares an async +.BR send (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR send (2) diff --git a/man/io_uring_prep_send_zc.3 b/man/io_uring_prep_send_zc.3 new file mode 100644 index 0000000000000000000000000000000000000000..0b655f99d8ee6bf3a4546adb32c9060cf7487d9b --- /dev/null +++ b/man/io_uring_prep_send_zc.3 @@ -0,0 +1,64 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_send_zc 3 "September 6, 2022" "liburing-2.3" "liburing Manual" +.SH NAME +io_uring_prep_send_zc \- prepare a zerocopy send request +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_prep_send_zc(struct io_uring_sqe *" sqe "," +.BI " int " sockfd "," +.BI " const void *" buf "," +.BI " size_t " len "," +.BI " int " flags "," +.BI " int " zc_flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_send_zc (3) +function prepares a zerocopy send request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I sockfd +to start sending the data from +.I buf +of size +.I len +bytes with send modifier flags +.IR flags +and zerocopy modifier flags +.IR zc_flags . + +This function prepares an async zerocopy +.BR send (2) +request. See that man page for details. For details on the zerocopy nature +of it, see +.BR io_uring_enter (2) . + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. 
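+.SH EXAMPLE
+The following is only a sketch, assuming a connected socket and an already
+initialized ring. A zerocopy send typically posts two completions: the usual
+result, and a later notification (flagged with
+.BR IORING_CQE_F_NOTIF )
+that indicates the kernel is done with the buffer, which must stay valid
+until then:
+.PP
+.nf
+#include <stddef.h>
+#include <liburing.h>
+
+static int send_zc_once(struct io_uring *ring, int sockfd,
+                        const void *buf, size_t len)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int ret, res = 0;
+
+    io_uring_prep_send_zc(sqe, sockfd, buf, len, 0, 0);
+
+    ret = io_uring_submit(ring);
+    if (ret < 0)
+        return ret;
+
+    /* the first CQE carries the send result; a second one may follow as
+       the buffer-release notification */
+    do {
+        ret = io_uring_wait_cqe(ring, &cqe);
+        if (ret < 0)
+            return ret;
+        if (!(cqe->flags & IORING_CQE_F_NOTIF))
+            res = cqe->res;
+        ret = cqe->flags & IORING_CQE_F_MORE;
+        io_uring_cqe_seen(ring, cqe);
+    } while (ret);
+    return res;            /* bytes sent, or -errno */
+}
+.fi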
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_prep_send (3), +.BR io_uring_enter (2), +.BR send (2) diff --git a/man/io_uring_prep_sendmsg.3 b/man/io_uring_prep_sendmsg.3 new file mode 100644 index 0000000000000000000000000000000000000000..bc81d91252bdd82e83855c81841e2b880ce633d5 --- /dev/null +++ b/man/io_uring_prep_sendmsg.3 @@ -0,0 +1,69 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_sendmsg 3 "March 12, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_sendmsg \- prepare a sendmsg request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_sendmsg(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " const struct msghdr *" msg "," +.BI " unsigned " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_sendmsg (3) +function prepares a sendmsg request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I fd +to start sending the data indicated by +.I msg +with the +.BR sendmsg (2) +defined flags in the +.I flags +argument. + +This function prepares an async +.BR sendmsg (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR sendmsg (2) diff --git a/man/io_uring_prep_shutdown.3 b/man/io_uring_prep_shutdown.3 new file mode 100644 index 0000000000000000000000000000000000000000..9125e95f5cd174b4279d0c87701b2fbc32b91b05 --- /dev/null +++ b/man/io_uring_prep_shutdown.3 @@ -0,0 +1,53 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_shutdown 3 "March 12, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_shutdown \- prepare a shutdown request +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_shutdown(struct io_uring_sqe *" sqe "," +.BI " int " sockfd "," +.BI " int " how ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_shutdown (3) +function prepares a shutdown request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I sockfd +that should be shutdown with the +.I how +argument. + +This function prepares an async +.BR shutdown (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . 
+Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR shutdown (2) diff --git a/man/io_uring_prep_socket.3 b/man/io_uring_prep_socket.3 new file mode 100644 index 0000000000000000000000000000000000000000..8c15a901b3f87346092a55e3b75c79c1adeb569b --- /dev/null +++ b/man/io_uring_prep_socket.3 @@ -0,0 +1,118 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_socket 3 "May 27, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_socket \- prepare a socket creation request +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_socket(struct io_uring_sqe *" sqe "," +.BI " int " domain "," +.BI " int " type "," +.BI " int " protocol "," +.BI " unsigned int " flags ");" +.PP +.BI "void io_uring_prep_socket_direct(struct io_uring_sqe *" sqe "," +.BI " int " domain "," +.BI " int " type "," +.BI " int " protocol "," +.BI " unsigned int " file_index "," +.BI " unsigned int " flags ");" +.PP +.BI "void io_uring_prep_socket_direct_alloc(struct io_uring_sqe *" sqe "," +.BI " int " domain "," +.BI " int " type "," +.BI " int " protocol "," +.BI " unsigned int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_socket (3) +function prepares a socket creation request. The submission queue entry +.I sqe +is setup to use the communication domain defined by +.I domain +and use the communication type defined by +.I type +and the protocol set by +.IR protocol . +The +.I flags +argument are currently unused. + +The +.BR io_uring_prep_socket_direct (3) +helper works just like +.BR io_uring_prep_socket (3), +except it maps the socket to a direct descriptor rather than return a normal +file descriptor. The +.I file_index +argument should be set to the slot that should be used for this socket. + +The +.BR io_uring_prep_socket_direct_alloc (3) +helper works just like +.BR io_uring_prep_socket_alloc (3), +except it allocates a new direct descriptor rather than pass a free slot in. It +is equivalent to using +.BR io_uring_prep_socket_direct (3) +with +.B IORING_FILE_INDEX_ALLOC +as the +.I +file_index . +Upon completion, the +.I res +field of the CQE will return the direct slot that was allocated for the +socket. + +If the direct variants are used, the application must first have registered +a file table using +.BR io_uring_register_files (3) +of the appropriate size. Once registered, a direct socket request may use any +entry in that table, as long as it is within the size of the registered table. +If a specified entry already contains a file, the file will first be removed +from the table and closed. It's consistent with the behavior of updating an +existing file with +.BR io_uring_register_files_update (3). + +For a direct descriptor socket request, the +.I file_index +argument can be set to +.BR IORING_FILE_INDEX_ALLOC , +In this case a free entry in io_uring file table will +be used automatically and the file index will be returned as CQE +.IR res . +.B -ENFILE +is otherwise returned if there is no free entries in the io_uring file table. + +These functions prepare an async +.BR socket (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. 
Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR socket (2)
diff --git a/man/io_uring_prep_socket_direct.3 b/man/io_uring_prep_socket_direct.3
new file mode 120000
index 0000000000000000000000000000000000000000..15d7b7f08e31745cdb639019be1725eb1b3a6692
--- /dev/null
+++ b/man/io_uring_prep_socket_direct.3
@@ -0,0 +1 @@
+io_uring_prep_socket.3
\ No newline at end of file
diff --git a/man/io_uring_prep_socket_direct_alloc.3 b/man/io_uring_prep_socket_direct_alloc.3
new file mode 120000
index 0000000000000000000000000000000000000000..15d7b7f08e31745cdb639019be1725eb1b3a6692
--- /dev/null
+++ b/man/io_uring_prep_socket_direct_alloc.3
@@ -0,0 +1 @@
+io_uring_prep_socket.3
\ No newline at end of file
diff --git a/man/io_uring_prep_splice.3 b/man/io_uring_prep_splice.3
new file mode 100644
index 0000000000000000000000000000000000000000..cb82ad00631e9cc299f32d17fc21a6b6f7b83580
--- /dev/null
+++ b/man/io_uring_prep_splice.3
@@ -0,0 +1,80 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_splice 3 "March 13, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_prep_splice \- prepare a splice request
+.SH SYNOPSIS
+.nf
+.B #include
+.B #include
+.PP
+.BI "void io_uring_prep_splice(struct io_uring_sqe *" sqe ","
+.BI " int " fd_in ","
+.BI " int64_t " off_in ","
+.BI " int " fd_out ","
+.BI " int64_t " off_out ","
+.BI " unsigned int " nbytes ","
+.BI " unsigned int " splice_flags ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_splice (3)
+function prepares a splice request. The submission queue entry
+.I sqe
+is setup to use as input the file descriptor
+.I fd_in
+at offset
+.IR off_in ,
+splicing data to the file descriptor at
+.I fd_out
+and at offset
+.IR off_out .
+.I nbytes
+bytes of data should be spliced between the two descriptors.
+.I splice_flags
+are modifier flags for the operation. See
+.BR splice (2)
+for the generic splice flags.
+
+If
+.I fd_out
+refers to a registered (fixed) file,
+.B IOSQE_FIXED_FILE
+can be set in the SQE to indicate that. For the input file, the io_uring
+specific
+.B SPLICE_F_FD_IN_FIXED
+can be set in
+.I splice_flags
+and
+.I fd_in
+given as a registered file descriptor offset.
+
+This function prepares an async
+.BR splice (2)
+request. See that man page for details.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+The CQE
+.I res
+field will contain the result of the operation. See the related man page for
+details on possible values. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
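+.SH EXAMPLE
+The following sketch is illustrative only. It assumes an initialized ring, a
+readable file descriptor
+.I file_fd
+and a pipe whose write end is
+.IR pipe_wr ,
+all set up elsewhere, and splices up to 4096 bytes from the start of the file
+into the pipe.
+.PP
+.EX
+#include <liburing.h>
+
+static int splice_file_to_pipe(struct io_uring *ring, int file_fd, int pipe_wr)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int res;
+
+    /* off_in = 0 reads from the start of the file; the pipe side must
+     * use -1 as its offset */
+    io_uring_prep_splice(sqe, file_fd, 0, pipe_wr, -1, 4096, 0);
+    io_uring_submit(ring);
+
+    io_uring_wait_cqe(ring, &cqe);
+    res = cqe->res;     /* bytes spliced, or -errno */
+    io_uring_cqe_seen(ring, cqe);
+    return res;
+}
+.EE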
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_register (2), +.BR splice (2) diff --git a/man/io_uring_prep_statx.3 b/man/io_uring_prep_statx.3 new file mode 100644 index 0000000000000000000000000000000000000000..d9d983a0470f444c2222fb70d090718940026aa4 --- /dev/null +++ b/man/io_uring_prep_statx.3 @@ -0,0 +1,74 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_statx 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_statx \- prepare a statx request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_statx(struct io_uring_sqe *" sqe "," +.BI " int " dirfd "," +.BI " const char *" path "," +.BI " int " flags "," +.BI " unsigned " mask "," +.BI " struct statx *" statxbuf ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_statx (3) +function prepares a statx request. The submission queue entry +.I sqe +is setup to use the directory file descriptor pointed to by +.I dirfd +to start a statx operation on the path identified by +.I path +and using the flags given in +.I flags +for the fields specified by +.I mask +and into the buffer located at +.IR statxbuf . + +This function prepares an async +.BR statx (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). 
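+.SH EXAMPLE
+The following sketch is illustrative only. It queries the size of the
+hypothetical file
+.I data.bin
+relative to the current working directory, assuming an initialized ring and a
+C library recent enough to expose
+.I struct statx
+and the STATX_* mask constants via
+.IR <sys/stat.h> .
+.PP
+.EX
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <liburing.h>
+
+static long long file_size(struct io_uring *ring)
+{
+    struct statx stx;   /* filled in by the kernel on completion */
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    long long ret;
+
+    io_uring_prep_statx(sqe, AT_FDCWD, "data.bin", 0, STATX_SIZE, &stx);
+    io_uring_submit(ring);
+
+    io_uring_wait_cqe(ring, &cqe);
+    ret = cqe->res < 0 ? cqe->res : (long long)stx.stx_size;
+    io_uring_cqe_seen(ring, cqe);
+    return ret;
+}
+.EE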
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR statx (2) diff --git a/man/io_uring_prep_symlink.3 b/man/io_uring_prep_symlink.3 new file mode 120000 index 0000000000000000000000000000000000000000..ae6f41a25c1207d19598e2c5c11d59503d0550ef --- /dev/null +++ b/man/io_uring_prep_symlink.3 @@ -0,0 +1 @@ +io_uring_prep_symlinkat.3 \ No newline at end of file diff --git a/man/io_uring_prep_symlinkat.3 b/man/io_uring_prep_symlinkat.3 new file mode 100644 index 0000000000000000000000000000000000000000..0fa7056636c8b216a3f89e1dc8b77b84920cf001 --- /dev/null +++ b/man/io_uring_prep_symlinkat.3 @@ -0,0 +1,85 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_symlinkat 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_symlinkat \- prepare a symlinkat request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_symlinkat(struct io_uring_sqe *" sqe "," +.BI " const char *" target "," +.BI " int " newdirfd "," +.BI " const char *" linkpath ");" +.PP +.BI "void io_uring_prep_symlink(struct io_uring_sqe *" sqe "," +.BI " const char *" target "," +.BI " const char *" linkpath ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_symlinkat (3) +function prepares a symlinkat request. The submission queue entry +.I sqe +is setup to symlink the target path pointed to by +.I target +to the new destination indicated by +.I newdirfd +and +.IR linkpath . + +The +.BR io_uring_prep_symlink (3) +function prepares a symlink request. The submission queue entry +.I sqe +is setup to symlink the target path pointed to by +.I target +to the new destination indicated by +.I linkpath +relative to the the current working directory. This function prepares an async +.BR symlink (2) +request. See that man page for details. + +These functions prepare an async +.BR symlinkat (2) +or +.BR symlink (2) +request. See those man pages for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). 
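+.SH EXAMPLE
+The following sketch is illustrative only. Assuming an initialized ring, it
+creates the hypothetical symbolic link
+.I alias.txt
+in the current working directory, pointing at
+.IR target.txt .
+.PP
+.EX
+#include <fcntl.h>
+#include <liburing.h>
+
+static int make_link(struct io_uring *ring)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int res;
+
+    io_uring_prep_symlinkat(sqe, "target.txt", AT_FDCWD, "alias.txt");
+    io_uring_submit(ring);
+
+    io_uring_wait_cqe(ring, &cqe);
+    res = cqe->res;     /* 0 on success, -errno on failure */
+    io_uring_cqe_seen(ring, cqe);
+    return res;
+}
+.EE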
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR symlinkat (2), +.BR symlink (2) diff --git a/man/io_uring_prep_sync_file_range.3 b/man/io_uring_prep_sync_file_range.3 new file mode 100644 index 0000000000000000000000000000000000000000..830e4115064d6bb272a519b99d9a8d042731885b --- /dev/null +++ b/man/io_uring_prep_sync_file_range.3 @@ -0,0 +1,59 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_sync_file_range 3 "March 12, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_sync_file_range \- prepare a sync_file_range request +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_sync_file_range(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " unsigned " len "," +.BI " __u64 " offset "," +.BI " int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_sync_file_range (3) +function prepares a sync_file_range request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I fd +that should get +.I len +bytes synced started at offset +.I offset +and with modifier flags in the +.I flags +argument. + +This function prepares an async +.BR sync_file_range (2) +request. See that man page for details on the arguments. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR sync_file_range (2) diff --git a/man/io_uring_prep_tee.3 b/man/io_uring_prep_tee.3 new file mode 100644 index 0000000000000000000000000000000000000000..44aaaf60313ab216e1263c951a4df083723ddc1e --- /dev/null +++ b/man/io_uring_prep_tee.3 @@ -0,0 +1,74 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_tee 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_tee \- prepare a tee request +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_tee(struct io_uring_sqe *" sqe "," +.BI " int " fd_in "," +.BI " int " fd_out "," +.BI " unsigned int " nbytes "," +.BI " unsigned int " splice_flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_tee (3) +function prepares a tee request. The submission queue entry +.I sqe +is setup to use as input the file descriptor +.I fd_in +and as output the file descriptor +.I fd_out +duplicating +.I nbytes +bytes worth of data. +.I splice_flags +are modifier flags for the operation. See +.BR tee (2) +for the generic splice flags. + +If the +.I fd_out +descriptor, +.B IOSQE_FIXED_FILE +can be set in the SQE to indicate that. For the input file, the io_uring +specific +.B SPLICE_F_FD_IN_FIXED +can be set and +.I fd_in +given as a registered file descriptor offset. + +This function prepares an async +.BR tee (2) +request. See that man page for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. 
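+.SH EXAMPLE
+The following sketch is illustrative only. It assumes an initialized ring and
+two pipes whose descriptors,
+.I pipe_in_rd
+and
+.IR pipe_out_wr ,
+were created elsewhere, and duplicates up to
+.I n
+bytes from the first pipe into the second without consuming them.
+.PP
+.EX
+#include <liburing.h>
+
+static int tee_pipes(struct io_uring *ring, int pipe_in_rd,
+                     int pipe_out_wr, unsigned int n)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int res;
+
+    io_uring_prep_tee(sqe, pipe_in_rd, pipe_out_wr, n, 0);
+    io_uring_submit(ring);
+
+    io_uring_wait_cqe(ring, &cqe);
+    res = cqe->res;     /* bytes duplicated, or -errno */
+    io_uring_cqe_seen(ring, cqe);
+    return res;
+}
+.EE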
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR io_uring_register (2),
+.BR splice (2),
+.BR tee (2)
diff --git a/man/io_uring_prep_timeout.3 b/man/io_uring_prep_timeout.3
new file mode 100644
index 0000000000000000000000000000000000000000..bfb8791fb66803b2e631ee98bacfbf8ea6d40d17
--- /dev/null
+++ b/man/io_uring_prep_timeout.3
@@ -0,0 +1,95 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_timeout 3 "March 12, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_prep_timeout \- prepare a timeout request
+.SH SYNOPSIS
+.nf
+.B #include
+.PP
+.BI "void io_uring_prep_timeout(struct io_uring_sqe *" sqe ","
+.BI " struct __kernel_timespec *" ts ","
+.BI " unsigned " count ","
+.BI " unsigned " flags ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_timeout (3)
+function prepares a timeout request. The submission queue entry
+.I sqe
+is setup to arm a timeout specified by
+.I ts
+and with a timeout count of
+.I count
+completion entries. The
+.I flags
+argument holds modifier flags for the request.
+
+This request type can be used as a timeout waking anyone sleeping
+for events on the CQ ring. The
+.I flags
+argument may contain:
+.TP
+.B IORING_TIMEOUT_ABS
+The value specified in
+.I ts
+is an absolute value rather than a relative one.
+.TP
+.B IORING_TIMEOUT_BOOTTIME
+The boottime clock source should be used.
+.TP
+.B IORING_TIMEOUT_REALTIME
+The realtime clock source should be used.
+.TP
+.B IORING_TIMEOUT_ETIME_SUCCESS
+Consider an expired timeout a success in terms of the posted completion.
+Normally a timeout that triggers would result in a
+.B -ETIME
+CQE
+.I res
+value.
+.PP
+The timeout completion event will trigger if either the specified timeout
+has occurred, or the specified number of events to wait for have been posted
+to the CQ ring.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+These are the errors that are reported in the CQE
+.I res
+field. On success,
+.B 0
+is returned.
+.TP
+.B -ETIME
+The specified timeout occurred and triggered the completion event.
+.TP
+.B -EINVAL
+One of the fields set in the SQE was invalid. For example, two clocksources
+were given, or the specified timeout seconds or nanoseconds were < 0.
+.TP
+.B -EFAULT
+io_uring was unable to access the data specified by
+.IR ts .
+.TP
+.B -ECANCELED
+The timeout was canceled by a removal request.
+.SH NOTES
+As with any request that passes in data in a struct, that data must remain
+valid until the request has been successfully submitted. It need not remain
+valid until completion. Once a request has been submitted, the in-kernel
+state is stable. Very early kernels (5.4 and earlier) required state to be
+stable until the completion occurred. Applications can test for this
+behavior by inspecting the
+.B IORING_FEAT_SUBMIT_STABLE
+flag passed back from
+.BR io_uring_queue_init_params (3).
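+.SH EXAMPLE
+The following sketch is illustrative only. Assuming an initialized ring with
+no other requests in flight, it arms a one second timeout and waits for it;
+because no other completions arrive, the CQE is expected to carry
+.BR -ETIME .
+.PP
+.EX
+#include <errno.h>
+#include <liburing.h>
+
+static int wait_one_second(struct io_uring *ring)
+{
+    struct __kernel_timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int res;
+
+    /* fire after 1 second, or once one other completion has been posted */
+    io_uring_prep_timeout(sqe, &ts, 1, 0);
+    io_uring_submit(ring);
+
+    io_uring_wait_cqe(ring, &cqe);
+    res = cqe->res;     /* -ETIME when the timeout expired */
+    io_uring_cqe_seen(ring, cqe);
+    return res == -ETIME ? 0 : res;
+}
+.EE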
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR io_uring_prep_timeout_remove (3),
+.BR io_uring_prep_timeout_update (3)
diff --git a/man/io_uring_prep_timeout_remove.3 b/man/io_uring_prep_timeout_remove.3
new file mode 120000
index 0000000000000000000000000000000000000000..5aebd36851249de2158b982d4d652a9e28575551
--- /dev/null
+++ b/man/io_uring_prep_timeout_remove.3
@@ -0,0 +1 @@
+io_uring_prep_timeout_update.3
\ No newline at end of file
diff --git a/man/io_uring_prep_timeout_update.3 b/man/io_uring_prep_timeout_update.3
new file mode 100644
index 0000000000000000000000000000000000000000..cb9ed12ce977111b9ebb9666ea1824ab4a7748e9
--- /dev/null
+++ b/man/io_uring_prep_timeout_update.3
@@ -0,0 +1,98 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_timeout_update 3 "March 12, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_prep_timeout_update \- prepare a request to update an existing timeout
+.SH SYNOPSIS
+.nf
+.B #include
+.PP
+.BI "void io_uring_prep_timeout_update(struct io_uring_sqe *" sqe ","
+.BI " struct __kernel_timespec *" ts ","
+.BI " __u64 " user_data ","
+.BI " unsigned " flags ");"
+.PP
+.BI "void io_uring_prep_timeout_remove(struct io_uring_sqe *" sqe ","
+.BI " __u64 " user_data ","
+.BI " unsigned " flags ");"
+.fi
+.SH DESCRIPTION
+.PP
+These functions modify or cancel an existing timeout request. The submission
+queue entry
+.I sqe
+is setup to arm a timeout update or removal specified by
+.I user_data
+and with modifier flags given by
+.IR flags .
+Additionally, the update request includes a
+.I ts
+structure, which contains new timeout information.
+
+For an update request, the
+.I flags
+member may contain a bitmask of the following values:
+.TP
+.B IORING_TIMEOUT_ABS
+The value specified in
+.I ts
+is an absolute value rather than a relative one.
+.TP
+.B IORING_TIMEOUT_BOOTTIME
+The boottime clock source should be used.
+.TP
+.B IORING_TIMEOUT_REALTIME
+The realtime clock source should be used.
+.TP
+.B IORING_TIMEOUT_ETIME_SUCCESS
+Consider an expired timeout a success in terms of the posted completion.
+Normally a timeout that triggers would result in a
+.B -ETIME
+CQE
+.I res
+value.
+.PP
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+These are the errors that are reported in the CQE
+.I res
+field. On success,
+.B 0
+is returned.
+.TP
+.B -ENOENT
+The timeout identified by
+.I user_data
+could not be found. It may be invalid, or triggered before the update or
+removal request was processed.
+.TP
+.B -EALREADY
+The timeout identified by
+.I user_data
+is already firing and cannot be canceled.
+.TP
+.B -EINVAL
+One of the fields set in the SQE was invalid. For example, two clocksources
+were given, or the specified timeout seconds or nanoseconds were < 0.
+.TP
+.B -EFAULT
+io_uring was unable to access the data specified by
+.IR ts .
+.SH NOTES
+As with any request that passes in data in a struct, that data must remain
+valid until the request has been successfully submitted. It need not remain
+valid until completion. Once a request has been submitted, the in-kernel
+state is stable. Very early kernels (5.4 and earlier) required state to be
+stable until the completion occurred. Applications can test for this
+behavior by inspecting the
+.B IORING_FEAT_SUBMIT_STABLE
+flag passed back from
+.BR io_uring_queue_init_params (3).
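+.SH EXAMPLE
+The following sketch is illustrative only. Assuming an initialized ring, it
+arms a 10 second timeout tagged with a
+.I user_data
+value of 0x1234 and then queues an update that shortens it to one second. The
+resulting completions are left for the caller to reap.
+.PP
+.EX
+#include <liburing.h>
+
+static void arm_then_shorten(struct io_uring *ring)
+{
+    struct __kernel_timespec initial = { .tv_sec = 10, .tv_nsec = 0 };
+    struct __kernel_timespec shorter = { .tv_sec = 1, .tv_nsec = 0 };
+    struct io_uring_sqe *sqe;
+
+    sqe = io_uring_get_sqe(ring);
+    io_uring_prep_timeout(sqe, &initial, 0, 0);
+    io_uring_sqe_set_data64(sqe, 0x1234);
+    io_uring_submit(ring);
+
+    sqe = io_uring_get_sqe(ring);
+    /* target the timeout whose user_data matches 0x1234 */
+    io_uring_prep_timeout_update(sqe, &shorter, 0x1234, 0);
+    io_uring_submit(ring);
+}
+.EE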
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit (3), +.BR io_uring_prep_timeout (3) diff --git a/man/io_uring_prep_unlink.3 b/man/io_uring_prep_unlink.3 new file mode 120000 index 0000000000000000000000000000000000000000..80f86d2dd9738ce58f2be0aec1e7a7ebeb4061d3 --- /dev/null +++ b/man/io_uring_prep_unlink.3 @@ -0,0 +1 @@ +io_uring_prep_unlinkat.3 \ No newline at end of file diff --git a/man/io_uring_prep_unlinkat.3 b/man/io_uring_prep_unlinkat.3 new file mode 100644 index 0000000000000000000000000000000000000000..ba2633cf0b91509ea4c6e44ce734032e0e2ff712 --- /dev/null +++ b/man/io_uring_prep_unlinkat.3 @@ -0,0 +1,82 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_unlinkat 3 "March 13, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_prep_unlinkat \- prepare an unlinkat request +.SH SYNOPSIS +.nf +.B #include +.B #include +.B #include +.PP +.BI "void io_uring_prep_unlinkat(struct io_uring_sqe *" sqe "," +.BI " int " dirfd "," +.BI " const char *" path "," +.BI " int " flags ");" +.PP +.BI "void io_uring_prep_unlink(struct io_uring_sqe *" sqe "," +.BI " const char *" path "," +.BI " int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_unlinkat (3) +function prepares an unlinkat request. The submission queue entry +.I sqe +is setup to use the directory file descriptor pointed to by +.I dirfd +to start an unlinkat operation on the path identified by +.I path +and using the flags given in +.IR flags . + +The +.BR io_uring_prep_unlink (3) +function prepares an unlink request. The submission queue entry +.I sqe +is setup to start an unlinkat operation on the path identified by +.I path +relative to the current working directory and using the flags given in +.IR flags . + +These functions prepare an async +.BR unlinkat (2) +or +.BR unlink (2) +request. See those man pages for details. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). 
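+.SH EXAMPLE
+The following sketch is illustrative only. Assuming an initialized ring, it
+removes the hypothetical file
+.I scratch.tmp
+from the current working directory.
+.PP
+.EX
+#include <fcntl.h>
+#include <liburing.h>
+
+static int remove_scratch(struct io_uring *ring)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int res;
+
+    io_uring_prep_unlinkat(sqe, AT_FDCWD, "scratch.tmp", 0);
+    io_uring_submit(ring);
+
+    io_uring_wait_cqe(ring, &cqe);
+    res = cqe->res;     /* 0 on success, -errno on failure */
+    io_uring_cqe_seen(ring, cqe);
+    return res;
+}
+.EE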
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR unlinkat (2),
+.BR unlink (2)
diff --git a/man/io_uring_prep_write.3 b/man/io_uring_prep_write.3
new file mode 100644
index 0000000000000000000000000000000000000000..794361f05ff7b581f8894ddbda9d1d28429a3525
--- /dev/null
+++ b/man/io_uring_prep_write.3
@@ -0,0 +1,67 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_write 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_prep_write \- prepare I/O write request
+.SH SYNOPSIS
+.nf
+.B #include
+.PP
+.BI "void io_uring_prep_write(struct io_uring_sqe *" sqe ","
+.BI " int " fd ","
+.BI " const void *" buf ","
+.BI " unsigned " nbytes ","
+.BI " __u64 " offset ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_write (3)
+prepares an IO write request. The submission queue entry
+.I sqe
+is setup to use the file descriptor
+.I fd
+to start writing
+.I nbytes
+from the buffer
+.I buf
+at the specified
+.IR offset .
+
+On files that support seeking, if the offset is set to
+.BR -1 ,
+the write operation commences at the file offset, and the file offset is
+incremented by the number of bytes written. See
+.BR write (2)
+for more details. Note that for an async API, reading and updating the
+current file offset may result in unpredictable behavior, unless access
+to the file is serialized. It is not encouraged to use this feature if it's
+possible to provide the desired IO offset from the application or library.
+
+On files that are not capable of seeking, the offset must be 0 or -1.
+
+After the write has been prepared, it can be submitted with one of the submit
+functions.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+The CQE
+.I res
+field will contain the result of the operation. See the related man page for
+details on possible values. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3)
diff --git a/man/io_uring_prep_write_fixed.3 b/man/io_uring_prep_write_fixed.3
new file mode 100644
index 0000000000000000000000000000000000000000..54326bc4c7e9defe2d4ff0e391a3d1b173462807
--- /dev/null
+++ b/man/io_uring_prep_write_fixed.3
@@ -0,0 +1,72 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_write_fixed 3 "February 13, 2022" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_prep_write_fixed \- prepare I/O write request with registered buffer
+.SH SYNOPSIS
+.nf
+.B #include
+.PP
+.BI "void io_uring_prep_write_fixed(struct io_uring_sqe *" sqe ","
+.BI " int " fd ","
+.BI " const void *" buf ","
+.BI " unsigned " nbytes ","
+.BI " __u64 " offset ","
+.BI " int " buf_index ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_write_fixed (3)
+prepares an IO write request with a previously registered IO buffer. The
+submission queue entry
+.I sqe
+is setup to use the file descriptor
+.I fd
+to start writing
+.I nbytes
+from the buffer
+.I buf
+at the specified
+.I offset
+and with the buffer matching the registered index of
+.IR buf_index .
+
+This works just like
+.BR io_uring_prep_write (3)
+except it requires the use of buffers that have been registered with
+.BR io_uring_register_buffers (3).
+The
+.I buf
+and
+.I nbytes
+arguments must fall within a region specified by
+.I buf_index
+in the previously registered buffer. The buffer need not be aligned with
+the start of the registered buffer.
+
+After the write has been prepared, it can be submitted with one of the submit
+functions.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+The CQE
+.I res
+field will contain the result of the operation. See the related man page for
+details on possible values. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
+.SH SEE ALSO
+.BR io_uring_prep_write (3),
+.BR io_uring_register_buffers (3)
diff --git a/man/io_uring_prep_writev.3 b/man/io_uring_prep_writev.3
new file mode 100644
index 0000000000000000000000000000000000000000..0a442c207ecf0300cadd9a80d31dc5ba344d008d
--- /dev/null
+++ b/man/io_uring_prep_writev.3
@@ -0,0 +1,85 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_prep_writev 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_prep_writev \- prepare vector I/O write request
+.SH SYNOPSIS
+.nf
+.B #include
+.B #include
+.PP
+.BI "void io_uring_prep_writev(struct io_uring_sqe *" sqe ","
+.BI " int " fd ","
+.BI " const struct iovec *" iovecs ","
+.BI " unsigned " nr_vecs ","
+.BI " __u64 " offset ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_prep_writev (3)
+prepares a vectored IO write request. The submission queue entry
+.I sqe
+is setup to use the file descriptor
+.I fd
+to start writing
+.I nr_vecs
+from the
+.I iovecs
+array at the specified
+.IR offset .
+
+On files that support seeking, if the offset is set to
+.BR -1 ,
+the write operation commences at the file offset, and the file offset is
+incremented by the number of bytes written. See
+.BR write (2)
+for more details. Note that for an async API, reading and updating the
+current file offset may result in unpredictable behavior, unless access
+to the file is serialized. It is not encouraged to use this feature if it's
+possible to provide the desired IO offset from the application or library.
+
+On files that are not capable of seeking, the offset must be 0 or -1.
+
+After the write has been prepared it can be submitted with one of the submit
+functions.
+
+.SH RETURN VALUE
+None
+.SH ERRORS
+The CQE
+.I res
+field will contain the result of the operation. See the related man page for
+details on possible values. Note that where synchronous system calls will return
+.B -1
+on failure and set
+.I errno
+to the actual error value, io_uring never uses
+.IR errno .
+Instead it returns the negated
+.I errno
+directly in the CQE
+.I res
+field.
+.SH NOTES
+Unless an application explicitly needs to pass in more than one iovec, it is more
+efficient to use
+.BR io_uring_prep_write (3)
+rather than this function, as no state has to be maintained for a
+non-vectored IO request.
+As with any request that passes in data in a struct, that data must remain
+valid until the request has been successfully submitted. It need not remain
+valid until completion. Once a request has been submitted, the in-kernel
+state is stable. Very early kernels (5.4 and earlier) required state to be
+stable until the completion occurred. Applications can test for this
+behavior by inspecting the
+.B IORING_FEAT_SUBMIT_STABLE
+flag passed back from
+.BR io_uring_queue_init_params (3).
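+.SH EXAMPLE
+The following sketch is illustrative only. Assuming an initialized ring and a
+file descriptor
+.I fd
+opened for writing elsewhere, it writes two buffers with a single vectored
+request at offset 0. The iovec array and the buffers stay in scope until the
+request has been submitted, as required by the NOTES section above.
+.PP
+.EX
+#include <string.h>
+#include <sys/uio.h>
+#include <liburing.h>
+
+static int write_two(struct io_uring *ring, int fd)
+{
+    char a[] = "hello ", b[] = "world";
+    struct iovec iov[2] = {
+        { .iov_base = a, .iov_len = strlen(a) },
+        { .iov_base = b, .iov_len = strlen(b) },
+    };
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    struct io_uring_cqe *cqe;
+    int res;
+
+    io_uring_prep_writev(sqe, fd, iov, 2, 0);
+    io_uring_submit(ring);
+
+    io_uring_wait_cqe(ring, &cqe);
+    res = cqe->res;     /* bytes written, or -errno */
+    io_uring_cqe_seen(ring, cqe);
+    return res;
+}
+.EE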
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_prep_write (3), +.BR io_uring_prep_writev2 (3), +.BR io_uring_submit (3) diff --git a/man/io_uring_prep_writev2.3 b/man/io_uring_prep_writev2.3 new file mode 100644 index 0000000000000000000000000000000000000000..2431b652a81f3bbd0136b41c918ac8b2f16a548a --- /dev/null +++ b/man/io_uring_prep_writev2.3 @@ -0,0 +1,111 @@ +.\" Copyright (C) 2021 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_prep_writev2 3 "November 15, 2021" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_prep_writev2 \- prepare vector I/O write request with flags +.SH SYNOPSIS +.nf +.B #include +.B #include +.PP +.BI "void io_uring_prep_writev2(struct io_uring_sqe *" sqe "," +.BI " int " fd "," +.BI " const struct iovec *" iovecs "," +.BI " unsigned " nr_vecs "," +.BI " __u64 " offset "," +.BI " int " flags ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_prep_writev2 (3) +prepares a vectored IO write request. The submission queue entry +.I sqe +is setup to use the file descriptor +.I fd +to start writing +.I nr_vecs +from the +.I iovecs +array at the specified +.IR offset . +The behavior of the function can be controlled with the +.I flags +parameter. + +Supported values for +.I flags +are: +.TP +.B RWF_HIPRI +High priority request, poll if possible +.TP +.B RWF_DSYNC +per-IO O_DSYNC +.TP +.B RWF_SYNC +per-IO O_SYNC +.TP +.B RWF_NOWAIT +per-IO, return +.B -EAGAIN +if operation would block +.TP +.B RWF_APPEND +per-IO O_APPEND + +.P +On files that support seeking, if the offset is set to +.BR -1 , +the write operation commences at the file offset, and the file offset is +incremented by the number of bytes written. See +.BR write (2) +for more details. Note that for an async API, reading and updating the +current file offset may result in unpredictable behavior, unless access +to the file is serialized. It is not encouraged to use this feature if it's +possible to provide the desired IO offset from the application or library. + +On files that are not capable of seeking, the offset must be 0 or -1. + +After the write has been prepared, it can be submitted with one of the submit +functions. + +.SH RETURN VALUE +None +.SH ERRORS +The CQE +.I res +field will contain the result of the operation. See the related man page for +details on possible values. Note that where synchronous system calls will return +.B -1 +on failure and set +.I errno +to the actual error value, io_uring never uses +.IR errno . +Instead it returns the negated +.I errno +directly in the CQE +.I res +field. +.SH NOTES +Unless an application explicitly needs to pass in more than iovec, it is more +efficient to use +.BR io_uring_prep_write (3) +rather than this function, as no state has to be maintained for a +non-vectored IO request. +As with any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). 
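+.SH EXAMPLE
+The following sketch is illustrative only. It mirrors plain
+.BR io_uring_prep_writev (3)
+usage, but additionally requests per-write O_DSYNC behavior through the
+.I flags
+argument. It assumes an initialized ring, a writable file descriptor and an
+iovec array prepared elsewhere, and a C library that exposes the RWF_* flags
+(otherwise they can be found in
+.IR linux/fs.h ).
+.PP
+.EX
+#define _GNU_SOURCE
+#include <sys/uio.h>
+#include <liburing.h>
+
+/* queue a vectored write that is also synchronized like O_DSYNC */
+static void queue_dsync_writev(struct io_uring *ring, int fd,
+                               const struct iovec *iov, unsigned int nr)
+{
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+
+    io_uring_prep_writev2(sqe, fd, iov, nr, 0, RWF_DSYNC);
+    io_uring_submit(ring);
+    /* the completion is reaped by the caller */
+}
+.EE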
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_prep_write (3), +.BR io_uring_prep_writev (3), +.BR io_uring_submit (3) diff --git a/man/io_uring_queue_exit.3 b/man/io_uring_queue_exit.3 new file mode 100644 index 0000000000000000000000000000000000000000..00f8ae9b1636aa30ba733dc155d127c8533210fc --- /dev/null +++ b/man/io_uring_queue_exit.3 @@ -0,0 +1,26 @@ +.\" Copyright (C) 2020 Jens Axboe +.\" Copyright (C) 2020 Red Hat, Inc. +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_queue_exit 3 "July 10, 2020" "liburing-0.7" "liburing Manual" +.SH NAME +io_uring_queue_exit \- tear down io_uring submission and completion queues +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "void io_uring_queue_exit(struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +.BR io_uring_queue_exit (3) +will release all resources acquired and initialized by +.BR io_uring_queue_init (3). +It first unmaps the memory shared between the application and the kernel and then closes the io_uring file descriptor. +.SH RETURN VALUE +None +.SH SEE ALSO +.BR io_uring_setup (2), +.BR mmap (2), +.BR io_uring_queue_init (3) diff --git a/man/io_uring_queue_init.3 b/man/io_uring_queue_init.3 new file mode 100644 index 0000000000000000000000000000000000000000..086b70f0300de99165512d89a2dbde888d9b6bfa --- /dev/null +++ b/man/io_uring_queue_init.3 @@ -0,0 +1,89 @@ +.\" Copyright (C) 2020 Jens Axboe +.\" Copyright (C) 2020 Red Hat, Inc. +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_queue_init 3 "July 10, 2020" "liburing-0.7" "liburing Manual" +.SH NAME +io_uring_queue_init \- setup io_uring submission and completion queues +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_queue_init(unsigned " entries "," +.BI " struct io_uring *" ring "," +.BI " unsigned " flags ");" +.PP +.BI "int io_uring_queue_init_params(unsigned " entries "," +.BI " struct io_uring *" ring "," +.BI " struct io_uring_params *" params ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_queue_init (3) +function executes the +.BR io_uring_setup (2) +system call to initialize the submission and completion queues in the kernel +with at least +.I entries +entries in the submission queue and then maps the resulting file descriptor to +memory shared between the application and the kernel. + +By default, the CQ ring will have twice the number of entries as specified by +.I entries +for the SQ ring. This is adequate for regular file or storage workloads, but +may be too small networked workloads. The SQ ring entries do not impose a limit +on the number of in-flight requests that the ring can support, it merely limits +the number that can be submitted to the kernel in one go (batch). if the CQ +ring overflows, e.g. more entries are generated than fits in the ring before the +application can reap them, then the ring enters a CQ ring overflow state. This +is indicated by +.B IORING_SQ_CQ_OVERFLOW +being set in the SQ ring flags. Unless the kernel runs out of available memory, +entries are not dropped, but it is a much slower completion path and will slow +down request processing. For that reason it should be avoided and the CQ +ring sized appropriately for the workload. Setting +.I cq_entries +in +.I struct io_uring_params +will tell the kernel to allocate this many entries for the CQ ring, independent +of the SQ ring size in given in +.IR entries . +If the value isn't a power of 2, it will be rounded up to the nearest power of +2. 
+ +On success, +.BR io_uring_queue_init (3) +returns 0 and +.I ring +will point to the shared memory containing the io_uring queues. On failure +.BR -errno +is returned. + +.I flags +will be passed through to the io_uring_setup syscall (see +.BR io_uring_setup (2)). + +If the +.BR io_uring_queue_init_params (3) +variant is used, then the parameters indicated by +.I params +will be passed straight through to the +.BR io_uring_setup (2) +system call. + +On success, the resources held by +.I ring +should be released via a corresponding call to +.BR io_uring_queue_exit (3). +.SH RETURN VALUE +.BR io_uring_queue_init (3) +returns 0 on success and +.BR -errno +on failure. +.SH SEE ALSO +.BR io_uring_setup (2), +.BR io_uring_register_ring_fd (3), +.BR mmap (2), +.BR io_uring_queue_exit (3) diff --git a/man/io_uring_queue_init_params.3 b/man/io_uring_queue_init_params.3 new file mode 120000 index 0000000000000000000000000000000000000000..c91609e5f561532479475a82bb9c9c4bc4d6f453 --- /dev/null +++ b/man/io_uring_queue_init_params.3 @@ -0,0 +1 @@ +io_uring_queue_init.3 \ No newline at end of file diff --git a/man/io_uring_recvmsg_cmsg_firsthdr.3 b/man/io_uring_recvmsg_cmsg_firsthdr.3 new file mode 120000 index 0000000000000000000000000000000000000000..8eb17436288dbe999cb7d491dae1830fcb0fdf70 --- /dev/null +++ b/man/io_uring_recvmsg_cmsg_firsthdr.3 @@ -0,0 +1 @@ +io_uring_recvmsg_out.3 \ No newline at end of file diff --git a/man/io_uring_recvmsg_cmsg_nexthdr.3 b/man/io_uring_recvmsg_cmsg_nexthdr.3 new file mode 120000 index 0000000000000000000000000000000000000000..8eb17436288dbe999cb7d491dae1830fcb0fdf70 --- /dev/null +++ b/man/io_uring_recvmsg_cmsg_nexthdr.3 @@ -0,0 +1 @@ +io_uring_recvmsg_out.3 \ No newline at end of file diff --git a/man/io_uring_recvmsg_name.3 b/man/io_uring_recvmsg_name.3 new file mode 120000 index 0000000000000000000000000000000000000000..8eb17436288dbe999cb7d491dae1830fcb0fdf70 --- /dev/null +++ b/man/io_uring_recvmsg_name.3 @@ -0,0 +1 @@ +io_uring_recvmsg_out.3 \ No newline at end of file diff --git a/man/io_uring_recvmsg_out.3 b/man/io_uring_recvmsg_out.3 new file mode 100644 index 0000000000000000000000000000000000000000..60f92619d5663ec949a086e1f7009806c1f01874 --- /dev/null +++ b/man/io_uring_recvmsg_out.3 @@ -0,0 +1,78 @@ +.\" Copyright (C), 2022 Dylan Yudaken +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_recvmsg_out 3 "Julyu 26, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_recvmsg_out - access data from multishot recvmsg +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "struct io_uring_recvmsg_out *io_uring_recvmsg_validate(void *" buf "," +.BI " int " buf_len "," +.BI " struct msghdr *" msgh ");" +.PP +.BI "void *io_uring_recvmsg_name(struct io_uring_recvmsg_out *" o ");" +.PP +.BI "struct cmsghdr *io_uring_recvmsg_cmsg_firsthdr(struct io_uring_recvmsg_out * " o "," +.BI " struct msghdr *" msgh ");" +.BI "struct cmsghdr *io_uring_recvmsg_cmsg_nexthdr(struct io_uring_recvmsg_out * " o "," +.BI " struct msghdr *" msgh "," +.BI " struct cmsghdr *" cmsg ");" +.PP +.BI "void *io_uring_recvmsg_payload(struct io_uring_recvmsg_out * " o "," +.BI " struct msghdr *" msgh ");" +.BI "unsigned int io_uring_recvmsg_payload_length(struct io_uring_recvmsg_out *" o "," +.BI " int " buf_len "," +.BI " struct msghdr *" msgh ");" +.PP +.fi + +.SH DESCRIPTION + +These functions are used to access data in the payload delivered by +.BR io_uring_prep_recv_multishot (3) +. 
+.PP +.BR io_uring_recvmsg_validate (3) +will validate a buffer delivered by +.BR io_uring_prep_recv_multishot (3) +and extract the +.I io_uring_recvmsg_out +if it is valid, returning a pointer to it or else NULL. +.PP +The structure is defined as follows: +.PP +.in +4n +.EX + +struct io_uring_recvmsg_out { + __u32 namelen; /* Name byte count as would have been populated + * by recvmsg(2) */ + __u32 controllen; /* Control byte count */ + __u32 payloadlen; /* Payload byte count as would have been returned + * by recvmsg(2) */ + __u32 flags; /* Flags result as would have been populated + * by recvmsg(2) */ +}; + +.IP * 3 +.BR io_uring_recvmsg_name (3) +returns a pointer to the name in the buffer. +.IP * +.BR io_uring_recvmsg_cmsg_firsthdr (3) +returns a pointer to the first cmsg in the buffer, or NULL. +.IP * +.BR io_uring_recvmsg_cmsg_nexthdr (3) +returns a pointer to the next cmsg in the buffer, or NULL. +.IP * +.BR io_uring_recvmsg_payload (3) +returns a pointer to the payload in the buffer. +.IP * +.BR io_uring_recvmsg_payload_length (3) +Calculates the usable payload length in bytes. + + +.SH "SEE ALSO" +.BR io_uring_prep_recv_multishot (3) diff --git a/man/io_uring_recvmsg_payload.3 b/man/io_uring_recvmsg_payload.3 new file mode 120000 index 0000000000000000000000000000000000000000..8eb17436288dbe999cb7d491dae1830fcb0fdf70 --- /dev/null +++ b/man/io_uring_recvmsg_payload.3 @@ -0,0 +1 @@ +io_uring_recvmsg_out.3 \ No newline at end of file diff --git a/man/io_uring_recvmsg_payload_length.3 b/man/io_uring_recvmsg_payload_length.3 new file mode 120000 index 0000000000000000000000000000000000000000..8eb17436288dbe999cb7d491dae1830fcb0fdf70 --- /dev/null +++ b/man/io_uring_recvmsg_payload_length.3 @@ -0,0 +1 @@ +io_uring_recvmsg_out.3 \ No newline at end of file diff --git a/man/io_uring_recvmsg_validate.3 b/man/io_uring_recvmsg_validate.3 new file mode 120000 index 0000000000000000000000000000000000000000..8eb17436288dbe999cb7d491dae1830fcb0fdf70 --- /dev/null +++ b/man/io_uring_recvmsg_validate.3 @@ -0,0 +1 @@ +io_uring_recvmsg_out.3 \ No newline at end of file diff --git a/man/io_uring_register.2 b/man/io_uring_register.2 new file mode 100644 index 0000000000000000000000000000000000000000..b34a1f6aa6821944df8d3a09245388b969321da3 --- /dev/null +++ b/man/io_uring_register.2 @@ -0,0 +1,830 @@ +.\" Copyright (C) 2019 Jens Axboe +.\" Copyright (C) 2019 Red Hat, Inc. +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_register 2 2019-01-17 "Linux" "Linux Programmer's Manual" +.SH NAME +io_uring_register \- register files or user buffers for asynchronous I/O +.SH SYNOPSIS +.nf +.BR "#include " +.PP +.BI "int io_uring_register(unsigned int " fd ", unsigned int " opcode , +.BI " void *" arg ", unsigned int " nr_args ); +.fi +.PP +.SH DESCRIPTION +.PP + +The +.BR io_uring_register (2) +system call registers resources (e.g. user buffers, files, eventfd, +personality, restrictions) for use in an +.BR io_uring (7) +instance referenced by +.IR fd . +Registering files or user buffers allows the kernel to take long term +references to internal data structures or create long term mappings of +application memory, greatly reducing per-I/O overhead. + +.I fd +is the file descriptor returned by a call to +.BR io_uring_setup (2). +.I opcode +can be one of: + +.TP +.B IORING_REGISTER_BUFFERS +.I arg +points to a +.I struct iovec +array of +.I nr_args +entries. The buffers associated with the iovecs will be locked in +memory and charged against the user's +.B RLIMIT_MEMLOCK +resource limit. 
See +.BR getrlimit (2) +for more information. Additionally, there is a size limit of 1GiB per +buffer. Currently, the buffers must be anonymous, non-file-backed +memory, such as that returned by +.BR malloc (3) +or +.BR mmap (2) +with the +.B MAP_ANONYMOUS +flag set. It is expected that this limitation will be lifted in the +future. Huge pages are supported as well. Note that the entire huge +page will be pinned in the kernel, even if only a portion of it is +used. + +After a successful call, the supplied buffers are mapped into the +kernel and eligible for I/O. To make use of them, the application +must specify the +.B IORING_OP_READ_FIXED +or +.B IORING_OP_WRITE_FIXED +opcodes in the submission queue entry (see the +.I struct io_uring_sqe +definition in +.BR io_uring_enter (2)), +and set the +.I buf_index +field to the desired buffer index. The memory range described by the +submission queue entry's +.I addr +and +.I len +fields must fall within the indexed buffer. + +It is perfectly valid to setup a large buffer and then only use part +of it for an I/O, as long as the range is within the originally mapped +region. + +An application can increase or decrease the size or number of +registered buffers by first unregistering the existing buffers, and +then issuing a new call to +.BR io_uring_register (2) +with the new buffers. + +Note that before 5.13 registering buffers would wait for the ring to idle. +If the application currently has requests in-flight, the registration will +wait for those to finish before proceeding. + +An application need not unregister buffers explicitly before shutting +down the io_uring instance. Available since 5.1. + +.TP +.B IORING_REGISTER_BUFFERS2 +Register buffers for I/O. Similar to +.B IORING_REGISTER_BUFFERS +but aims to have a more extensible ABI. + +.I arg +points to a +.I struct io_uring_rsrc_register, +and +.I nr_args +should be set to the number of bytes in the structure. + +.PP +.in +8n +.EX +struct io_uring_rsrc_register { + __u32 nr; + __u32 resv; + __u64 resv2; + __aligned_u64 data; + __aligned_u64 tags; +}; + +.EE +.in +.PP + +.in +8n + +The +.I data +field contains a pointer to a +.I struct iovec +array of +.I nr +entries. +The +.I tags +field should either be 0, then tagging is disabled, or point to an array +of +.I nr +"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this +particular resource (a buffer in this case) is disabled. Otherwise, after the +resource had been unregistered and it's not used anymore, a CQE will be +posted with +.I user_data +set to the specified tag and all other fields zeroed. + +Note that resource updates, e.g. +.B IORING_REGISTER_BUFFERS_UPDATE, +don't necessarily deallocate resources by the time it returns, but they might +be held alive until all requests using it complete. + +Available since 5.13. + +.TP +.B IORING_REGISTER_BUFFERS_UPDATE +Updates registered buffers with new ones, either turning a sparse entry into +a real one, or replacing an existing entry. + +.I arg +must contain a pointer to a struct io_uring_rsrc_update2, which contains +an offset on which to start the update, and an array of +.I struct iovec. +.I tags +points to an array of tags. +.I nr +must contain the number of descriptors in the passed in arrays. +See +.B IORING_REGISTER_BUFFERS2 +for the resource tagging description. 
+ +.PP +.in +8n +.EX + +struct io_uring_rsrc_update2 { + __u32 offset; + __u32 resv; + __aligned_u64 data; + __aligned_u64 tags; + __u32 nr; + __u32 resv2; +}; +.EE +.in +.PP + +.in +8n + +Available since 5.13. + +.TP +.B IORING_UNREGISTER_BUFFERS +This operation takes no argument, and +.I arg +must be passed as NULL. All previously registered buffers associated +with the io_uring instance will be released. Available since 5.1. + +.TP +.B IORING_REGISTER_FILES +Register files for I/O. +.I arg +contains a pointer to an array of +.I nr_args +file descriptors (signed 32 bit integers). + +To make use of the registered files, the +.B IOSQE_FIXED_FILE +flag must be set in the +.I flags +member of the +.IR "struct io_uring_sqe" , +and the +.I fd +member is set to the index of the file in the file descriptor array. + +The file set may be sparse, meaning that the +.B fd +field in the array may be set to +.B -1. +See +.B IORING_REGISTER_FILES_UPDATE +for how to update files in place. + +Note that before 5.13 registering files would wait for the ring to idle. +If the application currently has requests in-flight, the registration will +wait for those to finish before proceeding. See +.B IORING_REGISTER_FILES_UPDATE +for how to update an existing set without that limitation. + +Files are automatically unregistered when the io_uring instance is +torn down. An application needs only unregister if it wishes to +register a new set of fds. Available since 5.1. + +.TP +.B IORING_REGISTER_FILES2 +Register files for I/O. Similar to +.B IORING_REGISTER_FILES. + +.I arg +points to a +.I struct io_uring_rsrc_register, +and +.I nr_args +should be set to the number of bytes in the structure. + +The +.I data +field contains a pointer to an array of +.I nr +file descriptors (signed 32 bit integers). +.I tags +field should either be 0 or or point to an array of +.I nr +"tags" (unsigned 64 bit integers). See +.B IORING_REGISTER_BUFFERS2 +for more info on resource tagging. + +Note that resource updates, e.g. +.B IORING_REGISTER_FILES_UPDATE, +don't necessarily deallocate resources, they might be held until all requests +using that resource complete. + +Available since 5.13. + +.TP +.B IORING_REGISTER_FILES_UPDATE +This operation replaces existing files in the registered file set with new +ones, either turning a sparse entry (one where fd is equal to +.B -1 +) into a real one, removing an existing entry (new one is set to +.B -1 +), or replacing an existing entry with a new existing entry. + +.I arg +must contain a pointer to a +.I struct io_uring_files_update, +which contains +an offset on which to start the update, and an array of file descriptors to +use for the update. +.I nr_args +must contain the number of descriptors in the passed in array. Available +since 5.5. + +File descriptors can be skipped if they are set to +.B IORING_REGISTER_FILES_SKIP. +Skipping an fd will not touch the file associated with the previous +fd at that index. Available since 5.12. + +.TP +.B IORING_REGISTER_FILES_UPDATE2 +Similar to IORING_REGISTER_FILES_UPDATE, replaces existing files in the +registered file set with new ones, either turning a sparse entry (one where +fd is equal to +.B -1 +) into a real one, removing an existing entry (new one is set to +.B -1 +), or replacing an existing entry with a new existing entry. + +.I arg +must contain a pointer to a +.I struct io_uring_rsrc_update2, +which contains +an offset on which to start the update, and an array of file descriptors to +use for the update stored in +.I data. 
+.I tags +points to an array of tags. +.I nr +must contain the number of descriptors in the passed in arrays. +See +.B IORING_REGISTER_BUFFERS2 +for the resource tagging description. + +Available since 5.13. + +.TP +.B IORING_UNREGISTER_FILES +This operation requires no argument, and +.I arg +must be passed as NULL. All previously registered files associated +with the io_uring instance will be unregistered. Available since 5.1. + +.TP +.B IORING_REGISTER_EVENTFD +It's possible to use eventfd(2) to get notified of completion events on an +io_uring instance. If this is desired, an eventfd file descriptor can be +registered through this operation. +.I arg +must contain a pointer to the eventfd file descriptor, and +.I nr_args +must be 1. Note that while io_uring generally takes care to avoid spurious +events, they can occur. Similarly, batched completions of CQEs may only trigger +a single eventfd notification even if multiple CQEs are posted. The application +should make no assumptions on number of events being available having a direct +correlation to eventfd notifications posted. An eventfd notification must thus +only be treated as a hint to check the CQ ring for completions. Available since +5.2. + +An application can temporarily disable notifications, coming through the +registered eventfd, by setting the +.B IORING_CQ_EVENTFD_DISABLED +bit in the +.I flags +field of the CQ ring. +Available since 5.8. + +.TP +.B IORING_REGISTER_EVENTFD_ASYNC +This works just like +.B IORING_REGISTER_EVENTFD +, except notifications are only posted for events that complete in an async +manner. This means that events that complete inline while being submitted +do not trigger a notification event. The arguments supplied are the same as +for +.B IORING_REGISTER_EVENTFD. +Available since 5.6. + +.TP +.B IORING_UNREGISTER_EVENTFD +Unregister an eventfd file descriptor to stop notifications. Since only one +eventfd descriptor is currently supported, this operation takes no argument, +and +.I arg +must be passed as NULL and +.I nr_args +must be zero. Available since 5.2. + +.TP +.B IORING_REGISTER_PROBE +This operation returns a structure, io_uring_probe, which contains information +about the opcodes supported by io_uring on the running kernel. +.I arg +must contain a pointer to a struct io_uring_probe, and +.I nr_args +must contain the size of the ops array in that probe struct. The ops array +is of the type io_uring_probe_op, which holds the value of the opcode and +a flags field. If the flags field has +.B IO_URING_OP_SUPPORTED +set, then this opcode is supported on the running kernel. Available since 5.6. + +.TP +.B IORING_REGISTER_PERSONALITY +This operation registers credentials of the running application with io_uring, +and returns an id associated with these credentials. Applications wishing to +share a ring between separate users/processes can pass in this credential id +in the sqe +.B personality +field. If set, that particular sqe will be issued with these credentials. Must +be invoked with +.I arg +set to NULL and +.I nr_args +set to zero. Available since 5.6. + +.TP +.B IORING_UNREGISTER_PERSONALITY +This operation unregisters a previously registered personality with io_uring. +.I nr_args +must be set to the id in question, and +.I arg +must be set to NULL. Available since 5.6. + +.TP +.B IORING_REGISTER_ENABLE_RINGS +This operation enables an io_uring ring started in a disabled state +.RB (IORING_SETUP_R_DISABLED +was specified in the call to +.BR io_uring_setup (2)). 
+While the io_uring ring is disabled, submissions are not allowed and
+registrations are not restricted.
+
+After the execution of this operation, the io_uring ring is enabled:
+submissions and registrations are allowed, but they will
+be validated following the registered restrictions (if any).
+This operation takes no argument, must be invoked with
+.I arg
+set to NULL and
+.I nr_args
+set to zero. Available since 5.10.
+
+.TP
+.B IORING_REGISTER_RESTRICTIONS
+.I arg
+points to a
+.I struct io_uring_restriction
+array of
+.I nr_args
+entries.
+
+With an entry it is possible to allow an
+.BR io_uring_register (2)
+.I opcode,
+or specify which
+.I opcode
+and
+.I flags
+of the submission queue entry are allowed,
+or require certain
+.I flags
+to be specified (these flags must be set on each submission queue entry).
+
+All the restrictions must be submitted with a single
+.BR io_uring_register (2)
+call and they are handled as an allowlist (opcodes and flags not registered
+are not allowed).
+
+Restrictions can be registered only if the io_uring ring was started in a
+disabled state
+.RB (IORING_SETUP_R_DISABLED
+must be specified in the call to
+.BR io_uring_setup (2)).
+
+Available since 5.10.
+
+.TP
+.B IORING_REGISTER_IOWQ_AFF
+By default, async workers created by io_uring will inherit the CPU mask of
+their parent. This is usually all the CPUs in the system, unless the parent
+is being run with a limited set. If this isn't the desired outcome, the
+application may explicitly tell io_uring what CPUs the async workers may run
+on.
+.I arg
+must point to a
+.B cpu_set_t
+mask, and
+.I nr_args
+the byte size of that mask.
+
+Available since 5.14.
+
+.TP
+.B IORING_UNREGISTER_IOWQ_AFF
+Undoes a CPU mask previously set with
+.B IORING_REGISTER_IOWQ_AFF.
+Must not have
+.I arg
+or
+.I nr_args
+set.
+
+Available since 5.14.
+
+.TP
+.B IORING_REGISTER_IOWQ_MAX_WORKERS
+By default, io_uring limits the unbounded workers created to the maximum
+processor count set by
+.IR RLIMIT_NPROC ,
+while the number of bounded workers is a function of the SQ ring size and the
+number of CPUs in the system. Sometimes this can be excessive (or too little,
+for bounded), and this command provides a way to change the count per ring
+(per NUMA node) instead.
+
+.I arg
+must be set to an
+.I unsigned int
+pointer to an array of two values, with the values in the array being set to
+the maximum count of workers per NUMA node. Index 0 holds the bounded worker
+count, and index 1 holds the unbounded worker count. On successful return, the
+passed in array will contain the previous maximum values for each type. If the
+count being passed in is 0, then this command returns the current maximum values
+and doesn't modify the current setting.
+.I nr_args
+must be set to 2, as the command takes two values.
+
+Available since 5.15.
+
+.TP
+.B IORING_REGISTER_RING_FDS
+Whenever
+.BR io_uring_enter (2)
+is called to submit requests or wait for completions, the kernel must grab a
+reference to the file descriptor. If the application using io_uring is threaded,
+the file table is marked as shared, and the reference grab and put of the file
+descriptor count is more expensive than it is for a non-threaded application.
+
+Similarly to how io_uring allows registration of files, this allows
+registration of the ring file descriptor itself. This reduces the overhead of
+the
+.BR io_uring_enter (2)
+system call.
+
+.I arg
+must be set to an unsigned int pointer to an array of type
+.I struct io_uring_rsrc_update
+of
+.I nr_args
+number of entries. The
+.B data
+field of this struct must point to an io_uring file descriptor, and the
+.B offset
+field can be either
+.B -1
+or an explicit offset desired for the registered file descriptor value. If
+.B -1
+is used, then upon successful return of this system call, the field will
+contain the value of the registered file descriptor to be used for future
+.BR io_uring_enter (2)
+system calls.
+
+On successful completion of this request, the returned descriptors may be used
+instead of the real file descriptor for
+.BR io_uring_enter (2),
+provided that
+.B IORING_ENTER_REGISTERED_RING
+is set in the
+.I flags
+for the system call. This flag tells the kernel that a registered descriptor
+is used rather than a real file descriptor.
+
+Each thread or process using a ring must register the file descriptor directly
+by issuing this request.
+
+The maximum number of supported registered ring descriptors is currently
+limited to
+.B 16.
+
+Available since 5.18.
+
+.TP
+.B IORING_UNREGISTER_RING_FDS
+Unregister descriptors previously registered with
+.B IORING_REGISTER_RING_FDS.
+
+.I arg
+must be set to an unsigned int pointer to an array of type
+.I struct io_uring_rsrc_update
+of
+.I nr_args
+number of entries. Only the
+.B offset
+field should be set in the structure, containing the registered file descriptor
+offset previously returned from
+.B IORING_REGISTER_RING_FDS
+that the application wishes to unregister.
+
+Note that this isn't done automatically on ring exit, if the thread or task
+that previously registered a ring file descriptor isn't exiting. It is
+recommended to manually unregister any previously registered ring descriptors
+if the ring is closed and the task persists. This will free up a registration
+slot, making it available for future use.
+
+Available since 5.18.
+
+.TP
+.B IORING_REGISTER_PBUF_RING
+Registers a shared buffer ring to be used with provided buffers. This is a
+newer, more efficient alternative to using
+.B IORING_OP_PROVIDE_BUFFERS
+and is used with request types that support the
+.B IOSQE_BUFFER_SELECT
+flag.
+
+The
+.I arg
+argument must be filled in with the appropriate information. It looks as
+follows:
+.PP
+.in +12n
+.EX
+struct io_uring_buf_reg {
+    __u64 ring_addr;
+    __u32 ring_entries;
+    __u16 bgid;
+    __u16 pad;
+    __u64 resv[3];
+};
+.EE
+.in
+.PP
+.in +8n
+The
+.I ring_addr
+field must contain the address of the memory allocated to fit this ring.
+The memory must be page aligned and hence allocated appropriately, using e.g.
+.BR posix_memalign (3)
+or similar. The size of the ring is the product of
+.I ring_entries
+and the size of
+.IR "struct io_uring_buf" .
+.I ring_entries
+is the desired size of the ring, and must be a power of 2. The maximum
+size allowed is 2^15 (32768).
+.I bgid
+is the buffer group ID associated with this ring. SQEs that select a buffer
+have a buffer group associated with them in their
+.I buf_group
+field, and the associated CQE will have
+.B IORING_CQE_F_BUFFER
+set in their
+.I flags
+member, which will also contain the specific ID of the buffer selected. The rest
+of the fields are reserved and must be cleared to zero.
+
+The
+.I flags
+argument is currently unused and must be set to zero.
+
+.I nr_args
+must be set to 1.
+
+Also see
+.BR io_uring_register_buf_ring (3)
+for more details. Available since 5.19.
+
+.TP
+.B IORING_UNREGISTER_PBUF_RING
+Unregister a previously registered provided buffer ring.
+.I arg +must be set to the address of a struct io_uring_buf_reg, with just the +.I bgid +field set to the buffer group ID of the previously registered provided buffer +group. +.I nr_args +must be set to 1. Also see +.B IORING_REGISTER_PBUF_RING . + +Available since 5.19. + +.TP +.B IORING_REGISTER_SYNC_CANCEL +Performs a synchronous cancelation request, which works in a similar fashion to +.B IORING_OP_ASYNC_CANCEL +except it completes inline. This can be useful for scenarios where cancelations +should happen synchronously, rather than needing to issue an SQE and wait for +completion of that specific CQE. + +.I arg +must be set to a pointer to a struct io_uring_sync_cancel_reg structure, with +the details filled in for what request(s) to target for cancelation. See +.BR io_uring_register_sync_cancel (3) +for details on that. The return values are the same, except they are passed +back synchronously rather than through the CQE +.I res +field. +.I nr_args +must be set to 1. + +Available since 6.0. + +.TP +.B IORING_REGISTER_FILE_ALLOC_RANGE +sets the allowable range for fixed file index allocations within the +kernel. When requests that can instantiate a new fixed file are used with +.B IORING_FILE_INDEX_ALLOC , +the application is asking the kernel to allocate a new fixed file descriptor +rather than pass in a specific value for one. By default, the kernel will +pick any available fixed file descriptor within the range available. +This effectively allows the application to set aside a range just for dynamic +allocations, with the remainder being used for specific values. + +.I nr_args +must be set to 1 and +.I arg +must be set to a pointer to a struct io_uring_file_index_range: +.PP +.in +12n +.EX +struct io_uring_file_index_range { + __u32 off; + __u32 len; + __u64 resv; +}; +.EE +.in +.PP +.in +8n +with +.I off +being set to the starting value for the range, and +.I len +being set to the number of descriptors. The reserved +.I resv +field must be cleared to zero. + +The application must have registered a file table first. + +Available since 6.0. + +.SH RETURN VALUE + +On success, +.BR io_uring_register (2) +returns either 0 or a positive value, depending on the +.I opcode +used. On error, a negative error value is returned. The caller should not rely +on the +.I errno +variable. + +.SH ERRORS +.TP +.B EACCES +The +.I opcode +field is not allowed due to registered restrictions. +.TP +.B EBADF +One or more fds in the +.I fd +array are invalid. +.TP +.B EBADFD +.B IORING_REGISTER_ENABLE_RINGS +or +.B IORING_REGISTER_RESTRICTIONS +was specified, but the io_uring ring is not disabled. +.TP +.B EBUSY +.B IORING_REGISTER_BUFFERS +or +.B IORING_REGISTER_FILES +or +.B IORING_REGISTER_RESTRICTIONS +was specified, but there were already buffers, files, or restrictions +registered. +.TP +.B EFAULT +buffer is outside of the process' accessible address space, or +.I iov_len +is greater than 1GiB. +.TP +.B EINVAL +.B IORING_REGISTER_BUFFERS +or +.B IORING_REGISTER_FILES +was specified, but +.I nr_args +is 0. +.TP +.B EINVAL +.B IORING_REGISTER_BUFFERS +was specified, but +.I nr_args +exceeds +.B UIO_MAXIOV +.TP +.B EINVAL +.B IORING_UNREGISTER_BUFFERS +or +.B IORING_UNREGISTER_FILES +was specified, and +.I nr_args +is non-zero or +.I arg +is non-NULL. +.TP +.B EINVAL +.B IORING_REGISTER_RESTRICTIONS +was specified, but +.I nr_args +exceeds the maximum allowed number of restrictions or restriction +.I opcode +is invalid. 
+.TP +.B EMFILE +.B IORING_REGISTER_FILES +was specified and +.I nr_args +exceeds the maximum allowed number of files in a fixed file set. +.TP +.B EMFILE +.B IORING_REGISTER_FILES +was specified and adding +.I nr_args +file references would exceed the maximum allowed number of files the user +is allowed to have according to the +.B +RLIMIT_NOFILE +resource limit and the caller does not have +.B CAP_SYS_RESOURCE +capability. Note that this is a per user limit, not per process. +.TP +.B ENOMEM +Insufficient kernel resources are available, or the caller had a +non-zero +.B RLIMIT_MEMLOCK +soft resource limit, but tried to lock more memory than the limit +permitted. This limit is not enforced if the process is privileged +.RB ( CAP_IPC_LOCK ). +.TP +.B ENXIO +.B IORING_UNREGISTER_BUFFERS +or +.B IORING_UNREGISTER_FILES +was specified, but there were no buffers or files registered. +.TP +.B ENXIO +Attempt to register files or buffers on an io_uring instance that is already +undergoing file or buffer registration, or is being torn down. +.TP +.B EOPNOTSUPP +User buffers point to file-backed memory. diff --git a/man/io_uring_register_buf_ring.3 b/man/io_uring_register_buf_ring.3 new file mode 100644 index 0000000000000000000000000000000000000000..25c958e15138281af9e52c8a18ac84a9c459a879 --- /dev/null +++ b/man/io_uring_register_buf_ring.3 @@ -0,0 +1,140 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_register_buf_ring 3 "May 18, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_register_buf_ring \- register buffer ring for provided buffers +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_register_buf_ring(struct io_uring *" ring ", +.BI " struct io_uring_buf_reg *" reg ", +.BI " unsigned int " flags ");" +.BI " +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_register_buf_ring (3) +function registers a shared buffer ring to be used with provided buffers. For +the request types that support it, provided buffers are given to the ring and +one is selected by a request if it has +.B IOSQE_BUFFER_SELECT +set in the SQE +.IR flags , +when the request is ready to receive data. This allows both clear ownership +of the buffer lifetime, and a way to have more read/receive type of operations +in flight than buffers available. + +The +.I reg +argument must be filled in with the appropriate information. It looks as +follows: +.PP +.in +4n +.EX +struct io_uring_buf_reg { + __u64 ring_addr; + __u32 ring_entries; + __u16 bgid; + __u16 pad; + __u64 resv[3]; +}; +.EE +.in +.PP +The +.I ring_addr +field must contain the address to the memory allocated to fit this ring. +The memory must be page aligned and hence allocated appropriately using eg +.BR posix_memalign (3) +or similar. The size of the ring is the product of +.I ring_entries +and the size of +.IR "struct io_uring_buf" . +.I ring_entries +is the desired size of the ring, and must be a power-of-2 in size. The maximum +size allowed is 2^15 (32768). +.I bgid +is the buffer group ID associated with this ring. SQEs that select a buffer +has a buffer group associated with them in their +.I buf_group +field, and the associated CQE will have +.B IORING_CQE_F_BUFFER +set in their +.I flags +member, which will also contain the specific ID of the buffer selected. The rest +of the fields are reserved and must be cleared to zero. + +The +.I flags +argument is currently unused and must be set to zero. 
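+.PP
+For illustration, a sketch (assuming an initialized
+.I ring
+and with error handling omitted) of registering a ring of 8 buffer slots for
+buffer group 0 might look like this:
+.PP
+.in +4n
+.EX
+struct io_uring_buf_reg reg = { };
+struct io_uring_buf_ring *br;
+
+/* page aligned memory, sized for 8 struct io_uring_buf entries */
+if (posix_memalign((void **) &br, 4096,
+                   8 * sizeof(struct io_uring_buf)))
+    return 1;
+
+reg.ring_addr = (unsigned long) br;
+reg.ring_entries = 8;
+reg.bgid = 0;
+
+io_uring_register_buf_ring(&ring, &reg, 0);
+io_uring_buf_ring_init(br);
+.EE
+.in
+.PP
+After registration, individual buffers are added to the ring with
+.BR io_uring_buf_ring_add (3)
+before requests can select them.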
+
+A shared buffer ring looks as follows:
+.PP
+.in +4n
+.EX
+struct io_uring_buf_ring {
+    union {
+        struct {
+            __u64 resv1;
+            __u32 resv2;
+            __u16 resv3;
+            __u16 tail;
+        };
+        struct io_uring_buf bufs[0];
+    };
+};
+.EE
+.in
+.PP
+where
+.I tail
+is the index at which the application can insert new buffers for consumption
+by requests, and
+.I struct io_uring_buf
+is the buffer definition:
+.PP
+.in +4n
+.EX
+struct io_uring_buf {
+    __u64 addr;
+    __u32 len;
+    __u16 bid;
+    __u16 resv;
+};
+.EE
+.in
+.PP
+where
+.I addr
+is the address for the buffer,
+.I len
+is the length of the buffer in bytes, and
+.I bid
+is the buffer ID that will be returned in the CQE once consumed.
+
+Reserved fields must not be touched. Applications must use
+.BR io_uring_buf_ring_init (3)
+to initialise the buffer ring. Applications may use
+.BR io_uring_buf_ring_add (3)
+and
+.BR io_uring_buf_ring_advance (3)
+or
+.BR io_uring_buf_ring_cq_advance (3)
+to provide buffers, which will set these fields and update the tail.
+
+Available since 5.19.
+
+.SH RETURN VALUE
+On success
+.BR io_uring_register_buf_ring (3)
+returns 0. On failure it returns
+.BR -errno .
+.SH SEE ALSO
+.BR io_uring_buf_ring_init (3),
+.BR io_uring_buf_ring_add (3),
+.BR io_uring_buf_ring_advance (3),
+.BR io_uring_buf_ring_cq_advance (3)
diff --git a/man/io_uring_register_buffers.3 b/man/io_uring_register_buffers.3
new file mode 100644
index 0000000000000000000000000000000000000000..656ac42aa0568d7f2901d687dd39bbc34bb823f6
--- /dev/null
+++ b/man/io_uring_register_buffers.3
@@ -0,0 +1,61 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_register_buffers 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_register_buffers \- register buffers for fixed buffer operations
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_register_buffers(struct io_uring *" ring ",
+.BI "                              const struct iovec *" iovecs ",
+.BI "                              unsigned " nr_iovecs ");"
+.PP
+.BI "int io_uring_register_buffers_sparse(struct io_uring *" ring ",
+.BI "                                     unsigned " nr_iovecs ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_register_buffers (3)
+function registers
+.I nr_iovecs
+number of buffers defined by the array
+.I iovecs
+belonging to the
+.IR ring .
+
+The
+.BR io_uring_register_buffers_sparse (3)
+function registers
+.I nr_iovecs
+empty buffers belonging to the
+.IR ring .
+These buffers must be updated before use, using e.g.
+.BR io_uring_register_buffers_update_tag (3).
+
+After the caller has registered the buffers, they can be used with one of the
+fixed buffers functions.
+
+Registering buffers is an optimization that is useful in conjunction with
+.B O_DIRECT
+reads and writes, where it maps the specified range into the kernel once when
+the buffer is registered, rather than mapping and unmapping it for every I/O
+to that region. It also avoids manipulating the page reference counts for
+each I/O.
+
+.SH RETURN VALUE
+On success
+.BR io_uring_register_buffers (3)
+and
+.BR io_uring_register_buffers_sparse (3)
+return 0. On failure they return
+.BR -errno .
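+.SH EXAMPLE
+As an illustrative sketch only (assuming a
+.I ring
+that has already been initialized with
+.BR io_uring_queue_init (3)),
+two page-aligned buffers could be registered as fixed buffers like this:
+.PP
+.in +4n
+.EX
+struct iovec iov[2];
+int i, ret;
+
+/* allocate two page aligned, page sized buffers */
+for (i = 0; i < 2; i++) {
+    if (posix_memalign(&iov[i].iov_base, 4096, 4096))
+        return 1;
+    iov[i].iov_len = 4096;
+}
+
+ret = io_uring_register_buffers(&ring, iov, 2);
+if (ret)
+    fprintf(stderr, "register_buffers: %s\en", strerror(-ret));
+.EE
+.in
+.PP
+The registered buffers can then be referenced by index with
+.BR io_uring_prep_read_fixed (3)
+and
+.BR io_uring_prep_write_fixed (3).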
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_unregister_buffers (3),
+.BR io_uring_register_buf_ring (3),
+.BR io_uring_prep_read_fixed (3),
+.BR io_uring_prep_write_fixed (3)
diff --git a/man/io_uring_register_eventfd.3 b/man/io_uring_register_eventfd.3
new file mode 100644
index 0000000000000000000000000000000000000000..5cbe72a0aab0bc28dfe42835ecf00a01e079f8d2
--- /dev/null
+++ b/man/io_uring_register_eventfd.3
@@ -0,0 +1,51 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_register_eventfd 3 "April 16, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_register_eventfd \- register an eventfd with a ring
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_register_eventfd(struct io_uring *" ring ","
+.BI "                              int " fd ");"
+.PP
+.BI "int io_uring_register_eventfd_async(struct io_uring *" ring ","
+.BI "                                    int " fd ");"
+.PP
+.BI "int io_uring_unregister_eventfd(struct io_uring *" ring ");"
+.fi
+.SH DESCRIPTION
+.PP
+.BR io_uring_register_eventfd (3)
+registers the eventfd file descriptor
+.I fd
+with the ring identified by
+.IR ring .
+
+Whenever completions are posted to the CQ ring, an eventfd notification
+is generated with the registered eventfd descriptor. If
+.BR io_uring_register_eventfd_async (3)
+is used, only events that completed out-of-line will trigger a notification.
+
+If notifications are no longer desired,
+.BR io_uring_unregister_eventfd (3)
+may be called to remove the eventfd registration. No eventfd argument is
+needed, as a ring can only have a single eventfd registered.
+
+.SH NOTES
+While io_uring generally takes care to avoid spurious events, they can occur.
+Similarly, batched completions of CQEs may only trigger a single eventfd
+notification even if multiple CQEs are posted. The application should make no
+assumptions on number of events being available having a direct correlation to
+eventfd notifications posted. An eventfd notification must thus only be treated
+as a hint to check the CQ ring for completions.
+.SH RETURN VALUE
+Returns 0 on success, or
+.BR -errno
+on error.
+.SH SEE ALSO
+.BR eventfd (2)
diff --git a/man/io_uring_register_eventfd_async.3 b/man/io_uring_register_eventfd_async.3
new file mode 120000
index 0000000000000000000000000000000000000000..665995711bf2ed5b6b836d7457336cbc4b676f6f
--- /dev/null
+++ b/man/io_uring_register_eventfd_async.3
@@ -0,0 +1 @@
+io_uring_register_eventfd.3
\ No newline at end of file
diff --git a/man/io_uring_register_file_alloc_range.3 b/man/io_uring_register_file_alloc_range.3
new file mode 100644
index 0000000000000000000000000000000000000000..1afd41bd94ae726cffb507a82b8127d9b764be4e
--- /dev/null
+++ b/man/io_uring_register_file_alloc_range.3
@@ -0,0 +1,52 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_register_file_alloc_range 3 "Oct 21, 2022" "liburing-2.3" "liburing Manual"
+.SH NAME
+io_uring_register_file_alloc_range \- set range for fixed file allocations
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_register_file_alloc_range(struct io_uring *" ring ",
+.BI "                                       unsigned " off ","
+.BI "                                       unsigned " len ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_register_file_alloc_range (3)
+function sets the allowable range for fixed file index allocations within the
+kernel. When requests that can instantiate a new fixed file are used with
+.B IORING_FILE_INDEX_ALLOC ,
+the application is asking the kernel to allocate a new fixed file descriptor
+rather than pass in a specific value for one. By default, the kernel will
+pick any available fixed file descriptor within the range available. Calling
+this function with
+.I off
+set to the starting offset and
+.I len
+set to the number of descriptors, the application can limit the allocated
+descriptors to that particular range. This effectively allows the application
+to set aside a range just for dynamic allocations, with the remainder being
+used for specific values.
+
+The application must have registered a fixed file table upfront, e.g. through
+.BR io_uring_register_files (3)
+or
+.BR io_uring_register_files_sparse (3).
+
+Available since 6.0.
+
+.SH RETURN VALUE
+On success
+.BR io_uring_register_file_alloc_range (3)
+returns 0. On failure it returns
+.BR -errno .
+.SH SEE ALSO
+.BR io_uring_register_files (3),
+.BR io_uring_prep_accept_direct (3),
+.BR io_uring_prep_openat_direct (3),
+.BR io_uring_prep_socket_direct (3)
diff --git a/man/io_uring_register_files.3 b/man/io_uring_register_files.3
new file mode 100644
index 0000000000000000000000000000000000000000..3feac4eb871802c1b0a695dedc8477e5b97a672e
--- /dev/null
+++ b/man/io_uring_register_files.3
@@ -0,0 +1,57 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_register_files 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_register_files \- register file descriptors
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_register_files(struct io_uring *" ring ","
+.BI "                            const int *" files ","
+.BI "                            unsigned " nr_files ");"
+.PP
+.BI "int io_uring_register_files_sparse(struct io_uring *" ring ","
+.BI "                                   unsigned " nr_files ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_register_files (3)
+function registers
+.I nr_files
+number of file descriptors defined by the array
+.I files
+belonging to the
+.I ring
+for subsequent operations.
+
+The
+.BR io_uring_register_files_sparse (3)
+function registers an empty file table of
+.I nr_files
+number of file descriptors. The sparse variant is available in kernels 5.19
+and later.
+
+Registering a file table is a prerequisite for using any request that uses
+direct descriptors.
+
+Registered files have less overhead per operation than normal files. This
+is due to the kernel grabbing a reference count on a file when an operation
+begins, and dropping it when it's done. When the process file table is
+shared, for example if the process has ever created any threads, then this
+cost goes up even more. Using registered files reduces the overhead of
+file reference management across requests that operate on a file.
+
+.SH RETURN VALUE
+On success
+.BR io_uring_register_files (3)
+and
+.BR io_uring_register_files_sparse (3)
+return 0. On failure they return
+.BR -errno .
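+.SH EXAMPLE
+A minimal sketch (error handling omitted, and assuming a
+.I ring
+that has already been initialized; the file name is purely illustrative)
+that registers a small, partly sparse file table and then reads from the
+file registered at index 0:
+.PP
+.in +4n
+.EX
+char buf[4096];
+int fds[2];
+struct io_uring_sqe *sqe;
+
+fds[0] = open("input.txt", O_RDONLY);
+fds[1] = -1;    /* sparse entry, can be filled in later */
+
+io_uring_register_files(&ring, fds, 2);
+
+/* the fd argument is the index into the registered file table */
+sqe = io_uring_get_sqe(&ring);
+io_uring_prep_read(sqe, 0, buf, sizeof(buf), 0);
+sqe->flags |= IOSQE_FIXED_FILE;
+
+io_uring_submit(&ring);
+.EE
+.in
+.PP
+With
+.B IOSQE_FIXED_FILE
+set in the SQE flags, the file is identified by its index in the registered
+file table rather than by a regular file descriptor.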
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_unregister_files (3)
diff --git a/man/io_uring_register_files_sparse.3 b/man/io_uring_register_files_sparse.3
new file mode 120000
index 0000000000000000000000000000000000000000..db38b932776bb66efa7321b0acf1ec98759262b1
--- /dev/null
+++ b/man/io_uring_register_files_sparse.3
@@ -0,0 +1 @@
+io_uring_register_files.3
\ No newline at end of file
diff --git a/man/io_uring_register_iowq_aff.3 b/man/io_uring_register_iowq_aff.3
new file mode 100644
index 0000000000000000000000000000000000000000..e7829141c1940ef0d7489f614abcf253d34264b5
--- /dev/null
+++ b/man/io_uring_register_iowq_aff.3
@@ -0,0 +1,61 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_register_iowq_aff 3 "March 13, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_register_iowq_aff \- register async worker CPU affinities
+.SH SYNOPSIS
+.nf
+.B #include <sched.h>
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_register_iowq_aff(struct io_uring *" ring ","
+.BI "                               size_t " cpusz ","
+.BI "                               const cpu_set_t *" mask ");"
+.PP
+.BI "void io_uring_unregister_iowq_aff(struct io_uring *" ring ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_register_iowq_aff (3)
+function registers a set of CPU affinities to be used by the io_uring async
+workers. By default, io_uring async workers are allowed to run on any CPU in
+the system. If this function is called with
+.I ring
+set to the ring in question and
+.I mask
+set to a pointer to a
+.B cpu_set_t
+value and
+.I cpusz
+set to the size of the CPU set, then async workers will only be allowed to run
+on the CPUs specified in the mask. Existing workers may need to hit a schedule
+point before they are migrated.
+
+For unregistration,
+.BR io_uring_unregister_iowq_aff (3)
+may be called to restore CPU affinities to the default.
+
+.SH RETURN VALUE
+Returns
+.B 0
+on success, or any of the following values in case of error.
+.TP
+.B -EFAULT
+The kernel was unable to copy the memory pointed to by
+.I mask
+as it was invalid.
+.TP
+.B -ENOMEM
+The kernel was unable to allocate memory for the new CPU mask.
+.TP
+.B -EINVAL
+.I cpusz
+or
+.I mask
+was NULL/0, or any other value specified was invalid.
+.SH SEE ALSO
+.BR io_uring_queue_init (3),
+.BR io_uring_register (2)
diff --git a/man/io_uring_register_iowq_max_workers.3 b/man/io_uring_register_iowq_max_workers.3
new file mode 100644
index 0000000000000000000000000000000000000000..2557e21688527e0d27723b91f195669d40c05508
--- /dev/null
+++ b/man/io_uring_register_iowq_max_workers.3
@@ -0,0 +1,71 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_register_iowq_max_workers 3 "March 13, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_register_iowq_max_workers \- modify the maximum allowed async workers
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_register_iowq_max_workers(struct io_uring *" ring ","
+.BI "                                       unsigned int *" values ");"
+.fi
+.SH DESCRIPTION
+.PP
+io_uring async workers are split into two types:
+.TP
+.B Bounded
+These workers have a bounded execution time. Examples of that are filesystem
+reads, which normally complete in a relatively short amount of time. In case
+of disk failures, they are still bounded by a timeout operation that will
+abort them if exceeded.
+.TP
+.B Unbounded
+Work items here may take an indefinite amount of time to complete. Examples
+include doing IO to sockets, pipes, or any other non-regular type of file.
+ +.PP +By default, the amount of bounded IO workers is limited to how many SQ entries +the ring was setup with, or 4 times the number of online CPUs in the system, +whichever is smaller. Unbounded workers are only limited by the process task +limit, as indicated by the rlimit +.B RLIMIT_NPROC +limit. + +This can be modified by calling +.B io_uring_register_iowq_max_workers +with +.I ring +set to the ring in question, and +.I values +pointing to an array of two values. The first element should contain the number +of desired bounded workers, and the second element should contain the number +of desired unbounded workers. These are both maximum values, io_uring will +not maintain a high count of idle workers, they are reaped when they are not +necessary anymore. + +If called with both values set to 0, the existing values are returned. + +.SH RETURN VALUE +Returns +.B 0 +on success, with +.I values +containing the previous values for the settings. On error, any of the following +may be returned. +.TP +.B -EFAULT +The kernel was unable to copy the memory pointer to by +.I values +as it was invalid. +.TP +.B -EINVAL +.I values +was +.B NULL +or the new values exceeded the maximum allowed value. +.SH SEE ALSO +.BR io_uring_queue_init (3), +.BR io_uring_register (2) diff --git a/man/io_uring_register_ring_fd.3 b/man/io_uring_register_ring_fd.3 new file mode 100644 index 0000000000000000000000000000000000000000..e70c551d611fdec0a6190d282264aa7b57962dd3 --- /dev/null +++ b/man/io_uring_register_ring_fd.3 @@ -0,0 +1,49 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_register_ring_fd 3 "March 11, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_register_ring_fd \- register a ring file descriptor +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_register_ring_fd(struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +.BR io_uring_register_ring_fd (3) +registers the file descriptor of the ring. + +Whenever +.BR io_uring_enter (2) +is called to submit request or wait for completions, the kernel must grab a +reference to the file descriptor. If the application using io_uring is threaded, +the file table is marked as shared, and the reference grab and put of the file +descriptor count is more expensive than it is for a non-threaded application. + +Similarly to how io_uring allows registration of files, this allow registration +of the ring file descriptor itself. This reduces the overhead of the +.BR io_uring_enter (2) +system call. + +If an application using liburing is threaded, then an application should call +this function to register the ring descriptor when a ring is set up. See NOTES +for restrictions when a ring is shared. + +.SH NOTES +When the ring descriptor is registered, it is stored internally in the +.I struct io_uring +structure. For applications that share a ring between threads, for example +having one thread do submits and another reap events, then this optimization +cannot be used as each thread may have a different index for the registered +ring fd. +.SH RETURN VALUE +Returns 1 on success, indicating that one file descriptor was registered, +or +.BR -errno +on error. 
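+.SH EXAMPLE
+As an illustrative sketch (error handling reduced to a message), a thread
+could register its ring descriptor right after setting up the ring:
+.PP
+.in +4n
+.EX
+struct io_uring ring;
+
+io_uring_queue_init(8, &ring, 0);
+
+/* returns 1 when exactly one descriptor was registered */
+if (io_uring_register_ring_fd(&ring) != 1)
+    fprintf(stderr, "ring fd registration failed\en");
+.EE
+.in
+.PP
+Subsequent submit and wait calls made through liburing on this
+.I ring
+will then use the registered descriptor automatically.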
+.SH SEE ALSO
+.BR io_uring_unregister_ring_fd (3),
+.BR io_uring_register_files (3)
diff --git a/man/io_uring_register_sync_cancel.3 b/man/io_uring_register_sync_cancel.3
new file mode 100644
index 0000000000000000000000000000000000000000..18fcf99ee92ddde0d7f9b456c49a94c4c09de50b
--- /dev/null
+++ b/man/io_uring_register_sync_cancel.3
@@ -0,0 +1,71 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_register_sync_cancel 3 "September 21, 2022" "liburing-2.3" "liburing Manual"
+.SH NAME
+io_uring_register_sync_cancel \- issue a synchronous cancelation request
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_register_sync_cancel(struct io_uring *" ring ","
+.BI "                                  struct io_uring_sync_cancel_reg *" reg ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_register_sync_cancel (3)
+function performs a synchronous cancelation request based on the parameters
+specified in
+.IR reg .
+
+The
+.I reg
+argument must be filled in with the appropriate information for the
+cancelation request. It looks as follows:
+.PP
+.in +4n
+.EX
+struct io_uring_sync_cancel_reg {
+    __u64 addr;
+    __s32 fd;
+    __u32 flags;
+    struct __kernel_timespec timeout;
+    __u64 pad[4];
+};
+.EE
+.in
+.PP
+
+The arguments largely mirror what the async prep functions support, see
+.BR io_uring_prep_cancel (3)
+for details. Similarly, the return value is the same. The exception is the
+.I timeout
+argument, which can be used to limit the time that the kernel will wait for
+cancelations to be successful. If the
+.I tv_sec
+and
+.I tv_nsec
+values are set to anything but
+.B -1UL ,
+then they indicate a relative timeout upon which cancelations should be
+completed by.
+
+The
+.I pad
+values must be zero filled.
+
+.SH RETURN VALUE
+See
+.BR io_uring_prep_cancel (3)
+for details on the return value. If
+.I timeout
+is set to indicate a timeout, then
+.B -ETIME
+will be returned if exceeded. If an unknown value is set in the request,
+or if the pad values are not cleared to zero, then
+.B -EINVAL
+is returned.
+.SH SEE ALSO
+.BR io_uring_prep_cancel (3)
diff --git a/man/io_uring_setup.2 b/man/io_uring_setup.2
new file mode 100644
index 0000000000000000000000000000000000000000..cd699949483f2cf9e67ac1523b4d2013057e5449
--- /dev/null
+++ b/man/io_uring_setup.2
@@ -0,0 +1,640 @@
+.\" Copyright (C) 2019 Jens Axboe
+.\" Copyright (C) 2019 Jon Corbet
+.\" Copyright (C) 2019 Red Hat, Inc.
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_setup 2 2019-01-29 "Linux" "Linux Programmer's Manual"
+.SH NAME
+io_uring_setup \- set up a context for performing asynchronous I/O
+.SH SYNOPSIS
+.nf
+.BR "#include <linux/io_uring.h>"
+.PP
+.BI "int io_uring_setup(u32 " entries ", struct io_uring_params *" p );
+.fi
+.PP
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_setup (2)
+system call sets up a submission queue (SQ) and completion queue (CQ) with at
+least
+.I entries
+entries, and returns a file descriptor which can be used to perform
+subsequent operations on the io_uring instance. The submission and
+completion queues are shared between userspace and the kernel, which
+eliminates the need to copy data when initiating and completing I/O.
+
+.I params
+is used by the application to pass options to the kernel, and by the
+kernel to convey information about the ring buffers.
+.PP +.in +4n +.EX +struct io_uring_params { + __u32 sq_entries; + __u32 cq_entries; + __u32 flags; + __u32 sq_thread_cpu; + __u32 sq_thread_idle; + __u32 features; + __u32 wq_fd; + __u32 resv[3]; + struct io_sqring_offsets sq_off; + struct io_cqring_offsets cq_off; +}; +.EE +.in +.PP +The +.IR flags , +.IR sq_thread_cpu , +and +.I sq_thread_idle +fields are used to configure the io_uring instance. +.I flags +is a bit mask of 0 or more of the following values ORed +together: +.TP +.B IORING_SETUP_IOPOLL +Perform busy-waiting for an I/O completion, as opposed to getting +notifications via an asynchronous IRQ (Interrupt Request). The file +system (if any) and block device must support polling in order for +this to work. Busy-waiting provides lower latency, but may consume +more CPU resources than interrupt driven I/O. Currently, this feature +is usable only on a file descriptor opened using the +.B O_DIRECT +flag. When a read or write is submitted to a polled context, the +application must poll for completions on the CQ ring by calling +.BR io_uring_enter (2). +It is illegal to mix and match polled and non-polled I/O on an io_uring +instance. + +.TP +.B IORING_SETUP_SQPOLL +When this flag is specified, a kernel thread is created to perform +submission queue polling. An io_uring instance configured in this way +enables an application to issue I/O without ever context switching +into the kernel. By using the submission queue to fill in new +submission queue entries and watching for completions on the +completion queue, the application can submit and reap I/Os without +doing a single system call. + +If the kernel thread is idle for more than +.I sq_thread_idle +milliseconds, it will set the +.B IORING_SQ_NEED_WAKEUP +bit in the +.I flags +field of the +.IR "struct io_sq_ring" . +When this happens, the application must call +.BR io_uring_enter (2) +to wake the kernel thread. If I/O is kept busy, the kernel thread +will never sleep. An application making use of this feature will need +to guard the +.BR io_uring_enter (2) +call with the following code sequence: + +.in +4n +.EX +/* + * Ensure that the wakeup flag is read after the tail pointer + * has been written. It's important to use memory load acquire + * semantics for the flags read, as otherwise the application + * and the kernel might not agree on the consistency of the + * wakeup flag. + */ +unsigned flags = atomic_load_relaxed(sq_ring->flags); +if (flags & IORING_SQ_NEED_WAKEUP) + io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP); +.EE +.in + +where +.I sq_ring +is a submission queue ring setup using the +.I struct io_sqring_offsets +described below. +.TP +.BR +Before version 5.11 of the Linux kernel, to successfully use this feature, the +application must register a set of files to be used for IO through +.BR io_uring_register (2) +using the +.B IORING_REGISTER_FILES +opcode. Failure to do so will result in submitted IO being errored with +.B EBADF. +The presence of this feature can be detected by the +.B IORING_FEAT_SQPOLL_NONFIXED +feature flag. +In version 5.11 and later, it is no longer necessary to register files to use +this feature. 5.11 also allows using this as non-root, if the user has the +.B CAP_SYS_NICE +capability. +.TP +.B IORING_SETUP_SQ_AFF +If this flag is specified, then the poll thread will be bound to the +cpu set in the +.I sq_thread_cpu +field of the +.IR "struct io_uring_params" . +This flag is only meaningful when +.B IORING_SETUP_SQPOLL +is specified. 
When cgroup setting +.I cpuset.cpus +changes (typically in container environment), the bounded cpu set may be +changed as well. +.TP +.B IORING_SETUP_CQSIZE +Create the completion queue with +.IR "struct io_uring_params.cq_entries" +entries. The value must be greater than +.IR entries , +and may be rounded up to the next power-of-two. +.TP +.B IORING_SETUP_CLAMP +If this flag is specified, and if +.IR entries +exceeds +.B IORING_MAX_ENTRIES , +then +.IR entries +will be clamped at +.B IORING_MAX_ENTRIES . +If the flag +.BR IORING_SETUP_SQPOLL +is set, and if the value of +.IR "struct io_uring_params.cq_entries" +exceeds +.B IORING_MAX_CQ_ENTRIES , +then it will be clamped at +.B IORING_MAX_CQ_ENTRIES . +.TP +.B IORING_SETUP_ATTACH_WQ +This flag should be set in conjunction with +.IR "struct io_uring_params.wq_fd" +being set to an existing io_uring ring file descriptor. When set, the +io_uring instance being created will share the asynchronous worker +thread backend of the specified io_uring ring, rather than create a new +separate thread pool. +.TP +.B IORING_SETUP_R_DISABLED +If this flag is specified, the io_uring ring starts in a disabled state. +In this state, restrictions can be registered, but submissions are not allowed. +See +.BR io_uring_register (2) +for details on how to enable the ring. Available since 5.10. +.TP +.B IORING_SETUP_SUBMIT_ALL +Normally io_uring stops submitting a batch of request, if one of these requests +results in an error. This can cause submission of less than what is expected, +if a request ends in error while being submitted. If the ring is created with +this flag, +.BR io_uring_enter (2) +will continue submitting requests even if it encounters an error submitting +a request. CQEs are still posted for errored request regardless of whether or +not this flag is set at ring creation time, the only difference is if the +submit sequence is halted or continued when an error is observed. Available +since 5.18. +.TP +.B IORING_SETUP_COOP_TASKRUN +By default, io_uring will interrupt a task running in userspace when a +completion event comes in. This is to ensure that completions run in a timely +manner. For a lot of use cases, this is overkill and can cause reduced +performance from both the inter-processor interrupt used to do this, the +kernel/user transition, the needless interruption of the tasks userspace +activities, and reduced batching if completions come in at a rapid rate. Most +applications don't need the forceful interruption, as the events are processed +at any kernel/user transition. The exception are setups where the application +uses multiple threads operating on the same ring, where the application +waiting on completions isn't the one that submitted them. For most other +use cases, setting this flag will improve performance. Available since 5.19. +.TP +.B IORING_SETUP_TASKRUN_FLAG +Used in conjunction with +.B IORING_SETUP_COOP_TASKRUN, +this provides a flag, +.B IORING_SQ_TASKRUN, +which is set in the SQ ring +.I flags +whenever completions are pending that should be processed. liburing will check +for this flag even when doing +.BR io_uring_peek_cqe (3) +and enter the kernel to process them, and applications can do the same. This +makes +.B IORING_SETUP_TASKRUN_FLAG +safe to use even when applications rely on a peek style operation on the CQ +ring to see if anything might be pending to reap. Available since 5.19. +.TP +.B IORING_SETUP_SQE128 +If set, io_uring will use 128-byte SQEs rather than the normal 64-byte sized +variant. 
This is a requirement for using certain request types, as of 5.19 +only the +.B IORING_OP_URING_CMD +passthrough command for NVMe passthrough needs this. Available since 5.19. +.TP +.B IORING_SETUP_CQE32 +If set, io_uring will use 32-byte CQEs rather than the normal 16-byte sized +variant. This is a requirement for using certain request types, as of 5.19 +only the +.B IORING_OP_URING_CMD +passthrough command for NVMe passthrough needs this. Available since 5.19. +.TP +.B IORING_SETUP_SINGLE_ISSUER +A hint to the kernel that only a single task (or thread) will submit requests, which is +used for internal optimisations. The submission task is either the task that created the +ring, or if +.B IORING_SETUP_R_DISABLED +is specified then it is the task that enables the ring through +.BR io_uring_register (2) . +The kernel enforces this rule, failing requests with +.B -EEXIST +if the restriction is violated. +Note that when +.B IORING_SETUP_SQPOLL +is set it is considered that the polling task is doing all submissions +on behalf of the userspace and so it always complies with the rule disregarding +how many userspace tasks do +.BR io_uring_enter(2). +Available since 6.0. +.TP +.B IORING_SETUP_DEFER_TASKRUN +By default, io_uring will process all outstanding work at the end of any system +call or thread interrupt. This can delay the application from making other progress. +Setting this flag will hint to io_uring that it should defer work until an +.BR io_uring_enter(2) +call with the +.B IORING_ENTER_GETEVENTS +flag set. This allows the application to request work to run just before it wants to +process completions. +This flag requires the +.BR IORING_SETUP_SINGLE_ISSUER +flag to be set, and also enforces that the call to +.BR io_uring_enter(2) +is called from the same thread that submitted requests. +Note that if this flag is set then it is the application's responsibility to periodically +trigger work (for example via any of the CQE waiting functions) or else completions may +not be delivered. +Available since 6.1. +.PP +If no flags are specified, the io_uring instance is setup for +interrupt driven I/O. I/O may be submitted using +.BR io_uring_enter (2) +and can be reaped by polling the completion queue. + +The +.I resv +array must be initialized to zero. + +.I features +is filled in by the kernel, which specifies various features supported +by current kernel version. +.TP +.B IORING_FEAT_SINGLE_MMAP +If this flag is set, the two SQ and CQ rings can be mapped with a single +.I mmap(2) +call. The SQEs must still be allocated separately. This brings the necessary +.I mmap(2) +calls down from three to two. Available since kernel 5.4. +.TP +.B IORING_FEAT_NODROP +If this flag is set, io_uring supports almost never dropping completion events. +If a completion event occurs and the CQ ring is full, the kernel stores +the event internally until such a time that the CQ ring has room for more +entries. If this overflow condition is entered, attempting to submit more +IO will fail with the +.B -EBUSY +error value, if it can't flush the overflown events to the CQ ring. If this +happens, the application must reap events from the CQ ring and attempt the +submit again. If the kernel has no free memory to store the event internally +it will be visible by an increase in the overflow value on the cqring. +Available since kernel 5.5. Additionally +.BR io_uring_enter (2) +will return +.B -EBADR +the next time it would otherwise sleep waiting for completions (since kernel 5.19). 
+
+.TP
+.B IORING_FEAT_SUBMIT_STABLE
+If this flag is set, applications can be certain that any data for
+async offload has been consumed when the kernel has consumed the SQE. Available
+since kernel 5.5.
+.TP
+.B IORING_FEAT_RW_CUR_POS
+If this flag is set, applications can specify
+.I offset
+==
+.B -1
+with
+.BR IORING_OP_{READV,WRITEV} ,
+.BR IORING_OP_{READ,WRITE}_FIXED ,
+and
+.B IORING_OP_{READ,WRITE}
+to mean current file position, which behaves like
+.I preadv2(2)
+and
+.I pwritev2(2)
+with
+.I offset
+==
+.B -1.
+It'll use (and update) the current file position. This obviously comes
+with the caveat that if the application has multiple reads or writes in flight,
+then the end result will not be as expected. This is similar to threads sharing
+a file descriptor and doing IO using the current file position. Available since
+kernel 5.6.
+.TP
+.B IORING_FEAT_CUR_PERSONALITY
+If this flag is set, then io_uring guarantees that both sync and async
+execution of a request assume the credentials of the task that called
+.I
+io_uring_enter(2)
+to queue the requests. If this flag isn't set, then requests are issued with
+the credentials of the task that originally registered the io_uring. If only
+one task is using a ring, then this flag doesn't matter as the credentials
+will always be the same. Note that this is the default behavior; tasks can
+still register different personalities through
+.I
+io_uring_register(2)
+with
+.B IORING_REGISTER_PERSONALITY
+and specify the personality to use in the sqe. Available since kernel 5.6.
+.TP
+.B IORING_FEAT_FAST_POLL
+If this flag is set, then io_uring supports using an internal poll mechanism
+to drive data/space readiness. This means that requests that cannot read or
+write data to a file no longer need to be punted to an async thread for
+handling, instead they will begin operation when the file is ready. This is
+similar to doing poll + read/write in userspace, but eliminates the need to do
+so. If this flag is set, requests waiting on space/data consume a lot less
+resources doing so as they are not blocking a thread. Available since kernel
+5.7.
+.TP
+.B IORING_FEAT_POLL_32BITS
+If this flag is set, the
+.B IORING_OP_POLL_ADD
+command accepts the full 32-bit range of epoll based flags. Most notably
+.B EPOLLEXCLUSIVE
+which allows exclusive (waking single waiters) behavior. Available since kernel
+5.9.
+.TP
+.B IORING_FEAT_SQPOLL_NONFIXED
+If this flag is set, the
+.B IORING_SETUP_SQPOLL
+feature no longer requires the use of fixed files. Any normal file descriptor
+can be used for IO commands without needing registration. Available since
+kernel 5.11.
+.TP
+.B IORING_FEAT_ENTER_EXT_ARG
+If this flag is set, then the
+.BR io_uring_enter (2)
+system call supports passing in an extended argument instead of just the
+.IR "sigset_t"
+of earlier kernels. This
+extended argument is of type
+.IR "struct io_uring_getevents_arg"
+and allows the caller to pass in both a
+.IR "sigset_t"
+and a timeout argument for waiting on events. The struct layout is as follows:
+.PP
+.in +8n
+.EX
+struct io_uring_getevents_arg {
+    __u64 sigmask;
+    __u32 sigmask_sz;
+    __u32 pad;
+    __u64 ts;
+};
+.EE
+.in
+
+and a pointer to this struct must be passed in if
+.B IORING_ENTER_EXT_ARG
+is set in the flags for the enter system call. Available since kernel 5.11.
+.TP
+.B IORING_FEAT_NATIVE_WORKERS
+If this flag is set, io_uring is using native workers for its async helpers.
+Previous kernels used kernel threads that assumed the identity of the +original io_uring owning task, but later kernels will actively create what +looks more like regular process threads instead. Available since kernel +5.12. +.TP +.B IORING_FEAT_RSRC_TAGS +If this flag is set, then io_uring supports a variety of features related +to fixed files and buffers. In particular, it indicates that registered +buffers can be updated in-place, whereas before the full set would have to +be unregistered first. Available since kernel 5.13. +.TP +.B IORING_FEAT_CQE_SKIP +If this flag is set, then io_uring supports setting +.B IOSQE_CQE_SKIP_SUCCESS +in the submitted SQE, indicating that no CQE should be generated for this +SQE if it executes normally. If an error happens processing the SQE, a +CQE with the appropriate error value will still be generated. Available since +kernel 5.17. +.TP +.B IORING_FEAT_LINKED_FILE +If this flag is set, then io_uring supports sane assignment of files for SQEs +that have dependencies. For example, if a chain of SQEs are submitted with +.B IOSQE_IO_LINK, +then kernels without this flag will prepare the file for each link upfront. +If a previous link opens a file with a known index, eg if direct descriptors +are used with open or accept, then file assignment needs to happen post +execution of that SQE. If this flag is set, then the kernel will defer +file assignment until execution of a given request is started. Available since +kernel 5.17. + +.PP +The rest of the fields in the +.I struct io_uring_params +are filled in by the kernel, and provide the information necessary to +memory map the submission queue, completion queue, and the array of +submission queue entries. +.I sq_entries +specifies the number of submission queue entries allocated. +.I sq_off +describes the offsets of various ring buffer fields: +.PP +.in +4n +.EX +struct io_sqring_offsets { + __u32 head; + __u32 tail; + __u32 ring_mask; + __u32 ring_entries; + __u32 flags; + __u32 dropped; + __u32 array; + __u32 resv[3]; +}; +.EE +.in +.PP +Taken together, +.I sq_entries +and +.I sq_off +provide all of the information necessary for accessing the submission +queue ring buffer and the submission queue entry array. The +submission queue can be mapped with a call like: +.PP +.in +4n +.EX +ptr = mmap(0, sq_off.array + sq_entries * sizeof(__u32), + PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, + ring_fd, IORING_OFF_SQ_RING); +.EE +.in +.PP +where +.I sq_off +is the +.I io_sqring_offsets +structure, and +.I ring_fd +is the file descriptor returned from +.BR io_uring_setup (2). +The addition of +.I sq_off.array +to the length of the region accounts for the fact that the ring +located at the end of the data structure. As an example, the ring +buffer head pointer can be accessed by adding +.I sq_off.head +to the address returned from +.BR mmap (2): +.PP +.in +4n +.EX +head = ptr + sq_off.head; +.EE +.in + +The +.I flags +field is used by the kernel to communicate state information to the +application. Currently, it is used to inform the application when a +call to +.BR io_uring_enter (2) +is necessary. See the documentation for the +.B IORING_SETUP_SQPOLL +flag above. +The +.I dropped +member is incremented for each invalid submission queue entry +encountered in the ring buffer. + +The head and tail track the ring buffer state. The tail is +incremented by the application when submitting new I/O, and the head +is incremented by the kernel when the I/O has been successfully +submitted. 
Determining the index of the head or tail into the ring is +accomplished by applying a mask: +.PP +.in +4n +.EX +index = tail & ring_mask; +.EE +.in +.PP +The array of submission queue entries is mapped with: +.PP +.in +4n +.EX +sqentries = mmap(0, sq_entries * sizeof(struct io_uring_sqe), + PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, + ring_fd, IORING_OFF_SQES); +.EE +.in +.PP +The completion queue is described by +.I cq_entries +and +.I cq_off +shown here: +.PP +.in +4n +.EX +struct io_cqring_offsets { + __u32 head; + __u32 tail; + __u32 ring_mask; + __u32 ring_entries; + __u32 overflow; + __u32 cqes; + __u32 flags; + __u32 resv[3]; +}; +.EE +.in +.PP +The completion queue is simpler, since the entries are not separated +from the queue itself, and can be mapped with: +.PP +.in +4n +.EX +ptr = mmap(0, cq_off.cqes + cq_entries * sizeof(struct io_uring_cqe), + PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, ring_fd, + IORING_OFF_CQ_RING); +.EE +.in +.PP +Closing the file descriptor returned by +.BR io_uring_setup (2) +will free all resources associated with the io_uring context. +.PP +.SH RETURN VALUE +.BR io_uring_setup (2) +returns a new file descriptor on success. The application may then +provide the file descriptor in a subsequent +.BR mmap (2) +call to map the submission and completion queues, or to the +.BR io_uring_register (2) +or +.BR io_uring_enter (2) +system calls. + +On error, a negative error code is returned. The caller should not rely on +.I errno +variable. +.PP +.SH ERRORS +.TP +.B EFAULT +params is outside your accessible address space. +.TP +.B EINVAL +The resv array contains non-zero data, p.flags contains an unsupported +flag, +.I entries +is out of bounds, +.B IORING_SETUP_SQ_AFF +was specified, but +.B IORING_SETUP_SQPOLL +was not, or +.B IORING_SETUP_CQSIZE +was specified, but +.I io_uring_params.cq_entries +was invalid. +.TP +.B EMFILE +The per-process limit on the number of open file descriptors has been +reached (see the description of +.B RLIMIT_NOFILE +in +.BR getrlimit (2)). +.TP +.B ENFILE +The system-wide limit on the total number of open files has been +reached. +.TP +.B ENOMEM +Insufficient kernel resources are available. +.TP +.B EPERM +.B IORING_SETUP_SQPOLL +was specified, but the effective user ID of the caller did not have sufficient +privileges. +.SH SEE ALSO +.BR io_uring_register (2), +.BR io_uring_enter (2) diff --git a/man/io_uring_sq_ready.3 b/man/io_uring_sq_ready.3 new file mode 100644 index 0000000000000000000000000000000000000000..ba155b3210d02c544708a47c1440773cda90e82e --- /dev/null +++ b/man/io_uring_sq_ready.3 @@ -0,0 +1,31 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_sq_ready 3 "January 25, 2022" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_sq_ready \- number of unconsumed or unsubmitted entries in the SQ ring +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "unsigned io_uring_sq_ready(const struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_sq_ready (3) +function returns the number of unconsumed (if SQPOLL) or unsubmitted entries +that exist in the SQ ring belonging to the +.I ring +param. + +Usage of this function only applies if the ring has been setup with +.B IORING_SETUP_SQPOLL, +where request submissions, and hence consumption from the SQ ring, happens +through a polling thread. + +.SH RETURN VALUE +Returns the number of unconsumed or unsubmitted entries in the SQ ring. 
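+.SH EXAMPLE
+As an illustration only (assuming a ring created with
+.BR IORING_SETUP_SQPOLL ),
+an application might wait for the kernel polling thread to consume all
+queued SQEs before tearing the ring down:
+.PP
+.in +4n
+.EX
+/* wait until the SQPOLL thread has picked up every queued SQE */
+while (io_uring_sq_ready(&ring) != 0)
+    ;
+.EE
+.in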
+.SH SEE ALSO
+.BR io_uring_cq_ready (3)
diff --git a/man/io_uring_sq_space_left.3 b/man/io_uring_sq_space_left.3
new file mode 100644
index 0000000000000000000000000000000000000000..6fd04c4e90ea432fc021d9ad0d85117e4be680a4
--- /dev/null
+++ b/man/io_uring_sq_space_left.3
@@ -0,0 +1,25 @@
+.\" Copyright (C) 2022 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_sq_space_left 3 "January 25, 2022" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_sq_space_left \- free space in the SQ ring
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "unsigned io_uring_sq_space_left(const struct io_uring *" ring ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_sq_space_left (3)
+function returns how much space is left in the SQ ring belonging to the
+.I ring
+param.
+
+.SH RETURN VALUE
+Returns the number of available entries in the SQ ring.
+.SH SEE ALSO
+.BR io_uring_sq_ready (3)
diff --git a/man/io_uring_sqe_set_data.3 b/man/io_uring_sqe_set_data.3
new file mode 100644
index 0000000000000000000000000000000000000000..274a892db41bcf926b869e1d0c9d266dabf4c10f
--- /dev/null
+++ b/man/io_uring_sqe_set_data.3
@@ -0,0 +1,48 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_sqe_set_data 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_sqe_set_data \- set user data for submission queue event
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_sqe_set_data(struct io_uring_sqe *" sqe ","
+.BI "                           void *" user_data ");"
+.PP
+.BI "void io_uring_sqe_set_data64(struct io_uring_sqe *" sqe ","
+.BI "                             __u64 " data ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_sqe_set_data (3)
+function stores a
+.I user_data
+pointer with the submission queue entry
+.IR sqe .
+
+The
+.BR io_uring_sqe_set_data64 (3)
+function stores a 64-bit
+.I data
+value with the submission queue entry
+.IR sqe .
+
+After the caller has requested a submission queue entry (SQE) with
+.BR io_uring_get_sqe (3),
+they can associate a data pointer or value with the SQE. Once the completion
+arrives, the function
+.BR io_uring_cqe_get_data (3)
+or
+.BR io_uring_cqe_get_data64 (3)
+can be called to retrieve the data pointer or value associated with the
+submitted request.
+
+.SH RETURN VALUE
+None
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_cqe_get_data (3)
diff --git a/man/io_uring_sqe_set_data64.3 b/man/io_uring_sqe_set_data64.3
new file mode 120000
index 0000000000000000000000000000000000000000..8bbd6927f35e240385b9dad6fc8330f5141bd4f6
--- /dev/null
+++ b/man/io_uring_sqe_set_data64.3
@@ -0,0 +1 @@
+io_uring_sqe_set_data.3
\ No newline at end of file
diff --git a/man/io_uring_sqe_set_flags.3 b/man/io_uring_sqe_set_flags.3
new file mode 100644
index 0000000000000000000000000000000000000000..ab0bb8e7b531beec3144138a156492bdca21ea1e
--- /dev/null
+++ b/man/io_uring_sqe_set_flags.3
@@ -0,0 +1,87 @@
+.\" Copyright (C) 2022 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_sqe_set_flags 3 "January 25, 2022" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_sqe_set_flags \- set flags for submission queue entry
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "void io_uring_sqe_set_flags(struct io_uring_sqe *" sqe ","
+.BI "                            unsigned " flags ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_sqe_set_flags (3)
+function allows the caller to change the behavior of the submission queue entry
+by specifying flags.
+It stores
+.I flags
+in the
+.I sqe
+submission queue entry.
+
+.I flags
+is a bit mask of 0 or more of the following values ORed together:
+.TP
+.B IOSQE_FIXED_FILE
+The file descriptor in the SQE refers to the index of a previously registered
+file or direct file descriptor, not a normal file descriptor.
+.TP
+.B IOSQE_ASYNC
+Normal operation for io_uring is to try and issue an sqe as non-blocking first,
+and if that fails, execute it in an async manner. To support more efficient
+overlapped operation of requests that the application knows/assumes will
+always (or most of the time) block, the application can ask for an sqe to be
+issued async from the start. Note that this flag immediately causes the SQE
+to be offloaded to an async helper thread with no initial non-blocking attempt.
+This may be less efficient and should not be used liberally or without
+understanding the performance and efficiency tradeoffs.
+.TP
+.B IOSQE_IO_LINK
+When this flag is specified, the SQE forms a link with the next SQE in the
+submission ring. That next SQE will not be started before the previous request
+completes. This, in effect, forms a chain of SQEs, which can be arbitrarily
+long. The tail of the chain is denoted by the first SQE that does not have this
+flag set. Chains are not supported across submission boundaries. Even if the
+last SQE in a submission has this flag set, it will still terminate the current
+chain. This flag has no effect on previous SQE submissions, nor does it impact
+SQEs that are outside of the chain tail. This means that multiple chains can be
+executing in parallel, or chains and individual SQEs. Only members inside the
+chain are serialized. A chain of SQEs will be broken if any request in that
+chain ends in error. A linked submission is sketched in the example below,
+after this list.
+.TP
+.B IOSQE_IO_HARDLINK
+Like
+.BR IOSQE_IO_LINK ,
+except the links aren't severed if an error or unexpected result occurs.
+.TP
+.B IOSQE_IO_DRAIN
+When this flag is specified, the SQE will not be started before previously
+submitted SQEs have completed, and new SQEs will not be started before this
+one completes.
+.TP
+.B IOSQE_CQE_SKIP_SUCCESS
+Request that no CQE be generated for this request, if it completes successfully.
+This can be useful in cases where the application doesn't need to know when
+a specific request completed, if it completed successfully.
+.TP
+.B IOSQE_BUFFER_SELECT
+If set, and if the request type supports it, select an IO buffer from the
+indicated buffer group. This can be used with requests that read or receive
+data from a file or socket, where buffer selection is deferred until the kernel
+is ready to transfer data, instead of when the IO is originally submitted. The
+application must also set the
+.I buf_group
+field in the SQE, indicating which previously registered buffer group to select
+a buffer from.
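+.PP
+As an illustrative sketch, the snippet below links a write to a following
+fsync, so the fsync is not started until the write completes. It assumes an
+initialized ring plus an open file descriptor
+.IR fd ,
+a buffer
+.I buf
+and a length
+.IR len ;
+error handling is omitted.
+.PP
+.in +4n
+.EX
+struct io_uring_sqe *sqe;
+
+/* First request: write the buffer at offset 0. */
+sqe = io_uring_get_sqe(&ring);
+io_uring_prep_write(sqe, fd, buf, len, 0);
+io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);
+
+/* Second request: fsync, started only after the write completes. */
+sqe = io_uring_get_sqe(&ring);
+io_uring_prep_fsync(sqe, fd, 0);
+
+io_uring_submit(&ring);
+.EE
+.in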
+ +.SH RETURN VALUE +None +.SH SEE ALSO +.BR io_uring_submit (3), +.BR io_uring_register (3) +.BR io_uring_register_buffers (3) +.BR io_uring_register_buf_ring (3) diff --git a/man/io_uring_sqring_wait.3 b/man/io_uring_sqring_wait.3 new file mode 100644 index 0000000000000000000000000000000000000000..4d3a5676d3f373dd20e6aca2e427276a5f453fda --- /dev/null +++ b/man/io_uring_sqring_wait.3 @@ -0,0 +1,34 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_sqring_wait 3 "January 25, 2022" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_sqring_wait \- wait for free space in the SQ ring +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_sqring_wait(struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +The function +.BR io_uring_sqring_wait (3) +allows the caller to wait for space to free up in the SQ ring belonging to the +.I ring +param, which happens when the kernel side thread +has consumed one or more entries. If the SQ ring is currently non-full, +no action is taken. + +This feature can only be used when the ring has been setup with +.B IORING_SETUP_SQPOLL +and hence is using an offloaded approach to request submissions. + +.SH RETURN VALUE +On success it returns the free space. If the kernel does not support the +feature, -EINVAL is returned. +.SH SEE ALSO +.BR io_uring_submit (3), +.BR io_uring_wait_cqe (3), +.BR io_uring_wait_cqes (3) diff --git a/man/io_uring_submit.3 b/man/io_uring_submit.3 new file mode 100644 index 0000000000000000000000000000000000000000..f871b891989628f0394e64ad64b878a196563928 --- /dev/null +++ b/man/io_uring_submit.3 @@ -0,0 +1,46 @@ +.\" Copyright (C) 2021 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_submit 3 "November 15, 2021" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_submit \- submit requests to the submission queue +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_submit(struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_submit (3) +function submits the next events to the submission queue belonging to the +.IR ring . + +After the caller retrieves a submission queue entry (SQE) with +.BR io_uring_get_sqe (3) +and prepares the SQE using one of the provided helpers, it can be submitted with +.BR io_uring_submit (3) . + +.SH RETURN VALUE +On success +.BR io_uring_submit (3) +returns the number of submitted submission queue entries. On failure it returns +.BR -errno . +.SH NOTES +For any request that passes in data in a struct, that data must remain +valid until the request has been successfully submitted. It need not remain +valid until completion. Once a request has been submitted, the in-kernel +state is stable. Very early kernels (5.4 and earlier) required state to be +stable until the completion occurred. Applications can test for this +behavior by inspecting the +.B IORING_FEAT_SUBMIT_STABLE +flag passed back from +.BR io_uring_queue_init_params (3). +In general, the man pages for the individual prep helpers will have a note +mentioning this fact as well, if required for the given command. 
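+.SH EXAMPLE
+A minimal sketch of the full submission flow, assuming an already opened file
+descriptor
+.I fd
+and omitting most error handling:
+.PP
+.in +4n
+.EX
+struct io_uring ring;
+struct io_uring_sqe *sqe;
+struct io_uring_cqe *cqe;
+char buf[4096];
+int ret;
+
+ret = io_uring_queue_init(8, &ring, 0);
+if (ret < 0)
+        return ret;
+
+/* Describe a read of the file into buf, starting at offset 0. */
+sqe = io_uring_get_sqe(&ring);
+io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
+
+/* Hand the prepared SQE to the kernel. */
+ret = io_uring_submit(&ring);
+if (ret < 0)
+        return ret;
+
+/* Reap the completion; cqe->res holds the read(2)-style result. */
+ret = io_uring_wait_cqe(&ring, &cqe);
+if (ret == 0)
+        io_uring_cqe_seen(&ring, cqe);
+
+io_uring_queue_exit(&ring);
+return ret;
+.EE
+.in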
+.SH SEE ALSO +.BR io_uring_get_sqe (3), +.BR io_uring_submit_and_wait (3), +.BR io_uring_submit_and_wait_timeout (3) diff --git a/man/io_uring_submit_and_get_events.3 b/man/io_uring_submit_and_get_events.3 new file mode 100644 index 0000000000000000000000000000000000000000..9e143d1dff4767151b73356ab7ee00dcd4a5947e --- /dev/null +++ b/man/io_uring_submit_and_get_events.3 @@ -0,0 +1,31 @@ +.\" Copyright (C), 2022 dylany +.\" You may distribute this file under the terms of the GNU Free +.\" Documentation License. +.TH io_uring_submit_and_get_events 3 "September 5, 2022" "liburing-2.3" "liburing Manual" +.SH NAME +io_uring_submit_and_get_events \- submit requests to the submission queue and flush completions +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_submit_and_get_events(struct io_uring *" ring ");" +.fi + +.SH DESCRIPTION +The +.BR io_uring_submit_and_get_events (3) +function submits the next events to the submission queue as with +.BR io_uring_submit (3) . +After submission it will flush CQEs as with +.BR io_uring_get_events (3) . + +The benefit of this function is that it does both with only one system call. + +.SH RETURN VALUE +On success +.BR io_uring_submit_and_get_events (3) +returns the number of submitted submission queue entries. On failure it returns +.BR -errno . +.SH SEE ALSO +.BR io_uring_submit (3), +.BR io_uring_get_events (3) diff --git a/man/io_uring_submit_and_wait.3 b/man/io_uring_submit_and_wait.3 new file mode 100644 index 0000000000000000000000000000000000000000..ad4dc8e5e535395d3b3384f763f96c7d6128ba93 --- /dev/null +++ b/man/io_uring_submit_and_wait.3 @@ -0,0 +1,38 @@ +.\" Copyright (C) 2021 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_submit_and_wait 3 "November 15, 2021" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_submit_and_wait \- submit requests to the submission queue and wait for completion +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_submit_and_wait(struct io_uring *" ring "," +.BI " unsigned " wait_nr ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_submit_and_wait (3) +function submits the next requests from the submission queue belonging to the +.I ring +and waits for +.I wait_nr +completion events. + +After the caller retrieves a submission queue entry (SQE) with +.BR io_uring_get_sqe (3) +and prepares the SQE, it can be submitted with +.BR io_uring_submit_and_wait (3) . + +.SH RETURN VALUE +On success +.BR io_uring_submit_and_wait (3) +returns the number of submitted submission queue entries. On failure it returns +.BR -errno . 
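+.SH EXAMPLE
+A sketch that batches several no-op requests and submits them with a single
+system call, waiting for the whole batch before reaping the completions. The
+batch size and the use of the CQE iteration helpers are illustrative; an
+initialized ring is assumed and error handling is abbreviated.
+.PP
+.in +4n
+.EX
+#define BATCH   8       /* illustrative batch size */
+
+struct io_uring_sqe *sqe;
+struct io_uring_cqe *cqe;
+unsigned head, seen = 0;
+int i, ret;
+
+for (i = 0; i < BATCH; i++) {
+        sqe = io_uring_get_sqe(&ring);
+        io_uring_prep_nop(sqe);
+        io_uring_sqe_set_data64(sqe, i);
+}
+
+/* One io_uring_enter(2) call: submit BATCH SQEs, wait for BATCH CQEs. */
+ret = io_uring_submit_and_wait(&ring, BATCH);
+if (ret < 0)
+        return ret;
+
+io_uring_for_each_cqe(&ring, head, cqe) {
+        /* io_uring_cqe_get_data64(cqe) identifies the request. */
+        seen++;
+}
+io_uring_cq_advance(&ring, seen);
+.EE
+.in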
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR io_uring_submit_and_wait_timeout (3)
diff --git a/man/io_uring_submit_and_wait_timeout.3 b/man/io_uring_submit_and_wait_timeout.3
new file mode 100644
index 0000000000000000000000000000000000000000..6533cec778f24a71ef62e8af6b4ccfe25f029f30
--- /dev/null
+++ b/man/io_uring_submit_and_wait_timeout.3
@@ -0,0 +1,56 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_submit_and_wait_timeout 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_submit_and_wait_timeout \- submit requests to the submission queue and
+wait for the completion with timeout
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_submit_and_wait_timeout(struct io_uring *" ring ","
+.BI "                                     struct io_uring_cqe **" cqe_ptr ","
+.BI "                                     unsigned " wait_nr ","
+.BI "                                     struct __kernel_timespec *" ts ","
+.BI "                                     sigset_t *" sigmask ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_submit_and_wait_timeout (3)
+function submits the next requests from the submission queue belonging to the
+.I ring
+and waits for
+.I wait_nr
+completion events, or until the timeout
+.I ts
+expires. The completion events are stored in the
+.I cqe_ptr
+array. The
+.I sigmask
+specifies the set of signals to block. The prevailing signal mask is restored
+before returning.
+
+After the caller retrieves a submission queue entry (SQE) with
+.BR io_uring_get_sqe (3)
+and prepares the SQE, it can be submitted with
+.BR io_uring_submit_and_wait_timeout (3) .
+
+.SH RETURN VALUE
+On success
+.BR io_uring_submit_and_wait_timeout (3)
+returns the number of submitted submission queue entries. On failure it returns
+.BR -errno .
+Note that in earlier versions of the liburing library, the return value was 0
+on success.
+The most common failure case is not receiving a completion within the
+specified timeout, in which case
+.B -ETIME
+is returned.
+.SH SEE ALSO
+.BR io_uring_get_sqe (3),
+.BR io_uring_submit (3),
+.BR io_uring_submit_and_wait (3),
+.BR io_uring_wait_cqe (3)
diff --git a/man/io_uring_unregister_buf_ring.3 b/man/io_uring_unregister_buf_ring.3
new file mode 100644
index 0000000000000000000000000000000000000000..ee87e860aafb02d254aa87c56d0b51e43faaabb5
--- /dev/null
+++ b/man/io_uring_unregister_buf_ring.3
@@ -0,0 +1,30 @@
+.\" Copyright (C) 2022 Jens Axboe
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_unregister_buf_ring 3 "May 18, 2022" "liburing-2.2" "liburing Manual"
+.SH NAME
+io_uring_unregister_buf_ring \- unregister a previously registered buffer ring
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_unregister_buf_ring(struct io_uring *" ring ","
+.BI "                                 int " bgid ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_unregister_buf_ring (3)
+function unregisters a previously registered shared buffer ring indicated by
+.IR bgid .
+
+.SH RETURN VALUE
+On success
+.BR io_uring_unregister_buf_ring (3)
+returns 0. On failure it returns
+.BR -errno .
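+.SH EXAMPLE
+A sketch of pairing buffer ring registration with its later unregistration.
+The buffer group id
+.B BGID
+and ring size
+.B BUFS
+are illustrative values; headers, error handling and the actual use of the
+buffers (adding them with
+.BR io_uring_buf_ring_add (3))
+are left out.
+.PP
+.in +4n
+.EX
+#define BGID    7       /* illustrative buffer group id */
+#define BUFS    8       /* ring size, must be a power of 2 */
+
+struct io_uring_buf_reg reg = { };
+struct io_uring_buf_ring *br;
+void *mem;
+int ret;
+
+/* The buffer ring must be page aligned. */
+if (posix_memalign(&mem, 4096, BUFS * sizeof(struct io_uring_buf)))
+        return -ENOMEM;
+br = mem;
+io_uring_buf_ring_init(br);
+
+reg.ring_addr = (unsigned long) br;
+reg.ring_entries = BUFS;
+reg.bgid = BGID;
+ret = io_uring_register_buf_ring(&ring, &reg, 0);
+if (ret) {
+        free(mem);
+        return ret;
+}
+
+/* ... provide buffers and run I/O using buffer group BGID ... */
+
+/* Tear the buffer ring down again once it is no longer needed. */
+ret = io_uring_unregister_buf_ring(&ring, BGID);
+free(mem);
+return ret;
+.EE
+.in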
+.SH SEE ALSO +.BR io_uring_register_buf_ring (3), +.BR io_uring_buf_ring_free (3) diff --git a/man/io_uring_unregister_buffers.3 b/man/io_uring_unregister_buffers.3 new file mode 100644 index 0000000000000000000000000000000000000000..f066679bf90d9460e1368383ee1d05cd2506d590 --- /dev/null +++ b/man/io_uring_unregister_buffers.3 @@ -0,0 +1,27 @@ +.\" Copyright (C) 2021 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_unregister_buffers 3 "November 15, 2021" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_unregister_buffers \- unregister buffers for fixed buffer operations +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_unregister_buffers(struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_unregister_buffers (3) +function unregisters the fixed buffers previously registered to the +.IR ring . + +.SH RETURN VALUE +On success +.BR io_uring_unregister_buffers (3) +returns 0. On failure it returns +.BR -errno . +.SH SEE ALSO +.BR io_uring_register_buffers (3) diff --git a/man/io_uring_unregister_eventfd.3 b/man/io_uring_unregister_eventfd.3 new file mode 120000 index 0000000000000000000000000000000000000000..665995711bf2ed5b6b836d7457336cbc4b676f6f --- /dev/null +++ b/man/io_uring_unregister_eventfd.3 @@ -0,0 +1 @@ +io_uring_register_eventfd.3 \ No newline at end of file diff --git a/man/io_uring_unregister_files.3 b/man/io_uring_unregister_files.3 new file mode 100644 index 0000000000000000000000000000000000000000..c468d0813f2489a0cb342573e89daddebe7d051b --- /dev/null +++ b/man/io_uring_unregister_files.3 @@ -0,0 +1,27 @@ +.\" Copyright (C) 2021 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_unregister_files 3 "November 15, 2021" "liburing-2.1" "liburing Manual" +.SH NAME +io_uring_unregister_files \- unregister file descriptors +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_unregister_files(struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_unregister_files (3) +function unregisters the file descriptors previously registered to the +.IR ring . + +.SH RETURN VALUE +On success +.BR io_uring_unregister_files (3) +returns 0. On failure it returns +.BR -errno . +.SH SEE ALSO +.BR io_uring_register_files (3) diff --git a/man/io_uring_unregister_iowq_aff.3 b/man/io_uring_unregister_iowq_aff.3 new file mode 120000 index 0000000000000000000000000000000000000000..c29bd44ea833e0cd095bbf6f68ceca3b26538815 --- /dev/null +++ b/man/io_uring_unregister_iowq_aff.3 @@ -0,0 +1 @@ +io_uring_register_iowq_aff.3 \ No newline at end of file diff --git a/man/io_uring_unregister_ring_fd.3 b/man/io_uring_unregister_ring_fd.3 new file mode 100644 index 0000000000000000000000000000000000000000..85aca14151591eb8f969c18aab8b22f249eaa16f --- /dev/null +++ b/man/io_uring_unregister_ring_fd.3 @@ -0,0 +1,32 @@ +.\" Copyright (C) 2022 Jens Axboe +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_unregister_ring_fd 3 "March 11, 2022" "liburing-2.2" "liburing Manual" +.SH NAME +io_uring_unregister_ring_fd \- unregister a ring file descriptor +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_unregister_ring_fd(struct io_uring *" ring ");" +.fi +.SH DESCRIPTION +.PP +.BR io_uring_unregister_ring_fd (3) +unregisters the file descriptor of the ring. + +Unregisters a ring descriptor previously registered with the task. This is +done automatically when +.BR io_uring_queue_exit (3) +is called, but can also be done to free up space for new ring registrations. 
+For more information on ring descriptor registration, see
+.BR io_uring_register_ring_fd (3).
+
+.SH RETURN VALUE
+Returns 1 on success, indicating that one file descriptor was unregistered, or
+.BR -errno
+on error.
+.SH SEE ALSO
+.BR io_uring_register_ring_fd (3),
+.BR io_uring_register_files (3)
diff --git a/man/io_uring_wait_cqe.3 b/man/io_uring_wait_cqe.3
new file mode 100644
index 0000000000000000000000000000000000000000..2656c8582a77c4b7923d5574e2f2787886c95d6e
--- /dev/null
+++ b/man/io_uring_wait_cqe.3
@@ -0,0 +1,40 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_wait_cqe 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_wait_cqe \- wait for one io_uring completion event
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_wait_cqe(struct io_uring *" ring ","
+.BI "                      struct io_uring_cqe **" cqe_ptr ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_wait_cqe (3)
+function waits for an IO completion from the queue belonging to the
+.I ring
+param, blocking if necessary. If an event is already available in
+the ring when invoked, no waiting will occur. The
+.I cqe_ptr
+param is filled in on success.
+
+After the caller has submitted a request with
+.BR io_uring_submit (3),
+the application can retrieve the completion with
+.BR io_uring_wait_cqe (3).
+
+.SH RETURN VALUE
+On success
+.BR io_uring_wait_cqe (3)
+returns 0 and the cqe_ptr param is filled in. On failure it returns
+.BR -errno .
+The return value indicates the result of waiting for a CQE, and it has no
+relation to the CQE result itself.
+.SH SEE ALSO
+.BR io_uring_submit (3),
+.BR io_uring_wait_cqes (3)
diff --git a/man/io_uring_wait_cqe_nr.3 b/man/io_uring_wait_cqe_nr.3
new file mode 100644
index 0000000000000000000000000000000000000000..34c5348ebd65c2e111a48556b64e76e4d30e1fb3
--- /dev/null
+++ b/man/io_uring_wait_cqe_nr.3
@@ -0,0 +1,43 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_wait_cqe_nr 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_wait_cqe_nr \- wait for one or more io_uring completion events
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_wait_cqe_nr(struct io_uring *" ring ","
+.BI "                         struct io_uring_cqe **" cqe_ptr ","
+.BI "                         unsigned " wait_nr ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_wait_cqe_nr (3)
+function returns
+.I wait_nr
+IO completion events from the queue belonging to the
+.I ring
+param, waiting for them if necessary. If the requested number of events is
+already available in the ring when invoked, no waiting will occur. The
+.I cqe_ptr
+param is filled in on success.
+
+After the caller has submitted a request with
+.BR io_uring_submit (3),
+the application can retrieve the completion with
+.BR io_uring_wait_cqe (3).
+
+.SH RETURN VALUE
+On success
+.BR io_uring_wait_cqe_nr (3)
+returns 0 and the cqe_ptr param is filled in. On failure it returns
+.BR -errno .
+The return value indicates the result of waiting for a CQE, and it has no
+relation to the CQE result itself.
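+.SH EXAMPLE
+A sketch that waits for at least half of a previously submitted batch before
+reaping whatever has completed. It assumes an initialized ring and a variable
+.I submitted
+holding the number of requests already submitted; error handling is
+abbreviated.
+.PP
+.in +4n
+.EX
+struct io_uring_cqe *cqe;
+unsigned head, seen = 0;
+int ret;
+
+/* Block until at least submitted / 2 completions are available. */
+ret = io_uring_wait_cqe_nr(&ring, &cqe, submitted / 2);
+if (ret < 0)
+        return ret;
+
+/* Process everything that is ready, not just the minimum waited for. */
+io_uring_for_each_cqe(&ring, head, cqe) {
+        if (cqe->res < 0)
+                fprintf(stderr, "request failed: %d\en", cqe->res);
+        seen++;
+}
+io_uring_cq_advance(&ring, seen);
+.EE
+.in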
+.SH SEE ALSO
+.BR io_uring_submit (3),
+.BR io_uring_wait_cqes (3)
diff --git a/man/io_uring_wait_cqe_timeout.3 b/man/io_uring_wait_cqe_timeout.3
new file mode 100644
index 0000000000000000000000000000000000000000..1b562cc2d61f83477bf8623827ba5e47f13dbd33
--- /dev/null
+++ b/man/io_uring_wait_cqe_timeout.3
@@ -0,0 +1,53 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_wait_cqe_timeout 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_wait_cqe_timeout \- wait for one io_uring completion event with timeout
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_wait_cqe_timeout(struct io_uring *" ring ","
+.BI "                              struct io_uring_cqe **" cqe_ptr ","
+.BI "                              struct __kernel_timespec *" ts ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_wait_cqe_timeout (3)
+function waits for one IO completion to be available from the queue belonging
+to the
+.I ring
+param, waiting for it if necessary or until the timeout
+.I ts
+expires. If an event is already available in the ring when invoked, no waiting
+will occur.
+
+The
+.I cqe_ptr
+param is filled in on success.
+
+If
+.I ts
+is specified and an older kernel without
+.B IORING_FEAT_EXT_ARG
+is used, the application does not need to call
+.BR io_uring_submit (3)
+before calling
+.BR io_uring_wait_cqe_timeout (3).
+For newer kernels with that feature flag set, there is no implied submit
+when waiting for a request.
+
+.SH RETURN VALUE
+On success
+.BR io_uring_wait_cqe_timeout (3)
+returns 0 and the cqe_ptr param is filled in. On failure it returns
+.BR -errno .
+The return value indicates the result of waiting for a CQE, and it has no
+relation to the CQE result itself.
+.SH SEE ALSO
+.BR io_uring_submit (3),
+.BR io_uring_wait_cqes (3),
+.BR io_uring_wait_cqe (3)
diff --git a/man/io_uring_wait_cqes.3 b/man/io_uring_wait_cqes.3
new file mode 100644
index 0000000000000000000000000000000000000000..902b6572782f7cea5d36be3d149e02f41ea4fd9e
--- /dev/null
+++ b/man/io_uring_wait_cqes.3
@@ -0,0 +1,56 @@
+.\" Copyright (C) 2021 Stefan Roesch
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_wait_cqes 3 "November 15, 2021" "liburing-2.1" "liburing Manual"
+.SH NAME
+io_uring_wait_cqes \- wait for one or more io_uring completion events
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_wait_cqes(struct io_uring *" ring ","
+.BI "                       struct io_uring_cqe **" cqe_ptr ","
+.BI "                       unsigned " wait_nr ","
+.BI "                       struct __kernel_timespec *" ts ","
+.BI "                       sigset_t *" sigmask ");"
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_wait_cqes (3)
+function returns
+.I wait_nr
+IO completions from the queue belonging to the
+.I ring
+param, waiting for them if necessary or until the timeout
+.I ts
+expires. The
+.I sigmask
+specifies the set of signals to block. The prevailing signal mask is restored
+before returning.
+
+The
+.I cqe_ptr
+param is filled in on success.
+
+If
+.I ts
+is specified and an older kernel without
+.B IORING_FEAT_EXT_ARG
+is used, the application does not need to call
+.BR io_uring_submit (3)
+before calling
+.BR io_uring_wait_cqes (3).
+For newer kernels with that feature flag set, there is no implied submit
+when waiting for a request.
+
+.SH RETURN VALUE
+On success
+.BR io_uring_wait_cqes (3)
+returns 0 and the cqe_ptr param is filled in. On failure it returns
+.BR -errno .
+.SH SEE ALSO
+.BR io_uring_submit (3),
+.BR io_uring_wait_cqe_timeout (3),
+.BR io_uring_wait_cqe (3)