# SwiftSoup

![Platform OS X | iOS | tvOS | watchOS | Linux](https://img.shields.io/badge/platform-Linux%20%7C%20OS%20X%20%7C%20iOS%20%7C%20tvOS%20%7C%20watchOS-orange.svg)
[![SPM compatible](https://img.shields.io/badge/SPM-compatible-4BC51D.svg?style=flat)](https://github.com/apple/swift-package-manager)
![🐧 linux: ready](https://img.shields.io/badge/%F0%9F%90%A7%20linux-ready-red.svg)
![Carthage compatible](https://img.shields.io/badge/Carthage-compatible-4BC51D.svg?style=flat)
[![Build Status](https://travis-ci.org/scinfu/SwiftSoup.svg?branch=master)](https://travis-ci.org/scinfu/SwiftSoup)
[![Version](https://img.shields.io/cocoapods/v/SwiftSoup.svg?style=flat)](http://cocoapods.org/pods/SwiftSoup)
[![License](https://img.shields.io/cocoapods/l/SwiftSoup.svg?style=flat)](http://cocoapods.org/pods/SwiftSoup)
[![Twitter](https://img.shields.io/badge/twitter-@scinfu-blue.svg?style=flat)](http://twitter.com/scinfu)

---

SwiftSoup is a pure Swift library for parsing and manipulating HTML on macOS, iOS, tvOS, watchOS, and Linux. Its API combines DOM traversal, CSS selectors, and jQuery-like methods for extracting and transforming data. It implements the **WHATWG HTML5 specification**, so it parses HTML into the same DOM structure that modern browsers produce.

### Key Features:
- **Parse and scrape** HTML from a URL, file, or string.
- **Find and extract** data using DOM traversal or CSS selectors.
- **Modify HTML** elements, attributes, and text dynamically.
- **Sanitize user-submitted content** using a safe whitelist to prevent XSS attacks.
- **Generate clean and well-structured HTML** output.

SwiftSoup is designed to handle all types of HTML, whether perfectly structured or messy tag soup, and to produce a logical and reliable parse tree in every case.
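As a rough illustration of the tag-soup handling described above, the sketch below parses a fragment whose `<p>` tags are never closed and collects the text of each paragraph. The helper `paragraphTexts(in:)` is a hypothetical name introduced only for this example and is not part of SwiftSoup; the sample markup is likewise an assumption.

```swift
import SwiftSoup

// Hypothetical helper (not part of SwiftSoup): parses the markup and
// returns the text of every <p> element in document order.
func paragraphTexts(in html: String) throws -> [String] {
    let document = try SwiftSoup.parse(html)
    return try document.select("p").array().map { try $0.text() }
}

do {
    // Neither <p> tag is closed in this sample input.
    let texts = try paragraphTexts(in: "<p>First paragraph<p>Second paragraph")
    print(texts)
} catch {
    print("Parsing failed: \(error)")
}
```

Under the HTML5 parsing rules, the second `<p>` start tag implicitly closes the first, so the helper is expected to return two separate entries rather than one merged paragraph.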
---

## Swift

Swift 5 `>=2.0.0`

Swift 4.2 `1.7.4`

## Installation

### CocoaPods
SwiftSoup is available through [CocoaPods](http://cocoapods.org). To install it, add the following line to your Podfile:

```ruby
pod 'SwiftSoup'
```

### Carthage
SwiftSoup is also available through [Carthage](https://github.com/Carthage/Carthage). To install it, add the following line to your Cartfile:

```ruby
github "scinfu/SwiftSoup"
```

### Swift Package Manager
SwiftSoup is also available through [Swift Package Manager](https://github.com/apple/swift-package-manager). To install it, add the dependency to your `Package.swift` file:

```swift
...
dependencies: [
    .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0"),
],
targets: [
    .target(
        name: "YourTarget",
        dependencies: ["SwiftSoup"]),
]
...
```

---

## Usage Examples

### Parse an HTML Document

```swift
import SwiftSoup

let html = """
<html>
  <head><title>Example</title></head>
  <body><p>Hello, SwiftSoup!</p></body>
</html>
"""

let document: Document = try SwiftSoup.parse(html)
print(try document.title()) // Output: Example
```

---

### Select Elements with CSS Query

```swift
let html = """

<p class="message">SwiftSoup is powerful!</p>
<p class="message">Parsing HTML in Swift</p>
"""

let document = try SwiftSoup.parse(html)
let messages = try document.select("p.message")

for message in messages {
    print(try message.text())
}
// Output:
// SwiftSoup is powerful!
// Parsing HTML in Swift
```

---

### Extract Text and Attributes

```swift
let html = "<a href='https://example.com'>Visit the site</a>"
let document = try SwiftSoup.parse(html)
let link = try document.select("a").first()

if let link = link {
    print(try link.text()) // Output: Visit the site
    print(try link.attr("href")) // Output: https://example.com
}
```

---

### Modify the DOM

```swift
let document = try SwiftSoup.parse("<div id='content'></div>")
let div = try document.select("#content").first()
try div?.append("<p>New content added!</p>")
print(try document.html())
// Output includes (whitespace may differ):
// <div id="content"><p>New content added!</p></div>
```

---

### Clean HTML for Security (Whitelist)

```swift
// Sample untrusted input: a script tag followed by allowed markup.
let dirtyHtml = "<script>alert('Hacked!')</script><b>Important text</b>"
let cleanHtml = try SwiftSoup.clean(dirtyHtml, Whitelist.basic())
print(cleanHtml ?? "") // Output: <b>Important text</b>
```

---

### Use CSS selectors to find elements (from [jsoup](https://jsoup.org/cookbook/extracting-data/selector-syntax))

#### Selector overview
- `tagname`: find elements by tag, e.g. `div`
- `#id`: find elements by ID, e.g. `#logo`
- `.class`: find elements by class name, e.g. `.masthead`
- `[attribute]`: elements with attribute, e.g. `[href]`
- `[^attrPrefix]`: elements with an attribute name prefix, e.g. `[^data-]` finds elements with HTML5 dataset attributes
- `[attr=value]`: elements with attribute value, e.g. `[width=500]` (also quotable, like `[data-name='launch sequence']`)
- `[attr^=value]`, `[attr$=value]`, `[attr*=value]`: elements with attributes that start with, end with, or contain the value, e.g. `[href*=/path/]`
- `[attr~=regex]`: elements with attribute values that match the regular expression; e.g. `img[src~=(?i)\.(png|jpe?g)]`
- `*`: all elements, e.g. `*`
- `[*]`: selects elements that have any attribute, e.g. `p[*]` finds paragraphs with at least one attribute, and `p:not([*])` finds those with no attributes
- `ns|tag`: find elements by tag in a namespace prefix, e.g. `dc|name` finds `<dc:name>` elements
- `*|tag`: find elements by tag in any namespace prefix, e.g. `*|name` finds `<dc:name>` and `<name>` elements
- `:empty`: selects elements that have no children (ignoring blank text nodes, comments, etc.); e.g. `li:empty`

#### Selector combinations
- `el#id`: elements with ID, e.g. `div#logo`
- `el.class`: elements with class, e.g. `div.masthead`
- `el[attr]`: elements with attribute, e.g. `a[href]`
- Any combination, e.g. `a[href].highlight`
- `ancestor child`: child elements that descend from ancestor, e.g. `.body p` finds `p` elements anywhere under a block with class "body"
- `parent > child`: child elements that descend directly from parent, e.g. `div.content > p` finds `p` elements; and `body > *` finds the direct children of the body tag
- `siblingA + siblingB`: finds sibling B element immediately preceded by sibling A, e.g. `div.head + div`
- `siblingA ~ siblingX`: finds sibling X element preceded by sibling A, e.g. `h1 ~ p`
- `el, el, el`: group multiple selectors, find unique elements that match any of the selectors; e.g. `div.masthead, div.logo`

#### Pseudo selectors
- `:has(selector)`: find elements that contain elements matching the selector; e.g. `div:has(p)`
- `:is(selector)`: find elements that match any of the selectors in the selector list; e.g. `:is(h1, h2, h3, h4, h5, h6)` finds any heading element
- `:not(selector)`: find elements that do not match the selector; e.g. `div:not(.logo)`
- `:lt(n)`: find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than `n`; e.g. `td:lt(3)`
- `:gt(n)`: find elements whose sibling index is greater than `n`; e.g. `div p:gt(2)`
- `:eq(n)`: find elements whose sibling index is equal to `n`; e.g. `form input:eq(1)`
- Note that the above indexed pseudo-selectors are 0-based, that is, the first element is at index 0, the second at 1, etc.

#### Text content pseudo selectors
- `:contains(text)`: find elements that contain (directly or via children) the given normalized text. The search is case-insensitive; e.g. `div:contains(jsoup)`
- `:containsOwn(text)`: find elements whose own text directly contains the given text; e.g. `p:containsOwn(jsoup)`
- `:containsData(text)`: selects elements that contain the specified data (e.g. within `