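# app-llm

**Repository Path**: myyunrep/app-llm

## Basic Information

- **Project Name**: app-llm
- **Description**: llm app learn
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-28
- **Last Updated**: 2025-08-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## Road Map

- [ ] Multiprocessing support
- [ ] Improve the HTML-to-Markdown conversion
  - [ ] Including code blocks
  - [ ] Code blocks may carry line numbers
- [ ] Save the original text for debugging
- [ ] Make the URL configurable via the CLI
- [ ] Support both URLs and files
- [ ] Bilingual side-by-side output
- [ ] The prompt needs examples of technical terms that should be kept untranslated
- [ ] The Markdown returned by html2md needs post-processing to remove redundant elements. Observed so far:
  - Remove all redundant elements before the `#` title heading
- [ ] Automatic translation of PDF files
- [ ] Support for, and an abstraction over, multiple AI platforms
  - Currently only deepseek is supported
  - Later, consider supporting kimi, Yuanbao, Qwen, and others
- [ ] Comparative evaluation of different LLM models

## Issues

- The title of https://oswalt.dev/2020/11/anatomy-of-a-binary-executable/ is parsed incorrectly
- [ ] The title of https://tweedegolf.nl/en/blog/154/what-is-my-fuzzer-doing is parsed incorrectly
  - By default, newspaper3k tries to extract the title from `<title>`, `<h1>`, or OpenGraph tags. If the page structure is complex or the title is not in a standard location, extraction can fail.
  - Use soup to find the title manually, falling back in the order `<h1> --> <title> --> default title`
  - ? Are there cases where `<title>` is a better title than `<h1>`?

```python
from bs4 import BeautifulSoup
from newspaper import Article

# One of the pages listed above whose title is parsed incorrectly
url = "https://tweedegolf.nl/en/blog/154/what-is-my-fuzzer-doing"

article = Article(url)
article.download()
article.parse()

# Extract the title manually with BeautifulSoup
soup = BeautifulSoup(article.html, "html.parser")

# Prefer <h1>, then <title>, and finally fall back to a default value
h1 = soup.find("h1")
if h1 and h1.get_text().strip():
    title = h1.get_text().strip()
elif soup.title and soup.title.get_text().strip():
    title = soup.title.get_text().strip()
else:
    title = "Default Title"
title = title[:100]  # cap at 100 characters

print("Title:", title)
```

### An HTTP proxy may be needed to reach some blogs

On WSL, the script `set_proxy.sh` can be used:

```bash
#!/bin/bash
# Use the first nameserver in /etc/resolv.conf as the proxy host
proxy_ip=$(awk '/^nameserver/ { print $2; exit }' /etc/resolv.conf)
echo "Setting http_proxy to http://${proxy_ip}:1080"
export http_proxy="http://${proxy_ip}:1080"
export https_proxy="http://${proxy_ip}:1080"
echo "http_proxy set to $http_proxy"
```

This points the HTTP proxy at the proxy running on Windows.

Run the script with `source` or `.` rather than executing it directly. A directly executed script sets the environment variables in a child process, which does not affect the current shell.

```bash
chmod +x set_proxy.sh
source set_proxy.sh
# or
. set_proxy.sh
```

Alternatively, use a shell function. Put the following in `.bashrc`:

```bash
set_proxy() {
    # Use the default gateway as the proxy host
    local gateway_ip
    gateway_ip=$(ip route show | awk '/^default/ { print $3; exit }')
    if [ -z "$gateway_ip" ]; then
        echo "Error: Could not determine default gateway" >&2
        return 1
    fi
    # Include the scheme, and set https_proxy too since most blogs use HTTPS
    export http_proxy="http://${gateway_ip}:10010"
    export https_proxy="$http_proxy"
    echo "Proxy set to: $http_proxy"
}
```

- `source ~/.bashrc`: reload after adding the function to `.bashrc`
- `set_proxy`: call the function directly

As a standalone script:

```bash
#!/bin/bash
gateway_ip=$(ip route show | awk '/^default/ { print $3; exit }')
if [ -z "$gateway_ip" ]; then
    echo "Error: Could not determine default gateway" >&2
    # "return" works when the script is sourced; "exit" covers direct execution
    return 1 2>/dev/null || exit 1
fi
export http_proxy="http://${gateway_ip}:10010"
export https_proxy="$http_proxy"
echo "Proxy set to: $http_proxy"
```

- Usage: `source set_proxy.sh`, or the shorthand `.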
set_proxy.sh`

Reference:

- [ ] https://ysantos.com/blog/malloc-in-rust returns 404; exception handling needs to be added

### One-off delayed execution

```bash
# Install at
sudo apt-get update
sudo apt-get install at

# Schedule a job
echo "cd /path/to/your/project && uv run main.py" | at now + 1 hour

# Time formats
at $(date -d "now + 30 min" +"%H:%M")
at now + 30 min
at now + 30 minutes
at 15:30  # assuming the current time is 14:30
at now + 60 minutes
at now + 1 hour
at 10:00 tomorrow  # the time comes before the date
at next week
at now + 90 minutes
at midnight
at noon
```

- `now + 1 hour` means the job runs one hour from now.
- Use absolute paths (such as `/path/to/your/project`) to avoid path errors.
- `atq` lists the pending jobs.

```bash
atq                      # show the job queue
sudo service atd status  # make sure the at daemon is running
atrm 2                   # remove the job with ID 2
```

`at` runs jobs with `/bin/sh` and carries over most of the environment from the shell that submitted the job, but a few variables (for example `DISPLAY` and `TERM`) are dropped, and interactive-only `.bashrc` setup such as aliases and functions is not available. If a job needs specific variables, set them explicitly in the command:

```bash
#!/bin/bash
source ~/.bashrc  # load the user environment
export PATH=/usr/local/bin:$PATH
```

```bash
# Single quotes keep $PATH from being expanded until the job runs
echo 'export PATH=/usr/local/bin:$PATH && cd /path/to/project && uv run main.py' | at now + 30 min
# With logging
echo "cd /path/to/project && uv run main.py >> /tmp/uv.log 2>&1" | at now + 30 min
```

Longer scripts:

```bash
# Pass multi-line commands to at with the <<EOF syntax
at now + 30 minutes <<EOF
cd /path/to/project
. venv/bin/activate  # if a virtual environment is needed ("." also works in /bin/sh)
uv run main.py --arg1 value1 >> /tmp/uv.log 2>&1
# other commands...
EOF

# Call an external script file (recommended for long scripts)
# 1. Create a standalone script file: write the commands into a script (for example /path/to/run_uv.sh)
# 2. Make it executable
chmod +x /path/to/run_uv.sh
# 3. Run the script through at
echo "/path/to/run_uv.sh" | at now + 30 minutes
# Or pass the script file /path/to/run_uv.sh with -f
at -f /path/to/run_uv.sh now + 30 minutes
```

## Blog

- https://gendignoux.com/blog/2025/03/03/rust-interning-2000x.html
- https://ryhl.io/blog/async-what-is-blocking/
  - Chinese translation: https://bingowith.me/2021/05/09/translation-async-what-is-blocking/

todo:

- https://my.oschina.net/emacs_8689417/blog/17011895
- https://diveintosystems.org
- https://github.com/alexpusch/rust-magic-patterns
- https://kornel.ski/rust-c-speed
- https://fasterthanli.me/articles/introducing-facet-reflection-for-rust
- https://blog.m-ou.se/format-args/
- https://without.boats/blog/pin/
- https://without.boats/blog/pinned-places/

## Book

- [rust-for-c-programmers](https://rust-for-c-programmers.com/)
- https://www.evolvebenchmark.com/blog-posts/how-we-wrap-external-c-and-cpp-libraries-in-rust
  - How we wrap external C and C++ libraries in Rust
- https://d34dl0ck.me/rust-bites-designing-error-types-in-rust-libraries/index.html
  - Designing error types in Rust libraries

## video and audio

- https://www.youtube.com/watch?v=zwO3Vnp7DrY
  - David introduces his linker, Wild. He explains how compilers work, what a linker is, and how Rust helps him write ambitious, large projects. He says his main interest is making the linker as fast as possible, especially during development.

## Some Author

- https://without.boats/
- https://burntsushi.net/
- https://smallcultfollowing.com/babysteps/
- https://epage.github.io/blog/
- https://fasterthanli.me/
- https://blog.m-ou.se/
- https://matklad.github.io/
- https://www.ralfj.de/blog/
- https://manishearth.github.io/
- https://seanmonstar.com/blog/
- https://blog.yoshuawuyts.com/
- https://ryhl.io/
- https://corrode.dev/blog/idiomatic-rust-resources/
  - Idiomatic Rust resources; the site also has a blog
- https://osblog.stephenmarz.com/index.html
  - The Adventures of OS: Making a RISC-V Operating System using Rust

# My todo

Write an introductory article about the Rust toolchain:

- rustfmt
- rustdoc
- cargo
- many tools

> https://doc.rust-lang.org/rustdoc/what-is-rustdoc.html

# todo tutorial

- [12-factor-agents](https://github.com/humanlayer/12-factor-agents)
- [Repo to accompany my mastering LLM engineering
course](https://github.com/ed-donner/llm_engineering)
- [prompt-eng-interactive-tutorial](https://github.com/anthropics/prompt-eng-interactive-tutorial/tree/master/AmazonBedroc)

# todo blog

- [The Rust for Linux Workshop](https://kangrejos.com/)
- https://www.less-bug.com/archives/
  - [Notes on setting up a Rust for Linux kernel development environment with DevContainer (also supports pure C / mixed development)](https://www.less-bug.com/posts/setting-up-a-rust-for-linux-kernel-development-environment-using-devcontainers/)
  - [C: Implementing a mini stackless coroutine framework, Minico](https://www.less-bug.com/posts/c-implement-a-mini-stackless-coroutine-framework-minico/)
  - [Rust: Get to know all kinds of boxes! (Box, Rc, Arc, Cell, RefCell)](https://www.less-bug.com/posts/rust-get-to-know-all-kinds-of-boxes-box-rc-arc-cell-refcell/)
  - [Tips for catching Ctrl-C signals in Rust](https://www.less-bug.com/posts/tips-for-catching-ctrl-c-signals-in-rust/)
  - [https://www.less-bug.com/posts/usage-examples-of-mutex-and-rwlock-in-rust/](https://www.less-bug.com/posts/usage-examples-of-mutex-and-rwlock-in-rust/)
  - [Rust: Learn Rc, Arc, and Weak and implement them yourself](https://www.less-bug.com/posts/rust-learn-rc-arc-and-weak-and-implement-it-yourself/)
  - [Rust: A study of a thread crash handling mechanism](https://www.less-bug.com/posts/rust-research-on-a-thread-crash-handling-mechanism/)
  - [A crash course in compiler development with Rust (1): a calculator interpreter](https://www.less-bug.com/posts/rust-development-compiler-crash-1-calc-interpreter/)
  - [Rust: Arena allocator in practice](https://www.less-bug.com/posts/rust-arena-allocator-usage-practice/)
  - [Rust error analysis: multiple-borrow problems](https://www.less-bug.com/posts/rust-error-analysis-multiple-borrow-problems/)
  - [SHA256 hash algorithm: principles and Rust implementation](https://www.less-bug.com/posts/sha256-hash-algorithm-principle-and-rust-implementation/)
  - [A plain-language explanation of Rust lifetimes](https://www.less-bug.com/posts/rust-life-cycle-for-popular-explanation/)
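# Sketch: html2md post-processing

The Road Map lists post-processing of the html2md output, with one concrete rule so far: drop everything that appears before the `#` title heading. A minimal sketch of that rule might look like the following. The function name `strip_before_title` is hypothetical and not part of this repository, and the sketch assumes the title is a level-1 ATX heading (`# ...`) at the start of a line. Lines inside fenced code blocks are skipped so that a shell comment is not mistaken for the title.

```python
def strip_before_title(markdown: str) -> str:
    """Drop every line that precedes the first level-1 heading (`# ...`).

    If no such heading is found, the text is returned unchanged so that
    nothing is lost by accident.
    """
    lines = markdown.splitlines(keepends=True)
    in_fence = False
    for index, line in enumerate(lines):
        # Track fenced code blocks: "# install" inside a fence is a comment,
        # not the title heading.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if not in_fence and line.startswith("# "):
            return "".join(lines[index:])
    return markdown


if __name__ == "__main__":
    # Hypothetical html2md output with navigation residue before the title
    sample = (
        "[Home](/) | [About](/about)\n"
        "Share this page\n"
        "# Real Title\n"
        "\n"
        "Body text.\n"
    )
    print(strip_before_title(sample))
```

Returning the input unchanged when no heading is found is a deliberate choice in this sketch: a page whose title was not converted to `#` keeps all of its content instead of being emptied.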