# app-llm
**Repository Path**: myyunrep/app-llm
## Basic Information
- **Project Name**: app-llm
- **Description**: llm app learn
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-28
- **Last Updated**: 2025-08-04
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# README
## Road Map
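- [ ] Multi-process support
- [ ] Improve the HTML-to-Markdown conversion
  - [ ] Including code sections
  - [ ] Code sections may carry line numbers
- [ ] Save the original text for debugging
- [ ] Make the URL configurable via the CLI
- [ ] Support URLs and files at the same time
- [ ] Bilingual side-by-side output
- [ ] Add examples to the prompt showing how to preserve technical terms
- [ ] The Markdown returned by html2md needs post-processing to remove redundant elements. Observed so far:
  - Remove all redundant elements that precede the first `#` heading
- [ ] Automatic translation of PDF files
- [ ] Support and abstraction for multiple AI platforms
  - Currently only deepseek is supported
  - Later, consider supporting kimi, Yuanbao (元宝), Qwen (千问), and others
- [ ] Comparative evaluation of different LLM models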
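
The html2md post-processing item in the road map (dropping everything before the first `#` heading) could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the function name `strip_before_first_heading` is hypothetical, and it assumes the first ATX-style heading marks the start of the real content. A `# comment` line inside an earlier code fence would also match, so real input may need a stricter check.

```python
import re

# Matches an ATX heading such as "# Title" or "## Section".
# A shebang like "#!/bin/bash" does not match because whitespace is required.
_HEADING = re.compile(r"^#{1,6}\s+\S")


def strip_before_first_heading(markdown: str) -> str:
    """Drop every line that precedes the first Markdown heading.

    If no heading is found, the text is returned unchanged so that
    nothing is lost silently.
    """
    lines = markdown.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if _HEADING.match(line):
            return "".join(lines[index:])
    return markdown


if __name__ == "__main__":
    raw = "Skip to content\nMenu\n# Real Title\nBody text.\n"
    print(strip_before_first_heading(raw))
```

Returning the input unchanged when no heading exists is a deliberate choice here: a page without headings is more likely a conversion problem than an empty article.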
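## Issues
The page title of https://oswalt.dev/2020/11/anatomy-of-a-binary-executable/ is parsed incorrectly.
- [ ] The page title of https://tweedegolf.nl/en/blog/154/what-is-my-fuzzer-doing is parsed incorrectly
  - By default, newspaper3k tries to extract the title from `<title>`, `<h1>`, or OpenGraph tags. If the page structure is complex or the title is not in a standard location, extraction may fail.
  - Use soup to find the title manually, falling back through `<h1> --> <title> --> default title`
  - ? Are there cases where `<title>` is a better title than `<h1>`?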
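```python
from newspaper import Article
from bs4 import BeautifulSoup

# Example URL taken from the issue list above
url = "https://tweedegolf.nl/en/blog/154/what-is-my-fuzzer-doing"

article = Article(url)
article.download()
article.parse()

# Extract the title manually with BeautifulSoup
soup = BeautifulSoup(article.html, 'html.parser')
# Prefer <h1>, then <title>, and finally fall back to a default value
title = (
    soup.find('h1').text.strip() if soup.find('h1')
    else soup.title.text.strip() if soup.title
    else "Default Title"
)[:100]  # limit to 100 characters
print("Title:", title)
```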
### An HTTP proxy may be needed to reach some blogs
On WSL, you can use the script `set_proxy.sh`:
```bash
#!/bin/bash
proxy_ip=$(cat /etc/resolv.conf | grep nameserver | awk '{ print $2 }')
echo "Setting http_proxy to http://${proxy_ip}:1080"
export http_proxy="http://${proxy_ip}:1080"
export https_proxy="http://${proxy_ip}:1080"
echo "http_proxy set to $http_proxy"
```
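This sets the HTTP proxy to the proxy running on the Windows host.
When running the script, make sure to use `source` or `.` rather than executing it directly. Executing it directly sets the environment variables in a child process and does not affect the current shell.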
```bash
chmod +x set_proxy.sh
source set_proxy.sh # or: . set_proxy.sh
```
Alternatively, use a shell function by adding the following to `.bashrc`:
```bash
set_proxy() {
    gateway_ip=$(ip route show | grep -i default | awk '{print $3}')
    if [ -z "$gateway_ip" ]; then
        echo "Error: Could not determine default gateway"
        return 1
    fi
    export http_proxy="${gateway_ip}:10010"
    echo "Proxy set to: $http_proxy"
}
```
- `source ~/.bashrc` # if the function was added to .bashrc
- `set_proxy` # call the function directly

As a standalone script:
```bash
#!/bin/bash
gateway_ip=$(ip route show | grep -i default | awk '{print $3}')
if [ -z "$gateway_ip" ]; then
    echo "Error: Could not determine default gateway" >&2
    # The script is meant to be sourced; a plain `exit` would close the calling shell
    return 1 2>/dev/null || exit 1
fi
export http_proxy="${gateway_ip}:10010"
echo "Proxy set to: $http_proxy"
```
- Usage: `source set_proxy.sh`, or the short form `. set_proxy.sh`
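
To confirm that a Python process actually sees the exported proxy, something like the following could be run from the same shell. It is only a sketch: `urllib.request.getproxies()` is the standard-library call that reads variables such as `http_proxy`, while `normalize_proxy` is a hypothetical helper added here because some of the scripts above export a bare `host:port` value without a scheme.

```python
import os
import urllib.request


def normalize_proxy(value: str, default_scheme: str = "http") -> str:
    """Prefix a bare host:port value with a scheme (hypothetical helper)."""
    return value if "://" in value else f"{default_scheme}://{value}"


def current_proxies() -> dict:
    """Return the proxy mapping that urllib derives from the environment."""
    return urllib.request.getproxies()


if __name__ == "__main__":
    proxies = current_proxies()
    if not proxies:
        print("No proxy variables are visible to this process")
    for scheme, address in proxies.items():
        print(f"{scheme}: {normalize_proxy(address)}")
```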
References:
- [ ] https://ysantos.com/blog/malloc-in-rust returns 404; exception handling needs to be added
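
One possible shape for that exception handling is sketched below using only the standard library. It is an assumption-laden illustration, not the project's code: `fetch_html` and its `opener` parameter are hypothetical names, and a newspaper3k-based pipeline would need to catch that library's own exceptions instead.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_html(
    url: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 10.0,
) -> Optional[str]:
    """Return the page body, or None when the page cannot be fetched."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"skip {url}: HTTP {err.code}")
    except urllib.error.URLError as err:
        print(f"skip {url}: {err.reason}")
    return None


if __name__ == "__main__":
    html = fetch_html("https://ysantos.com/blog/malloc-in-rust")
    print("fetched" if html is not None else "not available")
```

Passing the opener as a parameter keeps the function testable without network access.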
### One-off delayed execution
```bash
# Install at
sudo apt-get update
sudo apt-get install at
# Schedule a job
echo "cd /path/to/your/project && uv run main.py" | at now + 1 hour
# Time formats
at $(date -d "now + 30 min" +"%H:%M")
at now + 30 min
at now + 30 minutes
at 15:30 # assuming the current time is 14:30
at now + 60 minutes
at now + 1 hour
at tomorrow 10:00
at next week
at now + 90 minutes
at midnight
at noon
```
- `now + 1 hour` means the job runs one hour from now.
- Use absolute paths (such as /path/to/your/project) to avoid path errors.
- Use `atq` to list pending jobs.
```bash
atq # show the job queue
sudo service atd status # make sure the at service is running
atrm 2 # remove the job with ID 2
```
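at jobs run under `/bin/sh` and do not load shell startup files such as `~/.bashrc`, so settings from the interactive terminal may be missing. If a job needs specific variables, set them explicitly in the command: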
```bash
#!/bin/bash
source ~/.bashrc # load the user environment
export PATH=/usr/local/bin:$PATH
```
```bash
echo "export PATH=/usr/local/bin:$PATH && cd /path/to/project && uv run main.py" | at now + 30 min
# Logging
echo "cd /path/to/project && uv run main.py >> /tmp/uv.log 2>&1" | at now + 30 min
```
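Long-script support:
```bash
# Use a heredoc (<<EOF)
at now + 30 minutes <<EOF
cd /path/to/project && uv run main.py >> /tmp/uv.log 2>&1
# other commands...
EOF
# Call an external script file (recommended for long scripts)
# 1. Create a standalone script file:
#    write the commands to a script file (e.g. /path/to/run_uv.sh)
# 2. Make it executable
chmod +x /path/to/run_uv.sh
# 3. Invoke the script through at
echo "/path/to/run_uv.sh" | at now + 30 minutes
# Alternatively, pass the script file /path/to/run_uv.sh with -f
at -f /path/to/run_uv.sh now + 30 minutes
```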
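
If the scheduling has to be driven from Python rather than from the shell, the same pattern can be wrapped as below. This is a sketch under stated assumptions: `build_job` and `schedule` are hypothetical helper names, the paths are the placeholders used above, and `schedule` requires the `at` command and the `atd` service to be available.

```python
import shlex
import subprocess


def build_job(project_dir: str, log_path: str) -> str:
    """Return the shell command that `at` should run, with logging."""
    return (
        f"cd {shlex.quote(project_dir)} && "
        f"uv run main.py >> {shlex.quote(log_path)} 2>&1"
    )


def schedule(job: str, when: str = "now + 30 minutes") -> None:
    """Submit the job to `at`, which reads the command from stdin."""
    subprocess.run(["at", *when.split()], input=job + "\n", text=True, check=True)


if __name__ == "__main__":
    # Only print the command here; call schedule(job) to actually queue it.
    job = build_job("/path/to/project", "/tmp/uv.log")
    print(job)
```

Quoting the paths with `shlex.quote` keeps directories that contain spaces from breaking the queued command.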
## Blog
https://gendignoux.com/blog/2025/03/03/rust-interning-2000x.html
https://ryhl.io/blog/async-what-is-blocking/
- Chinese translation: https://bingowith.me/2021/05/09/translation-async-what-is-blocking/
todo:
- https://my.oschina.net/emacs_8689417/blog/17011895
- https://diveintosystems.org
- https://github.com/alexpusch/rust-magic-patterns
- https://kornel.ski/rust-c-speed
- https://fasterthanli.me/articles/introducing-facet-reflection-for-rust
- https://blog.m-ou.se/format-args/
- https://without.boats/blog/pin/
- https://without.boats/blog/pinned-places/
## Book
- [rust-for-c-programmers](https://rust-for-c-programmers.com/)
- https://www.evolvebenchmark.com/blog-posts/how-we-wrap-external-c-and-cpp-libraries-in-rust
- How we wrap external C and C++ libraries in Rust
- https://d34dl0ck.me/rust-bites-designing-error-types-in-rust-libraries/index.html
- Designing error types in Rust libraries
## video and audio
https://www.youtube.com/watch?v=zwO3Vnp7DrY
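- David introduces his linker, Wild. He explains how compilers work, what a linker is, and how Rust helps him write ambitious, large projects. One of his quotes: "My main interest is making the linker as fast as possible, especially during development."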
## Some Author
https://without.boats/
https://burntsushi.net/
https://smallcultfollowing.com/babysteps/
https://epage.github.io/blog/
https://fasterthanli.me/
https://blog.m-ou.se/
https://matklad.github.io/
https://www.ralfj.de/blog/
https://manishearth.github.io/
https://seanmonstar.com/blog/
https://blog.yoshuawuyts.com/
https://ryhl.io/
- https://corrode.dev/blog/idiomatic-rust-resources/
- Resources on idiomatic Rust; the site also has a blog
- https://osblog.stephenmarz.com/index.html
- The Adventures of OS: Making a RISC-V Operating System using Rust
# My todo
Write an article introducing the Rust toolchain:
- rustfmt
- rustdoc
- cargo
- many tools
> https://doc.rust-lang.org/rustdoc/what-is-rustdoc.html
# todo tutorial
[12-factor-agents](https://github.com/humanlayer/12-factor-agents)
[Repo to accompany my mastering LLM engineering course](https://github.com/ed-donner/llm_engineering)
[prompt-eng-interactive-tutorial](https://github.com/anthropics/prompt-eng-interactive-tutorial/tree/master/AmazonBedroc)
# todo blog
[The Rust for Linux Workshop](https://kangrejos.com/)
https://www.less-bug.com/archives/
- [Notes on setting up a Rust for Linux kernel development environment with DevContainer (also supports pure C and mixed development)](https://www.less-bug.com/posts/setting-up-a-rust-for-linux-kernel-development-environment-using-devcontainers/)
- [C: implementing a mini stackless coroutine framework, Minico](https://www.less-bug.com/posts/c-implement-a-mini-stackless-coroutine-framework-minico/)
- [Rust: get to know all kinds of boxes! (Box, Rc, Arc, Cell, RefCell)](https://www.less-bug.com/posts/rust-get-to-know-all-kinds-of-boxes-box-rc-arc-cell-refcell/)
- [Tips for catching the Ctrl-C signal in Rust](https://www.less-bug.com/posts/tips-for-catching-ctrl-c-signals-in-rust/)
- [https://www.less-bug.com/posts/usage-examples-of-mutex-and-rwlock-in-rust/](https://www.less-bug.com/posts/usage-examples-of-mutex-and-rwlock-in-rust/)
- [Rust: learn Rc, Arc, and Weak and implement them yourself](https://www.less-bug.com/posts/rust-learn-rc-arc-and-weak-and-implement-it-yourself/)
- [Rust: a study of a thread crash handling mechanism](https://www.less-bug.com/posts/rust-research-on-a-thread-crash-handling-mechanism/)
- [Rust compiler development crash course (1): a calculator interpreter](https://www.less-bug.com/posts/rust-development-compiler-crash-1-calc-interpreter/)
- [Rust: Arena allocator usage in practice](https://www.less-bug.com/posts/rust-arena-allocator-usage-practice/)
- [Rust error analysis: multiple-borrow problems](https://www.less-bug.com/posts/rust-error-analysis-multiple-borrow-problems/)
- [The SHA256 hash algorithm: principles and a Rust implementation](https://www.less-bug.com/posts/sha256-hash-algorithm-principle-and-rust-implementation/)
- [Rust lifetimes explained in plain terms](https://www.less-bug.com/posts/rust-life-cycle-for-popular-explanation/)