Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-a...
Last updated: 6 days ago
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Last updated: 6 days ago
AG-UI: the Agent-User Interaction Protocol. Bring Agents into Frontend Applications.
Last updated: 6 days ago
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
Last updated: 6 days ago
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Last updated: 6 days ago
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while...
Last updated: 6 days ago
Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Training released! Surpasses GPT-4o in ID persistence! Offici...
Last updated: 6 days ago
A Dockerized Python script to fetch Garmin health data and populate it in an InfluxDB database, for visualizing long-term health trends with Gra...
Last updated: 6 days ago
Open Source DeepWiki: AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories. Join the Discord: https://discord.gg/gMwThUMeme
Last updated: 6 days ago
I made my AI think harder by making it argue with itself repeatedly. It works stupidly well.
Last updated: 6 days ago