A GUI agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
FlowGram is a node-based flow-building engine that helps developers quickly create workflows with either a fixed layout or a free-connection layout.
Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"
Last updated: 2 months ago
The official code for "GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning"
Last updated: 2 months ago
Official implementation of "XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation".
Last updated: 3 months ago
The official repo for "D-Attn: Decomposed Attention for Large Vision-and-Language Model"
Last updated: 3 months ago
Source code for the SIGGRAPH 2024 paper "X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention"
Last updated: 3 months ago
🔥 [EMNLP 2025] Official open-source repo for "Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models"
Last updated: 3 months ago