# jina

**Repository Path**: deeplearningrepos/jina

## Basic Information

- **Project Name**: jina
- **Description**: An easier way to build neural search on the cloud
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-03-30
- **Last Updated**: 2021-08-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

An easier way to build neural search on the cloud

Jina is a deep learning-powered search framework for building cross-/multi-modal search systems (e.g. text, images, video, audio) on the cloud.

- **Time Saver** - *The* design pattern of neural search systems, from zero to a production-ready system in minutes.
- **Full-Stack Ownership** - Keep end-to-end stack ownership of your solution and avoid the integration pitfalls of fragmented, multi-vendor, generic legacy tools.
- **Universal Search** - Large-scale indexing and querying of unstructured data: video, image, long/short text, music, source code, etc.
- **First-Class AI Models** - First-class support for [state-of-the-art AI models](https://docs.jina.ai/chapters/all_exec.html), easily usable and extendable with a Pythonic interface.
- **Fast & Cloud Ready** - Decentralized architecture from day one. Scalable and cloud-native by design: containerizing, distributing, sharding, async, REST/gRPC/WebSocket.
- **Made with Love** - Never compromise on quality; actively maintained by a [passionate full-time, venture-backed team](https://jina.ai).

---

Docs • Hello World • Quick Start • Learn • Examples • Contribute • Jobs • Website • Slack

## Installation

**Index**

```python
# save four docs (both embedding and structured info) into storage
with f:
    f.index(docs, on_done=print)
```

**Search**

```python
# retrieve top-3 neighbours of 🐲, this prints 🐲🐦🐢 with score 0, 1, 1 respectively
with f:
    f.search(docs[0], top_k=3, on_done=lambda x: print(x.docs[0].matches))
```

```json
{"id": "🐲", "tags": {"guardian": "Azure Dragon", "position": "East"}, "embedding": {"dense": {"buffer": "AAAAAAAAAAAAAAAAAAAAAA==", "shape": [2], "dtype": "
```

**Update**

```python
# update 🐲 embedding in the storage
docs[0].embedding = np.array([1, 1])
with f:
    f.update(docs[0])
```

**Delete**

```python
# remove 🐦🐲 Documents from the storage
with f:
    f.delete(['🐦', '🐲'])
```
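
To make the scores in the search comment concrete, here is a minimal pure-Python sketch of the four operations. `TinyIndex` is a hypothetical stand-in, not Jina code, and it assumes Euclidean distance as the score and the four embeddings used in the `Document` example of this README (dragon `[0, 0]`, bird `[1, 0]`, turtle `[0, 1]`, tiger `[1, 1]`).

```python
import math


class TinyIndex:
    """Hypothetical in-memory stand-in for a storage backend (not Jina code)."""

    def __init__(self):
        self._store = {}  # doc id -> embedding (list of numbers)

    def index(self, docs):
        # Store each (id, embedding) pair; dicts preserve insertion order.
        for doc_id, embedding in docs:
            self._store[doc_id] = list(embedding)

    def search(self, query, top_k):
        # Score every stored doc by Euclidean distance to the query (assumed metric).
        scored = [(doc_id, math.dist(query, emb)) for doc_id, emb in self._store.items()]
        scored.sort(key=lambda pair: pair[1])  # stable sort: ties keep insertion order
        return scored[:top_k]

    def update(self, doc_id, embedding):
        # Overwrite the embedding of an existing doc.
        if doc_id not in self._store:
            raise KeyError(doc_id)
        self._store[doc_id] = list(embedding)

    def delete(self, doc_ids):
        # Remove docs by id; unknown ids are ignored.
        for doc_id in doc_ids:
            self._store.pop(doc_id, None)

    def ids(self):
        return list(self._store)


index = TinyIndex()
index.index([('🐲', [0, 0]), ('🐦', [1, 0]), ('🐢', [0, 1]), ('🐯', [1, 1])])
top3 = index.search([0, 0], top_k=3)
print(top3)
```

Under these assumptions the query `[0, 0]` is at distance 0 from the dragon and distance 1 from both the bird and the turtle, which matches the `0, 1, 1` in the comment above.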
```python
import numpy as np
from jina import Document

d0 = Document(id='🐲', embedding=np.array([0, 0]))
d1 = Document(id='🐦', embedding=np.array([1, 0]))
d2 = Document(id='🐢', embedding=np.array([0, 1]))
d3 = Document(id='🐯', embedding=np.array([1, 1]))

# nest d1 as a chunk of d0, d2 as a chunk of that chunk, and d3 as a match of d0
d0.chunks.append(d1)
d0.chunks[0].chunks.append(d2)
d0.matches.append(d3)

d0.plot()  # simply `d0` on JupyterLab
```
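
The nesting built above can be easier to follow as a printed tree. The sketch below does not use Jina: `Node` and `outline` are hypothetical names that model only the `chunks` and `matches` relationships, so it shows the shape of the structure rather than how `Document` stores it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a Document; it only models the nesting."""
    id: str
    chunks: List["Node"] = field(default_factory=list)
    matches: List["Node"] = field(default_factory=list)


def outline(node, depth=0, label="doc"):
    """Return one indented line per node, depth-first, chunks before matches."""
    lines = [f"{'  ' * depth}{label}: {node.id}"]
    for chunk in node.chunks:
        lines.extend(outline(chunk, depth + 1, "chunk"))
    for match in node.matches:
        lines.extend(outline(match, depth + 1, "match"))
    return lines


d0, d1, d2, d3 = Node('🐲'), Node('🐦'), Node('🐢'), Node('🐯')
d0.chunks.append(d1)
d0.chunks[0].chunks.append(d2)
d0.matches.append(d3)

tree = outline(d0)
print("\n".join(tree))
```

The bird is a chunk of the dragon, the turtle is a chunk of that chunk, and the tiger hangs off the dragon as a match.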

| Input | Example of `index`/`search` | Explanation |
| --- | --- | --- |
| `numpy.ndarray` | `with f: f.index_ndarray(numpy.random.random([4,2]))` | Input four `Document`s, each `document.blob` is an `ndarray([2])`. |
| CSV | `with f, open('index.csv') as fp: f.index_csv(fp, field_resolver={'pic_url': 'uri'})` | Each line in `index.csv` is constructed as a `Document`, CSV field `pic_url` mapped to `document.uri`. |
| JSON Lines/ndjson/LDJSON | `with f, open('index.ndjson') as fp: f.index_ndjson(fp, field_resolver={'question_id': 'id'})` | Each line in `index.ndjson` is constructed as a `Document`, JSON field `question_id` mapped to `document.id`. |
| Files with wildcards | `with f: f.index_files(['/tmp/*.mp4', '/tmp/*.pdf'])` | Each file captured is constructed as a `Document`, and Document content (`text`, `blob`, `buffer`) is auto-guessed and filled. |

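
The `field_resolver` argument in the CSV and ndjson rows can be read as a key-renaming map. The following standard-library sketch illustrates that idea only; `resolve_fields`, `docs_from_csv` and `docs_from_ndjson` are hypothetical helpers, plain dicts stand in for `Document`s, and the sample data is made up. It is not how Jina implements these methods.

```python
import csv
import io
import json


def resolve_fields(record, field_resolver):
    """Rename keys of `record` per `field_resolver`; other keys pass through."""
    return {field_resolver.get(key, key): value for key, value in record.items()}


def docs_from_csv(fp, field_resolver):
    # csv.DictReader takes the field names from the first row.
    return [resolve_fields(row, field_resolver) for row in csv.DictReader(fp)]


def docs_from_ndjson(fp, field_resolver):
    # One JSON object per non-empty line.
    return [resolve_fields(json.loads(line), field_resolver)
            for line in fp if line.strip()]


# In-memory files with made-up sample content.
csv_fp = io.StringIO("pic_url,caption\nhttp://example.com/a.png,first\n")
ndjson_fp = io.StringIO('{"question_id": "q1", "text": "hello"}\n')

csv_docs = docs_from_csv(csv_fp, {'pic_url': 'uri'})
ndjson_docs = docs_from_ndjson(ndjson_fp, {'question_id': 'id'})
print(csv_docs)
print(ndjson_docs)
```

Fields that the resolver does not mention (`caption`, `text`) keep their original names in this sketch.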
**123.456.78.9**

```bash
# have docker installed
docker run --name=jinad --network=host -v /var/run/docker.sock:/var/run/docker.sock jinaai/jina:latest-daemon --port-expose 8000
# to stop it
docker rm -f jinad
```

**Local**

```python
import numpy as np
from jina import Flow

f = (Flow()
     .add()
     .add(name='gpu_pod',
          uses='mwu_encoder.yml',
          host='123.456.78.9:8000',
          parallel=2,
          upload_files=['mwu_encoder.py'])
     .add())

with f:
    f.index_ndarray(np.random.random([10, 100]), output=print)
```
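
As a rough mental model only, the Flow above can be pictured as three stages in a row, with the middle stage shared by two workers (`parallel=2`). The sketch below uses a local thread pool in place of the remote pod and a placeholder `encode` function; every name in it is hypothetical, and it makes no claim about how Jina schedules or transports requests.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(batch):
    # First stage: pass the batch through unchanged.
    return batch


def encode(batch):
    # Placeholder "encoder": sum each row to produce a 1-d embedding.
    return [[sum(row)] for row in batch]


def collect(batches):
    # Last stage: flatten the per-batch results back into one list.
    return [row for batch in batches for row in batch]


def run_pipeline(rows, batch_size, parallel):
    # Split the input into fixed-size batches.
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    batches = [preprocess(b) for b in batches]
    # `parallel` workers share the middle stage; map() preserves input order.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        encoded = list(pool.map(encode, batches))
    return collect(encoded)


rows = [[i, i + 1] for i in range(10)]
result = run_pipeline(rows, batch_size=3, parallel=2)
print(result)
```

Because `map()` returns results in input order, the output rows line up with the input rows even though two workers process batches concurrently.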

- **NLP Semantic Wikipedia Search with Transformers and DistilBERT**: Brand new to neural search? See a simple text-search example to understand how Jina works.
- **Add Incremental Indexing to Wikipedia Search**: Index more effectively by adding incremental indexing to your Wikipedia search.
- **Search Lyrics with Transformers and PyTorch**: Get a better understanding of chunks by searching a lyrics database. Now with a shiny front-end!
- **Google's Big Transfer Model in (Poké-)Production**: Use SOTA visual representation for searching Pokémon!
- **Search YouTube audio data with Vggish**: A demo of neural search for audio data based on the Vggish model.
- **Search Tumblr GIFs with KerasEncoder**: Use prefetching and sharding to improve the performance of your index and query Flow when searching animated GIFs.