# jupyter-text2code **Repository Path**: rogerspy/jupyter-text2code ## Basic Information - **Project Name**: jupyter-text2code - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-08-08 - **Last Updated**: 2021-08-08 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Text2Code for Jupyter notebook ### A proof-of-concept jupyter extension which converts english queries into relevant python code. ![](jupyter-text2code-demo.gif) ### Blog post with more details: #### [Data analysis made easy: Text2Code for Jupyter notebook](https://towardsdatascience.com/data-analysis-made-easy-text2code-for-jupyter-notebook-5380e89bb493?source=friends_link&sk=2c46fff2c31f7fe59b667350e4596b18) ### Demo Video: #### [Text2Code for Jupyter notebook](https://www.youtube.com/watch?v=3gZ7_9W-TJs) ## Supported Operating Systems: - Ubuntu - macOS ## Installation ### NOTE: We have renamed the plugin from mopp to jupyter-text2code. Uninstall mopp before installing new jupyter-text2code version. ``` pip uninstall mopp ``` #### CPU-only install: For Mac and other Ubuntu installations not having a nvidia GPU, we need to explicitly set an environment variable at time of install. ``` export JUPYTER_TEXT2CODE_MODE="cpu" ``` #### GPU install dependencies: ``` sudo apt-get install libopenblas-dev libomp-dev ``` #### Installation commands: ``` git clone https://github.com/deepklarity/jupyter-text2code.git cd jupyter-text2code pip install . jupyter nbextension enable jupyter-text2code/main ``` ## Uninstallation: ``` pip uninstall jupyter-text2code ``` ## Usage Instructions: - Start Jupyter notebook server by running the following command: ``` jupyter notebook ``` - If you don't see ``` Nbextensions``` tab in Jupyter notebook run the following command:``` jupyter contrib nbextension install --user ``` - You can open the sample ``` notebooks/ctds.ipynb``` notebook for testing - If installation happened successfully, then for the first time, Universal Sentence Encoder model will be downloaded from `tensorflow_hub` - Click on the `Terminal` Icon which appears on the menu (to activate the extension) - Type "help" to see a list of currently supported commands in the repo - Watch [Demo video](https://www.youtube.com/watch?v=3gZ7_9W-TJs) for some examples ## Docker containers for jupyter-text2code We have published CPU and GPU images to docker hub with all dependencies pre-installed. ##### Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and usage instructions. ##### CPU image size: ``` 1.51 GB ``` ##### GPU image size: ``` 2.56 GB ``` ## Model training: ### Generate training data: From a list of templates present at `jupyter_text2code/jupyter_text2code_serverextension/data/ner_templates.csv`, generate training data by running the following command: ``` cd scripts && python generate_training_data.py ``` This command will generate data for intent matching and NER(Named Entity Recognition). ### Create intent index faiss Use the generated data to create a intent-matcher using faiss. ``` cd scripts && python create_intent_index.py ``` ### Train NER model ``` cd scripts && python train_spacy_ner.py ``` ### Steps to add more intents: - Add more templates in `ner_templates` with a new intent_id - Generate training data. Modify `generate_training_data.py` if different generation techniques are needed or if introducing a new entity. - Train intent index - Train NER model - modify `jupyter_text2code/jupyter_text2code_serverextension/__init__.py` with new intent's condition and add actual code for the intent - Reinstall plugin by running: `pip install .` ### TODO: - [x] Publish Docker image - [ ] Refactor code and make it mode modular, remove duplicate code, etc - [ ] Add support for Windows - [ ] Add support for more commands - [ ] Improve intent detection and NER - [ ] Explore sentence Paraphrasing to generate higher-quality training data - [ ] Gather real-world variable names, library names as opposed to randomly generating them - [ ] Try NER with a transformer-based model - [ ] With enough data, train a language model to directly do English->code like GPT-3 does, instead of having separate stages in the pipeline - [ ] Create a survey to collect linguistic data - [ ] Add Speech2Code support #### Authored By: - [Deepak Rawat](https://twitter.com/dsr_ai) - [Kartik Godawat](https://twitter.com/kartik_godawat)