AI OpenSource List


  • TensorFlow #Project#: TensorFlow is an open source software library for numerical computation using data flow graphs.
  • PyTorch #Project#: Tensors and dynamic neural networks in Python with strong GPU acceleration.
  • scikit-learn #Project#: scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.
  • SciPy #Project#: SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.
  • 2019-Deep Java Library (DJL) #Project#: An Engine-Agnostic Deep Learning Framework.
  • 2019-NNI #Project#: An open source AutoML toolkit for neural architecture search, model compression and hyper-parameter tuning.
  • 2019-Thinc #Project#: A refreshing functional take on deep learning, compatible with your favorite libraries.
  • 2019-Streamlit #Project#: Streamlit’s open-source app framework is the easiest way for data scientists and machine learning engineers to create beautiful, performant apps in only a few hours! All in pure Python. All for free.
  • 2020-MegEngine #Project#: MegEngine is a fast, scalable, easy-to-use numerical computation framework with support for automatic differentiation.
  • 2021-Kedro #Project#: Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.


  • tensorflow-playground: Play with neural networks!
  • Sonnet #Project#: Sonnet is a library built on top of TensorFlow for building complex neural networks.
  • TFLearn: Deep learning library featuring a higher-level API for TensorFlow.
  • Spleeter #Project#: Spleeter is Deezer's source separation library with pretrained models, written in Python and built on TensorFlow.


  • PyTorch Lightning #Project#: The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

Universal Toolkits

  • 2021-AugLy #Project#: AugLy is a data augmentation library that currently supports four modalities (audio, image, text & video) and over 100 augmentations.

Dataset Management

  • Hub #Project#: Fastest unstructured dataset management for TensorFlow/PyTorch. Stream data in real time and version-control it.


  • TensorSpace.js #Project#: A neural network 3D visualization framework for building interactive and intuitive models in the browser, with support for pre-trained deep learning models from TensorFlow, Keras, and TensorFlow.js.
  • Curve #Project#: An Integrated Experimental Platform for time series data anomaly detection.
  • wandb #Project#: Our tool wandb helps you track and visualize machine learning experiments.
  • Streamlit #Project#: Streamlit lets you create apps for your machine learning projects with deceptively simple Python scripts.
  • 2021-lux #Project#: Python API for Intelligent Visual Data Discovery

Utils & IDE

  • 2020-Otto #Project#: Otto is an intelligent chat application, designed to help aspiring machine learning engineers go from idea to implementation with minimal domain knowledge.
  • 2020-Spyder #Project#: Spyder is a powerful scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It offers a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package.
  • 2014-Jupyter #Project#: Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.
  • 2019-Jupytext #Project#: Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts.

Pretrained Models

Machine Learning

  • NumPy #Project#: NumPy is the fundamental package for scientific computing with Python.
  • pandas #Project#: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • Matplotlib #Project#: Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
  • feature-selector #Project#: Feature selector is a tool for dimensionality reduction of machine learning datasets
  • SPTAG #Project#: A distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index build, search, and distributed online serving toolkits for large-scale vector search scenarios.

Feature Engineering

Time Series

  • 2019-adtk #Project#: A Python toolkit for unsupervised anomaly detection in time series
  • 2020-sktime #Project#: A unified framework for machine learning with time series.
  • 2021-Kats #Project#: Kats is a toolkit for analyzing time series data: a lightweight, easy-to-use, generalizable, and extendable framework for time series analysis, from understanding key statistics and characteristics, to detecting change points and anomalies, to forecasting future trends.

Deep Learning

  • tfjs #Project#: A WebGL-accelerated, browser-based JavaScript library for training and deploying ML models.
  • brain.js #Project#: brain.js is a library of Neural Networks written in JavaScript.
  • neurojs #Project#: neurojs is a JavaScript framework for deep learning in the browser. It mainly focuses on reinforcement learning, but can be used for any neural network based task. It contains neat demos to visualise these capabilities, for instance a 2D self-driving car.

Natural Language Processing

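  • SnowNLP #Project#: SnowNLP is a Python library for conveniently processing Chinese text. It was inspired by TextBlob: since most natural language processing libraries target English, the author wrote one for Chinese. Unlike TextBlob, it does not use NLTK; all algorithms are implemented from scratch, and it ships with some pre-trained dictionaries.
  • nlp_compromise #Project#: A cool way to use natural language in JavaScript.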
  • flair #Project#: A very simple framework for state-of-the-art Natural Language Processing (NLP)
  • Chinese NLP #Project#: Shared tasks, datasets and state-of-the-art results for Chinese Natural Language Processing (NLP).
  • 2019-Transformers #Project#: 🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
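  • 2020-MiNLP #Project#: Xiaomi's natural language processing platform (MiNLP) provides dozens of functional modules covering lexical, syntactic, and semantic analysis, and is already widely used across the company's business.
  • 2020-fastNLP #Project#: fastNLP is a lightweight natural language processing (NLP) toolkit. You can use it to complete an NLP task quickly, or to quickly build more complex models for research.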

Language Representation

  • 2018-BERT #Project#: BERT is a method of pre-training language representations, meaning that we train a general-purpose "language understanding" model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). Large-scale Chinese pre-trained ALBERT models.
  • 2019-GPT2 #Project#: Code and models from the paper "Language Models are Unsupervised Multitask Learners".
    • 2019-GPT2 Chinese #Project#: Chinese version of GPT2 training code, using BERT or BPE tokenizer.
    • 2021-gpt neo #Project#: An implementation of model-parallel GPT2 and GPT3-like models, with the ability to scale up to full GPT3 sizes (and possibly more!), using the mesh-tensorflow library.


  • 2016-FastText #Project#: FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.

Syntax & Semantic Analysis

  • Snips NLU #Project#: Snips NLU (Natural Language Understanding) is a Python library that parses sentences written in natural language and extracts structured information.
  • Word2Bits #Project#: Word2Bits extends the Word2Vec algorithm to output high quality quantized word vectors that take 8x-16x less storage/memory than regular word vectors.
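  • ansj_seg #Project#: ansj word segmentation. A true Java implementation of ict, with segmentation quality and speed that both exceed the open-source version of ict. Supports Chinese word segmentation, person name recognition, part-of-speech tagging, and user-defined dictionaries.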
  • gensim #Project#: topic modelling for humans
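  • 2019-pkuseg #Project#: pkuseg is simple and easy to use, supports domain-specific word segmentation, and effectively improves segmentation accuracy.
  • 2019-Synonyms #Project#: The best Chinese synonym toolkit. Synonyms can be used for many natural language understanding tasks: text alignment, recommendation algorithms, similarity computation, semantic shift, keyword extraction, concept extraction, automatic summarization, search engines, and more.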

Knowledge Graph

  • 2018-OpenKE #Project#: An Open-Source Package for Knowledge Embedding (KE).
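  • Intelligent Question Answering System Based on a Medical Knowledge Graph #Project#: A basic knowledge-base question answering system implemented with the Python module REfO. It parses an input natural-language question into a SPARQL query, sends that query to a backend Apache Jena Fuseki service backed by a TDB knowledge base, and returns the answer.
  • 2019-KnowledgeGraphData #Project#: Knowledge is power. Knowledge graphs are a product of the new era of artificial intelligence. Put simply, a knowledge graph links pieces of knowledge through their relationships into a network structure, which an AI system can then use to understand the event the graph represents; that event may be real or fictional.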


  • 2019-Project DeepSpeech #Project#: A TensorFlow implementation of Baidu's DeepSpeech architecture.
  • 2020-TTS #Project#: TTS is a library for advanced Text-to-Speech generation. It is built on the latest research and designed to achieve the best trade-off among ease of training, speed, and quality. TTS comes with pretrained models and tools for measuring dataset quality, and is already used in 20+ languages for products and research projects.
  • 2021-MockingBird #Project#: 🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time.

Dialogue System & Bot