AI OpenSource List

Universal Toolkits


  • TensorFlow #Project#: TensorFlow is an open source software library for numerical computation using data flow graphs.

  • PyTorch #Project#: Tensors and dynamic neural networks in Python with strong GPU acceleration.

  • scikit-learn #Project#: scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.

  • SciPy #Project#: SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.

  • 2019-Deep Java Library (DJL) #Project#: An Engine-Agnostic Deep Learning Framework.

  • 2019-NNI #Project#: An open source AutoML toolkit for neural architecture search, model compression and hyper-parameter tuning.

  • 2019-Thinc #Project#: A refreshing functional take on deep learning, compatible with your favorite libraries.

  • 2019-Streamlit #Project#: Streamlit’s open-source app framework is the easiest way for data scientists and machine learning engineers to create beautiful, performant apps in only a few hours! All in pure Python. All for free.

  • 2020-MegEngine #Project#: MegEngine is a fast, scalable, easy-to-use numerical computation framework with support for automatic differentiation.


  • tensorflow-playground: Play with neural networks!

  • Sonnet #Project#: Sonnet is a library built on top of TensorFlow for building complex neural networks.

  • TFLearn: Deep learning library featuring a higher-level API for TensorFlow.


  • PyTorch Lightning #Project#: The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.


  • TensorSpace.js #Project#: Neural network 3D visualization framework for building interactive and intuitive models in browsers, with support for pre-trained deep learning models from TensorFlow, Keras, and TensorFlow.js.

  • Curve #Project#: An Integrated Experimental Platform for time series data anomaly detection.

  • wandb #Project#: Our tool wandb helps you track and visualize machine learning experiments.

  • Streamlit #Project#: Streamlit lets you create apps for your machine learning projects with deceptively simple Python scripts.


  • 2014-Jupyter #Project#: Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.

  • 2019-Jupytext #Project#: Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts.


  • 2020-Otto #Project#: Otto is an intelligent chat application, designed to help aspiring machine learning engineers go from idea to implementation with minimal domain knowledge.

Data Analysis

Feature Engineering

Time Series

Machine Learning

  • NumPy #Project#: NumPy is the fundamental package for scientific computing with Python.

  • pandas #Project#: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

  • Matplotlib #Project#: Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.

  • feature-selector #Project#: Feature selector is a tool for dimensionality reduction of machine learning datasets

Deep Learning

  • tfjs #Project#: A WebGL accelerated, browser based JavaScript library for training and deploying ML models.

  • brain.js #Project#: brain.js is a library of Neural Networks written in JavaScript.

  • neurojs #Project#: neurojs is a JavaScript framework for deep learning in the browser. It mainly focuses on reinforcement learning, but can be used for any neural network based task. It contains neat demos to visualise these capabilities, for instance a 2D self-driving car.

Natural Language Processing

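  • SnowNLP #Project#: SnowNLP is a Python library for conveniently processing Chinese text. It was inspired by TextBlob: because most natural language processing libraries target English, it was written to make Chinese text easy to handle. Unlike TextBlob, it does not use NLTK; all algorithms are implemented from scratch, and it ships with some pre-trained dictionaries.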

  • nlp_compromise #Project#: A cool way to use natural language in JavaScript.

  • flair #Project#: A very simple framework for state-of-the-art Natural Language Processing (NLP)

  • Chinese NLP #Project#: Shared tasks, datasets and state-of-the-art results for Chinese Natural Language Processing (NLP).

  • 2019-Transformers #Project#: 🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.


  • 2016-FastText #Project#: FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.


  • 2016-Hubot #Project#: Hubot is a framework to build chat bots, modeled after GitHub's Campfire bot of the same name, hubot. He's pretty cool. He's extendable with scripts and can work on many different chat services.

  • 2019-Botpress #Project#: The ultimate open-source conversational platform with built-in natural language understanding (NLU), easy-to-use graphical interface and dialog manager.

  • Olivia #Project#: Your new best friend built with an artificial neural network.

  • Leon #Project#: Leon is your open-source personal assistant.

Syntax & Semantic Analysis

  • Snips NLU #Project#: Snips NLU (Natural Language Understanding) is a Python library that parses sentences written in natural language and extracts structured information.

  • Word2Bits #Project#: Word2Bits extends the Word2Vec algorithm to output high quality quantized word vectors that take 8x-16x less storage/memory than regular word vectors.

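  • ansj_seg #Project#: Ansj word segmentation, a true Java implementation of ICT. Both its segmentation quality and speed exceed those of the open-source version of ICT. Supports Chinese word segmentation, person-name recognition, part-of-speech tagging, and user-defined dictionaries.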

  • gensim #Project#: Topic modelling for humans.

  • 2019-GPT2 Chinese #Project#: Chinese version of GPT2 training code, using BERT or BPE tokenizer.

  • 2019-pkuseg #Project#: pkuseg is simple and easy to use, supports domain-specific word segmentation, and effectively improves segmentation accuracy.

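  • 2019-Synonyms #Project#: The best Chinese synonym toolkit. Synonyms can be used for many natural language understanding tasks: text alignment, recommendation algorithms, similarity computation, semantic shift, keyword extraction, concept extraction, automatic summarization, search engines, and more.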

Knowledge Graph


Dialogue System


  • Common Voice #Project#: The Common Voice project is Mozilla's initiative to help teach machines how real people speak.

  • DeepSpeech #Project#: Project DeepSpeech is an open source Speech-To-Text engine. It uses a model trained by machine learning techniques, based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow project to make the implementation easier.

  • wav2letter #Project#: wav2letter is a simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research.

Computer Vision

  • CVAT #Project#: Powerful and efficient Computer Vision Annotation Tool (CVAT).


  • Tess4j #Project#: Java JNA wrapper for Tesseract OCR API.

  • 2018-alpr-unconstrained: License Plate Detection and Recognition in Unconstrained Scenarios.

  • 2020-PaddleOCR #Project#: PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them into practice.

  • 2020-EasyOCR #Project#: Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai

Object Detection


Face Recognition


  • 2018-videoflow #Project#: Python framework that facilitates the quick development of complex video analysis applications and other series-processing based applications in a multiprocessing environment.

Deep Face


  • tianshou #Project#: An elegant, flexible, and superfast PyTorch deep Reinforcement Learning platform.

Distributed Training

  • BytePS #Project#: BytePS is a high performance and general distributed training framework.

  • SQLFlow #Project#: SQLFlow is a bridge that connects a SQL engine, e.g. MySQL, Hive or MaxCompute, with TensorFlow, XGBoost and other machine learning toolkits. SQLFlow extends the SQL syntax to enable model training, prediction and model explanation.

  • Horovod #Project#: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

  • 2019-ElasticDL #Project#: ElasticDL is a Kubernetes-native deep learning framework built on top of TensorFlow 2.0 that supports fault-tolerance and elastic scheduling.

  • 2019-Alink #Project#: Alink is a general-purpose algorithm platform based on Flink, developed by the PAI team of Alibaba's computing platform division.

Integrated Tools

  • Deepo #Project#: Deepo is a Docker image with a full reproducible deep learning research environment. It contains most popular deep learning frameworks: theano, tensorflow, sonnet, pytorch, keras, lasagne, mxnet, cntk, chainer, caffe, torch.

  • 2017-Turi Create #Project#: Turi Create simplifies the development of custom machine learning models. You don't have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.

  • Ludwig #Project#: Ludwig is a toolbox for training and testing deep learning models without the need to write code.

Federated Learning

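  • FATE #Project#: FATE (Federated AI Technology Enabler) is the world's first industrial-grade federated learning framework, developed in-house by WeBank's AI team. It provides a distributed secure computing framework based on data privacy protection, offering high-performance secure computing support for machine learning, deep learning, and transfer learning algorithms. FATE also provides a friendly cross-domain interaction information management scheme that addresses the difficulty of auditing information security in federated learning.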


Self Driving