DataScienceAI OpenSource List

Universal Toolkits

  • TensorFlow #Project#: TensorFlow is an open source software library for numerical computation using data flow graphs.

  • PyTorch #Project#: Tensors and dynamic neural networks in Python with strong GPU acceleration.

  • scikit-learn #Project#: scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.

  • SciPy #Project#: SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.

Business Intelligence


  • TensorSpace.js #Project#: A neural network 3D visualization framework for building interactive and intuitive models in browsers. It supports pre-trained deep learning models from TensorFlow, Keras, and TensorFlow.js.

  • Curve #Project#: An integrated experimental platform for anomaly detection in time-series data.

  • wandb #Project#: wandb (Weights & Biases) helps you track and visualize machine learning experiments.

  • Streamlit #Project#: Streamlit lets you create apps for your machine learning projects with deceptively simple Python scripts.

Data Analysis

Feature Engineering

Machine Learning

  • NumPy #Project#: NumPy is the fundamental package for scientific computing with Python.

  • pandas #Project#: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

  • Matplotlib #Project#: Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

  • feature-selector #Project#: Feature selector is a tool for dimensionality reduction of machine learning datasets.

Deep Learning

  • tfjs #Project#: A WebGL accelerated, browser based JavaScript library for training and deploying ML models.

  • brain.js #Project#: brain.js is a library of Neural Networks written in JavaScript.

  • neurojs #Project#: neurojs is a JavaScript framework for deep learning in the browser. It mainly focuses on reinforcement learning, but can be used for any neural network based task. It contains neat demos to visualise these capabilities, for instance a 2D self-driving car.

Natural Language Processing

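  • SnowNLP #Project#: SnowNLP is a Python library for conveniently processing Chinese text. It was inspired by TextBlob: because most natural language processing libraries target English, the author wrote a library that makes Chinese easy to handle. Unlike TextBlob, it does not use NLTK. All of its algorithms are implemented from scratch, and it ships with some pre-trained dictionaries.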

  • nlp_compromise #Project#: A cool way to use natural language in JavaScript.

  • flair #Project#: A very simple framework for state-of-the-art Natural Language Processing (NLP).

  • Chinese NLP #Project#: Shared tasks, datasets and state-of-the-art results for Chinese Natural Language Processing (NLP).


  • 2016-FastText #Project#: FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.


  • 2019-Botpress #Project#: The ultimate open-source conversational platform, with built-in natural language understanding (NLU), an easy-to-use graphical interface, and a dialog manager.

Syntax & Semantic Analysis

  • Snips NLU #Project#: Snips NLU (Natural Language Understanding) is a Python library that lets you parse sentences written in natural language and extract structured information.

  • Word2Bits #Project#: Word2Bits extends the Word2Vec algorithm to output high quality quantized word vectors that take 8x-16x less storage/memory than regular word vectors.

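  • ansj_seg #Project#: ansj word segmentation, a genuine Java implementation of ICT. Both its segmentation quality and its speed exceed the open-source version of ICT. Features: Chinese word segmentation, person-name recognition, part-of-speech tagging, and user-defined dictionaries.

  • gensim #Project#: Topic modelling for humans.

  • 2019-GPT2 Chinese #Project#: Training code for a Chinese version of GPT-2, using a BERT or BPE tokenizer.

  • 2019-pkuseg #Project#: pkuseg is simple and easy to use, supports domain-specific word segmentation, and effectively improves segmentation accuracy.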
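The Word2Bits entry above claims quantized word vectors take 8x-16x less storage than regular ones. To show roughly where savings like that come from, here is a minimal standard-library Python sketch. It maps each component of a vector to a 2-bit code, packs four codes per byte, and compares the result against float32 storage. The codebook levels, the thresholds, and every function name are hypothetical choices made for this sketch. It is not Word2Bits' code, and this list does not say which bit widths or vector sizes Word2Bits actually uses.

```python
import struct

# Illustrative 4-level codebook for 2-bit codes (an assumption for this sketch,
# not necessarily the levels Word2Bits uses).
LEVELS_2BIT = (-0.75, -0.25, 0.25, 0.75)


def quantize_2bit(vec):
    """Map each float to a 2-bit code (0..3) by thresholding at -0.5, 0 and 0.5."""
    codes = []
    for x in vec:
        if x < -0.5:
            codes.append(0)
        elif x < 0.0:
            codes.append(1)
        elif x < 0.5:
            codes.append(2)
        else:
            codes.append(3)
    return codes


def pack_2bit(codes):
    """Pack four 2-bit codes into each byte, filling the lowest bits first."""
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def dequantize_2bit(data, dim):
    """Recover dim codebook values from the packed bytes."""
    return [LEVELS_2BIT[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(dim)]


def float32_bytes(vec):
    """Size in bytes of the vector stored as regular 32-bit floats."""
    return len(struct.pack(f"{len(vec)}f", *vec))


if __name__ == "__main__":
    dim = 300
    # Deterministic toy vector with values in [-1, 1).
    vec = [((i * 37) % 200 - 100) / 100 for i in range(dim)]
    packed = pack_2bit(quantize_2bit(vec))
    print(float32_bytes(vec), len(packed))  # prints: 1200 75
    print(float32_bytes(vec) / len(packed))  # prints: 16.0
```

At equal dimension, k-bit codes are 32/k times smaller than float32 components. Under that arithmetic, 2-bit codes give 16x and 4-bit codes would give 8x. Note that this sketch only rounds an existing vector after the fact. It says nothing about how Word2Bits trains its vectors.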

Knowledge Graph

Dialogue System


  • Common Voice #Project#: The Common Voice project is Mozilla's initiative to help teach machines how real people speak.

  • DeepSpeech #Project#: Project DeepSpeech is an open-source Speech-To-Text engine. It uses a machine-learning model based on Baidu's Deep Speech research paper and is implemented with Google's TensorFlow to make the implementation easier.

  • wav2letter #Project#: wav2letter is a simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research.

Computer Vision

  • 2018-FastPhotoStyle #Project#: An implementation of NVIDIA's fast photorealistic style transfer algorithm.

  • 2018-videoflow #Project#: A Python framework for quickly developing complex video analysis applications and other series-processing applications in a multiprocessing environment.

Object Detection

  • 2017-Detectron #Project#: Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including Mask R-CNN.


Face Recognition



Distributed Training

  • BytePS #Project#: BytePS is a high performance and general distributed training framework.

  • SQLFlow #Project#: SQLFlow is a bridge that connects a SQL engine, e.g. MySQL, Hive or MaxCompute, with TensorFlow, XGBoost and other machine learning toolkits. SQLFlow extends the SQL syntax to enable model training, prediction and model explanation.

  • Horovod #Project#: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

  • 2019-ElasticDL #Project#: ElasticDL is a Kubernetes-native deep learning framework built on top of TensorFlow 2.0 that supports fault tolerance and elastic scheduling.

Integrated Tools

  • Deepo #Project#: Deepo is a Docker image with a fully reproducible deep learning research environment. It contains most popular deep learning frameworks: theano, tensorflow, sonnet, pytorch, keras, lasagne, mxnet, cntk, chainer, caffe, torch.

  • 2017-Turi Create #Project#: Turi Create simplifies the development of custom machine learning models. You don't have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.

  • Ludwig #Project#: Ludwig is a toolbox that lets you train and test deep learning models without writing code.

Federated Learning

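  • FATE #Project#: FATE (Federated AI Technology Enabler) was developed in-house by WeBank's AI team and is billed as the world's first industrial-grade federated learning framework. It provides a distributed secure-computation framework built on data-privacy protection, offering high-performance secure computation for machine learning, deep learning, and transfer learning algorithms. FATE also provides a user-friendly solution for managing cross-domain interaction information, which addresses the difficulty of auditing information security in federated learning.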
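The minimal standard-library Python sketch below makes the core federated-learning idea behind entries like FATE concrete. It uses federated averaging (FedAvg), a common federated learning scheme swapped in here for illustration. Each hypothetical client trains a one-parameter linear model on data it keeps private, and a server averages only the returned weights, weighted by each client's sample count. This is not FATE's API or algorithm. It also omits the secure computation and privacy protection that the FATE entry describes. The function names, learning rate, and toy data are all assumptions.

```python
# Hypothetical toy setup: each client privately holds (x, y) pairs drawn from y = 3x.
def local_sgd(w, data, lr=0.1, epochs=5):
    """Train the one-parameter model y ~ w * x on one client's private data."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
            w -= lr * grad
    return w


def federated_round(global_w, clients, lr=0.1, epochs=5):
    """One round: clients train locally, then the server averages weights by sample count.

    Only the updated weights leave each client; the raw (x, y) pairs never do.
    """
    updates = [(len(data), local_sgd(global_w, data, lr, epochs)) for data in clients]
    total = sum(n for n, _ in updates)
    return sum(n * w for n, w in updates) / total


if __name__ == "__main__":
    clients = [
        [(1.0, 3.0), (0.5, 1.5)],  # client A's private data
        [(2.0, 6.0)],              # client B's private data
    ]
    w = 0.0
    for _ in range(10):
        w = federated_round(w, clients)
    print(round(w, 4))  # prints: 3.0
```

Weighting by sample count keeps a client with very little data from pulling the shared model as hard as a client with a lot of data. Real frameworks add encryption or secure aggregation on top of this exchange so that the server cannot inspect individual updates.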