
2018 GitHub Open Source Projects TOP 25: Data Science & Machine Learning

via: 雷锋网     time: 2019/1/13 7:04:36


What is the best platform for hosting code, collaborating with teammates, and serving as an online resume that showcases your coding skills? Ask any data scientist and they will point you to GitHub. In recent years, GitHub has been a truly transformative platform that has changed the way we host and even write code.

But that is not all. It is also a learning platform. If you ask how, here is a hint: open source projects!

The world's leading technology companies open source their projects by publishing the code behind their popular algorithms on GitHub. In 2018, led by companies such as Google and Facebook, the number of such releases rose dramatically. The best part is that the authors often provide pretrained models as well, so that you and I do not have to spend time building a difficult model from scratch.

There are also plenty of popular repositories for coders and developers, including cheat sheets, video links, e-books, links to research papers, and more. Whatever your level of expertise (beginner, intermediate, or advanced), you can always find something new to learn on GitHub.

2018 was an extraordinary year for many subfields of data science, which I will cover shortly. With projects such as ULMFiT and BERT open sourced on GitHub, natural language processing (NLP) quickly became the most talked-about area in the community. To give something back to this great GitHub community, I picked the top 5 open source projects every data scientist should know each month of the year and compiled them into a monthly list. You can go to the full list by clicking on the link below:

Some of these projects also appear in my article on the 2018 breakthroughs in AI and ML, which you can also read at the following address. It is essentially an inventory of the major advances in the field, and I think everyone working in it should be familiar with them. As a bonus, it includes predictions from experts that you will not want to miss.


Now get ready to explore new projects and work toward becoming the data science star of 2019. Keep scrolling, and click the link after each item to go to its GitHub repository.

Topics covered in this article

  • Tools and frameworks

  • Computer vision

  • Generative adversarial networks (GANs)

  • Other deep learning projects

  • Natural Language Processing (NLP)

  • Automated Machine Learning (AutoML)

  • Reinforcement learning

Tools and frameworks

Let's start with the best open source projects for tools, libraries, and frameworks. Since we are talking about a code repository platform, this seems like the right place to begin.

Technology is advancing rapidly and computing costs are lower than ever, so a large number of open source projects are now available to us. Is this the golden age of machine learning coding? That is an open question, but one thing we can all agree on is that it is a good time to be a programmer in data science. In this section (and throughout the article), I have tried to keep the programming languages as diverse as possible, but Python inevitably dominates.


ML.NET

Open source address: https://github.com/dotnet/machinelearning

What if you are a .NET developer who wants to learn a little machine learning to complement your existing skills? There is a perfect open source project to help you get started: ML.NET, a Microsoft project, is an open source machine learning framework that lets you design and develop models in .NET.

You can even integrate existing machine learning models into your application, without needing to know exactly how to develop a machine learning model. ML.NET is already used in multiple Microsoft products, such as Windows, Bing Search, MS Office, and many more.

ML.NET runs on Windows, Linux, and macOS.


TensorFlow.js

Open source address: https://github.com/tensorflow/tfjs

Machine learning in the browser! A few years ago this was a fantasy, and now it is a striking reality. Most of us in this field are glued to our favorite IDEs, and TensorFlow.js has the potential to change our habits. Since its release in early 2018, it has become a very popular open source project, and its flexibility continues to surprise.

As described in the repository, TensorFlow.js has three main features:

  • You can develop machine learning and deep learning models in the browser itself;

  • You can run existing TensorFlow models in the browser;

  • You can also retrain or fine-tune those existing models.

If you are familiar with Keras, its high-level layers API will feel very familiar. The GitHub repository offers plenty of public examples, so check them out to speed up your learning curve.

PyTorch 1.0

Open source address: https://github.com/pytorch/pytorch

2018 was a very exciting year for PyTorch. It has won the hearts of data scientists and machine learning researchers around the world, who now contribute to it continuously. PyTorch is easy to understand, flexible, and used in many high-profile research projects (as you will see later in this article). The latest version, PyTorch 1.0, already powers a large number of Facebook products and services at scale, including 6 billion text translations per day. If you have been wondering when to start with PyTorch, the time is now.

If you are new to this field, take a look at the PyTorch getting started guide by Faizan Shaikh: https://www.analyticsvidhya.com/blog/2018/02/pytorch-tutorial/

Papers with Code

Open source address: https://github.com/zziz/pwc

Strictly speaking, Papers with Code is not a tool or framework, but for data scientists it is a gold mine. Most of us struggle to read a paper and then implement the method it proposes (at least I do). There are many moving parts that just do not seem to work on our machines.

This is where Papers with Code comes in. As the name suggests, it collects code implementations of important papers published in roughly the last 6 years. The collection is so exciting that you will find yourself admiring it. It even includes code for papers presented at NIPS (NeurIPS) 2018. Go ahead and put Papers with Code to use.

Computer vision

Thanks to falling computational costs and a surge of breakthroughs from top researchers (something tells me the two may be related), more and more people can now use deep learning in their research. Within deep learning, computer vision projects are the most common: most of the repositories in this section involve one computer vision technique or another.

Computer vision is currently the hottest area of deep learning and will remain so for the foreseeable future. Whether it is object detection or pose estimation, there seems to be an open source project for almost every computer vision task. Now is the best time to get up to speed with these developments; plenty of job opportunities may soon come your way.

Facebook Detectron

Open source address:https://github.com/facebookresearch/Detectron

Detectron made a huge splash when it was released in early 2018. Developed by Facebook AI Research (FAIR), it implements state-of-the-art object detection frameworks. Detectron is written in Python (surprise, surprise!) and has helped enable several projects, including DensePose (which we will come to shortly).

The repository includes the code and more than 70 pretrained models. It is too good an opportunity to pass up, wouldn't you agree?

NVIDIA's vid2vid technology

Open source address:https://github.com/NVIDIA/vid2vid

Object detection in images works very well these days. What about in videos? Beyond that, can we extend the concept and translate the style of one video into another? Yes, we can! It is a really cool concept, and NVIDIA has generously released a PyTorch implementation for everyone to try.

The repository includes videos introducing the technique, the full research paper, and the code. NVIDIA's examples use the Cityscapes dataset, which is publicly available for download after registration (download address: https://www.cityscapes-dataset.com/). This is my personal favorite open source project of 2018.

Train a model on the ImageNet dataset in 18 minutes

Open source address: https://github.com/diux-dev/imagenet18

Train a deep learning model in 18 minutes? Without high-end computing resources of your own? Believe me, it has been done. Fast.ai's Jeremy Howard and his team of students built a model on the popular ImageNet dataset that even outperformed Google's approach.

I suggest you at least take a look at this repository to see how the researchers structured their code. Not everyone has multiple GPUs (some people do not even have one), so this project matters a great deal to the little guys.

A complete collection of object detection papers

Open source address: https://github.com/hoya012/deep_learning_object_detection

This is another repository of research papers. It always helps to see how your chosen research topic has evolved over the years, and this one-stop history does exactly that for object detection. It collects papers from 2014 to the present and, wherever possible, the code for each paper as well.

The chart above shows how object detection frameworks have evolved over the past five years. Amazing, isn't it? It even includes work from 2019, so you have plenty to catch up on.

Facebook's DensePose

Open source address:https://github.com/facebookresearch/DensePose

Let us turn our attention to pose estimation. I only came across the concept this year and have been fascinated by it ever since. The image above captures the essence of this repository: dense human pose estimation in the wild.

The repository includes code to train and evaluate DensePose-RCNN models, as well as notebooks for visualizing the DensePose COCO dataset. It is a great place to start learning about pose estimation.

Everybody Dance Now - Pose Estimation

Open source address: https://github.com/nyoki-mtl/pytorch-EverybodyDanceNow

The image above (taken from the video) really caught my interest. I covered the release of the research paper in my August roundup and continue to admire the technique. It transfers motion between human subjects in different videos. The video I mentioned is also available in the repository, and the results are beyond what you would imagine!

The repository also includes a PyTorch implementation of the approach. The amount of intricate detail it can capture and replicate is amazing.

Generative adversarial networks (GANs)

I am sure most of you have come across an application of GANs (even if you did not realize it at the time). GANs, or generative adversarial networks, were introduced by Ian Goodfellow in 2014 and have been popular ever since. They excel at creative tasks, especially artistic ones. You can find an introductory guide by Faizan Shaikh, which includes a Python implementation, at https://www.analyticsvidhya.com/blog/2017/06/introductory-generative-adversarial-networks-gans/

We saw so many GAN-based projects in 2018 that I decided to give them a section of their own.

Deep Painterly Harmonization

Open source address: https://github.com/luanfujun/deep-painterly-harmonization

Let's start with one of my favorites. Take a moment to just admire the images above. Can you tell which one was made by a human and which by a machine? I am sure you can't. Here, the first picture is the input image (the original), and the third was generated by this technique.

Surprised, aren't you? The algorithm takes an external object of your choice, adds it to any image, and makes it look as if it had always been there. Do check out the code and try the technique on a range of different images yourself.

Image Outpainting

Open source address:https://github.com/bendangnuksung/Image-OutPainting

What if I gave you an image and asked you to extend its boundaries by imagining what the full scene would look like? Normally, you would open the image in an image editing program. But now there is an impressive alternative: you can do it with a few lines of code.

This project is a Keras implementation of Stanford's "Image Outpainting" paper (https://cs230.stanford.edu/projects_spring_2018/posters/8265861.pdf; it is an excellent, well-illustrated paper, which is what most research papers should look like!). You can either build the model from scratch or use the one provided by the repository's author. Deep learning never ceases to amaze.

Visualize and understand GANs

Open source address:https://github.com/CSAILVision/gandissect

If you have not got to grips with GANs yet, try this repository. Developed at MIT CSAIL, it helps researchers visualize and understand GANs. You can explore what a GAN model has learned by inspecting and manipulating its neurons.

I suggest you check out the official MIT project page (https://gandissect.csail.mit.edu/), which has plenty of resources (including a video demo) to help you get familiar with the concept.


GANimation

Open source address: https://github.com/albertpumarola/GANimation

This algorithm lets you change the facial expression of any person in an image, with results that can be both amusing and unsettling. In the image above, the picture in the green box is the original, and the rest were generated by GANimation.

The repository includes a getting started guide, data preparation resources, prerequisites, and Python code. As the authors of the paper point out, do not use it for unethical purposes.

NVIDIA's FastPhotoStyle

Open source address:https://github.com/NVIDIA/FastPhotoStyle

FastPhotoStyle is very similar to the Deep Painterly Harmonization project mentioned earlier, but it is worth a mention because it comes from NVIDIA itself. As you can see in the image above, the FastPhotoStyle algorithm takes two inputs: a style image and a content image. It then produces the output in one of two ways, using either its photorealistic image stylization code or semantic label maps.

Other deep learning open source projects

The field of computer vision may have eclipsed other things in deep learning, but I still want to list several representative open source projects beyond computer vision.

NVIDIA's WaveGlow

Open source address:https://github.com/NVIDIA/waveglow

Audio processing is another area where deep learning is starting to make its mark. It is not limited to generating music; you can also tackle tasks such as audio classification, fingerprinting, segmentation, and tagging. There is still a lot to explore in this field. Who knows, maybe these repositories will help you make your way to the top.

Here are two very intuitive articles to help you get familiar with this line of work:

Now back to NVIDIA. WaveGlow is a flow-based network capable of generating high quality audio. In essence, it is a single network for speech synthesis.

The repository includes a PyTorch implementation of WaveGlow and a pretrained model available for download. The researchers have also listed the steps for using it, which you can follow if you want to train your own model from scratch.
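To make "flow-based" a little more concrete, here is a minimal sketch of an affine coupling layer, the kind of invertible building block that flow models such as WaveGlow stack many times. It is plain Python, not code from the NVIDIA repository. The `conditioner` function is a hypothetical stand-in for the learned network that the real model uses (and which is also conditioned on a mel-spectrogram there), and an even-length input is assumed.

```python
import math


def conditioner(xa):
    """Hypothetical stand-in for the learned network that predicts
    a log-scale and a shift from the untouched half of the input."""
    log_s = [math.tanh(v) for v in xa]  # bounded log-scale keeps exp() stable
    t = [0.5 * v for v in xa]           # shift term
    return log_s, t


def coupling_forward(x):
    """Transform x into y; returns (y, log|det Jacobian|)."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = conditioner(xa)
    # Only the second half is changed, and only by an elementwise affine map.
    yb = [b * math.exp(s) + shift for b, s, shift in zip(xb, log_s, t)]
    return xa + yb, sum(log_s)


def coupling_inverse(y):
    """Exactly undo coupling_forward."""
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    log_s, t = conditioner(ya)  # same values as in the forward pass
    xb = [(b - shift) * math.exp(-s) for b, s, shift in zip(yb, log_s, t)]
    return ya + xb
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift, which is what makes the layer exactly invertible and its log-determinant a simple sum.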


AstroNet

Open source address: https://github.com/google-research/exoplanet-ml

Want to discover your own planet? That may be overstating things a little, but the AstroNet repository will certainly bring you closer to that dream. In December 2017, the Google Brain team discovered two new planets by applying AstroNet. AstroNet is a deep neural network built for working with astronomical data. It shows how widely deep learning can be applied and is a genuine milestone.

Now the team behind the technology has open sourced the entire code used to run AstroNet (hint: the model is based on CNNs!).
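As a rough illustration of why CNNs suit this kind of data, the sketch below slides a single hand-written filter over a toy brightness curve and lets a ReLU keep only the dip. This is not AstroNet's code: the light curve, the kernel, and the bias are all made-up values chosen for the example, whereas a real CNN learns many such filters from data.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide the kernel over the signal (no padding), as a CNN layer does."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive responses, zero out the rest."""
    return [max(0.0, v) for v in values]


# Toy light curve (made up): constant brightness with a short dip,
# loosely resembling a planet passing in front of its star.
flux = [1.0] * 20
for i in (8, 9, 10):
    flux[i] = 0.9

# Hand-written "dip detector": responds most where three samples are all low.
dip_kernel = [-1.0, -1.0, -1.0]
activations = relu(conv1d_valid(flux, dip_kernel, bias=2.85))

# Position of the strongest response, i.e. where the dip starts.
strongest = max(range(len(activations)), key=activations.__getitem__)
```

On the flat parts of the curve the filter output is negative and the ReLU zeroes it, so only the window that sits fully inside the dip produces the largest activation.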

VisualDL – Visual Deep Learning Model

Open source address:https://github.com/PaddlePaddle/VisualDL

Who doesn't like visualization? Still, picturing how a deep learning model works can be a little daunting. VisualDL does a good job of easing that challenge by providing visualizations designed for specific deep learning tasks.

VisualDL currently supports the following components for visualization:

  • Scalar

  • Histogram

  • Image

  • Audio

  • Graph

  • High-dimensional

Natural Language Processing (NLP)

Surprised to see NLP so far down the list? This is mainly because I already covered almost all the major NLP open source releases in a separate article, which I highly recommend checking out. The frameworks mentioned there include ULMFiT, Google's BERT, ELMo, and Facebook's PyText. Here I will briefly cover BERT and a few other repositories that I found very useful.

Google's BERT

Open source address: https://github.com/google-research/bert

No NLP section would be complete without BERT. This release from Google AI brought a breakthrough to the field and captured the attention of NLP enthusiasts and experts alike. Following ULMFiT and ELMo, BERT blew away the competition with its performance, achieving state-of-the-art results on 11 NLP tasks.

In addition to the official Google repository linked above, the PyTorch implementation of BERT (https://github.com/huggingface/pytorch-pretrained-BERT) is also worth a look. Whether it will usher in a new era for NLP, we will find out soon.


MatchZoo

Open source address: https://github.com/NTMC-Community/MatchZoo

It always helps to know how your model performs against a benchmark. For NLP, and deep text matching models in particular, I have found the MatchZoo toolkit to be very reliable. Other related tasks where MatchZoo can be applied include:

  • Conversation

  • Question answering

  • Textual entailment

  • Information retrieval

  • Paraphrase identification

MatchZoo 2.0 is still under development, so expect more new features to be added to this already useful toolkit.
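For readers new to text matching, the sketch below shows the shape of the task: score each candidate text against a query and rank the candidates. It deliberately uses a simple bag-of-words cosine similarity as a non-neural baseline; it does not use MatchZoo's API, and the function names are made up for this example. Deep text matching models replace the scoring function with a learned one.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase whitespace tokens and their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, candidates):
    """Return (score, candidate) pairs, best match first."""
    q = bow(query)
    scored = [(cosine(q, bow(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


candidates = [
    "how to train a neural network",
    "best pizza in town",
    "train schedule for the city",
]
ranking = rank("train a neural network", candidates)
```

Note how the third candidate still gets a non-zero score just for sharing the word "train"; handling that kind of surface overlap is one reason learned matching models are used instead.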

NLP Progress

Open source address:https://github.com/sebastianruder/NLP-progress

This repository is maintained by Sebastian Ruder and aims to track the latest progress in NLP, covering both datasets and the current state-of-the-art models.

If there is any NLP technique you have ever wanted to learn more about, here is your chance. The repository covers traditional and core NLP tasks such as reading comprehension and part-of-speech tagging. Even if you are only vaguely interested in the field, you should star or bookmark this repository.

Automated Machine Learning (AutoML)

2018 was also a brilliant year for AutoML. As industry integrates machine learning into its core work, demand for data science experts keeps rising. There is currently a large gap between supply and demand, and AutoML tools have the potential to fill it.

These tools are designed for people who lack data science expertise. There are some great tools beyond the ones listed here, but most of them are priced well out of reach of most individuals. So in 2018 our great open source community came to the rescue with two popular repositories.

Auto Keras

Open source address: https://github.com/jhfjhfj1/autokeras

Auto Keras caused a sensation when it was released a few months ago, and it was bound to. Deep learning has long been regarded as a highly specialized field, so a library that automates most of the work is naturally welcome. To quote the official website: "The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background."

You can install the library with pip: pip install autokeras

The repository also contains a few simple examples that give you a feel for the whole Auto Keras workflow.
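To give a feel for what such tools automate, here is a toy sketch of the core loop: try several candidate models, score each on held-out data, and keep the best. It is not Auto Keras's API and it does not search neural architectures; as a simplifying assumption it searches a single hyperparameter (k for a tiny k-nearest-neighbour classifier) on made-up one-dimensional data.

```python
def knn_predict(train, x, k):
    """Majority vote among the k nearest 1-D training points (labels 0 or 1)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn and predict it from the others."""
    correct = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += knn_predict(rest, x, k) == label
    return correct / len(data)


def search(data, candidate_ks):
    """Score every candidate and return the best one with all scores."""
    scores = {k: leave_one_out_accuracy(data, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Made-up data: (feature, label) pairs with one noisy point at 0.8.
data = [
    (0.1, 0), (0.3, 0), (0.4, 0), (0.9, 0), (0.8, 1),
    (1.1, 1), (1.6, 1), (1.8, 1), (2.0, 1),
]
best_k, scores = search(data, candidate_ks=[1, 3, 5])
```

An AutoML library applies the same idea to a far larger search space (layer types, depths, learning rates) and uses smarter strategies than trying every candidate.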

Google's AdaNet

Open source address: https://github.com/tensorflow/adanet

AdaNet is a framework for automatically learning high-quality models without requiring programming expertise. Because AdaNet was developed by Google, it is built on TensorFlow. You can use AdaNet to build ensemble models, and even extend its use to training a neural network.

Reinforcement learning

Because I covered several reinforcement learning repositories in my 2018 review article, I will keep this section fairly brief. I hope that including a reinforcement learning (RL) section encourages discussion in our community and helps accelerate research in this field.

First, take a look at OpenAI's Spinning Up (project address: https://github.com/openai/spinningup), an educational repository aimed squarely at beginners. Then check out Google's Dopamine (project address: https://github.com/google/dopamine), a research framework intended to accelerate research in this still-young field. Now let's look at some other repositories.


DeepMimic

Open source address: https://github.com/xbpeng/DeepMimic

If you follow a few researchers on social media, you must have seen the images above in video form: a stick figure running across the ground, trying to stand up, or performing some other action. That, dear reader, is reinforcement learning in action.

It is a signature example of reinforcement learning: training a simulated humanoid to imitate a variety of motion skills. The repository linked above includes the code, examples, and a step-by-step guide.

Reinforcement Learning Notebooks

Open source address:https://github.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks

This repository is a collection of reinforcement learning algorithms from the book by Richard Sutton and Andrew Barto and from other research papers, presented as Python notebooks.

As the developer of the repository points out, you only really learn if you implement things as you study. This is a complex topic; if you do not practice, or just read through the resources like a novel, you will get nothing out of it.
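In that spirit, here is a small exercise you could type in and run: tabular Q-learning, one of the classic algorithms from Sutton and Barto's book, on a made-up five-state corridor. This is not code from the repository; the environment, the hyperparameters, and the purely random behaviour policy are assumptions chosen to keep the sketch short.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal and ends the episode
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy corridor: reward 1.0 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Q-learning is off-policy, so acting at random is allowed here.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in every non-terminal state according to q."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = q_learning()
policy = greedy_policy(q_table)
```

Even though the agent only ever acts randomly while learning, the greedy policy read off the learned table heads straight for the goal, which is a nice way to see the difference between the behaviour policy and the learned one.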

