{{ post.date | date: '%B %d, %Y' }}
-- {{ post.title }} -
-{{ post.excerpt | remove: '<p>' | remove: '</p>' | truncate: 500 }}
diff --git a/_config.yml b/_config.yml
index a19288554894..c53fd5e8d3d2 100644
--- a/_config.yml
+++ b/_config.yml
@@ -64,6 +64,9 @@ collections:
     output: true
   news:
     output: true
+  blog:
+    output: true
+    permalink: /blog/:path/
 pagination:
   enabled: true
diff --git a/_includes/blog_jumbotron.html b/_includes/blog_jumbotron.html
new file mode 100644
index 000000000000..04baadb1e7b1
--- /dev/null
+++ b/_includes/blog_jumbotron.html
@@ -0,0 +1,16 @@
+Featured Post
+{{ post.preview | truncate: 150 }}
+Read More
+{% endfor %}
+Featured Post
-{{ post.excerpt | remove: '<p>' | remove: '</p>' | truncate: 100 }}
-Read More
-{% endfor %}
-{{ post.date | date: '%B %d, %Y' }}
-{{ post.excerpt | remove: '<p>' | remove: '</p>' | truncate: 500 }}
@@ -171,7 +173,7 @@ This graph representation (IR) that TorchScript generated enables several optimi
* Tensor creation on the CPU is expensive, but there is ongoing work to make it faster. At this point, an LSTMCell runs three CUDA kernels: two `gemm` kernels and one for the single pointwise group. One of the things we noticed was that there was a large gap between the finish of the second `gemm` and the start of the single pointwise group, a period during which the GPU was idle. Looking into it more, we discovered that the problem was that `torch.chunk` constructs new tensors and that tensor construction was not as fast as it could be. Instead of constructing new Tensor objects, we taught the fusion compiler how to manipulate a data pointer and strides to do the `torch.chunk` before sending it into the fused kernel, shrinking the idle time between the second `gemm` and the launch of the element-wise fusion group. This gives us around a 1.2x speedup on the LSTM forward pass.
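For intuition, `torch.chunk` on a contiguous tensor only needs to produce views that differ in storage offset and size metadata; no data is copied. A minimal standalone sketch (not the fuser's actual code; the sizes are made up):

```python
import torch

# Hypothetical pre-activation gate buffer of an LSTMCell with hidden size 256.
gates = torch.randn(4, 4 * 256)

# torch.chunk returns four views over the same storage...
i, f, g, o = gates.chunk(4, dim=1)

# ...equivalent to narrow(), which only adjusts offset/size metadata.
assert torch.equal(f, gates.narrow(1, 256, 256))
assert f.data_ptr() == gates.data_ptr() + 256 * gates.element_size()
```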
-By doing the above tricks, we are able to fuse almost all of the `LSTMCell` forward graph (except the two gemm kernels) into a single fusion group, which corresponds to `prim::FusionGroup_0` in the above IR graph. It is then launched as a single fused kernel for execution. With these optimizations, model performance improves significantly: the average forward time is reduced by around 17ms (1.7x speedup) to 10ms, and the average backward time is reduced by 37ms to 27ms (1.37x speedup).
+By doing the above tricks, we are able to fuse almost all of the `LSTMCell` forward graph (except the two gemm kernels) into a single fusion group, which corresponds to `prim::FusionGroup_0` in the above IR graph. It is then launched as a single fused kernel for execution. With these optimizations, model performance improves significantly: the average forward time is reduced by around 17ms (1.7x speedup) to 10ms, and the average backward time is reduced by 37ms to 27ms (1.37x speedup).
### LSTM Layer (forward)
@@ -195,31 +197,31 @@ class LSTMLayer(jit.ScriptModule):
We applied several tricks to the IR we generate for the TorchScript LSTM to boost performance. Some example optimizations:
* Loop Unrolling: We automatically unroll loops in the code (for big loops, we unroll a small subset of the iterations), which then lets us do further optimizations on the control flow of the loop. For example, the fuser can fuse together operations across iterations of the loop body, which results in a good performance improvement for control-flow-intensive models like LSTMs.
-* Batch Matrix Multiplication: For RNNs where the input is pre-multiplied (i.e. the model has a lot of matrix multiplies with the same LHS or RHS), we can efficiently batch those operations together into a single matrix multiply while chunking the outputs to achieve equivalent semantics.
+* Batch Matrix Multiplication: For RNNs where the input is pre-multiplied (i.e. the model has a lot of matrix multiplies with the same LHS or RHS), we can efficiently batch those operations together into a single matrix multiply while chunking the outputs to achieve equivalent semantics.
-By applying these techniques, we reduced our time in the forward pass by an additional 1.6ms to 8.4ms (1.2x speedup) and the backward time by 7ms to around 20ms (1.35x speedup).
+By applying these techniques, we reduced our time in the forward pass by an additional 1.6ms to 8.4ms (1.2x speedup) and the backward time by 7ms to around 20ms (1.35x speedup).
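To make the pre-multiplication trick concrete, here is a small standalone sketch (illustrative only, not the JIT's internal rewrite) of collapsing several matrix multiplies that share the same LHS into one `mm` followed by a `chunk`:

```python
import torch

x = torch.randn(32, 128)                                    # shared LHS, e.g. the input at one time step
w_i, w_f, w_g, w_o = (torch.randn(128, 256) for _ in range(4))

# Four separate matmuls with the same LHS...
separate = [x @ w for w in (w_i, w_f, w_g, w_o)]

# ...equal one matmul against the concatenated weights, chunked back per gate.
batched = (x @ torch.cat((w_i, w_f, w_g, w_o), dim=1)).chunk(4, dim=1)

for a, b in zip(separate, batched):
    assert torch.allclose(a, b, atol=1e-5)
```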
### LSTM Layer (backward)
* “Tree” Batch Matrix Multiplication: It is often the case that a single weight is reused multiple times in the LSTM backward graph, forming a tree where the leaves are matrix multiplies and the interior nodes are adds. These nodes can be combined together by concatenating the LHSs and RHSs in different dimensions, then computed as a single matrix multiplication. The equivalence can be written as follows:
-
+
$L1 * R1 + L2 * R2 = torch.cat((L1, L2), dim=1) * torch.cat((R1, R2), dim=0)$
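A quick numerical check of this identity (a standalone sketch with arbitrary shapes):

```python
import torch

L1, L2 = torch.randn(8, 16), torch.randn(8, 16)
R1, R2 = torch.randn(16, 32), torch.randn(16, 32)

tree = L1 @ R1 + L2 @ R2
single = torch.cat((L1, L2), dim=1) @ torch.cat((R1, R2), dim=0)

assert torch.allclose(tree, single, atol=1e-5)
```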
-
-* Autograd is a critical component of what makes PyTorch such an elegant ML framework. As such, we carried this through to PyTorch JIT, but using a new **Automatic Differentiation** (AD) mechanism that works on the IR level. JIT automatic differentiation slices the forward graph into symbolically differentiable subgraphs and generates backward nodes for those subgraphs. Taking the above IR as an example, we group the graph nodes into a single `prim::DifferentiableGraph_0` for the operations that have AD formulas. For operations that do not yet have AD formulas, we fall back to Autograd during execution.
+
+* Autograd is a critical component of what makes PyTorch such an elegant ML framework. As such, we carried this through to PyTorch JIT, but using a new **Automatic Differentiation** (AD) mechanism that works on the IR level. JIT automatic differentiation slices the forward graph into symbolically differentiable subgraphs and generates backward nodes for those subgraphs. Taking the above IR as an example, we group the graph nodes into a single `prim::DifferentiableGraph_0` for the operations that have AD formulas. For operations that do not yet have AD formulas, we fall back to Autograd during execution.
* Optimizing the backwards path is hard, and the implicit broadcasting semantics make the optimization of automatic differentiation even harder. PyTorch makes it convenient to write tensor operations without worrying about shapes by broadcasting the tensors for you. For performance, the pain point in the backward pass is that such broadcastable operations need a summation in their gradients; as a result, the derivative of every broadcastable op is followed by a summation. Since we cannot currently fuse reduce operations, this causes FusionGroups to break into multiple small groups, leading to bad performance. To deal with this, refer to this great [post](http://lernapparat.de/fast-lstm-pytorch/) written by Thomas Viehmann.
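A small example of why the summation appears (a standalone sketch): the gradient of a broadcast operand must be reduced back to the operand's original shape.

```python
import torch

x = torch.randn(4, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)   # broadcast along dim 0 of x

(x + b).sum().backward()

# d/db of sum(x + b) sums over the broadcast dimension: every entry is 4.
assert torch.allclose(b.grad, torch.full((3,), 4.0))
```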
### Misc Optimizations
* In addition to the steps laid out above, we also eliminated overhead between CUDA kernel launches and unnecessary tensor allocations. One example is tensor device lookups, which initially caused poor performance due to a lot of unnecessary allocations. Removing these reduced the gap between kernel launches from milliseconds to nanoseconds.
-* Lastly, there might be normalization applied in the custom LSTMCell, such as LayerNorm. Since LayerNorm and other normalization ops contain reduce operations, it is hard to fuse them in their entirety. Instead, we automatically decompose LayerNorm into a statistics computation (reduce operations) plus element-wise transformations, and then fuse those element-wise parts together. As of this post, there are some limitations in our auto-differentiation and graph fuser infrastructure that limit current support to inference mode only. We plan to add backward support in a future release.
+* Lastly, there might be normalization applied in the custom LSTMCell, such as LayerNorm. Since LayerNorm and other normalization ops contain reduce operations, it is hard to fuse them in their entirety. Instead, we automatically decompose LayerNorm into a statistics computation (reduce operations) plus element-wise transformations, and then fuse those element-wise parts together. As of this post, there are some limitations in our auto-differentiation and graph fuser infrastructure that limit current support to inference mode only. We plan to add backward support in a future release.
-With the above optimizations on operation fusion, loop unrolling, batch matrix multiplication and some misc optimizations, we can see a clear performance increase on our custom TorchScript LSTM forward and backward from the following figure:
+With the above optimizations on operation fusion, loop unrolling, batch matrix multiplication and some misc optimizations, we can see a clear performance increase on our custom TorchScript LSTM forward and backward from the following figure:
@@ -214,7 +219,7 @@ Empirically, SWAG performs on par or better than popular alternatives including
MultiSWAG [9] uses multiple independent SWAG models to form a mixture of Gaussians as an approximate posterior distribution. Different basins of attraction contain highly complementary explanations of the data. Accordingly, marginalizing over these multiple basins provides a significant boost in accuracy and uncertainty representation. MultiSWAG can be viewed as a generalization of deep ensembles, but with performance improvements.
-Indeed, we see in Figure 8 that MultiSWAG entirely mitigates double descent -- more flexible models have monotonically improving performance -- and provides significantly improved generalization over SGD. For example, when the ResNet-18 has layers of width 20, MultiSWAG achieves under 30% error whereas SGD achieves over 45%, more than a 15% gap!
+Indeed, we see in Figure 8 that MultiSWAG entirely mitigates double descent -- more flexible models have monotonically improving performance -- and provides significantly improved generalization over SGD. For example, when the ResNet-18 has layers of width 20, MultiSWAG achieves under 30% error whereas SGD achieves over 45%, more than a 15% gap!
@@ -227,18 +232,18 @@ Another [method](https://arxiv.org/abs/1907.07504), Subspace Inference, construc
## Try it Out!
-One of the greatest open questions in deep learning is why SGD manages to find good solutions, given that the training objectives are highly multimodal and there are many settings of parameters that achieve no training loss but poor generalization. By understanding geometric features such as flatness, which relate to generalization, we can begin to resolve these questions and build optimizers that provide even better generalization, along with many other useful features such as uncertainty representation. We have presented SWA, a simple drop-in replacement for standard optimizers such as SGD and Adam, which can, in principle, benefit anyone training a deep neural network. SWA has been demonstrated to deliver strong performance in several areas, including computer vision, semi-supervised learning, reinforcement learning, uncertainty representation, calibration, Bayesian model averaging, and low-precision training.
+One of the greatest open questions in deep learning is why SGD manages to find good solutions, given that the training objectives are highly multimodal and there are many settings of parameters that achieve no training loss but poor generalization. By understanding geometric features such as flatness, which relate to generalization, we can begin to resolve these questions and build optimizers that provide even better generalization, along with many other useful features such as uncertainty representation. We have presented SWA, a simple drop-in replacement for standard optimizers such as SGD and Adam, which can, in principle, benefit anyone training a deep neural network. SWA has been demonstrated to deliver strong performance in several areas, including computer vision, semi-supervised learning, reinforcement learning, uncertainty representation, calibration, Bayesian model averaging, and low-precision training.
-We encourage you to try out SWA! Using SWA is now as easy as standard training in PyTorch. And even if you have already trained your model, you can use SWA to significantly improve performance by running it for a small number of epochs starting from the pre-trained model.
+We encourage you to try out SWA! Using SWA is now as easy as standard training in PyTorch. And even if you have already trained your model, you can use SWA to significantly improve performance by running it for a small number of epochs starting from the pre-trained model.
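As an illustration of how little code this takes, here is a minimal sketch using the `torch.optim.swa_utils` API (the model, data, and schedule below are placeholders):

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = torch.nn.Linear(10, 2)                                  # placeholder model
loader = [(torch.randn(32, 10), torch.randint(2, (32,))) for _ in range(10)]
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

swa_model = AveragedModel(model)        # keeps the running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
swa_start = 5                           # epoch at which to start averaging

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)
        swa_scheduler.step()

update_bn(loader, swa_model)            # recompute BatchNorm statistics for the averaged weights
```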
[1] Averaging Weights Leads to Wider Optima and Better Generalization; Pavel Izmailov, Dmitry Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson; Uncertainty in Artificial Intelligence (UAI), 2018.
-[2] There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average; Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson;
+[2] There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average; Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson;
International Conference on Learning Representations (ICLR), 2019.
-[3] Improving Stability in Deep Reinforcement Learning with Weight Averaging; Evgenii Nikishin, Pavel Izmailov, Ben Athiwaratkun, Dmitrii Podoprikhin,
+[3] Improving Stability in Deep Reinforcement Learning with Weight Averaging; Evgenii Nikishin, Pavel Izmailov, Ben Athiwaratkun, Dmitrii Podoprikhin,
Timur Garipov, Pavel Shvechikov, Dmitry Vetrov, Andrew Gordon Wilson; UAI 2018 Workshop: Uncertainty in Deep Learning, 2018.
[4] A Simple Baseline for Bayesian Uncertainty in Deep Learning
@@ -249,7 +254,7 @@ Pavel Izmailov, Wesley Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov,
Uncertainty in Artificial Intelligence (UAI), 2019.
[6] SWALP : Stochastic Weight Averaging in Low Precision Training
-Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai,
+Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai,
Andrew Gordon Wilson, Christopher De Sa; International Conference on Machine Learning (ICML), 2019.
[7] David Ruppert. Efficient estimations from a slowly convergent Robbins-Monro process; Technical report, Cornell University Operations Research and Industrial Engineering, 1988.
@@ -257,7 +262,7 @@ Andrew Gordon Wilson, Christopher De Sa; International Conference on Machine Lea
[8] Acceleration of stochastic approximation by averaging. Boris T Polyak and Anatoli B Juditsky; SIAM Journal on Control and Optimization, 30(4):838–855, 1992.
[9] Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
-Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov,
+Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov,
Andrew Gordon Wilson. Neural Information Processing Systems (NeurIPS), 2018.
[10] Bayesian Deep Learning and a Probabilistic Perspective of Generalization
diff --git a/_posts/2020-1-15-pytorch-1-dot-4-released-and-domain-libraries-updated.md b/_posts/2020-1-15-pytorch-1-dot-4-released-and-domain-libraries-updated.md
index 2be782f18b47..dd1de7d70f8b 100644
--- a/_posts/2020-1-15-pytorch-1-dot-4-released-and-domain-libraries-updated.md
+++ b/_posts/2020-1-15-pytorch-1-dot-4-released-and-domain-libraries-updated.md
@@ -2,6 +2,9 @@
layout: blog_detail
title: 'PyTorch 1.4 released, domain libraries updated'
author: Team PyTorch
+image: /assets/images/bert2.png
+tags: [five]
+preview: 'Today, we’re announcing the availability of PyTorch 1.4, along with updates to the PyTorch domain libraries. These releases build on top of the announcements from [NeurIPS 2019](https://pytorch.org/blog/pytorch-adds-new-tools-and-libraries-welcomes-preferred-networks-to-its-community/), where we shared the availability of PyTorch Elastic, a new classification framework for image and video, and the addition of Preferred Networks to the PyTorch community. For those that attended the workshops at NeurIPS, the content can be found [here](https://research.fb.com/neurips-2019-expo-workshops/).'
---
Today, we’re announcing the availability of PyTorch 1.4, along with updates to the PyTorch domain libraries. These releases build on top of the announcements from [NeurIPS 2019](https://pytorch.org/blog/pytorch-adds-new-tools-and-libraries-welcomes-preferred-networks-to-its-community/), where we shared the availability of PyTorch Elastic, a new classification framework for image and video, and the addition of Preferred Networks to the PyTorch community. For those that attended the workshops at NeurIPS, the content can be found [here](https://research.fb.com/neurips-2019-expo-workshops/).
@@ -43,7 +46,7 @@ To learn more about the APIs and the design of this feature, see the links below
* [Distributed Autograd design doc](https://pytorch.org/docs/stable/notes/distributed_autograd.html)
* [Remote Reference design doc](https://pytorch.org/docs/stable/notes/rref.html)
-For the full tutorials, see the links below:
+For the full tutorials, see the links below:
* [A full RPC tutorial](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html)
* [Examples using model parallel training for reinforcement learning and with an LSTM](https://github.com/pytorch/examples/tree/master/distributed/rpc)
diff --git a/_posts/2020-3-26-introduction-to-quantization-on-pytorch.md b/_posts/2020-3-26-introduction-to-quantization-on-pytorch.md
index a23bdc353b4b..7dd77f23efd2 100644
--- a/_posts/2020-3-26-introduction-to-quantization-on-pytorch.md
+++ b/_posts/2020-3-26-introduction-to-quantization-on-pytorch.md
@@ -2,6 +2,9 @@
layout: blog_detail
title: 'Introduction to Quantization on PyTorch'
author: Raghuraman Krishnamoorthi, James Reed, Min Ni, Chris Gottbrath, and Seth Weidman
+image: /assets/images/bert2.png
+tags: [five]
+preview: 'It’s important to make efficient use of both server-side and on-device compute resources when developing machine learning applications. To support more efficient deployment on servers and edge devices, PyTorch added support for model quantization using the familiar eager mode Python API.'
---
It’s important to make efficient use of both server-side and on-device compute resources when developing machine learning applications. To support more efficient deployment on servers and edge devices, PyTorch added support for model quantization using the familiar eager mode Python API.
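One of the eager-mode quantization workflows is post-training dynamic quantization, which takes a single function call; a minimal sketch (the toy model below is a placeholder):

```python
import torch

# A small float model; dynamic quantization targets Linear (and LSTM) layers.
model_fp32 = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

# Weights are stored as int8; activations are quantized on the fly at inference.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {torch.nn.Linear}, dtype=torch.qint8
)

out = model_int8(torch.randn(1, 64))
```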
diff --git a/_posts/2020-4-21-pytorch-1-dot-5-released-with-new-and-updated-apis.md b/_posts/2020-4-21-pytorch-1-dot-5-released-with-new-and-updated-apis.md
index e81d2f7da780..1793dafd2b12 100644
--- a/_posts/2020-4-21-pytorch-1-dot-5-released-with-new-and-updated-apis.md
+++ b/_posts/2020-4-21-pytorch-1-dot-5-released-with-new-and-updated-apis.md
@@ -2,6 +2,9 @@
layout: blog_detail
title: 'PyTorch 1.5 released, new and updated APIs including C++ frontend API parity with Python'
author: Team PyTorch
+image: /assets/images/bert2.png
+tags: [yellow]
+preview: 'Today, we’re announcing the availability of PyTorch 1.5, along with new and updated libraries. This release includes several major new API additions and improvements. PyTorch now includes a significant update to the C++ frontend, ‘channels last’ memory format for computer vision models, and a stable release of the distributed RPC framework used for model-parallel training. The release also has new APIs for autograd for hessians and jacobians, and an API that allows the creation of Custom C++ Classes that was inspired by pybind.'
---
diff --git a/_posts/2020-4-21-pytorch-library-updates-new-model-serving-library.md b/_posts/2020-4-21-pytorch-library-updates-new-model-serving-library.md
index 69101b8abc09..e6f875f8333d 100644
--- a/_posts/2020-4-21-pytorch-library-updates-new-model-serving-library.md
+++ b/_posts/2020-4-21-pytorch-library-updates-new-model-serving-library.md
@@ -2,10 +2,13 @@
layout: blog_detail
title: 'PyTorch library updates including new model serving library '
author: Team PyTorch
+image: /assets/images/bert2.png
+tags: [five]
+preview: 'Along with the PyTorch 1.5 release, we are announcing new libraries for high-performance PyTorch model serving and tight integration with TorchElastic and Kubernetes. Additionally, we are releasing updated packages for torch_xla (Google Cloud TPUs), torchaudio, torchvision, and torchtext. All of these new libraries and enhanced capabilities are available today and accompany all of the core features [released in PyTorch 1.5](https://pytorch.org/blog/pytorch-1-dot-5-released-with-new-and-updated-apis).'
---
-Along with the PyTorch 1.5 release, we are announcing new libraries for high-performance PyTorch model serving and tight integration with TorchElastic and Kubernetes. Additionally, we are releasing updated packages for torch_xla (Google Cloud TPUs), torchaudio, torchvision, and torchtext. All of these new libraries and enhanced capabilities are available today and accompany all of the core features [released in PyTorch 1.5](https://pytorch.org/blog/pytorch-1-dot-5-released-with-new-and-updated-apis).
+Along with the PyTorch 1.5 release, we are announcing new libraries for high-performance PyTorch model serving and tight integration with TorchElastic and Kubernetes. Additionally, we are releasing updated packages for torch_xla (Google Cloud TPUs), torchaudio, torchvision, and torchtext. All of these new libraries and enhanced capabilities are available today and accompany all of the core features [released in PyTorch 1.5](https://pytorch.org/blog/pytorch-1-dot-5-released-with-new-and-updated-apis).
## TorchServe (Experimental)
@@ -35,7 +38,7 @@ To learn more see the [TorchElastic repo](http://pytorch.org/elastic/0.2.0rc0/ku
## torch_xla 1.5 now available
-[torch_xla](http://pytorch.org/xla/) is a Python package that uses the [XLA linear algebra compiler](https://www.tensorflow.org/xla) to accelerate the [PyTorch deep learning framework](https://pytorch.org/) on [Cloud TPUs](https://cloud.google.com/tpu/) and [Cloud TPU Pods](https://cloud.google.com/tpu/docs/tutorials/pytorch-pod). torch_xla aims to give PyTorch users the ability to do everything they can do on GPUs on Cloud TPUs as well while minimizing changes to the user experience. The project began with a conversation at NeurIPS 2017 and gathered momentum in 2018 when teams from Facebook and Google came together to create a proof of concept. We announced this collaboration at PTDC 2018 and made the PyTorch/XLA integration broadly available at PTDC 2019. The project already has 28 contributors, nearly 2k commits, and a repo that has been forked more than 100 times.
+[torch_xla](http://pytorch.org/xla/) is a Python package that uses the [XLA linear algebra compiler](https://www.tensorflow.org/xla) to accelerate the [PyTorch deep learning framework](https://pytorch.org/) on [Cloud TPUs](https://cloud.google.com/tpu/) and [Cloud TPU Pods](https://cloud.google.com/tpu/docs/tutorials/pytorch-pod). torch_xla aims to give PyTorch users the ability to do everything they can do on GPUs on Cloud TPUs as well while minimizing changes to the user experience. The project began with a conversation at NeurIPS 2017 and gathered momentum in 2018 when teams from Facebook and Google came together to create a proof of concept. We announced this collaboration at PTDC 2018 and made the PyTorch/XLA integration broadly available at PTDC 2019. The project already has 28 contributors, nearly 2k commits, and a repo that has been forked more than 100 times.
This release of [torch_xla](http://pytorch.org/xla/) is aligned and tested with PyTorch 1.5 to reduce friction for developers and to provide a stable and mature PyTorch/XLA stack for training models using Cloud TPU hardware. You can [try it for free](https://medium.com/pytorch/get-started-with-pytorch-cloud-tpus-and-colab-a24757b8f7fc) in your browser on an 8-core Cloud TPU device with [Google Colab](https://colab.research.google.com/), and you can use it at a much larger scale on [Google Cloud](https://cloud.google.com/gcp).
@@ -48,9 +51,9 @@ torchaudio, torchvision, and torchtext complement PyTorch with common datasets,
### torchaudio 0.5
The torchaudio 0.5 release includes new transforms, functionals, and datasets. Highlights for the release include:
-* Added the Griffin-Lim functional and transform, `InverseMelScale` and `Vol` transforms, and `DB_to_amplitude`.
+* Added the Griffin-Lim functional and transform, `InverseMelScale` and `Vol` transforms, and `DB_to_amplitude`.
* Added support for `allpass`, `fade`, `bandpass`, `bandreject`, `band`, `treble`, `deemph`, and `riaa` filters and transformations.
-* New datasets added including `LJSpeech` and `SpeechCommands` datasets.
+* New datasets added including `LJSpeech` and `SpeechCommands` datasets.
See the full release notes [here](https://github.com/pytorch/audio/releases) and the full docs [here](https://pytorch.org/audio/).
@@ -58,7 +61,7 @@ See the release full notes [here](https://github.com/pytorch/audio/releases) and
The torchvision 0.6 release includes updates to datasets, models and a significant number of bug fixes. Highlights include:
* Faster R-CNN now supports negative samples which allows the feeding of images without annotations at training time.
-* Added `aligned` flag to `RoIAlign` to match Detectron2.
+* Added `aligned` flag to `RoIAlign` to match Detectron2.
* Refactored abstractions for C++ video decoder
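For example, the new `aligned` flag can be passed to the RoIAlign op; a minimal sketch (feature map and box values are arbitrary):

```python
import torch
from torchvision.ops import roi_align

features = torch.randn(1, 16, 32, 32)                  # (N, C, H, W) feature map
boxes = torch.tensor([[0.0, 4.0, 4.0, 20.0, 20.0]])    # (batch_index, x1, y1, x2, y2)

# aligned=True applies the half-pixel offset convention used by Detectron2.
pooled = roi_align(features, boxes, output_size=(7, 7), aligned=True)
print(pooled.shape)  # torch.Size([1, 16, 7, 7])
```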
See the full release notes [here](https://github.com/pytorch/vision/releases) and the full docs [here](https://pytorch.org/docs/stable/torchvision/index.html).
@@ -68,9 +71,9 @@ The torchtext 0.6 release includes a number of bug fixes and improvements to doc
* Fixed an issue related to the SentencePiece dependency in conda package.
* Added support for the experimental IMDB dataset to allow a custom vocab.
-* A number of documentation updates including adding a code of conduct and a deduplication of the docs on the torchtext site.
+* A number of documentation updates including adding a code of conduct and a deduplication of the docs on the torchtext site.
-Your feedback and discussions on the experimental datasets API are welcome. You can send them to [issue #664](https://github.com/pytorch/text/issues/664). We would also like to highlight the pull request [here](https://github.com/pytorch/text/pull/701), where the latest dataset abstraction is applied to the text classification datasets. Feedback there will help finalize this abstraction.
+Your feedback and discussions on the experimental datasets API are welcome. You can send them to [issue #664](https://github.com/pytorch/text/issues/664). We would also like to highlight the pull request [here](https://github.com/pytorch/text/pull/701), where the latest dataset abstraction is applied to the text classification datasets. Feedback there will help finalize this abstraction.
See the full release notes [here](https://github.com/pytorch/text/releases) and the full docs [here](https://pytorch.org/text/).
diff --git a/_posts/2020-5-5-updates-improvements-to-pytorch-tutorials.md b/_posts/2020-5-5-updates-improvements-to-pytorch-tutorials.md
index 1f0f8a9fc6d5..e5fbffbef2ed 100644
--- a/_posts/2020-5-5-updates-improvements-to-pytorch-tutorials.md
+++ b/_posts/2020-5-5-updates-improvements-to-pytorch-tutorials.md
@@ -2,14 +2,17 @@
layout: blog_detail
title: 'Updates & Improvements to PyTorch Tutorials'
author: Team PyTorch
+image: /assets/images/bert2.png
+tags: [five]
+preview: 'PyTorch.org provides researchers and developers with documentation, installation instructions, latest news, community projects, tutorials, and more. Today, we are introducing usability and content improvements including tutorials in additional categories, a new recipe format for quickly referencing common topics, sorting using tags, and an updated homepage.'
---
-PyTorch.org provides researchers and developers with documentation, installation instructions, latest news, community projects, tutorials, and more. Today, we are introducing usability and content improvements including tutorials in additional categories, a new recipe format for quickly referencing common topics, sorting using tags, and an updated homepage.
+PyTorch.org provides researchers and developers with documentation, installation instructions, latest news, community projects, tutorials, and more. Today, we are introducing usability and content improvements including tutorials in additional categories, a new recipe format for quickly referencing common topics, sorting using tags, and an updated homepage.
-Let’s take a look at them in detail.
+Let’s take a look at them in detail.
## TUTORIALS HOME PAGE UPDATE
-The tutorials home page now provides clear actions that developers can take. For new PyTorch users, there is an easy-to-discover button to take them directly to “A 60 Minute Blitz”. Right next to it, there is a button to view all recipes which are designed to teach specific features quickly with examples.
+The tutorials home page now provides clear actions that developers can take. For new PyTorch users, there is an easy-to-discover button to take them directly to “A 60 Minute Blitz”. Right next to it, there is a button to view all recipes which are designed to teach specific features quickly with examples.
@@ -26,7 +29,7 @@ The following additional resources can also be found at the bottom of the Tutori
* [PyTorch Examples](https://github.com/pytorch/examples)
* [Tutorial on GitHub](https://github.com/pytorch/tutorials)
-## PYTORCH RECIPES
+## PYTORCH RECIPES
Recipes are new bite-sized, actionable examples designed to teach researchers and developers how to use specific PyTorch features. Some notable new recipes include:
* [Loading Data in PyTorch](https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html)
* [Model Interpretability Using Captum](https://pytorch.org/tutorials/recipes/recipes/Captum_Recipe.html)
@@ -35,7 +38,7 @@ Recipes are new bite-sized, actionable examples designed to teach researchers an
View the full recipes [here](http://pytorch.org/tutorials/recipes/recipes_index.html).
## LEARNING PYTORCH
-This section includes tutorials designed for users new to PyTorch. Based on community feedback, we have made updates to the current [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) tutorial, one of our most popular tutorials for beginners. Upon completion, one can understand what PyTorch and neural networks are, and be able to build and train a simple image classification network. Updates include adding explanations to clarify output meanings and linking back to where users can read more in the docs, cleaning up confusing syntax errors, and reconstructing and explaining new concepts for easier readability.
+This section includes tutorials designed for users new to PyTorch. Based on community feedback, we have made updates to the current [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) tutorial, one of our most popular tutorials for beginners. Upon completion, one can understand what PyTorch and neural networks are, and be able to build and train a simple image classification network. Updates include adding explanations to clarify output meanings and linking back to where users can read more in the docs, cleaning up confusing syntax errors, and reconstructing and explaining new concepts for easier readability.
## DEPLOYING MODELS IN PRODUCTION
This section includes tutorials for developers looking to take their PyTorch models to production. The tutorials include:
@@ -45,7 +48,7 @@ This section includes tutorials for developers looking to take their PyTorch mod
* [Exploring a Model from PyTorch to ONNX and Running it using ONNX Runtime](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html)
## FRONTEND APIS
-PyTorch provides a number of frontend API features that can help developers to code, debug, and validate their models more efficiently. This section includes tutorials that teach what these features are and how to use them. Some tutorials to highlight:
+PyTorch provides a number of frontend API features that can help developers to code, debug, and validate their models more efficiently. This section includes tutorials that teach what these features are and how to use them. Some tutorials to highlight:
* [Introduction to Named Tensors in PyTorch](https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html)
* [Using the PyTorch C++ Frontend](https://pytorch.org/tutorials/advanced/cpp_frontend.html)
* [Extending TorchScript with Custom C++ Operators](https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html)
@@ -59,7 +62,7 @@ Deep learning models often consume large amounts of memory, power, and compute d
* [Static Quantization with Eager Mode in PyTorch](https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html)
## PARALLEL AND DISTRIBUTED TRAINING
-PyTorch provides features that can accelerate performance in research and production such as native support for asynchronous execution of collective operations and peer-to-peer communication that is accessible from Python and C++. This section includes tutorials on parallel and distributed training:
+PyTorch provides features that can accelerate performance in research and production such as native support for asynchronous execution of collective operations and peer-to-peer communication that is accessible from Python and C++. This section includes tutorials on parallel and distributed training:
* [Single-Machine Model Parallel Best Practices](https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html)
* [Getting started with Distributed Data Parallel](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)
* [Getting started with Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html)
diff --git a/_posts/2020-7-28-pytorch-1.6-released.md b/_posts/2020-7-28-pytorch-1.6-released.md
index d1a18284dc69..1bf36e61e665 100644
--- a/_posts/2020-7-28-pytorch-1.6-released.md
+++ b/_posts/2020-7-28-pytorch-1.6-released.md
@@ -2,24 +2,27 @@
layout: blog_detail
title: 'PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintainers for Windows'
author: Team PyTorch
+image: /assets/images/bert2.png
+tags: [five]
+preview: 'Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at [Microsoft is now maintaining Windows builds and binaries](https://pytorch.org/blog/microsoft-becomes-maintainer-of-the-windows-version-of-pytorch) and will also be supporting the community on GitHub as well as the PyTorch Windows discussion forums.'
---
Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at [Microsoft is now maintaining Windows builds and binaries](https://pytorch.org/blog/microsoft-becomes-maintainer-of-the-windows-version-of-pytorch) and will also be supporting the community on GitHub as well as the PyTorch Windows discussion forums.
-The PyTorch 1.6 release includes a number of new APIs, tools for performance improvement and profiling, as well as major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training.
-A few of the highlights include:
+The PyTorch 1.6 release includes a number of new APIs, tools for performance improvement and profiling, as well as major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training.
+A few of the highlights include:
-1. Automatic mixed precision (AMP) training is now natively supported and a stable feature (see [here](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) for more details) - thanks to NVIDIA for their contributions;
-2. Native TensorPipe support now added for tensor-aware, point-to-point communication primitives built specifically for machine learning;
+1. Automatic mixed precision (AMP) training is now natively supported and a stable feature (see [here](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) for more details) - thanks to NVIDIA for their contributions;
+2. Native TensorPipe support now added for tensor-aware, point-to-point communication primitives built specifically for machine learning;
3. Added support for complex tensors to the frontend API surface;
4. New profiling tools providing tensor-level memory consumption information;
5. Numerous improvements and new features for both distributed data parallel (DDP) training and the remote procedural call (RPC) packages.
-Additionally, from this release onward, features will be classified as Stable, Beta and Prototype. Prototype features are not included as part of the binary distribution and are instead available through either building from source, using nightlies or via compiler flag. You can learn more about what this change means in the post [here](https://pytorch.org/blog/pytorch-feature-classification-changes/). You can also find the full release notes [here](https://github.com/pytorch/pytorch/releases).
+Additionally, from this release onward, features will be classified as Stable, Beta and Prototype. Prototype features are not included as part of the binary distribution and are instead available through either building from source, using nightlies or via compiler flag. You can learn more about what this change means in the post [here](https://pytorch.org/blog/pytorch-feature-classification-changes/). You can also find the full release notes [here](https://github.com/pytorch/pytorch/releases).
# Performance & Profiling
-## [Stable] Automatic Mixed Precision (AMP) Training
+## [Stable] Automatic Mixed Precision (AMP) Training
AMP makes it easy to enable automatic mixed precision training, delivering higher performance and memory savings of up to 50% on Tensor Core GPUs. Using the natively supported `torch.cuda.amp` API, AMP provides convenience methods for mixed precision, where some operations use the `torch.float32 (float)` datatype and other operations use `torch.float16 (half)`. Some ops, like linear layers and convolutions, are much faster in `float16`. Other ops, like reductions, often require the dynamic range of `float32`. Mixed precision tries to match each op to its appropriate datatype.
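The typical pattern pairs `autocast` for the forward pass with `GradScaler` for the backward pass; a minimal sketch with synthetic data (requires a CUDA device):

```python
import torch

model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    data = torch.randn(64, 128, device="cuda")
    target = torch.randint(10, (64,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # ops run in float16/float32 as appropriate
        loss = loss_fn(model(data), target)
    scaler.scale(loss).backward()        # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```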
@@ -27,7 +30,7 @@ AMP allows users to easily enable automatic mixed precision training enabling hi
* Documentation ([Link](https://pytorch.org/docs/stable/amp.html))
* Usage examples ([Link](https://pytorch.org/docs/stable/notes/amp_examples.html))
-## [Beta] Fork/Join Parallelism
+## [Beta] Fork/Join Parallelism
This release adds support for a language-level construct as well as runtime support for coarse-grained parallelism in TorchScript code. This support is useful for situations such as running models in an ensemble in parallel, or running bidirectional components of recurrent nets in parallel, and unlocks the computational power of parallel architectures (e.g. many-core CPUs) for task-level parallelism.
@@ -48,10 +51,10 @@ def example(x):
print(example(torch.ones([])))
```
-
+
* Documentation ([Link](https://pytorch.org/docs/stable/jit.html))
-## [Beta] Memory Profiler
+## [Beta] Memory Profiler
The `torch.autograd.profiler` API now includes a memory profiler that lets you inspect the tensor memory cost of different operators inside your CPU and GPU models.
@@ -83,7 +86,7 @@ print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))
* PR ([Link](https://github.com/pytorch/pytorch/pull/37775))
* Documentation ([Link](https://pytorch.org/docs/stable/autograd.html#profiler))
-# Distributed Training & RPC
+# Distributed Training & RPC
## [Beta] TensorPipe backend for RPC
@@ -103,11 +106,11 @@ torch.distributed.rpc.rpc_sync(...)
* Design doc ([Link](https://github.com/pytorch/pytorch/issues/35251))
* Documentation ([Link](https://pytorch.org/docs/stable/rpc/index.html))
-## [Beta] DDP+RPC
+## [Beta] DDP+RPC
PyTorch Distributed supports two powerful paradigms: DDP for full sync data parallel training of models and the RPC framework which allows for distributed model parallelism. Previously, these two features worked independently and users couldn’t mix and match these to try out hybrid parallelism paradigms.
-Starting in PyTorch 1.6, we’ve enabled DDP and RPC to work together seamlessly so that users can combine these two techniques to achieve both data parallelism and model parallelism. An example is where users would like to place large embedding tables on parameter servers and use the RPC framework for embedding lookups, but store smaller dense parameters on trainers and use DDP to synchronize the dense parameters. Below is a simple code snippet.
+Starting in PyTorch 1.6, we’ve enabled DDP and RPC to work together seamlessly so that users can combine these two techniques to achieve both data parallelism and model parallelism. An example is where users would like to place large embedding tables on parameter servers and use the RPC framework for embedding lookups, but store smaller dense parameters on trainers and use DDP to synchronize the dense parameters. Below is a simple code snippet.
```python
// On each trainer
@@ -139,11 +142,11 @@ def async_add_chained(to, x, y, z):
)
ret = rpc.rpc_sync(
- "worker1",
- async_add_chained,
+ "worker1",
+ async_add_chained,
args=("worker2", torch.ones(2), 1, 1)
)
-
+
print(ret) # prints tensor([3., 3.])
```
@@ -153,15 +156,15 @@ print(ret) # prints tensor([3., 3.])
# Frontend API Updates
-## [Beta] Complex Numbers
+## [Beta] Complex Numbers
-The PyTorch 1.6 release brings beta level support for complex tensors including torch.complex64 and torch.complex128 dtypes. A complex number is a number that can be expressed in the form a + bj, where a and b are real numbers, and j is a solution of the equation x^2 = −1. Complex numbers frequently occur in mathematics and engineering, especially in signal processing, and complex neural networks are an active area of research. The beta release of complex tensors will support common PyTorch and complex tensor functionality, plus functions needed by Torchaudio, ESPnet and others. While this is an early version of this feature, and we expect it to improve over time, the overall goal is to provide a NumPy-compatible user experience that leverages PyTorch’s ability to run on accelerators and work with autograd to better support the scientific community.
+The PyTorch 1.6 release brings beta level support for complex tensors including torch.complex64 and torch.complex128 dtypes. A complex number is a number that can be expressed in the form a + bj, where a and b are real numbers, and j is a solution of the equation x^2 = −1. Complex numbers frequently occur in mathematics and engineering, especially in signal processing, and complex neural networks are an active area of research. The beta release of complex tensors will support common PyTorch and complex tensor functionality, plus functions needed by Torchaudio, ESPnet and others. While this is an early version of this feature, and we expect it to improve over time, the overall goal is to provide a NumPy-compatible user experience that leverages PyTorch’s ability to run on accelerators and work with autograd to better support the scientific community.
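A small taste of the new dtypes (a standalone sketch):

```python
import torch

z = torch.tensor([1 + 2j, 3 - 1j], dtype=torch.complex64)

print(z.real, z.imag)   # real and imaginary parts as float32 tensors
print(torch.abs(z))     # element-wise magnitudes
```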
# Updated Domain Libraries
-## torchvision 0.7
+## torchvision 0.7
-torchvision 0.7 introduces two new pretrained semantic segmentation models, [FCN ResNet50](https://arxiv.org/abs/1411.4038) and [DeepLabV3 ResNet50](https://arxiv.org/abs/1706.05587), both trained on COCO and using smaller memory footprints than the ResNet101 backbone. We also introduced support for AMP (Automatic Mixed Precision) autocasting for torchvision models and operators, which automatically selects the floating point precision for different GPU operations to improve performance while maintaining accuracy.
+torchvision 0.7 introduces two new pretrained semantic segmentation models, [FCN ResNet50](https://arxiv.org/abs/1411.4038) and [DeepLabV3 ResNet50](https://arxiv.org/abs/1706.05587), both trained on COCO and using smaller memory footprints than the ResNet101 backbone. We also introduced support for AMP (Automatic Mixed Precision) autocasting for torchvision models and operators, which automatically selects the floating point precision for different GPU operations to improve performance while maintaining accuracy.
* Release notes ([Link](https://github.com/pytorch/vision/releases))
@@ -178,10 +181,10 @@ torchaudio now officially supports Windows. This release also introduces a new m
The Global PyTorch Summer Hackathon is back! This year, teams can compete in three categories virtually:
1. **PyTorch Developer Tools:** Tools or libraries designed to improve productivity and efficiency of PyTorch for researchers and developers
- 2. **Web/Mobile Applications powered by PyTorch:** Applications with web/mobile interfaces and/or embedded devices powered by PyTorch
+ 2. **Web/Mobile Applications powered by PyTorch:** Applications with web/mobile interfaces and/or embedded devices powered by PyTorch
3. **PyTorch Responsible AI Development Tools:** Tools, libraries, or web/mobile apps for responsible AI development
-This is a great opportunity to connect with the community and practice your machine learning skills.
+This is a great opportunity to connect with the community and practice your machine learning skills.
* [Join the hackathon](http://pytorch2020.devpost.com/)
* [Watch educational videos](https://www.youtube.com/pytorch)
@@ -189,11 +192,11 @@ This is a great opportunity to connect with the community and practice your mach
## LPCV Challenge
-The [2020 CVPR Low-Power Vision Challenge (LPCV) - Online Track for UAV video](https://lpcv.ai/2020CVPR/video-track) submission deadline is coming up shortly. You have until July 31, 2020 to build a system that can accurately discover and recognize characters in video captured by an unmanned aerial vehicle (UAV), using PyTorch and a Raspberry Pi 3B+.
+The [2020 CVPR Low-Power Vision Challenge (LPCV) - Online Track for UAV video](https://lpcv.ai/2020CVPR/video-track) submission deadline is coming up shortly. You have until July 31, 2020 to build a system that can accurately discover and recognize characters in video captured by an unmanned aerial vehicle (UAV), using PyTorch and a Raspberry Pi 3B+.
## Prototype Features
-To reiterate, Prototype features in PyTorch are early features that we are looking to gather feedback on, gauge the usefulness of and improve ahead of graduating them to Beta or Stable. The following features are not part of the PyTorch 1.6 release and instead are available in nightlies with separate docs/tutorials to help facilitate early usage and feedback.
+To reiterate, Prototype features in PyTorch are early features that we are looking to gather feedback on, gauge the usefulness of and improve ahead of graduating them to Beta or Stable. The following features are not part of the PyTorch 1.6 release and instead are available in nightlies with separate docs/tutorials to help facilitate early usage and feedback.
#### Distributed RPC/Profiler
Allow users to profile training jobs that use `torch.distributed.rpc` using the autograd profiler, and remotely invoke the profiler in order to collect profiling information across different nodes. The RFC can be found [here](https://github.com/pytorch/pytorch/issues/39675) and a short recipe on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source).
diff --git a/_sass/blog.scss b/_sass/blog.scss
index 7e898e502bf0..7d439dc16cd1 100644
--- a/_sass/blog.scss
+++ b/_sass/blog.scss
@@ -62,7 +62,7 @@
}
@include desktop {
margin-top: 380px + $desktop_header_height;
- .row.blog-index
+ /*.row.blog-index
[class*="col-"]:not(:first-child):not(:last-child):not(:nth-child(3n)) {
padding-right: rem(35px);
padding-left: rem(35px);
@@ -74,7 +74,7 @@
.row.blog-index [class*="col-"]:nth-child(3n + 1) {
padding-right: rem(35px);
- }
+ }*/
.col-md-4 {
margin-bottom: rem(23px);
@@ -139,7 +139,7 @@
overflow: unset;
white-space: unset;
text-overflow: unset;
- }
+ }
}
h1 {
@@ -221,7 +221,7 @@
}
}
- .page-link {
+ .page-link, .all-blogs {
font-size: rem(20px);
letter-spacing: 0;
line-height: rem(34px);
@@ -230,6 +230,37 @@
text-align: center;
}
+ .all-blogs {
+ width: inherit;
+ padding: 0.5rem 3.75rem;
+ color: $dark_grey;
+ &:hover {
+ color: $orange;
+ }
+ }
+
+ .dropdown {
+ margin-bottom: 3rem;
+ }
+
+ #dropdownMenuButton {
+ cursor: pointer;
+ position: absolute;
+ right: 0;
+ bottom: 1rem;
+ z-index: 1;
+ top: inherit;
+ max-width: 4rem;
+ border: none;
+ background: inherit;
+ padding: inherit;
+ }
+
+ .dropdown-item:hover {
+ color: $orange;
+ cursor: pointer;
+ }
+
@media (max-width: 1067px) {
.jumbotron {
h1 {
@@ -271,3 +302,17 @@ twitterwidget {
margin-bottom: rem(18px) !important;
}
+.blog .pagination {
+ .page {
+ border: 1px solid #dee2e6;
+ padding: 0.5rem 0.75rem;
+ }
+
+ .active .page {
+ background-color: #dee2e6;
+ }
+}
+
+.blog .blog-img {
+ border: 1px solid $dark_grey;
+}
diff --git a/assets/filter-hub-tags.js b/assets/filter-hub-tags.js
index 65e59f0339d0..dfb5cd80e99f 100644
--- a/assets/filter-hub-tags.js
+++ b/assets/filter-hub-tags.js
@@ -2,11 +2,21 @@ var filterScript = $("script[src*=filter-hub-tags]");
var listId = filterScript.attr("list-id");
var displayCount = Number(filterScript.attr("display-count"));
var pagination = filterScript.attr("pagination");
+var options;
+
+if (listId == "all-blog-posts") {
+ options = {
+ valueNames: [{ data: ["tags"] }],
+ page: displayCount
+ };
+}
+else {
+ options = {
+ valueNames: ["github-stars-count-whole-number", { data: ["tags", "date-added", "title"] }],
+ page: displayCount
+ };
+}
-var options = {
- valueNames: ["github-stars-count-whole-number", { data: ["tags", "date-added", "title"] }],
- page: displayCount
-};
$(".next-news-item").on("click" , function(){
$(".pagination").find(".active").next().trigger( "click" );
@@ -101,3 +111,19 @@ $("#sortTitleLow").on("click", function() {
$("#sortTitleHigh").on("click", function() {
hubList.sort("title", { order: "asc" });
});
+
+// Filter the blog posts based on the selected tag
+
+$(".blog-filter-btn").on("click", function() {
+ filterBlogPosts($(this).data("tag"));
+});
+
+function filterBlogPosts(tag) {
+ hubList.filter(function (item) {
+ if (item.values().tags == tag) {
+ return true;
+ } else {
+ return false;
+ }
+ });
+}
diff --git a/blog.html b/blog.html
deleted file mode 100644
index e39a2a2a555c..000000000000
--- a/blog.html
+++ /dev/null
@@ -1,10 +0,0 @@
----
-layout: blog
-title: Blog
-permalink: /blog/
-body-class: blog
-redirect_from: "/blog/categories/"
-pagination:
- enabled: true
- permalink: /:num/
----
diff --git a/blog/all-posts.html b/blog/all-posts.html
new file mode 100644
index 000000000000..a9b442c493e4
--- /dev/null
+++ b/blog/all-posts.html
@@ -0,0 +1,50 @@
+---
+layout: blog
+title: Blog
+permalink: /blog/all-posts
+body-class: blog
+---
+
+{% assign posts = site.posts %}
+
+{% include blog_jumbotron.html posts=posts %}
+
+{{ post.preview | truncate: 150 }}
+{{ post.date | date: '%B %d, %Y' }}
+{{ post.date | date: '%B %d, %Y' }}
+{{ post.excerpt | remove: '<p>' | remove: '</p>' | truncate: 500 | strip_html }}
+