Summary on Learning PyTorch
Suggested Tutorial
Intro to PyTorch (Simplified Chinese)
About Model Evaluation Metrics
Common metrics to evaluate machine learning models include:
- Accuracy: Percentage of correct predictions
- Precision: True positives / (True positives + False positives)
- Recall: True positives / (True positives + False negatives)
- F1 Score: Harmonic mean of precision and recall
- ROC Curve: True Positive Rate vs False Positive Rate
- AUC: Area under the ROC curve
PyTorch provides tools to calculate these metrics through the torchmetrics library.
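For instance, here is a minimal sketch of how the ready-made metric classes can be used, assuming a reasonably recent torchmetrics release (the exact class names and arguments have changed between versions):

```python
import torch
from torchmetrics.classification import (
    BinaryAccuracy, BinaryPrecision, BinaryRecall, BinaryF1Score, BinaryAUROC,
)

# Toy binary-classification outputs: predicted probabilities and true labels.
preds = torch.tensor([0.9, 0.2, 0.8, 0.4, 0.7])
target = torch.tensor([1, 0, 1, 1, 0])

metrics = {
    "accuracy": BinaryAccuracy(),
    "precision": BinaryPrecision(),
    "recall": BinaryRecall(),
    "f1": BinaryF1Score(),
    "auc": BinaryAUROC(),
}
for name, metric in metrics.items():
    print(name, metric(preds, target).item())
```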
The tutorial also walks through these metrics and implements them with custom algorithms.
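Out of curiosity, here is my own sketch of the same calculation done by hand with plain tensor ops (not the tutorial's exact code; the helper name `binary_metrics` is made up):

```python
import torch

def binary_metrics(preds: torch.Tensor, target: torch.Tensor, threshold: float = 0.5):
    """Compute accuracy / precision / recall / F1 from raw probabilities."""
    pred_labels = (preds >= threshold).long()
    tp = ((pred_labels == 1) & (target == 1)).sum().float()  # true positives
    fp = ((pred_labels == 1) & (target == 0)).sum().float()  # false positives
    fn = ((pred_labels == 0) & (target == 1)).sum().float()  # false negatives
    tn = ((pred_labels == 0) & (target == 0)).sum().float()  # true negatives

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) > 0 else torch.tensor(0.0)
    recall = tp / (tp + fn) if (tp + fn) > 0 else torch.tensor(0.0)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else torch.tensor(0.0))
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

preds = torch.tensor([0.9, 0.2, 0.8, 0.4, 0.7])
target = torch.tensor([1, 0, 1, 1, 0])
print({k: round(v.item(), 3) for k, v in binary_metrics(preds, target).items()})
```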
About Tensor
Suggestions on understanding tensors:
- Treat a Tensor as an extended Array/Vector/Matrix with multiple dimensions.
- Try not to think of its “visual” representation, like a common matrix or 1-D array. Rather, think of the “meaning” of each dimension. For example, a single picture can be represented by a 3-D tensor of shape (width, height, channel). For a more specific one, let’s say a random RGB image of size 256*256, it can be initialized by `torch.rand(256, 256, 3)`. Furthermore, for a batch of pictures, we can use a 4-D tensor by adding another dimension for the “total number” of pictures: (number_of_pics, width, height, channel). See the sketch after this list.
- See the PyTorch docs for further details on how to operate on those tensors exactly.
- I don’t like “broadcasting”, ‘cause I think it’s a very dangerous feature that will introduce hidden errors (the last lines of the sketch below show why).
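A small sketch of the shapes discussed above, plus the kind of silent broadcasting that worries me. (Side note: many built-in layers such as `nn.Conv2d` actually expect a channels-first `(N, C, H, W)` layout, so the `(width, height, channel)` convention here is just for illustration.)

```python
import torch

# One "RGB image" laid out as (width, height, channel), as described above.
img = torch.rand(256, 256, 3)
print(img.shape)        # torch.Size([256, 256, 3])

# A batch of 8 such images: a leading dimension counts the pictures.
batch = torch.rand(8, 256, 256, 3)
print(batch.shape)      # torch.Size([8, 256, 256, 3])

# Broadcasting: the shapes don't match, yet PyTorch silently expands them.
a = torch.ones(3, 1)
b = torch.ones(1, 4)
print((a + b).shape)    # torch.Size([3, 4]) -- no error raised, which can hide shape bugs
```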
About Autograd
- Cool stuff! I finally understand how exactly those gradients are calculated. Keep in mind things like `requires_grad=True` and the `grad_fn` attribute (see the sketch at the end of this section).
- Here’s some math happening; keep the Jacobian matrix in mind: suppose a vector function $y = f(x)$; the gradients between $y$ and $x$ form a Jacobian matrix. For a function $f: \mathbb{R}^n \to \mathbb{R}^m$, its Jacobian matrix has size $(m, n)$:
\[J=\left(\begin{array}{ccc}\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}\end{array}\right)\]
- Also about the chain rule: suppose $v$ is the gradient of a scalar $l = g(y)$ with respect to $y$; then we get
\[v J=\left(\begin{array}{lll}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)\left(\begin{array}{ccc}\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}\end{array}\right)=\left(\begin{array}{lll}\frac{\partial l}{\partial x_{1}} & \cdots & \frac{\partial l}{\partial x_{n}}\end{array}\right)\]
($J$ here is the Jacobian between $y$ and $x$ just discussed above.)
- Remember that gradients are accumulated across backward passes, so zero them (e.g. via `optimizer.zero_grad()` or `tensor.grad.zero_()`) when you don’t want that.
- Though most of the time we (actually just me, a ‘not-very-interested-in-math’ guy coming from traditional software security) only need the API calls rather than the mathematical or implementation details, it’s still kind of fun.
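A minimal sketch I used to check my understanding of `requires_grad`, `grad_fn`, the vector-Jacobian product, and gradient accumulation (the numbers are arbitrary):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # track operations on x
y = x * 2 + 1                                           # y = f(x), element-wise here
print(y.grad_fn)        # the backward node that produced y (an AddBackward0 object)

# y is a vector, so backward() needs a vector v to form the vector-Jacobian product v*J.
v = torch.tensor([1.0, 1.0, 1.0])   # pretend these are dl/dy for some scalar loss l
y.backward(v)
print(x.grad)           # dl/dx = v*J = [2., 2., 2.] since dy_i/dx_i = 2

# Gradients accumulate: a second backward pass adds to x.grad instead of overwriting it.
y2 = x * 2 + 1
y2.backward(v)
print(x.grad)           # now [4., 4., 4.]
x.grad.zero_()          # reset by hand (optimizers do this via optimizer.zero_grad())
```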
About Training on Multiple Graphics Cards
This section is skipped because:
- I don’t have multiple graphics cards right now.
- I don’t care (for now) about making the training process more efficient.
- It is all wrapped in a convenient API, `nn.DataParallel(model)` (data parallelism only); a minimal usage sketch follows below.
- Distributed Data Parallel is kinda difficult for me.
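For reference, this is roughly how `nn.DataParallel` is dropped in; a sketch only, untested on my side since I have a single card:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)           # any nn.Module works here
if torch.cuda.device_count() > 1:    # wrap only when several GPUs are visible
    model = nn.DataParallel(model)   # single-process, multi-GPU data parallelism
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```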