Waifu2x/Readme.md · yangheng/Waifu2X-Image-Scale at main (2024)

Re-implementation of the original waifu2x in PyTorch, with additional super-resolution models. This repo is mainly used to explore interesting super-resolution models. User-friendly tools may not be available yet ><.

Dependencies

  • Python 3.x
  • PyTorch >= 1.0 (> 0.4.1 should also work, but this is not guaranteed)
  • Nvidia/Apex (used for mixed-precision training; you may use the Python code directly)

Optional: Nvidia GPU. Model inference (fp32 only) can run on the CPU alone.

What's New

How to Use

Compare the input image and upscaled image

    from utils.prepare_images import *
    from Models import *
    from torchvision.utils import save_image

    model_cran_v2 = CARN_V2(color_channels=3, mid_channels=64, conv=nn.Conv2d,
                            single_conv_size=3, single_conv_group=1,
                            scale=2, activation=nn.LeakyReLU(0.1),
                            SEBlock=True, repeat_blocks=3, atrous=(1, 1, 1))

    model_cran_v2 = network_to_half(model_cran_v2)
    checkpoint = "model_check_points/CRAN_V2/CARN_model_checkpoint.pt"
    model_cran_v2.load_state_dict(torch.load(checkpoint, 'cpu'))
    # if use GPU, then comment out the next line so it can use fp16.
    model_cran_v2 = model_cran_v2.float()

    demo_img = "input_image.png"
    img = Image.open(demo_img).convert("RGB")

    # origin
    img_t = to_tensor(img).unsqueeze(0)

    # used to compare the origin
    img = img.resize((img.size[0] // 2, img.size[1] // 2), Image.BICUBIC)

    # overlapping split
    # if input image is too large, then split it into overlapped patches
    # details can be found at [here](https://github.com/nagadomi/waifu2x/issues/238)
    img_splitter = ImageSplitter(seg_size=64, scale_factor=2, boarder_pad_size=3)
    img_patches = img_splitter.split_img_tensor(img, scale_method=None, img_pad=0)
    with torch.no_grad():
        out = [model_cran_v2(i) for i in img_patches]
    img_upscale = img_splitter.merge_img_tensor(out)

    final = torch.cat([img_t, img_upscale])
    save_image(final, 'out.png', nrow=2)
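The idea behind the overlapping split can be illustrated on a one-dimensional signal. The sketch below is not the repo's ImageSplitter: `edge_pad`, `toy_upscale` and `split_upscale_merge` are hypothetical names, and the 3-tap mean filter is only a stand-in for a model whose output depends on neighbouring pixels. It shows that, when each patch carries enough border context, cropping the upscaled border and concatenating the patches reproduces the result of processing the whole signal at once.

```python
def edge_pad(signal, pad):
    """Replicate the first and last samples `pad` times on each side."""
    return [signal[0]] * pad + list(signal) + [signal[-1]] * pad


def toy_upscale(x, scale=2):
    """Hypothetical stand-in for a model: 3-tap mean filter, then sample repetition."""
    last = len(x) - 1
    smooth = [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, last)]) / 3 for i in range(len(x))]
    return [v for v in smooth for _ in range(scale)]


def split_upscale_merge(signal, seg_size, pad, scale=2):
    """Upscale `signal` patch by patch, giving each patch `pad` samples of context."""
    padded = edge_pad(signal, pad)
    merged = []
    for start in range(0, len(signal), seg_size):
        stop = min(start + seg_size, len(signal))
        # Original index i sits at padded index i + pad, so this slice
        # covers [start - pad, stop + pad) in original coordinates.
        patch = padded[start:stop + 2 * pad]
        out = toy_upscale(patch, scale)
        # Drop the upscaled context so only the patch's own samples remain.
        merged.extend(out[pad * scale:len(out) - pad * scale])
    return merged


signal = [float(v) for v in [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]]
pad, scale = 1, 2

# Reference: process the whole (padded) signal in one pass.
whole = toy_upscale(edge_pad(signal, pad), scale)
reference = whole[pad * scale:len(whole) - pad * scale]

patched = split_upscale_merge(signal, seg_size=4, pad=pad, scale=scale)
print(patched == reference)
```

The border size has to cover the context the model actually uses; here one sample is enough because the toy filter only looks one sample to each side.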

Training

If possible, fp16 training is preferred because it is much faster with minimal quality decrease.

A sample training script is available in train.py, but you may need to change some lines.

Image Processing

Original images are all at least 3K x 3K. I downsample them with LANCZOS so that one side is at most 2048, then randomly cut them into 256x256 patches as targets and use 128x128 patches with JPEG noise as inputs. All input patches are at least 14 KB, and they are stored in SQLite as BLOBs. SQLite seems to perform better than the file system for small objects. The H5 file format may not be optimal because of its larger size.
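Storing patches as BLOBs can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the repo's storage code: the table name `patches`, the column names, the helper functions and the interpretation of 14 KB as 14 * 1024 bytes are all assumptions.

```python
import sqlite3

MIN_PATCH_BYTES = 14 * 1024  # assumed reading of the "at least 14 KB" threshold


def open_patch_db(path=":memory:"):
    """Create (if needed) a table holding one encoded patch per row."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patches ("
        "id INTEGER PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def add_patch(conn, encoded):
    """Store an encoded patch; skip patches below the size threshold."""
    if len(encoded) < MIN_PATCH_BYTES:
        return False
    conn.execute("INSERT INTO patches (data) VALUES (?)", (encoded,))
    return True


def load_patch(conn, patch_id):
    """Return the raw bytes of one patch, or None if the id is unknown."""
    row = conn.execute("SELECT data FROM patches WHERE id = ?", (patch_id,)).fetchone()
    return None if row is None else bytes(row[0])


conn = open_patch_db()
kept = add_patch(conn, b"\x89PNG" + bytes(20_000))   # large enough, stored
dropped = add_patch(conn, b"\x89PNG" + bytes(100))   # too small, skipped
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM patches").fetchone()[0]
print(kept, dropped, count)
```

The bytes returned by `load_patch` would then be decoded by an image library at training time.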

Although convolutions can take in images of any size, the content of the image matters. For real-life images, small patches may still contain variations in color, brightness, etc. within small regions, but in digitally drawn images colors are applied in block areas. A small patch may end up showing entirely one color, and the model has little to learn from it.

For example, the following two plots come from CARN with the same settings, including initial parameters. Both training loss and SSIM are lower for 64x64 patches, but the model trained on them performs worse at test time than the one trained on 128x128 patches.

Downsampling methods are chosen uniformly among [PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS], so different patches from the same image might be downscaled in different ways.

Image noise comes from JPEG compression only. It is added by re-encoding PNG images as JPEG data with PIL at various quality settings. Noise level 1 means the quality is drawn uniformly from [75, 95]; level 2 means it is drawn uniformly from [50, 75].
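The per-patch randomization described above can be sketched as follows. The function name `sample_degradation` is hypothetical, and treating the quality as an integer with both endpoints included is an assumption. In a real pipeline the method name would be mapped to the matching PIL.Image constant and the quality passed to PIL's JPEG encoder.

```python
import random

# Names mirror the PIL resampling filters listed in the text.
DOWNSCALE_METHODS = ("BILINEAR", "BICUBIC", "LANCZOS")

# JPEG quality ranges per noise level, as stated in the text.
QUALITY_RANGES = {1: (75, 95), 2: (50, 75)}


def sample_degradation(noise_level, rng=random):
    """Pick a downscale method and a JPEG quality for one patch."""
    low, high = QUALITY_RANGES[noise_level]
    method = rng.choice(DOWNSCALE_METHODS)
    quality = rng.randint(low, high)  # assumed: integer, inclusive on both ends
    return method, quality


rng = random.Random(0)  # seeded only to make the demo repeatable
for _ in range(3):
    print(sample_degradation(1, rng))
```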

Models

Models are tuned and modified with extra features.

From Waifu2x

Models Comparison

Images are from Key: サマボケ(Summer Pocket).

The left column is the original image, and the right column shows the bicubic, DCSCN and CRAN_V2 results.

Scores

The list will be updated after I add more models.

Images are Twitter icons (PNG) from Key: サマボケ(Summer Pocket). They are cropped into non-overlapping 96x96 patches and downscaled by a factor of 2. The images are then re-encoded as JPEG with quality from [75, 95]. Scores are PSNR, with MS-SSIM in parentheses.

| Model | Total Parameters | BICUBIC | Random* |
| --- | --- | --- | --- |
| CRAN V2 | 2,149,607 | 34.0985 (0.9924) | 34.0509 (0.9922) |
| DCSCN 12 | 1,889,974 | 31.5358 (0.9851) | 31.1457 (0.9834) |
| Upconv 7 | 552,480 | 31.4566 (0.9788) | 30.9492 (0.9772) |

*The downscaling method is selected uniformly from Image.BICUBIC, Image.BILINEAR and Image.LANCZOS.
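For readers unfamiliar with the PSNR scores reported above, the standard definition is 10 * log10(MAX^2 / MSE). The sketch below computes it on flat lists of 8-bit pixel values. It is only an illustration of the metric: the repo's scoring code may differ in details such as value range or color handling, and the pixel values here are made up.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up 8-bit pixel values, flattened to one dimension.
reference = [52, 55, 61, 59, 79, 61, 76, 61]
estimate = [53, 55, 60, 59, 80, 61, 75, 62]
print(round(psnr(reference, estimate), 2))
```

Because the scale is logarithmic, a gap of a few dB (as between the CRAN V2 row and the other rows) corresponds to a substantial difference in mean squared error.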

DCSCN

Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network

DCSCN is very interesting because it has relatively quick forward computation, and both the shallow model (8 layers) and the deep model (12 layers) are quick to train. The settings are different from the paper.

  • I use exponential decay to decrease the number of feature filters in each layer. Here is the original filter decay method.

  • I also increase the reconstruction filters from 48 to 128.

  • All activations are replaced by SELU. Dropout and weight decay are not added either, because they significantly increase the training time.

  • The loss function is changed from MSE to L1. According to Loss Functions for Image Restoration with Neural Networks, L1 seems to be more robust and converges faster than MSE. However, the authors find that the results from L1 and MSE are similar.
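An exponentially decaying filter schedule, as mentioned in the first item above, might look like the following. This is a hypothetical sketch: the function name, the geometric interpolation between a first and last filter count, and the example values 96 and 32 are assumptions chosen for illustration, not the formula used in the repo or in the paper.

```python
def exponential_filter_schedule(num_layers, first_filters, last_filters):
    """Geometrically interpolate filter counts from the first layer to the last."""
    if num_layers < 2:
        return [first_filters]
    # Constant per-layer ratio, so the count shrinks by the same factor each layer.
    ratio = (last_filters / first_filters) ** (1.0 / (num_layers - 1))
    return [round(first_filters * ratio ** i) for i in range(num_layers)]


# Illustrative values only: 8 layers shrinking from 96 to 32 filters.
schedule = exponential_filter_schedule(8, 96, 32)
print(schedule)
```

Compared with a linear schedule, a geometric one removes more filters in the early layers and fewer in the later ones.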

I would like to thank jiny2001 (one of the paper's authors) for testing the difference between SELU and PReLU. SELU seems more stable and has fewer parameters to train. It is a good drop-in replacement.

layers=8, filters=96 and dataset=yang91+bsd200. The details can be found here.

A pre-trained 12-layer model and its parameters are available. The model's run time is around 3-5 times that of waifu2x. The output quality is usually visually indistinguishable, but its PSNR and SSIM are a bit higher. Such a comparison is not fair, though, since the 12-layer model has around 1,889,974 parameters, roughly 3.4 times as many as waifu2x's Upconv_7 model (552,480).

CARN

Channels are set to 64 across all blocks, so residual adds are very effective. Increasing the channels to 128 lowers the loss curve a little but more than triples the total parameters, from 0.9 million to 3 million. 32 channels perform much worse. Increasing the number of cascaded blocks from 3 to 5 doesn't lower the loss much.
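The jump in parameter count follows from how convolution weights scale with the channel width. The helper below, `conv2d_params`, is a hypothetical name; the formula is the standard parameter count of a 2D convolution. It only models a single channel-to-channel 3x3 layer, so the totals for the full model depend on layers not shown here.

```python
def conv2d_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Number of learnable parameters in a 2D convolution layer."""
    weights = (in_channels // groups) * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# A 3x3 convolution that keeps the channel count unchanged.
for channels in (32, 64, 128):
    print(channels, conv2d_params(channels, channels, 3))
```

Doubling the width multiplies the weights of such a layer by about four, since both the input and output channel counts double.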

SE blocks seem to give the most obvious improvement without increasing the computation much. Partial convolution based padding seems to have little effect, if it does not decrease the quality. Atrous convolution is about 10%-20% slower than normal convolution in PyTorch 1.0, but there is no obvious improvement.
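A squeeze-and-excitation block is cheap because it only adds two small fully connected layers acting on one value per channel. The framework-free sketch below spells out the three steps on nested lists. It is an illustration of the general technique, not the repo's SEBlock: the function name and all weights are made up.

```python
import math


def se_block(feature_maps, w1, b1, w2, b2):
    """Apply squeeze-and-excitation to a list of C feature maps (each H x W)."""
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]
    # Excitation, part 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)) + bias)
        for weights, bias in zip(w1, b1)
    ]
    # Excitation, part 2: expand back to C channels; sigmoid gives gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(weights, hidden)) + bias)))
        for weights, bias in zip(w2, b2)
    ]
    # Scale: multiply every value of a channel by that channel's gate.
    return [
        [[value * gate for value in row] for row in fmap]
        for fmap, gate in zip(feature_maps, gates)
    ]


# Two 2x2 channels and a bottleneck of one hidden unit; all weights are made up.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 2.0], [2.0, 0.0]]]
w1, b1 = [[0.5, -0.5]], [0.0]          # shape: hidden x channels
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]   # shape: channels x hidden
scaled = se_block(features, w1, b1, w2, b2)
print(scaled[0][0][0], scaled[1][0][1])
```

The spatial size of the feature maps does not affect the number of extra parameters, only the cost of the pooling and the final multiplication.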

Another, more effective modification is to add the upscaled input image to the output of the final convolution. A simple bilinear upscaled image seems sufficient.

More examples of model configurations can be found in the docs/CARN folder.

Waifu2x Original Models

Models can load waifu2x's pre-trained weights. The function forward_checkpoint sets nn.LeakyReLU to compute in place.

Upconv_7

The original waifu2x model. The PyTorch implementation running on the CPU only takes around 5 times longer for large images. The output images have PSNR and SSIM scores very close to those of images generated by the Caffe version, though they are not identical.

Vgg_7

Not tested yet, but it is ready to use.
