Splitting Images

Jun 13, 2020

In each version of AutoComic, handling comics larger than one page has been a difficult problem. If a comic image is taller than the PDF page, it must be split into multiple parts.

Bash Version

The original bash version of AutoComic split every image regardless of size. Since it had to create pages manually (rather than letting the LaTeX compiler handle them), this made sense, but it also led to a lot of problem pages. In addition to all of the problems from the next version (listed below), it handled text as part of the badly split images.

Original Python Version

The first Python version of AutoComic only split images larger than the page size (which improved the look of the pages significantly). It calculated split points using each pixel row's coefficient of variation (the standard deviation divided by the mean). Rows with a coefficient lower than some threshold were chosen as split points. Although this approach worked in some capacity, it also led to a lot of problems, such as:

Lower contrast stylized comics were split more than highly detailed comics.
Darker lines were less likely to be split than lighter lines (since the average of a nearly black line is near zero, its coefficient of variation is very high).
Titles could be on a different page than the associated comic (as could mouseover text).
Parts of images could be very small (only a few pixels tall) or very large (too large for the page) depending on the comic.
The original LaTeX code for placing images directly next to one another was inconsistent and could leave gaps or overlap images.
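To make the dark-line problem concrete, here is a minimal sketch of a coefficient-of-variation test. It is not the original AutoComic code: the function names, the grayscale list-of-rows image format, and the threshold of 0.05 are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(row):
    """Return std / mean for one row of grayscale pixel values (0-255)."""
    avg = mean(row)
    if avg == 0:
        # An all-black row has no variation; treat it as perfectly uniform.
        return 0.0
    return pstdev(row) / avg


def candidate_split_rows(image, threshold=0.05):
    """Return indices of rows whose coefficient is below the threshold.

    The threshold value is a hypothetical constant, not one from AutoComic.
    """
    return [
        index
        for index, row in enumerate(image)
        if coefficient_of_variation(row) < threshold
    ]


# Two rows with identical noise (one pixel off by 2), one light and one dark.
light_row = [255] * 9 + [253]
dark_row = [0] * 9 + [2]

print(coefficient_of_variation(light_row))  # tiny: the mean is large
print(coefficient_of_variation(dark_row))   # large: the mean is near zero
print(candidate_split_rows([light_row, dark_row]))
```

Both rows are equally "flat" to the eye, but only the light one passes the threshold, which is why a single fixed constant could not work for every comic.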

Current Version

The current version uses a much more sophisticated algorithm that leads to better results. For example:

Images from Order of the Stick by Rich Burlew.

This new algorithm has three main improvements: minimizing the number of splits, choosing better lines to split on, and allowing vertical splits.

Minimizing Split Lines

Instead of splitting along every "fit" line, the script chooses the single "best" line and splits there. It then splits each half recursively until each section is smaller than a page. This has two positive effects: (a) the script only splits images where it needs to and (b) very small pieces are unlikely. This change by itself improved the quality of the images significantly.
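The recursion can be sketched as follows. This is an illustration under assumptions, not the AutoComic source: `split_rows` and the `best_row` callback are hypothetical names, and the image is represented only by its row range.

```python
def split_rows(top, bottom, max_height, best_row):
    """Recursively split the row range [top, bottom) until every piece fits.

    best_row(top, bottom) is a hypothetical scoring callback that returns the
    index of the best split row strictly between top and bottom.
    """
    if bottom - top <= max_height:
        # The section already fits on a page, so leave it alone.
        return [(top, bottom)]

    cut = best_row(top, bottom)
    if not top < cut < bottom:
        raise ValueError("best_row must return a row strictly inside the range")

    # Split once at the best row, then handle each half independently.
    return (
        split_rows(top, cut, max_height, best_row)
        + split_rows(cut, bottom, max_height, best_row)
    )


# Stand-in scorer that always picks the middle row.
def middle(top, bottom):
    return (top + bottom) // 2


print(split_rows(0, 1000, 300, middle))
```

Because a section is only cut when it is still too tall, an image that already fits is never touched, and each cut tends to produce two reasonably sized halves rather than slivers.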

Choosing Better Lines

The original algorithm had two main problems when choosing lines to split on:

Constants (namely the threshold) needed to vary from image to image but did not.
The coefficient of variation was not a good measure of where images should be split.

To fix both of these problems, the "best" line is selected using a linear combination of three factors:

How many color changes are in the row
How different the line is from the rest of the image
How close a line is to the center of the image

All three factors are image relative, meaning that constants do not need to be changed for each image. They also reflect how people view images, so the chosen line is much more likely to fall on a break between panels.
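One way to combine the three factors is sketched below. The weights, the normalization by each image's maximum, and the function names are assumptions made for illustration; the post does not give the actual constants. The signs follow the descriptions in the sections below: fewer color changes, a larger difference from the image average, and a position nearer the center all make a row a better candidate.

```python
def normalize(values):
    """Scale values to 0..1 relative to the largest value in this image."""
    largest = max(values)
    return [value / largest if largest else 0.0 for value in values]


def best_row(color_changes, differences, center_weights, weights=(1.0, 1.0, 1.0)):
    """Return the index of the row with the highest combined score.

    Each argument holds one number per row. The default weights are
    hypothetical placeholders, not values taken from AutoComic.
    """
    w_changes, w_difference, w_center = weights
    changes = normalize(color_changes)
    diffs = normalize(differences)

    scores = [
        -w_changes * c + w_difference * d + w_center * m
        for c, d, m in zip(changes, diffs, center_weights)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Five rows: row 2 is a solid, distinct line in the middle of the image.
print(best_row(
    color_changes=[0, 40, 0, 35, 0],
    differences=[10, 5, 60, 5, 10],
    center_weights=[0.0, 0.875, 1.0, 0.875, 0.0],
))
```

Normalizing by the image's own maximum is what keeps the scores image relative in this sketch: a busy comic and a sparse one end up on the same 0 to 1 scale.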

Color Changes In Each Row

If a line is entirely one color, the number of color changes is 0. If a line has a lot going on, this number will be high and make for an unlikely split point. Thus, a solid line between panels will have few color changes and be a likely split point.
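A minimal sketch of this count, assuming a row is a list of RGB tuples and that any difference between neighbouring pixels counts as a change (the real script may well use a tolerance instead of exact equality):

```python
def count_color_changes(row):
    """Count positions where a pixel differs from its left-hand neighbour."""
    return sum(1 for left, right in zip(row, row[1:]) if left != right)


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
RED = (255, 0, 0)

gutter = [WHITE] * 6                              # solid line between panels
artwork = [WHITE, WHITE, BLACK, BLACK, WHITE, RED]  # row through a drawing

print(count_color_changes(gutter))
print(count_color_changes(artwork))
```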

How Different the Line Is From the Rest of the Image

This factor is relatively simple. For each RGB channel, take the difference between the average color of the whole image and the average color of the line, then average those differences across the three channels. Panel splits are usually a different color than the rest of the image.
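As a rough sketch, assuming the image is a list of rows of RGB tuples and that "difference" means the absolute difference per channel (both are assumptions, as are the function names):

```python
def average_color(pixels):
    """Return the per-channel mean (r, g, b) of an iterable of RGB tuples."""
    pixels = list(pixels)
    count = len(pixels)
    return tuple(sum(pixel[ch] for pixel in pixels) / count for ch in range(3))


def row_difference(image, index):
    """Average per-channel distance between one row and the whole image."""
    image_avg = average_color(pixel for row in image for pixel in row)
    row_avg = average_color(image[index])
    return sum(abs(r - i) for r, i in zip(row_avg, image_avg)) / 3


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

# Mostly white image with a single black line at row 2.
image = [[WHITE, WHITE], [WHITE, WHITE], [BLACK, BLACK], [WHITE, WHITE]]

for index in range(len(image)):
    print(index, row_difference(image, index))
```

The black line stands out from the mostly white image, so it gets the largest value.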

How Close a Line Is To the Center of the Image

In general, lines near the edge of the image should be avoided. All other factors being equal, a line at the center of the image is the best split and a line near the edge is the worst. However, since my main focus was avoiding the edge of an image, I used the following equation:

-(abs((2 / imageHeight) * index - 1) ** 3) + 1

Since the distance from the center is cubed, values at the edge will be very low, but values rise quickly and stay close to 1 away from the edge. For example, if an image is 500 pixels high, the values would be:

Row	Value
0	0
1	0.012
100	0.784
200	0.992
250	1
300	0.992
400	0.784
499	0.012
500	0

This discourages choosing values near the edge. Values not near the edge, even if they are not very close to the middle, are nearly equal.
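The equation above can be wrapped in a small function to check the table. The function name `center_weight` and the snake_case parameter names are mine; the expression itself is the one given above.

```python
def center_weight(index, image_height):
    """Return a weight near 0 at the edges and 1 at the center of the image."""
    # Map the row index to the range -1..1, where 0 is the center.
    distance_from_center = (2 / image_height) * index - 1
    # Cubing keeps the penalty small until the row is close to an edge.
    return 1 - abs(distance_from_center) ** 3


# Recompute the table for a 500 pixel high image.
for row in (0, 1, 100, 200, 250, 300, 400, 499, 500):
    print(row, round(center_weight(row, 500), 3))
```

The function is symmetric around the center, which is why rows 100 and 400 (and rows 200 and 300) get the same value.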

Allowing Vertical Splits

If an image is significantly wider than the page, a vertical split also makes sense. This is simply done by rotating the image in memory, splitting it normally, and then rotating each result back.
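The rotate, split, rotate-back idea can be sketched with plain lists. This is an illustration only: the image is assumed to be a list of rows, the cut positions are passed in rather than computed, and all function names are hypothetical.

```python
def rotate_clockwise(image):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def rotate_counterclockwise(image):
    """Rotate a list-of-rows image by 90 degrees counterclockwise."""
    return [list(column) for column in zip(*image)][::-1]


def split_at_rows(image, cuts):
    """Split an image horizontally at the given row indices."""
    bounds = [0, *cuts, len(image)]
    return [image[start:end] for start, end in zip(bounds, bounds[1:])]


def split_at_columns(image, cuts):
    """Split vertically by reusing the horizontal splitter on a rotated copy."""
    rotated = rotate_clockwise(image)       # columns become rows
    parts = split_at_rows(rotated, cuts)    # split "normally"
    return [rotate_counterclockwise(part) for part in parts]


wide = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
for part in split_at_columns(wide, [2]):
    print(part)
```

The appeal of this approach is that the row-scoring logic only has to exist once: after the rotation, a column of the original image is just another row.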

Conclusion

Using these methods, the PDF output is much more reasonable than before. Iterating on this algorithm let me try out many ideas, and the ones described here are those that produced reasonable splits.