Images in PDFs are stored in one of two ways: as regular (generally JPEG) images, or as regular images paired with a separate mask. JPEG cannot store transparency information, so an image extracted on its own will have a white or black background, which looks a little ugly in a VTT. The mask carries the transparency, but since the image and its mask come out as two separate files, they have to be combined into a PNG before use.
To extract the images and then combine them for transparency, we're going to need to use the Linux command line.
You don't need a dedicated Linux box or partition.
Windows 10 has an official subsystem that allows you to use the command line without requiring a Virtual Machine.
If you already have access to a Linux command line, just install ImageMagick and Poppler-Utils and use the script below.
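If you're not sure whether both packages are already installed, a quick check like the one below can tell you. This is only a sketch: 'missing_tools' is a name I made up, and it assumes the two programs the script calls are 'pdfimages' (from Poppler-Utils) and 'composite' (from ImageMagick).

```shell
#!/bin/bash
# missing_tools is a hypothetical helper: it prints the name of every
# command passed to it that cannot be found on your PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The two programs the extraction script calls.
missing=$(missing_tools pdfimages composite)
if [[ -n $missing ]]; then
    echo "Not installed yet:"
    echo "$missing"
else
    echo "Both tools are installed."
fi
```

If anything is listed as not installed, install the matching package before running the script.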
Setting up the Windows Subsystem for Linux
https://docs.microsoft.com/en-us/window ... tall-win10
https://wiki.ubuntu.com/WSL
NOTE: This cannot be done on Windows 10 in "S Mode".
1. Open Windows PowerShell as Administrator.
2. Enter the following:
Code: Select all
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
3. Reboot Windows.
4. Open the Windows Store (Microsoft Store) and search for Ubuntu.
5. Click 'Get'.
6. After it has downloaded, 'Get' will change to 'Launch'.
7. Click 'Launch'. It will take a few minutes to set up the first time. It will then prompt for a username and password. Make up whatever you want.
When it finishes, you'll be sitting at the Ubuntu command prompt.
To open the command line in the future, you may want to pin 'Ubuntu' to the Start menu.
First, you'll want to ensure all of your programs are up to date. To do that, type the following:
Code: Select all
sudo apt update
sudo apt upgrade
The 'sudo' command tells Linux to execute the command that follows it as an administrator (root). You'll be prompted for your password.
This may take a little while, and you may be prompted to reboot your system.
You can either reboot your entire PC, or just restart the subsystem from PowerShell (run as Administrator). Wait about 3 seconds after 'net stop LxssManager' completes before running 'net start LxssManager'.
Code: Select all
net stop LxssManager
net start LxssManager
Now you need to install the utilities the script will call. To do that, type the following at the Ubuntu prompt:
Code: Select all
sudo apt install poppler-utils
sudo apt install imagemagick
Your Windows drive(s) will be mounted under '/mnt/'. If you don't know what that means, read "Navigating the Linux command line" below.
Navigating the Linux command line
Red Hat article on Navigation
NOTE: Linux commands use whitespace as a separator. You need to enclose file and folder names in double quotation marks if they contain whitespace.
You can autocomplete by pressing TAB. If there is more than one possible completion you may hear a ding. Press TAB twice to show all possible completions.
'pwd' displays the directory that you're currently in.
Code: Select all
:~$ pwd
/home/username
:~$
'mkdir' creates a new directory. 'mkdir new folder' will create TWO folders, "new" and "folder". To create a folder with spaces (whitespace) in its name, type
Code: Select all
:~$ mkdir "new folder"
'ls' lists all files and folders in the current directory. You can give it the name of a folder to view the contents of that folder. Examples:
Code: Select all
:~$ ls
'new folder'
:~$ ls "new folder"
:~$
There is no output for the last command because the folder is empty.
'/' is the root directory. Everything else is mounted here.
Code: Select all
:~$ ls /
bin dev home lib lib64 media opt root sbin srv tmp var
boot etc init lib32 libx32 mnt proc run snap sys usr
'cd' changes your current directory. It takes the directory to move to as its argument. '..' means one directory up. '/' at the beginning means the root directory. Running 'cd' without any arguments will return you to your home directory (the one you start in when you launch Ubuntu from Windows).
You can access files on your Windows computer by navigating to '/mnt/'.
I have two drives, C: and S:
They are mounted under /mnt/ as '/mnt/c' and '/mnt/s'
Code: Select all
:~$ cd /mnt
:/mnt$ ls
c s
:/mnt$
Once you're able to navigate to your Desktop, make a directory and enter it.
On your Windows Desktop, a folder should have appeared. Make a copy of your target PDF in that folder.
I tested this script with Paizo's free 'Skitter Shot' Starfinder module, available from their website at https://paizo.com/products/btpya1aa?Sta ... itter-Shot
After downloading the Skitter Shot PDF, rename it to "Skitter Shot.pdf". All of the extracted image files will be prefixed with the name of the PDF, so you don't want the PDF to have a long name. I would use AA1, AA2, and Core for Alien Archive 1, Alien Archive 2, and the Core Rulebook.
Download this file into the folder.
Use the 'mv' command to rename "imageExtraction.txt" to "imageExtraction.sh".
Code: Select all
:~$ mv imageExtraction.txt imageExtraction.sh
NOTE: You're about to run a file provided by some rando on the internet. Open it in Notepad, and even if you don't understand most of it, make sure it doesn't have anything sketchy in it, like calls to IP addresses or URLs.
Normally, use of the 'rm' and 'mv' commands would be suspect because they can permanently delete or overwrite files. I use 'rm' in this script to clean up extraneous image files after the transparencies have been composited, and 'mv' to overwrite a temporary log file the script creates.
Check the use of 'cd' commands and see if the script tries to escape the current directory to do unknown nasties.
Run the script. Use the filename of the PDF as an argument.
Code: Select all
:~$ ./imageExtraction.sh Skitter\ Shot.pdf
This will take a few minutes, and Skitter Shot is only 20 pages.
You may notice that "Skitter Shot.pdf" is written as "Skitter\ Shot.pdf" without double quotes. That's another way to deal with whitespace: putting a "\" before the whitespace character. If you use TAB to autocomplete, it will add those in for you.
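If the whitespace rules still feel fuzzy, here's a tiny experiment you can paste into the Ubuntu prompt. 'count_args' is just a throwaway function I made up for illustration; all it does is report how many separate arguments it received.

```shell
#!/bin/bash
# count_args is a made-up helper: it prints how many arguments it was given.
count_args() {
    echo "$#"
}

count_args Skitter Shot.pdf      # unquoted: two arguments, prints 2
count_args "Skitter Shot.pdf"    # double quotes: one argument, prints 1
count_args Skitter\ Shot.pdf     # backslash escape: one argument, prints 1
```

The script needs the second or third form, otherwise it only sees "Skitter" as the filename.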
Cleanup
Assuming you used the Skitter Shot PDF, the folder will be "SkitterShot.tmp".
Open it up. There's a lot of stuff here.
First off, the naming structure for all of the files is
Code: Select all
PDF NAME-PAGE NUMBER-UNIQUE NUMBER.EXTENSION
Next, you'll notice that there are a lot of duplicate images. If you scroll down, you'll find "SkitterShot-015-165.png", which is an uncropped version of "SkitterShot-003-031.png" at the top.
You'll also see many, many red line images. Most of them are exact duplicates of one another, but some differ by a few pixels in width and/or height.
For exact duplicates (pixel-for-pixel matches), you can search for 'duplicate file cleaner'. I use FSlint for Linux, and CCleaner for Windows.
But for cropped/rotated images, you'll need a dedicated image duplication cleaner.
AntiDupl is free, and available for Windows at https://sourceforge.net/projects/antidupl/
Image duplication cleaners aren't perfect, so they show you the images they believe to be duplicates and require you to confirm each deletion. To make that take less time, you should run CCleaner or FSlint first to eliminate the exact, byte-for-byte duplicates.
CCleaner Instructions
1. After opening CCleaner, navigate to Tools -> Duplicate Finder.
2. Ensure that all of your other drives in the Include pane are unchecked, then click Add, and add the SkitterShot.tmp folder.
3. Click Search.
4. Right click on any of the files and click on "Select All". This will not select all. It will select all but one of each duplicate.
5. Click "Delete Selected" in the bottom right corner of the window.
6. Confirm.
AntiDupl Instructions
1. Click on the "Paths" icon.
2. Select the only folder (it defaults to the folder that the program is located in).
3. Click "Change" in the lower right corner.
4. Select the SkitterShot.tmp folder.
5. Click "Start Search" (the green arrow, two buttons to the left of the "Paths" icon).
The author's website has good info on how to use the interface: https://ermig1979.github.io/AntiDupl/da ... eview.html
The short of it is that the NumPad will be your friend here. Num1 will delete the first image, Num2 will delete the second image, and Num5 will not delete either image. Hover over the icons to see what they mean and what their shortcuts are.
The dimensions column will highlight the smaller image in red. Generally, you want to keep the largest image available and then scale that one image down as you need.
Finally, you'll notice a lot of weird misc images.
That's just how they were saved in the PDF, for whatever reasons the editor had. There are no scripts or programs for these; just delete them.
From what's left, remember that the file nomenclature is "PDF-SOURCE PAGE-UNNECESSARY NUMBER", so when you start renaming the files, use the page number to look up the NPC's name or any extra info you need about an image.
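If you'd rather do the exact-duplicate pass mentioned above without leaving the Ubuntu prompt, something like this might do as a rough stand-in for FSlint or CCleaner. It's only a sketch: 'list_exact_duplicates' is a name I made up, it assumes GNU 'find', 'sort', and 'sha256sum' (all part of a stock Ubuntu install), and it only lists files. Nothing is deleted.

```shell
#!/bin/bash
# list_exact_duplicates DIR (hypothetical helper)
# Prints every file directly inside DIR whose contents are byte-for-byte
# identical to a file that sorts earlier by name. It does not delete anything.
list_exact_duplicates() {
    local dir="$1" sum file
    local -A seen=()
    while IFS= read -r -d '' file; do
        # Hash the contents; keep only the hash, not the trailing file name.
        sum=$(sha256sum < "$file")
        sum=${sum%% *}
        if [[ -n ${seen[$sum]:-} ]]; then
            printf '%s\n' "$file"      # same contents as an earlier file
        else
            seen[$sum]=$file           # first file seen with these contents
        fi
    done < <(find "$dir" -maxdepth 1 -type f -print0 | sort -z)
}
```

Running 'list_exact_duplicates SkitterShot.tmp' prints the redundant copies so you can look them over before deleting anything. It won't catch the cropped or resized near-duplicates; those still need AntiDupl.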
The Script
I haven't coded non-gcode since I was a teenager, and I had to teach myself bash scripting to do this, so I would greatly appreciate any improvements from real programmers.
Code: Select all
#!/bin/bash
# Extract every image from a PDF and merge each image with its transparency mask.
filename="$1"
if [[ -z $filename ]]; then
    echo "Usage: imageExtraction <PDF-File>"
    exit 1
elif [[ ! -r $filename ]]; then
    echo "$filename is not a readable file."
    exit 1
fi
# Strip the extension and any whitespace to get the prefix used for output names.
pdf="${filename%.*}"
pdf="${pdf//[[:space:]]/}"
folder="$pdf".tmp
if ! mkdir "$folder"; then
    echo "Unable to create directory $folder"
    exit 1
fi
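# Save the image list; the compositing loop further down reads it to find the masks.
log="$pdf".pdfimages.log
pdfimages -list "$filename" > "$folder"/"$log"
pdfimagesExit=$?
# Test the saved status: by now $? holds the result of the assignment above.
case "$pdfimagesExit" in
    0)
        ;;
    1)
        echo "pdfimages is unable to open $filename"
        exit "$pdfimagesExit"
        ;;
    2)
        echo "pdfimages is unable to open an output file."
        exit "$pdfimagesExit"
        ;;
    3)
        echo "pdfimages does not have PDF permissions."
        exit "$pdfimagesExit"
        ;;
    *)
        echo "Unknown error in pdfimages."
        exit "$pdfimagesExit"
        ;;
esac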
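cd "$folder" || exit 1
echo "Extracting images from $filename..."
# -all keeps JPEG-type images in their native format; -p adds the page number to each file name.
pdfimages -all -p ../"$filename" "$pdf"
case $? in
    0)
        echo "Extraction complete."
        ;;
    *)
        echo "Unknown error in pdfimages."
        cd ..
        exit 1
        ;;
esac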
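# Drop the two header lines of the pdfimages -list output.
tail -n +3 "$log" > "$log".tmp
mv "$log".tmp "$log"
lastNum=''
lastPage=''
lastType=''
curFile=''
lastFile=''
while read -r page num type remainder; do
    # The script assumes a mask row belongs to the image listed on the row just before it.
    if [[ -n $lastNum ]] && { [[ $type == smask ]] || [[ $type == mask ]]; }; then
        #The numbers in pdfimages -list output are not fixed width, while the actual file output is.
        #Pad into separate variables so num and page stay unpadded (printf would read a padded 008 as octal).
        printf -v curNum "%03d" "$num"
        printf -v curPage "%03d" "$page"
        printf -v prevNum "%03d" "$lastNum"
        printf -v prevPage "%03d" "$lastPage"
        curFile=$(echo "$pdf"-"$curPage"-"$curNum".*)
        lastFile=$(echo "$pdf"-"$prevPage"-"$prevNum".*)
        newFile="${lastFile%.*}".png
        echo "Masking $lastFile with $curFile"
        composite "$curFile" "$lastFile" -compose copy-opacity "$newFile"
        echo "Deleting $curFile"
        rm "$curFile"
        #If the base image was already a PNG, newFile is that same file, so keep it.
        if [[ $lastFile != "$newFile" ]]; then
            echo "Deleting $lastFile"
            rm "$lastFile"
        fi
    fi
    lastNum=$num
    lastPage=$page
    lastType=$type
done < "$log"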
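The fiddly part of that loop is turning the unpadded page and image numbers from the 'pdfimages -list' output into the fixed-width file names that pdfimages actually writes. Here is that one step pulled out on its own as a sketch. 'padded_name' is a name I made up, and the three-digit width is taken from the script's own '%03d' format.

```shell
#!/bin/bash
# padded_name ROOT PAGE NUM (hypothetical helper)
# Prints the extension-less file name used for an extracted image,
# e.g. root SkitterShot, page 3, image 31.
padded_name() {
    local root="$1" page="$2" num="$3"
    # 10# forces base 10, so an already padded value like 008 is not read as octal.
    printf '%s-%03d-%03d\n' "$root" "$((10#$page))" "$((10#$num))"
}

padded_name SkitterShot 3 31    # prints SkitterShot-003-031
```

The '10#' prefix is my own addition. Without it, bash's printf rejects values such as 008 and 009 as invalid octal numbers.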
NOTE: The script will only extract masking layers that are present in the PDF. It cannot auto-generate masking layers for images in PDFs that don't have them (scans).
I have no problems with Paizo PDFs, but when I tried a Wizards of the Coast 5E PDF, it threw errors because they use JPEG2000 (*.jp2). The most current version of ImageMagick can decode .jp2 files, but not the one in the Ubuntu repository.
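If you want to know up front whether a PDF is going to hit this, you can look for .jp2 files in the output folder once the extraction step has finished. A minimal sketch, where 'list_jp2_files' is a name I made up:

```shell
#!/bin/bash
# list_jp2_files DIR (hypothetical helper)
# Prints every JPEG2000 (*.jp2) file directly inside DIR, one per line.
list_jp2_files() {
    local dir="$1" f
    for f in "$dir"/*.jp2; do
        # With no matches the pattern stays literal, so check the file exists.
        [[ -e $f ]] && printf '%s\n' "$f"
    done
    return 0
}
```

If that prints anything, it's worth running 'identify -list format' to see whether your ImageMagick build lists JP2 support before you let the compositing step loose on those files.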