Image analysis has been around for a while now, but it hasn't been made
available to the general public in an easy-to-use way. Further, the concept
has been hidden behind a pricey tag handed out by forensics companies.
We wanted to change this so that you all could have access!
Optimizations welcome!
Our algorithm (outlined below) calculates the most significant digits in a Discrete Cosine Transformed (DCT) array of chunks created from a JPEG image, then uses the Chi Squared test to check how closely the Most Significant Digits (MSD) of the results align with the weights as they are distributed in Benford's Law.
We love boxes as much as the next person, so here's some to help paint a better picture of what's going on. Our algorithm consists of roughly 6 chunks, each chunk performing a different function. Check it out.
All images are represented as an array of pixels. In this step, convert the JPEG image into an array of integers, each ranging from 0-255.
Not all images are square, which means that their height and width are not equal. In this step, we add additional values to the array of bytes so that it has an natural square root.
Analysis has shown that chunking the array into 8x8 squares is better for the transformation. Here, we continue to take 64 values at a time from the original array and place them into 8x8 matrices. We pad the remaining values at the end to ensure that every matrix is 8x8.
Once we have the chunks, we run them through a discrete cosine transformation.
Each chunk, now transformed, still consists of an 8x8 matrix of values. Here, we grab the most significant digit from each value, and store those in an array. We then aggregate the digits to get cumulative counts per digit (0-9). After that, we calculate the weights by diving each cumulative count by the total number of digit counts.
Lastly, we apply the chi squared test to determine the similarity of the weights of the images when compared to the distribution in Benford's Law.
We have gathered much inspiration from various sources, and we'll
list them here, because they deserve all the credit.
The 'Digits' Episode of Connected on Netflix
Benford's Law on Wikipedia
Benford's Law In Image Processing
A generalized Benford’s law for JPEG coefficients and its applications in image forensics
Detecting Doctored JPEG Images Via DCT Coefficient Analysis
bellbind's JPEG algorithms
This project couldn't have been done without the help of others. And it can't
get better without your help.
You can reach out to us at email.