Appreciating Art through the Lens of Data Science
TLDR:
In part 3, Team Splatoon elaborated on the struggles of mimicking human perception of art with color quantization algorithms. We also presented the first draft of an interactive time series visualization that will eventually be hosted on the Budget Collector website.
We eventually decided to use a Python package called Colorific because it appeared to address our concerns. In this blog, we will explain how we plan to tune Colorific for the Budget Collector data set and showcase further development of our time series visualization.
Tuning Colorific
Recall from the last blog that Colorific has six tunable hyperparameters:
- N_QUANTIZED – Reduced palette size
- MIN_DISTANCE – The minimum distance to consider two colors different
- MIN_PROMINENCE – The minimum share of the image a color must occupy to be considered part of the palette
- MIN_SATURATION – Saturation threshold
- MAX_COLORS – Maximum number of colors to include in palette
- BACKGROUND_PROMINENCE – Level of prominence indicating a background color
By tuning these six hyperparameters, we are teaching our algorithm to “see” art as a human would. However, before we can begin, we need to develop “truth” data for the artwork in the Budget Collector data set. The truth data will be determined by first using a clustering algorithm to reduce the total number of colors in the image and then following up with a subjective human assessment to decide on the dominant color. This assessment can also be paired with art expert opinion to decide on the prominent colors.
Figure 1 is an example of this technique applied to Giuseppe Maria Crespi’s Saint Paul (top) and William James Glackens’ Race Day (bottom). The initial clustering algorithm outputs 15 different colors, from which we select the dominant and secondary colors of the image.
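To make the clustering step concrete, here is a minimal sketch of reducing an image to 15 candidate colors with a plain k-means loop. It is an illustration under assumptions, not the exact algorithm behind Figure 1: the function names are our own, the pixels are assumed to be already loaded as a list of (r, g, b) tuples (for example with an imaging library), and a real run would downsample the image first because this pure-Python loop is slow.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(pixels, centers):
    """Group every pixel with its nearest cluster center."""
    clusters = [[] for _ in centers]
    for p in pixels:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans_colors(pixels, k=15, iterations=10, seed=0):
    """Reduce (r, g, b) pixels to at most k representative colors.

    Returns (color, share) pairs sorted by share of pixels, largest first.
    """
    rng = random.Random(seed)
    unique = sorted(set(pixels))
    # Start from k distinct pixel colors (fewer if the image has fewer colors).
    centers = rng.sample(unique, min(k, len(unique)))
    for _ in range(iterations):
        clusters = assign(pixels, centers)
        # Move each center to the mean of its members; keep empty clusters in place.
        centers = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    clusters = assign(pixels, centers)
    palette = [
        (tuple(round(c) for c in center), len(members) / len(pixels))
        for center, members in zip(centers, clusters)
        if members
    ]
    return sorted(palette, key=lambda item: item[1], reverse=True)
```

A reviewer would then look at the returned colors and their shares and pick the dominant and secondary colors by eye, which is the subjective part of the process.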
We will create an entire training data set consisting of several art pieces processed through this “truth” development technique. The training set can then be used to tune the hyperparameters listed above, using grid search or another optimization algorithm. We will measure the “correctness” of the chosen hyperparameters by how closely the Colorific output matches the training data set. This can be accomplished by minimizing a distance metric in RGB space. This tuning process is further depicted in Figure 2.
By optimizing the Colorific output, we expect to arrive at a reasonable assessment of each image’s prominent colors without requiring direct human observation of each image assessed.
If certain classes of artwork need different hyperparameters than others (e.g., Gothic versus Impressionist), we will consider ways of providing custom hyperparameter settings based on our observations.
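The tuning loop described above could look roughly like the sketch below. Several pieces are assumptions made for illustration: `extract` is a hypothetical wrapper around whatever call produces a palette for an image, the keyword names in the grid are whatever that wrapper accepts, and the error measure (mean Euclidean distance from each truth color to its nearest predicted color) is only one possible choice of distance metric in RGB space.

```python
import itertools
import math


def rgb_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def palette_error(predicted, truth):
    """Mean distance from each truth color to its nearest predicted color."""
    if not predicted:
        return float("inf")
    return sum(min(rgb_distance(t, p) for p in predicted) for t in truth) / len(truth)


def grid_search(extract, training_set, grid):
    """Try every combination in `grid` and return (best_params, best_error).

    extract(image, **params) -> list of (r, g, b) tuples  (hypothetical wrapper)
    training_set: list of (image, truth_colors) pairs
    grid: dict mapping parameter name -> list of candidate values
    """
    names = sorted(grid)
    best_params, best_error = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # Average the palette error over the whole training set.
        error = sum(
            palette_error(extract(image, **params), truth)
            for image, truth in training_set
        ) / len(training_set)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error
```

An exhaustive grid grows quickly with six hyperparameters, so in practice the candidate lists would need to stay short, or the grid would be replaced by a random or guided search.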
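One simple way to provide such custom settings is a table of per-style overrides layered on top of a shared default. The sketch below is only an illustration: the function name, the lowercase parameter names, and the numeric values are placeholders, not tuned results.

```python
def params_for_style(style, defaults, overrides):
    """Return the defaults with any style-specific overrides applied."""
    params = dict(defaults)  # copy so the shared defaults are never mutated
    params.update(overrides.get(style, {}))
    return params


# Placeholder values for illustration only; not tuned results.
defaults = {"max_colors": 5, "min_saturation": 0.05}
overrides = {"Impressionist": {"min_saturation": 0.10}}

print(params_for_style("Impressionist", defaults, overrides))
print(params_for_style("Gothic", defaults, overrides))  # falls back to defaults
```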
Time Series Visualization: Update
We have made major strides in our time series visualization since our last blog. Our landing page displays the main visualization, which is a scatter plot of the dominant colors with year as the x-axis (Figure 3). The size of each dot represents the prominence of the color in the image. The visual has other features, such as hover-over details that reveal the associated image along with the name of the artwork, the artist, and the year. We also included drop-down menus that allow users to filter the data set and visualize color trends based on region and time period.
From Figure 3, we can see that the full data set has primarily browns and darker colors in the 17th and 18th centuries. By comparison, the dominant colors in the 20th and 21st centuries are more vivid and colorful.
See below for further examples of how users can use our visualization’s filter functions to drill down into the region and style information in order to dissect color trends over time. For example, Figure 4 shows that the North American region tends to have relatively light and bright dominant colors. Alternatively, users can filter the time period to the 19th century, as in Figure 5, which shows that dominant colors are generally dark browns and grays. Conveniently, users can also apply both time and region filters simultaneously, as in Figure 6, where we filtered for European art pieces in the 20th century.
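As a rough illustration of how each artwork becomes a dot in the scatter plot, the sketch below maps one record to marker attributes: year on the x-axis, the dominant color as the fill, prominence as the size, and a hover label. The record keys, the size range, and the function name are assumptions made for this example rather than the site's actual code, and the y-axis is left out.

```python
def to_marker(record, min_size=20.0, max_size=400.0):
    """Convert one artwork record into scatter-plot marker attributes."""
    r, g, b = record["rgb"]
    # Clamp prominence to [0, 1] so a bad value cannot produce a huge dot.
    prominence = min(max(record["prominence"], 0.0), 1.0)
    return {
        "x": record["year"],
        "color": "#{:02x}{:02x}{:02x}".format(r, g, b),
        "size": min_size + prominence * (max_size - min_size),
        "hover": "{title}, {artist} ({year})".format(**record),
    }


# Hypothetical record, for illustration only.
example = {
    "title": "Untitled",
    "artist": "Unknown",
    "year": 1900,
    "rgb": (139, 69, 19),
    "prominence": 0.5,
}
print(to_marker(example))
```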
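The region and time-period filters can be thought of as simple predicates over the artwork records. The sketch below assumes hypothetical `region` and `year` keys on each record and shows how the two filters combine. It is not the site's actual implementation.

```python
def century_of(year):
    """Return the century a year falls in (1801 through 1900 is the 19th)."""
    return (year - 1) // 100 + 1


def filter_records(records, region=None, century=None):
    """Keep records matching every filter that is set; None means no filter."""
    return [
        r for r in records
        if (region is None or r["region"] == region)
        and (century is None or century_of(r["year"]) == century)
    ]
```

Calling `filter_records(records, region="Europe", century=20)` would correspond to the combined filter shown in Figure 6, assuming the region is stored under that label.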
Path Forward
For the next step of our visualization, we will continue to add more filtering options, as well as additional graphs to showcase our analytic findings. The visualization will eventually include any relevant findings from our color study.