Appreciating Art through the Lens of Data Science

TLDR:

In part 3, Team Splatoon elaborated on the struggles of mimicking the human art perspective with color quantization algorithms. We also presented the first draft of an interactive time series visualization that will eventually be hosted on the Budget Collector website.

We eventually decided to use a Python package called Colorific because it appeared to address our concerns. In this blog, we will explain how we plan to tune Colorific for the Budget Collector dataset and showcase further development of our time series visualization.

Tuning Colorific

Recall from the last blog that Colorific has 6 tuning hyperparameters:

N_QUANTIZED – Reduced palette size
MIN_DISTANCE – The minimum distance to consider two colors different
MIN_PROMINENCE – The minimum share of the image a color must occupy to be considered part of the palette
MIN_SATURATION – Saturation threshold
MAX_COLORS – Maximum number of colors to include in palette
BACKGROUND_PROMINENCE – Level of prominence indicating a background color

By tuning these 6 hyperparameters, we are teaching our algorithm to “see” art as a human would. However, before we can begin, we need to develop “truth” data for the artwork in the Budget Collector dataset. We will determine the truth data by first using a clustering algorithm to reduce the total number of colors in the image and then following up with a subjective human assessment to decide on the dominant color. This assessment can also be paired with art expert opinion to decide on the prominent colors.

Figure 1 is an example of this technique applied to Giuseppe Maria Crespi’s Saint Paul (top) and William James Glackens’ Race Day (bottom). The initial clustering algorithm outputs 15 different colors, from which we select the dominant and secondary colors of the image.
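To make the clustering step concrete, here is a minimal sketch of reducing an image to 15 candidate colors. We do not name the clustering algorithm in this post, so k-means is an assumption here, and the function names (kmeans_palette, assign) are hypothetical. The sketch uses only the standard library and expects the pixels as a list of (R, G, B) tuples, loaded with whatever image library is at hand.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(pixels, centers):
    """Group every pixel under the index of its nearest center."""
    clusters = [[] for _ in centers]
    for p in pixels:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans_palette(pixels, k=15, iterations=20, seed=0):
    """Reduce pixels to at most k colors (k-means is our assumption).

    Returns (color, share) pairs sorted by share, largest first.
    """
    rng = random.Random(seed)
    unique = sorted(set(pixels))
    k = min(k, len(unique))
    centers = rng.sample(unique, k)  # start from k distinct pixel colors

    for _ in range(iterations):
        clusters = assign(pixels, centers)
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                n = len(members)
                # Move the center to the mean color of its members.
                new_centers.append(tuple(sum(ch) / n for ch in zip(*members)))
            else:
                new_centers.append(center)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers

    clusters = assign(pixels, centers)
    palette = [
        (tuple(round(c) for c in center), len(members) / len(pixels))
        for center, members in zip(centers, clusters)
        if members
    ]
    return sorted(palette, key=lambda item: item[1], reverse=True)
```

The share attached to each color gives the human reviewer a starting point, but the final choice of dominant and secondary color remains a subjective call, as described above.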

We will create a training data set consisting of several art pieces processed through this “truth” development technique. The training set can then be used to tune the hyperparameters listed above, using grid search or another optimization algorithm. We will measure the “correctness” of the chosen hyperparameters by how closely the Colorific output matches the training data set. This can be accomplished by minimizing a distance metric in RGB space. This tuning process is further depicted in Figure 2.
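The tuning loop described above could be sketched roughly as follows. This is an illustration under assumptions, not our final implementation: the extract argument is a hypothetical wrapper that would call Colorific with the given hyperparameters and return a list of RGB tuples, and the symmetric nearest-color distance is just one possible choice of RGB metric. The stand-in extractor and the tiny training set at the bottom are made up so that the sketch runs on its own.

```python
import itertools
import math


def one_way(source, target):
    """Mean RGB distance from each source color to its nearest target color."""
    return sum(min(math.dist(s, t) for t in target) for s in source) / len(source)


def palette_error(predicted, truth):
    """Symmetric palette distance: penalizes both missing and extra colors."""
    if not predicted or not truth:
        return float("inf")
    return (one_way(truth, predicted) + one_way(predicted, truth)) / 2


def grid_search(extract, training_set, grid):
    """Try every combination in grid and keep the lowest mean error.

    extract(image, **params) -> list of RGB tuples (hypothetical wrapper).
    training_set: list of (image, truth_colors) pairs.
    grid: dict mapping a hyperparameter name to its candidate values.
    """
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = sum(
            palette_error(extract(image, **params), truth)
            for image, truth in training_set
        ) / len(training_set)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Stand-in extractor for demonstration only; it is NOT Colorific.
# Here an "image" is already a list of (color, share) pairs.
def stand_in_extract(image, max_colors, min_prominence):
    kept = [color for color, share in image if share >= min_prominence]
    return kept[:max_colors]


# Made-up training example: one image and its human-chosen truth colors.
training_set = [
    (
        [((200, 30, 30), 0.7), ((30, 30, 200), 0.25), ((0, 255, 0), 0.05)],
        [(200, 30, 30), (30, 30, 200)],
    ),
]
grid = {"max_colors": [1, 2, 3], "min_prominence": [0.01, 0.1]}
best_params, best_score = grid_search(stand_in_extract, training_set, grid)
```

Grid search grows quickly with the number of candidate values per hyperparameter, which is one reason we are also leaving the door open to other optimization algorithms.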

By optimizing the Colorific output, we expect to arrive at a reasonable assessment of each image’s prominent colors without requiring direct human observation of each image assessed.

If certain classes of artwork need different sets of hyperparameters than others (e.g., Gothic style versus Impressionist), we will consider ways of providing custom hyperparameter settings based on our observations.

Time Series Visualization: Update

We have made major strides in our time series visualization since our last blog. Our landing page displays the main visualization, which is a scatter plot of the dominant colors with year as the x-axis (Figure 3). The size of each dot represents the prominence of the color in the image. The visual has other features such as hover-over details that reveal the associated image and further details such as the name of the artwork, artist, and year. We also included drop-down menus that allow users to filter the data set and visualize color trends based on region and time period.

From Figure 3, we can see that the full data set has primarily browns and darker colors in the 17th and 18th centuries. In comparison, the dominant colors in the 20th and 21st centuries are more vivid and colorful.

See below for further examples of how users can use our visualization’s filter functions to drill down into the region and style information in order to dissect color trends over time. For example, Figure 4 shows that the North American region tends to have relatively light and bright dominant colors. Alternatively, users can filter the time period to the 19th century, as in Figure 5, which shows that dominant colors are generally dark browns and grays. Conveniently, users can also apply both time and region filters simultaneously, as in Figure 6, where we filtered for European art pieces in the 20th century.
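For readers curious about what the filters do behind the scenes, the combined region and time-period filter can be sketched as below. The record fields (title, year, region, dominant_rgb) and the function names are hypothetical, and the sample records are made-up placeholders rather than entries from the Budget Collector data set.

```python
def century_of(year):
    """Return the century a year falls in (for example, 1801-1900 is the 19th)."""
    return (year - 1) // 100 + 1


def filter_artworks(records, region=None, century=None):
    """Keep records matching every filter that is set; None means no filter."""
    return [
        r for r in records
        if (region is None or r["region"] == region)
        and (century is None or century_of(r["year"]) == century)
    ]


# Made-up placeholder records, not real entries from the data set.
sample = [
    {"title": "Artwork A", "year": 1650, "region": "Europe",
     "dominant_rgb": (90, 60, 40)},
    {"title": "Artwork B", "year": 1912, "region": "Europe",
     "dominant_rgb": (200, 80, 60)},
    {"title": "Artwork C", "year": 1925, "region": "North America",
     "dominant_rgb": (230, 210, 150)},
]

# Both filters at once, in the spirit of the Figure 6 example.
european_20th = filter_artworks(sample, region="Europe", century=20)
```

Leaving a filter unset returns everything for that dimension, which mirrors how the drop-down menus behave when no selection is made.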

Path Forward

For the next step of our visualization, we will continue to add more filtering options, as well as additional graphs to showcase our analytic findings. The visualization will eventually include any relevant findings from our color study.

To be continued…

Matthew Harvey

Aerospace Engineer

I’m Matthew, living a couple of hours south of Atlanta, Georgia. I am an aerospace engineer working in the defense industry. I have enjoyed the OMSA program so far, and am looking forward to leveraging the knowledge against real world problems to gain actionable predictions. I enjoy time with my family, working outdoors, and playing ultimate frisbee.

Michelle Koh

Software Developer

My name is Michelle and I’m based in San Diego. I’m a software developer at a fintech company with a background in Accounting and Economics. I started pursuing my Data Analytics degree back in 2020, and so far, it has been one of the best career decisions I have made! In my free time I enjoy being outdoors and surrounded by nature.

Cindy Tran

Optimization Analyst

My name is Cindy Tran and I’m from Houston, Texas. My background is in chemical engineering, and I am currently an optimization analyst for a petrochemical company. I chose to pursue a Masters in Analytics because I think the knowledge could be leveraged well in the petrochemical industry. I have been loving the program so far and how applicable it is to my current role. In my free time I love watching movies and working out.
