Making Art Sm’art’er
Team Members
Firstly, a brief introduction to the team working on the project.
Progress
Our group has explored two datasets during the past three weeks: the first from the Art Institute of Chicago and the second from WikiArt Emotions. In this progress update, we describe our exploration of both datasets and the common tools and concepts that can be useful for developing the project.
Data Pull
Art Institute of Chicago (AIOC): After downloading the full list of image file names from the AIOC GitHub repository, we connected to the provided API to gather the actual images into a folder. We also collected the metadata into a CSV file with fields including year, artist, and place of origin. This resulted in a collection of close to 3,000 works of art.
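For illustration, here is a minimal sketch of this kind of pull. The endpoint, field names, and IIIF image URL pattern follow the public AIOC API documentation as we understand it; our actual script started from the file list in the GitHub repository and paged through the full collection:

```python
import csv
import os

import requests

API = "https://api.artic.edu/api/v1/artworks"
IIIF = "https://www.artic.edu/iiif/2/{}/full/843,/0/default.jpg"
FIELDS = "id,title,artist_title,date_display,place_of_origin,image_id"

os.makedirs("images", exist_ok=True)
with open("aioc_metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "title", "artist", "year", "place_of_origin"])
    for page in range(1, 4):  # a few pages for illustration
        resp = requests.get(API, params={"page": page, "limit": 100, "fields": FIELDS})
        resp.raise_for_status()
        for art in resp.json()["data"]:
            writer.writerow([art["id"], art["title"], art["artist_title"],
                             art["date_display"], art["place_of_origin"]])
            if art["image_id"]:  # not every record has a public image
                img = requests.get(IIIF.format(art["image_id"]))
                if img.ok:
                    with open(f"images/{art['id']}.jpg", "wb") as out:
                        out.write(img.content)
```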
WikiArt Emotions: Like AIOC, this dataset came with image URLs, plus the additional research by Saif M. Mohammad and Svetlana Kiritchenko, which includes ratings of the artworks as well as the emotions they evoke. We downloaded the images and extracted the most common color used in each artwork. The dataset has just over 4,000 works in total.
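The most-common-color step can be sketched like this. It is a simplified stand-in for our actual extraction; the `image_url` column name is a placeholder for the dataset's image link field:

```python
from collections import Counter
from io import BytesIO

import requests
from PIL import Image

def most_common_color(url: str) -> str:
    """Download an image and return its most frequent pixel color as hex."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    img = Image.open(BytesIO(resp.content)).convert("RGB")
    img.thumbnail((200, 200))  # downscale so counting pixels stays cheap
    counts = Counter(img.getdata())
    r, g, b = counts.most_common(1)[0][0]
    return f"#{r:02x}{g:02x}{b:02x}"

# e.g. most_common_color(row["image_url"]) for each row of the WikiArt table
```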
Color Extraction
The AIOC dataset already includes a main color extracted from each painting. However, we re-ran the extraction so that we could get a list of colors (primary color, secondary color, etc.). For this we used the Python libraries OpenCV and ColorSort. The process ran on Google Colab for around 14 hours and produced 8 RGB colors per painting, sorted by luminance. Those 8 color columns were then converted to hexadecimal values and appended to the original CSV metadata. These are some images from the AIOC dataset with the 8 extracted colors ranked from top to bottom:
We have not done a similar color extraction for the WikiArt dataset yet, but we plan to do so to keep the two datasets consistent.
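For reference, here is a minimal sketch of the palette extraction, substituting plain OpenCV k-means clustering for the ColorSort step in our actual pipeline:

```python
import cv2
import numpy as np

def extract_palette(path: str, k: int = 8) -> list[str]:
    """Cluster a painting's pixels into k colors, returned as hex, brightest first."""
    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    pixels = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, _, centers = cv2.kmeans(pixels, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    # Rec. 601 luma as a simple luminance proxy for sorting bright-to-dark.
    luma = centers @ np.array([0.299, 0.587, 0.114])
    ordered = centers[np.argsort(-luma)].astype(int)
    return ["#{:02x}{:02x}{:02x}".format(*c) for c in ordered]

# e.g. extract_palette("images/12345.jpg") -> ["#f2e8d0", ..., "#1a1208"]
```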
Data Cleaning and Transformation
The following steps were taken in no particular order (a few of them are sketched in code after the list):
- Since we had many countries with little painting representation, we appended an additional column to group them into regions. We matched them using this table.
- We extracted the HSL (Hue, Saturation, Lightness) values for the primary and secondary painting colors and used hue for exploring patterns across years.
- We removed some paintings with no year or region data.
- We experimented with different year groupings: 5 years, 10 years, 25 years, and 100 years.
- Within year groups we experimented with taking average and median hue values for each group.
- With the WikiArt dataset, we had to use the artist's nationality for the country of origin and region, because the artworks do not come with a country of origin of their own.
- We created a sentiment column for the emotions by summing the emotion percentages according to whether they were positive or negative; negative emotions carry a negative weight, and neutral emotions were left out since we gave them a weight of 0.
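Several of these transformations are mechanical enough to sketch in code. The snippet below is illustrative, assuming hypothetical file and column names (place_of_origin, color_1, year, and the emotion columns); our actual lookup table and column names differ:

```python
import colorsys

import pandas as pd

df = pd.read_csv("aioc_metadata.csv")  # hypothetical file and column names

# Group countries with few paintings into broader regions (excerpt of the lookup table).
country_to_region = {"France": "Europe", "Japan": "Asia", "Peru": "South America"}
df["region"] = df["place_of_origin"].map(country_to_region)

# Hue in degrees (0-360) from a hex color string such as "#a3b1c2".
def hex_to_hue(hex_color: str) -> float:
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, _, _ = colorsys.rgb_to_hls(r, g, b)  # stdlib returns H, L, S in that order
    return h * 360

df["primary_hue"] = df["color_1"].apply(hex_to_hue)

# Drop paintings missing year or region, then bucket years (25-year bins shown).
df = df.dropna(subset=["year", "region"])
df["year_group"] = (df["year"] // 25) * 25

# Average and median hue per year group.
hue_by_period = df.groupby("year_group")["primary_hue"].agg(["mean", "median"])

# WikiArt only: signed sentiment score from the emotion share columns.
wiki = pd.read_csv("wikiart_emotions.csv")  # hypothetical file name
positive_cols = ["happiness", "love"]  # placeholders; the real columns differ
negative_cols = ["fear", "sadness"]
wiki["sentiment"] = wiki[positive_cols].sum(axis=1) - wiki[negative_cols].sum(axis=1)
```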
Exploratory Data Analysis (EDA)
- When grouped by region, both datasets are heavily skewed towards Europe and North America. The distribution of the first dataset is shown below.
- We collected some evidence of linear patterns when plotting the primary color's hue and saturation values against year.
- Running K-Means (K=10) on year vs. primary color hue yields some candidate year groupings and shows more diversity in color hue in later centuries (a sketch of this step follows the list).
- Running a regression model on art ratings by style shows ratings decreasing for newer art.
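A sketch of the clustering step, assuming the cleaned table from the previous section has been exported with numeric year and primary_hue columns (file and column names are placeholders; scaling is needed because year and hue live on very different ranges):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("aioc_cleaned.csv")  # hypothetical cleaned export

X = StandardScaler().fit_transform(df[["year", "primary_hue"]])
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
df["cluster"] = km.labels_

# A wider hue spread (std) within late-century clusters is what suggests
# growing color diversity over time.
print(df.groupby("cluster")[["year", "primary_hue"]].agg(["mean", "std"]))
```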
Art Visualization Exploration
From the art visualization examples provided by Budget Collector, we are leaning towards picking elements from:
- For working with maps: Visual Earth
- For displaying painting samples on a timeline: Krautli’s Time Tool
Additionally, we are looking into using p5.js to make some art of our own from the data in our datasets.
Next Steps
Each group member will prepare a mockup of what the final visualization will look like. These mockups will serve two main purposes:
- Anticipate the structure of the dataset we need to generate, to better select the type of analysis we want to run on the data.
- Assess the feasibility of the candidate visualizations, selecting the tools we can best use based on our current knowledge and the training we can get within the practicum time limit.
We will also need to decide on the primary dataset to use; that choice should follow naturally from the final design we decide to go ahead with.