This is a simple CNN (convolutional neural network) approach to identify whether a spider in a top-down cropped photo is male or female. I wrote this with behavioral video tracking in mind, where a fast and reliable way to identify the sex of ‘blobs’ within video frames can be helpful (for instance, when a tracking program switches the identity of a subject after a close interaction). The approach is based largely on the excellent “Deep Learning with R” by F. Chollet and J.J. Allaire. It currently classifies my model species Habronattus pyrrithrix at >96% accuracy; this can be pushed further by retraining the model with a larger batch of training images, adapting it to a different species, or extending it to work across species with a sufficiently large training image set.

Setting up a new image classification CNN from scratch

Load required packages.

library(keras)
library(stringr)
library(dplyr)
library(ggplot2)
theme_set(theme_bw())

Split images into training, validation, and test sets

Images from both sexes should be in a single folder, with file names such as “male.27.png”. This chunk creates a folder structure that can be fed to the network later.

original_dataset_dir <- "hapy"
base_dir <- "hapy_small"
dir.create(base_dir)
train_dir <- file.path(base_dir, "train")
dir.create(train_dir)
validation_dir <- file.path(base_dir, "validation")
dir.create(validation_dir)
test_dir <- file.path(base_dir, "test")
dir.create(test_dir)
train_female_dir <- file.path(train_dir, "female")
dir.create(train_female_dir)
train_male_dir <- file.path(train_dir, "male")
dir.create(train_male_dir)
validation_female_dir <- file.path(validation_dir, "female")
dir.create(validation_female_dir)
validation_male_dir <- file.path(validation_dir, "male")
dir.create(validation_male_dir)
test_female_dir <- file.path(test_dir, "female")
dir.create(test_female_dir)
test_male_dir <- file.path(test_dir, "male")
dir.create(test_male_dir)

The original image set is now ready to be split and copied into the folder structure we just set up. We’ll do an 80/20 training/test split, with the training set also split 80/20 into train and validation images (so 64/16/20).
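Because the split is random, the exact images in each set will change between runs. If you want a reproducible split, you can fix the random seed first (optional; the seed value is arbitrary):

set.seed(42)  # optional: makes the random train/validation/test split reproducible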

images <- list.files(original_dataset_dir)
n.female <- sum(str_detect(images, "female"))
n.male <- sum(!str_detect(images, "female"))

idx.f.train <- sample(1:n.female, n.female * 0.64)
idx.f.val <- sample((1:n.female)[-idx.f.train], n.female * 0.16)
idx.f.test <- sample((1:n.female)[-c(idx.f.train, idx.f.val)], n.female * 0.2)

idx.m.train <- sample(1:n.male, n.male * 0.64)
idx.m.val <- sample((1:n.male)[-idx.m.train], n.male * 0.16)
idx.m.test <- sample((1:n.male)[-c(idx.m.train, idx.m.val)], n.male * 0.2)

fnames <- paste0("female.", idx.f.train, ".png")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(train_female_dir))

fnames <- paste0("female.", idx.f.val, ".png")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(validation_female_dir))

fnames <- paste0("female.", idx.f.test, ".png")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(test_female_dir))

fnames <- paste0("male.", idx.m.train, ".png")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(train_male_dir))

fnames <- paste0("male.", idx.m.val, ".png")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(validation_male_dir))

fnames <- paste0("male.", idx.m.test, ".png")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(test_male_dir))
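As an optional sanity check, you can count how many images ended up in each folder (the exact numbers will depend on your image set):

# Optional: count the images copied into each folder
sapply(c(train_female_dir, train_male_dir,
         validation_female_dir, validation_male_dir,
         test_female_dir, test_male_dir),
       function(d) length(list.files(d)))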

Initialize Neural Network

Time to set up the network (or model). It consists of a stack of convolution layers that filter the source images into increasingly abstract visual “modules”, pooling layers that downsample the resulting feature maps, and a large densely connected layer followed by a single-unit sigmoid layer that makes the final male/female call. A dropout layer between the flattening step and the dense layer helps reduce overfitting, which matters with a small training set.

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-4),
  metrics = c("acc")
)
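To inspect the layer output shapes and parameter counts before training, you can print a summary of the compiled model:

# Print layer output shapes and parameter counts
summary(model)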

Data Augmentation

For this application I used only a small number of training images. With so few images, characteristics of the photos themselves (white balance, subject orientation, etc.) can have a large, undesirable impact on classification. To counteract this, we define a generator that applies random transforms (rotations, shifts, shears, zooms, color channel shifts, and horizontal flips) to the training images on the fly, effectively augmenting the dataset.

datagen <- image_data_generator(
  rescale = 1/255, 
  rotation_range = 40,
  width_shift_range = 0.2, 
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  channel_shift_range = 25,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)
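Note that augmentation should only be applied to the training images; validation and test images should just be rescaled. A minimal sketch of such a generator (the name test_datagen is my own):

# Validation/test images are only rescaled, never augmented
test_datagen <- image_data_generator(rescale = 1/255)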

Plot some examples of randomly transformed images

# Pick one training image and load it at the network's input size
fnames <- list.files(train_female_dir, full.names = TRUE)
img_path <- fnames[[3]]
img <- image_load(img_path, target_size = c(150, 150))
img_array <- image_to_array(img)
img_array <- array_reshape(img_array, c(1, 150, 150, 3))

# Generator that yields randomly transformed versions of this one image
augmentation_generator <- flow_images_from_data(
  img_array,
  generator = datagen,
  batch_size = 1
)

# Plot four augmented variants in a 2 x 2 grid
op <- par(mfrow = c(2, 2), pty = "s", mar = c(1, 0, 1, 0))
for (i in 1:4) {
  batch <- generator_next(augmentation_generator)
  plot(as.raster(batch[1,,,]))
}
par(op)