This is a simple CNN (convolutional neural network) approach to identifying whether a spider in a top-down cropped photo is male or female. I wrote it with behavioral video tracking in mind, where a fast and reliable way to identify the sex of ‘blobs’ within video frames can be helpful (for instance, when a tracking program switches the identity of a subject after a close interaction). The approach is based largely on the excellent “Deep Learning with R” by F. Chollet and J. J. Allaire. It currently classifies my model species Habronattus pyrrithrix at >96% accuracy; that figure could be pushed further by retraining the model with a larger batch of training images, and the approach can be adapted to a different species or extended to work across species given a sufficiently large training image set.
Load required packages.
library(keras)
library(stringr)
library(dplyr)
library(ggplot2)
theme_set(theme_bw())
Images from both sexes should be in a single folder, with file names such as “male.27.png” or “female.3.png”. This chunk creates the folder structure that the images will be sorted into and fed to the network from later.
original_dataset_dir <- "hapy"
base_dir <- "hapy_small"
dir.create(base_dir)
train_dir <- file.path(base_dir, "train")
dir.create(train_dir)
validation_dir <- file.path(base_dir, "validation")
dir.create(validation_dir)
test_dir <- file.path(base_dir, "test")
dir.create(test_dir)
train_female_dir <- file.path(train_dir, "female")
dir.create(train_female_dir)
train_male_dir <- file.path(train_dir, "male")
dir.create(train_male_dir)
validation_female_dir <- file.path(validation_dir, "female")
dir.create(validation_female_dir)
validation_male_dir <- file.path(validation_dir, "male")
dir.create(validation_male_dir)
test_female_dir <- file.path(test_dir, "female")
dir.create(test_female_dir)
test_male_dir <- file.path(test_dir, "male")
dir.create(test_male_dir)
The original image set is now ready to be split and copied into the folder structure we just set up. We’ll use an 80/20 training/test split, with the training set further split 80/20 into training and validation images (so 64/16/20 of the original set).
images <- list.files(original_dataset_dir)
n.female <- sum(str_detect(images, "female"))
n.male <- sum(!str_detect(images, "female"))
idx.f.train <- sample(1:n.female, n.female * 0.64)
idx.f.val <- sample((1:n.female)[-idx.f.train], n.female * 0.16)
idx.f.test <- sample((1:n.female)[-c(idx.f.train, idx.f.val)], n.female * 0.2)
idx.m.train <- sample(1:n.male, n.male * 0.64)
idx.m.val <- sample((1:n.male)[-idx.m.train], n.male * 0.16)
idx.m.test <- sample((1:n.male)[-c(idx.m.train, idx.m.val)], n.male * 0.2)
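Indexing mistakes here would silently leak images between splits, so an optional sanity check that the index sets are disjoint can be worthwhile:
# Optional: confirm that no image index appears in more than one split
stopifnot(length(intersect(idx.f.train, c(idx.f.val, idx.f.test))) == 0,
          length(intersect(idx.f.val, idx.f.test)) == 0,
          length(intersect(idx.m.train, c(idx.m.val, idx.m.test))) == 0,
          length(intersect(idx.m.val, idx.m.test)) == 0)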
fnames <- paste0("female.", idx.f.train, ".png")
file.copy(file.path(original_dataset_dir, fnames),
file.path(train_female_dir))
fnames <- paste0("female.", idx.f.val, ".png")
file.copy(file.path(original_dataset_dir, fnames),
file.path(validation_female_dir))
fnames <- paste0("female.", idx.f.test, ".png")
file.copy(file.path(original_dataset_dir, fnames),
file.path(test_female_dir))
fnames <- paste0("male.", idx.m.train, ".png")
file.copy(file.path(original_dataset_dir, fnames),
file.path(train_male_dir))
fnames <- paste0("male.", idx.m.val, ".png")
file.copy(file.path(original_dataset_dir, fnames),
file.path(validation_male_dir))
fnames <- paste0("male.", idx.m.test, ".png")
file.copy(file.path(original_dataset_dir, fnames),
file.path(test_male_dir))
Time to set up the network (or model). It consists of a stack of convolution layers that filter the source images into increasingly abstract visual “modules”, pooling layers that downsample the resulting feature maps, and a big dense layer (the actual “neural” part), preceded by a dropout layer to reduce overfitting and followed by a single-unit sigmoid classification layer that makes the final male/female call.
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
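To check that the layer output shapes and parameter counts look sensible before compiling, you can print a summary of the model:
summary(model)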
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-4),
  metrics = c("acc")
)
For this application I used only a very small number of training images. That means characteristics of the photos themselves (white balance, subject orientation, etc.) can have a large undesirable impact on classification. In this step, we set up a generator that applies random transformations (rotations, shifts, shear, zoom, colour shifts, and horizontal flips) to the training images, augmenting the dataset.
datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  channel_shift_range = 25,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)
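Note that augmentation should only be applied to the training images; the validation and test images should just be rescaled. A rescale-only generator (the test_datagen name below is just illustrative) would look like this, for use later when flowing images from the validation and test directories:
test_datagen <- image_data_generator(rescale = 1/255)
To see what the augmentation transformations do, the next chunk takes a single training image and plots four randomly augmented versions of it.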
fnames <- list.files(train_female_dir, full.names = TRUE)
img_path <- fnames[[3]]
img <- image_load(img_path, target_size = c(150, 150))
img_array <- image_to_array(img)
img_array <- array_reshape(img_array, c(1, 150, 150, 3))
augmentation_generator <- flow_images_from_data(
  img_array,
  generator = datagen,
  batch_size = 1
)
op <- par(mfrow = c(2, 2), pty = "s", mar = c(1, 0, 1, 0))
for (i in 1:4) {
  batch <- generator_next(augmentation_generator)
  plot(as.raster(batch[1,,,]))
}
par(op)