Hello there, I wanted to share a project I completed during my internship in Barcelona. After finishing the project I also wrote a blog post about its technical aspects, and you can find the original text here:

Machine Learning on Mobile Platforms

If you are already familiar with machine learning, computer vision and transfer learning, you can skip the intro.

  • Intro
    • Frameworks and Other Stuff
  • Comparing Models and Training
    • Updating and Training Models
      • Loading Data
      • Overcoming Memory Limit
    • Comparing Models
    • Testing Final Model
  • Compressing Model
    • Quantization
  • Implementing For Mobile
  • Further Developments
    • Full Quantization
      • Quantization-Aware Training
    • YOLO
  • Conclusion
  • Tips and Tricks

Intro

In recent years, machine learning has started to enter our daily lives in many fields. Machine learning algorithms work on a simple principle: compute a statistical model from a given dataset that achieves the highest accuracy on another dataset, different from the first one. Of course this is a really simplified definition, and you can search online for more info since that subject is out of scope for this post. One of the fields that has benefited the most from machine learning is computer vision. This field deals with developing methods for computers to gain a high-level understanding of digital images and videos.

Many machine learning models have been developed that can perform different tasks on digital images and videos. For example, there is YOLO, which can detect object types and locations in a picture in real time, and there is InceptionV3, a model for classifying images. InceptionV3 and a couple of other models are used in transfer learning to speed up the model training phase. Transfer learning is the process of reusing certain layers of a pre-trained network, together with their weights, in a new model. It is done to train the final network faster and with higher accuracy.
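
As a minimal sketch of the idea (not one of the models used in this project), a Keras transfer learning setup on top of InceptionV3 could look roughly like this, assuming 120 output classes:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# Load InceptionV3 with ImageNet weights but without its classification head
base = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pre-trained layers and their weights

# Attach a new head for our own classes (120 dog breeds here)
x = GlobalAveragePooling2D()(base.output)
out = Dense(120, activation='softmax')(x)
model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Only the small new head gets trained; the frozen InceptionV3 layers act as a fixed feature extractor, which is what makes training faster.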

In this post we focus only on the classification task in computer vision. Classification is trying to determine which class a given picture belongs to; here, we try to classify a dog picture according to its breed. After we successfully classify the image, we are going to implement the model in a mobile application, and running the model on a mobile device brings some constraints we need to be careful about.

Frameworks and Other Stuff

In this project, using TensorFlow is a must, since TensorFlow Lite allows running trained TensorFlow models for inference on iOS and Android. Like most machine learning projects, all of our models use Python. All of the models used in this project, except the fenglf model, have a dependency on Keras. Besides that, the Pillow, h5py, scipy, numpy and pandas libraries were used for various tasks such as loading images from files, creating train and test datasets, saving calculated values and so on.

Comparing Models and Training

The first step of this project is building or finding a machine learning model that can run with high accuracy on unseen data. Since building and training a model from scratch is a costly and time-consuming operation, already existing models were used for this project. Every model used was designed for and trained on the Stanford Dogs Dataset or the Kaggle dog breed competition, which gets its data from the Stanford Dogs Dataset. A couple of models were used in this project. You can find more information in their repos:

Updating and Training Models

To be able to run the models, we need to update a couple of dependencies, fix some code that broke because of framework and Python updates, and more… you know, tedious things I don't want to bore you with. After fixing everything up and finally being able to run the models with Python, Keras and TensorFlow, there is a bottleneck to deal with: processing power. To overcome this problem we can use Google Colab. After a while, though, using a personal computer with a GPU seemed more practical, since we can keep the data around without having to download it every time. But it soon turned out that the real bottleneck was not the CPU or GPU; it was the memory.

Loading Data

Since the models are written for different file structures, we need to implement a system for getting the files from the dataset. The Stanford Dogs Dataset keeps all the images in one folder, grouped by dog breed. We can parse the folders in the root image folder to get the images and their labels:

import os
from tqdm import tqdm

def get_files(path='/Images'):
    filenames = []
    labels = []
    label_names = []
    breeds = os.listdir(path)
    for i, breed in tqdm(enumerate(breeds)):
        # Each folder is one breed; the folder's index becomes the numeric label
        breed_path = os.path.join(path, breed)
        pics = os.listdir(breed_path)
        filenames += pics
        labels += [i] * len(pics)
        label_names.append(breed)
    return filenames, labels, label_names

After that, we can split the data into train and test sets.

from sklearn.model_selection import train_test_split

f,l,n = get_files()
f_train, f_test, l_train, l_test = train_test_split(f, l, test_size=0.1, shuffle = True, stratify=l)

Don't forget that those are just filenames and not the data itself. Since the dataset is too big, loading it all at once creates memory issues.

Overcoming Memory Limit

To overcome the memory limit, there are two possible options: using an HDF5 file or, if the model uses Keras, using ImageDataGenerator. To use an HDF5 file, we load the images one by one and save them into the same HDF5 file. This is a really simple implementation, and we can use the data just as if it were in memory.

import h5py

DIM = 299  # image side length; 299 matches the InceptionV3 input size used later

hdf5_store = h5py.File(os.path.join(os.getcwd(), 'cache_train.hdf5'), 'a')
data_train = hdf5_store.create_dataset('data', (len(f_train), DIM, DIM, 3), compression='gzip')
hdf5_store_test = h5py.File(os.path.join(os.getcwd(), 'cache_test.hdf5'), 'a')
data_test = hdf5_store_test.create_dataset('data', (len(f_test), DIM, DIM, 3), compression='gzip')
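
The datasets above start out empty. A minimal sketch of the "load one by one" step (assuming the f_train/l_train lists from before, the breed folder names in n, and a defaultPath pointing at the dataset root) could look like this:

import numpy as np
from PIL import Image

# Hypothetical fill loop: resize each image and write it straight into the HDF5 dataset
for i, filename in enumerate(f_train):
    img = Image.open(os.path.join(defaultPath, n[l_train[i]], filename)).convert('RGB')
    data_train[i] = np.asarray(img.resize((DIM, DIM)), dtype=np.float32)
hdf5_store.flush()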

Now we can use data_train and data_test as if they were in memory. However, this is not as fast as Keras's ImageDataGenerator. Since Keras pipelines the data loading while running the model, it processes data much faster than the h5 method, but it requires the model to be written with Keras. Also, if you don't want to write a custom class, it requires a specific folder structure:

-Images
 ├train
 | ├0
 | | ├image1.jpg
 | | :
 | | └imagex.jpg
 | ├1
 | | ├image1.jpg
 | | :
 | | └imagex.jpg
 | :
 | |
 | └119
 |   ├image1.jpg
 |   :
 |   └imagex.jpg
 └validation
   ├0
   | ├image1.jpg
   | :
   | └imagex.jpg
   ├1
   | ├image1.jpg
   | :
   | └imagex.jpg
   :
   |
   └119
     ├image1.jpg
     :
     └imagex.jpg

Keep in mind that the numbers from 0 to 119 represent the dog breeds and are used as the labels in the data. Image names and file paths are ignored as long as the images are inside the folder matching their label. To create this folder structure:

from shutil import copyfile

def saveForGenerator(files, labels, names, path):
    path_to_save = os.path.join(os.getcwd(), 'Images')
    for i, filename in tqdm(enumerate(files)):
        # Create the label folder (e.g. Images/train/42) if it doesn't exist yet
        target_dir = os.path.join(path_to_save, path, str(labels[i]))
        if not os.path.exists(target_dir):
            os.makedirs(target_dir)
        # defaultPath is the root of the original dataset; names maps label index to breed folder
        copyfile(os.path.join(defaultPath, names[labels[i]], filename),
                 os.path.join(target_dir, filename))

saveForGenerator(f_train, l_train, n, 'train')
saveForGenerator(f_test, l_test, n, 'validation')

After the folder structure is created, we can point the generator at the path and use it as our dataset:

from keras.preprocessing.image import ImageDataGenerator
import os

train_data_dir=os.path.join(os.getcwd(),'Images','train') #Train data path
test_data_dir=os.path.join(os.getcwd(),'Images','validation') #Test data path
img_size=299
batch_size=32

train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_size, img_size),
    batch_size=batch_size,
    class_mode='categorical')

test_datagen = ImageDataGenerator()
test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_size, img_size),
    batch_size=batch_size,
    class_mode='categorical')

Now we can use the generators as follows:

history_ft = model.fit_generator(
    train_generator,
    steps_per_epoch=len(train_generator.filenames) // batch_size,
    epochs=5,
    validation_data=test_generator,
    validation_steps=len(test_generator.filenames) // batch_size)

The fenglf models already use a generator; however, we still need to update the paths in the code and create the folder structure for it to be useful. The andre15 models don't use one, so we need to modify them. After constructing the generators, changing model.fit to model.fit_generator is enough; Keras automatically does everything else. The stormy-ua model, however, uses an approach similar to the h5 file approach: it writes all the images into a TFRecords file before training the model.

Comparing Models

After successfully running the models, even though they showed over 90% accuracy on training data, I couldn't get more than 20% accuracy on validation data during training epochs for the fenglf models and andre15's CNN. I believe these models are overfitting the data, or I changed something that caused them to overfit. Either way, these models were not really useful in this state. I also couldn't get andre15's transfer learning method to run within a reasonable time limit. After these disappointing results, I decided to stop using these models and moved on to stormy-ua's model. After updating the code a little, I managed to get 89% accuracy on unseen data. It also has the benefit of being implemented in pure TensorFlow, which eliminates the need to transform a Keras model into a pure TensorFlow model. After freezing the model and getting the frozen model as a TensorFlow (.pb) file, it is almost ready to deploy on mobile platforms. You can also use the Docker container in stormy-ua's repository to get a pre-trained .pb file.
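
For reference, freezing is the step that bakes the trained variables into constants inside a single graph file. A minimal TF1-style sketch (assuming a hypothetical checkpoint named model.ckpt; stormy-ua's repo ships its own freezing script) could look like this:

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Session() as sess:
    # Restore the trained graph and weights from a checkpoint (name assumed here)
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')
    # Convert every variable reachable from 'output_node' into a constant
    frozen = graph_util.convert_variables_to_constants(sess, sess.graph_def, ['output_node'])
    with tf.gfile.GFile('saved_model.pb', 'wb') as f:
        f.write(frozen.SerializeToString())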

Testing Final Model

After selecting the model, we need to test it on another unseen dataset once more to be sure of the results. However, we needed to create that dataset ourselves, since no dataset other than the Stanford Dogs Dataset was available. To do that, a Python library called google_images_download was used. This library downloads the images from Google image search results one by one. The breed names we already have were used as search queries to download 50 pictures per breed. After that, each image was reviewed manually to make sure it fits. A couple of breed-name queries were completely wrong and returned completely different search results; those queries were fixed by adding the keyword "dog", then downloaded and reviewed again. After completing this part, we had a completely unseen dataset of dog breed pictures. It is hard to describe it as a 'clean' dataset, but it's good enough for testing.

import os
import pandas as pd
from google_images_download import google_images_download

breeds_path = os.path.join(os.getcwd(),'dog-breeds-classification','data','breeds.csv')
IMAGE_COUNT = 50
data = pd.read_csv(breeds_path)
response = google_images_download.googleimagesdownload()   #class instantiation
arguments = {'keywords':','.join(data['breed']).replace('-',' ').replace('_',' '),'limit':IMAGE_COUNT,'silent_mode':True}
paths = response.download(arguments)

After creating the new dataset, it is time to use the TensorFlow file to determine our categorical accuracy. This accuracy is calculated by checking whether the index of the maximum value in the output layer equals the label. The output layer gives 120 values, each corresponding to the probability of one class, and the values sum to one.
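
In isolation, with made-up numbers, the check looks like this (the full test script below does the same thing via breed-name lookups):

import numpy as np

probs = np.random.rand(120)
probs /= probs.sum()                          # 120 class probabilities summing to one
label = 42                                    # hypothetical ground-truth class index
top1 = np.argmax(probs) == label              # categorical accuracy check
top5 = label in np.argsort(probs)[::-1][:5]   # top-5 variant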

import tensorflow as tf
import numpy as np
from tqdm import tqdm
import os
import pandas as pd
from sklearn import preprocessing
from PIL import Image

main_path = os.getcwd()
path_to_pb = os.path.join(main_path,'dog-breeds-classification','saved_model.pb')
breed_path = os.path.join(main_path,'dog-breeds-classification','data','breeds.csv')

#Taken from dog-breeds-classification/src/data_preparation/dataset.py
def one_hot_label_encoder():
    train_Y_orig = pd.read_csv(breed_path, dtype={'breed': np.str})
    lb = preprocessing.LabelBinarizer()
    lb.fit(train_Y_orig['breed'])

    def encode(labels):
        return np.asarray(lb.transform(labels), dtype=np.float32)

    def decode(one_hots):
        return np.asarray(lb.inverse_transform(one_hots), dtype=np.str)

    return encode, decode

encode, decode = one_hot_label_encoder()
breeds = decode(np.identity(120)).reshape(-1)

files, labels, names_dataset = get_files(os.path.join(main_path, 'downloads'))  # get_files() from earlier in this post

names = [b.lower().replace('-', ' ').replace('_', ' ') for b in breeds]
label_names = [n.lower() for n in names_dataset]

# Test Images
true = 0
in_first_3 = 0
in_first_5 = 0
with tf.gfile.GFile(path_to_pb, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')
    g = tf.get_default_graph()
    # Tensor names are predefined by InceptionV3 and stormy-ua's model
    input = g.get_tensor_by_name('Cast:0')
    output = g.get_tensor_by_name('output_node:0')
    with tf.Session().as_default() as sess:
        for i in tqdm(range(len(files))):
            img_raw = np.asarray(Image.open(os.path.join(main_path, 'downloads', names_dataset[labels[i]], files[i])).convert('RGB'))
            probs = sess.run(output, feed_dict={input: img_raw})
            df = pd.DataFrame(data={'prob': probs.reshape(-1), 'breed': breeds})
            sorted_idx = np.array(df.sort_values(['prob'], ascending=False).index)
            if names[sorted_idx[0]] == label_names[labels[i]]: true += 1
            if label_names[labels[i]] in [names[k] for k in sorted_idx[0:3]]: in_first_3 += 1
            if label_names[labels[i]] in [names[k] for k in sorted_idx[0:5]]: in_first_5 += 1
print(true/len(files))
print(in_first_3/len(files))
print(in_first_5/len(files))

After testing the .pb file on this unseen dataset, we got 84% accuracy for the top guess, 95% for the label being in the first 3 guesses, and 97% for the first 5. The model seems to be doing a good job; however, the file size is 96.3 MB, which is too big for a mobile platform. We need to shrink it and test it again to see how much accuracy we lose and how much we can shrink it.

Compressing Model

Compressing the model seems like a hard job, but thanks to the features of TensorFlow and TensorFlow Lite we can shrink models a lot without losing much accuracy. There are a couple of methods we can apply. We can redesign the model with fewer layers and nodes, which would take a lot of time and doesn't guarantee enough accuracy. Or we can quantize the model.

Quantization

Quantizing a model means using integer values instead of float values for the weights. This method shrinks the model and speeds up inference. It is done by taking the maximum and minimum values in each node and mapping the float numbers onto integers. You can think of it like this: instead of multiplying by 0.00003, we multiply by 3 and later, in the final layer, multiply by 0.00001. All the float values become integers, and the difference in scale is accounted for in the final layer with significantly fewer float operations. Of course this is a huge oversimplification; fully explaining it requires signals and systems, which is really out of scope for this post.
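
As a toy sketch of that min/max mapping (not TensorFlow's exact scheme), 8-bit affine quantization of a weight array looks roughly like this:

import numpy as np

def quantize(weights):
    # Map floats onto 0..255 using the array's min/max range
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / 255.0
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    # Recover approximate floats; the error is at most half a step
    return q.astype(np.float32) * scale + w_min

w = np.random.randn(4).astype(np.float32)
q, scale, w_min = quantize(w)
print(w, dequantize(q, scale, w_min))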

There are two ways to quantize a given model: during training or after training. To quantize during training, we add simulated quantization nodes to the model, train with them, then remove those nodes and quantize the model after training, which gives a more accurate quantized model. This is called quantization-aware training. However, this method is not really stable and requires modifying and re-training the model. The other option is post-training quantization. This method is implemented in TensorFlow Lite and uses a given model file to create a new model file specific to the TensorFlow Lite platform, which helps us a lot since we want to use TensorFlow Lite on mobile. Keep in mind this method requires defined shapes for the input and output layers. Since we have JPEG decode layers with undefined shapes, we can skip those layers and use the first layer with a shape as our input layer. TensorFlow Lite doesn't support JPEG decode layers anyway, and on mobile we will be working with raw pixel data, not JPEG-encoded files.

import os
import tensorflow as tf

localpb = os.path.join(os.getcwd(), 'dog-breeds-classification', 'saved_model.pb')
tflite_file = 'model.lite'

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    localpb,
    ['Sub'], #Input layer after the JPEG decode layers. This is the first layer that has a shape
    ['output_node'] #Output layer
)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_model = converter.convert()
open(tflite_file,'wb').write(tflite_model)

This method shrank our file from 96.3 MB to 30.7 MB, which is a great improvement, but we need to see how it performs in accuracy. Luckily for us, TensorFlow provides a way to run TensorFlow Lite files (usually .lite or .tflite) on our computers.

# Assuming we already have os, np, Image, tqdm and the names, breeds, files, labels, names_dataset variables from earlier in this post
import tensorflow as tf

# Load TFLite model and allocate tensors.
path = os.path.join(os.getcwd(),'model.lite')
interpreter = tf.lite.Interpreter(model_path=path)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']

true = 0
in_first_3 = 0
in_first_5 = 0

# names and label_names are reused as prepared for the previous test

for i in tqdm(range(len(files))):
    img = Image.open(os.path.join(main_path, 'downloads', names_dataset[labels[i]], files[i])).convert('RGB')
    # The lite model needs a fixed input size; input_shape comes from the interpreter
    img_raw = np.array([np.array(img.resize(tuple(input_shape[1:3]))).astype(np.float32)])
    interpreter.set_tensor(input_details[0]['index'], img_raw)
    interpreter.invoke()
    probs = interpreter.get_tensor(output_details[0]['index'])
    sorted_idx = np.argsort(probs[0])[::-1]  # class indices from most to least probable
    if names[sorted_idx[0]] == label_names[labels[i]]: true += 1
    if label_names[labels[i]] in [names[k] for k in sorted_idx[0:3]]: in_first_3 += 1
    if label_names[labels[i]] in [names[k] for k in sorted_idx[0:5]]: in_first_5 += 1
print(true/len(files))
print(in_first_3/len(files))
print(in_first_5/len(files))

After running the lite model on the unseen dataset, we got 79% accuracy for the top guess, 93% for the label being in the first 3 guesses, and 95% for the first 5, which is pretty good for a compressed model. Now we can use this model in our application. To further improve the results, we can try full quantization with a representative dataset later on.

Implementing For Mobile

To implement this model we are going to need TensorFlow Lite for Android and iOS, and some functions we can find online. After adding the lite file to the Assets folder for Android, and into our project for iOS, we can implement the code. For implementing the interpreter on Android you can check here and here, and for iOS you can check here. To convert UIImage to CVPixelBuffer you can take a look here. You can also convert UIImage to Data without using CVPixelBuffer; however, this method can cause crashes since CGBitmapInfo cannot support both byteOrder32Big and byteOrder16Big.

Further Developments

Full Quantization

TensorFlow lets us use a representative dataset to take quantization to its full extent: it uses the representative dataset to create a better mapping from float values to integer values and quantizes every single value, instead of quantizing some values and leaving others as floats. To enable full quantization, we used 10% of the dataset and tried to create a TensorFlow Lite file again:

# Assuming we still have get_files(), defaultPath and localpb from earlier in this post
import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split

tflite_quant_file = 'model_fully_quantized.lite'

files, l, n = get_files()
# Take 10% of the data; we only use f_test here since we only need that 10%
f_train, f_test, l_train, l_test = train_test_split(files, l, test_size=0.1, shuffle=True, stratify=l)

# A generator-style function that yields representative samples; TensorFlow Lite calls it during conversion
num_calibration_steps = len(f_test)
def representative_dataset_gen():
    for i in range(num_calibration_steps):
        img = Image.open(os.path.join(defaultPath, n[l_test[i]], f_test[i])).convert('RGB').resize((299, 299))
        yield [np.array([np.array(img).astype(np.float32)])]

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    localpb,  # the frozen .pb file from before
    ['Sub'],
    ['output_node']
)

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
open(tflite_quant_file, 'wb').write(tflite_quant_model)

However, using this method with the final model gives an error for which I couldn't find a solution. It is possible that the model contains an operation that TensorFlow's quantization doesn't support. Therefore we couldn't use this method.

Quantization-Aware Training

To train our model with quantization-aware training, we need to add quantization nodes that simulate quantization during training. Luckily for us, TensorFlow already provides an easy way to do this: adding a single line of code before training is enough.

tf.contrib.quantize.create_training_graph(input_graph=g)

However, we need to save the model in a different way to use it when creating an evaluation graph.

tf.contrib.quantize.create_eval_graph(input_graph=g)
with open(paths.EVAL_GRAPH_FILE, "w") as f:
    f.write(str(g.as_graph_def()))
    saver = tf.train.Saver()
    saver.save(sess, paths.EVAL_CHECKPOINT)

Then we can use the saved file, together with the TensorFlow file representing the quantization-simulated model, to create a fully quantized graph. For more info you can check here. Unfortunately, the final model uses transfer learning and is not really compatible with the quantized InceptionV3 model, and TensorFlow doesn't support a quantized model in a way that can run with the method used in the stormy-ua model. Or at least I couldn't get it to run. Therefore we couldn't use this method.

YOLO

Using the YOLO model could help with finding the dog in the picture and cropping it out to get more accurate results. However, the YOLO model is not exactly reliable and has its own limitations.

These limitations include not finding the dog in every picture if the threshold is not set correctly, not finding the dog if there are multiple dogs in the picture, and not detecting multiple dogs if they are really close to each other. Our current model also doesn't support images with multiple dogs, but it can still give the correct result if those dogs belong to the same breed; the problem statement simply becomes invalid when there are multiple dogs of different breeds. Besides those setbacks, the YOLO model adds another 10 MB, and more code needs to be added for running it and cropping the image according to its output, which is really complex, with multiple probabilities for position and object class. Another issue is that the model would need to be trained again after YOLO is applied to the training set to get accurate results. Considering all that, we opted out of training and testing the model again with YOLO.

Another option would be ditching the current model and training only the YOLO model with extra dog breed classes. However, this method is not expected to give good results, since it would add 120 extra classes to the already existing YOLO classes, and it would require defining bounding boxes for every dog in every image.

To implement YOLO with TensorFlow Lite, we can implement YOLO with TensorFlow first and then use the quantization methods discussed before. For more info on implementing YOLO with TensorFlow, check here.

Conclusion

This project was a great challenge in finding an optimal method, and thanks to the amazing engineers behind TensorFlow I managed to overcome it. After creating the mobile applications in a simple way, the project was ready to ship with an offline model that can classify an image in approximately 1 second. The application weighs about 70 MB in total: the 30 MB machine learning model, plus the extra data for offline images and the application's own code on top.

Tips and Tricks

Updating multiple machine learning models to run on current frameworks and libraries was a little hard, but it is doable. The trick is reading the documentation of the faulty function and checking the context in which it was used; after understanding the faulty function's role, it is possible to modify it according to the documentation. Sometimes nothing solves your problem, and just downgrading the framework can help. Another challenge in this project was the memory issue; even though some of the implemented models already deal with it, sometimes you need to search for a solution, and if one cannot be found online, check the source code of already-implemented models; that code can teach a lot. Finally, if a new dataset is needed, a parser and a search engine can do a lot of the work instead of manual labor. You still need to check every picture quickly to make sure the dataset is okay, but it is definitely much faster than building a dataset manually.