Implementation of a Quantum Random Forest
This article walks you through an exploration of a Quantum Random Forest (QRF), drawing on the original paper, the authors' GitHub repository, and some adjustments of my own.
The paper "A kernel-based quantum random forest for improved classification" by Srikumar et al., presents a Quantum Random Forest (QRF) model. This model extends the linear quantum support vector machine (QSVM) by including a kernel function via quantum kernel estimation (QKE), forming a decision tree classifier. The QRF aims to address the limitations of previous quantum models. Key aspects include developing a decision tree structure with QSVM nodes, incorporating a low-rank Nyström approximation to mitigate overfitting, and theoretical guarantees to limit finite sampling errors. The QRF shows improved performance over QSVMs, especially in multi-class classification problems, and requires fewer kernel estimations.
The conclusion emphasizes that the QRF, unlike QNNs and QSVMs, is non-linear, and that it works better on datasets where the quantum embedding does not perfectly separate instances. Its probabilistic output is also beneficial for multi-class problems. The paper suggests potential enhancements and acknowledges the need for further exploration of hyperparameters and quantum split function optimization.
Fortunately, the authors shared the code on GitHub for our tests and explorations. The repository can be found here.
In this article, we will follow the repository code step by step, testing at the end with the UCI Credit Card dataset. It is crucial to download all the *.py modules available in the repository so you don't run into issues when testing the "example" Jupyter Notebook. The simplest way to get the whole repository is to click "<> Code" and then "Download ZIP". Also remember that the libraries must be installed with the same versions the authors used:
- cirq==0.11.0
- cirq-core==0.11.0
- matplotlib==3.4.2
- more-itertools==8.8.0
- numpy==1.19.5
- pandas==1.3.0
- qiskit==0.27.0
- scikit-learn==0.24.2
- scipy==1.7.0
- tqdm==4.61.1
- tensorflow==2.4.1
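Assuming a pip-based environment (a fresh virtual environment is advisable, given how old these pins are), one way to install everything at the pinned versions is a single command:
pip install cirq==0.11.0 cirq-core==0.11.0 matplotlib==3.4.2 more-itertools==8.8.0 numpy==1.19.5 pandas==1.3.0 qiskit==0.27.0 scikit-learn==0.24.2 scipy==1.7.0 tqdm==4.61.1 tensorflow==2.4.1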
Set up your environment
from quantum_random_forest import QuantumRandomForest, set_multiprocessing
from split_function import SplitCriterion
from data_construction import data_preprocessing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics, datasets
from sklearn.model_selection import train_test_split
Load and adapt your dataset
Remember to adapt your datasets during the preprocessing phase. You can follow the instructions shared in this article:
Dimensionality Reduction for Quantum Machine Learning: Integrating LDA and K-Means
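As a rough illustration (my own sketch, not the code from that article): for a binary problem, LDA yields at most n_classes - 1 = 1 component, so one way to reach the two features needed for n_qubits = 2 is to pair the LDA component with a K-Means-derived feature. The function name here is hypothetical:
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import MinMaxScaler

def reduce_to_two_features(X_raw, y_raw):
    # One supervised feature from LDA (binary labels allow a single component)
    lda_feat = LinearDiscriminantAnalysis(n_components=1).fit_transform(X_raw, y_raw)
    # One unsupervised feature: distance to the nearest K-Means centroid
    kmeans = KMeans(n_clusters=2, random_state=42).fit(X_raw)
    km_feat = kmeans.transform(X_raw).min(axis=1, keepdims=True)
    X_2d = np.hstack([lda_feat, km_feat])
    # Rescale so the features sit in a range suitable for the quantum embedding
    return MinMaxScaler().fit_transform(X_2d)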
It is important to generate a training_set and testing_set for the later stages of the code. The authors' example already includes a data_preprocessing function, but I suggest editing it and creating your own transformations so you can experiment with variations until you eventually get better results. If you made your own adjustments in the preprocessing phase, consider the following change:
Remove this:
training_set, testing_set = data_preprocessing(X, y,
train_prop=0.75,
X_dim=2)
And add this:
training_set = pd.DataFrame(zip(X_train, y_train), columns=['X', 'y'])
testing_set = pd.DataFrame(zip(X_test, y_test), columns=['X', 'y'])
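Here X_train, X_test, y_train, and y_test are assumed to come from a standard split, for example with the train_test_split imported earlier (test_size = 0.3 matches the exploration at the end of this article):
# Assumed upstream split; X and y are your preprocessed features and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)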
Setting Model Parameters
n_qubits = 2
dt_type = 'qke'
ensemble_var = None
branch_var = ['eff_anz_pqc_arch',
'iqp_anz_pqc_arch',
'eff_anz_pqc_arch']
num_trees = 3
split_num = 2
pqc_sample_num = 2024
num_classes = 2
max_depth = 4
num_params_split = n_qubits * (n_qubits + 1)
num_rand_gen = 1
num_rand_meas_q = n_qubits
svm_num_train = 5
svm_c = 20
min_samples_split = svm_num_train
embedding_type = ['as_params_all',
'as_params_iqp',
'as_params_all']
criterion = SplitCriterion.init_info_gain('clas')
device = 'cirq'
Purpose: To establish the necessary parameters for building and training the Quantum Random Forest model.
Parameter Details:
- n_qubits: The number of qubits used for quantum embedding.
- ensemble_var, dt_type, branch_var: Settings for the ensemble and decision tree types, including the ansatz types used at different levels of the tree.
- num_trees: The number of trees in the ensemble.
- split_num, pqc_sample_num, svm_num_train, svm_c: Parameters related to quantum circuit samples, the SVM landmark number, and SVM optimization.
- max_depth, num_params_split, embedding_type: Settings for the maximum tree depth, the number of parameters in the embedding (with n_qubits = 2, this gives 2 × (2 + 1) = 6), and the type of quantum embedding.
- criterion, device: The criterion for splitting nodes and the quantum computing device or simulator to use.
Model setup
qrf = QuantumRandomForest(n_qubits, 'clas', num_trees, criterion,
                          max_depth=max_depth,
                          min_samples_split=min_samples_split,
                          tree_split_num=split_num,
                          num_rand_meas_q=num_rand_meas_q,
                          ensemble_var=ensemble_var,
                          dt_type=dt_type,
                          num_classes=num_classes,
                          ensemble_vote_type='ave',
                          num_params_split=num_params_split,
                          num_rand_gen=num_rand_gen,
                          pqc_sample_num=pqc_sample_num,
                          embed=embedding_type,
                          branch_var=branch_var,
                          svm_num_train=svm_num_train,
                          svm_c=svm_c,
                          nystrom_approx=True,
                          device=device)
Purpose: To initialize the Quantum Random Forest model with the specified parameters.
How It Works:
An instance of QuantumRandomForest is created with the defined parameters, including the number of qubits, the tree structure, the embedding types, the SVM settings, and the device for quantum computation.
Training the Model
cores = 6
set_multiprocessing(True, cores)
qrf.train(training_set, partition_sample_size=100)
Purpose: To train the Quantum Random Forest model on the training dataset.
How It Works:
Enables multiprocessing with the specified number of cores to parallelize the training process.
The train method of the QRF model is called with the training data. partition_sample_size indicates the size of the data subset each tree in the ensemble receives.
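If multiprocessing causes trouble on your platform (simulators sometimes misbehave with forked processes), my assumption, based only on the signature shown above and not verified against the repository, is that the first argument toggles it:
set_multiprocessing(False, 1)  # assumption: False disables parallel training (unverified)
qrf.train(training_set, partition_sample_size=100)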
Testing the Model
acc, preds_qrf = qrf.test(testing_set, ret_pred=True, parallel=False, calc_tree_corr=True)
Purpose: To test the QRF model on the testing dataset and evaluate its performance.
How It Works:
The test method evaluates the QRF model on the testing set. It returns the accuracy and the predictions, and it also calculates the correlation between trees if calc_tree_corr is True.
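As a quick sanity check (my addition, using plain scikit-learn), the returned accuracy can be recomputed from the predictions:
acc_manual = metrics.accuracy_score(testing_set.y, preds_qrf)  # should match the acc returned by qrf.test
print(acc_manual)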
Analyzing the Model
print(metrics.classification_report(testing_set.y, preds_qrf))
print(metrics.roc_auc_score(testing_set.y, preds_qrf))
Purpose: To provide a detailed classification report and calculate the AUC (Area Under the Curve) score.
How It Works:
Uses Scikit-Learn's metrics to print the classification report and compute the AUC score, offering insights into the model's performance.
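For a per-class view of the errors (again my addition, standard scikit-learn), the confusion matrix is worth printing alongside the report:
print(metrics.confusion_matrix(testing_set.y, preds_qrf))  # rows: true classes, columns: predictions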
Exploration with the UCI Credit Card dataset
After performing dimensionality reduction with LDA and K-Means, a test was executed with the following adjusted parameters:
sample = 500
test_size = 0.3
n_qubits = 2
svm_c = 50
svm_num_train = 5
partition_sample_size = 100
And the results were the following:
Classification report for QRF:
              precision    recall  f1-score   support

           0       0.80      0.90      0.85       114
           1       0.50      0.31      0.38        36

    accuracy                           0.76       150
   macro avg       0.65      0.60      0.62       150
weighted avg       0.73      0.76      0.74       150
AUC for QRF:
0.60453
The result is clearly not satisfactory, but the outcome depends heavily on the parameters you choose at each stage of the code and on the dataset you decide to experiment with.