contextera

Wednesday, March 15, 2017

Working on Classification algorithm



After a long time, I got a chance today to work on classification in Python. I picked the world-favorite Iris dataset.

One of my teams submitted a very interesting idea as part of a hackathon and needed help with the classification logic. I am not revealing the idea itself, as the team might file a patent for it.

The code below is what I wrote to demonstrate how they can use classification to solve their problem.

I used the Iris dataset available at https://archive.ics.uci.edu/ml/datasets/Iris as a sample dataset.

To give a brief overview of this dataset: Iris is a flower, and the dataset contains measurements of 150 such flower samples.

Each sample (each row) in this dataset has the following 5 columns:
  1. sepal length in cm
  2. sepal width in cm
  3. petal length in cm
  4. petal width in cm
  5. class: one of "Iris Setosa", "Iris Versicolour" or "Iris Virginica"
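To get a quick feel for the data, here is a small sketch that loads the dataset and counts the samples per class. As an assumption on my part (to avoid a network fetch), it uses scikit-learn's bundled copy of Iris, whose class names drop the "Iris-" prefix; loading the UCI CSV directly works the same way.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled copy of the Iris dataset (the same 150 samples as the UCI file)
data = load_iris()
iris = pd.DataFrame(data.data, columns=["sepal length", "sepal width",
                                        "petal length", "petal width"])
iris["classification"] = [data.target_names[t] for t in data.target]

print(iris.shape)                             # 150 rows, 5 columns
print(iris["classification"].value_counts())  # 50 samples of each class
```

The dataset is perfectly balanced, which is one reason it is such a popular first example for classification.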

In the dataset, some rows use "Versicolor" instead of "Versicolour" (the US versus British spelling of "colour"), so I had to handle both spellings in my code.

Now, coming to the classification logic itself: as with any classification problem, I used this dataset to train a model. The model predicts the flower class given the four measurements [sepal length, sepal width, petal length, petal width].

Finally, below is the code that does the classification:


import pandas as pd
from sklearn.linear_model import SGDClassifier


def get_iris_data():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
    col_names = ["sepal length", "sepal width", "petal length", "petal width", "classification"]

    # The UCI file has no header row, so supply the column names ourselves
    iris = pd.read_csv(url, header=None, names=col_names)
    return iris


def classify(training_features, training_classification, test_features):
    # A linear SVM trained with stochastic gradient descent
    classifier = SGDClassifier(loss="hinge", penalty="l2")
    classifier.fit(training_features, training_classification)
    pred_class = classifier.predict([test_features])
    return pred_class


def print_class(ind):
    if ind == 1:
        return "Iris-setosa"
    elif ind == 2:
        return "Iris-versicolour"
    elif ind == 3:
        return "Iris-virginica"


def get_numeric_classification(iris_classification):
    trans_classification = []
    for iclass in iris_classification:
        if iclass == "Iris-setosa":
            trans_classification.append(1)
        # Accept both the US and British spellings
        elif iclass in ("Iris-versicolour", "Iris-versicolor"):
            trans_classification.append(2)
        elif iclass == "Iris-virginica":
            trans_classification.append(3)

    return trans_classification


# Get the Iris dataset
iris = get_iris_data()

# Except for "classification", all other columns are used to build the model
train_columns = ["sepal length", "sepal width", "petal length", "petal width"]

# Separate out the features used to build the model from the main dataset
iris_features = iris[train_columns]

# Separate out the classification column used to train the model
iris_classification = iris.classification

# Since the classifier cannot take strings, transform the class labels into integers
iris_transformed_classification = get_numeric_classification(iris_classification)

# Sample test data
test_data = [1.1, 2.3, 4.5, 2.8]

# The predicted class of the sample test data
pred_class = classify(iris_features, iris_transformed_classification, test_data)

print(pred_class)

# Transform the predicted class back into the class name
print(print_class(pred_class[0]))
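One thing the demo above does not do is measure how good the model actually is. A common sanity check is to hold out part of the data and score the model on it. The sketch below is one possible way to do that; it uses scikit-learn's bundled Iris copy and adds a StandardScaler, since SGD-based classifiers are sensitive to feature scale (both are my additions, not part of the original demo).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples so the model is scored on flowers it never saw,
# stratified so each class is equally represented in both halves
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Standardize the features before fitting, then train the same kind of
# hinge-loss SGD classifier as in the demo above
model = make_pipeline(StandardScaler(),
                      SGDClassifier(loss="hinge", penalty="l2", random_state=42))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("hold-out accuracy: %.2f" % accuracy)
```

Fixing the random seeds makes the run reproducible; without the scaler, the accuracy of SGD on this dataset can vary quite a bit from run to run.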
