Linear regression example using simdfied

Algorithms, Machine Learning, SIMD, simdfied

Let’s test-drive the simdfied library with a linear regression example.

We’ll use MLplayground.org, which uses simdfied for machine learning and can read CSV or MAT files. For example, here is a CSV file of house prices along with each house’s square footage and number of bedrooms:

square foot, #bedrooms, price
2461.68, 4, 467883
1872, 4, 385983

And so on…
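To make the data layout concrete, here is a small JavaScript sketch that turns CSV text like the one above into a feature matrix X (one `[square feet, bedrooms]` row per house) and a price vector y. `parseHouseCsv` is a hypothetical helper for illustration, not part of simdfied or MLplayground, and it assumes a plain comma-separated file with a single header line and no quoted fields.

```javascript
// Hypothetical helper (not a simdfied/MLplayground API): parse CSV text with a
// header line and rows of "square foot, #bedrooms, price" into X and y.
function parseHouseCsv(text) {
  const X = []; // feature rows: [square feet, bedrooms]
  const y = []; // target values: price
  const lines = text.trim().split("\n").filter((line) => line.trim() !== "");
  // Skip the header line; assumes no quoted fields or embedded commas.
  for (const line of lines.slice(1)) {
    const [sqft, bedrooms, price] = line.split(",").map((s) => Number(s.trim()));
    X.push([sqft, bedrooms]);
    y.push(price);
  }
  return { X, y };
}

// Only the two rows shown in the post; the rest of the file is elided.
const csv = `square foot, #bedrooms, price
2461.68, 4, 467883
1872, 4, 385983`;

const { X, y } = parseHouseCsv(csv);
console.log(X); // [ [ 2461.68, 4 ], [ 1872, 4 ] ]
console.log(y); // [ 467883, 385983 ]
```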

At first, the plot of our file looks like this:

linear_reg_step_1

In our case, y is a continuous value (a price), not a class label, so coloring the points by y makes no sense. Let’s set the y axis to price and the color axis to the number of bedrooms. Now we get:

linear_reg_step_2

This makes more visual sense. Now let’s run linear regression from the menu:

linear_reg_step_3

The cost goes down smoothly, but it could still go lower. The regression line itself, though, is far from straight.

The plot draws a line connecting the hypothesis function’s prediction for each row of the existing X data. Since our θ (“theta”) vector is not yet optimal, the predicted prices do not yet fall neatly along a single straight line, and since our data points are not sorted by the current x axis, connecting them in file order produces this zigzag, “polygraph”-like line.
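The effect is easy to reproduce outside MLplayground. This plain JavaScript sketch (not simdfied code) evaluates a two-feature hypothesis h(x) = θ0 + θ1·sqft + θ2·bedrooms for each row, then sorts the resulting points by square footage before they would be joined into a line. The θ values and the last two data rows are made up for illustration.

```javascript
// Two-feature linear hypothesis: h(x) = theta0 + theta1 * sqft + theta2 * bedrooms.
function hypothesis(theta, row) {
  return theta[0] + theta[1] * row[0] + theta[2] * row[1];
}

// Build [square feet, predicted price] points for a plot whose x axis is
// square footage. Sorting by the x value makes the connecting line run left
// to right instead of jumping back and forth in file order.
function predictionPoints(theta, X) {
  return X
    .map((row) => [row[0], hypothesis(theta, row)])
    .sort((a, b) => a[0] - b[0]);
}

const theta = [50000, 150, 10000]; // made-up parameters, not a fitted result
const X = [[2461.68, 4], [1872, 4], [1200, 2], [3000, 3]]; // last two rows made up
console.log(predictionPoints(theta, X));
```

Sorting removes the back-and-forth caused by file order. With a nonzero bedroom coefficient, though, houses with similar square footage but different bedroom counts still get different predictions, so the connected line can still zigzag somewhat.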

Now let’s tune the parameters iteratively. Performance-wise, I always prefer to adjust the learning rate alpha before adding more iterations, since every extra iteration adds computation.
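For reference, here is a minimal batch gradient descent sketch in plain JavaScript showing where alpha and the iteration count enter. It is an assumption about how such a tool typically works, not simdfied’s actual implementation, and it runs on tiny made-up data (y = 1 + 2x) rather than the house prices.

```javascript
// Batch gradient descent for linear regression (a sketch, not simdfied's code).
// X rows are feature vectors WITHOUT a bias column; theta[0] is the intercept.
function gradientDescent(X, y, alpha, iterations) {
  const m = X.length;
  const n = X[0].length;
  const theta = new Array(n + 1).fill(0);
  for (let iter = 0; iter < iterations; iter++) {
    const grad = new Array(n + 1).fill(0);
    // One full pass over the data per iteration.
    for (let i = 0; i < m; i++) {
      let h = theta[0];
      for (let j = 0; j < n; j++) h += theta[j + 1] * X[i][j];
      const err = h - y[i];
      grad[0] += err;
      for (let j = 0; j < n; j++) grad[j + 1] += err * X[i][j];
    }
    // Simultaneous update of every parameter, scaled by the learning rate.
    for (let j = 0; j <= n; j++) theta[j] -= (alpha / m) * grad[j];
  }
  return theta;
}

// Made-up data on a small scale: y = 1 + 2 * x exactly.
const fitted = gradientDescent([[0], [1], [2], [3]], [1, 3, 5, 7], 0.1, 1000);
console.log(fitted); // values very close to 1 and 2
```

Each iteration touches every row once, so doubling the iteration count doubles the work, while a larger (but still stable) alpha can reach the same cost in fewer iterations.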

At alpha = 0.03 the prediction line gets closer to straight and the cost gets close to its minimum, though the cost curve is starting to take on an “elbow” shape. The elbow tells us we have probably reached the highest usable learning rate, if not one that is already too high:
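To see where the cost curve’s shape comes from, this sketch (plain JavaScript with made-up data, not MLplayground’s code) records the squared-error cost J(θ) = (1/2m)·Σ(h(xᵢ) − yᵢ)² after each gradient descent step, once with a moderate alpha and once with an alpha that is too large for this data. The specific values 0.1 and 0.6 only apply to this toy data set.

```javascript
// Squared-error cost: J(theta) = (1 / 2m) * sum((h(x_i) - y_i)^2).
function cost(theta, X, y) {
  let sum = 0;
  for (let i = 0; i < X.length; i++) {
    let h = theta[0];
    for (let j = 0; j < X[i].length; j++) h += theta[j + 1] * X[i][j];
    sum += (h - y[i]) ** 2;
  }
  return sum / (2 * X.length);
}

// Run batch gradient descent and record the cost after every step.
function costHistory(X, y, alpha, iterations) {
  const m = X.length;
  const n = X[0].length;
  const theta = new Array(n + 1).fill(0);
  const history = [];
  for (let iter = 0; iter < iterations; iter++) {
    const grad = new Array(n + 1).fill(0);
    for (let i = 0; i < m; i++) {
      let h = theta[0];
      for (let j = 0; j < n; j++) h += theta[j + 1] * X[i][j];
      const err = h - y[i];
      grad[0] += err;
      for (let j = 0; j < n; j++) grad[j + 1] += err * X[i][j];
    }
    for (let j = 0; j <= n; j++) theta[j] -= (alpha / m) * grad[j];
    history.push(cost(theta, X, y));
  }
  return history;
}

const X = [[0], [1], [2], [3]];
const y = [1, 3, 5, 7]; // made up: y = 1 + 2 * x
const ok = costHistory(X, y, 0.1, 50);      // moderate alpha: cost keeps falling
const tooHigh = costHistory(X, y, 0.6, 50); // too large here: cost grows
console.log(ok[0] > ok[49], tooHigh[0] < tooHigh[49]); // true true
```

Plotted, `ok` drops steeply for the first few steps and then flattens, while `tooHigh` climbs; a cost that starts to oscillate or rise is the sign that alpha has gone past the usable range.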

linear_reg_step_4

At the same alpha, adding 500 iterations gives us our first “straight line” of prediction, but the cost curve’s “elbow” shape gets more pronounced:

linear_reg_step_5

After trying more variations, we get the same straight-line result with alpha = 0.1 and just 100 iterations; performance-wise, this is a good choice.
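Learning rates like 0.03 or 0.1 generally only converge when the features are on comparable scales; with raw square footage in the thousands, gradient descent at alpha = 0.1 would typically diverge. The post doesn’t say how MLplayground handles this, so the following mean-normalization sketch is an assumption about the kind of preprocessing involved, not its actual code. The rows are made up.

```javascript
// Mean normalization (z-score): x' = (x - mean) / std for each feature column.
// Returns the scaled matrix plus the mean and std needed to scale new inputs.
function normalizeFeatures(X) {
  const m = X.length;
  const n = X[0].length;
  const mean = new Array(n).fill(0);
  const std = new Array(n).fill(0);
  for (const row of X) row.forEach((v, j) => { mean[j] += v; });
  for (let j = 0; j < n; j++) mean[j] /= m;
  for (const row of X) row.forEach((v, j) => { std[j] += (v - mean[j]) ** 2; });
  // Population standard deviation; fall back to 1 for a constant column.
  for (let j = 0; j < n; j++) std[j] = Math.sqrt(std[j] / m) || 1;
  const Xn = X.map((row) => row.map((v, j) => (v - mean[j]) / std[j]));
  return { Xn, mean, std };
}

// Made-up rows: [square feet, bedrooms].
const { Xn, mean, std } = normalizeFeatures([[2000, 3], [1000, 2], [3000, 4]]);
console.log(mean);  // [ 2000, 3 ]
console.log(Xn[0]); // [ 0, 0 ]
```

Whatever scaling is used for training must also be applied, with the same mean and std, to a new x vector before predicting.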

The regression plot gives us a sense of how house prices move with the square footage and number of bedrooms in our data. If we’d like an actual prediction, we can use simdfied directly (MLplayground will soon be able to predict y for a new x vector). Something like this:

//load our X matrix with 2 features: square footage and number of bedrooms
var X = simdfied.mat().from2dArray( [2461.68, 1872 /* … */], [4, 4 /* … */] );

//load our y vector with house prices
var y = simdfied.vec().fromArray( [467883, 385983 /* … */] );

//run linear regression (100 iterations, alpha = 0.1), then predict a price for a 3-bedroom, 900-square-foot house:
var ml = simdfied.ml().algo("linReg").X(X).y(y).set("iter", 100).set("alpha", 0.1);
ml.run( function(ml){ ml.predOne( [900, 3] ); } );
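Since simdfied’s internals aren’t shown here, the following plain JavaScript sketch spells out roughly what a call like this has to do: scale the features, run gradient descent with alpha = 0.1 for 100 iterations, then apply the same scaling to [900, 3] before evaluating the hypothesis. It is not simdfied’s implementation; `fitScaler`, `train`, and `predictOne` are hypothetical names, and the training rows are made up (generated from price = 50000 + 150·sqft + 10000·bedrooms) so the result can be checked.

```javascript
// Plain-JavaScript sketch of train-then-predict for linear regression.
// Hypothetical helpers and made-up data; not simdfied's implementation.

// Z-score scaling per column; keeps the stats so new inputs can be scaled.
function fitScaler(X) {
  const m = X.length;
  const n = X[0].length;
  const mean = new Array(n).fill(0);
  const std = new Array(n).fill(0);
  for (const row of X) row.forEach((v, j) => { mean[j] += v; });
  for (let j = 0; j < n; j++) mean[j] /= m;
  for (const row of X) row.forEach((v, j) => { std[j] += (v - mean[j]) ** 2; });
  for (let j = 0; j < n; j++) std[j] = Math.sqrt(std[j] / m) || 1;
  const scale = (row) => row.map((v, j) => (v - mean[j]) / std[j]);
  return { mean, std, scale };
}

// Batch gradient descent on scaled features; theta[0] is the intercept.
function train(Xs, y, alpha, iterations) {
  const m = Xs.length;
  const n = Xs[0].length;
  const theta = new Array(n + 1).fill(0);
  for (let iter = 0; iter < iterations; iter++) {
    const grad = new Array(n + 1).fill(0);
    for (let i = 0; i < m; i++) {
      let h = theta[0];
      for (let j = 0; j < n; j++) h += theta[j + 1] * Xs[i][j];
      const err = h - y[i];
      grad[0] += err;
      for (let j = 0; j < n; j++) grad[j + 1] += err * Xs[i][j];
    }
    for (let j = 0; j <= n; j++) theta[j] -= (alpha / m) * grad[j];
  }
  return theta;
}

// Predict one price: scale the raw input with the TRAINING stats, then apply h.
function predictOne(theta, scaler, x) {
  const xs = scaler.scale(x);
  return theta[0] + xs.reduce((acc, v, j) => acc + theta[j + 1] * v, 0);
}

// Made-up training rows: [square feet, bedrooms].
const X = [[1000, 3], [1500, 2], [2000, 4], [2500, 2], [3000, 3]];
const y = X.map(([sqft, beds]) => 50000 + 150 * sqft + 10000 * beds);

const scaler = fitScaler(X);
const theta = train(X.map((row) => scaler.scale(row)), y, 0.1, 100);
console.log(predictOne(theta, scaler, [900, 3])); // close to 215000
```

The key design point is that the new input is scaled with the training set’s mean and std, not its own; on this toy data, alpha = 0.1 and 100 iterations already land within a few dollars of the generating formula’s 215000.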
