import numpy as np
import matplotlib.pyplot as plt
# Create data
x = np.arange(0, 5, 0.1)
y = x
# Plot
plt.plot(x, y)
plt.xlim(0, 5)
plt.ylim(0, 5)
# Set the x and y labels
plt.xlabel('x')
plt.ylabel('y')
plt.title('y = x')
plt.show()
Linear Regression
Remember that machine learning is about learning some form of input-output mapping, and then using that model to make predictions on new data.
We will learn how to do this using linear regression, which is a simple yet powerful technique.
But… Let’s revisit high school math for a moment.
High School Math
Linear equation
Let’s say we have a linear equation:
y = x
What does the curve look like?
How about this one?
y = 2x
How does the 2 change the curve?
# Draw y = 2x
import numpy as np
import matplotlib.pyplot as plt
# Create data
x = np.arange(0, 5, 0.1)
y = 2*x
# Plot
plt.plot(x, y)
plt.xlim(0, 5)
plt.ylim(0, 5)
# Set the x and y labels
plt.xlabel('x')
plt.ylabel('y')
plt.title("y = 2x")
plt.show()
Notice the difference?
The slope is steeper.
How about y = 0.5x?
# Draw y = 0.5x
import numpy as np
import matplotlib.pyplot as plt
# Create data
x = np.arange(0, 5, 0.1)
y = 0.5*x
# Plot
plt.plot(x, y)
plt.xlim(0, 5)
plt.ylim(0, 5)
# Set the x and y labels
plt.xlabel('x')
plt.ylabel('y')
plt.title("y = 0.5x")
plt.show()
Now, how about this one?
y = 2x + 1
How does the +1 change the curve?
# Draw y = 2x + 1
import numpy as np
import matplotlib.pyplot as plt
# Create data
x = np.arange(0, 5, 0.1)
y = 2*x + 1
# Plot
plt.plot(x, y)
plt.xlim(0, 5)
plt.ylim(0, 5)
# Set the x and y labels
plt.xlabel('x')
plt.ylabel('y')
plt.title("y = 2x + 1")
plt.show()
The line intersects the y-axis at 1, and its slope is 2.
The previous equations are all linear equations. Linear equations can be represented as:
y = ax + b
where a is the slope of the line and b is the y-intercept.
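To make the roles of the slope and the y-intercept concrete, here is a minimal sketch that recovers both from two points on a line. The helper name `slope_intercept` is hypothetical, introduced only for this illustration.

```python
def slope_intercept(p1, p2):
    """Return (a, b) of the line y = a*x + b passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    a = (y2 - y1) / (x2 - x1)  # slope: rise over run
    b = y1 - a * x1            # y-intercept: value of y at x = 0
    return a, b

# Two points taken from y = 2x + 1
a, b = slope_intercept((0, 1), (2, 5))
print(a, b)  # 2.0 1.0
```

Any two distinct points on the same line give the same pair, which is why two numbers are enough to describe a line.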
Exercise: Linear Equation
Visualizing 2D linear equation
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
def draw(a, b):
    # Draw y = ax + b
    # Create data
    x = np.arange(0, 5, 0.1)
    y = a*x + b
    # Plot
    plt.plot(x, y)
    plt.xlim(0, 5)
    plt.ylim(0, 5)
    # Set the x and y labels
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title("y = ax + b")
    plt.show()
a_slider = widgets.FloatSlider(min=0, max=10, step=0.1, value=2, description='a:')
b_slider = widgets.FloatSlider(min=0, max=10, step=0.1, value=3, description='b:')
# Display the widgets and plot
widgets.interactive(draw, a=a_slider, b=b_slider)
Visualizing 3D linear equation
A linear equation in 3D can be represented as:
z = ax + by + c
where a is the slope along the x-axis, b is the slope along the y-axis, and c is the z-intercept.
It can also be seen as an equation with two input variables, x and y:
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
def draw(a, b, c):
    # Draw z = ax + by + c
    # Create data
    x = np.arange(0, 50, 1)
    y = np.arange(0, 50, 1)
    X, Y = np.meshgrid(x, y)
    Z = a*X + b*Y + c
    # Plot 3D
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z)
    # Fix the axis ranges to X = [0, 50], Y = [0, 50] and Z = [0, 50]
    ax.set_xlim(0, 50)
    ax.set_ylim(0, 50)
    ax.set_zlim(0, 50)
    # Set the x and y labels
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title("z = ax + by + c")
    plt.show()
a_slider = widgets.FloatSlider(min=-5, max=5, step=0.1, value=0, description='a:')
b_slider = widgets.FloatSlider(min=-5, max=5, step=0.1, value=0, description='b:')
c_slider = widgets.FloatSlider(min=-50, max=50, step=0.1, value=0, description='c:')
# Display the widgets and plot
widgets.interactive(draw, a=a_slider, b=b_slider, c=c_slider)
Fitting a line to a scatter plot
Now, let's say we have the following data:
import numpy as np
import matplotlib.pyplot as plt
# Create scattered dots around y = 3x + 8
x_data = np.random.rand(100) * 10
noise = np.random.normal(0, 2, x_data.shape)
y_data = 3*x_data + 8 + noise
# Plot
plt.scatter(x_data, y_data, s=1)
# Set the x and y labels
plt.xlabel('x')
plt.ylabel('y')
plt.show()
The data can be about anything: for example, the number of hours spent studying (x) and the grade you get on the exam (y).
It can be obtained by asking your friends, or by doing a survey.
Given that experimental data, can we predict a student's grade if we know how many hours they spent studying?
How?
One way to do this is to fit a line to the data.
How do we do that?
import numpy as np
import matplotlib.pyplot as plt
# Create scattered dots around y = 3x + 8
x_data = np.random.rand(100) * 10
noise = np.random.normal(0, 2, x_data.shape)
y_data = 3*x_data + 8 + noise
# Plot
plt.scatter(x_data, y_data, s=1)
# Plot y = 3x + 8
x = np.arange(0, 10, 0.1)
a = 3
b = 8
y = a*x + b
plt.plot(x, y, color='red')
# Set the x and y labels
plt.xlabel('time spent on studying')
plt.ylabel('grade')
plt.show()
Let’s try a manual approach first: we try different values of a and b until we get a good fit.
Remember, the equation is y = ax + b.
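One way to make "a good fit" less subjective is to score each candidate line by its mean squared error against the data. The sketch below assumes that choice of score; the helper name `mse` and the fixed random seed are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
x_data = rng.random(100) * 10
y_data = 3 * x_data + 8 + rng.normal(0, 2, x_data.shape)

def mse(slope, intercept):
    """Mean squared vertical distance between the data and the line."""
    residuals = y_data - (slope * x_data + intercept)
    return np.mean(residuals ** 2)

print(mse(1, 0))  # a poor guess: large error
print(mse(3, 8))  # the line used to generate the data: much smaller error
```

A lower score means the line sits closer to the points, so comparing scores gives a number to watch while trying different values by hand.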
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display
# Create scattered dots around y = 3x + 8
x_data = np.random.rand(100) * 10
noise = np.random.normal(0, 2, x_data.shape)
y_data = 3*x_data + 8 + noise
# Define the update function
def draw(a, b):
    # Candidate line y = ax + b
    x = np.arange(0, 10, 0.1)
    y = a*x + b
    plt.plot(x, y, color='red')
    plt.scatter(x_data, y_data, s=1)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title(f"Equation y = {a}x + {b}")
    plt.show()
# Define the slider widgets
a_slider = widgets.FloatSlider(min=0, max=10, step=0.1, value=0, description='a:')
b_slider = widgets.FloatSlider(min=0, max=10, step=0.1, value=0, description='b:')
# Display the widgets and plot
display(widgets.interactive(draw, a=a_slider, b=b_slider))
Given that equation, what would be the grade if the student spent 1 hour studying?
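As a quick check, a prediction is a single evaluation of the line. The sketch below assumes the sliders ended up at the values used to generate the data (slope 3, intercept 8); the function name `predict` is hypothetical.

```python
def predict(hours, slope=3.0, intercept=8.0):
    """Predicted grade for a given number of study hours.

    The defaults are the values used to generate the example data,
    not values learned from it.
    """
    return slope * hours + intercept

print(predict(1))  # 3*1 + 8 = 11.0
```

With different slider settings the prediction changes accordingly, which is why finding good values for the slope and intercept matters.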