4 changes: 4 additions & 0 deletions Questions.txt
@@ -0,0 +1,4 @@
1. The general trend in the curve is that the higher the percentage of data used for training, the higher the test accuracy.
2. The lower half of the curve appears noisier than the top half, likely because there is less data to build a model with, so the fitted models are less representative and accuracy varies more from trial to trial.
3. The greater the number of trials, the smoother the curve. At 100 trials the curve starts to smooth out, with some repeated bumps; at 500 trials it is fairly smooth.
4. When the C value is increased, accuracy at lower percentages of training data seems to improve. When the C value is decreased, accuracy at higher percentages of training data seems to improve.
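The smoothing effect described in answer 3 can be illustrated without scikit-learn: averaging more noisy per-trial accuracy estimates shrinks the spread of the averaged value, which is why the learning curve flattens out as `num_trials` grows. This is a minimal stdlib-only sketch with a hypothetical uniform noise model (the `noise` width and true accuracy of 0.8 are made-up illustration values, not measurements from the assignment):

```python
import random
import statistics

def averaged_accuracy(true_acc, num_trials, noise=0.1, seed=0):
    """Average num_trials noisy accuracy measurements.

    Hypothetical noise model: each trial's accuracy is the true value
    plus uniform noise, standing in for split-to-split variation.
    """
    rng = random.Random(seed)
    trials = [true_acc + rng.uniform(-noise, noise) for _ in range(num_trials)]
    return sum(trials) / num_trials

# Spread of the averaged estimate across 200 independent runs:
# it shrinks roughly as 1/sqrt(num_trials), so more trials -> smoother curve.
spread_10 = statistics.stdev(averaged_accuracy(0.8, 10, seed=s) for s in range(200))
spread_500 = statistics.stdev(averaged_accuracy(0.8, 500, seed=s) for s in range(200))
print(spread_10, spread_500)
```

Running this shows the 500-trial average varying far less between runs than the 10-trial average, mirroring the observed smoothing at 500 trials.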
9 changes: 9 additions & 0 deletions learning_curve.py
Original file line number Diff line number Diff line change
@@ -19,6 +19,15 @@

# TODO: your code here

# For each training percentage, average test accuracy over num_trials random splits
for i in range(len(train_percentages)):
    result = 0
    for j in range(num_trials):
        # Re-split the data each trial so averaging smooths out split-to-split noise
        X_train, X_test, y_train, y_test = train_test_split(
            data.data, data.target, train_size=train_percentages[i] / 100.0)
        model = LogisticRegression(C=10**-10)  # very small C = strong regularization
        model.fit(X_train, y_train)
        result += model.score(X_test, y_test)
    test_accuracies[i] = result / num_trials

fig = plt.figure()
plt.plot(train_percentages, test_accuracies)
plt.xlabel('Percentage of Data Used for Training')
Binary file added plot.png