Binary file added figure_1.png
9 changes: 8 additions & 1 deletion learning_curve.py
@@ -17,7 +17,14 @@
# You should repeat each training percentage num_trials times to smooth out variability
# for consistency with the previous example use model = LogisticRegression(C=10**-10) for your learner

# TODO: your code here
model = LogisticRegression(C=10**-50)
for index, percentage in enumerate(train_percentages):
    scores = []
    for i in range(num_trials):
        x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, train_size=percentage / 100)
        model.fit(x_train, y_train)
        scores.append(model.score(x_test, y_test))
    test_accuracies[index] = sum(scores) / len(scores)  # mean accuracy over num_trials runs

fig = plt.figure()
plt.plot(train_percentages, test_accuracies)
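For context, here is a self-contained sketch of what the full script around this hunk might look like. The lines hidden above the hunk are not shown in the diff, so the dataset, the train_percentages grid, num_trials, and the plot labels below are assumptions based on the visible code and comments, not the repository's actual contents:

# Hypothetical reconstruction of the surrounding script; everything outside
# the visible hunk is an assumption, not the repository's actual code.
import matplotlib.pyplot as plt
import numpy
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_digits()
train_percentages = numpy.arange(5, 95, 5)   # assumed grid of training sizes
test_accuracies = numpy.zeros(len(train_percentages))
num_trials = 10                              # assumed trial count

model = LogisticRegression(C=10**-50)
for index, percentage in enumerate(train_percentages):
    scores = []
    for i in range(num_trials):
        # hold out (100 - percentage)% of the data as the test set
        x_train, x_test, y_train, y_test = train_test_split(
            data.data, data.target, train_size=percentage / 100)
        model.fit(x_train, y_train)
        scores.append(model.score(x_test, y_test))
    test_accuracies[index] = sum(scores) / len(scores)

fig = plt.figure()
plt.plot(train_percentages, test_accuracies)
plt.xlabel('Percentage of Data Used for Training')
plt.ylabel('Accuracy on Test Set')
plt.show()

Using enumerate avoids the original (x-1)/5 indexing, which computes a float index from the percentage value itself and breaks under Python 3's true division.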
4 changes: 4 additions & 0 deletions questions.txt
@@ -0,0 +1,4 @@
1. The general trend of the curve is upward: as the percentage of data used for training increases, the test accuracy increases, though the curve appears to level off at higher percentages.
2. The curve appears noisier in the range of 55 to 80 percent of data used for training. A likely reason is that as the training percentage grows, fewer observations remain in the test set, so each accuracy measurement is based on a smaller sample and varies more from trial to trial.
3. About 100 trials are needed to get a smooth curve.
4. When I tried C=10**-1 (a larger value than the one I used), the curve rose rapidly and leveled off at a high accuracy (a comparison sketch follows below).
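A hedged sketch of the experiment behind answers 3 and 4, reusing the assumed setup from the reconstruction above: looping over several C values with a larger num_trials shows how averaging smooths the curve and how weaker regularization (larger C) shifts it upward. The dataset and parameter grid are illustrative assumptions, not taken from the repository:

# Illustrative comparison of regularization strengths; the setup below is
# assumed, not the repository's actual code.
import matplotlib.pyplot as plt
import numpy
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_digits()
train_percentages = numpy.arange(5, 95, 5)
num_trials = 100  # roughly the trial count answer 3 suggests for a smooth curve

for C in (10**-50, 10**-10, 10**-1):
    accuracies = []
    for percentage in train_percentages:
        scores = []
        for i in range(num_trials):
            x_train, x_test, y_train, y_test = train_test_split(
                data.data, data.target, train_size=percentage / 100)
            model = LogisticRegression(C=C)
            model.fit(x_train, y_train)
            scores.append(model.score(x_test, y_test))
        accuracies.append(sum(scores) / len(scores))
    plt.plot(train_percentages, accuracies, label='C=%g' % C)

plt.xlabel('Percentage of Data Used for Training')
plt.ylabel('Accuracy on Test Set')
plt.legend()
plt.show()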