4 changes: 4 additions & 0 deletions Questions.txt
@@ -0,0 +1,4 @@
1. The general trend in the curve is that the higher the percentage of data used for training, the higher the test accuracy.
2. The lower half of the curve appears noisier than the top half, likely because there is less data to build a model with, so the fitted models are less representative and accuracy varies more from trial to trial.
3. The greater the number of trials, the smoother the curve. At 100 trials the curve starts to smooth out, with some repeated bumps; at 500 trials it is fairly smooth.
4. When the C value is increased, accuracy at lower percentages of training data seems to improve. When the C value is decreased, accuracy at higher percentages of training data seems to improve.
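The smoothing effect described in answer 3 can be illustrated without scikit-learn: averaging more noisy per-trial accuracy estimates shrinks the spread of the averaged value, which is why the learning curve flattens out as `num_trials` grows. This is a minimal stdlib-only sketch with a hypothetical uniform noise model (the `noise` width and true accuracy of 0.8 are made-up illustration values, not measurements from the assignment):

```python
import random
import statistics

def averaged_accuracy(true_acc, num_trials, noise=0.1, seed=0):
    """Average num_trials noisy accuracy measurements.

    Hypothetical noise model: each trial's accuracy is the true value
    plus uniform noise, standing in for split-to-split variation.
    """
    rng = random.Random(seed)
    trials = [true_acc + rng.uniform(-noise, noise) for _ in range(num_trials)]
    return sum(trials) / num_trials

# Spread of the averaged estimate across 200 independent runs:
# it shrinks roughly as 1/sqrt(num_trials), so more trials -> smoother curve.
spread_10 = statistics.stdev(averaged_accuracy(0.8, 10, seed=s) for s in range(200))
spread_500 = statistics.stdev(averaged_accuracy(0.8, 500, seed=s) for s in range(200))
print(spread_10, spread_500)
```

Running this shows the 500-trial average varying far less between runs than the 10-trial average, mirroring the observed smoothing at 500 trials.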
9 changes: 9 additions & 0 deletions learning_curve.py
Original file line number Diff line number Diff line change
@@ -19,6 +19,15 @@

# TODO: your code here

# For each training percentage, average test accuracy over num_trials random splits
for i in range(len(train_percentages)):
    result = 0
    for j in range(num_trials):
        # Re-split the data each trial so averaging smooths out split-to-split noise
        X_train, X_test, y_train, y_test = train_test_split(
            data.data, data.target, train_size=train_percentages[i] / 100.0)
        model = LogisticRegression(C=10**-10)  # very small C = strong regularization
        model.fit(X_train, y_train)
        result += model.score(X_test, y_test)
    test_accuracies[i] = result / num_trials

fig = plt.figure()
plt.plot(train_percentages, test_accuracies)
plt.xlabel('Percentage of Data Used for Training')
Binary file added plot.png