<html>
<head>
<title>(Mostly Reinforcement Learning Theory)</title>
</head>
<body bgcolor="#445544" text="#99AA99" link="#DD7711" vlink="#DD7711" alink="#DD7711">
<TABLE><TR><TD width="20%"></TD><TD width="600">
<font face='helvetica'>
<img src="mouse-maze.jpg" width="175" height="95">
<img src="mouse-in-maze.jpg" width="175" height="95"><br>
<img src="bamdp.jpg" width="175" height="95">
<img src="backup.jpg" width="175" height="95">
<h3>(Mostly Reinforcement Learning Theory)</h3>
<br>
<p>
<ul>
<li>Y Niv, MO Duff & P Dayan (2005) - <a href="http://www.behavioralandbrainfunctions.com/content/1/1/6">Dopamine, Uncertainty and TD Learning</a> - Behavioral and Brain Functions 1:6 (4 May 2005), doi:10.1186/1744-9081-1-6.<br><p>
<li>Duff, Chudova, Wold, Smyth, & Mjolsness. <a href="http://www.sigmoid.org/publications/icsb2005.pdf">Statistical inference of biologically-plausible dynamic regulatory networks with core-leaf topology</a>, ICSB, 2005.<br><p>
<li>Duff, M. <a href="http://www.aaai.org/Papers/ICML/2003/ICML03-020.pdf">Design for an optimal probe</a>. Proceedings of the 20th International Conference on Machine Learning, 2003: 131-138.<br><p>
<li>Duff, M. <a href="http://www.aaai.org/Papers/ICML/2003/ICML03-021.pdf">Diffusion approximation for Bayesian Markov chains</a>. Proceedings of the 20th International Conference on Machine Learning, 2003: 139-146.<br><p>
<li>Duff, M.
<a href="http://graveleylab.cam.uchc.edu/WebData/mduff/duff_thesis.pdf"> Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes</a>. Ph.D. Thesis, Dept. of Computer Science, Univ. of Massachusetts, Amherst, 2002.<br><p>
<li>Duff, M. & Barto, A. <a href="http://papers.nips.cc/paper/1230-local-bandit-approximation-for-optimal-learning-problems.pdf">Local bandit approximation for optimal learning problems</a>. Advances in Neural Information Processing Systems 9. 1997: 1019-1025.<br><p>
<li>Duff, M. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.57.1916&rep=rep1&type=pdf">Q-learning for bandit problems</a>. Proceedings of the 12th International Conference on Machine Learning, 1995: 209-217.<br><p>
<li>Bradtke, S. & Duff, M. <a href="http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=AFBA4907F489AA38EA2F9A3CAC069898?doi=10.1.1.32.9030&rep=rep1&type=pdf">Reinforcement learning methods for continuous-time Markov decision processes</a>. Advances in Neural Information Processing Systems 7. 1995: 393-400.<br><p>
<li>Duff, M. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52.9257&rep=rep1&type=pdf">Solving Bellman's equation by the method of continuation</a>. Proceedings of the American Control Conference, 1994: 2671-2682.<br><p>
<li>Barto, A. & Duff, M. <a href="http://papers.nips.cc/paper/865-monte-carlo-matrix-inversion-and-reinforcement-learning.pdf">Monte-Carlo matrix inversion and reinforcement learning</a>. Advances in Neural Information Processing Systems 6. 1994: 687-694.<br><p>
<li>Duff, M. Backpropagation and Bach's 5th cello suite (Sarabande). Proceedings of the International Joint Conference on Neural Networks.<br><p>
<li>Szilagyi, M., Duff, M., & Yakowitz, S. Procedure for electron and ion lens optimization. Applied Physics Letters.<br><p>
</ul>
<p>
<br>
<a href="chrono_bio.html"><font color="#3399FF">My Erdos number is 3.</font></a>
</font>
</TD><TD width="20%"></TD></TR>
</TABLE>
</body>
</html>