RJ Nowling

Web site for RJ Nowling, an Associate Professor in Electrical Engineering & Computer Science at the Milwaukee School of Engineering
http://rnowling.github.io/
Sun, 13 Aug 2023 04:12:33 +0000

Mission Accomplished: Promoted to Associate Professor

<p>On April 15, 2023, I received the official notification that my promotion to Associate Professor was approved. At <a href="https://msoe.edu">MSOE</a>, faculty are eligible for rank advancement every five years. In my opinion, five years happens to be the perfect length of time to dedicate to a well-defined goal if you want to go deep and make a real impact, so it’s not surprising that I aligned my professional goals with the promotion timeline.</p>
<p>When I joined MSOE, I had several well-defined goals for myself:</p>
<ul>
<li>Develop high-quality data science and machine learning classes for MSOE’s (new when I started) Computer Science major</li>
<li>Establish a research program that:
<ul>
<li>engages undergraduate students in genuine research experiences</li>
<li>allows me to grow my technical and subject-matter expertise in genomics, machine learning / data science, and such</li>
<li>produces novel results worth of publications</li>
<li>supports collaborations with faculty at research-intensive institutions</li>
<li>is competitive for external funding</li>
</ul>
</li>
</ul>
<p>I’ve accomplished all that and more:</p>
<ul>
<li>Developed into a respectable teacher with a diverse set of techniques in my toolbox</li>
<li>Helped recruit and hire several more faculty. These faculty now teach most of the classes I developed and have vastly improved them in ways I never could have alone. Our Computer Science program already has a strong, positive reputation among employers and graduate programs.</li>
<li>Engaged in collaborative projects that combined computation and experiment as equal partners. Pushed the envelope in my two major research areas (chromosomal inversions and functional sequencing methods) while supporting various projects of biologist collaborators.</li>
<li>Awarded an NSF CRII grant and supplemental REU funding. The grant supported students in paid research assistant positions, provided me with summer salary, and funded conference travel, publications, and research equipment.</li>
<li>Mentored 15 undergraduate research assistants. Thanks to the kindness of my collaborators, these students were introduced to research-intensive environments and interdisciplinary work. I’ve been able to take many of the students to conferences. Three are in, or have been admitted to, graduate programs (1 M.S., 2 Ph.D.).</li>
<li>Published 9 conference and journal papers with 1 more currently under review.</li>
<li>Continued to expand my technical and biological knowledge</li>
<li>Pushed the boundaries of what was considered possible for undergraduate research at MSOE</li>
<li>Transitioned into teaching graduate courses</li>
<li>Found a community that nurtures and nourishes me while providing opportunities to mentor and support others</li>
</ul>
<p>I’m thankful that my colleagues and administration recognized and celebrated my accomplishments by recommending my promotion.</p>
Fri, 28 Apr 2023 12:13:19 +0000
http://rnowling.github.io/career/2023/04/28/mission-accomplished.html
NSF CISE CRII is Funded!

<p>I am incredibly excited to share that I received an official notice of award for my NSF CISE CRII proposal titled <a href="https://nsf.gov/awardsearch/showAward?AWD_ID=1947257&HistoricalAwards=false">“CRII: III: RUI: Association Testing and Inversion Detection without Reference Genomes”</a>!</p>
<p>Reference genomes are the backbone (literally) of nearly all existing methods for analyzing genetic variation. Unfortunately, this dependence is a serious limitation. For many non-model species, it is extraordinarily difficult to assemble complete genomes; one challenge is the difficulty of rearing organisms in the lab to facilitate the inbreeding needed for assembly.</p>
<p>Even when reference genomes are available, if the genomes were not assembled from representative samples, variation absent from the reference cannot be detected. For example, a recent study performed a <em>de novo</em> assembly of reads that did not align to the human reference genome from <a href="https://www.nature.com/articles/s41588-018-0273-y">910 humans of African descent</a>. The study found that a substantial fraction of the new assembly was absent from the standard human reference genome. This has massive ethical implications when reference genomes are used for medical applications.</p>
<p>I proposed adapting my previous methods for detecting inversions from SNPs to k-mers. Based on the work of my group and others, we know that PCA can detect inversions from SNPs. Because PCA does not take the spatial relationships of the SNPs into account, it does not in fact need a reference genome. While my previous work targeted improving the accuracy of inversion detection with PCA by exploiting spatial information, for this proposal we will need to develop new ways to perform confident inversion detection.</p>
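To illustrate the reference-free property, here is a minimal sketch of PCA applied to a samples-by-features matrix. The data, function name, and group structure below are simulated toy examples of my own, not the actual method or datasets; note that column order (i.e., genomic position) is never used, which is why no reference genome is needed.

```python
import numpy as np

def pca_projection(X, n_components=2):
    """Project samples onto the top principal components.

    X: (n_samples, n_features) matrix of SNP genotypes or k-mer counts.
    Reference-free: the column order (genomic position) is never used.
    """
    Xc = X - X.mean(axis=0)                         # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]   # sample coordinates

# Toy data: two simulated groups of samples with distinct genotype profiles,
# standing in for two inversion orientations.
rng = np.random.default_rng(0)
base = rng.integers(0, 3, size=(1, 200))
group_a = np.tile(base, (10, 1)) + rng.integers(0, 2, size=(10, 200))
group_b = np.tile(2 - base, (10, 1)) + rng.integers(0, 2, size=(10, 200))
X = np.vstack([group_a, group_b]).astype(float)

proj = pca_projection(X)
# The two groups separate along PC1, hinting at distinct genotype clusters.
```

In a real analysis the matrix would hold k-mer counts for thousands of features, but the mechanics are the same: clustering of samples along the leading components is the signal.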
<p>We will demonstrate the utility of our methods through applications to association testing in insect vectors. Because of the number of large inversions in insects, inversions must often be corrected for when performing genome-wide association studies. The methods developed by my team will enable association testing for problems such as identifying the genetic basis of insecticide resistance in non-model insect vectors.</p>
<p>The award will provide funding for salary and travel for me and two students per year for a total of two years.</p>
Fri, 19 Jun 2020 12:13:19 +0000
http://rnowling.github.io/research/2020/06/19/crii.html
Reflections on a Productive First Academic Summer

<p>As a new faculty member at a PUI, the switch from a rigid and packed academic-year schedule to a completely unstructured summer schedule can be daunting and challenging. I ended up being very productive this summer, possibly too productive, at the expense of not giving myself enough time to recharge. I wanted to reflect on my summer to identify what worked and what could be improved, with the hope that my thoughts may be useful to other new faculty.</p>
<h2 id="my-first-summer-in-a-nutshell">My First Summer in a Nutshell</h2>
<p>I defined seven goals for the summer and accomplished all of them. I describe these accomplishments below.</p>
<h3 id="1-wrote-and-submitted-my-first-grant-proposal">1. Wrote and Submitted My First Grant Proposal</h3>
<p>I entered the summer with a one-page summary of my objectives (similar to the <a href="https://www.biosciencewriters.com/NIH-Grant-Applications-The-Anatomy-of-a-Specific-Aims-Page.aspx">specific aims page</a> of an NIH proposal). By the end of June, I had a complete first draft of my project description. In July, I wrote my RUI impact statement, revised my project description, and produced the other associated pieces (e.g., the budget, a description of the facilities, list of collaborators). The final proposal was submitted to NSF on August 6.</p>
<h3 id="2-wrote-and-submitted-a-manuscript-to-a-journal">2. Wrote and Submitted a Manuscript to a Journal</h3>
<p>During the academic year, I applied my inversion detection method to additional data sets. I began to draft a paper in the winter. The paper proved challenging to write, and I ultimately had to iterate through multiple narratives until I found the “one” that worked. I completed the manuscript in early August, <a href="https://www.biorxiv.org/content/10.1101/736900v1">posted it as a pre-print on bioRxiv</a>, and submitted it to a journal.</p>
<h3 id="3-mentored-two-students">3. Mentored Two Students</h3>
<p>I mentored two students for the summer. Krystal was a BioMolecular Engineering major who did research with me during her last quarter. Despite graduating in May, Krystal continued working with me over the summer. Krystal finished her comparison of inversion detection methods and presented a <a href="/publications/AGS_2019.pdf">poster at AGS and IEEE COMPSAC</a>. For the second half of the summer, Krystal implemented a variant-calling pipeline for <em>Anopheles</em> data from my collaborator.</p>
<p>Matt finished his first year as a Computer Science major. My goal for Matt was to expand his horizons and introduce him to bioinformatics and population genetics. Matt attended AGS and then spent June implementing a population genetics simulator to study the impact of effects such as mutation, recombination, and repression of recombination on nucleotide diversity and PCA of SNPs. Matt presented his work as a <a href="/publications/IEEE_COMPSAC_2019.pdf">poster at IEEE COMPSAC</a>.</p>
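For flavor, here is a heavily simplified sketch of the kind of simulator involved: a haploid Wright-Fisher model with drift and mutation only, plus a nucleotide diversity calculation. This is my own illustrative sketch, not Matt's implementation, which also modeled recombination and its repression.

```python
import numpy as np

def wright_fisher(n_ind=50, n_sites=100, mu=1e-3, generations=200, seed=0):
    """Minimal haploid Wright-Fisher simulation: drift + per-site mutation.

    Returns the final population as an (n_ind, n_sites) 0/1 matrix.
    """
    rng = np.random.default_rng(seed)
    pop = np.zeros((n_ind, n_sites), dtype=np.int8)
    for _ in range(generations):
        parents = rng.integers(0, n_ind, size=n_ind)  # random resampling = drift
        pop = pop[parents].copy()
        flips = rng.random(pop.shape) < mu            # per-site mutation events
        pop[flips] ^= 1
    return pop

def nucleotide_diversity(pop):
    """Average pairwise difference per site (pi), with sample-size correction."""
    n, sites = pop.shape
    freqs = pop.mean(axis=0)
    pi = (n / (n - 1)) * np.sum(2 * freqs * (1 - freqs))
    return pi / sites

pop = wright_fisher()
pi = nucleotide_diversity(pop)
```

Running the model under different mutation rates (or adding recombination) and watching how pi and the PCA of the resulting 0/1 matrix respond is exactly the kind of experiment such a simulator enables.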
<h3 id="4-attended-ags-with-my-students">4. Attended AGS with My Students</h3>
<p>The Arthropod Genomics Symposium is one of my favorite conferences. AGS attracts the same folks year after year, is small enough (150-200 people) to get to know everyone, is frequently held in the Midwest (reducing travel costs), and provides opportunities to learn biology.</p>
<p>This year, I brought Krystal and Matt. We flew to Kansas City, MO, rented a car, and drove to Manhattan, KS. Krystal had opportunities to present her work as a poster, explore options for graduate school, and learn about potential staff bioinformatics positions. Matt was exposed to biology and research. In both cases, students were exposed to research and the academic world beyond our small, engineering-focused university.</p>
<h3 id="5-grew-and-nurtured-a-collaboration">5. Grew and Nurtured a Collaboration</h3>
<p>Last October, I started collaborating with a colleague at the Medical College of Wisconsin. I reached out after citing one of her papers. She has been nice enough to share her time with me. Over the summer, I performed some statistical analyses for her and she co-mentored Krystal. By working together, we began to better understand each other’s work and identify projects of mutual interest. Since starting her M.S. in Bioinformatics, Krystal effectively became my collaborator’s student; Krystal’s work on the variant-calling pipeline laid a foundation for a potential M.S. thesis project.</p>
<h3 id="6-updated-my-materials-for-the-data-science-class">6. Updated My Materials for the Data Science class</h3>
<p>As I described in an earlier blog post, I was constantly operating under recurring deadlines my first year. Most of my materials were minimal and essentially “placeholders.” I spent the month of August substantially updating my materials for the Data Science class. I significantly edited and expanded existing lectures and labs, created new lectures, tutorials, and labs, and updated the materials to a new textbook. This work will pay off significantly in the Fall when I teach the class for the second time.</p>
<h3 id="7-identified-a-research-topic-for-the-academic-year">7. Identified a Research Topic for the Academic Year</h3>
<p>At the end of the summer, I identified a research problem that dovetails with my collaborator’s work on enhancer maps for <em>Anopheles</em> genomes. Over the next academic year, I’ll work with a new MSOE student to explore the problem and prior work. My goal for next summer is to be able to write either a grant proposal or paper.</p>
<h2 id="lessons-learned">Lessons Learned</h2>
<p>By reflecting on my summer, I identified several factors that helped me.</p>
<h3 id="1-optimizing-the-types-of-efforts-for-the-academic-year-and-summer">1. Optimizing the types of efforts for the academic year and summer</h3>
<p>The summer provides big blocks of time for focused work. The academic year provides smaller blocks of time that are interspersed with classes and other responsibilities. I would describe research time during the summer as “highly concentrated”, while research time during the academic year is “diluted” and spread out. One of my main challenges is identifying which times of year are best suited for each part of the research process (e.g., background reading, planning, experiments, and writing).</p>
<p>In retrospect, I realized my summer accomplishments were seeded by efforts during the academic year. For my grant proposal, I spent the academic year reading background literature, thinking about potential research problems, and working on my one-page specific aims document. For my paper, I did most of the data analysis and thinking about the narrative during the academic year. In both cases, the most productive writing was done entirely during the summer.</p>
<p>The same pattern applies to Matt’s project: I spent the academic year reading about modeling and simulation for population genetics and writing prototypes. When it came time to mentor Matt, I was able to guide him in his own effort in a step-by-step fashion because I already had a detailed plan laid out.</p>
<p>I think part of this is that certain parts of the research process can’t be rushed. As long as I have enough time to make consistent progress, background reading, project planning, and experiments are not necessarily sped up by larger chunks of time. After reading a paper or running an experiment, I need a few days to think deeply about how to interpret the information and understand its implications. In that situation, additional work hours generate diminishing returns.</p>
<p>In contrast, I find that writing productively requires keeping large chunks of contextual information in my head. The dense schedule of research time during summer, which requires less multi-tasking, is well-suited for writing.</p>
<h3 id="2-organizing-my-work">2. Organizing My Work</h3>
<ol>
<li><strong>I limited my goals to what I could complete.</strong> The summer goes quickly. At first, I thought I wouldn’t know how to fill that time productively, but I quickly found that I didn’t have enough of it. If I spread myself too thin, I wouldn’t be able to finish anything, so I limited the number of goals to what I could finish.</li>
<li><strong>I chose “big” goals which needed time and attention I couldn’t provide during the academic year.</strong></li>
<li><strong>I prioritized one or two goals over the rest.</strong> These are what I “needed” to accomplish; everything else is what I “wanted” to accomplish. In this case, I needed to submit my grant proposal.</li>
<li><strong>I staggered the timelines for my goals.</strong> Trying to make progress on 5 fronts at once wasn’t conducive to getting anything done. Instead, I focused on two goals at a time.</li>
<li><strong>I tracked my work.</strong> I used a Trello board with a list for each week. I used colored cards for each day. Every day that I worked on something, I added a card for the appropriate goal. At the end of each week, each month, and the summer, I could review how my time was spent and use it to improve my time organization.</li>
</ol>
<h3 id="3-working-productively">3. Working Productively</h3>
<ol>
<li><strong>I limited other distractions (commitments, meetings, etc.).</strong> The academic year has a rigid schedule with a lot of context switching. Summer was my time to focus on one thing at a time (less multi-tasking!). To avoid losing productivity, I consistently worked to reduce and weed out distractions. I made an effort to say “no” where I could and to limit the number of meetings and commitments in any given week. This was hard to do, however, and required continuous effort and re-assessment.</li>
<li><strong>I found my most productive time and guarded it.</strong> I was more productive in the morning than in the afternoon. If I had a meeting in the morning, I struggled to focus and be productive later in the day. Likewise, I found that I needed large blocks of time. Therefore, I reserved mornings for focused work and scheduled meetings for the afternoons.</li>
<li><strong>I found my most productive environment.</strong> I find it difficult to concentrate and focus in my office. I’m able to focus more easily when working at a local coffee shop or in my living room. I used this to my advantage by avoiding campus over the summer.</li>
<li><strong>I started a local grant writing group.</strong> Through one of my colleagues, I met several other faculty who were also working to submit grants and do summer research. Through regular meetings, this group provided support as well as practical feedback and advice that helped me achieve my goals.</li>
</ol>
<h3 id="4-taking-care-of-myself">4. Taking Care of Myself</h3>
<ol>
<li><strong>I built and relied on a support network.</strong> Outside of the grant writing group, I had several friends and colleagues I turned to when I needed to vent. I found that helped me substantially this summer, especially when I was feeling overwhelmed, intimidated, or stressed out by writing my first grant proposal.</li>
<li><strong>I kept up with my self care.</strong> I kept a regular schedule with my personal trainer. Exercise has been and continues to be an important part of my mental health; workouts do as much for my mental health as they do my physical health. I could have done a better job of getting out on my bike (that fell by the wayside) and spending time with friends.</li>
<li><strong>I (should have) scheduled a vacation or two.</strong> I took a few days off during the summer (e.g., a trip to a local botanical garden) but nothing that counted as real vacation. My wife and I did technically schedule a long weekend vacation over Labor Day but had to cancel it. In retrospect, I should have made more time for downtime.</li>
</ol>
<h2 id="conclusion">Conclusion</h2>
<p>My summer was very productive. I can’t complain about the outcome: I accomplished every goal I set. I am particularly proud of (and grateful for) submitting my first grant proposal. With the proposal completed and submitted, I now have a better sense of what a proposal entails and feel confident that I can write more. Overcoming that hurdle, along with developing the skills and knowledge for writing grant proposals, will pay significant dividends over the rest of my career.</p>
<p>Growing my relationship with my collaborator will also pay off. My hope is that the collaboration will be long-term, providing opportunities and benefits on all sides.</p>
<p>As I prepare to start classes on Monday, I am grateful for the time I spent improving my materials for the Data Science class. I feel the class is planned to the point that I have very little left to do. In contrast, I am spending a lot of time re-familiarizing myself with the content for the Introduction to Software Engineering course. Over the quarter, I will need to reorganize and polish my materials.</p>
<p>I found two resources particularly helpful this summer. The book <a href="https://smile.amazon.com/How-Write-Lot-Practical-Productive/dp/1433829738">How to Write a Lot</a> by Paul J. Silvia is short but packed with great advice on how to be productive. Secondly, my Ph.D. advisor Scott Emrich emailed me a handy overview of approaches for organizing writing time that he has observed in academia.</p>
<p>All of that said, my summer was very intense. I was very stressed through large chunks of the summer. And this did affect those around me.</p>
<p>Next year, I want to be a little less ambitious and enjoy more of my summer time. I want to do a better job of riding my bike consistently, taking my dogs to the dog park, and enjoy a vacation or two with my spouse.</p>
Sat, 07 Sep 2019 12:13:19 +0000
http://rnowling.github.io/research/2019/09/07/first-academic-summer.html
Thoughts on Surviving My First Year of Teaching at a PUI

<p>With my summer wrapped up, I can now say that my first year as an assistant professor at a primarily-undergraduate institution (PUI) is done. As I prepare to start my second year, I have been reflecting on my first year and some of the lessons I learned.</p>
<h2 id="my-first-year-in-a-nutshell">My First Year in a Nutshell</h2>
<p>MSOE is on a quarter system. In my department (EECS), most faculty teach three sections of about 20 students each per quarter. Most classes are in the format of three hours of lecture and a two-hour lab. Frequently, two of the sections are for the same class, while the third section is for a second class. This means faculty teach 9 sections per year corresponding to 6 classes.</p>
<p>First-year faculty also teach 6 classes but only one section of each. For 2018-2019, I taught the following classes:</p>
<ul>
<li>Introduction to Software Development I and II</li>
<li>Data Structures</li>
<li>Algorithms</li>
<li>Data Science</li>
<li>Machine Learning</li>
</ul>
<p>Overall, my first year was incredibly busy and, at times, overwhelming, but it went well. In the end, my students learned a ton, and I earned good teaching evaluations from them. Going into my second year, I am much more confident in my grasp on my courses, their materials, and the overall patterns of the academic year.</p>
<h2 id="teaching-tips-for-first-year-faculty">Teaching Tips for First-Year Faculty</h2>
<p>Multiple new faculty members are joining our department this year, and for most of them, this will be their first year as a professor at a PUI. I found myself repeating some observations, so I wanted to write them down and share them more broadly in case they are useful to others.</p>
<h3 id="1-optimize-for-short-term-realities">1. Optimize for Short-Term Realities</h3>
<p>One colleague advised that my only goal for my first year should be to survive. In retrospect, this may have been some of the most useful advice I received. I went in with very high expectations for myself and ended up hitting a wall. I quickly learned that I needed to moderate my expectations to manage my workload and the challenge of constantly doing something new every day for nine months.</p>
<p>In practical terms, I had 5-6 lectures and 2 labs to prepare for every week. Each lecture and lab took me about 2-3 hours of preparation, which included creating materials as well as reviewing and re-teaching the material to myself. On top of that, I also had grading, office hours, and service and research commitments. (Research is not a focus at MSOE, but several of my colleagues and I are passionate about research and mentoring students.)</p>
<p>Several faculty gave me well-meaning and good advice on best practices. I was not, however, in a good position to use all of it. In particular, I was once told to try to prepare my class materials with the long-term in mind: prepare them well and then you can easily re-use them, saving time in the future.</p>
<p>Simply put, I didn’t have time to do things the “right” way. In retrospect, I realized that many of my materials were placeholders or drafts. I created minimal and unpolished slides, examples, labs, and assessments. In the short term, it is easier to create something that you plan to use once and, potentially, throw away. Optimizing for the short term was absolutely necessary to hit all of my weekly lecture and lab deadlines.</p>
<p>The same goes for grading. Be careful about spending too much time on grading; course materials go with you from one year to the next, but marked assignments do not. I used spreadsheets to record rubrics, point assignments, and comments for labs (homework). I copied the overall assignment grades and comments into Blackboard and sent a class-wide email with the rubric. Only in my last quarter did I improve on this by creating scripts that generated reports with my comments embedded alongside the students’ code.</p>
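The scripts themselves aren't shown here, but the core idea can be sketched as interleaving rubric comments with the submitted code. The `annotate` function and the `{line_number: comment}` format below are hypothetical, purely for illustration:

```python
def annotate(code, comments):
    """Interleave grading comments directly below the lines they refer to.

    code: a student's source file as a single string.
    comments: {line_number: comment}, 1-indexed (hypothetical format).
    """
    out = []
    for i, line in enumerate(code.splitlines(), start=1):
        out.append(line)
        if i in comments:
            out.append(f"    # GRADER: {comments[i]}")
    return "\n".join(out)

# Example: a one-line comment attached to a buggy line of student code.
sample = "def add(a, b):\n    return a - b"
report = annotate(sample, {2: "Bug: should be a + b."})
```

The payoff is that students see each comment in context rather than hunting through a separate rubric email.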
<p>During the summer after your first year, you can spend time polishing, extending, and re-developing your materials. By then, you’ll have a sense of the “big picture” and overall themes for your courses. That’s the best time to prepare reusable materials for the long term.</p>
<h3 id="2-plan-for-the-unexpected-and-unknown">2. Plan for the Unexpected and Unknown</h3>
<p>With the time needed to prepare my materials, I spent the entire academic year preparing everything right before it was needed. I had no free time (or spare energy) to work ahead.</p>
<p>This approach assumes that you will always have the needed time for preparation. But life doesn’t work that way. You may have an unexpected situation arise or simply want a date night with your partner. In those cases, you need “backup” materials to fill lecture time when you don’t have time to prepare.</p>
<p>In my case, I showed recorded talks that would be interesting to students. I found recorded talks from industry conferences and other sources for general audiences, on current topics, or containing interesting observations related to the course material. The talks were great for broadening students’ horizons and exposing them to cutting edge work.</p>
<p>Additionally, I would occasionally run out of lecture material because I was still figuring out the timing for my lectures. I used in-class exercises to fill time. The exercises gave all of the students opportunities to practice the material and often kept more advanced students engaged and entertained. I never required the students to submit their solutions, nor did I grade them; the exercises were purely meant to be fun puzzles and provide practice.</p>
<h3 id="3-quizzes-the-multi-faceted-tool">3. Quizzes: The Multi-Faceted Tool</h3>
<p>Early on, a colleague suggested I give weekly quizzes. My views and ways of using quizzes were further shaped by discussions with many of my colleagues. Quizzes became one of my favorite tools in my toolbox.</p>
<p>I’ve discussed quizzes in a <a href="/teaching/2019/03/09/quizzes-multifaceted-teaching-tool.html">previous blog post</a>. But, in summary, weekly quizzes provide opportunities for you and your students to see how your students are doing with the material. They can indicate whether a topic needs to be revisited and do so early enough to enable course corrections (both for students and myself!) ahead of exams.</p>
<p>I also found that quizzes provided opportunities for review. I would often give a quiz in class and then immediately discuss the answers. Students were more engaged than in the usual lectures, enjoyed the opportunity to try on their own, and appreciated the immediate feedback. A quiz and the follow-up review would frequently fill an entire lecture.</p>
<p>Students really appreciated the opportunity to practice test-taking. My quizzes tended to reflect my exams. Consequently, students felt more confident and at ease going into my exams. This is especially true since many students aren’t yet skilled at creating realistic testing situations for practice. Further, when students struggled with a concept, I would keep quizzing them until they had the concept down. This, too, helped students prepare for the exams.</p>
<p>Lastly, weekly quizzes provided structure and motivation to help students stay on top of studying. I gave quizzes at the same time each week. This gave students motivations to study every week instead of waiting for the exams.</p>
<h3 id="4-accept-failure-your-ideas-will-fail-and-your-decisions-will-backfire">4. Accept Failure: Your Ideas Will Fail and Your Decisions Will Backfire</h3>
<p>In my first year of teaching, every teaching idea I had was new and untested. The first year was really just a bunch of experiments. Some of those ideas went well, while others failed miserably. I had to learn to accept failure without taking it personally.</p>
<p>I have two examples from my Data Structures class. My first lecture on linked lists failed miserably. In Data Structures, I rotated between conceptual and live-coding lectures. In the case of linked lists, I didn’t spend enough time on the concepts before jumping into live-coding the implementation. Secondly, I originally attempted to follow the textbook closely. Unfortunately, the textbook’s example code was heavily factored to avoid duplication, which obscured the underlying patterns from students seeing them for the first time. My students left the lecture feeling completely lost and intimidated.</p>
<p>I ended up redoing the lecture. I spent more time lecturing on the concepts before diving into the code. I also decided not to follow the textbook and instead created my own linked list implementation. I embraced repetition to emphasize the common patterns associated with traversing a linked list. This second lecture went much better. Through repetition, the students grasped the traversal patterns. By the end of the second lecture, students were anticipating how to implement various operations such as searching the list.</p>
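The traversal pattern in question is the classic walk-until-the-end loop. A minimal sketch (written in Python here for brevity; this is not the course's actual implementation) shows how deliberately repeating the loop across operations, rather than factoring it out, makes the pattern visible:

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, value):
        self.head = Node(value, self.head)

    # The same walk-until-None loop recurs in every operation below.
    # Repeating it, rather than abstracting it away, is the teaching point.
    def size(self):
        count, current = 0, self.head
        while current is not None:
            count += 1
            current = current.next_node
        return count

    def contains(self, value):
        current = self.head
        while current is not None:
            if current.value == value:
                return True
            current = current.next_node
        return False

lst = LinkedList()
for v in [3, 1, 4]:
    lst.prepend(v)
```

Once students have seen the loop twice, anticipating how `remove` or `get` would traverse the list becomes a natural next step.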
<p>Another idea went well. I decided to introduce unit testing on quizzes, labs, and exams in the style of test-driven development (TDD). In labs, multiple students reported that their tests caught bugs and exposed edge cases they hadn’t initially considered. On quizzes and exams, I asked the students to write test code before starting on their implementations. Data Structures problems tend to be more abstract and complex than the problems in the introductory courses. My goal for the students was to improve their understanding of the problems by creating example inputs and outputs. Multiple students confirmed that the tests helped to clarify their understanding and were valuable exercises.</p>
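In that spirit, a tests-first exercise might look like the following sketch. The `reverse_list` problem is a hypothetical stand-in, not an actual quiz or exam question; the point is the order of work: example inputs and outputs first, implementation second.

```python
import unittest

# Step 1 (written first): the tests pin down the problem with concrete
# example inputs and outputs before any implementation exists.
class TestReverseList(unittest.TestCase):
    def test_empty(self):
        self.assertEqual(reverse_list([]), [])

    def test_single(self):
        self.assertEqual(reverse_list([7]), [7])

    def test_typical(self):
        self.assertEqual(reverse_list([1, 2, 3]), [3, 2, 1])

# Step 2 (written second): an implementation that makes the tests pass.
def reverse_list(items):
    result = []
    for item in items:
        result.insert(0, item)  # prepend each item, reversing the order
    return result
```

Writing the test cases forces students to work out what the correct behavior even is, which is exactly where abstract Data Structures problems tend to trip them up.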
<h3 id="5-use-a-light-weight-task-tracking-solution">5. Use a Light-Weight Task-Tracking Solution</h3>
<p>With six lectures and two labs every week, I had a large number of weekly deadlines to hit and tasks to complete. I had never been so busy or felt so under the gun in my previous work (either my Ph.D. or industry positions). I also had to keep an eye on deadlines that were farther out such as exams and paper deadlines.</p>
<p>I found it incredibly useful to create a customized task management solution using Trello. I borrowed the concept of a Kanban board. Kanban boards normally have lists like “To Do”, “In Progress”, “Blocked,” and “Done.” In my case, I used lists titled “Due in a Month,” “Due in a Week,” “Due Tomorrow / Next Class Meeting,” and “Done.” By organizing tasks into different time scales, I was able to see what was due soon and further out in a single place. I reviewed and updated the Trello board on a daily and weekly basis. I wrote about my approach in <a href="/productivity/2019/01/11/organizing-class-agile.html">another blog post</a>.</p>
<h3 id="6-keep-up-with-hobbies-exercise-and-other-activities">6. Keep up with Hobbies, Exercise, and Other Activities</h3>
<p>My first year was incredibly stressful. Teaching, even at the college level, requires substantial emotional labor. I was, in effect, always “on call” for forty students who needed my time and attention.</p>
<p>Feeling overwhelmed, I canceled on the gym and other activities in my first month. In the short term, that may have helped me meet a few deadlines. But in the longer term, my mental and physical health took a hit. With the urging of my wife and a friend, I set up a regular gym schedule again. That time became a place in my schedule to work out my frustrations and anxiety, feel accomplished, and have time to myself.</p>
<h3 id="7-seek-advice-from-your-colleagues">7. Seek Advice from Your Colleagues</h3>
<p>My success in teaching in my first year was due in no small part to the wisdom and support of my colleagues. I benefited greatly from bouncing ideas off of my colleagues and listening to their experiences. I would often take the same piece of material to two or three faculty members to get different points of view. Many of my colleagues have 10 or more years of accumulated experience. Their advice and feedback often kept me from taking a wrong turn and helped me polish my rough, untested ideas before they came into contact with the harsh nature of reality. I am greatly indebted to them for the time, attention, and support they gave me.</p>
<p>My colleagues were happy, if not eager, to share their course materials with me. I found that creating my own slides and materials was a necessary part of my process of preparing for lectures. Nonetheless, having examples to study saved me time when preparing my own materials and made it easier to synchronize across sections. Don’t be afraid to ask for and borrow materials!</p>
<h2 id="epilogue">Epilogue</h2>
<p>My first year of full-time teaching as an assistant professor at a PUI was quite a challenge, but I’m comfortable calling it a success. From the various challenges, I was able to grow by leaps and bounds both professionally and personally. The net result is that as I prepare to enter my second year, I feel more confident in my teaching; I’m facing fewer unknowns and am better prepared all around.</p>
<p>(Thank you to <a href="https://faculty-web.msoe.edu/yoder/">Josiah Yoder</a> for helpful edits and feedback!)</p>
Mon, 02 Sep 2019 12:13:19 +0000
http://rnowling.github.io/teaching/2019/09/02/first-year-teaching.html
Categories: teaching

Quizzes as a Multi-Faceted Teaching Technique

<p>“How I Learned to Stop Worrying and Love the Quiz” – Dr. Strangelove (paraphrased slightly)</p>
<p>As a new instructor, I’ve come to appreciate quizzes as one of the most powerful and versatile yet low-cost tools in my toolbox. I initially introduced weekly quizzes in my introductory classes to measure the effectiveness of my lectures. I figured that having a tight feedback loop would enable me to quickly identify problems and implement changes. I certainly realized this benefit but I also stumbled upon numerous other benefits along the way. Quizzes revolutionized my teaching and, now, are one of my most trusted tools.</p>
<p>From the perspective of an instructor, quizzes are quicker and easier to write and grade than full exams. At the same time, weekly quizzes add up to enough total points to motivate students but each individual quiz is not worth so much that a single bad quiz grade will significantly hamper a student’s overall grade.</p>
<p>Some ways to use quizzes include:</p>
<ol>
<li><strong>Feedback Tool for Lecturers:</strong> As a new instructor, I’m still trying to identify best practices in the classroom and how best to present each topic. Weekly quizzes give me rapid and regular feedback. If a majority of the class is struggling to understand a topic, this is apparent on the weekly quiz. I can then review the material and adjust my approach going forward.</li>
<li><strong>Self-Evaluation Tool for Students:</strong> First-year students are still developing self-evaluation skills. Until they have developed these skills, students often think they understand the material better than they really do. When the students take a quiz in a testing situation, the illusions fall away and the students leave with a more realistic evaluation.</li>
<li><strong>Encouraging Study Habits:</strong> Many students struggle with the transition from high school classes to college classes with fewer contact hours and a greater need for self study. When quizzes are given on a regular basis, students come to expect them and develop a study schedule around the quizzes. This leads students to develop a habit of studying on a regular basis that continues past the current class.</li>
<li><strong>Practice for Exams:</strong> When I gave my students take-home practice exams, they used the exam as a list of topics to study but not to evaluate themselves. Quizzes provide practice opportunities in realistic testing environments, so students get a better sense of their strengths and weaknesses. I found that giving my students quizzes with questions similar to what would be on the exam led to improved exam grades compared with only giving them take-home practice exams.</li>
<li><strong>Opportunity to Review Material:</strong> After each quiz, I discuss the answers with the class. This gives me an opportunity to review the material and give students a second look. Further, since the questions are fresh in the students’ minds and the students have a better sense of what they do and don’t understand, they tend to be more engaged and interactive than usual.</li>
<li><strong>Improve Office Hours Attendance:</strong> Many of my first-year students are initially uncomfortable with the idea of visiting a professor’s office. To break the ice, I offer to replace my students’ first quiz grade with a 100% if they visit my office hours in the first two weeks of the term. This has proven to be popular, with most of my students taking the offer. For many students, that first office hours visit turns into regular visits. These regular visits mean that I can help students early, before problems (academic or otherwise) fester and become insurmountable.</li>
</ol>
Sat, 09 Mar 2019 12:13:19 +0000
http://rnowling.github.io/teaching/2019/03/09/quizzes-multifaceted-teaching-tool.html
Categories: teaching

Testing Feature Significance with the Likelihood Ratio Test

<p><a href="https://en.wikipedia.org/wiki/Logistic_regression">Logistic Regression</a> (LR) is a popular technique for binary classification within the machine learning and statistics communities. From the machine learning perspective, it has a number of desirable properties. Training and prediction are incredibly fast. When using stochastic gradient descent and its cousins, LR supports online learning, enabling models to change as the data changes and to train on datasets larger than the available memory on the machine. And finally, LR naturally accommodates sparse data.</p>
<p>Because of its roots in the statistics community, Logistic Regression is amenable to analyses other machine learning techniques are not. The <a href="https://en.wikipedia.org/wiki/Likelihood-ratio_test">Likelihood-Ratio Test</a> can be used to determine whether the addition of features to a LR model results in a statistically-significant improvement in the fit of the model.<sup><a href="#hosmer">1</a></sup></p>
<p>I originally learned about the Likelihood-Ratio Test when reading about ways that variants are found in genome-wide association studies (GWAS). The statistician <a href="https://en.wikipedia.org/wiki/David_Balding">David J. Balding</a> has significantly impacted the field and its methods. His <a href="http://www.montefiore.ulg.ac.be/~kvansteen/GBIO0009-1/ac20112012/Class4/Balding2006.pdf">tutorial on statistical methods for population association studies</a> is a great place to start for anyone interested in the subject.</p>
<p>As Prof. Balding points out, many GWA studies use the Likelihood-Ratio Test to perform single-SNP association tests. Basically, a LR model is built for each SNP and compared to a null model that only uses the class probabilities. SNPs with small p-values are then selected for further study.</p>
<h2 id="likelihood-ratio-test">Likelihood-Ratio Test</h2>
<p>The question we are trying to answer with the Likelihood-Ratio Test is:</p>
<blockquote>
<p>Does the model that includes the variable(s) in question tell us more about the outcome (or response) variable than a model that does not include the variable(s)?</p>
</blockquote>
<p>Using the Likelihood-Ratio Test, we compute a p-value indicating the significance of the additional features. Using that p-value, we can reject (or fail to reject) the null hypothesis.</p>
<p>Let \(\theta^0\) and \(x^0\) and \(\theta^1\) and \(x^1\) be the weights and feature matrices used in the null and alternative models, respectively. Note that we need \(\theta^0 \subset \theta^1\) and \(x^0 \subset x^1\), meaning that the models are “nested.” Let \(y\) be the vector of class labels, \(N\) denote the number of samples, and \(df\) be number of additional weights / features in \(\theta^1\).</p>
<p>The Logistic Regression model is given by:</p>
\[\pi_\theta(x_i) = \frac{e^{\theta \cdot x_i}}{1+e^{\theta \cdot x_i}}\]
<p>Note that the intercept is considered part of \(\theta\). We append a column of 1s to \(x\) to model the intercept. (In the implementation below, since you control the feature matrices and model, you can model it as you need.)</p>
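For concreteness, here is one way to append the intercept column with NumPy (a small sketch; the variable names are my own, not from the gist):

```python
import numpy as np

# Toy feature matrix with two samples and two features.
x = np.array([[0.5, 1.2],
              [2.0, 0.3]])

# Append a column of 1s so that one weight in theta acts as the intercept.
x_with_intercept = np.hstack([x, np.ones((x.shape[0], 1))])
```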
<p>The likelihood for the Logistic Regression model is given by:</p>
\[L(\theta | x) = \prod_{i=1}^N \pi_\theta(x_i)^{y_i} (1 - \pi_\theta(x_i))^{1 - y_i} \\
\log L(\theta | x) = \sum_{i=1}^N y_i \log \pi_\theta(x_i) + (1 - y_i) \log (1 - \pi_\theta(x_i))\]
<p>The Likelihood-Ratio Test is then given by:</p>
\[G = 2 (\log L(\theta^1 | x^1) - \log L(\theta^0 | x^0))\]
<p>Finally, we compute the p-value for the null model using the \(\chi^2(df)\) CDF:</p>
\[p = P[\chi^2(df) > G]\]
<h2 id="python-implementation-and-example">Python Implementation and Example</h2>
<p>Using <a href="http://scikit-learn.org/stable/">scikit-learn</a> and <a href="https://www.scipy.org/">scipy</a>, implementing the Likelihood-Ratio Test is pretty straightforward (as long as you remember to use the <strong>unnormalized</strong> log losses and negate them):</p>
<script src="https://gist.github.com/rnowling/ec9c9038e492d55ffae2ae257aa4acd9.js?file=likelihood_ratio_test.py"></script>
<p>The <code class="language-plaintext highlighter-rouge">likelihood_ratio_test</code> function takes four parameters:</p>
<ol>
<li>Feature matrix for the alternative model</li>
<li>Labels for the samples</li>
<li>A LR model to use for the test</li>
<li>(Optional) Feature matrix for the null model. If this is not given, then the class probabilities are calculated from the sample labels and used.</li>
</ol>
<p>and returns a p-value indicating the statistical significance of the new features.</p>
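In case the embedded gist doesn’t load, here is a minimal sketch of what such a function can look like using scikit-learn and scipy. It follows the parameter list above, but the details (names, defaults, edge-case handling) are my own and may differ from the posted code:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def likelihood_ratio_test(features_alt, labels, model, features_null=None):
    labels = np.asarray(labels)

    if features_null is not None:
        # Nested null model: fit on the smaller feature matrix.
        if features_null.shape[1] >= features_alt.shape[1]:
            raise ValueError("Alternative model must have more features than the null model")
        model.fit(features_null, labels)
        null_prob = model.predict_proba(features_null)
        df = features_alt.shape[1] - features_null.shape[1]
    else:
        # Null model: predict the class frequencies for every sample.
        p = labels.mean()
        null_prob = np.tile([1.0 - p, p], (len(labels), 1))
        df = features_alt.shape[1]

    model.fit(features_alt, labels)
    alt_prob = model.predict_proba(features_alt)

    # log_loss is the *negative* log-likelihood; normalize=False keeps the
    # sum over samples instead of the mean, and negating recovers log L.
    alt_log_likelihood = -log_loss(labels, alt_prob, normalize=False)
    null_log_likelihood = -log_loss(labels, null_prob, normalize=False)

    G = 2.0 * (alt_log_likelihood - null_log_likelihood)
    return chi2.sf(G, df)
```

Note that <code>chi2.sf</code> computes \(P[\chi^2(df) > G]\) directly, which is more precise than <code>1 - chi2.cdf(G, df)</code> for very small p-values.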
<p>To illustrate its use, I generated some fake data with 20 binary features. The binary features range in their probability of matching the class labels from 0.5 (uncorrelated) to 1.0 (completely correlated). Half of the features have inverted values (<code class="language-plaintext highlighter-rouge">1 - label</code>). I generated 100 fake data sets with 100 samples each. I then ran the Likelihood-Ratio Test for each feature individually and created a box plot of the p-values:</p>
<p><img src="/images/likelihood_ratio_test_p_values_boxplot.png" alt="" /></p>
<p>As expected, the statistical significance varies according to the probability that the feature matches the label. And, also as expected, we see no difference between features that match the label and features that are inverted.</p>
<p>(I <a href="https://gist.github.com/rnowling/ec9c9038e492d55ffae2ae257aa4acd9">posted my code</a> under the Apache License v2 so you can re-create my results and use the test in your own work.)</p>
<p><a name="hosmer"></a>Note: the derivation given here comes from <em>Applied Logistic Regression</em> (3<sup>rd</sup> Ed.) by Hosmer, Lemeshow, and Sturdivant.</p>
Sat, 07 Oct 2017 12:13:19 +0000
http://rnowling.github.io/machine/learning/2017/10/07/likelihood-ratio-test.html
Categories: math, machine learning, statistics

Testing CLI Apps with Bats

<p>I’ve been looking for a good way to test <a href="https://github.com/rnowling/asaph">Asaph</a>, the small machine-learning application I wrote for my Ph.D. thesis.</p>
<p>Most testing solutions I found didn’t quite fit what I wanted. The built-in Python <a href="https://docs.python.org/2/library/unittest.html"><code class="language-plaintext highlighter-rouge">unittest</code></a> framework is my usual go-to. It’s flexible, powerful, and easy to use. Asaph is heavily data-dependent with relatively complex internal data structures, and its workflow involves lots of file I/O. Consequently, I found it cumbersome to write unit tests.</p>
<p>Most command-line testing solutions seem to be focused on testing the interfaces. Most examples I found focused on using <code class="language-plaintext highlighter-rouge">unittest</code> to test argument parsing with libraries such as <a href="https://docs.python.org/2.7/library/argparse.html"><code class="language-plaintext highlighter-rouge">argparse</code></a>. Other options include testing interactive CLI apps such as those that prompt the user or use something like <a href="https://en.wikipedia.org/wiki/Curses_%28programming_library%29"><code class="language-plaintext highlighter-rouge">curses</code></a>. Asaph isn’t really interactive, though.</p>
<p>Asaph’s commands form a workflow. The user first calls Asaph to convert data to its internal format. The user then uses Asaph to train a Logistic Regression model or Random Forests models with different numbers of trees. The user can then call Asaph to check convergence of the SNP rankings, deciding whether to train models with more trees or not. Lastly, the user can output the SNP rankings from one of the models. In each step, new files (initial data, models, plots, rankings) are added to the work directory or the contents of the work directory are queried.</p>
<p>What I really wanted was to test Asaph and its workflow holistically. I want to call Asaph and check that it executes successfully and produces the expected output on disk. Sometimes it may be enough to merely check that the output exists, while in other cases, I want to query the output to make sure what it contains is reasonable. By running Asaph’s workflow on test data, we can check the most common codepaths and ensure no syntax or type errors have been introduced.</p>
<p>In my search, I came across the <a href="https://github.com/sstephenson/bats">Bash Automated Testing System</a>, or Bats. Bats allows you to write tests in Bash. Bash scripts map well onto my use case: invoking external programs is what Bash does naturally, in the traditional Unix philosophy. You then define assertions through standard Bash comparisons. Additionally, Bats supports simple setup and teardown functions and loading helper functions.</p>
<p>Bats is best demonstrated through the tests I created for the Asaph import script:</p>
<script src="https://gist.github.com/rnowling/74224fed33ac99137d373297d6694c34.js"></script>
<p>In the example, I use the following features:</p>
<ol>
<li>Defining an embedded helper function (<code class="language-plaintext highlighter-rouge">count_snps</code>)</li>
<li>Setup and teardown functions</li>
<li>Defining tests with the annotation <code class="language-plaintext highlighter-rouge">@test</code></li>
<li>Running commands and checking the return codes</li>
<li>Checking for the existence of output files and directories</li>
<li>Checking the contents of output files</li>
</ol>
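For readers who can’t load the gist, the core pattern can be sketched in plain Bash (Bats layers the <code>@test</code> annotation, automatic setup/teardown calls, and reporting on top of something like this). The command under test here is a stand-in placeholder, not a real Asaph invocation:

```shell
#!/usr/bin/env bash
# Sketch of a holistic CLI test: run a command in a scratch work directory,
# then assert on its exit status and the files it produced.

count_lines() {  # helper function, analogous to the count_snps helper
    wc -l < "$1"
}

setup() { workdir=$(mktemp -d); }
teardown() { rm -rf "$workdir"; }

test_import_creates_output() {
    setup
    # Stand-in for the app under test; a real test would invoke the CLI here.
    printf 'snp1\nsnp2\n' > "$workdir/snps.txt"
    status=$?

    [ "$status" -eq 0 ] || return 1                              # command succeeded
    [ -f "$workdir/snps.txt" ] || return 1                       # output file exists
    [ "$(count_lines "$workdir/snps.txt")" -eq 2 ] || return 1   # contents are sane
    teardown
}

test_import_creates_output && echo "PASS"
```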
<p>My experience is not unique to Asaph. Many of the scientific applications I’ve come across in my research are built around complex datasets and workflows of deeply-connected steps. It can be easier to use holistic tests for these applications. I haven’t quite come across anything like Bats before, but I think it can be a useful tool for computational scientists.</p>
Sat, 04 Feb 2017 00:01:19 +0000
http://rnowling.github.io/software/engineering/2017/02/04/testing-cli-apps-with-bats.html
Categories: testing, asaph, scientific computing, software engineering

Symplectic Integrators Bound Energy Error

<p>In my previous blog posts, I analyzed the position and velocity error of the harmonic oscillator simulated with the Leapfrog integrator. I <a href="/math/2016/11/19/leapfrog-global-error.html">proved that the Leapfrog integrator is a second-order method</a>. Using a simulation, I validated that the error of the positions and velocities between the numerically-integrated and analytical models grows linearly with the trajectory length and quadratically with the timestep.</p>
<p>What about the error of the total energy between the numerically-integrated and analytical models? The total energy \(E(t)\) is calculated from the positions \(x(t)\) and velocities \(v(t)\) by</p>
\[E(t) = \frac{1}{2}m v^2(t) + \frac{1}{2} m \omega^2 x^2(t)\]
<p>As the errors in the positions and velocities grow linearly with time, we can write the numerical positions and velocities as perturbations of the true positions and velocities:</p>
\[\tilde{x}(t) = x(t) + \mathcal{O}(t) \\
\tilde{v}(t) = v(t) + \mathcal{O}(t)\]
<p>We can then substitute the perturbed positions and velocities into \(E(t)\) to get \(\tilde{E}(t)\) and solve:</p>
\[\tilde{E}(t) = \frac{1}{2}m \tilde{v}(t)^2 + \frac{1}{2} m \omega^2 \tilde{x}(t)^2 \\
\tilde{E}(t) = \frac{1}{2}m (v(t) + \mathcal{O}(t))^2 + \frac{1}{2} m \omega^2 (x(t) + \mathcal{O}(t))^2 \\
\tilde{E}(t) = \frac{1}{2}m (v(t)^2 + \mathcal{O}(v(t)t) + \mathcal{O}(t^2)) + \frac{1}{2} m \omega^2 (x(t)^2 + \mathcal{O}(x(t)t) + \mathcal{O}(t^2)) \\
\tilde{E}(t) = \frac{1}{2}m v^2(t) + \frac{1}{2} m \omega^2 x^2(t) + \mathcal{O}(v(t)t) + \mathcal{O}(t^2) + \mathcal{O}(x(t)t) + \mathcal{O}(t^2) \\
\tilde{E}(t) = \frac{1}{2}m v^2(t) + \frac{1}{2} m \omega^2 x^2(t) + \mathcal{O}(t^2)\]
<p>For the harmonic oscillator, the positions and velocities are bounded by constants. Thus, we end up with linear and quadratic error terms depending on \(t\). Since the quadratic error term is the largest term in the large \(t\) limit, we can expect the error in the energies to be bounded by quadratic growth with respect to the length of the trajectories.</p>
<p>Let’s do a simulation to validate our result. I simulated the harmonic oscillator using the Leapfrog integrator with a timestep of 0.01 s to generate trajectories ranging from 1 s to 1000 s. I sampled the total energy at each time step and plotted the average and standard deviations of the energies (black) for each trajectory length. I also included the analytical energy (magenta) as a reference.</p>
<p><img src="/images/symplectic_bounded_error/vv_duration_energies.png" alt="Total Energy" /></p>
<p>But wait! The energy doesn’t grow over time! In fact, the error in the energy doesn’t seem to change in time once the trajectories are long enough. What’s going on?</p>
<p>Our analysis above provided an upper bound on the energy error over time – the actual error can always be smaller. Specifically, we didn’t take into account the stricter property of symplecticity.</p>
<p>In my <a href="/math/2016/12/14/leapfrog-symplectic-harmonic-oscillator.html">last blog post</a>, I proved that the Leapfrog integrator is symplectic. <a href="http://www.cds.caltech.edu/~marsden/bib/1988/04-GeMa1988/GeMa1988.pdf">Ge and Marsden</a> proved that if a symplectic integrator exactly conserves the total energy (Hamiltonian) of a system, then it is computing the exact trajectory for that system. They go on to suggest that for symplectic integrators, the error in the energy is a good proxy for evaluating the error in the trajectory.</p>
<p>Specifically, symplectic integrators seem to bound the error in the energy so that it doesn’t grow over time. Physicists would say that the error has no secular (steadily growing) component. This is a useful property when studying physical systems.</p>
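This behavior is easy to reproduce. Below is a minimal sketch (my own re-implementation, not the code used to generate the figure above) that simulates the oscillator with the Leapfrog integrator and tracks the worst-case energy deviation:

```python
def leapfrog_energy_deviation(omega=1.0, m=1.0, dt=0.01, t_max=100.0):
    """Simulate x'' = -omega^2 x with Leapfrog from x(0)=1, v(0)=0 and
    return the maximum |E(t) - E(0)| observed along the trajectory."""
    x, v = 1.0, 0.0
    a = -omega**2 * x
    e0 = 0.5 * m * v**2 + 0.5 * m * omega**2 * x**2
    max_dev = 0.0
    for _ in range(int(t_max / dt)):
        # Leapfrog (velocity Verlet form) update
        x += v * dt + 0.5 * a * dt * dt
        a_new = -omega**2 * x
        v += 0.5 * (a + a_new) * dt
        a = a_new
        e = 0.5 * m * v**2 + 0.5 * m * omega**2 * x**2
        max_dev = max(max_dev, abs(e - e0))
    return max_dev
```

Running it with trajectory lengths of 10 s and 1000 s gives essentially the same maximum deviation, in contrast to the quadratic upper bound derived above.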
Wed, 11 Jan 2017 12:13:19 +0000
http://rnowling.github.io/math/2017/01/11/symplectic-integrators-bound-energy-error.html
Categories: math

Leapfrog is Symplectic for the Harmonic Oscillator

<p>Microcanonical molecular dynamics describes the motion of molecules using the <a href="https://en.wikipedia.org/wiki/Hamiltonian_mechanics">Hamiltonian mechanics</a> framework. Hamiltonian dynamics are <a href="https://en.wikipedia.org/wiki/Symplectomorphism">symplectic</a>, meaning that they preserve volume in phase space. The symplectic property relates to properties we learned in first-semester college physics, such as conservation of energy.</p>
<p>The <a href="/math/2016/11/07/harmonic-oscillator.html">harmonic oscillator</a> is a simple symplectic model, useful for study. A plot of the path through phase space of our analytical derivation of the harmonic oscillator demonstrates the symplectic property:</p>
<p><img src="/images/harmonic_oscillator/analytical_phase.png" alt="Harmonic Oscillator Phase Diagram" /></p>
<p>Symplectic integrators are important for bounding errors in the trajectories and resulting statistics such as transition rates<sup><a href="#reviews">1</a></sup>. <a href="http://www.cds.caltech.edu/~marsden/bib/1988/04-GeMa1988/GeMa1988.pdf">Ge and Marsden</a> showed that if an integrator is symplectic, then the integrator can only conserve energy exactly if it computes the exact trajectory except for a reparameterization in time. Since the error in the energy is bounded, the error of statistics calculated from trajectories is bounded.</p>
<p>The bound would seem to contradict our results from the analyses of the <a href="/math/2016/11/13/leapfrog-local-error.html">local</a> and <a href="/math/2016/11/19/leapfrog-global-error.html">global truncation</a> errors. These analyses indicate that the errors in energy and other statistics computed from the trajectories would grow without bound. It turns out that symplectic integrators <strong>exactly</strong> simulate <em>shadow Hamiltonians</em>, which are perturbations of the original Hamiltonians. Thus, we can use the energy and other statistics from the shadow Hamiltonian as approximations to the values for the true Hamiltonian.</p>
<p>Additionally, the relationship between the bounds on the errors in the energy and the trajectories implies that the error in energy can be used as a measure of error for the trajectories. For example, unbounded increases or decreases in the energy from a simulation are indicative of an incorrect implementation of a symplectic integrator.</p>
<p>Unfortunately, the mathematical definition of the symplectic property and its relation to properties like the conservation of energy are expressed using advanced areas of math such as <a href="https://en.wikipedia.org/wiki/Differential_geometry">differential geometry</a>. Fortunately, it is much easier to prove that an integrator is symplectic than it is to state the definition of symplecticity.</p>
<h2 id="hamiltonian-dynamics">Hamiltonian Dynamics</h2>
<p>We’re going to start with a detour. Symplectiness is described using the language of Hamiltonians and flows, so we’ll describe some basics and show how Hamiltonians relate to Newton’s equations of motion.</p>
<p>The Hamiltonian is a function that takes the positions \(q\) and momenta \(p\). The form of the Hamiltonian commonly used in molecular dynamics is a linear combination of the kinetic and potential energies:</p>
\[H(q, p) = \frac{1}{2}p^T M^{-1} p + U(q)\]
<p>We can describe the dynamics of the Hamiltonian system using a pair of first-order differential equations:</p>
\[\frac{dq}{dt} = \frac{\partial H}{\partial p} = M^{-1} p \\
\frac{dp}{dt} = -\frac{\partial H}{\partial q} = -\nabla U(q)\]
<p>The system of two first-order ODEs can be rewritten as a single second-order ODE:</p>
\[\frac{d^2 q}{dt^2} = \frac{d}{dt} \frac{dq}{dt} \\
= \frac{d}{dt} M^{-1} p \\
= M^{-1} \frac{dp}{dt} \\
= M^{-1} (-\nabla U(q)) \\
= - M^{-1} \nabla U(q)\]
<p>By re-arranging the mass term, we get the form of Newton’s equations of motions we expect:</p>
\[M \frac{d^2 q}{dt^2} = - \nabla U(q)\]
<p>Thus, the Hamiltonian system is an equivalent description to Newton’s equations of motion.</p>
<h2 id="symplectic-maps--dynamical-systems">Symplectic Maps / Dynamical Systems</h2>
<p>We can define a map \(\phi\) that updates the state of the system over a length of time \(\Delta t\):</p>
\[(q_{i+1}, p_{i+1}) = \phi (q_i, p_i)\]
<p>Let \(\phi'\) be the <a href="https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant">Jacobian matrix</a> of \(\phi\):</p>
\[\phi' = \begin{pmatrix}
\frac{\partial \phi}{\partial q} & \frac{\partial \phi}{\partial p}
\end{pmatrix}\]
<p>The map \(\phi\) is symplectic if its Jacobian \(\phi'\) satisfies:</p>
\[\phi'^T J \phi' = J\]
<p>where</p>
\[J = \begin{pmatrix}
0 & 1 \\
-1 & 0
\end{pmatrix}\]
<h2 id="harmonic-oscillator-is-symplectic">Harmonic Oscillator is Symplectic</h2>
<p>We now have the basic tools for describing Hamiltonian systems and proving symplecticity. Let’s apply these tools. As a Hamiltonian system, the map for the harmonic oscillator is symplectic. To demonstrate the proof technique, we will start by validating that the analytical map for the harmonic oscillator is symplectic. The Hamiltonian is defined as follows:</p>
\[H(q, p) = \frac{1}{2}p^T M^{-1} p + \frac{1}{2} M \omega^2 q^2\]
<p>The corresponding map for the system is given by</p>
\[\phi(q, p) = \begin{pmatrix}
\cos(\Delta t) q + \sin(\Delta t)p \\
-\sin(\Delta t) q + \cos(\Delta t)p
\end{pmatrix}\]
<p>with Jacobian</p>
\[\phi'(q, p) = \begin{pmatrix}
\frac{\partial \phi}{\partial q} & \frac{\partial \phi}{\partial p}
\end{pmatrix} \\
= \begin{pmatrix}
\cos (\Delta t) & \sin (\Delta t) \\
-\sin (\Delta t) & \cos (\Delta t)
\end{pmatrix}\]
<p>With a bit of arithmetic, we see that the map satisfies the condition for symplecticity:</p>
\[\phi'^T J \phi' =
\begin{pmatrix}
\cos (\Delta t) & -\sin (\Delta t) \\
\sin (\Delta t) & \cos (\Delta t)
\end{pmatrix}
\begin{pmatrix}
0 & 1 \\
-1 & 0
\end{pmatrix}
\begin{pmatrix}
\cos (\Delta t) & \sin (\Delta t) \\
-\sin (\Delta t) & \cos (\Delta t)
\end{pmatrix} \\
= \begin{pmatrix}
\sin (\Delta t) & \cos (\Delta t)\\
-\cos (\Delta t) & \sin (\Delta t)
\end{pmatrix}
\begin{pmatrix}
\cos (\Delta t) & \sin (\Delta t) \\
-\sin (\Delta t) & \cos (\Delta t)
\end{pmatrix} \\
= \begin{pmatrix}
\cos (\Delta t)\sin (\Delta t) - \cos (\Delta t)\sin (\Delta t) & \sin^2 (\Delta t) + \cos^2 (\Delta t) \\
-[\cos^2 (\Delta t) + \sin^2 (\Delta t)] & - \sin (\Delta t)\cos (\Delta t) + \cos (\Delta t)\sin (\Delta t)
\end{pmatrix} \\
= \begin{pmatrix}
0 & 1 \\
-1 & 0
\end{pmatrix} \\
= J\]
<p>Thus, the analytical harmonic oscillator map satisfies the condition for symplecticity, as expected.</p>
<h2 id="leapfrog-for-harmonic-oscillator">Leapfrog for Harmonic Oscillator</h2>
<p>Next, we will prove that the Leapfrog method is symplectic for the harmonic oscillator system.</p>
<p>In our <a href="/math/2016/11/11/deriving-leapfrog.html">previous blog post</a>, we derived the Leapfrog integrator. We reproduce it here, with the substitutions \(x = q\) and \(v = M^{-1} p\) to be consistent with the notation used in this blog post. We’ll re-arrange the integrator into two equations, one for \(q(t + \Delta t)\) and another for \(p(t + \Delta t)\), so that we can form the flow equation \(\phi(q, p)\). We will then use the Jacobian \(\phi'\) to prove that the Leapfrog integrator is symplectic.</p>
\[F(t) = -\nabla U(q(t)) \\
q(t + \Delta t) = q(t) + M^{-1}p(t)\Delta t + \frac{1}{2}M^{-1}F(t)\Delta t^2 \\
F(t + \Delta t) = -\nabla U(q(t + \Delta t)) \\
p(t + \Delta t) = p(t) + \frac{1}{2} [F(t) + F(t + \Delta t)] \Delta t \\\]
<p>We substitute the potential</p>
\[U(q) = \frac{1}{2} M \omega^2 q^2\]
<p>into \(F(t)\) and \(F(t + \Delta t)\) of the integrator equations:</p>
\[F(t) = -M \omega^2 q(t) \\
q(t + \Delta t) = q(t) + M^{-1}p(t)\Delta t + \frac{1}{2}M^{-1} F(t)\Delta t^2 \\
F(t + \Delta t) = -M \omega^2 (q(t + \Delta t)) \\
p(t + \Delta t) = p(t) + \frac{1}{2} [F(t) + F(t + \Delta t)] \Delta t \\\]
<p>We substitute \(F(t)\) and \(F(t + \Delta t)\) into \(q(t + \Delta t)\) and \(p(t + \Delta t)\), reducing our system to two equations:</p>
\[q(t + \Delta t) = q(t) + M^{-1}p(t)\Delta t - \frac{1}{2}\omega^2 q(t) \Delta t^2 \\
p(t + \Delta t) = p(t) + \frac{1}{2} [-M \omega^2 q(t) - M \omega^2 q(t + \Delta t)] \Delta t\]
<p>We substitute \(q(t + \Delta t)\) into \(p(t + \Delta t)\) to get \(q(t + \Delta t)\) and \(p(t + \Delta t)\) in terms of \(q(t)\) and \(p(t)\) only:</p>
\[q(t + \Delta t) = q(t) + M^{-1}p(t)\Delta t - \frac{1}{2}\omega^2 q(t) \Delta t^2 \\
p(t + \Delta t) = p(t) + \frac{1}{2} [-M \omega^2 q(t) - M \omega^2 [q(t) + M^{-1}p(t)\Delta t - \frac{1}{2}\omega^2 q(t) \Delta t^2]] \Delta t\]
<p>Thus, we can form the map \(\tilde{\phi}(q, p)\) as</p>
\[\tilde{\phi}(q, p) = \begin{pmatrix}
(1 - \frac{1}{2} \omega^2 \Delta t^2) q(t) + M^{-1}\Delta t \, p(t) \\
(- M \omega^2 \Delta t + \frac{1}{4} M \omega^4 \Delta t^3) q(t) + (1 - \frac{1}{2} \omega^2 \Delta t^2) p(t)
\end{pmatrix}\]
<p>with Jacobian</p>
\[\tilde{\phi}' =
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix}\]
<p>where</p>
\[a = d = 1 - \frac{1}{2} \omega^2 \Delta t^2 \\
b = M^{-1} \Delta t \\
c = - M \omega^2 \Delta t + \frac{1}{4} M \omega^4 \Delta t^3\]
<p>Note that we denote the map as \(\tilde{\phi}(q, p)\) since it is an <strong>approximation</strong> to the true flow \(\phi(q, p)\). We then substitute the Jacobian \(\tilde{\phi}'(q, p)\) into the symplecticity condition and solve:</p>
\[\tilde{\phi}'^T J \tilde{\phi}' =
\begin{pmatrix}
a & c \\
b & d
\end{pmatrix}
\begin{pmatrix}
0 & 1 \\
-1 & 0
\end{pmatrix}
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} \\
= \begin{pmatrix}
-c & a \\
-d & b
\end{pmatrix}
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} \\
= \begin{pmatrix}
-ca + ac & -cb + ad \\
-da + bc & -db + bd
\end{pmatrix} \\
= \begin{pmatrix}
0 & -cb + ad \\
-da + bc & 0
\end{pmatrix} \\
= \begin{pmatrix}
0 & 1 \\
-1 & 0
\end{pmatrix} \\
= J\]
<p>where</p>
\[-da + bc = -(1 - \frac{1}{2} \omega^2 \Delta t^2)(1 - \frac{1}{2} \omega^2 \Delta t^2) + M^{-1}\Delta t( - M \omega^2 \Delta t + \frac{1}{4} M \omega^4 \Delta t^3) \\
= -1 + \frac{1}{2} \omega^2 \Delta t^2 + \frac{1}{2} \omega^2 \Delta t^2 - \frac{1}{4} \omega^4 \Delta t^4 - \omega^2 \Delta t^2 + \frac{1}{4} \omega^4 \Delta t^4 \\
= -1\]
<p>and</p>
\[-cb + ad = -( - M \omega^2 \Delta t + \frac{1}{4} M \omega^4 \Delta t^3) M^{-1}\Delta t + (1 - \frac{1}{2} \omega^2 \Delta t^2)(1 - \frac{1}{2} \omega^2 \Delta t^2) \\
= \omega^2 \Delta t^2 - \frac{1}{4} \omega^4 \Delta t^4 + 1 - \frac{1}{2} \omega^2 \Delta t^2 - \frac{1}{2} \omega^2 \Delta t^2 + \frac{1}{4} \omega^4 \Delta t^4 \\
= 1\]
<p>Thus, the Leapfrog method is symplectic for the harmonic oscillator system.</p>
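As a sanity check on the algebra, we can build the Jacobian numerically from the entries \(a\), \(b\), \(c\), \(d\) above and verify the condition \(\tilde{\phi}'^T J \tilde{\phi}' = J\) up to floating-point error (a quick sketch, not part of the original derivation):

```python
import numpy as np

def leapfrog_jacobian(omega, M, dt):
    """Jacobian of the Leapfrog map for the harmonic oscillator,
    using the entries a, b, c, d derived above (with d = a)."""
    a = 1.0 - 0.5 * omega**2 * dt**2
    b = dt / M
    c = -M * omega**2 * dt + 0.25 * M * omega**4 * dt**3
    return np.array([[a, b],
                     [c, a]])

def is_symplectic(jacobian, tol=1e-12):
    """Check the symplecticity condition phi'^T J phi' = J."""
    J = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])
    return np.allclose(jacobian.T @ J @ jacobian, J, atol=tol)
```

The check passes for any choice of \(\omega\), \(M\), and \(\Delta t\), while a non-symplectic map (one whose Jacobian determinant differs from 1) fails it.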
<h2 id="conclusion">Conclusion</h2>
<p>In this blog post, we covered the basics of symplectic Hamiltonians and maps. We described concepts and notation from Hamiltonian dynamics and showed how they relate to the second-order differential equations used for Newton’s equations of motion. We then discussed the conditions for proving a map is symplectic.</p>
<p>We then applied the framework to two problems. First, we demonstrated the approach by proving that the harmonic oscillator is symplectic. Next, we showed that the Leapfrog integrator, which is an <strong>approximate</strong> map, is symplectic for the harmonic oscillator.</p>
<p>We covered a lot of material, but we have even more to cover in the future. First and foremost, we want to prove that the Leapfrog method is symplectic for all Hamiltonians, or at least those of the form we use in molecular dynamics. We also want to dive into differential geometry and better understand the definition of symplecticity and its relationship with the conditions we expressed above.</p>
<p>We also want to better understand the implications of symplectiness for simulation. In particular, we mentioned that symplectiness guarantees a bound on the error in energies and other statistics computed from the resulting trajectories. We want to better understand this relationship and examine the proofs around the relationships.</p>
<p><a name="reviews">1</a>: I’ve used papers by <a href="http://scitation.aip.org/content/aapt/journal/ajp/73/10/10.1119/1.2034523">Donnelly and Rogers</a>, <a href="http://link.springer.com/chapter/10.1007/978-1-4612-4066-2_10">Leimkuhler, Reich, and Skeel</a>, <a href="https://doi.org/10.1017/S0962492900002282">Sanz-Serna</a>, <a href="http://bionum.cs.purdue.edu/Skee98b.pdf">Skeel</a>, and <a href="http://link.springer.com/chapter/10.1007/978-94-011-2030-2_3">Yoshida</a> as guides for this blog post.</p>
Wed, 14 Dec 2016 12:13:19 +0000
http://rnowling.github.io/math/2016/12/14/leapfrog-symplectic-harmonic-oscillator.html
Categories: math

Verifying Global Error of the Leapfrog Integrator

<p>In my <a href="/math/2016/11/13/leapfrog-local-error.html">last blog post</a>, I described how to derive the local truncation, or per-step, error analytically. I then compared the analytical prediction to empirical results from the harmonic oscillator model I described in <a href="/math/2016/11/07/harmonic-oscillator.html">another previous blog post</a>. In this blog post, I’ll derive the <strong>global truncation error</strong>, or the error accumulated over all of the steps in a trajectory, and once again, compare to the error calculated from the harmonic oscillator model.</p>
<p>In our last blog post, we found that the local truncation error \(e_{t+\Delta t}\) for the step ending at time \(t + \Delta t\) is given by:</p>
\[e_{t+\Delta t} = \frac{1}{6} \Delta t^3 |x'''(t)|\]
<p>We can make the assumption that the errors accumulate linearly, meaning that the global truncation error over some time period \(\hat{t}\) is bounded by the sum of the local truncation errors of the \(N\) steps in the trajectory<sup><a href="#ode-textbook">1</a></sup>:</p>
\[|E_{\hat{t}}| \leq \sum_{i=1}^N \Delta t^3 \frac{|x'''(i \Delta t)|}{6}\]
<p>Further, the local truncation error of each step is bounded by the maximum local truncation error over all steps, attained at some time \(t^*\):</p>
\[\DeclareMathOperator*{\argmax}{arg\,max}
t^* = \argmax_{t \, \leq \, \hat{t}} |x'''(t)| \\
e_{i \Delta t} \leq e_{t^*} \text{ for all } 1 \leq i \leq N\]
<p>Thus, we can simplify our bound on the global truncation error to:</p>
\[|E_{\hat{t}}| \leq N\Delta t^3 \frac{|x'''(t^*)|}{6}\]
<p>We note that the time elapsed comes from taking \(N\) steps of length \(\Delta t\):</p>
\[\hat{t} = N \Delta t \\
N = \frac{\hat{t}}{\Delta t}\]
<p>By substituting for \(N\), we get our final expression of the bound:</p>
\[|E_{\hat{t}}| \leq \hat{t}\Delta t^2 \frac{|x'''(t^*)|}{6}\]
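<p>To make the bound concrete, here is a minimal sketch that evaluates it numerically. The parameters used below (a unit oscillator with \(A = 1\) and \(\omega = 1\), so \(\max |x'''| = 1\)) are illustrative assumptions, not values taken from the post:</p>

```python
# Evaluate the upper bound |E_{t_hat}| <= t_hat * dt^2 * max|x'''(t*)| / 6.
def global_error_bound(t_hat, dt, max_third_deriv):
    return t_hat * dt**2 * max_third_deriv / 6.0

# Assumed unit oscillator: A = 1, omega = 1, so max |x'''| = A * omega**3 = 1.
# 100 seconds at dt = 0.01 gives a bound of roughly 1.7e-3.
print(global_error_bound(100.0, 0.01, 1.0))
```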
<h2 id="numerical-experiments">Numerical Experiments</h2>
<p>Using a harmonic oscillator model solved with both the analytical and Leapfrog methods, we can demonstrate how to verify the global truncation error empirically. We note that the global truncation error depends on two variables: the timestep \(\Delta t\) and the elapsed time \(\hat{t}\). We will start by evaluating the scaling of \(E_{\hat{t}}\) with respect to \(\hat{t}\) while holding \(\Delta t\) fixed.</p>
<p>Using the leapfrog method, we simulated the harmonic oscillator model over 100 seconds using a timestep of \(\Delta t = 0.01\) seconds. We also calculated the positions and velocities using the analytical model at each timestep. We then calculated the global truncation error by subtracting the leapfrog-calculated position and velocity for each step from the analytical position and velocity, respectively.</p>
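<p>The experiment above can be sketched as follows. This is not the code from the linked repository; it assumes a unit oscillator (\(\omega = 1\), \(x(0) = 1\), \(v(0) = 0\), so the analytical solution is \(x(t) = \cos t\), \(v(t) = -\sin t\)) and uses the kick-drift-kick form of the Leapfrog method:</p>

```python
import numpy as np

# Integrate the harmonic oscillator x'' = -omega**2 * x with Leapfrog
# (kick-drift-kick, equivalent to velocity Verlet) and record the global
# truncation error against the analytical solution at every step.
omega = 1.0
dt = 0.01
n_steps = 10000  # 100 seconds

x, v = 1.0, 0.0
pos_errors, vel_errors = [], []
for i in range(1, n_steps + 1):
    v += 0.5 * dt * (-omega**2 * x)  # half kick
    x += dt * v                      # drift
    v += 0.5 * dt * (-omega**2 * x)  # half kick

    t = i * dt
    pos_errors.append(abs(x - np.cos(omega * t)))   # analytical x(t) = cos(t)
    vel_errors.append(abs(v + np.sin(omega * t)))   # analytical v(t) = -sin(t)

print(max(pos_errors), max(vel_errors))
```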
<p>Since our analytical approach gives us an upper bound on the global truncation error, we found the maximum error at each timestep \(t\) as the maximum of the errors for that and every prior timestep. We then verified that the maximum error scales linearly with the elapsed time by fitting the following equation with linear regression to find values for \(m\) and \(b\):</p>
\[\max_{t \leq \hat{t}} |E_{\hat{t}}| = m\hat{t} + b\]
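<p>A minimal sketch of this running-maximum fit, assuming a unit oscillator (\(\omega = 1\), \(x(0) = 1\), \(v(0) = 0\)) rather than the exact setup used in the post:</p>

```python
import numpy as np

# Simulate the oscillator with Leapfrog, take the running maximum of the
# global position error, and fit max error = m * t + b by least squares.
omega, dt, n_steps = 1.0, 0.01, 10000

x, v = 1.0, 0.0
errors = []
for i in range(1, n_steps + 1):
    v += 0.5 * dt * (-omega**2 * x)
    x += dt * v
    v += 0.5 * dt * (-omega**2 * x)
    errors.append(abs(x - np.cos(omega * i * dt)))

times = dt * np.arange(1, n_steps + 1)
running_max = np.maximum.accumulate(errors)  # max error up to each time

m, b = np.polyfit(times, running_max, 1)     # linear fit: error = m*t + b
r2 = np.corrcoef(times, running_max)[0, 1] ** 2
print(f"slope = {m:.3e}, intercept = {b:.3e}, r^2 = {r2:.6f}")
```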
<p>The global truncation errors of the positions and velocities are plotted in the following graphs:</p>
<p><img src="/images/leapfrog_global_error/global_position_error.png" alt="Maximum Global Position Error" /></p>
<p><img src="/images/leapfrog_global_error/global_velocity_error.png" alt="Maximum Global Velocity Error" /></p>
<p>The cyan lines are the global truncation errors accumulated over time, while the black lines are the fits to the maximum global truncation errors as a function of elapsed time. We can see that the fitted lines approximate the maximums well. This is confirmed by \(r^2\) values of 0.999977 for the positions and 0.999976 for the velocities. Thus, we can be confident that the linear scaling of the global truncation error with respect to the elapsed time predicted by our analysis holds empirically.</p>
<p>We notice that the global truncation error oscillates from step to step. In particular, the global truncation error seems to follow a periodic function with an amplitude that grows linearly. For the harmonic oscillator model, we can actually calculate the bound on the global truncation error exactly.</p>
<p>First, we note that the third derivative of the analytical harmonic oscillator model is given by:</p>
\[x'''(t) = A m \omega^3 \sin(\omega t + \phi)\]
<p>Thus, the bound on the global truncation error for the harmonic oscillator model is given by:</p>
\[|E_{\hat{t}}| \leq \frac{\hat{t}\Delta t^2}{6} |A m \omega^3 \sin(\omega \hat{t} + \phi)|\]
<p>The \(\sin(\omega \hat{t} + \phi)\) term explains the periodic behavior we observed, while the \(\hat{t}\) coefficient explains the linear scaling of the amplitudes we observed.</p>
<p>Next, we empirically validated the order of the scaling of \(E_{\hat{t}}\) with respect to the timestep \(\Delta t\). We ran simulations of the harmonic oscillator model for 100 seconds, but with a range of timesteps and corresponding numbers of steps. We found the maximum global truncation error for the positions and velocities from each simulation. The resulting data is reproduced here:</p>
<table>
<thead>
<tr>
<th style="text-align: center">Step size</th>
<th style="text-align: center">Steps</th>
<th style="text-align: center">Max Position Error</th>
<th style="text-align: center">Max Velocity Error</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">0.1</td>
<td style="text-align: center">1000</td>
<td style="text-align: center">0.02623153722</td>
<td style="text-align: center">11.94019129</td>
</tr>
<tr>
<td style="text-align: center">0.05</td>
<td style="text-align: center">2000</td>
<td style="text-align: center">1.925578275</td>
<td style="text-align: center">12.03171521</td>
</tr>
<tr>
<td style="text-align: center">0.025</td>
<td style="text-align: center">4000</td>
<td style="text-align: center">0.6346311748</td>
<td style="text-align: center">3.990773601</td>
</tr>
<tr>
<td style="text-align: center">0.01</td>
<td style="text-align: center">10000</td>
<td style="text-align: center">0.1030791043</td>
<td style="text-align: center">0.6491622412</td>
</tr>
<tr>
<td style="text-align: center">0.005</td>
<td style="text-align: center">20000</td>
<td style="text-align: center">0.02577397522</td>
<td style="text-align: center">0.1623304312</td>
</tr>
<tr>
<td style="text-align: center">0.0025</td>
<td style="text-align: center">40000</td>
<td style="text-align: center">0.006443626148</td>
<td style="text-align: center">0.04058671379</td>
</tr>
<tr>
<td style="text-align: center">0.001</td>
<td style="text-align: center">100000</td>
<td style="text-align: center">0.001030963096</td>
<td style="text-align: center">0.006493957821</td>
</tr>
</tbody>
</table>
<p>We then estimated the order of the error with respect to the timestep. We assume the maximum error is dominated by a power of the timestep, set up an equation to model that relationship, and take the logarithm of both sides:</p>
\[|E_{\hat{t}}| \leq c \hat{t} \Delta t^n \\
\log |E_{\hat{t}}| = \log (c \hat{t} \Delta t^n) \\
\log |E_{\hat{t}}| = \log \Delta t^n + \log (c \hat{t}) \\
\log |E_{\hat{t}}| = n \log \Delta t + \log (c \hat{t})\]
<p>where \(E_{\hat{t}}\) is the maximum accumulated error, \(\Delta t\) is the time step, \(n\) is the order, and \(c\) is a constant.</p>
<p>Using the log-log formulation, we then estimated \(n\) and \(c\) using linear regression on the table of timesteps and maximum errors. Our initial estimates were off: when we plotted the log-log lines, we saw that the model fit well for timesteps \(\leq 0.025\) but not for the two larger timesteps. We re-ran the linear regression excluding the two largest timesteps.</p>
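<p>The order estimate can be sketched as follows, again assuming a unit oscillator (\(\omega = 1\), \(x(0) = 1\), \(v(0) = 0\)) rather than the exact setup used for the table, and using only the five smallest timesteps, matching the exclusion described above:</p>

```python
import numpy as np

# Estimate the order n from log|E| = n * log(dt) + log(c * t_hat)
# by linear regression over a range of timesteps.
def max_position_error(dt, t_total=100.0, omega=1.0):
    x, v = 1.0, 0.0
    worst = 0.0
    for i in range(1, int(round(t_total / dt)) + 1):
        v += 0.5 * dt * (-omega**2 * x)
        x += dt * v
        v += 0.5 * dt * (-omega**2 * x)
        worst = max(worst, abs(x - np.cos(omega * i * dt)))
    return worst

dts = np.array([0.025, 0.01, 0.005, 0.0025, 0.001])
max_errors = np.array([max_position_error(dt) for dt in dts])

# Slope of the log-log fit is the estimated order n.
n, log_c = np.polyfit(np.log(dts), np.log(max_errors), 1)
print(f"estimated order n = {n:.3f}")
```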
<table>
<thead>
<tr>
<th style="text-align: center">Error Type</th>
<th style="text-align: center">Estimated Order</th>
<th style="text-align: center">\(r^2\)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">Position</td>
<td style="text-align: center">1.99599570395</td>
<td style="text-align: center">0.999996506915</td>
</tr>
<tr>
<td style="text-align: center">Velocity</td>
<td style="text-align: center">1.99553638001</td>
<td style="text-align: center">0.999995809409</td>
</tr>
</tbody>
</table>
<p>The global truncation errors for the positions, along with the predictions from the log-log model, are graphed below:</p>
<p><img src="/images/leapfrog_global_error/global_position_error_dt_truncated.png" alt="Global Truncation Error for Positions" /></p>
<p>We can see that the maximum global position and velocity truncation errors scale approximately quadratically with respect to the timestep, matching our expectation from the analytical analysis.</p>
<p>(All code for this blog post is available on <a href="https://github.com/rnowling/integrator-experiments">GitHub</a> under the Apache License v2.)</p>
<h2 id="conclusion">Conclusion</h2>
<p>In this post, we showed how to analytically derive an upper bound for the global truncation error<sup><a href="#integration-methods">2</a></sup>. We then validated, using the harmonic oscillator model, that the maximum global truncation error scales linearly with the elapsed time and quadratically with the timestep.</p>
<p>In our numerical experiments, we observed that the global truncation error oscillates periodically with an amplitude that grows linearly. We then derived the exact upper bound of the global truncation error for the harmonic oscillator, which confirmed our observations.</p>
<p>Our numerical experiments also demonstrated that the analytical derivation of the upper bound on the global truncation error is not valid for large timesteps. The errors for the larger timesteps are probably affected by linear or nonlinear instabilities, which we will explore in a later blog post.</p>
<p><a name="ode-textbook">1</a>: I used chapter 8 of <a href="https://smile.amazon.com/Elementary-Differential-Equations-Boundary-Problems/dp/0470458313">Elementary Differential Equations and Boundary Value Problems</a> by Boyce and DiPrima as a reference for how to calculate the error.</p>
<p><a name="integration-methods">2</a>: Our derivation matches reported error analysis for the Leapfrog method in <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44.4269&rep=rep1&type=pdf">Integration Methods for Molecular Dynamics</a> by Leimkuhler, Reich, and Skeel. The authors do not give their derivation, however.</p>
Sat, 19 Nov 2016 12:13:19 +0000
http://rnowling.github.io/math/2016/11/19/leapfrog-global-error.html
http://rnowling.github.io/math/2016/11/19/leapfrog-global-error.html