Meet Bud Mishra
Pathogens, Cancer, Terrorism... Take a Walk on the Dark Side with NYU Tandon's Director of Bioinformatics
In a world that is changing at a prodigious rate, famously characterized by Moore’s Law, which holds that computers double in speed and capacity every eighteen months, one discipline stands out. Bud Mishra, Director of the NYU Tandon Bioinformatics Program, contends that biotechnology doubles every five months, and he has meticulously designed the program to address this staggering pace of change.
Holding professorships of Computer Science and Mathematics at NYU Courant Institute, NYU Tandon, and Cell Biology at the NYU School of Medicine, Dr. Mishra is uniquely positioned to have crafted this program.
In explaining his perspective on the Bioinformatics discipline, our conversation with Bud travelled some unexpected terrain, including his take on “Dog Fooding,” why it is that some of his colleagues call him Darth Vader, and how one uses 2000-year-old algorithmic principles to address a discipline that changes radically.
- 3 Credits Algorithms and Data Structures for Bioinformatics BI-GY7453
- The online course introduces the foundational ideas from computer science for designing and implementing bioinformatics algorithms. The goal of the underlying algorithms and data structures is to accurately abstract and model the biological problems and to devise provably correct procedures with efficient computational complexity bounds. The algorithms will be described in pseudocode in order to simplify the correctness and complexity analysis, but with sufficient detail to enable students to implement them in any suitable software pipeline or hardware architecture.
Prerequisites: MA-UY 2314
How do you view your role as Director of the Bioinformatics Program?
One of the things I did, in addition to creating a board, a curriculum, and syllabi, was to experience developing and teaching a course ab initio. I picked the Algorithms and Data Structures for Bioinformatics course. Algorithms and data structures is a traditional course in computer science, but it’s unusual to develop one that specifically focuses on algorithms for biology, biotechnology, and bioinformatics. My experience was very useful in understanding what can be done better, what can be problematic, what students and other instructors may be going through, etc. It’s what people in Silicon Valley describe as “Dog Fooding.”
What do you mean by “Dog Fooding”?
The idea is that if you are a startup making dog food ... you should eat your dog food.
Please describe the Algorithms and Data Structures for Bioinformatics course.
Algorithm means a step-by-step process for achieving a goal or completing a task. The word goes back to the mathematician Al-Khwarizmi, who lived around 800 A.D. and first described how to achieve a task in a step-by-step process. As biology has moved, like many other fields, to become a data-intensive discipline, we need step-by-step interpretations of that data. There’s a lot of statistical analysis, but there is also an algorithmic approach that combines this statistical inference in steps. We are now thinking about many disciplines, traditionally thought of as soft social sciences, in terms of algorithms. You want to create a viral meme? That’s a step-by-step process. You want to create a new drug? That also involves a step-by-step process in biology, some steps involving statistical properties of enzymes, some involving genomes, some involving PCR. This course goes both ways. It describes computer science algorithms in the context of biological analysis, but it also describes biology as an algorithm.
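To make the idea of "biology as an algorithm" concrete, here is a minimal sketch of PCR written as a step-by-step procedure. It is an illustration only, not material from the course: the function names `reverse_complement` and `in_silico_pcr`, the example sequences, and the simplification to exact primer matches on a single strand are all assumptions made for this example.

```python
# Hypothetical toy model of PCR as a three-step procedure.
# Assumes exact primer matches and ignores real-world chemistry.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str):
    """Return the region a primer pair would amplify, or None if no match."""
    # Step 1: locate the forward primer on the template strand.
    start = template.find(forward_primer)
    if start == -1:
        return None

    # Step 2: the reverse primer anneals to the opposite strand, so on the
    # template it appears as its reverse complement, downstream of step 1.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None

    # Step 3: the product spans from the forward primer to the reverse site.
    return template[start:end + len(site)]


template = "TTACGTACGGATCCGATTACAGGCTTAA"
print(in_silico_pcr(template, "ACGTAC", "AGCCTG"))
```

Each commented step corresponds to one stage of the laboratory procedure, which is the sense in which a biological protocol can be read as an algorithm.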
What topics in this course do you personally find most compelling?
One of the issues computer scientists face is that there are problems that you can solve, but cannot solve in a reasonable amount of time: the curse of intractability. A lot of the biological problems, specifically in genomics, are intractable. But if you change the problem or address it in another way or provide other data, they can become tractable. In this course, there are seven modules that go into understanding complexity, and how to go back and forth between thinking of algorithms in an algorithmic way and thinking about biotechnology through a computing lens, and combining them so that the intractable problems are efficiently solvable. One example is short-read genome assembly, which is a central but intractable problem in biology, but in the context of another technology that we developed called nano-mapping, it becomes tractable. And that has a lot of implications. For example, finding structural variants in cancer has real potential to result in breakthroughs, and a lot of it is coming from purely algorithmic thinking.
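The intractability mentioned above can be illustrated with a toy model. A common textbook abstraction treats assembly as finding the shortest string that contains every read, which is known to be NP-hard in general. The sketch below tries every ordering of the reads by brute force. It is not the nano-mapping approach described in the interview, and the names `overlap`, `merge_in_order`, and `brute_force_assembly`, as well as the three example reads, are assumptions made for this illustration.

```python
from itertools import permutations
from math import factorial


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_in_order(reads) -> str:
    """Merge reads left to right, collapsing each suffix/prefix overlap."""
    result = reads[0]
    for read in reads[1:]:
        result += read[overlap(result, read):]
    return result


def brute_force_assembly(reads) -> str:
    """Try every read ordering and keep the shortest merged string."""
    candidates = (merge_in_order(order) for order in permutations(reads))
    # Ties are broken alphabetically so the result is deterministic.
    return min(candidates, key=lambda s: (len(s), s))


reads = ["CGTACGT", "ACGTTAG", "ATGCGTA"]
print(brute_force_assembly(reads))

# The number of orderings grows factorially with the number of reads.
for n in (3, 10, 20):
    print(n, "reads ->", factorial(n), "orderings")
```

Three reads need only six orderings, but twenty reads already need more than 10^18, which is why real assemblers rely on extra structure or extra data rather than exhaustive search.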
What’s cutting edge about your course?
Well, as I described, the core of the course is thousands of years old, so everything we are doing was more or less known to Euclid over 2,000 years ago. So that’s not cutting edge, but algorithms are at the cutting edge in lots of different fields. Both computer science and biotechnology follow Moore’s Law: every fixed period, say eighteen months, computers become twice as fast, better, and cheaper. Biotechnology is also doing that, only faster. Thanks to Moore’s Law we are beginning to address lots of new biological questions, and there are tons of problems on the cutting edge. I am covering, for example, CRISPR technology for gene editing, new biotechnologies leading to new ways of collecting data, and applying algorithms to model regulatory networks, whether by studying model animals or patients or by building organoids, etc. We can’t anticipate all the problems that are coming up, but we know that they’ll be coming faster and faster, and we can give our students the tools and techniques to think about any problem as new things come in.
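To put "only faster" in numbers, the short sketch below compares the two doubling periods quoted in this article: eighteen months for computing and five months for biotechnology. The smooth exponential model and the helper name `growth_factor` are assumptions made for illustration, not figures or code from the program.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative improvement after `months`, given a doubling period."""
    return 2 ** (months / doubling_period_months)


five_years = 60  # months

computing = growth_factor(five_years, 18)
biotech = growth_factor(five_years, 5)

print(f"Computing over five years:     about {computing:.1f}x")
print(f"Biotechnology over five years: about {biotech:.0f}x")
```

Under these assumptions, five years brings roughly a tenfold improvement in computing but a 4096-fold improvement in biotechnology, since sixty months is exactly twelve five-month doublings.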
What skills or knowledge do students develop by taking Algorithms and Data Structures for Bioinformatics? How will their experience, skills, and knowledge from your course translate in the real world?
One of the things we ask students to do in this course is a class project. They pick a problem at the very beginning of the semester. By midterm, they write an abstract and then spend the rest of the time covering the subject. One student focused on genome assembly: he wrote an assembler from scratch and also assembled the genome of an organism that was a pathogen but in the process of becoming a symbiont. These are the kinds of things students learn by doing it themselves. Also, we have Webex sessions where we discuss what is going on and how algorithms are making an important contribution to that. One example might be to try to understand Y chromosomes. The Y chromosome is one of the smallest human chromosomes, and only men have it. But because it doesn’t recombine with anything, it exhibits very complicated patterns. It makes the algorithmic problem very, very hard. And yet, the biology of why the Y chromosome has become so complex becomes an interesting issue for both biology and computing, going back and forth.
How does a bioinformatics scientist fit into a team in the workplace?
A lot of algorithmic work is very close to mathematical thinking, and it is pretty much lone-wolf work. It does not really involve teamwork. It’s sort of like solving puzzles. But then you need to explain it to others. You need to, for example, test it out, profile it, and understand its complexity, and that's called prototyping. We have a course that shows how to take algorithms into Python, and that’s how you will explain to the team why you did something a certain way. And then you work with your team to scale it, maybe put it in the cloud, maybe scale it to thousands of processors, as would be necessary in disease biomarker detection.
What do you think students enjoy and find most meaningful about the course?
We usually have two kinds of students. Some are biologists who have never taken algorithms before. They clearly enjoy algorithms for the same reason we all love algorithms and solving puzzles. But on the other side we have computer scientists who are familiar with algorithms, and what they like about this course is to see a new field where the algorithmic thinking, the step-by-step computational thinking, becomes as important as it has been in computer science. One class project, for example, looked at how the FBI designs forensics and how it can use genealogical data to find a culprit. They begin to see new applications that they haven’t thought about. So everybody in the class gets something that they like.
What do you think are some of the most urgent real world problems addressed in NYU Tandon’s Bioinformatics program?
We have a Translational track that directly relates to biomedical applications. One of the courses in the track is Translational Genomics, which focuses on cancer genomics. As we live longer, it is speculated that one in three of us will die of cancer. Now that we understand that cancer is a disease of the genome, there are companies that routinely take tumor biopsies and apply genomic analysis to them. So genomics has become a big component of drug design in cancer, and more algorithmic thinking and bioinformatics thinking is going into solving cancer. There is a Population Genomics course that tries to understand how to relate disease to variability in genomes. I’m sure we’ll study neurogenetics and diseases like dementia, autism, and bipolar disorder using genomics algorithms. So it’s showing up in many, many places. One of the goals that I like to think about is a very old goal, going back to Buddha: how to end human suffering. There is always some crisis waiting to happen: we’ll probably run out of food in places like Africa in the next twenty to thirty years because of global warming and loss of water. Creating genetically modified organisms, new sources of food, etc. are also critical components that our students need to understand and address. There are new diseases and new viral epidemics, like Ebola. There are all sorts of things that can come anytime, and we’ll need to be able to address them very quickly.
Of all your professional research interests, which do you feel especially strongly about? How does that come through in your teaching?
Somebody in a meeting described me as Darth Vader, the man who has gone to the dark side, as someone who is interested in the bad things that can happen: hardware bugs, software problems, pathogens, cancer, terrorism, financial crashes, deception, security failures, etc. I don’t know why, but I’m attracted to problems that have a dark side to them. It’s surprising that they come up so often. So a lot of it is being able to understand the world, understand data, and understand how to predict when these bad things will happen, and the second part is how to interpret the causal connections: how to fix a hardware bug, how to make sure that the smart contracts on your blockchain are valid, how to make a financial transaction without creating a flash crash. All of this is essentially the same problem. They’re algorithmic, but they also involve designing interventions. Welcome to the dark side.