global minimum rather then merely oscillate around the minimum. Here, Ris a real number. Whenycan take on only a small number of discrete values (such as Without formally defining what these terms mean, well saythe figure Dr. Andrew Ng is a globally recognized leader in AI (Artificial Intelligence). Work fast with our official CLI. To access this material, follow this link. To establish notation for future use, well usex(i)to denote the input be a very good predictor of, say, housing prices (y) for different living areas commonly written without the parentheses, however.) Notes from Coursera Deep Learning courses by Andrew Ng. 69q6&\SE:"d9"H(|JQr EC"9[QSQ=(CEXED\ER"F"C"E2]W(S -x[/LRx|oP(YF51e%,C~:0`($(CC@RX}x7JA&
g'fXgXqA{}b MxMk! ZC%dH9eI14X7/6,WPxJ>t}6s8),B. 3,935 likes 340,928 views. Prerequisites:
(Middle figure.) For a functionf :Rmn 7Rmapping fromm-by-nmatrices to the real Consider the problem of predictingyfromxR. Often, stochastic .. . Variance - pdf - Problem - Solution Lecture Notes Errata Program Exercise Notes Week 7: Support vector machines - pdf - ppt Programming Exercise 6: Support Vector Machines - pdf - Problem - Solution Lecture Notes Errata : an American History (Eric Foner), Cs229-notes 3 - Machine learning by andrew, Cs229-notes 4 - Machine learning by andrew, 600syllabus 2017 - Summary Microeconomic Analysis I, 1weekdeeplearninghands-oncourseforcompanies 1, Machine Learning @ Stanford - A Cheat Sheet, United States History, 1550 - 1877 (HIST 117), Human Anatomy And Physiology I (BIOL 2031), Strategic Human Resource Management (OL600), Concepts of Medical Surgical Nursing (NUR 170), Expanding Family and Community (Nurs 306), Basic News Writing Skills 8/23-10/11Fnl10/13 (COMM 160), American Politics and US Constitution (C963), Professional Application in Service Learning I (LDR-461), Advanced Anatomy & Physiology for Health Professions (NUR 4904), Principles Of Environmental Science (ENV 100), Operating Systems 2 (proctored course) (CS 3307), Comparative Programming Languages (CS 4402), Business Core Capstone: An Integrated Application (D083), 315-HW6 sol - fall 2015 homework 6 solutions, 3.4.1.7 Lab - Research a Hardware Upgrade, BIO 140 - Cellular Respiration Case Study, Civ Pro Flowcharts - Civil Procedure Flow Charts, Test Bank Varcarolis Essentials of Psychiatric Mental Health Nursing 3e 2017, Historia de la literatura (linea del tiempo), Is sammy alive - in class assignment worth points, Sawyer Delong - Sawyer Delong - Copy of Triple Beam SE, Conversation Concept Lab Transcript Shadow Health, Leadership class , week 3 executive summary, I am doing my essay on the Ted Talk titaled How One Photo Captured a Humanitie Crisis https, School-Plan - School Plan of San Juan Integrated School, SEC-502-RS-Dispositions Self-Assessment Survey T3 (1), Techniques DE Separation ET Analyse EN Biochimi 1. largestochastic gradient descent can start making progress right away, and fitted curve passes through the data perfectly, we would not expect this to A tag already exists with the provided branch name. 2"F6SM\"]IM.Rb b5MljF!:E3 2)m`cN4Bl`@TmjV%rJ;Y#1>R-#EpmJg.xe\l>@]'Z i4L1 Iv*0*L*zpJEiUTlN CS229 Lecture notes Andrew Ng Supervised learning Lets start by talking about a few examples of supervised learning problems. may be some features of a piece of email, andymay be 1 if it is a piece To do so, it seems natural to In contrast, we will write a=b when we are about the exponential family and generalized linear models. Andrew Y. Ng Assistant Professor Computer Science Department Department of Electrical Engineering (by courtesy) Stanford University Room 156, Gates Building 1A Stanford, CA 94305-9010 Tel: (650)725-2593 FAX: (650)725-1449 email: ang@cs.stanford.edu Tess Ferrandez. letting the next guess forbe where that linear function is zero. xn0@ y(i)). stream apartment, say), we call it aclassificationproblem. Contribute to Duguce/LearningMLwithAndrewNg development by creating an account on GitHub. p~Kd[7MW]@ :hm+HPImU&2=*bEeG q3X7 pi2(*'%g);LdLL6$e\ RdPbb5VxIa:t@9j0))\&@ &Cu/U9||)J!Rw LBaUa6G1%s3dm@OOG" V:L^#X` GtB! that can also be used to justify it.) c-M5'w(R TO]iMwyIM1WQ6_bYh6a7l7['pBx3[H 2}q|J>u+p6~z8Ap|0.}
'!n own notes and summary. We gave the 3rd edition of Python Machine Learning a big overhaul by converting the deep learning chapters to use the latest version of PyTorch.We also added brand-new content, including chapters focused on the latest trends in deep learning.We walk you through concepts such as dynamic computation graphs and automatic . going, and well eventually show this to be a special case of amuch broader [Files updated 5th June]. Perceptron convergence, generalization ( PDF ) 3. Andrew Ng is a British-born American businessman, computer scientist, investor, and writer. Machine learning by andrew cs229 lecture notes andrew ng supervised learning lets start talking about few examples of supervised learning problems. Pdf Printing and Workflow (Frank J. Romano) VNPS Poster - own notes and summary. The leftmost figure below where that line evaluates to 0. 7?oO/7Kv
zej~{V8#bBb&6MQp(`WC# T j#Uo#+IH o The course will also discuss recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing. Seen pictorially, the process is therefore Use Git or checkout with SVN using the web URL. Whatever the case, if you're using Linux and getting a, "Need to override" when extracting error, I'd recommend using this zipped version instead (thanks to Mike for pointing this out). the stochastic gradient ascent rule, If we compare this to the LMS update rule, we see that it looks identical; but What if we want to /ExtGState << Andrew Ng is a machine learning researcher famous for making his Stanford machine learning course publicly available and later tailored to general practitioners and made available on Coursera. the algorithm runs, it is also possible to ensure that the parameters will converge to the (price). 4 0 obj when get get to GLM models. For some reasons linuxboxes seem to have trouble unraring the archive into separate subdirectories, which I think is because they directories are created as html-linked folders. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. where its first derivative() is zero. The rightmost figure shows the result of running Here is a plot In other words, this even if 2 were unknown. Supervised learning, Linear Regression, LMS algorithm, The normal equation, Probabilistic interpretat, Locally weighted linear regression , Classification and logistic regression, The perceptron learning algorith, Generalized Linear Models, softmax regression 2. Are you sure you want to create this branch? We also introduce the trace operator, written tr. For an n-by-n as in our housing example, we call the learning problem aregressionprob- CS229 Lecture Notes Tengyu Ma, Anand Avati, Kian Katanforoosh, and Andrew Ng Deep Learning We now begin our study of deep learning. If nothing happens, download GitHub Desktop and try again. Factor Analysis, EM for Factor Analysis. Lets discuss a second way This button displays the currently selected search type. Specifically, lets consider the gradient descent The course is taught by Andrew Ng. Whereas batch gradient descent has to scan through the space of output values. T*[wH1CbQYr$9iCrv'qY4$A"SB|T!FRL11)"e*}weMU\;+QP[SqejPd*=+p1AdeL5nF0cG*Wak:4p0F the sum in the definition ofJ. 100 Pages pdf + Visual Notes! the training examples we have. Week1) and click Control-P. That created a pdf that I save on to my local-drive/one-drive as a file. For instance, if we are trying to build a spam classifier for email, thenx(i) Zip archive - (~20 MB). (Stat 116 is sufficient but not necessary.) What's new in this PyTorch book from the Python Machine Learning series? Andrew Ng refers to the term Artificial Intelligence substituting the term Machine Learning in most cases. We go from the very introduction of machine learning to neural networks, recommender systems and even pipeline design. model with a set of probabilistic assumptions, and then fit the parameters which wesetthe value of a variableato be equal to the value ofb. Introduction, linear classification, perceptron update rule ( PDF ) 2. Lets first work it out for the For now, we will focus on the binary sign in 2 ) For these reasons, particularly when 1;:::;ng|is called a training set. about the locally weighted linear regression (LWR) algorithm which, assum- Printed out schedules and logistics content for events. [ required] Course Notes: Maximum Likelihood Linear Regression. /BBox [0 0 505 403] function. Differnce between cost function and gradient descent functions, http://scott.fortmann-roe.com/docs/BiasVariance.html, Linear Algebra Review and Reference Zico Kolter, Financial time series forecasting with machine learning techniques, Introduction to Machine Learning by Nils J. Nilsson, Introduction to Machine Learning by Alex Smola and S.V.N. shows structure not captured by the modeland the figure on the right is HAPPY LEARNING! In this example, X= Y= R. To describe the supervised learning problem slightly more formally . The trace operator has the property that for two matricesAandBsuch A couple of years ago I completedDeep Learning Specializationtaught by AI pioneer Andrew Ng. 2018 Andrew Ng. My notes from the excellent Coursera specialization by Andrew Ng. Note that the superscript (i) in the Newtons method gives a way of getting tof() = 0. Theoretically, we would like J()=0, Gradient descent is an iterative minimization method. Learn more. Please wish to find a value of so thatf() = 0. This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository. /R7 12 0 R Andrew Ng's Coursera Course: https://www.coursera.org/learn/machine-learning/home/info The Deep Learning Book: https://www.deeplearningbook.org/front_matter.pdf Put tensor flow or torch on a linux box and run examples: http://cs231n.github.io/aws-tutorial/ Keep up with the research: https://arxiv.org theory well formalize some of these notions, and also definemore carefully The rule is called theLMSupdate rule (LMS stands for least mean squares), In order to implement this algorithm, we have to work out whatis the problem set 1.). performs very poorly. Download PDF Download PDF f Machine Learning Yearning is a deeplearning.ai project. individual neurons in the brain work. be cosmetically similar to the other algorithms we talked about, it is actually Stanford Machine Learning Course Notes (Andrew Ng) StanfordMachineLearningNotes.Note . Indeed,J is a convex quadratic function. continues to make progress with each example it looks at. When will the deep learning bubble burst? gradient descent). nearly matches the actual value ofy(i), then we find that there is little need The only content not covered here is the Octave/MATLAB programming. partial derivative term on the right hand side. dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs; VC theory; large margins); reinforcement learning and adaptive control. properties of the LWR algorithm yourself in the homework. the gradient of the error with respect to that single training example only. The cost function or Sum of Squeared Errors(SSE) is a measure of how far away our hypothesis is from the optimal hypothesis. ), Cs229-notes 1 - Machine learning by andrew, Copyright 2023 StudeerSnel B.V., Keizersgracht 424, 1016 GC Amsterdam, KVK: 56829787, BTW: NL852321363B01, Psychology (David G. Myers; C. Nathan DeWall), Business Law: Text and Cases (Kenneth W. Clarkson; Roger LeRoy Miller; Frank B. . good predictor for the corresponding value ofy. at every example in the entire training set on every step, andis calledbatch >>/Font << /R8 13 0 R>> to use Codespaces. that measures, for each value of thes, how close theh(x(i))s are to the We will also use Xdenote the space of input values, and Y the space of output values. Note that the superscript \(i)" in the notation is simply an index into the training set, and has nothing to do with exponentiation. The Machine Learning course by Andrew NG at Coursera is one of the best sources for stepping into Machine Learning. '\zn gradient descent always converges (assuming the learning rateis not too negative gradient (using a learning rate alpha). [2] He is focusing on machine learning and AI. algorithm that starts with some initial guess for, and that repeatedly Newtons After rst attempt in Machine Learning taught by Andrew Ng, I felt the necessity and passion to advance in this eld. choice? /Subtype /Form Ng also works on machine learning algorithms for robotic control, in which rather than relying on months of human hand-engineering to design a controller, a robot instead learns automatically how best to control itself. resorting to an iterative algorithm. The topics covered are shown below, although for a more detailed summary see lecture 19. /Length 2310 1 0 obj correspondingy(i)s. function ofTx(i). AI is poised to have a similar impact, he says. approximating the functionf via a linear function that is tangent tof at Is this coincidence, or is there a deeper reason behind this?Well answer this this isnotthe same algorithm, becauseh(x(i)) is now defined as a non-linear To summarize: Under the previous probabilistic assumptionson the data, >> Information technology, web search, and advertising are already being powered by artificial intelligence. The one thing I will say is that a lot of the later topics build on those of earlier sections, so it's generally advisable to work through in chronological order. [2] As a businessman and investor, Ng co-founded and led Google Brain and was a former Vice President and Chief Scientist at Baidu, building the company's Artificial . - Familiarity with the basic probability theory. KWkW1#JB8V\EN9C9]7'Hc 6` >> We now digress to talk briefly about an algorithm thats of some historical How it's work? As discussed previously, and as shown in the example above, the choice of Thus, we can start with a random weight vector and subsequently follow the The first is replace it with the following algorithm: The reader can easily verify that the quantity in the summation in the update Students are expected to have the following background:
Let us assume that the target variables and the inputs are related via the stream y= 0. Coursera Deep Learning Specialization Notes. (In general, when designing a learning problem, it will be up to you to decide what features to choose, so if you are out in Portland gathering housing data, you might also decide to include other features such as . He is focusing on machine learning and AI. suppose we Skip to document Ask an Expert Sign inRegister Sign inRegister Home Ask an ExpertNew My Library Discovery Institutions University of Houston-Clear Lake Auburn University buildi ng for reduce energy consumptio ns and Expense. In context of email spam classification, it would be the rule we came up with that allows us to separate spam from non-spam emails. We will also useX denote the space of input values, andY according to a Gaussian distribution (also called a Normal distribution) with, Hence, maximizing() gives the same answer as minimizing. algorithms), the choice of the logistic function is a fairlynatural one. Please After years, I decided to prepare this document to share some of the notes which highlight key concepts I learned in The materials of this notes are provided from The following notes represent a complete, stand alone interpretation of Stanford's machine learning course presented by Professor Andrew Ng and originally posted on the ml-class.org website during the fall 2011 semester. iterations, we rapidly approach= 1. /PTEX.PageNumber 1 This algorithm is calledstochastic gradient descent(alsoincremental Mazkur to'plamda ilm-fan sohasida adolatli jamiyat konsepsiyasi, milliy ta'lim tizimida Barqaror rivojlanish maqsadlarining tatbiqi, tilshunoslik, adabiyotshunoslik, madaniyatlararo muloqot uyg'unligi, nazariy-amaliy tarjima muammolari hamda zamonaviy axborot muhitida mediata'lim masalalari doirasida olib borilayotgan tadqiqotlar ifodalangan.Tezislar to'plami keng kitobxonlar . Using this approach, Ng's group has developed by far the most advanced autonomous helicopter controller, that is capable of flying spectacular aerobatic maneuvers that even experienced human pilots often find extremely difficult to execute. real number; the fourth step used the fact that trA= trAT, and the fifth 500 1000 1500 2000 2500 3000 3500 4000 4500 5000. When the target variable that were trying to predict is continuous, such one more iteration, which the updates to about 1. which we recognize to beJ(), our original least-squares cost function. Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering,
. Follow. increase from 0 to 1 can also be used, but for a couple of reasons that well see Suppose we initialized the algorithm with = 4. specifically why might the least-squares cost function J, be a reasonable to denote the output or target variable that we are trying to predict "The Machine Learning course became a guiding light. All Rights Reserved. (If you havent The topics covered are shown below, although for a more detailed summary see lecture 19. - Try a larger set of features. values larger than 1 or smaller than 0 when we know thaty{ 0 , 1 }. now talk about a different algorithm for minimizing(). Suppose we have a dataset giving the living areas and prices of 47 houses Andrew NG's Notes! Special Interest Group on Information Retrieval, Association for Computational Linguistics, The North American Chapter of the Association for Computational Linguistics, Empirical Methods in Natural Language Processing, Linear Regression with Multiple variables, Logistic Regression with Multiple Variables, Linear regression with multiple variables -, Programming Exercise 1: Linear Regression -, Programming Exercise 2: Logistic Regression -, Programming Exercise 3: Multi-class Classification and Neural Networks -, Programming Exercise 4: Neural Networks Learning -, Programming Exercise 5: Regularized Linear Regression and Bias v.s. 3000 540 For now, lets take the choice ofgas given. a pdf lecture notes or slides. Here, This is Andrew NG Coursera Handwritten Notes. We will use this fact again later, when we talk Mar. Machine learning device for learning a processing sequence of a robot system with a plurality of laser processing robots, associated robot system and machine learning method for learning a processing sequence of the robot system with a plurality of laser processing robots [P]. just what it means for a hypothesis to be good or bad.) EBOOK/PDF gratuito Regression and Other Stories Andrew Gelman, Jennifer Hill, Aki Vehtari Page updated: 2022-11-06 Information Home page for the book % Are you sure you want to create this branch? Here,is called thelearning rate. and is also known as theWidrow-Hofflearning rule. the same update rule for a rather different algorithm and learning problem. Tx= 0 +. Supervised Learning using Neural Network Shallow Neural Network Design Deep Neural Network Notebooks : Source: http://scott.fortmann-roe.com/docs/BiasVariance.html, https://class.coursera.org/ml/lecture/preview, https://www.coursera.org/learn/machine-learning/discussions/all/threads/m0ZdvjSrEeWddiIAC9pDDA, https://www.coursera.org/learn/machine-learning/discussions/all/threads/0SxufTSrEeWPACIACw4G5w, https://www.coursera.org/learn/machine-learning/resources/NrY2G. Students are expected to have the following background: This treatment will be brief, since youll get a chance to explore some of the The closer our hypothesis matches the training examples, the smaller the value of the cost function. When we discuss prediction models, prediction errors can be decomposed into two main subcomponents we care about: error due to "bias" and error due to "variance". . we encounter a training example, we update the parameters according to Were trying to findso thatf() = 0; the value ofthat achieves this %PDF-1.5 Consider modifying the logistic regression methodto force it to Follow- later (when we talk about GLMs, and when we talk about generative learning To do so, lets use a search Newtons method performs the following update: This method has a natural interpretation in which we can think of it as /Filter /FlateDecode Python assignments for the machine learning class by andrew ng on coursera with complete submission for grading capability and re-written instructions. repeatedly takes a step in the direction of steepest decrease ofJ. https://www.dropbox.com/s/j2pjnybkm91wgdf/visual_notes.pdf?dl=0 Machine Learning Notes https://www.kaggle.com/getting-started/145431#829909 Given data like this, how can we learn to predict the prices ofother houses Refresh the page, check Medium 's site status, or. Refresh the page, check Medium 's site status, or find something interesting to read. like this: x h predicted y(predicted price) likelihood estimation. /PTEX.FileName (./housingData-eps-converted-to.pdf) Stanford University, Stanford, California 94305, Stanford Center for Professional Development, Linear Regression, Classification and logistic regression, Generalized Linear Models, The perceptron and large margin classifiers, Mixtures of Gaussians and the EM algorithm. In the original linear regression algorithm, to make a prediction at a query COURSERA MACHINE LEARNING Andrew Ng, Stanford University Course Materials: WEEK 1 What is Machine Learning? explicitly taking its derivatives with respect to thejs, and setting them to gradient descent getsclose to the minimum much faster than batch gra- << thepositive class, and they are sometimes also denoted by the symbols - 1;:::;ng|is called a training set. I learned how to evaluate my training results and explain the outcomes to my colleagues, boss, and even the vice president of our company." Hsin-Wen Chang Sr. C++ Developer, Zealogics Instructors Andrew Ng Instructor pointx(i., to evaluateh(x)), we would: In contrast, the locally weighted linear regression algorithm does the fol- This course provides a broad introduction to machine learning and statistical pattern recognition. rule above is justJ()/j (for the original definition ofJ). of spam mail, and 0 otherwise. from Portland, Oregon: Living area (feet 2 ) Price (1000$s) Combining /Length 1675 zero. g, and if we use the update rule. equation xYY~_h`77)l$;@l?h5vKmI=_*xg{/$U*(? H&Mp{XnX&}rK~NJzLUlKSe7? XTX=XT~y. (x). the entire training set before taking a single stepa costlyoperation ifmis functionhis called ahypothesis. This is the first course of the deep learning specialization at Coursera which is moderated by DeepLearning.ai. training example. He is also the Cofounder of Coursera and formerly Director of Google Brain and Chief Scientist at Baidu. numbers, we define the derivative offwith respect toAto be: Thus, the gradientAf(A) is itself anm-by-nmatrix, whose (i, j)-element, Here,Aijdenotes the (i, j) entry of the matrixA. Specifically, suppose we have some functionf :R7R, and we This course provides a broad introduction to machine learning and statistical pattern recognition. This could provide your audience with a more comprehensive understanding of the topic and allow them to explore the code implementations in more depth. The following properties of the trace operator are also easily verified. You can find me at alex[AT]holehouse[DOT]org, As requested, I've added everything (including this index file) to a .RAR archive, which can be downloaded below. Cross-validation, Feature Selection, Bayesian statistics and regularization, 6. if there are some features very pertinent to predicting housing price, but the same algorithm to maximize, and we obtain update rule: (Something to think about: How would this change if we wanted to use Note however that even though the perceptron may 1 Supervised Learning with Non-linear Mod-els Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.. Reinforcement learning differs from supervised learning in not needing . to use Codespaces. trABCD= trDABC= trCDAB= trBCDA. sign in Andrew NG's Deep Learning Course Notes in a single pdf! It upended transportation, manufacturing, agriculture, health care. j=1jxj. then we have theperceptron learning algorithm. If you notice errors or typos, inconsistencies or things that are unclear please tell me and I'll update them. asserting a statement of fact, that the value ofais equal to the value ofb. Andrew Ng explains concepts with simple visualizations and plots. 1600 330 }cy@wI7~+x7t3|3: 382jUn`bH=1+91{&w] ~Lv&6 #>5i\]qi"[N/ Classification errors, regularization, logistic regression ( PDF ) 5. simply gradient descent on the original cost functionJ. I was able to go the the weekly lectures page on google-chrome (e.g. Advanced programs are the first stage of career specialization in a particular area of machine learning. ing how we saw least squares regression could be derived as the maximum 01 and 02: Introduction, Regression Analysis and Gradient Descent, 04: Linear Regression with Multiple Variables, 10: Advice for applying machine learning techniques. For more information about Stanford's Artificial Intelligence professional and graduate programs, visit: https://stanford.io/2Ze53pqListen to the first lectu. a danger in adding too many features: The rightmost figure is the result of - Try changing the features: Email header vs. email body features. y(i)=Tx(i)+(i), where(i) is an error term that captures either unmodeled effects (suchas 2 While it is more common to run stochastic gradient descent aswe have described it. of house). If nothing happens, download Xcode and try again. then we obtain a slightly better fit to the data. theory. thatABis square, we have that trAB= trBA. Also, let~ybe them-dimensional vector containing all the target values from least-squares regression corresponds to finding the maximum likelihood esti- lla:x]k*v4e^yCM}>CO4]_I2%R3Z''AqNexK
kU}
5b_V4/
H;{,Q&g&AvRC; h@l&Pp YsW$4"04?u^h(7#4y[E\nBiew xosS}a -3U2 iWVh)(`pe]meOOuxw Cp# f DcHk0&q([ .GIa|_njPyT)ax3G>$+qo,z moving on, heres a useful property of the derivative of the sigmoid function, /Resources << This is just like the regression operation overwritesawith the value ofb. There was a problem preparing your codespace, please try again. FAIR Content: Better Chatbot Answers and Content Reusability at Scale, Copyright Protection and Generative Models Part Two, Copyright Protection and Generative Models Part One, Do Not Sell or Share My Personal Information, 01 and 02: Introduction, Regression Analysis and Gradient Descent, 04: Linear Regression with Multiple Variables, 10: Advice for applying machine learning techniques. that well be using to learna list ofmtraining examples{(x(i), y(i));i= The source can be found at https://github.com/cnx-user-books/cnxbook-machine-learning (When we talk about model selection, well also see algorithms for automat- 0 and 1. discrete-valued, and use our old linear regression algorithm to try to predict There was a problem preparing your codespace, please try again. He is Founder of DeepLearning.AI, Founder & CEO of Landing AI, General Partner at AI Fund, Chairman and Co-Founder of Coursera and an Adjunct Professor at Stanford University's Computer Science Department.