ODSC India 2019 Warm-Up: Data Science Kick Starter
Kavita D. Chiplunkar
Data Science Head, Infinite-Sum Modelling Inc.
Nirav Shah
Founder of OnPoint Insights
Building a Scorecard using Python
This webinar will explain the importance of credit scorecards in banking and financial institutions, how they are used to measure the creditworthiness of a customer, and how machine learning algorithms are helping build better scorecards than traditional algorithms. We plan to briefly discuss the key data elements required to build such scorecards, and we will talk at a high level about the various steps in building a scorecard. We will also share a brief snapshot of what to expect from our session at ODSC and how it can benefit data science enthusiasts and banking professionals.
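A common way to turn a model's predicted odds into scorecard points is "points to double the odds" (PDO) scaling, where score = offset + factor × ln(odds). The sketch below is illustrative rather than the session's method: the base score of 600 at good:bad odds of 50:1 and the PDO of 20 are assumed values.

```python
import math

def scorecard_points(odds, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map good:bad odds to a score using PDO scaling.

    score = offset + factor * ln(odds), with factor = pdo / ln(2)
    and offset = base_score - factor * ln(base_odds).
    The default parameters are illustrative assumptions.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds)

def odds_from_probability(p_bad):
    """Convert a predicted probability of default into good:bad odds."""
    return (1.0 - p_bad) / p_bad
```

With these assumptions, odds of 100:1 score 620 and odds of 25:1 score 580, since each doubling of the odds adds exactly one PDO of points. A customer with a predicted default probability of 0.02 has odds of 49:1 and a score just under 600.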
Kavita D. Chiplunkar
Kavita is an analytics leader with 12+ years of core hands-on experience and an excellent track record in presales, partner management, analytics delivery and team management across domains in world-class organizations. Currently, she heads the Data Science function at Infinite Sum Modeling. She is a Chemical Engineer by education and also holds a Master's in Economics from IGIDR. She is a seasoned analytics professional with work experience at companies like Fair Isaac, Experian, Accenture, Infosys and Vodafone. Her vast experience in domains like banking, insurance, telecom, fraud and risk management gives her the right kind of diversification. She has published papers in the areas of financial econometrics and social media analytics, and has been an esteemed speaker at various national seminars on analytics.
Nirav Shah is the Founder of OnPoint Insights, a data analytics, software services and staff augmentation consultancy based in Boston. He has 15 years of industry experience, mainly consulting on data analytics, big data modeling, control systems, process analytics and software tools, offline and real-time data solutions, and training customers in data analytics, dashboards and data visualization. He is an expert in dashboards and visualization using Tableau and other multivariate data analytics software.
Director & Co-Founder of Zentropy Technologies
Gurram Poorna Prudhvi
Machine Learning Engineer at mroads
Time Series analysis in Python
Time series analysis has been around for centuries, helping us solve problems ranging from astronomy to business to the advanced scientific research around us today. Time carries precious information that most machine learning algorithms don't deal with, but time series analysis, which mixes machine learning and statistics, helps us extract useful insights from it. Time series analysis can be applied to fields like economic forecasting, budgetary analysis, sales forecasting, census analysis and much more. In this workshop, we will look at how to dive deep into time series data and use deep learning to make accurate predictions.
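Before a deep learning model can forecast a series, the series is usually reframed as supervised examples: each input is a window of past values and the target is the value that follows. The library-free sketch below shows only that windowing step; the function name `make_windows` and the window size used in the example are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

def make_windows(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Split a series into (inputs, targets) for one-step-ahead forecasting.

    Each input holds `window` consecutive observations; the matching
    target is the observation immediately after them.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(list(series[start:start + window]))
        targets.append(series[start + window])
    return inputs, targets
```

For example, `make_windows([1, 2, 3, 4, 5], 3)` yields the inputs `[[1, 2, 3], [2, 3, 4]]` with targets `[4, 5]`; those arrays can then be fed to a recurrent or convolutional network.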
Co-Founder, Director & Head of Research & Development at Zentropy Technologies. Before founding Zentropy, Ram worked with a leading hedge fund as a Project Manager responsible for building the tools and technologies required by the middle and back office. He was instrumental in delivering some of the firm's most mission-critical strategic projects, which helped its overall business.
Gurram Poorna Prudhvi
Prudhvi works as a machine learning engineer at mroads. He is interested in NLP research, open source, public speaking and Python. In his free time, he explores and tries to understand different dimensions of life. He is also a core team member of the Hyderabad Python Community.
ODSC West 2019 Warm-Up: Machine Learning
Data Scientist at Coursera
Causal Inference & Machine Learning
Many data science problems, especially those informing business and product strategy, involve understanding causal relationships. The standard way to measure these is through A/B testing, but often that is infeasible, requiring alternative techniques from causal inference that are an essential part of any data scientist’s toolkit. The talk will walk through these techniques, some applications, and recent work at the intersection of causal inference and machine learning for handling large data sets.
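The abstract does not name the specific techniques covered, so as one illustrative example: difference-in-differences is a widely used alternative when an A/B test is not possible. It compares the change in a treated group's outcome with the change in an untreated group over the same period. The sketch below assumes simple before/after group samples and is not taken from the talk.

```python
from statistics import mean
from typing import Sequence

def diff_in_diff(treated_before: Sequence[float], treated_after: Sequence[float],
                 control_before: Sequence[float], control_after: Sequence[float]) -> float:
    """Difference-in-differences estimate of a treatment effect.

    Relies on the parallel-trends assumption: without treatment, both
    groups would have changed by the same amount between the periods.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change
```

With made-up numbers, a treated group that moves from a mean of 11 to 16 while the control group moves from 10 to 12 gives an estimated effect of 5 - 2 = 3. The estimate is only as credible as the parallel-trends assumption behind it.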
Vinod Bakthavachalam is a data scientist working with the Content Strategy and Enterprise teams where his work has recently focused on understanding the skills landscape around the world using Coursera data (see the Global Skills Index Coursera recently published for some of his work). Prior to Coursera, he majored in Economics, Statistics, and Molecular and Cell Biology at UC Berkeley, and worked in quantitative finance.
Principal Data Scientist at Red Hat
Real-ish Time Predictive Analytics with Spark Structured Streaming
In 20 short minutes, learn what becomes possible when you add Spark to your analytics pipeline. Learn how to effectively solve common data engineering problems with compile-time guarantees, like how to ingest, normalize, transform and join datasets in real time. Learn how to add insights on top of your streaming data with simple filters and pre-trained models.
Scott Haines is a distributed systems engineer focused on real-time, highly available, trustworthy analytics systems. He is a Principal Software Engineer on the Voice Insights team at Twilio, where he helped drive Spark adoption and streaming pipeline architecture best practices, and helped build a massive stream processing platform. Prior to Twilio, he wrote the backend Java APIs for Yahoo Games, as well as the real-time game ranking/ratings engine (built on Storm) that provided personalized recommendations and page views for 10 million customers. He finished his tenure at Yahoo working for Flurry Analytics, where he wrote the alerts/notifications system for mobile.
Data Visualization Artist at University of Vermont Complex Systems Center
Data Art: Seeing the Future of Exploratory Analysis
The landscape of data visualization tools is expansive and growing. Data artist Jane Adams gives a scintillating teaser of the myriad methods for interactive visual analytics through a cursory demonstration of a project structure and creative workflow. Jane reviews one project’s development process: from paper & pencil exercises in user experience stories and user interface wireframing, to prototyping visualizations in Python using Plotly, building an API in React, and developing a customized visualization user interface in D3.js.
Jane Adams is an emergent media artist, working at the intersection of visual expression and scientific inquiry. As the Data Visualization Artist in Residence at the University of Vermont Complex Systems Center, Jane builds engaging, interactive, web-based visualizations of high-dimensional data for exploratory analysis. Her visualization research topics include social network lexical analysis, healthcare morbidity and mortality modeling, and geospatial temporal dynamics, all through a lens of complexity science. In her spare time, Jane experiments with music-color synesthesia, machine learning for computational creativity, self-sustaining aquaponic sculpture, and citizen science. She is the lead community organizer of Vermont Women in Machine Learning and Data Science (WiMLDS), and holds an MFA in Emergent Media. Stay in touch on Twitter @artistjaneadams
ODSC India 2019 Warm-Up: Machine Learning & Deep Learning
Sr. Scientist at Novozymes South Asia Pvt Ltd
Principal Data Scientist at Mysuru Consulting Group
Faculty Scientist at Institute of Bioinformatics and Applied Biotechnology (IBAB)
Deep learning powered Genomic Research
Disease occurs when there is a slip in the finely orchestrated dance between physiology, environment and genes. Treatment with chemicals (natural, synthetic or a combination) addressed some diseases, but others persisted and were propagated across generations. The molecular basis of disease became a prime focus of study for understanding and analyzing root causes. Cancer, in particular, showed that the origin, detection, prognosis, treatment and cure of disease are far from simple, so diseases have to be treated on a case-by-case basis (no one size fits all).
The advent of next-generation sequencing, high-throughput analysis, enhanced computing power and neural networks has brought new aspirations for addressing this conundrum of complicated genetic elements (the structure and function of the various genes in our systems). This requires extracting genomic material, sequencing it (with automated systems) and analyzing it to map the strings of As, Ts, Gs and Cs, which yields genomic datasets. These datasets are too large for traditional and applied statistical techniques. Moreover, the important signals are often incredibly small and buried in blaring technical noise, which calls for far more sophisticated analysis techniques. Artificial intelligence and deep learning give us the power to draw clinically useful information from the genetic datasets obtained by sequencing.
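Deep learning models cannot consume strings of As, Ts, Gs and Cs directly; a common first step is one-hot encoding, where each base becomes a vector with a single 1. The sketch below assumes the column order A, C, G, T and encodes any other character (such as N for an unknown base) as all zeros; both are illustrative conventions, not details from the talk.

```python
from typing import List

BASES = "ACGT"  # assumed column order for the one-hot vectors

def one_hot_dna(sequence: str) -> List[List[int]]:
    """One-hot encode a DNA string as a len(sequence) x 4 matrix.

    Each row has a 1 in the column of its base (A, C, G, T order);
    unrecognized characters such as 'N' become rows of zeros.
    """
    matrix = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = BASES.find(base)
        if index >= 0:
            row[index] = 1
        matrix.append(row)
    return matrix
```

The resulting matrix can be stacked across sequences of equal length and passed to a convolutional network, which is a frequent architecture choice for sequence-level genomic tasks.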
As a Senior Technology Innovation Specialist, I work on exploring innovative technologies in the field of biology. Before Novozymes, I worked on the comparative genomics of H. pylori, mutational analysis of cataract protein and developing a human model for cancer studies at the prestigious national laboratories CDFD and CCMB (Hyderabad) and NCCS (Pune), respectively.
Additionally, I am a registered patent agent. Combining my domain knowledge in biological science with application-oriented patent analytics (PatInformatics), I work in three areas:
a. Using Patent & Literature data for deriving technology evolution insights for future project planning
b. Pitching new ideas and exploring their feasibility
c. Networking with new ventures and exploring new areas for organization opportunities.
I am a polymath and unicorn data scientist with strong foundations in Economics, Finance, Business Foundations, Business Analytics and Psychology. I specialize in Probabilistic Graphical Models, Machine Learning and Deep Learning. I have completed the Financial Engineering and Risk Management program from Columbia University with top honors, a MicroMasters in Marketing Analytics from UC Berkeley and the statistical analysis in Life Sciences specialization from Harvard. I am Chapter Lead/Co-Organizer of the Women in Machine Learning and Data Science Bengaluru Chapter and a core organizing team member at WiDS Bengaluru. I have around 6 years of technical experience working at companies like Infosys, Temenos, NeoEYED and Mysuru Consulting Group. I am part of a dedicated group of experts and enthusiasts who explore Coursera courses before they open to the public, an ambassador at AIMed (an initiative that brings together physicians and AI experts), a part-time data science instructor, a mentor at GLAD (gladmentorship.com), a mentor at JobsForHer and a volunteer at Statistics without Borders. I developed the course curriculum for Probabilistic Graphical Models @ Upgrad, which is taught by Professor Srinivasa Raghavan from IIIT Bangalore.
With a background in Physics and Electronics from Bharathidasan University, Trichy, Dr. Vijayalakshmi Mahadevan completed her Ph.D. at the National Centre for Biological Sciences, Tata Institute of Fundamental Research (NCBS-TIFR), Bangalore. She was an Assistant Professor in the School of Electrical and Electronics Engineering at SASTRA Deemed University in Thanjavur, and a TCS Chair Professor of Bioinformatics and Associate Dean of the School of Chemical & Biotechnology. She was the Group Lead of the Chromatin and Epigenetics group and also headed the Department of Bioinformatics from 2008 to 2016, besides being affiliated with the Centre for Nanotechnology and Advanced Biomaterials (CeNTAB) at SASTRA.
Dr. Vijayalakshmi was also a Research Mentor in the National Network for Mathematical and Computational Biology (NNMCB), India, from 2013, and was a Research Mentor for the Research Science Initiative (RSI) of IIT Madras, Chennai Mathematical Institute, SASTRA University, Thanjavur, PSBB Group of Schools, Chennai and the Centre for Excellence in Education, McLean, USA, to promote scientific research among school children.
Principal Data Scientist at Red Hat
Scientist at Intuit
A Hands-on Introduction to Natural Language Processing
Being specialized in domains like computer vision and natural language processing is no longer a luxury but a necessity expected of any data scientist in today’s fast-paced world! With a hands-on and interactive approach, we will understand essential concepts in NLP, along with extensive case studies and hands-on examples, to master state-of-the-art tools, techniques and frameworks for applying NLP to solve real-world problems. We leverage Python 3 and the latest state-of-the-art frameworks, including NLTK, Gensim, SpaCy, Scikit-Learn, TextBlob, Keras and TensorFlow, to showcase our examples. You will be able to learn a fair bit of machine learning as well as deep learning in the context of NLP during this bootcamp.
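Many classical NLP pipelines start by turning text into a bag of words: lowercase the text, split it into tokens and count them. The sketch below uses only the Python standard library with a deliberately naive regular-expression tokenizer (an assumption for illustration); libraries such as NLTK and SpaCy provide far more robust tokenizers.

```python
import re
from collections import Counter
from typing import Dict

# Naive tokenizer: runs of lowercase letters, digits and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")

def bag_of_words(text: str) -> Dict[str, int]:
    """Return lowercase token counts for a piece of text."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return dict(Counter(tokens))
```

For example, `bag_of_words("NLP is fun. NLP is useful!")` counts `nlp` and `is` twice each and `fun` and `useful` once; vectors like this are the input to many of the classical machine learning models used for text classification.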
The intent of this workshop is to make you a hero in NLP so that you can start applying NLP to solve real-world problems. We start from zero and follow a comprehensive and structured approach to make you learn all the essentials in NLP. We will be covering the following aspects during the course of this workshop with hands-on examples and projects!
Dipanjan (DJ) Sarkar is a Data Scientist at Red Hat, a published author, and a consultant and trainer. He has consulted and worked with several startups as well as Fortune 500 companies like Intel. He primarily works on leveraging data science, advanced analytics, machine learning and deep learning to build large-scale intelligent systems. He holds a master of technology degree with specializations in Data Science and Software Engineering. He is also an avid supporter of self-learning and massive open online courses. He has recently ventured into the world of open-source products to improve the productivity of developers across the world.
Dipanjan has been an analytics practitioner for several years now, specializing in machine learning, natural language processing, statistical methods and deep learning. Having a passion for data science and education, he also acts as an AI Consultant and Mentor at various organizations like Springboard, where he helps people build their skills in areas like Data Science and Machine Learning. He also acts as a key contributor and Editor for Towards Data Science, a leading online journal focusing on Artificial Intelligence and Data Science. Dipanjan has also authored several books on R, Python, Machine Learning, Social Media Analytics, Natural Language Processing, and Deep Learning.
Dipanjan’s interests include learning about new technology, financial markets, disruptive start-ups, data science, artificial intelligence and deep learning. In his spare time he loves reading, gaming, watching popular sitcoms and football, and writing interesting articles, which he shares on https://www.linkedin.com/in/dipanzan. He is also a strong supporter of open source and publishes the code and analyses from his books and articles on GitHub at https://github.com/dipanjanS.
I am part of the Intuit AI team. Prior to this, I headed ML efforts at Huawei Technologies, Freshworks (Chennai) and Airwoot (Delhi). I did my master's in theoretical computer science at IIIT Hyderabad, and I dropped out of my PhD at IIT Delhi to work with startups.
I am a regular speaker at ML conferences like PyData, NVIDIA forums, Fifth Elephant and Anthill. I have also conducted a number of workshops attended by machine learning practitioners. I am also the co-organizer of one of the early Deep Learning meetups in Bangalore, and the Editor of “Anthill-2018”, a deep-learning-focused conference by HasGeek.
ODSC East 2019 Warm-Up: DataOps
Haftan Eckholdt, Ph.D.
Chief Data Science & Chief Science Officer, Understood.org
Making Data Science: AIG, Amazon, Albertsons
Developing an internal data science capability requires a cultural shift, a strategic mapping process that aligns with existing business objectives, a technical infrastructure that can host new processes, and an organizational structure that can alter business practice to create a measurable impact on business functions. This workshop will take you through ways to consider the vast opportunities for data science to identify and prioritize what will add the most value to your organization, and then budget and hire into commitments. Learn the most effective ways to establish data science objectives from a business perspective including recruiting, retention, goal setting, and improving business.
Haftan Eckholdt, PhD, is Chief Data Science Officer at Understood.org. His career began with research professorships in Neuroscience, Neurology, and Psychiatry, followed by industrial research appointments at companies like Amazon and AIG. He holds graduate degrees in Biostatistics and Developmental Psychology from Columbia and Cornell Universities. In his spare time, he thinks about things like chess and cooking and cross country skiing and jogging and reading. When things get really really busy, he actually plays chess and cooks delicious meals and jogs a lot. Born and raised in Baltimore, Haftan has been a resident of Kings County, New York since the late 1900s.
Christopher P. Bergh
CEO, Head Chef, DataKitchen
The DataOps Manifesto
The list of failed big data projects is long. They leave end users, data analysts and data scientists frustrated with long lead times for changes. This presentation will illustrate how to make changes to big data, models, and visualizations quickly, with high quality, using the tools analytic teams love. We synthesize DevOps, Deming, and direct experience into the DataOps Manifesto.
To paraphrase an old saying: “It takes a village to get insights from data.” Data analysts, data scientists, and data engineers are already working in teams delivering insight and analysis, but how do you get the team to support experimentation and insight delivery without ending up failing? Christopher Bergh presents the seven shocking steps to get these groups of people working together. These seven steps contain practical, doable steps that can help you achieve data agility.
After looking at trends in analytics and a brief review of Agile, Christopher outlines the steps to apply DevOps techniques from software development to create an Agile analytics operations environment, including how to add tests, modularize and containerize, do branching and merging, use multiple environments, parameterize your process, use simple storage, and use multiple workflows to deploy to production with W. Edwards Deming efficiency. They also explain why “don’t be a hero” should be the motto of analytic teams, emphasizing that while being a hero can feel good, it is not the path to success for individuals in analytic teams.
Christopher’s goal is to teach analytic teams how to deliver business value quickly and with high quality. They illustrate how to apply Agile processes to your department. However, a process is not enough. Walking through the seven shocking steps will demonstrate how to create a technical environment that truly enables speed and quality by supporting DataOps.
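Of the steps Christopher outlines, adding tests is the easiest to picture in code. One hedged interpretation is a lightweight data test that runs before each deployment and fails fast when an incoming batch breaks basic expectations. The function `check_batch`, the column names `customer_id` and `amount`, and the 5% null-rate threshold are all hypothetical choices for this sketch, not part of the DataOps Manifesto.

```python
from typing import Any, Iterable, List, Mapping

def check_batch(rows: Iterable[Mapping[str, Any]],
                required: str = "customer_id",
                numeric: str = "amount",
                max_null_rate: float = 0.05) -> List[str]:
    """Run simple data tests on a batch; return failure messages (empty means pass)."""
    rows = list(rows)
    if not rows:
        return ["batch is empty"]
    failures = []
    # Every record must carry the (hypothetical) key column.
    missing = sum(1 for row in rows if row.get(required) is None)
    if missing:
        failures.append(f"{missing} row(s) missing {required}")
    # Tolerate only a small share of missing numeric values.
    nulls = sum(1 for row in rows if row.get(numeric) is None)
    if nulls / len(rows) > max_null_rate:
        failures.append(f"null rate for {numeric} is {nulls / len(rows):.0%}")
    # Numeric values are assumed to be non-negative (e.g. transaction amounts).
    if any(row.get(numeric) is not None and row[numeric] < 0 for row in rows):
        failures.append(f"negative values in {numeric}")
    return failures
```

In a pipeline, a non-empty result would stop the deployment step, so bad data is caught before it reaches models or dashboards rather than being fixed afterwards by a "hero".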
Christopher Bergh is a Founder and Head Chef at DataKitchen.
Chris has more than 20 years of research, engineering, analytics, and executive management experience. Previously, Chris was Regional Vice President in the Revenue Management Intelligence group at Model N. Before Model N, Chris was COO of LeapFrogRx, an analytics software and service provider. Chris led the acquisition of LeapFrogRx by Model N in January 2012. Prior to LeapFrogRx, Chris was CTO and VP of Product Management of MarketSoft (now part of IBM), an Enterprise Marketing Management software vendor. Prior to that, Chris developed Microsoft Passport, the predecessor to Windows Live ID, a distributed authentication system used by hundreds of millions of users today. He was awarded a US patent for his work on that project. Before joining Microsoft, he led the technical architecture and implementation of Firefly Passport, an early leader in Internet personalization and privacy. Microsoft subsequently acquired Firefly. Chris led the development of the first travel-related e-commerce website at NetMarket. Chris began his career at the Massachusetts Institute of Technology’s (MIT) Lincoln Laboratory and NASA Ames Research Center, where he created software and algorithms that provided aircraft arrival optimization assistance to air traffic controllers at several major airports in the United States. Chris served as a Peace Corps Volunteer Math Teacher in Botswana, Africa. Chris has an M.S. from Columbia University and a B.S. from the University of Wisconsin-Madison. He is an avid cyclist, hiker, reader, and father of two teenagers.