You’ll learn how to...
- Create a complete, end-to-end streaming data analytics pipeline
- Interactively analyze, approximate, and visualize streaming data
- Generate machine learning, graph, and NLP recommendation models
- Productionize the ML models to serve real-time recommendations
- Perform a hybrid on-premises and cloud deployment using Docker
- Customize this workshop environment to your specific use cases
Agenda
Part 1 (Analytics and Visualizations)
Analytics and Visualizations Overview (Live Demo!)
Verify Environment Setup (Docker)
Notebooks (Zeppelin, Jupyter/IPython)
Interactive Data Analytics (Spark SQL, Hive, Presto)
Graph Analytics (Spark Graph, NetworkX, TitanDB)
Time-series Analytics (Cassandra)
Visualizations (Kibana, Matplotlib, D3)
Approximate Queries (Spark SQL, Redis, Algebird)
Workflow Management (Airflow)
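The approximate-query tools listed above (Algebird in particular) are built on sketch data structures that trade exactness for fixed memory. As a taste of the technique — not code from the workshop — here is a minimal pure-Python Count-Min Sketch; the class, `width`, and `depth` parameters are illustrative choices:

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts in fixed memory.
    Estimates never undercount; hash collisions can only inflate them."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One independent-ish hash per row, derived from a salted SHA-256.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The minimum across rows is the least-inflated count.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for word in ["spark"] * 5 + ["redis"] * 3 + ["kafka"]:
    cms.add(word)
print(cms.estimate("spark"))  # 5 with high probability at this width
```

Memory use is `width * depth` counters regardless of how many distinct items stream through, which is why the same idea scales to Spark Streaming workloads.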
Part 2 (Streaming and Recommendations)
Streaming and Recommendations Overview (Live Demo!)
Streaming (NiFi, Kafka, Spark Streaming, Flink)
Cluster-based Recommendation (Spark ML, scikit-learn)
Graph-based Recommendation (Spark ML, Spark Graph)
Collaborative-based Recommendation (Spark ML)
NLP-based Recommendation (CoreNLP, NLTK)
Geo-based Recommendation (Elasticsearch)
Hybrid On-Premises + Cloud Auto-Scaling Deployment (Docker)
Customize the Workshop Environment for Your Use Cases
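Several of the recommenders above (cluster- and collaborative-based) reduce to vector similarity between users or items. As a toy illustration of that idea — the user names and ratings below are made up, and this is not the workshop's Spark ML code — here is user-based collaborative filtering with cosine similarity in pure Python:

```python
import math

# Toy user-item rating data; keys and values are illustrative only.
ratings = {
    "alice": {"item1": 5.0, "item2": 3.0, "item3": 4.0},
    "bob":   {"item1": 4.0, "item2": 3.0, "item3": 5.0},
    "carol": {"item1": 1.0, "item2": 5.0},
}

def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

# Find the user whose tastes most resemble alice's.
sims = {other: cosine(ratings["alice"], ratings[other])
        for other in ratings if other != "alice"}
best = max(sims, key=sims.get)
print(best)  # bob
```

Spark ML's ALS and nearest-neighbor approaches covered in this part scale the same core computation across a cluster instead of a single dictionary.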
Target Audience
- Anyone interested in building more intuition about machine learning, graph processing, natural language processing, statistical approximation techniques, and visualizations
Prerequisites
- Basic familiarity with Unix/Linux commands
- Experience with SQL, Java, Scala, Python, or R
- Basic familiarity with linear algebra concepts (e.g., the dot product)
- Laptop with an SSH client and a modern browser
- Every attendee will get their own fully configured cloud instance running the entire environment
- At the end of the workshop, you will be able to save your environment as a Docker image and download it to your laptop
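Since the dot product is the one linear algebra concept called out in the prerequisites, here is a one-line refresher in Python (the example vectors are arbitrary):

```python
# Dot product: a . b = sum of element-wise products.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
dot = sum(x * y for x, y in zip(a, b))
print(dot)  # 32.0  (1*4 + 2*5 + 3*6)
```

This operation underlies the similarity and projection computations used throughout the recommendation sections.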