Welcome to Big Data Science
My name is Justin Miller. I am a computer scientist and engineer specializing in big data research, with a focus on analytic system operations.
I created this domain ("BigDataSci") to help me organize my work into a clear, consistent direction.
Interests
My current research focuses on expanding my knowledge of the following subjects:
- Big data system operations (including, but not limited to, Hadoop).
- Approaches for cohesively integrating real-time processing (Apache Storm, Apache Flink, Apache Spark Streaming) and batch processing (traditional Hadoop, Pig, Spark) with near-real-time visualization dashboards (Elasticsearch).
- Data management concepts that arise in complex systems, such as data set management, flow, ingestion, schema design, lineage, and data quality.
- Role-based access control.
- Reliability (processing that is consistent with requirements, e.g., only-once processing), durability (surviving system failures), and manageability.
Authors and Contributors
Justin Miller (@mageru)