Big Data Science

Welcome to Big Data Science

My name is Justin Miller. I am a computer scientist and engineer specializing in big data research, with a focus on analytic system operations.

I created this domain ("BigDataSci") to help me organize my work into a clear, consistent direction.

Interests

My current research focuses on expanding my knowledge of the following subjects:

  • Big data system operations (including, but not limited to, Hadoop).
  • Approaches for cohesively integrating real-time processing (Apache Storm, Apache Flink, Apache Spark Streaming), batch processing (traditional Hadoop, Pig, Spark), and near-real-time visualization dashboards (ElasticSearch).
  • Data management concepts that arise in complex systems, such as data set management, flow, ingestion, schema design, lineage, and data quality.
  • Role-based access control.
  • Reliability (processing that consistently meets its requirements, e.g. processing each record only once), durability (surviving system failures), and manageability.

Authors and Contributors

Justin Miller (@mageru)