3-C: Data Science
Using custom objects to bring life to each row in a spreadsheet
Learning Targets
I can create a class to represent a row of real-world data.
I can parse a CSV file by splitting strings and converting data types.
I can write constructors that initialize multiple instance variables from parsed data.
I can implement utility methods that perform calculations using instance data.
Why Data Science in AP CSA?
You've learned how to create custom classes, work with ArrayLists, and traverse collections. Now it's time to put all these skills together with real-world data. Data science is about taking raw information and extracting meaningful insights from it. In Java, we do this by:
Modeling data as custom classes
Reading files to populate collections of objects
Analyzing data through methods that compute results
This combination of object-oriented programming and data analysis is exactly what you'll encounter on the AP exam and in professional software development.
Data Classes: Objects as Rows
Think about a spreadsheet. Each row represents one thing (a person, a product, a measurement) and each column represents an attribute of that thing. In Java, we can model each row as an object.
For example, imagine you're working with cereal nutrition data:
100% Bran,70,10,5,0.33
All-Bran,70,9,5,0.33
Apple Jacks,110,1,11,1.0
Each line is one cereal. The columns are: name, calories, fiber, carbohydrates, and serving size in cups.
Creating the Data Class
We'll create a Cereal class where each instance represents one row:
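A minimal sketch of such a class, assuming the five columns map to one String, three ints, and one double. The field and getter names are illustrative choices, not something the dataset dictates.

```java
// One Cereal object represents one row of the data file.
// In your own project this would usually be declared public in a file named Cereal.java.
class Cereal {
    // One instance variable per column (names and types are assumptions).
    private String name;
    private int calories;
    private int fiber;
    private int carbohydrates;
    private double cups;   // serving size in cups

    // The constructor takes every value needed to describe one row.
    public Cereal(String name, int calories, int fiber, int carbohydrates, double cups) {
        this.name = name;
        this.calories = calories;
        this.fiber = fiber;
        this.carbohydrates = carbohydrates;
        this.cups = cups;
    }

    // Accessor methods, one per field.
    public String getName() { return name; }
    public int getCalories() { return calories; }
    public int getFiber() { return fiber; }
    public int getCarbohydrates() { return carbohydrates; }
    public double getCups() { return cups; }

    // A readable summary, handy when printing a list of cereals.
    public String toString() {
        return name + " (" + calories + " calories per " + cups + " cups)";
    }
}
```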
Let's say you read this line from a file: "100% Bran,70,10,5,0.33"
Here's how to break it apart:
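One way to do it, sketched under the assumption that the columns always arrive in the order name, calories, fiber, carbohydrates, cups, and that the name itself never contains a comma. The class and method names are hypothetical.

```java
class SplitDemo {
    // Splits one CSV line and converts each piece to its proper type.
    // Returns a short summary so the parsed values can be inspected.
    static String summarize(String line) {
        // Produces {"100% Bran", "70", "10", "5", "0.33"} for the sample line.
        String[] parts = line.split(",");

        String name = parts[0];                          // already a String
        int calories = Integer.parseInt(parts[1]);       // "70"   -> 70
        int fiber = Integer.parseInt(parts[2]);          // "10"   -> 10
        int carbohydrates = Integer.parseInt(parts[3]);  // "5"    -> 5
        double cups = Double.parseDouble(parts[4]);      // "0.33" -> 0.33

        return name + ": " + calories + " calories, " + fiber + " fiber, "
                + carbohydrates + " carbohydrates, " + cups + " cups";
    }

    public static void main(String[] args) {
        System.out.println(summarize("100% Bran,70,10,5,0.33"));
    }
}
```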
Important: The split() method returns an array of Strings. You need to convert numeric strings to their proper types using parseInt() and parseDouble().
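To see why the conversion matters, compare what + does before and after parsing. This is a small illustration with hypothetical method names, not part of the cereal program itself.

```java
class ConversionDemo {
    // Without parsing, + joins the two Strings end to end.
    static String addAsStrings(String a, String b) {
        return a + b;
    }

    // After parsing, + performs integer addition.
    static int addAsInts(String a, String b) {
        return Integer.parseInt(a) + Integer.parseInt(b);
    }

    public static void main(String[] args) {
        String[] parts = "100% Bran,70,10,5,0.33".split(",");
        System.out.println(addAsStrings(parts[1], parts[2])); // prints 7010
        System.out.println(addAsInts(parts[1], parts[2]));    // prints 80
    }
}
```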
Reading the Whole File
To process an entire CSV file, you'll combine file reading with this parsing logic:
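A sketch of that combination using Scanner and File. It assumes the file has no header row and that no cereal name contains a comma; if your file does start with a header, call nextLine() once before the loop to skip it. The CerealLoader class and its method names are hypothetical, and the Cereal class is repeated in compact form so the example stands alone.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

// Compact version of the data class (field names are assumptions).
class Cereal {
    private String name;
    private int calories;
    private int fiber;
    private int carbohydrates;
    private double cups;

    public Cereal(String name, int calories, int fiber, int carbohydrates, double cups) {
        this.name = name;
        this.calories = calories;
        this.fiber = fiber;
        this.carbohydrates = carbohydrates;
        this.cups = cups;
    }

    public String getName() { return name; }
    public int getCalories() { return calories; }
    public int getFiber() { return fiber; }
    public int getCarbohydrates() { return carbohydrates; }
    public double getCups() { return cups; }
}

class CerealLoader {
    // Split line -> parse each part -> create object.
    static Cereal parseLine(String line) {
        String[] parts = line.split(",");
        return new Cereal(parts[0],
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]),
                Integer.parseInt(parts[3]),
                Double.parseDouble(parts[4]));
    }

    // Read line -> build a Cereal -> add to the ArrayList, until input runs out.
    static ArrayList<Cereal> readAll(Scanner input) {
        ArrayList<Cereal> cereals = new ArrayList<Cereal>();
        while (input.hasNextLine()) {
            String line = input.nextLine();
            if (!line.isEmpty()) {          // ignore blank lines
                cereals.add(parseLine(line));
            }
        }
        return cereals;
    }

    // Opens the named file and loads every row from it.
    static ArrayList<Cereal> readFile(String filename) throws FileNotFoundException {
        Scanner input = new Scanner(new File(filename));
        ArrayList<Cereal> cereals = readAll(input);
        input.close();
        return cereals;
    }
}
```

A call such as CerealLoader.readFile("cereal.csv") would then return the full list; that file name is only a placeholder. Keeping readAll separate from readFile also lets you test the parsing with a Scanner built from a String.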
Pattern to remember: Read line → Split line → Parse each part → Create object → Add to ArrayList
Utility Methods: Objects Computing Their Own Results
One of the most powerful aspects of OOP is that objects can answer questions about themselves. Instead of writing static methods that take an object as a parameter, we write instance methods that operate on the object's data directly.
Let's switch to a different example to illustrate this concept. Imagine you're working with book sales data:
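A minimal sketch of a data class for this example. It assumes each row holds a title, a price, and a number of units sold. The price and units-sold fields match the getPrice() and getUnitsSold() accessors this lesson refers to; the title field is an added assumption.

```java
// One Book object represents one row of sales data.
class Book {
    private String title;     // assumed column
    private double price;     // price per copy
    private int unitsSold;    // copies sold

    public Book(String title, double price, int unitsSold) {
        this.title = title;
        this.price = price;
        this.unitsSold = unitsSold;
    }

    // Accessor methods, one per field.
    public String getTitle() { return title; }
    public double getPrice() { return price; }
    public int getUnitsSold() { return unitsSold; }
}
```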
Now let's add utility methods that let each book compute its own results:
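A sketch of what those methods might look like. totalRevenue() is the calculation this lesson names; outsold() is a hypothetical comparison added to show a second kind of utility method. The title field is an assumption.

```java
class Book {
    private String title;
    private double price;
    private int unitsSold;

    public Book(String title, double price, int unitsSold) {
        this.title = title;
        this.price = price;
        this.unitsSold = unitsSold;
    }

    public String getTitle() { return title; }
    public double getPrice() { return price; }
    public int getUnitsSold() { return unitsSold; }

    // Calculation: money earned by this book, computed from its own fields.
    public double totalRevenue() {
        return price * unitsSold;
    }

    // Comparison (hypothetical helper): true if this book sold more copies than other.
    public boolean outsold(Book other) {
        return unitsSold > other.unitsSold;
    }
}
```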
Why Write Methods Inside the Class?
Compare these two approaches:
Static approach (what you've done before):
Instance method (what we're doing now):
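A sketch of both approaches side by side, so the difference at the call site is visible. RevenueDemo is a hypothetical class name, and the Book class is repeated in compact form so the example stands alone.

```java
class Book {
    private String title;
    private double price;
    private int unitsSold;

    public Book(String title, double price, int unitsSold) {
        this.title = title;
        this.price = price;
        this.unitsSold = unitsSold;
    }

    public double getPrice() { return price; }
    public int getUnitsSold() { return unitsSold; }

    // Instance approach: the book reads its own private fields directly.
    public double totalRevenue() {
        return price * unitsSold;
    }
}

class RevenueDemo {
    // Static approach: the book is passed in and read through its getters.
    public static double totalRevenue(Book book) {
        return book.getPrice() * book.getUnitsSold();
    }

    public static void main(String[] args) {
        Book book = new Book("Sample Title", 10.0, 3);
        System.out.println(RevenueDemo.totalRevenue(book)); // static call
        System.out.println(book.totalRevenue());            // instance call
    }
}
```

Both calls compute the same value; only where the logic lives differs.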
The instance method is:
Cleaner: No need to pass the object as a parameter
Encapsulated: Direct access to private instance variables
Object-oriented: The book knows how to calculate its own data
Reusable: Can be called on any Book object
Using Utility Methods in Analysis
Once you've added utility methods to your data class, you can write clean analysis code:
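For instance, a max-finding method and a filtering method might be sketched like this. BookAnalysis and its method names are hypothetical, and the Book class is repeated in compact form so the example stands alone.

```java
import java.util.ArrayList;

class Book {
    private String title;
    private double price;
    private int unitsSold;

    public Book(String title, double price, int unitsSold) {
        this.title = title;
        this.price = price;
        this.unitsSold = unitsSold;
    }

    public String getTitle() { return title; }

    public double totalRevenue() {
        return price * unitsSold;
    }
}

class BookAnalysis {
    // Max-finding: returns the book with the highest revenue, or null for an empty list.
    static Book topEarner(ArrayList<Book> books) {
        Book best = null;
        for (Book book : books) {
            if (best == null || book.totalRevenue() > best.totalRevenue()) {
                best = book;
            }
        }
        return best;
    }

    // Filtering: returns the books whose revenue is at least minRevenue.
    static ArrayList<Book> earningAtLeast(ArrayList<Book> books, double minRevenue) {
        ArrayList<Book> result = new ArrayList<Book>();
        for (Book book : books) {
            if (book.totalRevenue() >= minRevenue) {
                result.add(book);
            }
        }
        return result;
    }
}
```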
Notice how readable this is. The logic is clear because each object knows how to compute its own values. You're not cluttering your analysis code with calculations—the Book class handles that internally.
The Power of Encapsulation

By putting these calculations inside the Book class, you get several benefits:
Less repetition: Write book.totalRevenue() instead of book.getPrice() * book.getUnitsSold() everywhere
Easier maintenance: If the revenue calculation changes (maybe add sales tax?), you only update it in one place
Better organization: All book-related logic lives in the Book class
Cleaner analysis: Your filtering and searching methods focus on logic, not arithmetic
This is exactly what encapsulation is about—keeping related data and behaviors bundled together.
Putting It All Together

The workflow for data science in Java:
Design your data class - What fields do you need? What calculations make sense?
Write the constructor - Takes all the data needed to create one instance
Add accessor methods - Getters for each field
Add utility methods - Calculations and comparisons
Parse the CSV file - Read, split, convert types, create objects
Store in a collection - Usually an ArrayList
Analyze the data - Filtering, finding max/min, calculating statistics
This is the same pattern professional developers use when working with databases, APIs, and data files.
Data Lab Assignment
You're now ready to work with real-world nutritional data! In the next assignment, you'll:
Download authentic cereal data from Kaggle
Create a complete data class with multiple fields
Write a CSV parser that loads dozens of records into an ArrayList
Implement filtering and max-finding algorithms
Discover a data quality issue in the dataset
The lab includes warm-up activities with live weather data using the Sinbad library, showing you how professional data scientists verify their data sources and fetch real-time information.
You'll get a GitHub Classroom link to set up the project. Follow the detailed instructions in the README to complete each activity. This lab brings together everything you've learned in Unit 3: class creation, ArrayLists, and traversal algorithms.