Data Science
I wanted to put our own spin on the Data lab from CollegeBoard for this project. Rather than just following the standard lab, we're going to analyze real-world video game sales data that will give you hands-on experience with classes, interfaces, and data processing in Java.
For this project, we'll be using a comprehensive dataset containing information about video games, including sales figures, critic ratings, user scores, and more. I've sourced this dataset from Kaggle: Video Game Sales and Ratings.
This project will give you practical experience with:
Creating and using custom classes
Working with interfaces
Processing CSV data files
Implementing search and analysis algorithms
Visualizing data patterns
Activity 1: Getting Started
Our project requires two main Java classes:
GameDataRow
- Represents a single game record from our datasetGameDataManager
- Handles loading and analyzing collections of game data
Let's start by looking at how to set up the GameDataRow
class:
Understanding the Class Header and Interfaces
The class header public class GameDataRow implements Comparable<GameDataRow>
has several important components:
public
- This access modifier makes the class accessible from outside its package.class GameDataRow
- Defines a new class named GameDataRow.implements Comparable<GameDataRow>
- Indicates that this class implements the Comparable interface.
The Comparable Interface
The Comparable interface is part of the Java Collections Framework and enables objects of a class to be compared to each other. When a class implements Comparable, it must provide an implementation of the compareTo()
method:
The compareTo()
method must return:
A negative value if this object should be ordered before the other object
Zero if both objects are equal
A positive value if this object should be ordered after the other object
In our implementation, we're sorting games by global sales in descending order (highest sales first).
Other Common Interfaces in Java
While we're only using Comparable in this project, here are other common interfaces you might encounter:
Serializable - Marks a class as capable of being serialized (converted to a byte stream).
Runnable - Used for classes whose instances will be executed by a thread.
ActionListener - Used for handling action events, commonly in GUI applications.
Iterable - Allows an object to be the target of the "for-each loop" statement.
Properties and Encapsulation
Now, let's determine what properties our GameDataRow
class needs. Our CSV file contains 17 columns of data about each game, and we want to represent each data point as instance variables in our class.
Looking at the structure of our CSV file, we need to create properties that match each column:
Encapsulation Review
Remember that encapsulation is one of the four fundamental OOP concepts, along with inheritance, polymorphism, and abstraction. Encapsulation means bundling data and methods that operate on that data within a single unit (the class) and restricting direct access to some of the object's components.
Notice all our instance variables are marked as private
. This is a key principle of encapsulation:
Private instance variables: By making our instance variables private, we prevent outside classes from directly accessing or modifying them.
Public accessor methods: We provide controlled access through getter methods (and sometimes setter methods).
Data validation: Through these methods, we can validate any changes to ensure they meet our requirements.
Good encapsulation ensures that the internal representation of an object can't be seen from outside the object's definition, which creates a "black box" effect.
Constructors
We need constructors to initialize our GameDataRow
objects. A well-designed class should have multiple constructors to provide flexibility in object creation:
Notice I've included three constructors:
A primary constructor with all fields
A simplified constructor with only essential fields
A copy constructor to create a deep copy of an existing object
Accessor Methods (Getters)
Now we need accessor methods (getters) for all our instance variables. This is a perfect opportunity to use AI responsibly. Rather than typing out all these repetitive methods, you can ask Claude or an IDE's chatbot to generate them for you.
For example, you could say:
Please generate all the accessor methods for the GameDataRow class based on the instance variables.
This isn't plagiarism or cutting corners - it's using a tool to handle the tedious, repetitive aspects of coding so you can focus on the more challenging and creative aspects like designing algorithms and implementing the business logic.
Here's a snippet showing what a few of these accessor methods would look like:
Test-Driven Development
Let's talk about Test-Driven Development (TDD) - a software development approach where you write tests before writing the actual implementation code. This practice has been widely adopted by professional developers because it leads to more reliable, maintainable code.
In TDD, the process follows three simple steps, often called "Red-Green-Refactor":
Write a test that fails (Red)
Write the minimal code to make the test pass (Green)
Improve the code without changing its behavior (Refactor)
For our project, I've prepared a complete main method that serves as a test suite. It will systematically test all the required functionality of your GameDataRow
class and tell you exactly what's missing or not working correctly.
Adding GameDataManager and Testing
Now that your GameDataRow
class is complete, let's work with collections of games:
Copy the contents of
GameDataManager.java
file from this link: https://gist.github.com/dadiletta/2310e45953e3bcb8db3cf151c7c75d06Download the
video_games.csv
file and place it in your project directory. The location should match the path in theGameDataManager.loadFromCSV()
method call. https://www.kaggle.com/datasets/thedevastator/video-game-sales-and-ratings?resource=download
Activity 2: Building Methods
Now that we have our class structure and properties defined, it's time to add meaningful functionality to our GameDataRow
class. To complete this class, you'll need to implement several methods that analyze game data in different ways.
I'm providing you with a few utility methods that handle some of the trickier aspects of parsing data from the CSV file, such as handling non-numeric user scores. Let's first look at these provided methods:
I'm giving you these methods because:
The
hasUserReviews()
andgetUserScoreNumeric()
methods handle data conversion and error checking that might be challengingThe
toString()
method is lengthy but straightforward, and implementing it wouldn't teach you much
Now, let's focus on the methods you need to implement:
Method 1: getAverageScore()
Return Type: float
This method calculates the average between critic and user scores. However, there's a challenge: critic scores are on a scale of 0-100, while user scores are on a scale of 0-10.
Requirements:
Return a float representing the normalized average score on a scale of 0-10
Normalize the critic score to a 0-10 scale by dividing by 10
Handle cases where one or both scores are missing/invalid
Return the appropriate average or individual score based on what's available
Return 0.0f if no scores are available
Key Considerations:
What constitutes an "available" score? (Hint: check for positive values)
How do you handle the case where only one score type is available?
Hints:
You can paste this code into your method as a starting guide:
Method 2: getSalesPercentages()
Return Type: float[]
This method calculates what percentage of the global sales comes from each region (North America, Europe, Japan, and Other).
Requirements:
Return an array of float values representing percentages for each region
The array should have 4 elements, one for each region
Use the defined constants (NA_REGION, EU_REGION, etc.) for array indices
Handle the case where globalSales is zero to avoid division by zero
Key Considerations:
The formula for calculating percentage is: (regional sales / global sales) * 100
If globalSales is zero, return an array of zeros to avoid division by zero
Method 3: hasSalesData()
Return Type: boolean
This method determines whether the game has any sales data available.
Requirements:
Return true if any sales data (regional or global) is available
Return false if all sales fields are zero
Key Considerations:
Check all sales fields (naSales, euSales, jpSales, otherSales, globalSales)
Use logical OR operators to efficiently check multiple conditions
Method 4: hasCriticReviews()
Return Type: boolean
This method determines whether the game has critic review data available.
Requirements:
Return true if the game has both a critic score and a count of critics
Return false if either piece of information is missing or zero
Key Considerations:
Both criticScore and criticCount should be positive values
Method 5: getDominantRegion()
Return Type: String
This method determines which region has the highest sales for this game.
Requirements:
Compare sales across all regions and find the highest
Return a string representing that region ("North America", "Europe", "Japan", or "Other")
Handle ties by choosing the first region encountered with the maximum value
Key Considerations:
You'll need to track the maximum sales value and the corresponding region
You can use a switch statement to convert from region index to region name (see example below)
Method 6: getRegionalSales()
Return Type: float
This method returns the sales figures for a specific region.
Requirements:
Accept an int parameter representing the region (use the class constants)
Return the sales figure (as a float) for the specified region
Handle all four possible regions: NA_REGION, EU_REGION, JP_REGION, OTHER_REGION
Key Considerations:
Use a switch statement to select the appropriate sales figure based on the region parameter
Switch Statement Example
A switch statement allows you to select one of many code blocks to be executed. Here's how you might implement the getRegionalSales
method using a switch statement:
The switch statement evaluates the regionIndex
parameter and executes the code block that matches the case. The default
case handles any values that don't match any specified cases.
Method 7: Implementation of Comparable
Return Type: int
You need to implement the compareTo
method from the Comparable
interface.
Requirements:
Override the compareTo method to compare games based on global sales
Higher sales should sort before lower sales (descending order)
Return negative value if this game has higher sales than the other game
Return positive value if this game has lower sales than the other game
Return zero if both games have equal sales
Key Considerations:
Remember that returning a negative value means "this" comes before "other" in the sorted order
Activity 3: Results
Let's add one final method to our GameDataRow
class before moving on to working with collections of game data:
Understanding Static Variables and Methods
This method introduces an important concept in Java: static members. Unlike instance variables and methods that belong to specific objects, static members belong to the class itself.
Here's what you need to know:
Static variables (like our
gamesAnalyzed
counter) are shared across all instances of a class. There's only one copy in memory, regardless of how many objects you create.Static methods operate at the class level rather than the instance level. They can be called using the class name without creating an object:
GameDataRow.getGamesAnalyzed()
.Cannot use
this
in static context: Since static methods don't operate on a specific instance, you cannot use thethis
keyword inside them. There's no "current object" to reference! If you try to usethis
in a static method, you'll get a compiler error.Access limitations: Static methods can only directly access other static members (variables or methods). They cannot directly access instance variables or methods without referencing a specific object.
In our case, we're using the static variable gamesAnalyzed
to track how many game objects have been created throughout our program's execution. Each constructor increments this counter, and our static method lets us retrieve the current count.
Creating Your Own Analysis
Now it's time to put your skills to work! Modify the GameDataManager.java
file to add your own custom analysis of the video game data.
Add at least one new method to the GameDataManager
class that performs an interesting analysis. Here are some ideas:
Year Analysis: Which year had the highest average game quality? Most releases?
Developer Success Rate: For developers with multiple games, which ones have the highest average scores?
Cross-Platform Analysis: Which games appeared on the most platforms? Do multi-platform games sell better?
Sales vs. Ratings Correlation: Do higher-rated games generally sell better? Is there a difference between critic and user ratings?
Genre Evolution: How have the popularity of different genres changed over time?
Once you've added your analysis method(s), modify the main
method to call your new method and display the results.
Example Custom Analysis Method
Here's what your custom analysis might look like (don't just copy this - create your own!):
In the main
method, you would add a call to your new analysis:
Requirements for Your Custom Analysis
Your analysis should:
Process data from multiple games
Use loops and conditionals appropriately
Output meaningful results
Include comments explaining your approach
Handle edge cases (like missing data)
Activity 4: Recursion
In this final activity, you'll explore recursion - a powerful programming technique where a method calls itself to solve a problem by breaking it down into smaller instances of the same problem.
Understanding Recursion
Recursion is a fundamental concept in computer science that provides elegant solutions to certain types of problems. A recursive method has two essential components:
Base case(s): The simplest scenario(s) that can be solved directly without further recursion
Recursive case(s): Where the method calls itself with a simpler version of the problem
Let's examine a simple example - calculating the factorial of a number:
When calculating factorial(5)
:
The method first checks if n equals 0 or 1 (our base case) - it doesn't, so we move to the recursive case
It returns
5 * factorial(4)
Now it calculates
factorial(4)
, which is4 * factorial(3)
This continues until we reach
factorial(1)
, which returns 1 (base case)Then the recursion "unwinds":
1 * 2 * 3 * 4 * 5 = 120
Why Use Recursion?
Recursion offers several advantages:
Elegance: Recursive solutions can be concise and elegant for problems that would be complex to solve iteratively
Divide and conquer: Recursion naturally implements divide-and-conquer strategies, breaking down large data sets into smaller pieces
Tree traversal: For hierarchical data structures like trees, recursion provides a natural approach
Problem decomposition: Complex problems become simpler when broken into smaller, similar sub-problems
Implementing Recursion in Game Data Analysis
Looking at our GameDataManager class, several methods could benefit from a recursive approach. For example, finding games within a specific criteria could be implemented using a binary search algorithm rather than the current linear search.
Here's an example of how we could improve the findGamesByTitle
method using a recursive binary search (assuming the games list is sorted by title):
The recursive binary search has a time complexity of O(log n), which is much more efficient than the O(n) time complexity of linear search, especially with large datasets.
Another example would be implementing a recursive merge sort:
Your Task
Choose one method from your GameDataRow or GameDataManager class that could benefit from a recursive approach
Reimplement it using recursion
Test your implementation and compare it to the original approach
Document your findings: Was the recursive approach more elegant? More efficient? What challenges did you face?
Alternatively, you can implement a new recursive method that performs a complex analysis that would be difficult to do iteratively. Here are some ideas:
Find the most profitable franchise (games with similar names) recursively
Implement a recursive algorithm to categorize games into hierarchical genres (e.g., "RPG > Action RPG")
Create a recursive method to find "hidden gems" - games with high ratings but low sales
Example: Recursive Method for Maximum Critic Score
Here's an example of how you might implement a recursive method to find the game with the highest critic score within a list of games:
This method uses a divide-and-conquer approach, recursively finding the maximum in each half of the list and then comparing the results. While this specific example might not be more efficient than a simple loop, it demonstrates the recursive pattern that can be applied to more complex problems.
Remember, not all problems are best solved with recursion. Some considerations:
Recursion can lead to stack overflow errors with very large inputs
Recursive solutions sometimes have higher memory usage
Some problems have simpler iterative solutions
Last updated
Was this helpful?