Final Project: Forecasting Average Price of Avocados in 2020

Code repository: ToadHanks
Data set: data.world

Project description: In my final project, I am forecasting the relationship between the volume and average price of both conventional and organic avocados. The rising price of the Chipotle guacamole add-on has me concerned that an affordable dish like the 'Chipotle Bowl' will eventually go up to around $8-$10. Basic economics says there are several reasons why a commodity gets a price hike (in my case, the commodity is guacamole). Here are some of those reasons, in no particular order:


1.     The demand for the commodity is high.
2.     The supply of the commodity is low.
3.     The commodity is going through a price experimentation phase.
4.     There are trade conflicts, if the commodity is imported.
5.     The materials that make up the commodity have themselves gone up in price.

So, keeping these reasons in mind, I am looking at the consumption of avocados: where they are consumed most, which type is consumed most, and lastly, how much supply there is for both types. Factoring in all of these, I am forecasting the price of avocados in 2020.

Problem description: Here are my general hypotheses:


H0: The volume and average price of avocados are inversely related, meaning that as the volume in the market increases, the average price for both types of avocados goes down across the USA.


H1: The volume and average price of avocados are directly related, meaning that as the volume in the market increases, the average price for both types of avocados either holds steady or rises across the USA.
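
One way to put a number on the direction of this relationship is the Pearson correlation coefficient between volume and average price. The snippet below is only a minimal Python sketch with made-up numbers; it is not the R code or the data from this project, and the function name `pearson_r` is just an illustrative choice. A negative value would point toward H0 and a positive value toward H1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative numbers, NOT values from the avocado data set.
volume = [1.0, 1.5, 2.0, 2.5, 3.0]           # hypothetical weekly volume
avg_price = [1.60, 1.45, 1.30, 1.20, 1.05]   # hypothetical average price in dollars

r = pearson_r(volume, avg_price)
print(f"r = {r:.3f}")  # the sign of r tells the direction of the relationship
```

Correlation only describes how the two series move together; it does not say that volume causes the price change.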


Here are my preliminary results after observing the core data: 





Observations: It is known that the USA imports a lot of avocados from various Latin American countries, particularly Mexico. Weather, in particular, makes a consistent volume of avocados possible; however, the price is less consistent than the volume. A lot of the price peaks have to do with tariffs and with which type of avocado is eaten most. Late 2015 to early 2018 shows a price jump due to tariffs and high demand for organic avocados. However, the post-2017 price dip is obvious because a new trade deal was reached between the USA, Mexico, and Canada.

Time-series models: Volume vs. Avg. Price from 2015 to early 2018 for both avocado labels:






Observations: The time series shows the result I expected: as the volume increases, the price goes down. Conventional Hass avocados maintained their prices, but there is some evidence that when a lot of volume was traded, the prices rose a little. Organic Hass avocados maintain their volume-to-price ratio fairly well.

Forecasting: Using the time-series and ARIMA functions, here are my forecasts for average price and volume for both types of fruit:






Observations: The forecast model shows that both conventional and organic label avocados will maintain the current trend. The price spikes and falls depend on the amount of volume for both labels. However, it is interesting to see that organic avocado prices will oscillate more relative to their volume. I guess that depends on demand and on the trend toward being health-conscious.

Related Work
When Dr. Friedman was teaching us time series in Module 12, I found the statistics in that module's homework interesting. In his book and lecture slides, he mentioned forecasting methods too. I then looked into various forecasting models and ultimately settled on the one called ARIMA (autoregressive integrated moving average). ARIMA is easy to work with when a data set contains averages recorded over many time periods. I also made good use of the ggplot2, dplyr, forecast, and base packages in this project. I used many built-in functions from these packages to visualize and clean my data.
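
To give a feel for what the "autoregressive" and "integrated" parts of the name mean, here is a minimal Python sketch of an ARIMA(1,1,0)-style forecast. It is a deliberately stripped-down illustration under stated assumptions (one lag, one difference, no drift term, no moving-average term, a plain least-squares fit), not the model or the R code used in this project. The function names `fit_ar1` and `forecast_arima_110` are hypothetical.

```python
def fit_ar1(diffs):
    """Least-squares slope phi for diffs[t] ~ phi * diffs[t-1] (no intercept)."""
    numerator = sum(cur * prev for cur, prev in zip(diffs[1:], diffs[:-1]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    # If every lagged change is zero there is nothing to fit; assume phi = 0.
    return numerator / denominator if denominator else 0.0


def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead with a simplified ARIMA(1,1,0)-style model."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    # "Integrated" part: work with period-to-period changes, not raw values.
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    # "Autoregressive" part: the next change is phi times the previous change.
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # predicted next change
        level = level + last_diff     # undo the differencing
        forecasts.append(level)
    return forecasts


# Made-up illustrative prices, NOT values from the avocado data set.
prices = [1.10, 1.20, 1.25, 1.35, 1.40, 1.50]
print(forecast_arima_110(prices, 3))
```

A full ARIMA routine estimates more terms and chooses the orders more carefully, so this sketch should only be read as the core idea.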

In the slides for the time-series module, Dr. Friedman gave a couple of examples involving moving averages. Moving averages are mainly useful in sales vs. supply economics. My ARIMA model builds on the idea of moving averages. Like other moving-average methods, it gives an overall impression of how a variable moves over time. A moving average is a series of arithmetic means computed over successive periods. Note that I did use a period length (in this case, years from the time series) for computing the means.
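
The "series of arithmetic means over time" idea can be sketched in a few lines of Python. This is only an illustration with made-up numbers, not code or data from this project, and the name `moving_average` is an arbitrary choice. It shows the plain smoothing kind of moving average described above; the moving-average term inside ARIMA is defined on past forecast errors rather than on the raw values, so this is not what an ARIMA fit does internally.

```python
def moving_average(values, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# Made-up illustrative prices, NOT values from the avocado data set.
prices = [1.20, 1.40, 1.10, 1.50, 1.30, 1.60]
smoothed = moving_average(prices, 3)
print([round(value, 2) for value in smoothed])
```

A longer window gives a smoother line but reacts more slowly to real changes in the series.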

Solution: Based on the ARIMA (moving averages) plots, the trend is clear. Again, the volume and prices are negatively correlated: if the volume goes up, the price stalls or dips. Of course, what the data set does and does not contain makes a difference in this analysis. Factors like the season, festivities, and short demand vs. high volume are all missing. Nonetheless, the plots estimate fairly closely. For instance, the price of a medium Hass avocado at my local Walmart is $2.08, and if you examine the graph "Forecast: Conventional Label Average Price Towards Yr. 2020" near the end of 2019, you will see the average is above $2.00.

Anyhow, I retain H0 (the null hypothesis): the volume and average price of avocados are inversely related, meaning that as the volume in the market increases, the average price for both types of avocados goes down across the USA.


Thanks for reading my project. Let me know here in the comments if you have any questions. Happy Thanksgiving to you and your family, and in case I don't see ya', happy holidays/ Merry Christmas, Happy New Year and good luck next semester :)


~Mihir













