Evaluating GPT-4o And Gemini 1.5-Pro: Which AI Reigns Supreme?

May 29, 2024 No Comments

OpenAI recently unveiled its flagship GPT-4o model at the Update event, offering it for free to everyone. This model is multimodal, capable of accepting both text and image inputs and producing text outputs, enhancing its versatility and application. The announcement marked a significant milestone in the accessibility of advanced AI technology.

In a rapid follow-up, Google introduced the Gemini 1.5 Pro model to consumers via Gemini Advanced at the Google I/O 2024 event. With both of these state-of-the-art models now available to the public, it’s an ideal moment to evaluate and compare their performances.

Let’s explore this comparison to evaluate how GPT-4o stands against Gemini 1.5 Pro and identify which model delivers superior results.

Why do we perform comparisons?

We conducted a thorough comparison to evaluate the performance of the GPT-4o and the Gemini 1.5 Pro for different inputs. Our main focus was to assess the quality of responses. By delving deeply into this aspect, we aimed to determine which model excels and ultimately identify the superior option. This detailed comparison helps us understand the strengths and weaknesses of each model, guiding users to make an informed decision based on their specific needs.

Environment

In our comparative study, we took measures to maintain fairness by using the same input images and questions for both the GPT-4o and Gemini 1.5 Pro models. We used different methods: API calls for GPT-4o and a UI for Gemini Pro. These methods enabled us to assess accuracy, response time, and user experience. Our objective is to offer stakeholders practical insights into the performance of both models.

Comparing GPT 4o vs Gemini 1.5 Pro

We used the same prompt in both models to avoid variation due to the prompt structure. We evaluated and measured the performance of the model on different types of data, including textual, tabular, and graphical data such as line graphs and column charts. Additionally, we asked logical and general questions to assess the model’s performance with these data types. We used financial data in tabular and textual form, line graph data in revenue vs. expense form, and column graph data related to the work of two different people.

Prompt structure

				
					You are an image entity extraction specialist. Your task is to analyze the given image and extract the answer to the following question.
1. What was the percentage increase in revenue for Fiscal Year 2022 compared to Fiscal Year 2021?
2. How much did the cost of revenue increase in Fiscal Year 2022 compared to Fiscal Year 2021?
3. What is the gross margin in 2022?
4. What is the Net income?
5. What were the main drivers behind the increase in Intelligent Cloud revenue?

Input (Image on which we performed QnA)

Results

Question: What was the percentage increase in revenue for Fiscal Year 2022 compared to Fiscal Year 2021? GPT-4o response: The percentage increase in revenue for Fiscal Year 2022 compared to Fiscal Year 2021 is 18%. Gemini 1.5 pro response: 18%

Question: What were the main drivers behind the increase in Intelligent Cloud revenue? GPT-4o response: The main drivers behind the increase in Intelligent Cloud revenue were driven by Azure and other cloud services Gemini 1.5 pro response: Azure and other cloud services

Question: How much did the cost of revenue increase in Fiscal Year 2022 compared to Fiscal Year 2021? GPT-4o response: The cost of revenue increased by $10.4 billion, or 20%, in Fiscal Year 2022 compared to Fiscal Year 2021. Gemini 1.5 pro response: $10.4 billion

Question: What is the gross margin in 2022? GPT-4o response: The gross margin in 2022 is $135,620 million. Gemini 1.5 pro response: $135,620 million

Question: What is the Net income? GPT-4o response: The Net income in 2022 is $72,738 million. Gemini 1.5 pro response: 72,738 (in millions)

Observation

Both Gemini 1.5 Pro and GPT-4o consistently and accurately provide answers to all questions. Their responses are aligned and correct based on the given data, indicating that both models are reliable for extracting financial information. Gemini 1.5 Pro excels in providing precise answers, while GPT-4o tends to offer more detailed responses.

If you need detailed answers from tabular data, you should prefer GPT-4o. For simple, concise answers, you should use Gemini 1.5 Pro.

Prompt structure

				
					You are an image entity extraction specialist. Your task is to analyze the given image and extract the answer to the following question. You need to follow below guidelines:

1.Ensure your answer comprehensively addresses the question posed. Cover all relevant aspects visible in the image.
2.Provide a clear and concise response that is easy to understand. Avoid ambiguity in your answer.
3.Accuracy is paramount. Identify and describe elements within the image with precision, using appropriate terminology.
4.Include all mandatory details necessary for a complete understanding of the image. Don't overlook any significant features or components.
5.Focus on pertinent information. Exclude extraneous details that do not directly relate to the question.
6.Organize your answer logically, presenting information in a coherent manner. Use paragraphs or bullet points if necessary for clarity.
7.Use professional language suitable for conveying technical information. Avoid jargon unless necessary, and ensure your answer is accessible to the intended audience.
8.Before submitting your answer, proofread carefully to correct any errors in grammar, spelling, or punctuation. A polished response reflects positively on your expertise and professionalism.

Questions:[
1.Explain the expense of 2005.
2.What is the ratio of revenue vs expenses in march?
3.What is the most profitable month?
4.Which month did the company suffer a huge loss?
5.At what point in the year is there no profit and no loss? ]

Input (Image on which we performed QnA)

Results

Question: Explain the expense of 2005. GPT-4o response: The expenses throughout 2005 are represented by the green line on the graph. The expenses start at around 20000 in January and then slightly increase throughout the year. After January, the expenses show minor fluctuations but largely stay around the 20000 to 25000 range. Notable peaks are observed in March, with expenses around 27000, another smaller peak in July, and a slight increase towards December, ending at around 30000. Gemini 1.5 pro response: Analysis of 2005 Expenses: The provided line graph illustrates the monthly expenses incurred throughout 2005, depicted by the green line. Here’s a breakdown of the expense trends: Starting Point: The year began with expenses around the 20,000 mark in January. Early Year Increase: A notable rise occurred, reaching approximately 30,000 by February and maintaining a similar level through April. Mid-Year Fluctuation: May saw a dip in expenses, falling back to around 25,000. However, June and July experienced a slight rebound, with expenses hovering around the 28,000 to 30,000 range. Late Year Trends: August saw a dip below 25,000 before a rise in September exceeding 30,000. October and November maintained a similar level to the beginning of the year, around 25,000. Year End: December concludes the year with a significant increase in expenses, reaching approximately 40,000. Overall Observations: The expense trend throughout 2005 demonstrates several fluctuations. While there were periods of increase and decrease, the expenses remained within a range of approximately 20,000 to 40,000 for the majority of the year. December stands out as having the highest expense, potentially due to factors such as holiday spending or year-end financial activities. Comparing the expense trend with the revenue trend (red line) could provide further insights into the financial health and profitability of the entity during 2005.

Question: What is the ratio of revenue vs expenses in march? GPT-4o response: In March, the revenue (red line) is approximately 60000, and the expense (green line) is approximately 27000. – The ratio of revenue to expenses in March is calculated as: \[ \text{Ratio} = \frac{\text{Revenue}}{\text{Expenses}} = \frac{60000}{27000} \approx 2.22 \] Therefore, the ratio is approximately 2.22:1. Gemini 1.5 pro response: Analyzing Revenue vs. Expenses in March: In March, the revenue is significantly higher than the expenses. While the exact values are not directly readable from the chart, we can estimate the ratio. Revenue: The red line representing revenue peaks around the 60,000 mark in March. Expenses: The green line representing expenses appears to be around the 25,000 mark in March. Therefore, the ratio of revenue to expenses in March is roughly 60,000:25,000, which simplifies to 12:5. This indicates that the revenue in March was more than double the expenses, demonstrating a profitable month.

Question: What is the most profitable month? GPT-4o response: – The most profitable month is the one where the difference between revenue and expenses is the highest. Based on the graph, March shows the highest revenue at around 60000 and expenses at around 27000. Hence, \[ \text{Profit in March} = 60000 – 27000 = 33000 \] Therefore, March is the most profitable month with a profit of 33000. Gemini 1.5 pro response: Analysis of Revenue vs. Expense – 2005 Based on the provided chart, we can analyze the profitability of each month by comparing the revenue (red line) and expense (green line). The greater the gap between the revenue and expense lines, the higher the profit. Observations: March: Shows a significant peak in revenue, reaching above 60,000, while the expense remains relatively low, around 25,000. This suggests March is a strong candidate for the most profitable month. December: Although revenue reaches its highest point in December, the expenses also rise considerably, narrowing the gap between the two lines. More information is needed to determine if December surpasses March in profitability. May & June: These months show a relatively low difference between revenue and expense, indicating lower profitability compared to March. January & February: Revenue and expenses are close together, suggesting minimal profit during these months. July to November: Fluctuations occur, but none seem to reach the level of profitability observed in March. Conclusion While March appears to be the most profitable month due to the large gap between revenue and expense, December requires further analysis as the final values are not visible on the chart. Comparing the exact values for March and December would definitively determine the most profitable month.

Question: Which month did the company suffer a huge loss? GPT-4o response: – A huge loss is observed when expenses exceed revenue significantly. In June, the revenue (red line) is approximately at 20000, and expenses are at around 25000. \[ \text{Loss in June} = 20000 – 25000 = -5000 \] Therefore, June is the month where the company suffered a huge loss, amounting to a loss of 5000. Gemini 1.5 pro response: Analyzing the Revenue vs. Expense Chart for Loss The provided chart showcases the company’s revenue and expenses throughout the year 2005. To identify the month where the company suffered a significant loss, we need to analyze the gap between the revenue (red line) and expense (green line). A substantial gap with expenses exceeding revenue indicates a loss. Observations: January to February: The company starts with a relatively small gap between revenue and expenses. March: A noticeable increase in revenue creates a larger gap, indicating a profitable month. April to May: The gap decreases as revenue falls and expenses rise slightly. June: Both revenue and expenses dip, with expenses slightly exceeding revenue, hinting at a minor loss. July: A significant drop in revenue occurs while expenses rise, marking this as a potential month of considerable loss. August: Revenue recovers slightly, but expenses also increase, maintaining a loss, though smaller than July. September: Expenses surpass revenue again, indicating another month of loss. October to November: The gap narrows as revenue increases and expenses remain relatively stable. December: A dramatic spike in revenue occurs, far exceeding expenses and signifying a highly profitable month. Conclusion: Based on the analysis, July appears to be the month where the company suffered the most substantial loss. This is evident from the significant drop in revenue coupled with a rise in expenses, creating the largest gap between the two lines throughout the year. While other months like June, August, and September also experienced losses, the gap during July is the most pronounced, suggesting the most significant financial setback

Question: At what point in the year is there no profit and no loss? GPT-4o response: – No profit and no loss occur when the revenue equals the expenses. This situation is observed in October, where both revenue and expenses intersect at approximately 25000. ### Summary – Expenses of 2005: Mostly stable around the 20000 to 25000 range with slight fluctuations, notable peaks in March, July, and December. – Ratio of Revenue to Expenses in March: Approximately 2.22:1. – Most Profitable Month: March, with a profit of 33000. – Month of Huge Loss: June, with a loss of 5000. – No Profit and No Loss: October, where revenue and expenses match at around 25000. Gemini 1.5 pro response: Break-Even Points in 2005: June and October Based on the provided Revenue vs. Expense graph for 2005, there are two points where the company experiences neither profit nor loss. These points occur when the Revenue and Expense lines intersect, indicating equal values for both: June: During this month, both revenue and expenses seem to fall around the 25,000 mark. This is the first instance in the year where the company breaks even. October: Later in the year, another break-even point occurs, again with both revenue and expenses appearing to be near the 25,000 level. It’s important to note that without specific data points, we can only estimate the values based on the visual representation of the graph. However, the intersections clearly highlight the months of June and October as the periods where the company’s revenue and expenses balanced out, resulting in neither profit nor loss.

Observation

Gemini 1.5 Pro offers detailed explanations and excels in detecting numerical data from graphs, while GPT-4o shows comparable performance in logical calculations. However, for graphical data interpretation, GPT-4o appears to be more accurate. The choice between the two models depends on specific task requirements, considering factors such as detailed explanations, numerical accuracy, and graphical data interpretation.

Prompt structure

				
					You are an image entity extraction specialist. Your task is to analyze the given image and extract the answer to the following question. You need to follow below guidelines:

1.Ensure your answer comprehensively addresses the question posed. Cover all relevant aspects visible in the image.
2.Provide a clear and concise response that is easy to understand. Avoid ambiguity in your answer.
3.Accuracy is paramount. Identify and describe elements within the image with precision, using appropriate terminology.
4.Include all mandatory details necessary for a complete understanding of the image. Don't overlook any significant features or components.
5.Focus on pertinent information. Exclude extraneous details that do not directly relate to the question.
6.Organize your answer logically, presenting information in a coherent manner. Use paragraphs or bullet points if necessary for clarity.
7.Use professional language suitable for conveying technical information. Avoid jargon unless necessary, and ensure your answer is accessible to the intended audience.
8.Before submitting your answer, proofread carefully to correct any errors in grammar, spelling, or punctuation. A polished response reflects positively on your expertise and professionalism.


Questions:[ 
1. Compare cab driver performance.
2.Which Driver has the highest number of passengers at 11 AM?
3.Which Driver has the highest number of passengers during the time interval 8 am-10 am?
]

Input (Image on which we performed QnA)

Results

Question: Compare cab driver performance. GPT-4o response: – Cab Driver 1: – 6am-8am: 4 passengers – 8am-10am: 5 passengers – 10am-12pm: 11 passengers (highest number of passengers in this period) – 1pm-3pm: 6 passengers – 3pm-5pm: 4 passengers – Cab Driver 2: – 6am-8am: 3 passengers – 8am-10am: 6 passengers (highest number of passengers in this period) – 10am-12pm: 5 passengers – 1pm-3pm: 4 passengers – 3pm-5pm: 3 passengers Gemini 1.5 pro response: Comparing Cab Driver Performance: Both drivers exhibit similar trends in passenger volume throughout the day, with peak hours during the 10am-12pm timeframe and lower activity during the 6am-8am and 3pm-5pm periods. However, there are notable differences in their overall performance: Passenger Volume: Driver 1 consistently transported a higher number of passengers across all time slots compared to Driver 2. This is evident in the taller bars on Driver 1’s chart. Peak Performance: Driver 1 experienced a significantly higher peak, carrying more than 10 passengers during the 10am-12pm slot, while Driver 2’s peak was just below 6 passengers in the same period. Consistency: Both drivers demonstrate a relatively consistent performance across the day, with a clear drop in the early morning and late afternoon. However, Driver 1 maintains a higher baseline number of passengers even during these off-peak hours. Possible Explanations for the Difference: Driver Experience/Reputation: Driver 1 might be more experienced or have a better reputation, leading to higher demand and more assigned rides. Vehicle Type/Capacity: The type of vehicle each driver uses could influence passenger numbers. Driver 1 might have a larger vehicle, allowing for more passengers per trip. Route/Location: The areas where each driver operates could have different passenger densities or demand patterns. Further Analysis: To gain a more comprehensive understanding of the drivers’ performance, additional factors should be considered: Earnings: Comparing the total earnings for each driver would provide insights into the profitability of their respective routes and schedules. Customer Reviews: Analyzing customer reviews could reveal qualitative differences in service quality, driver behavior, and overall passenger satisfaction. Trip distances/durations: Understanding the types of trips each driver typically undertakes (short vs. long distances) could explain variations in passenger volume and earnings. By considering these additional factors, a more complete picture of each driver’s performance and potential areas for improvement can be developed.

Question: Which Driver has the highest number of passengers at 11 AM? GPT-4o response: Cab Driver 1 has the highest number of passengers at 11 AM, as the interval 10am-12pm includes this time, showing 11 passengers. Gemini 1.5 pro response: Analysis of Cab Driver Passenger Loads at 11 AM To determine which driver had more passengers at 11 AM, we need to focus on the “10am-12pm” time slot in both graphs. Cab Driver 1: This driver had a significantly higher number of passengers during the 10am-12pm window, with the graph reaching its peak at over 10 passengers. Cab Driver 2: In contrast, Cab Driver 2 had around 4 passengers during the same 10am-12pm period. Therefore, Cab Driver 1 clearly had a higher number of passengers at 11 AM compared to Cab Driver 2.

Question: Which Driver has the highest number of passengers during the time interval 8 am-10 am? GPT-4o response: Cab Driver 2 has the highest number of passengers during the time interval 8am-10am, with 6 passengers. Gemini 1.5 pro response: Analysis of Cab Driver Passenger Data Based on the provided bar graphs illustrating the number of passengers for Cab Driver 1 and Cab Driver 2 across different time intervals, we can determine the driver with the highest passenger count during the 8 am-10 am slot. Observations: Cab Driver 1: During the 8 am-10 am interval, Cab Driver 1 transported 6 passengers. Cab Driver 2: In the same 8 am-10 am time frame, Cab Driver 2 had a significantly higher number of passengers, totaling 6 passengers. Conclusion: Both Cab Driver 1 and Cab Driver 2 share the highest number of passengers during the 8 am-10 am interval, each having transported 6 passengers.

Observation

Both models have their strengths. Gemini 1.5 Pro is quite good for detailed analysis and comprehensive insights, making it superior for tasks that require depth and context. On the other hand, GPT-4o performs well for tasks requiring quick and precise answers without additional context. It may be preferred due to its accuracy and clarity in straightforward questions.

Conclusion

To determine superiority, the choice between Gemini 1.5 Pro and GPT-4o depends on the balance between answer length and quality. If the priority is comprehensive insights and detailed analyses, Gemini 1.5 Pro emerges as the preferred option due to its detailed answers. Conversely, for tasks that demand precise financial data and specific answers, GPT-4o proves to be more suitable, despite potentially shorter responses.

Therefore, while both models excel in providing detailed answers, the superior model depends on the specific needs and objectives of the user or organization, balancing between answer length and quality. Ultimately, the decision should be made based on which model best aligns with the requirements of the task at hand.

If you’re interested in integrating GPT-4o or Gemini 1.5 Pro into your application, please contact us or email us at letstalk@pragnakalp.com.

Need Help Choosing the Right AI Model?

Confused between GPT-4o and Gemini 1.5 Pro? Share your project details, and we’ll help you pick the perfect AI solution to match your needs.

Why do we perform comparisons?

Environment

Comparing GPT 4o vs Gemini 1.5 Pro

Prompt structure

Input (Image on which we performed QnA)

Results

Observation

Prompt structure

Input (Image on which we performed QnA)

Results

Observation

Prompt structure

Input (Image on which we performed QnA)

Results

Observation

Conclusion

Need Help Choosing the Right AI Model?

Write a comment Cancel reply

Search

Our Services

Case Studies

Recent Posts

Categories

Pragnakalp Techlabs: Your trusted partner in Python, AI, NLP, Generative AI, ML, and Automation. Our skilled experts have successfully delivered robust solutions to satisfied clients, driving innovation and success.

Hire Dedicated Developers

Services

Contact Us