Google's new Gemini 1.5 Pro model is mind-blowing.
It can understand a 44-minute movie and analyze a PDF document over 400 pages long.
Here's why it's a game changer for developers and beyond:
Complex reasoning about vast amounts of information
With 1.5 Pro, you can analyze, classify and summarize large amounts of content within a single prompt.
For example, if you give it the 402-page transcript of the Apollo 11 mission, 1.5 Pro can reason about events, conversations and details found across the document.
Woww!
Better understanding and reasoning across modalities
Gemini 1.5 Pro demonstrates exceptional capabilities in analyzing and interpreting complex content across various formats, such as silent films.
It can discern intricate plot elements and subtle details in a 44-minute movie, showcasing its advanced understanding and reasoning abilities.
The Microsoft Copilot app is now available on both Android and iOS.
As claimed, it gives you GPT-4 features such as DALL-E 3 and Vision for free.
Here are a few of my experiments:
A Thread 🧵
I haven't tested it on the mobile app, since I'm more comfortable on desktop, which gives me better output and is easier to experiment with.
1. Vision Feature:
It was my first time using its vision feature today, and I was not satisfied with the output.
The image I uploaded was not difficult to understand.
Can the newly updated Google Gemini Pro challenge ChatGPT-4?
It's free but with limitations.
A comparison of Gemini Pro with ChatGPT-4:
[ The 2nd one will surprise you ]
Use-case 1:
I decided to develop a simple web application that integrates with a RESTful API.
Several key points to remember when developing this:
- Understanding the API
- Development Best Practices
- Security Considerations
- Performance Optimization
- UX
- Code Quality and Maintenance
- Testing and Debugging
Prompt:
Develop a simple web application that integrates with a RESTful API (e.g., a weather API or a social media API). Display the retrieved data in an organized and visually appealing manner using HTML and CSS
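To give a sense of what a reasonable answer to this prompt might look like, here is a minimal sketch of my own, not the output of either model. It assumes a hypothetical weather endpoint (`API_URL`) and a hypothetical response shape with `city`, `temp_c` and `summary` fields; a real API will need its own URL, auth and field names. It touches two of the key points above: security (every API value is escaped before it reaches the HTML) and UX (a simple card layout with CSS).

```python
import html
import json
from typing import List
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical endpoint: replace with a real weather API and its auth scheme.
API_URL = "https://api.example.com/weather?city={city}"


def fetch_weather(city: str, timeout: float = 5.0) -> dict:
    """Fetch JSON for one city. The response shape is an assumption."""
    with urlopen(API_URL.format(city=quote(city)), timeout=timeout) as resp:
        return json.load(resp)


def render_card(data: dict) -> str:
    """Render one weather record as an HTML card, escaping all API values."""
    city = html.escape(str(data.get("city", "Unknown")))
    temp = html.escape(str(data.get("temp_c", "n/a")))
    summary = html.escape(str(data.get("summary", "")))
    return (
        '<div class="card">'
        f"<h2>{city}</h2>"
        f'<p class="temp">{temp} &deg;C</p>'
        f"<p>{summary}</p>"
        "</div>"
    )


def render_page(records: List[dict]) -> str:
    """Wrap the cards in a small HTML page with inline CSS."""
    cards = "\n".join(render_card(r) for r in records)
    return f"""<!DOCTYPE html>
<html>
<head>
<style>
  body {{ font-family: sans-serif; display: flex; gap: 1rem; }}
  .card {{ border: 1px solid #ccc; border-radius: 8px; padding: 1rem; }}
  .temp {{ font-size: 2rem; margin: 0; }}
</style>
</head>
<body>
{cards}
</body>
</html>"""
```

Calling `render_page([{"city": "Lahore", "temp_c": 31, "summary": "Sunny"}])` returns a complete page you can save to a file and open in a browser; `fetch_weather` is only useful once `API_URL` points at a real service.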
Use-case 2:
In this use-case, Bard failed to recognize it, but ChatGPT gave me a surprising response that I didn't know about at first and was later able to confirm.
Next, I decided to upload a picture of "Quaid e Azam", the founding father of Pakistan.