Google Gemini Integration #67

SentryCoderDev · 2024-01-01T20:36:53Z

SentryCoderDev
Jan 1, 2024

Hello everyone,

Today, I'd like to start a discussion on integrating the "Modular Biped" software with Google Gemini, a multimodal AI model from Google that can work with text, images, video, audio, and code.

Google Gemini Integration:

Google Gemini, being a multi-faceted AI model, offers potential advantages when integrated with our software. These include advanced text analysis, language translation, and the ability to solve complex problems. However, it's crucial to evaluate the potential drawbacks and issues associated with this integration.

Advantages:

Notable feature: Operation without explicit prompts
Advanced text analysis and comprehension
Language translation capabilities
Ability to solve complex problems

Disadvantages:

High computational power requirements (Consideration of alternative boards like Jetson instead of Raspberry Pi)
Data security concerns (Offline versions available as a solution)

Solutions Enabled by Integration:

Smarter and more understanding text-based interactions
Coding and translation capabilities in various languages
Faster and more effective solutions to problems

Limitations and Performance:

Providing information about specific limitations and performance features of both the software and Google Gemini is essential. This can assist users in better understanding the feasibility of the integration.

References:

Modular Biped GitHub Page
Google Gemini Official Website
Google Gemini Build

This discussion aims to provide a platform for our software community to share ideas, convey experiences, and perhaps offer different perspectives. What are your thoughts? How do you think the integration of the software with Google Gemini could introduce new possibilities? Those willing to assist with integration can get in touch.

Feel free to join the discussion under this thread to share your thoughts. Thank you!

danic85 · 2024-01-02T13:32:59Z

danic85
Jan 2, 2024
Maintainer

😆 Did Gemini write this @SentryCoderDev ?

I like this idea, and it would be pretty easy to get the sample code working alongside the text to speech and voice recognition modules that are already in place.
Have you had any success with the examples they provide?


SentryCoderDev · 2024-01-02T22:34:01Z

SentryCoderDev
Jan 2, 2024
Author

Yes @danic85 😂 I had Gemini write this. I taught it the modules in the repository and asked for a discussion post, and this text is the result.

With the examples they provide, I managed to get written output from written input, but I am still investigating how to use Gemini in real time with OpenCV and TTS.
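For readers who want to reproduce the text-in, text-out step, here is a minimal sketch. It assumes the `google-generativeai` Python package and the `gemini-pro` model name as they were published around the date of this thread, plus an API key in a `GOOGLE_API_KEY` environment variable; none of these are confirmed by the post. The `ask` wrapper and its `generate` parameter are hypothetical names, added so the logic can be tried without network access.

```python
import os


def gemini_generate(prompt, model_name="gemini-pro"):
    """Send one text prompt to Gemini and return the reply text.

    Assumption: the google-generativeai SDK is installed and
    GOOGLE_API_KEY is set. Imported lazily so this file still loads
    on machines without the SDK.
    """
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text


def ask(prompt, generate=gemini_generate):
    """Hypothetical wrapper: validate the prompt, call the model, tidy the reply."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    return generate(prompt).strip()
```

Calling `ask("What is a biped robot?")` would go to the real API, while passing any function as `generate` swaps in an offline stand-in for testing.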

1 reply

danic85 Jan 5, 2024
Maintainer

If we want to integrate it into the framework, the pubsub module can handle voice recognition and TTS with the existing topics.
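The wiring suggested here could look roughly like the sketch below. It is not the repository's code: the `Bus` class is a tiny stand-in for the project's pubsub module, and the topic names `speech` and `tts`, the `msg` keyword, and the `GeminiBridge` class are all hypothetical. The real topic names should be taken from the existing modules.

```python
from collections import defaultdict


class Bus:
    """Tiny in-process publish/subscribe bus (stand-in for the real pubsub module)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, **payload):
        # Copy the list so handlers may subscribe during delivery.
        for handler in list(self._subscribers[topic]):
            handler(**payload)


class GeminiBridge:
    """Forwards recognised speech to a model and publishes the reply for TTS."""

    # Hypothetical topic names, not taken from the repository.
    SPEECH_TOPIC = "speech"
    TTS_TOPIC = "tts"

    def __init__(self, bus, generate):
        self.bus = bus
        self.generate = generate  # callable: text in, text out
        bus.subscribe(self.SPEECH_TOPIC, self.on_speech)

    def on_speech(self, msg):
        reply = self.generate(msg)
        self.bus.publish(self.TTS_TOPIC, msg=reply)
```

With this shape the Gemini module never talks to the microphone or the speaker directly. It only listens on one topic and publishes on another, so the existing voice recognition and TTS modules would not need to change.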

SentryCoderDev · 2024-01-03T21:08:51Z

SentryCoderDev
Jan 3, 2024
Author

Hello again @danic85 and community, I have made some progress on the Gemini integration and wanted to show it to you. The system sends a frame it receives via the GUI to the Gemini API and writes the output in the selected language. Currently there are 5 language options, and it can use TTS.
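The flow described above (frame in, description in the chosen language out, optionally spoken) can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the language table is hypothetical because the post does not name the five options, and the Gemini vision call and the TTS engine are passed in as plain callables (`generate` and `speak`) rather than tied to a specific SDK.

```python
# Hypothetical language table: the post mentions five options but does not list them.
LANGUAGES = {"en": "English", "tr": "Turkish", "de": "German"}


def build_prompt(language_code, instruction="Describe what you see in this image."):
    """Combine the user's instruction with the requested output language."""
    language = LANGUAGES[language_code]
    return f"{instruction} Answer in {language}."


def describe_frame(jpeg_bytes, language_code, generate, speak=None):
    """Send one encoded frame to the model and return its description.

    generate(prompt, jpeg_bytes) -> str stands in for the Gemini vision call.
    speak(text) stands in for the TTS module and is optional.
    """
    if not jpeg_bytes:
        raise ValueError("empty frame")
    text = generate(build_prompt(language_code), jpeg_bytes).strip()
    if speak is not None:
        speak(text)
    return text
```

Keeping the prompt builder separate makes it easy to add a language: only the table changes, not the capture or TTS code.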

Example:

1 reply

danic85 Jan 5, 2024
Maintainer

I love that it specifies the photo is blurry 😆

SentryCoderDev · 2024-01-04T09:51:41Z

SentryCoderDev
Jan 4, 2024
Author

Edit: I'm currently working on a camera selection feature, and I'm dealing with a problem where the text gets deleted when I switch to a different line. The text has been updated to appear at the middle right.

The PR for my own GUI version, based on the original repository that inspired it:

HamzahJomaa/Google-Gemini-Live-Camera#2

example:


SentryCoderDev · 2024-01-07T19:24:25Z

SentryCoderDev
Jan 7, 2024
Author

Hello community, I wanted to share the current status.

The camera-related problem in the code (the camera was constantly refreshing) is fixed.
There is now a text field where we can write what we want it to do.
I'm still trying to get it to work in live mode instead of a snapshot.
I changed the layout a bit and made it print the output at the bottom.
Finally, I added the option to change the theme.
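One possible way to move from snapshots toward live mode, offered only as a sketch and not as the approach used in the PR, is to keep reading frames continuously but forward one to the API only after a minimum interval has passed. The `FrameThrottle` class, the `run_live` helper, and the 2-second default are hypothetical.

```python
import time


class FrameThrottle:
    """Allows at most one frame through per min_interval_s seconds."""

    def __init__(self, min_interval_s=2.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._last_sent = None

    def should_send(self):
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self.min_interval_s:
            self._last_sent = now
            return True
        return False


def run_live(frames, send, throttle):
    """Iterate over frames and forward only the ones the throttle lets through.

    frames: any iterable of frames (for example, a camera read loop).
    send(frame): forwards one frame to the model.
    Returns the number of frames that were sent.
    """
    sent = 0
    for frame in frames:
        if throttle.should_send():
            send(frame)
            sent += 1
    return sent
```

The preview in the GUI can still update on every frame. Only the API calls are rate limited, which keeps request volume and latency manageable on a small board.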
