
Image by Author | Gemini (nano-banana self portrait)
# Introduction
Image generation with generative AI has become a widely used tool for both individuals and businesses, allowing them to instantly create their intended visuals without needing any design expertise. Essentially, these tools can accelerate tasks that would otherwise take a significant amount of time, completing them in mere seconds.
As the technology has matured and competition has intensified, many advanced image generation products have been released, such as Stable Diffusion, Midjourney, DALL-E, Imagen, and many more. Each offers unique advantages to its users. However, Google recently made a significant impact on the image generation landscape with the release of Gemini 2.5 Flash Image (or nano-banana).
Nano-banana is Google’s advanced image generation and editing model, featuring capabilities like realistic image creation, multiple image blending, character consistency, targeted prompt-based transformations, and public accessibility. The model offers far greater control than previous models from Google or its competitors.
This article will explore nano-banana’s ability to generate and edit images. We will demonstrate these features using the Google AI Studio platform and the Gemini API within a Python environment.
Let’s get into it.
# Testing the Nano-Banana Model
To follow this tutorial, you will need to register for a Google account and sign in to Google AI Studio. You will also need an API key to use the Gemini API, which requires a paid plan, as there is no free tier available for this model.
If you prefer to use the API with Python, make sure to install the Google Gen AI SDK (google-genai) with the following command:
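```
pip install google-genai
```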
Once your account is set up, let’s explore how to use the nano-banana model.
First, navigate to Google AI Studio and select the gemini-2.5-flash-image-preview model, which is the nano-banana model we will be using.
With the model selected, you can start a new chat to generate an image from a prompt. As Google suggests, a fundamental principle for getting the best results is to describe the scene, not just list keywords. This narrative approach, describing the image you envision, typically produces superior results.
In the AI Studio chat interface, you'll see an input area like the one below where you can enter your prompt.
We will use the following prompt to generate a photorealistic image for our example.
A photorealistic close-up portrait of an Indonesian batik artisan, hands stained with wax, tracing a flowing motif on indigo cloth with a canting pen. She works at a wooden table in a breezy veranda; folded textiles and dye vats blur behind her. Late-morning window light rakes across the fabric, revealing fine wax lines and the grain of the teak. Captured on an 85 mm at f/2 for gentle separation and creamy bokeh. The overall mood is focused, tactile, and proud.
The generated image is shown below:
As you can see, the image generated is realistic and faithfully adheres to the given prompt. If you prefer the Python implementation, you can use the following code to create the image:
```python
from google import genai
from PIL import Image
from io import BytesIO
from IPython.display import display

# Replace 'YOUR-API-KEY' with your actual API key
api_key = 'YOUR-API-KEY'
client = genai.Client(api_key=api_key)

prompt = "A photorealistic close-up portrait of an Indonesian batik artisan, hands stained with wax, tracing a flowing motif on indigo cloth with a canting pen. She works at a wooden table in a breezy veranda; folded textiles and dye vats blur behind her. Late-morning window light rakes across the fabric, revealing fine wax lines and the grain of the teak. Captured on an 85 mm at f/2 for gentle separation and creamy bokeh. The overall mood is focused, tactile, and proud."

# Ask the nano-banana model to generate an image from the prompt
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=prompt,
)

# The generated image is returned as inline data among the response parts
image_parts = [
    part.inline_data.data
    for part in response.candidates[0].content.parts
    if part.inline_data
]

if image_parts:
    image = Image.open(BytesIO(image_parts[0]))
    # image.save('your_image.png')
    display(image)
```
Once you provide your API key and the desired prompt, the Python code above will generate and display the image.
We have seen that the nano-banana model can generate a photorealistic image, but its strengths extend further. As mentioned previously, nano-banana is particularly powerful for image editing, which we will explore next.
Let’s try prompt-based image editing with the image we just generated. We will use the following prompt to slightly alter the artisan’s appearance:
Using the provided image, place a pair of thin reading glasses gently on the artisan’s nose while she draws the wax lines. Ensure reflections look realistic and the glasses sit naturally on her face without obscuring her eyes.
The resulting image is shown below:
The image above matches the first one, except that glasses have been added to the artisan's face. This demonstrates how nano-banana can edit an image based on a descriptive prompt while maintaining overall consistency.
To do this with Python, you can provide your base image and a new prompt using the following code:
```python
from PIL import Image

# This code assumes 'client' has been configured from the previous step
base_image = Image.open('/path/to/your/photo.png')

edit_prompt = "Using the provided image, place a pair of thin reading glasses gently on the artisan's nose..."

# Pass the prompt and the base image together for prompt-based editing
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[edit_prompt, base_image],
)
```
The response can be parsed and displayed exactly as in the first example.
Next, let’s test character consistency by generating a new scene where the artisan is looking directly at the camera and smiling:
Generate a new and photorealistic image using the provided image as a reference for identity: the same batik artisan now looking up at the camera with a relaxed smile, seated at the same wooden table. Medium close-up, 85 mm look with soft veranda light, background jars subtly blurred.
The image result is shown below.
We’ve successfully changed the scene while maintaining character consistency. To test a more drastic change, let’s use the following prompt to see how nano-banana performs.
Create a product-style image using the provided image as identity reference: the same artisan presenting a finished indigo batik cloth, arms extended toward the camera. Soft, even window light, 50 mm look, neutral background clutter.
The result is shown below.
The resulting image shows a completely different scene but maintains the same character. This highlights the model’s ability to realistically produce varied content from a single reference image.
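If you are following these steps with the Python API, the pattern is identical to the editing example: pass the reference image alongside each new prompt. Here is a minimal sketch for the character-consistency step (the file path is a placeholder, and `client` is assumed from the first example):
```python
from PIL import Image

# 'client' is assumed from the first example; the path is a placeholder
reference_image = Image.open('/path/to/your/photo.png')

consistency_prompt = (
    "Generate a new and photorealistic image using the provided image as a "
    "reference for identity: the same batik artisan now looking up at the "
    "camera with a relaxed smile, seated at the same wooden table."
)

# The reference image anchors the character's identity in the new scene
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[consistency_prompt, reference_image],
)
```
The same call handles the style-transfer prompt below; only the prompt text changes.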
Next, let’s try image style transfer. We will use the following prompt to change the photorealistic image into a watercolor painting.
Using the provided image as identity reference, recreate the scene as a delicate watercolor on cold-press paper: loose indigo washes for the cloth, soft bleeding edges on the floral motif, pale umbers for the table and background. Keep her pose holding the fabric, gentle smile, and round glasses; let the veranda recede into light granulation and visible paper texture.
The result is shown below.
The image demonstrates that the style has been transformed into watercolor while preserving the subject and composition of the original.
Lastly, we will try image fusion, where we add an object from one image into another. For this example, I’ve generated an image of a woman’s hat using nano-banana:
Using the image of the hat, we will now place it on the artisan’s head with the following prompt:
Move the same woman and pose outdoors in open shade and place the straw hat from the product image on her head. Align the crown and brim to the head realistically; bow over her right ear (camera left), ribbon tails drifting softly with gravity. Use soft sky light as key with a gentle rim from the bright background. Maintain true straw and lace texture, natural skin tone, and a believable shadow from the brim over the forehead and top of the glasses. Keep the batik cloth and her hands unchanged. Keep the watercolor style unchanged.
This process merges the hat photo with the base image to generate a new image, with minimal changes to the pose and overall style. In Python, use the following code:
```python
from PIL import Image

# This code assumes 'client' has been configured from the first step
base_image = Image.open('/path/to/your/photo.png')
hat_image = Image.open('/path/to/your/hat.png')

fusion_prompt = "Move the same woman and pose outdoors in open shade and place the straw hat..."

# Pass the prompt together with both images to blend them into one result
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[fusion_prompt, base_image, hat_image],
)
```
For best results, use a maximum of three input images. Using more may reduce output quality.
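As an illustration, here is a minimal sketch of a call combining three reference images (the file paths are hypothetical, and `client` is again assumed from the first example):
```python
from PIL import Image

# Hypothetical file paths; 'client' is assumed from the first example
images = [Image.open(p) for p in (
    '/path/to/first.png',
    '/path/to/second.png',
    '/path/to/third.png',
)]

# The contents list takes the prompt followed by up to three images
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=["Combine these references into a single scene."] + images,
)
```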
That covers the basics of using the nano-banana model. In my opinion, this model excels when you have existing images that you want to transform or edit. It’s especially useful for maintaining consistency across a series of generated images.
Try it for yourself and don’t be afraid to iterate, as you often won’t get the perfect image on the first try.
# Wrapping Up
Gemini 2.5 Flash Image, or nano-banana, is the latest image generation and editing model from Google. It boasts powerful capabilities compared to previous image generation models. In this article, we explored how to use nano-banana to generate and edit images, highlighting its features for maintaining consistency and applying stylistic changes.
I hope this has been helpful!
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips through social media and written media. Cornellius writes on a variety of AI and machine learning topics.