Great question — this is one of the most interesting parts of building a conversational AI on social platforms. Here's a comprehensive breakdown of how shared media works across all three platforms and how to analyse it with LLMs and other tools.
## When a User Shares Media in a DM

### What Meta Actually Sends You
When someone shares a reel, post, image, video, or story, the webhook payload differs slightly per platform, but the core structure is the same — you get a URL or attachment object in the message body.
## Platform-by-Platform Breakdown

### 1. Instagram DM
When a user shares a reel or post inside a DM, the webhook delivers an attachments array:
```json
{
  "message": {
    "mid": "...",
    "attachments": [
      {
        "type": "ig_reel",          // or "image", "video", "share"
        "payload": {
          "url": "https://cdn.instagram.com/...",
          "title": "Reel title (if any)"
        }
      }
    ]
  }
}
```
Shared posts come as `"type": "share"` with a `payload.url` pointing to the public post URL. Reels come as `"type": "ig_reel"` or `"video"` depending on context.
**Key constraints:**
- The CDN URL in `payload.url` is ephemeral — it expires in minutes. Download immediately upon webhook receipt.
- For private/restricted accounts, you may only get a URL to the post page, not the raw media.
- You need the `instagram_business_manage_messages` permission to read attachment payloads.
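In practice this means the webhook handler should pull attachments out of the event and fetch them straight away. A minimal sketch, assuming the standard `entry` → `messaging` envelope Meta wraps message events in (helper names are illustrative):

```python
from urllib.request import urlopen

def extract_attachments(webhook_body: dict) -> list[tuple[str, str]]:
    """Walk the entry -> messaging -> message.attachments envelope,
    returning (type, url) pairs for every attachment that has a URL."""
    results = []
    for entry in webhook_body.get("entry", []):
        for event in entry.get("messaging", []):
            for att in event.get("message", {}).get("attachments", []):
                url = att.get("payload", {}).get("url")
                if url:
                    results.append((att["type"], url))
    return results

def download_immediately(url: str) -> bytes:
    # CDN URLs expire within minutes, so fetch as soon as the webhook lands
    with urlopen(url) as resp:
        return resp.read()
```

The extraction is deliberately defensive (`.get` everywhere) because not every messaging event carries attachments.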
### 2. Facebook Messenger
Very similar structure. Facebook is more permissive — you get direct CDN URLs for images and videos:
```json
{
  "message": {
    "attachments": [
      {
        "type": "video",            // "image", "video", "audio", "file"
        "payload": {
          "url": "https://video.xx.fbcdn.net/..."
        }
      }
    ]
  }
}
```
Shared Facebook posts/reels come via the `share` type with a `payload.url` (the public post link) and optionally `payload.sticker_id`.
### 3. WhatsApp Business Cloud API
WhatsApp handles media differently — it uses media IDs instead of direct URLs:
```json
{
  "messages": [
    {
      "type": "image",              // "image", "video", "audio", "document", "sticker"
      "image": {
        "id": "3857693717630",
        "mime_type": "image/jpeg",
        "sha256": "...",
        "caption": "Check this out"
      }
    }
  ]
}
```
You then do a two-step retrieval:

1. `GET https://graph.facebook.com/v19.0/{media-id}` → returns a temporary download URL
2. `GET {download_url}` with `Authorization: Bearer {token}` → download the actual file
WhatsApp does not send direct public post/reel links with parsed metadata — it just sends whatever the user pasted as text, or an image/video they forwarded.
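The two-step retrieval can be sketched with the standard library; helper names are illustrative, and the Graph API version is pinned to v19.0 as used elsewhere in this doc:

```python
import json
from urllib.request import Request, urlopen

GRAPH = "https://graph.facebook.com/v19.0"

def media_lookup_url(media_id: str) -> str:
    # Step 1 endpoint: resolves a media ID to a temporary download URL
    return f"{GRAPH}/{media_id}"

def download_whatsapp_media(media_id: str, token: str) -> bytes:
    # Step 1: ask the Graph API for the temporary URL
    req = Request(media_lookup_url(media_id),
                  headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        media_url = json.loads(resp.read())["url"]
    # Step 2: fetch the actual bytes, again with the bearer token
    req = Request(media_url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        return resp.read()
```

Both requests need the bearer token; the temporary URL from step 1 is not publicly fetchable without it.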
## The Processing Pipeline
Here's the full architecture for handling shared media:
```
Webhook received
      │
      ▼
[1] Extract attachment type + URL/media-ID
      │
      ▼
[2] Download media immediately (CDN URLs expire fast)
      │
      ├─ Image    → Vision LLM (Claude, GPT-4V, Gemini Vision)
      ├─ Video    → Frame extraction → Vision LLM per frame
      ├─ Audio    → Whisper / Deepgram → transcript → LLM
      ├─ Post URL → Scrape OG tags / oEmbed → LLM
      └─ Reel URL → yt-dlp download → frames + audio
      │
      ▼
[3] Build context + user message → Claude
      │
      ▼
[4] Send reply via Send API
```
## Step-by-Step: Analysing Each Media Type

### Images
Send directly to a vision-capable LLM:
```python
import anthropic, base64, httpx

def analyse_image(image_url: str, user_caption: str) -> str:
    # Fetch the image from the (ephemeral) CDN URL and base64-encode it
    image_data = base64.b64encode(httpx.get(image_url).content).decode()

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": f"The user shared this image and said: '{user_caption}'. Respond helpfully as Orris.",
                },
            ],
        }],
    )
    return response.content[0].text
```
### Videos / Reels
Videos need to be broken into frames first (Claude's API doesn't accept raw video input):
```python
import cv2, os, tempfile, httpx

def extract_frames(video_url: str, num_frames: int = 8) -> list[bytes]:
    # Download the video to a temp file (OpenCV needs a file path)
    video_bytes = httpx.get(video_url).content
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
        f.write(video_bytes)
        tmp_path = f.name

    # Extract evenly-spaced frames as JPEG bytes
    cap = cv2.VideoCapture(tmp_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / num_frames))
        ret, frame = cap.read()
        if ret:
            _, buf = cv2.imencode(".jpg", frame)
            frames.append(buf.tobytes())
    cap.release()
    os.unlink(tmp_path)  # clean up the temp file
    return frames
```
Then send all frames in a single Claude message with multiple image blocks — Claude can reason across them coherently.
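Assembling that multi-frame message is just a list of image blocks followed by a text block. A sketch of building the content array (`build_frame_message` is an illustrative helper, not an SDK call); pass the result as the user message's `content`:

```python
import base64

def build_frame_message(frames: list[bytes], transcript: str, user_text: str) -> list[dict]:
    """Build Claude content blocks: one image block per frame, then a text block
    carrying the audio transcript and the user's own message."""
    blocks = [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": base64.b64encode(frame).decode(),
            },
        }
        for frame in frames
    ]
    blocks.append({
        "type": "text",
        "text": (
            f"These frames were sampled evenly from a video the user shared. "
            f"Audio transcript: '{transcript}'. The user said: '{user_text}'."
        ),
    })
    return blocks
```

Usage would be `client.messages.create(..., messages=[{"role": "user", "content": build_frame_message(frames, transcript, user_text)}])`.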
For the audio track of a reel, extract with ffmpeg and transcribe:
```bash
ffmpeg -i reel.mp4 -vn -acodec libmp3lame audio.mp3
```
Then send to OpenAI Whisper or Deepgram for transcription, and include the transcript in Claude's context.
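The ffmpeg step can be wrapped from Python with `subprocess`; a sketch assuming `ffmpeg` is on the PATH (helper names are illustrative):

```python
import subprocess

def ffmpeg_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    # -vn drops the video stream; libmp3lame encodes the audio track as mp3
    return ["ffmpeg", "-y", "-i", video_path, "-vn",
            "-acodec", "libmp3lame", audio_path]

def extract_audio(video_path: str, audio_path: str) -> None:
    # check=True raises CalledProcessError if ffmpeg fails (e.g. no audio track)
    subprocess.run(ffmpeg_audio_cmd(video_path, audio_path),
                   check=True, capture_output=True)
```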
### Shared Post URLs (Instagram/Facebook)
When you get a share URL like https://www.instagram.com/reel/ABC123/, you have two options:
**Option A — oEmbed API (Meta official):**

```
GET https://graph.facebook.com/v19.0/instagram_oembed
    ?url=https://www.instagram.com/reel/ABC123/
    &access_token={token}
```

Returns: title, author, thumbnail URL, HTML embed. Works for public posts, though your app needs Meta's oEmbed Read feature approved.
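A minimal sketch of calling that endpoint with the standard library (function names are illustrative; the response includes fields like `author_name`, `thumbnail_url`, and `html`):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OEMBED_ENDPOINT = "https://graph.facebook.com/v19.0/instagram_oembed"

def oembed_request_url(post_url: str, access_token: str) -> str:
    # urlencode handles escaping the post URL into a query parameter
    return f"{OEMBED_ENDPOINT}?{urlencode({'url': post_url, 'access_token': access_token})}"

def fetch_oembed(post_url: str, access_token: str) -> dict:
    with urlopen(oembed_request_url(post_url, access_token)) as resp:
        return json.loads(resp.read())
```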
**Option B — Open Graph / scraping:**

Fetch the URL and parse the `<meta property="og:...">` tags — you get the title, description, thumbnail, and video URL without an API call. Tools like BeautifulSoup or opengraph-py3 make this trivial.
```python
import requests
from bs4 import BeautifulSoup

def scrape_og(url: str) -> dict:
    # Identify as Facebook's link crawler so the page serves full OG tags
    html = requests.get(url, headers={"User-Agent": "facebookexternalhit/1.1"}).text
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag["property"]: tag["content"]
        for tag in soup.find_all("meta", property=True)
        if tag.get("content")
    }
```
### Audio Messages (WhatsApp)
WhatsApp voice notes and audio messages come as "type": "audio" with a media ID. After downloading:
```python
import openai

def transcribe_audio(audio_path: str) -> str:
    with open(audio_path, "rb") as f:
        transcript = openai.audio.transcriptions.create(
            model="whisper-1",
            file=f,
        )
    return transcript.text
```
## Putting It Together: A Unified Handler
```python
async def handle_attachment(attachment: dict, user_text: str) -> str:
    atype = attachment["type"]
    # WhatsApp attachments carry a media ID instead of a payload.url,
    # so fall back to an empty string rather than raising KeyError
    url = attachment.get("payload", {}).get("url", "")

    if atype == "image":
        return analyse_image(url, user_text)

    elif atype in ("video", "ig_reel"):
        frames = extract_frames(url)
        transcript = extract_and_transcribe_audio(url)  # optional
        return analyse_video_frames(frames, transcript, user_text)

    elif atype == "share":
        og = scrape_og(url)
        thumbnail = og.get("og:image")
        description = og.get("og:description", "")
        # Optionally fetch and analyse the thumbnail too
        return ask_claude_about_post(og, thumbnail, user_text)

    elif atype == "audio":
        local_path = download_whatsapp_media(attachment["id"])
        transcript = transcribe_audio(local_path)
        return ask_claude(f"The user sent a voice note: '{transcript}'. {user_text}")

    else:
        return "I received your attachment but can't process this type yet."
```
## Key Constraints to Keep in Mind

| Platform | Media Type | What You Get | Expiry |
|---|---|---|---|
| Instagram DM | Image/Video | Direct CDN URL | ~1 hour |
| Instagram DM | Shared Reel/Post | Post URL or CDN URL | ~1 hour |
| Facebook Messenger | Image/Video | Direct CDN URL | ~1 hour |
| Facebook Messenger | Shared Post | Post URL + snippet | ~1 hour |
| WhatsApp | Image/Video/Audio | Media ID → temp URL | 30 days (but download ASAP) |
**Important:**
- Download on webhook receipt — don't store URLs, store the files or base64.
- For private Instagram posts shared in DMs, you'll only get a link back to the post — you can't pull the raw video unless the account owner granted your app permissions.
- Meta's terms prohibit storing user media beyond what's needed to fulfil the request. Build your pipeline to be stateless — process and discard, don't cache media.
- For reels with music, you'll get the audio but the music may be copyright-restricted. Transcription is fine; redistribution is not.
## Recommended Tool Stack

| Purpose | Tool |
|---|---|
| Vision analysis | Claude (claude-opus-4-5 / claude-sonnet-4-5) |
| Audio transcription | OpenAI Whisper or Deepgram Nova-3 |
| Video frame extraction | OpenCV (cv2) or ffmpeg |
| OG/post scraping | BeautifulSoup + requests |
| Reel downloading (dev/test) | yt-dlp |
| Media storage | S3 or GCS (pre-signed URLs, short TTL) |
This pipeline gives Orris full awareness of whatever a user throws at it — images, reels, voice notes, shared posts — and lets Claude reason over all of it contextually before replying.