UniDoc File Intelligence Blog

How to Check Real File Type (Not Just the Extension)

A file named invoice.pdf is not always a real PDF. Attackers often rename files to look safe. This guide shows how to detect a file's true type using magic signatures, confidence scores, and risk indicators.

You will learn a simple workflow that quickly answers two questions: What is this file really? And what should I do next?

Why Extension Is Not Enough

A file extension is only text at the end of a filename. Anyone can rename malware.exe to report.pdf in seconds. Security software may still flag the file, but users often trust the visible name.

[!] Common mistakes
  • Trusting the .pdf, .docx, or .jpg extension alone
  • Opening files from unknown senders directly
  • Skipping mismatch or warning signals

[+] Better approach
  • Check the internal file signature (magic bytes)
  • Compare the extension with the actual detected type
  • Use the trust score and risk guidance before opening

What Real File Type Means

A file's real type is identified from its content, usually the first few bytes (the header). These signature bytes are often called magic bytes.

File Type      Signature (Hex)   Readable Form
PDF            25 50 44 46       %PDF
PNG            89 50 4E 47       .PNG
ZIP / Office   50 4B 03 04       PK..
EXE            4D 5A             MZ

If the extension says one thing but the signature says another, treat the file as suspicious until verified.

How to Check Real File Type (Step by Step)

1) Detect file signature

Read first bytes -> match known signatures -> detect real type
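
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool works: the SIGNATURES list only covers the four formats from the table in this guide, and the names detect_type and detect_file are made up for the example.

```python
# Signatures taken from the table in this guide: (leading bytes, label).
SIGNATURES = [
    (bytes.fromhex("25504446"), "PDF"),
    (bytes.fromhex("89504E47"), "PNG"),
    (bytes.fromhex("504B0304"), "ZIP / Office"),
    (bytes.fromhex("4D5A"), "EXE"),
]


def detect_type(header: bytes) -> str:
    """Return the label of the first signature that the header starts with."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def detect_file(path: str) -> str:
    """Read only the first bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return detect_type(handle.read(8))
```

Calling detect_type(b"%PDF-1.7") returns "PDF", while a header that matches nothing in the list returns "unknown". A real detector would carry a much larger signature list.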

2) Compare with extension

Example: filename says .pdf, detected type says EXE -> this is a high-risk mismatch.
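
A comparison like this could be written as follows. It is a sketch under assumptions: the EXPECTED mapping, the check_mismatch function, and the result strings are all hypothetical, and the detected type is passed in as a plain label such as "PDF" or "EXE".

```python
from pathlib import PurePath

# Hypothetical mapping from extension to the type label it should carry.
EXPECTED = {
    ".pdf": "PDF",
    ".png": "PNG",
    ".zip": "ZIP / Office",
    ".docx": "ZIP / Office",
    ".xlsx": "ZIP / Office",
    ".exe": "EXE",
}


def check_mismatch(filename: str, detected: str) -> str:
    """Compare the type implied by the extension with the detected type."""
    ext = PurePath(filename).suffix.lower()
    expected = EXPECTED.get(ext)
    if expected is None:
        return "unverified"  # extension is not in our small table
    if expected == detected:
        return "match"
    # An executable hiding behind another extension is the worst case.
    return "high-risk mismatch" if detected == "EXE" else "mismatch"
```

With these assumptions, check_mismatch("invoice.pdf", "EXE") returns "high-risk mismatch" and check_mismatch("invoice.pdf", "PDF") returns "match".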

3) Measure confidence

High confidence (90%+) means a strong signature match. Lower confidence means an uncommon or only partially matching pattern.
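
This guide does not define how confidence is calculated, so the following is only one hypothetical way to score it: the fraction of a signature's bytes that match at the start of the file. The function names are invented for the example; the 90% threshold is the one quoted above.

```python
# Full 8-byte PNG signature, used here as the reference pattern.
PNG_MAGIC = bytes.fromhex("89504E470D0A1A0A")


def signature_confidence(header: bytes, magic: bytes) -> float:
    """Fraction of the signature bytes matched from the start (assumed metric).

    Assumes magic is non-empty.
    """
    matched = 0
    for got, want in zip(header, magic):
        if got != want:
            break
        matched += 1
    return matched / len(magic)


def confidence_label(score: float) -> str:
    """Apply the 90% threshold mentioned in this guide."""
    return "high" if score >= 0.9 else "low"
```

Under this scoring, a header that matches all eight PNG signature bytes scores 1.0, while one that matches only the first four scores 0.5 and is labeled "low".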

4) Analyze structure

For PDFs, inspect pages, text, scripts, and encryption. For archives, inspect the internal contents and look for clues that the archive is really an Office document.
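
As a rough sketch of what structure analysis can look like, the Python below scans a PDF for script and encryption markers and inspects a ZIP archive for Office clues using the standard zipfile module. The function names are hypothetical, and the PDF check is a crude byte scan that will miss anything stored inside compressed object streams.

```python
import io
import zipfile


def pdf_clues(data: bytes) -> dict:
    """Crude byte scan for risky PDF names; not a real PDF parser."""
    return {
        "has_script": b"/JavaScript" in data or b"/JS" in data,
        "encrypted": b"/Encrypt" in data,
    }


def office_kind(data: bytes) -> str:
    """Guess the Office flavour of a ZIP archive from its member names."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    # Office Open XML packages carry this part at the archive root.
    if "[Content_Types].xml" not in names:
        return "plain ZIP"
    for prefix, kind in (("word/", "DOCX"), ("xl/", "XLSX"), ("ppt/", "PPTX")):
        if any(name.startswith(prefix) for name in names):
            return kind
    return "Office (unknown kind)"
```

An archive containing [Content_Types].xml and a word/ folder is reported as "DOCX"; an archive without those clues is treated as a plain ZIP.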

5) Decide action

Follow the assistant's guidance, for example: verify the source, run OCR, compress, convert, or avoid opening the file.

Try it with your own file

Upload once and instantly get the real type, trust score, key findings, and recommended next steps.


How to Read Mismatch Warnings

A mismatch warning means the file extension does not match the real content. This is one of the most useful signals in file security triage.

[High Risk Example]

statement.pdf detected as EXE

Action: do not open directly; verify the source immediately.

[Usually Safe Example]

photo.jpg detected as JPEG

Action: safe to preview, continue normal workflow.

The key is verification, not panic. A mismatch does not always mean malware, but it always means you should pause and confirm.

Trust Score and Risk Levels

A useful file assistant should not stop at detection. It should convert analysis into a practical risk message.

Level     Meaning                                                          Recommended Action
SAFE      Signature valid, known format, no risky indicators               Open normally
CAUTION   Some inconsistency (encryption, low confidence, minor warning)   Verify source before using
RISKY     Mismatch, unknown format, corruption, or script risk             Avoid opening until verified
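
The table can be turned into a small decision function. The sketch below is an assumption about how such a mapping might be coded: the Findings fields and the risk_level name are hypothetical, "low confidence" is taken to mean below 90%, and the "minor warning" case is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical analysis results for one file."""
    known_format: bool = True
    mismatch: bool = False
    corrupted: bool = False
    has_script: bool = False
    encrypted: bool = False
    confidence: float = 1.0


def risk_level(f: Findings) -> str:
    """Map findings to the SAFE / CAUTION / RISKY levels from the table."""
    # RISKY: mismatch, unknown format, corruption, or script risk.
    if f.mismatch or not f.known_format or f.corrupted or f.has_script:
        return "RISKY"
    # CAUTION: encryption or low confidence.
    if f.encrypted or f.confidence < 0.9:
        return "CAUTION"
    return "SAFE"
```

Checking the RISKY conditions first means a file that is both encrypted and mismatched is reported at the more serious level.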

Preview and Corruption Checks

Good assistants also show whether a file is usable. Header-only detection is not enough if the file is truncated.

  • PDF integrity: check for the %%EOF marker near the end of the file
  • Image integrity: check end markers such as IEND (PNG) or EOI (JPEG)
  • ZIP integrity: verify the end of central directory record
  • Unknown signature + broken structure: treat as potentially corrupted

[Quick decision rule]

If the real type is unknown and integrity checks fail, do not trust the file for production use.
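
The integrity checks above can be approximated with simple end-of-file tests. This is a minimal sketch, assuming the whole file is already in memory as bytes; the function names and type labels are hypothetical, and each check only confirms that an end marker is present, not that the file is fully valid.

```python
PNG_IEND = bytes.fromhex("0000000049454E44AE426082")  # IEND chunk with its CRC
ZIP_EOCD = b"PK\x05\x06"  # end of central directory signature


def pdf_has_eof(data: bytes) -> bool:
    # The %%EOF marker should appear near the end of the file.
    return b"%%EOF" in data[-1024:]


def png_has_iend(data: bytes) -> bool:
    return data.endswith(PNG_IEND)


def jpeg_has_eoi(data: bytes) -> bool:
    # EOI marker is FF D9; files with trailing data will fail this check.
    return data.endswith(b"\xff\xd9")


def zip_has_eocd(data: bytes) -> bool:
    # The record is 22 bytes plus a comment of up to 65535 bytes.
    return data.rfind(ZIP_EOCD, max(0, len(data) - 65557)) != -1


CHECKS = {
    "PDF": pdf_has_eof,
    "PNG": png_has_iend,
    "JPEG": jpeg_has_eoi,
    "ZIP": zip_has_eocd,
}


def integrity_ok(real_type: str, data: bytes) -> bool:
    """Unknown types have no check, so they are never reported as intact."""
    check = CHECKS.get(real_type)
    return check(data) if check else False
```

A PDF cut off before its %%EOF marker fails the check, and a file of unknown type always returns False, which lines up with the quick decision rule.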

Best Practices Before Opening Any File

  1. Check the real type from the signature, not the extension.
  2. Review mismatch warnings first.
  3. Look at the trust/risk message before opening.
  4. Preview safely when possible (image or first page of a PDF).
  5. If the file is risky, verify the sender through a second channel.
  6. For scanned documents, run OCR before text extraction.

Need a fast real-type check?

Use UniDoc File Intelligence to detect true format, trust level, and next best action in one view.


FAQ: Real File Type Detection

Can I trust the file extension alone?

No. An extension can be changed in seconds and should never be your only check.

Is magic byte detection accurate?

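For known formats it is highly reliable and should be your first validation step. Formats that share a container signature, such as ZIP archives and Office documents, still need a structure check to tell them apart.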

What if the tool says "unknown format"?

The file may be proprietary, encrypted, or corrupted. Verify the source and avoid opening it blindly.

Can a file be valid but still risky?

Yes. For example, a valid PDF can still include scripts or suspicious embedded content.