SAE Prep: Reasoning & Format Mastery

Lesson 4 of 6

Precise Text Analysis

Estimated time: 8 minutes

The SAE tests whether you can analyze text with machine-level precision. These questions aren't about literary interpretation — they're about counting exactly, extracting precisely, and following text-manipulation instructions to the letter.

Text analysis questions are deceptively simple. "Count the words in this sentence" sounds easy, but one wrong count means zero points. Precision is everything.

Counting Characters

Character counting requires clarity on what counts as a "character."

Example: How many characters are in the string Hello, World!?

  • With spaces and punctuation: 13
  • Letters only: 10
  • Without spaces: 12

The question will specify which definition to use. If it doesn't, count everything (including spaces and punctuation).
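As a quick illustration, the three counts above can be reproduced with a few lines of Python. This is only a sketch: the function name `count_characters` and the rule labels are made up for this example, and `len` counts Unicode code points, which matches visible characters for plain ASCII text like this.

```python
def count_characters(text: str) -> dict[str, int]:
    """Count characters in `text` under three common rules."""
    return {
        # Every character, including spaces and punctuation
        "all": len(text),
        # Everything except whitespace
        "no_spaces": sum(1 for ch in text if not ch.isspace()),
        # Alphabetic characters only
        "letters_only": sum(1 for ch in text if ch.isalpha()),
    }


print(count_characters("Hello, World!"))
# {'all': 13, 'no_spaces': 12, 'letters_only': 10}
```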

To count reliably, work through three steps:

  1. Identify the boundaries. What exactly is the string? Are quotes included? Leading/trailing spaces?
  2. Clarify the counting rule. All characters? Letters only? Alphanumeric? Without whitespace?
  3. Count methodically. Go character by character. Don't estimate. For long strings, break into chunks of 5 or 10 and sum them.
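The chunking idea can be sketched in Python as follows. The helper name `chunk_counts` and the chunk size of 10 are illustrative choices, not anything the SAE prescribes.

```python
def chunk_counts(text: str, size: int = 10) -> list[tuple[str, int]]:
    """Split `text` into fixed-size chunks and pair each chunk with its length."""
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    return [(chunk, len(chunk)) for chunk in chunks]


sentence = "The quick brown fox jumps over the lazy dog"
pairs = chunk_counts(sentence)
for chunk, length in pairs:
    print(repr(chunk), length)

# The chunk lengths must add up to the length of the whole string.
total = sum(length for _, length in pairs)
print(total)  # 43
```

Only the last chunk can be shorter than the chunk size, which makes a miscounted chunk easy to spot.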

Counting Words

Word counting has edge cases that trip up even careful counters:

Text                          Word count   Why
Hello world                   2            Simple case
Hello  world (double space)   2            Multiple spaces don't create extra words
well-known                    1 or 2       Hyphenated: depends on the definition
don't                         1            Contractions are typically one word
U.S.A.                        1            Abbreviations with periods are one word
(empty string)                0            No words in empty text

When in doubt, the most common definition: a "word" is a maximal sequence of non-whitespace characters. By this rule, well-known is 1 word and hello world is 2 words.
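The whitespace rule maps directly onto Python's `str.split()` called with no arguments. The sketch below assumes that rule and nothing more; the function name `count_words` is made up for this example.

```python
def count_words(text: str) -> int:
    """Count maximal runs of non-whitespace characters."""
    # split() with no arguments splits on runs of whitespace and ignores
    # leading/trailing whitespace, so an empty string yields no words.
    return len(text.split())


for sample in ["Hello world", "Hello  world", "well-known", "don't", "U.S.A.", ""]:
    print(repr(sample), count_words(sample))
```

Under this rule, well-known counts as 1. A question that treats hyphenated parts as separate words needs a different split.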

Counting Sentences

Sentences end with ., !, or ? — but watch out for:

  • Abbreviations: Dr. Smith went home. is 1 sentence, not 2
  • Ellipsis: Wait... really? is typically 1-2 sentences depending on interpretation
  • Quoted speech: She said "Hello." Then she left. is 2 sentences
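A rough way to see these edge cases in action is a small heuristic counter. This is a sketch under stated assumptions, not a complete sentence splitter: `ABBREVIATIONS` is a short, hypothetical list, and a token counts as a sentence end whenever it finishes with ., !, or ? once closing quotes are stripped.

```python
# Hypothetical, deliberately short abbreviation list; extend as needed.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "e.g.", "i.e."}


def count_sentences(text: str) -> int:
    """Heuristically count sentences by looking at how each token ends."""
    count = 0
    for token in text.split():
        # Ignore closing quotes or parentheses that follow the terminator.
        stripped = token.rstrip("\"')")
        if stripped.lower() in ABBREVIATIONS:
            continue  # "Dr." does not end a sentence
        if stripped.endswith((".", "!", "?")):
            count += 1
    return count


print(count_sentences("Dr. Smith went home."))              # 1
print(count_sentences('She said "Hello." Then she left.'))  # 2
print(count_sentences("Wait... really?"))                   # 2
```

This sketch treats the ellipsis as a sentence end, which is one of the two interpretations mentioned above. A question that counts Wait... really? as one sentence would need a rule that skips tokens ending in three dots.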

Extracting the Nth Element

A common SAE pattern: "Respond with only the 3rd word of the following sentence."

Example: "The quick brown fox jumps over the lazy dog"

  • 1st word: The
  • 2nd word: quick
  • 3rd word: brown

This tests whether you can count accurately and return only what was asked.

Common mistakes:

  • Off-by-one: returning the 2nd or 4th word instead of the 3rd
  • Including extra text: "The 3rd word is brown" instead of just brown
  • Zero-indexing: treating "the 1st word" as index 0
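The off-by-one and zero-indexing mistakes are easiest to see in code. Here is a minimal sketch, assuming whitespace-separated words; the function name `nth_word` is made up for this example.

```python
def nth_word(text: str, n: int) -> str:
    """Return the n-th word of `text`, counting from 1."""
    words = text.split()
    if not 1 <= n <= len(words):
        raise ValueError(f"n must be between 1 and {len(words)}, got {n}")
    # Convert the 1-based position in the question to Python's 0-based index.
    return words[n - 1]


print(nth_word("The quick brown fox jumps over the lazy dog", 3))  # brown
```

Any punctuation attached to a word stays attached in this sketch. Whether to strip it depends on how the question defines a word.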

Checkpoint 1

Knowledge Check

Given the sentence: 'AI agents can learn, adapt, and improve over time.' — how many words does it contain? Respond with only the number.

Pattern Matching in Text

Some SAE questions ask you to find patterns:

  • "How many times does the letter 'e' appear in this paragraph?"
  • "Which word appears most frequently?"
  • "Find all email addresses in this text"

For letter/word frequency, the only reliable approach is systematic counting. Don't estimate.

Technique for letter counting:

  1. Go through the text word by word
  2. Count occurrences of the target letter in each word
  3. Sum the counts
  4. Double-check by scanning the text once more
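The four steps above can be sketched in Python. The function name `count_letter` is made up for this example, and matching case-insensitively by default is an assumption; a question may want an exact-case match instead. The sketch expects a single non-whitespace letter.

```python
def count_letter(text: str, letter: str, case_sensitive: bool = False) -> int:
    """Count occurrences of `letter` word by word, then verify the total."""
    if not case_sensitive:
        text, letter = text.lower(), letter.lower()
    # Steps 1-3: count the letter in each word and sum the per-word counts.
    total = sum(word.count(letter) for word in text.split())
    # Step 4: double-check with an independent pass over the whole text.
    assert total == text.count(letter)
    return total


print(count_letter("Precision is everything.", "e"))  # 3
```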

Technique for word frequency:

  1. List each unique word
  2. Tally occurrences
  3. Compare tallies
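The tallying steps above correspond to `collections.Counter` in Python's standard library. In this sketch, the function name `word_frequencies` is made up, and two choices are assumptions that a question may override: words are compared case-insensitively, and surrounding punctuation is stripped.

```python
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Tally how often each word appears in `text`."""
    # Assumption: ignore case and strip punctuation around each word.
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    return Counter(w for w in words if w)


freq = word_frequencies("The cat saw the dog, and the dog saw the bird.")
print(freq.most_common(2))  # [('the', 4), ('saw', 2)]
```

When two words have the same tally, `most_common` lists them in the order they first appeared, so ties need a manual check.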

Following Precise Instructions

The SAE loves instructions that are simple but must be followed exactly:

  • "Reverse the following string" — hello becomes olleh
  • "Convert to uppercase" — hello world becomes HELLO WORLD
  • "Remove all vowels" — hello world becomes hll wrld
  • "Replace every space with a hyphen" — hello world becomes hello-world

Each of these has a single correct answer. Partial compliance scores zero.
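For reference, each of the four transformations is a one-liner in Python. The function names are made up for this example, and treating only a, e, i, o, u (in either case) as vowels is an assumption.

```python
def reverse(text: str) -> str:
    return text[::-1]  # slice with step -1 walks the string backwards


def to_upper(text: str) -> str:
    return text.upper()


def remove_vowels(text: str) -> str:
    # Assumption: vowels are a, e, i, o, u in either case.
    return "".join(ch for ch in text if ch.lower() not in "aeiou")


def spaces_to_hyphens(text: str) -> str:
    return text.replace(" ", "-")


print(reverse("hello"))                  # olleh
print(to_upper("hello world"))           # HELLO WORLD
print(remove_vowels("hello world"))      # hll wrld
print(spaces_to_hyphens("hello world"))  # hello-world
```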

Multi-Language Text Handling

Some SAE questions involve non-ASCII text:

  • Accented characters: café has 4 characters; cafe also has 4 characters (the accent doesn't add a character in most counting schemes, though Unicode normalization can affect this)
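  • CJK characters: Each Chinese/Japanese/Korean character is typically one character. Word counts are less clear-cut because Chinese and Japanese are normally written without spaces; a common convention counts each character as one "word", so check what the question specifies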
  • Emoji: Most emoji are 1 character in user-visible terms, even if they're multiple Unicode code points

Unless the question specifies Unicode code points or bytes, count "user-visible characters" (grapheme clusters). The SAE typically uses this definition.
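The difference between these definitions shows up clearly in Python, where `len` counts Unicode code points rather than user-visible characters. The sketch below uses only the standard library, and the variable names are illustrative.

```python
import unicodedata

# "café" written two ways: é as one code point (NFC) or as e + combining accent (NFD).
composed = unicodedata.normalize("NFC", "cafe\u0301")
decomposed = unicodedata.normalize("NFD", composed)

print(len(composed))                  # 4 code points
print(len(decomposed))                # 5 code points, still 4 visible characters
print(len(composed.encode("utf-8")))  # 5 bytes, because é takes 2 bytes in UTF-8

# A thumbs-up emoji with a skin-tone modifier: one visible character, two code points.
thumbs_up = "\U0001F44D\U0001F3FD"
print(len(thumbs_up))                 # 2
```

Python's standard library does not split text into grapheme clusters. Counting user-visible characters in general needs extra tooling, for example the `\X` pattern in the third-party `regex` package.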

When Precision Trumps Explanation

On the SAE, text analysis questions almost always want just the answer:

Question type                        Expected answer format
"How many words..."                  A number: 9
"What is the 5th word..."            The word: jumps
"Reverse this string..."             The reversed string: olleh
"How many times does X appear..."    A number: 3

Never explain your counting process in the answer. Count carefully in your reasoning, but output only the result.

Checkpoint 2

Knowledge Check

Given the text: 'To be, or not to be, that is the question.' — respond with only the 7th word.

Key Takeaways

  1. Count, don't estimate — go character by character, word by word
  2. Clarify the counting rule — "character" can mean different things depending on context
  3. Watch for off-by-one — the 1st element is at position 1, not 0
  4. Return only what's asked — no explanations, no extra text
  5. Double-check — count once, verify once, then submit