How can the device recognize that it is the letter 'a', 'b', 'c', or the number '1', '2'?
Since you already know which numbers or letters are "hidden" in each image, you can create a simple text file for each image with the expected value in it.
Example:
Image file name: Building.jpg -> this image shows a building with a number or letter painted somewhere on it, for example the letter "A"
Extra file: BuildingAnswer.txt -> contains just "A"
When the user (the child) enters "A", you just read the file and check whether the input matches.
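To make the idea concrete, here is a minimal sketch in Python (your platform's language will differ, so treat it as pseudocode for the approach). The function names `answer_file_for` and `is_correct` are made up for this illustration. Ignoring upper/lower case and surrounding whitespace is my own assumption, not a requirement.

```python
from pathlib import Path


def answer_file_for(image_path: Path) -> Path:
    # Naming convention from the example: Building.jpg -> BuildingAnswer.txt
    return image_path.with_name(image_path.stem + "Answer.txt")


def is_correct(image_path: Path, user_input: str) -> bool:
    # Read the expected value stored next to the image.
    expected = answer_file_for(image_path).read_text(encoding="utf-8").strip()
    # Assumption: "a" and "A" should both count as correct.
    return user_input.strip().casefold() == expected.casefold()
```

With Building.jpg and BuildingAnswer.txt (containing "A") in the same folder, `is_correct(Path("Building.jpg"), "a")` would return True and `is_correct(Path("Building.jpg"), "B")` would return False.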
PS: This is a simple example to give you a hint of how you can do it. Mind the copyright: use your own pictures, or get permission before using someone else's. For how to compare values or how to read/write a file, see the very nice documentation.