This is not the most complex task. The main goal is to remove meaningless information.
For instance: articles, very common words and formatting characters are removed from a text document. Low-intensity high-frequency or low-frequency sounds that are masked by a mid-frequency sound in an audio recording are ignored. Edges of a still image are detected and extracted. Motion in video is detected and quantified.
In the case of text, we use a small dictionary containing the words to suppress and a simple algorithm to strip paragraph numbers, tabulation, indentation marks... The words can also be simplified: plurals can be converted to the singular, or words can be stemmed or lemmatized...
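The text step above can be sketched as follows. This is only a minimal illustration under stated assumptions: the stop-word set, the `naive_singular` rule and the `clean_text` function are hypothetical names invented for this example, and the plural rule is far cruder than a real stemmer or lemmatizer.

```python
import re

# Hypothetical, deliberately tiny dictionary of words to suppress;
# a real list would be much larger.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "are", "this"}


def naive_singular(word):
    """Very rough plural-to-singular rule (assumption, not a real stemmer).

    It mishandles many words (for example "bus" or "series").
    """
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def clean_text(text):
    """Return the list of kept, simplified words of a text."""
    cleaned = []
    for line in text.splitlines():
        # Strip indentation, then a leading paragraph number such as "1." or "2.1".
        line = re.sub(r"^\s*\d+(\.\d+)*[.)]?\s+", "", line.strip())
        # Keep alphabetic tokens only, which also drops punctuation.
        for token in re.findall(r"[a-z]+", line.lower()):
            if token not in STOP_WORDS:
                cleaned.append(naive_singular(token))
    return cleaned
```

For example, `clean_text("1. The cats and the dogs")` keeps only the two content words, reduced to their singular form.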
In the case of audio, we use a codec based on simultaneous and temporal masking to suppress sounds that the ear cannot discriminate.
In the case of pictures, we can apply Canny edge detection. http://www.pages.drexel.edu/~weg22/can_tut.html
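To give an idea of what edge extraction involves, here is a simplified sketch. It is not the full Canny algorithm: it only computes the Sobel gradient magnitude and applies a single threshold, leaving out the Gaussian smoothing, non-maximum suppression and hysteresis steps of Canny. The function name `sobel_edges`, the list-of-rows grayscale image format and the threshold value are assumptions made for this example.

```python
def sobel_edges(image, threshold):
    """Return a binary edge map (1 = edge) for a grayscale image.

    `image` is a list of rows of numeric intensities. Border pixels are
    left at 0 because the 3x3 Sobel window does not fit there.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            magnitude = (gx * gx + gy * gy) ** 0.5
            if magnitude >= threshold:
                edges[y][x] = 1
    return edges
```

On an image whose left half is black and right half is white, only the pixels on either side of the vertical boundary are marked as edges.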
In the case of video, we can detect motion by comparing the changes between consecutive frames. http://www.codeproject.com/KB/audio-video/Motion_Detection.aspx
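Frame comparison can be sketched as a simple pixel difference, as below. This is a minimal illustration and not the method of the linked article: the function names `motion_ratio` and `detect_motion`, the grayscale list-of-rows frame format, and both threshold values are assumptions chosen for the example.

```python
def motion_ratio(previous, current, pixel_threshold=25):
    """Fraction of pixels whose intensity changed by more than pixel_threshold."""
    changed = 0
    total = 0
    for prev_row, cur_row in zip(previous, current):
        for p, c in zip(prev_row, cur_row):
            total += 1
            if abs(p - c) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def detect_motion(frames, pixel_threshold=25, motion_threshold=0.05):
    """Return one (ratio, moved) pair for each pair of consecutive frames."""
    results = []
    for previous, current in zip(frames, frames[1:]):
        ratio = motion_ratio(previous, current, pixel_threshold)
        results.append((ratio, ratio > motion_threshold))
    return results
```

The ratio quantifies the motion, and the boolean flag turns it into a detection. The pixel threshold is there so that small intensity fluctuations such as sensor noise are not counted as motion.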
An additional step in the data formatting is to compute the relative values of the parameters in order to create a kind of invariant pattern. For instance, we compute the relative frequency of words (1), the relative frequency and amplitude of sounds (2), the relative size and position of shapes (3). This operation makes the information to analyse independent of the size of the text (1), of the pitch and volume of the speaker (2), and of the distance of the objects in the image (3).
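For case (1), the relative word frequency can be sketched in a few lines. The function name `relative_frequencies` is an assumption for this example. It divides each word count by the total number of words, so a text and a longer text with the same word proportions produce the same pattern.

```python
from collections import Counter


def relative_frequencies(words):
    """Map each word to its share of the total word count."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


short_text = ["cat", "dog", "cat", "bird"]
long_text = short_text * 10  # same proportions, ten times longer

# Both texts yield the same relative frequencies.
same_pattern = relative_frequencies(short_text) == relative_frequencies(long_text)
```

The absolute counts differ by a factor of ten, yet the resulting dictionaries are identical, which is the size invariance described above.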