The Scoop

Text Processing with UltraEdit

Derek Willis, The Washington Post
NICAR 2005

Pattern Matching Using UltraEdit

Learning to match patterns is like learning a new language – the learning curve can be steep at first, but pretty soon you’re doing it without much effort. It involves seeing the data for its structure, not its content per se. The stuff that surrounds the content – spaces, tabs, returns – is what matters. Data was meant to be manipulated, but you don’t need to be in Excel or Access to do it.

Your New Friends

^ – the caret character is a special one that is usually followed by a letter or number that has a particular meaning in UltraEdit syntax. ^p means a paragraph mark, while ^t means a tab.
Control-R – The beginning of a search and replace in UE. Yes, it’s not Control-F or Control-H. But at least R stands for “replace,” right?
\ – the backslash is the magic “escape” character in Unix-style regular expressions, the one that allows you to search for a character that otherwise has special meaning, such as a question mark. So we escape it: \? (In UltraEdit’s own syntax, the caret does this job: ^?)
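
If you want to try the escaping idea outside UltraEdit, here is a minimal sketch in Python, whose re module uses Unix-style syntax. The sample sentence is made up for illustration. Note that Python's any-single-character wildcard is the period, which is the role the question mark plays in UltraEdit's own syntax.

```python
import re

# Made-up sample text for illustration only.
text = "Who filed the report? Nobody knows."

# Escaped: \? matches a literal question mark.
literal = re.findall(r"\?", text)

# Unescaped "." is Python's any-single-character wildcard,
# so "report." matches "report" plus whatever character follows it.
wildcard = re.findall(r"report.", text)

# re.escape() adds the backslashes for you when the search
# string comes from somewhere else (a form, a file, a colleague).
escaped_pattern = re.escape("report?")
match = re.search(escaped_pattern, text)

print(literal)
print(wildcard)
print(escaped_pattern)
print(match.group())
```

Letting re.escape() build the pattern is usually safer than typing the backslashes by hand, since it is easy to miss a special character.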

Common UltraEdit Commands

  • Find (Control-F or Alt-F3): UE’s Find command has several features that most text editors or word processors don’t. Perhaps the most important is the first check box, labeled “List Lines Containing String”, which will do exactly that – retrieve each full line in which your search term is found. This may not sound like much on paper, but give it a try and you’ll see how powerful this can be. Once you have a list of lines, you can double-click on any line in the smaller box and it will take you to that line in the full file. Very useful for spot editing and checking. You can also click the “Clipboard” button and it will copy the contents of the smaller box – all the lines that contain your search terms – onto the Clipboard, meaning you can paste them into another file.
  • Replace (Control-R): UE has all the features you’d expect from a search and replace command, but it adds a few more, too. Radio buttons allow you to perform the replace within highlighted text or across all open files in UltraEdit, in addition to the currently open file. You can also tell it to match only whole words or be case-sensitive.
  • Regular Expressions (used in Find or Replace): Regular expressions are powerful tools that let you find or replace exactly what you intend, so you don’t damage the rest of the file in the process. UE supports traditional Unix regular expression syntax, but you may want to start out using the regexp syntax that comes with UltraEdit:
    Symbol      Function
    %           Matches the start of line – Indicates the search string must be at the beginning of a line but does not include any line terminator characters in the resulting string selected.
    $           Matches the end of line – Indicates the search string must be at the end of line but does not include any line terminator characters in the resulting string selected.
    ?           Matches any single character except newline
    *           Matches any number of occurrences of any character except newline
    +           Matches one or more of the preceding character/expression. At least one occurrence of the character must be found. Does not match repeated newlines.
    ++          Matches the preceding character/expression zero or more times. Does not match repeated newlines.
    ^b          Matches a page break
    ^p          Matches a newline (CR/LF) (paragraph) (DOS Files)
    ^r          Matches a newline (CR Only) (paragraph) (MAC Files)
    ^n          Matches a newline (LF Only) (paragraph) (UNIX Files)
    ^t          Matches a tab character
    [ ]         Matches any single character, or range in the brackets
    ^{A^}^{B^}  Matches expression A OR B
    ^           Overrides the following regular expression character
    ^(…^)       Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression. The corresponding replacement expression is ^x, for x in the range 1-9. Example: If ^(h*o^) ^(f*s^) matches “hello folks”, ^2 ^1 would replace it with “folks hello”.
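
The same ideas carry over to other tools. As a rough sketch, here is how “List Lines Containing String” and the tagged-expression swap might look in Python, which uses Unix-style syntax. The function names and sample records are hypothetical, made up for illustration. The translations assumed here are: UE’s % becomes ^, UE’s * becomes .*, ^(…^) becomes (…), and ^1 and ^2 become \1 and \2.

```python
import re


def list_lines_containing(text, pattern):
    """Return (line_number, line) pairs for every line where pattern is found.

    A rough stand-in for UE's "List Lines Containing String" check box.
    """
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


def swap_tagged(text):
    """Swap two tagged expressions, like UE's ^(h*o^) ^(f*s^) -> ^2 ^1."""
    return re.sub(r"(h.*o) (f.*s)", r"\2 \1", text)


# Made-up, tab-delimited sample records for illustration only.
records = "Smith, John\tDEM\nJones, Mary\tREP\nSmithers, Al\tDEM"

# ^ anchors the search to the start of a line (UE syntax: %Smith).
hits = list_lines_containing(records, r"^Smith")
for number, line in hits:
    print(number, line)

print(swap_tagged("hello folks"))
```

Keeping the line numbers alongside the matching lines mirrors what UE’s results box gives you: a way to jump back to the right spot in the full file.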

