Place name recognition: Exercises

29.04.2017, 12:30

Get the source data

I have made a plain-text file based on the Project Gutenberg version of Rudolf Georg Binding: Erlebtes Leben, Frankfurt am Main 1928.

The file is included in the package above: Binding.txt

Save it as a text file on your computer.

Named entity recognition

The tool we use for this is WebLicht, which supports German.

  1. Open the WebLicht website and click on Start WebLicht.
  2. On first login you will get a list of universities. Select Universität zu Köln. Log in with your ordinary uni username and password.
  3. Select Start or + New Chain if necessary.
  4. Upload the file Binding.txt that you saved earlier and use language German.
  5. Use Easy Mode.
  6. Select Named Entities and Run Tools. The job will take some minutes.
  7. Once the job is done, select Highlighted view –> namedEntities. Go through a few pages to see what the tool has done to the text.
  8. Select Table view and two categories: lemmas and namedEntities. Compare this view with the previous one.
  9. Download the Table view with lemmas and namedEntities as an Excel sheet.
  10. Open the Excel sheet and see how the data looks.

Adding coordinates and creating a map

First cheat: It is not very difficult to extract the data we need from the Excel sheet created above. However, it takes some Excel work that is not central to today's topic. Thus, if you do not know how to do the necessary Excel work, you are free to use a file with just the place names (B-GPEs) and identifiers that I created for you. You will find the data in the spreadsheet tab B-GPE. The spreadsheet is included here: BindingNER.xlsx
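If you prefer a script to Excel, the extraction can be sketched in Python. This is only a minimal sketch under assumptions: it supposes the table has already been read into (lemma, tag) pairs, the function name `extract_places` and the sample rows are hypothetical, and the handling of `I-GPE` (a continuation tag in the usual B-/I- labelling scheme) is my addition, since the exercise itself only uses the B-GPE rows.

```python
def extract_places(rows):
    """Collect place names from (lemma, tag) rows.

    A row tagged B-GPE starts a place name; directly following rows
    tagged I-GPE are treated as continuations of the same name.
    """
    places = []
    current = []
    for lemma, tag in rows:
        if tag == "B-GPE":
            if current:  # close the previous place name first
                places.append(" ".join(current))
            current = [lemma]
        elif tag == "I-GPE" and current:
            current.append(lemma)
        else:
            if current:  # any other tag ends the current place name
                places.append(" ".join(current))
            current = []
    if current:
        places.append(" ".join(current))
    return places


# Hypothetical sample rows, not taken from Binding.txt.
sample_rows = [
    ("ich", ""),
    ("fahren", ""),
    ("nach", ""),
    ("Frankfurt", "B-GPE"),
    ("am", "I-GPE"),
    ("Main", "I-GPE"),
    ("und", ""),
    ("Köln", "B-GPE"),
]

places = extract_places(sample_rows)
print(places)  # ['Frankfurt am Main', 'Köln']
```

The resulting list can be pasted, one name per line, into a geocoding tool.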

The step where we add coordinates to place names is called geocoding. Many tools are available; a simple online one that works reasonably well for German place names is GPS Visualizer’s Address Locator.

  1. Open the GPS Visualizer Address Locator webpage.
  2. Unless you already have a MapQuest AppKey:
    1. Click [Get a key]
    2. Follow the instructions under the MapQuest tab.
  3. Copy your MapQuest Key into the Your MapQuest AppKey field.
  4. Copy the column lemmas from the tab B-GPE in the spreadsheet.
  5. Paste into Input on the webpage.
  6. Click on the Start geocoding button. The process will take a few minutes.
  7. Optional: When the job is done, copy the data from Output into a text file and save it as a .csv file for use in the next exercise (this step is not required for that exercise, as a second cheat follows).
  8. Select Labels on map.
  9. Click Draw a map.
  10. Wait a second and see what happens.

Remember: CSV stands for comma-separated values. Such files can be imported into spreadsheets.
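To see what such a file looks like to a program, here is a small sketch that reads CSV text with Python's standard `csv` module. The column names and the approximate coordinates in `sample` are illustrative assumptions, not the actual output of the geocoder.

```python
import csv
import io

# Illustrative CSV text: one header line, then one line per place.
# A value containing spaces may be wrapped in double quotes.
sample = (
    "name,latitude,longitude\n"
    "Köln,50.94,6.96\n"
    '"Frankfurt am Main",50.11,8.68\n'
)


def read_csv_text(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_csv_text(sample)
for row in rows:
    print(row["name"], row["latitude"], row["longitude"])
```

Each line becomes one record, and the commas mark where one column ends and the next begins.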

Putting it on another map – if you have time

The other mapping tool we will use is Palladio, a simple visualisation tool that also displays maps.

Second cheat: The input format for Palladio is a bit tricky. It is comma-separated, but because a geographical coordinate pair has a comma in the middle, the pair gets split into two columns. It turns out that it works if the coordinates are enclosed in double quotes ("). Further, errors (place names for which the geocoding gave an error message) must be removed. Therefore, I suggest you use another file included here: Binding-GPE.txt
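The two fixes described here, quoting the coordinate pair and dropping failed lookups, can be sketched in Python as follows. This is a sketch under assumptions: the function name `to_palladio_csv`, the use of `None` to mark a geocoding error, and the sample records are hypothetical, and the header names are simply taken from the field names used in the Palladio steps of this exercise.

```python
import csv
import io


def to_palladio_csv(records):
    """Build CSV text with one quoted "lat,lon" field per place.

    records: list of (place, lat, lon); lat or lon set to None marks
    a place for which geocoding failed, and such rows are skipped.
    """
    buffer = io.StringIO()
    # csv.writer automatically quotes any field that contains a comma.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["original address", "latitude-longitude"])
    for place, lat, lon in records:
        if lat is None or lon is None:
            continue  # drop geocoding errors
        writer.writerow([place, f"{lat},{lon}"])
    return buffer.getvalue()


# Hypothetical records with approximate coordinates.
records = [
    ("Köln", 50.94, 6.96),
    ("Nirgendwo", None, None),  # simulated geocoding error
]
output = to_palladio_csv(records)
print(output)
```

Because the coordinate pair is written as a single field, the writer wraps it in double quotes so that the comma inside it is not read as a column separator.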

  1. Open Palladio and click Start.
  2. Select the tab Create a new project.
  3. Copy and paste the data from the coordinate file you saved above or from Binding-GPE.txt. Click on Load.
  4. Click on Map in the top menu.
  5. Click on New layer.
  6. For Places select latitude-longitude.
  7. For Tooltip label select original address.
  8. Click on Add layer.
  9. Zoom in and out. See how different types of places are shown on the map. Look for strange data and try to understand what happened.
  10. Try to select Size points and see how that works.
  11. Consider how the visualisation could have been different and better.

Additional exercises – if you have time

  1. Try other texts with the same tools.
  2. Try to find other tools that can do similar things.

If you are interested in working on this, just continue trying it out. This can be used for a small project later in the semester.