
Line breaks bug

This task is part of the FellowshipHacks/Projects/Blog project.

Goal

The blog posts migrated from EZ to WP contain spurious line breaks in parts of the text. This bug must be fixed.

People

People working on this task.

Volunteers are always welcome! Have a look at FellowshipHacks to find out how you can help.

Status

Last updated on: 090128

We've analysed the cause of the bug and drafted possible solutions. The next step is to evaluate the proposed solutions and implement one of them.

Subtasks

TODO

  • Test the tidy solution during the automated mass-migration test

DONE

  • Analysed the bug and drafted possible solutions
  • Tested filtering the RSS files through tidy: the results are OK for a few selected blog entries. The command line is: tidy -xml -utf8 -wrap 0 input.rss > output.rss

Notes

Analysis

  • At first I noticed that the RSS file produced by EZ has extra CR characters. I tried removing them (sed -e 's/\r//g' ez.rss > clean.rss) and reimporting the RSS file, but I still got the weird line breaks in the output. So I guess the problem is not the extra CRs but the actual line breaks in the RSS file.

  • The "weird" line breaks don't appear on the EZ web pages, but only in the RSS file, and in the database object that stores the post text.
  • So this is the likely cause of the problem: EZ doesn't honour the line breaks contained in the post source (it relies only on HTML tags to display the output), while WP does (so "real" line breaks in the RSS file are converted into line breaks on the web page).
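
The CR and line-break observations above can be checked with a small script. What follows is only a sketch in Python, not a tool that was used for this task; the function name count_breaks and the sample feed are made up for illustration. It works on the raw text on purpose, because an XML parser normalises CR characters away before they can be counted.

```python
import re

# Assumption: <description> elements carry no attributes and are not nested.
DESCRIPTION = re.compile(r"<description>(.*?)</description>", re.DOTALL)

def count_breaks(rss_text):
    """Return one (CR count, LF count) pair per <description> element."""
    return [(body.count("\r"), body.count("\n"))
            for body in DESCRIPTION.findall(rss_text)]

# Hypothetical sample: one CRLF pair and one bare LF inside the description.
sample = ("<rss><channel><item><description>line one\r\n"
          "line two\nline three</description></item></channel></rss>")
print(count_breaks(sample))
```

When reading a real file, open it with newline="" so that Python does not translate the line endings before they are counted.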

Possible solutions

  • Filter the intermediate RSS file through some tool (xmllint, tidy...) to remove line breaks in the <description> element

  • Fix the EZ RSS generator (/design/fsfe/templates/rss_pagelayout.tpl) not to put line breaks in the <description> XML element

  • Fix the WP RSS importer to ignore line breaks
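
The first option (filtering the intermediate RSS file) could look roughly like the sketch below. It uses Python rather than xmllint or tidy, and it assumes that each <description> holds escaped HTML. The names collapse_line_breaks and filter_rss are hypothetical. Line breaks inside <pre> blocks are left alone, since they are meaningful there.

```python
import re
import xml.etree.ElementTree as ET

# The capturing group makes split() keep the <pre>...</pre> blocks.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.DOTALL | re.IGNORECASE)

def collapse_line_breaks(html):
    """Replace line breaks outside <pre> blocks with a single space."""
    parts = PRE_BLOCK.split(html)
    # Even indexes are the text between <pre> blocks; odd ones are the blocks.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r"\s*[\r\n]+\s*", " ", parts[i])
    return "".join(parts)

def filter_rss(rss_text):
    """Apply collapse_line_breaks to every <description> in the feed."""
    root = ET.fromstring(rss_text)
    for description in root.iter("description"):
        if description.text:
            description.text = collapse_line_breaks(description.text)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample feed with one line break inside a paragraph.
rss = ("<rss><channel><item><description>&lt;p&gt;one\ntwo&lt;/p&gt;"
       "</description></item></channel></rss>")
print(filter_rss(rss))
```

Note that re-serialising with ElementTree writes CDATA sections back as escaped text and may rename namespace prefixes that are not registered, so the output should be compared with the input before it is used in a real migration.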

Pre-formatted text

Pre-formatted text is also broken sometimes. Please compare

Cause: WP ignores <br /> and <br> elements inside <pre></pre> blocks. Since WP ignores these tags, I think we have to live with that and fix the post source (replacing <br> and <br /> with real line breaks).
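
The source fix described above (replacing <br> with real line breaks inside <pre> blocks) could be sketched as follows. This is an illustration in Python under stated assumptions, not the fix that was actually applied. The function name br_to_newline_in_pre is hypothetical, and the choice to swallow a line break that directly follows a <br> (so that it does not become a double line break) is an assumption.

```python
import re

PRE_BLOCK = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.DOTALL | re.IGNORECASE)
# Matches <br>, <br/> and <br />, plus an optional line break right after
# the tag (assumption: that break is redundant once the tag becomes "\n").
BR_TAG = re.compile(r"<br\s*/?>[ \t]*\r?\n?", re.IGNORECASE)

def br_to_newline_in_pre(html):
    """Turn <br> tags into real line breaks, but only inside <pre> blocks."""
    def fix(match):
        opening, body, closing = match.groups()
        return opening + BR_TAG.sub("\n", body) + closing
    return PRE_BLOCK.sub(fix, html)

# Hypothetical post source: the <br /> outside <pre> must stay untouched.
post = "<p>a<br />b</p><pre>x<br>y<br />\nz</pre>"
print(br_to_newline_in_pre(post))
```

Tags outside <pre> blocks are deliberately left as they are, because there they are the only thing that produces a line break in the rendered page.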


CategoryFellowshipHacksTasks