This MR fixes several robustness issues in the parse-pagexml library, specifically in the handling of text lines that lack coordinates and in metadata processing. It also generates data splits automatically when they are missing, and adds documentation for the YOLO dataset generation feature.
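To illustrate the coordinate issue, here is a minimal sketch that uses only Python's standard `xml.etree.ElementTree`. It is not the MR's implementation. It assumes the PAGE 2013-07-15 namespace and reads `Page@imageWidth`/`imageHeight`. It skips any `TextLine` whose `Coords` element is missing, empty, or malformed, and converts the remaining polygons into YOLO label rows (`class x_center y_center width height`, normalised to [0, 1]). The names `parse_points` and `page_to_yolo_lines` are hypothetical and not part of parse-pagexml's API.

```python
import xml.etree.ElementTree as ET

# Assumed PAGE schema version; other PAGE versions use a different namespace URI.
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}


def parse_points(points):
    """Hypothetical helper: parse a PAGE 'points' string ("x1,y1 x2,y2 ...").

    Returns an empty list if the string is missing, empty, or malformed."""
    if not points:
        return []
    coords = []
    for pair in points.split():
        try:
            x, y = pair.split(",")
            coords.append((float(x), float(y)))
        except ValueError:
            return []
    return coords


def page_to_yolo_lines(xml_text, class_id=0):
    """Hypothetical sketch: turn TextLine polygons into YOLO label rows.

    Lines without usable coordinates are skipped and their ids collected,
    instead of raising an error."""
    root = ET.fromstring(xml_text)
    page = root.find("pc:Page", PAGE_NS)
    img_w = float(page.get("imageWidth"))
    img_h = float(page.get("imageHeight"))
    rows, skipped = [], []
    for line in root.iterfind(".//pc:TextLine", PAGE_NS):
        coords_el = line.find("pc:Coords", PAGE_NS)  # direct child only
        pts = parse_points(coords_el.get("points") if coords_el is not None else None)
        if len(pts) < 2:
            skipped.append(line.get("id"))
            continue
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
        # Axis-aligned bounding box, normalised by the page image size.
        cx = (x_min + x_max) / 2 / img_w
        cy = (y_min + y_max) / 2 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        rows.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return rows, skipped


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageWidth="1000" imageHeight="500">
    <TextRegion id="r1">
      <TextLine id="l1"><Coords points="100,100 300,100 300,150 100,150"/></TextLine>
      <TextLine id="l2"/>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    rows, skipped = page_to_yolo_lines(SAMPLE)
    print(rows)     # ['0 0.200000 0.250000 0.200000 0.100000']
    print(skipped)  # ['l2']
```

The sketch returns the ids of skipped lines instead of raising. A caller can then log them or report them per page, and a single line without coordinates does not abort generation of the whole dataset.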