1
0
mirror of https://gitlab.com/ytdl-org/youtube-dl.git synced 2026-04-29 00:00:03 -04:00

Compare commits

...

72 Commits

Author SHA1 Message Date
Sergey M․ c485959034 release 2016.07.13 2016-07-13 23:58:01 +07:00
Sergey M․ a0560d8ab8 [ellentv] Improve extraction (Closes #10067) 2016-07-13 22:42:53 +07:00
Remita Amine 0385aa6199 [bbc] extract more and better qulities from Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
Remita Amine 00f4764cb7 [common] extract vbr, abr and fps for Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
Sergey M․ 51c2cd0b83 [extractors] Add vk:wallpost extractor import 2016-07-13 21:53:23 +07:00
Sergey M․ 5f5a9d6158 [vk] Improve login 2016-07-13 21:52:52 +07:00
Sergey M․ 2d19fb5072 [vk:wallpost] Add extractor 2016-07-13 21:51:44 +07:00
Yen Chi Hsuan 9d865a1af6 [travis] Skip downloading srelay
SOCKS tests never run on Travis CI due to unknown reasons, and
downloading them broke some tests (e.g.
https://travis-ci.org/rg3/youtube-dl/builds/144306425)
2016-07-13 14:27:14 +08:00
Remita Amine 41aa44259d [shahid] try to bypass geo restriction and extract more metadata(closes #10062) 2016-07-12 23:15:38 +01:00
Philipp Hagemeister 381ff44756 [devscripts/generate-download] Remove MD5 and SHA1 2016-07-12 09:09:54 +02:00
Sergey M․ 7f29cf545a [youtube] Add YouTube Red paid video reference test (#10059) 2016-07-12 02:10:35 +07:00
Remita Amine 7d1219f3e0 [tmz] delegate extraction to KalturaIE 2016-07-11 19:08:22 +01:00
Remita Amine f1b4af7d79 [beightcove:new] remove html tags from description 2016-07-11 19:06:50 +01:00
Remita Amine 8a8590a617 [dbtv] delegate extraction to BrightcoveNewIE 2016-07-11 16:30:24 +01:00
Remita Amine 4a7a5e41f7 [tvplay] improve extraction 2016-07-11 14:51:44 +01:00
Yen Chi Hsuan 2a49d01600 [playvid] Update _TESTS
Blocks https://travis-ci.org/rg3/youtube-dl/jobs/143809100
2016-07-11 15:15:28 +08:00
Yen Chi Hsuan b99af8a51c [biobiochiletv] Fix extraction and update _TESTS 2016-07-11 13:23:57 +08:00
Yen Chi Hsuan 8e7020daef [rudo] Add new extractor
Used in biobiochile.tv
2016-07-11 13:19:25 +08:00
Sergey M․ a26bcc61c1 release 2016.07.11 2016-07-11 03:17:12 +07:00
Sergey M․ 5c4dcf8172 [vidzi] Add support for embed URLs (Closes #10058) 2016-07-11 03:14:39 +07:00
Sergey M․ e9fb6a4bbe [youtube] Relax TFA regexes 2016-07-11 03:08:38 +07:00
Yen Chi Hsuan e2dbcaa1bf [vuclip] Fix extraction 2016-07-11 00:52:25 +08:00
Yen Chi Hsuan ae01850165 [miomio] Fix _TESTS 2016-07-11 00:03:24 +08:00
Yen Chi Hsuan c3baaedfc8 [miomio] Support new 'h5' player (closes #9605)
Depends on #8876
2016-07-10 23:46:48 +08:00
Yen Chi Hsuan 0b68de3cc1 Merge pull request #8876 from remitamine/html5_media
[extractor/common] add helper method to extract html5 media entries
2016-07-10 23:40:45 +08:00
Sergey M․ 39e9d524e5 Credit @nehalvpatel for roosterteeth (#9864) 2016-07-10 01:30:12 +07:00
Sergey M․ 865b087224 [roosterteeth] Improve (Closes #9864) 2016-07-10 01:30:12 +07:00
Nehal Patel 3121b25639 [roosterteeth] Add extractor 2016-07-10 01:30:12 +07:00
Sergey M․ 0286b85c79 release 2016.07.09.2 2016-07-09 22:22:24 +07:00
Sergey M․ ab52bb5137 [animeondemand] Fix typo 2016-07-09 22:20:34 +07:00
Sergey M․ 61a98b8623 [lynda] Remove md5 from test (Closes #10047) 2016-07-09 21:29:11 +07:00
Sergey M․ 6daf34a045 [facebook] Fix typo and break when found video_data (Closes #10048) 2016-07-09 21:25:07 +07:00
Yen Chi Hsuan c03adf90bd [generic] Add the test. Closes #1638 2016-07-09 14:39:01 +08:00
Yen Chi Hsuan 0ece114b7b [vimeo] Recognize non-standard embeds (#1638) 2016-07-09 14:38:27 +08:00
Yen Chi Hsuan 5b6a74856b Merge pull request #9288 from reyyed/issue#9063fix
[ffmpeg] Fix embedding subtitles (#9063)
2016-07-09 14:29:53 +08:00
Sergey M․ ce43100a01 release 2016.07.09.1 2016-07-09 10:06:40 +07:00
Remita Amine 8cc9b4016d [srmediathek] extend _VALID_URL(closes #9373) 2016-07-09 03:22:09 +01:00
Remita Amine 31eeab9f41 [ard] fix f4m extraction and skip tests with 404 errors 2016-07-09 03:22:09 +01:00
Sergey M․ 9558dcec9c [youtube:user] Preserve user/c path segment 2016-07-09 08:37:19 +07:00
Sergey M․ 6e6b70d65f [extractor/generic] Properly comment out a test 2016-07-09 08:37:19 +07:00
Sergey M․ d417fd88d0 release 2016.07.09 2016-07-09 07:16:47 +07:00
Sergey M․ 9e4f5dc1e9 [animeondemand] Pass num for episode based videos 2016-07-09 07:13:32 +07:00
Sergey M․ 1251565ee0 [options] Rollback old behavior for configuratio files' encoding
Until agreed with some solution
2016-07-09 07:12:52 +07:00
Sergey M․ 1f7258a367 [animeondemand] Add support for full length films (Closes #10031) 2016-07-09 06:57:04 +07:00
Sergey M․ 0af985069b [flipagram] Improve extraction (Closes #9898) 2016-07-09 03:31:17 +07:00
Sergey M․ 0de168f7ed [extractor/generic] Detect schema.org/VideoObject embeds 2016-07-09 03:29:07 +07:00
Sergey M․ 95b31e266b [extractor/common] Add expected_type in json ld routines 2016-07-09 03:28:04 +07:00
Sergey M․ 6b3a3098b5 [extractor/common] Extract more metadata for VideoObject in _json_ld 2016-07-09 03:27:11 +07:00
Sergey M․ 2de624fdd5 [extractor/common] Introduce filesize metafield for thumbnails 2016-07-09 03:24:36 +07:00
Déstin Reed 3fee7f636c [flipagram] Add extractor 2016-07-09 03:23:32 +07:00
Remita Amine 89e2fff2b7 [mgtv] pass geo verification headers for api request 2016-07-08 20:18:25 +01:00
Sergey M․ cedc70b292 [facebook] Fix invalid video being extracted (Closes #9851) 2016-07-09 00:28:07 +07:00
Remita Amine 07d7689f2e [le] extract http formats 2016-07-08 15:35:20 +01:00
Yen Chi Hsuan ae8cb5328d Merge branch 'JakubAdamWieczorek-polskie-radio' 2016-07-08 19:35:21 +08:00
Yen Chi Hsuan 2e32ac0b9a [polskieradio] Fix regex in _TESTS 2016-07-08 19:34:53 +08:00
Yen Chi Hsuan 672f01c370 Merge branch 'polskie-radio' of https://github.com/JakubAdamWieczorek/youtube-dl into JakubAdamWieczorek-polskie-radio 2016-07-08 19:33:28 +08:00
Jakub Adam Wieczorek e2d616dd30 [polskieradio] Add thumbnails. 2016-07-08 13:23:00 +02:00
Yen Chi Hsuan 0ab7f4fe2b [nick] support nickjr.com (closes #7542) 2016-07-08 15:11:28 +08:00
Sergey M․ 29c4a07776 [lynda] Fix test 2016-07-08 03:33:53 +07:00
Philipp Hagemeister 826e911e41 Merge branch 'master' of github.com:rg3/youtube-dl 2016-07-07 19:42:22 +02:00
Philipp Hagemeister 30d22dae8e [options] Do not decode Unicode on Python 2.x
The configuration file contents are being returned as unicode now, so decoding them is no longer necessary.
(Run python2 with -3 to see the warning before this commit)
2016-07-07 19:41:00 +02:00
Yen Chi Hsuan ec3518725b [compat] Fix test_cmdline_umlauts on Python 2.6
The original statement raises uncaught UnicodeWarning on Python 2.6
2016-07-07 22:30:58 +08:00
Remita Amine 5f87d845eb [tweakers] fix info extraction(closes #9516) 2016-07-07 12:51:42 +01:00
Philipp Hagemeister 571808a7aa document comments in configuration file (fixes #10024) 2016-07-07 12:12:21 +02:00
Yen Chi Hsuan dfe5fa49ae [compat] Fix compat_shlex_split for non-ASCII input
Closes #9871
2016-07-07 17:37:29 +08:00
Remita Amine 01a0c511eb [radiocanada] extract more formats 2016-07-07 03:46:12 +01:00
remitamine b3d30315ce Merge pull request #9597 from remitamine/toutv
[toutv] fix info extraction(closes #1792)(closes #2082)
2016-07-07 01:51:01 +01:00
remitamine 882af14d7d [toutv] fix info extraction(closes #1792)(closes #2082) 2016-07-07 01:47:28 +01:00
Remita Amine 47335a0efa [telecinco] fix info extraction 2016-07-06 23:09:13 +01:00
remitamine 59bbe4911a [extractor/common] add helper method to extract html5 media entries 2016-06-26 14:04:08 +01:00
remitamine 4f3c5e0627 [utils] add helper function for parsing codecs 2016-06-26 14:03:58 +01:00
Wang Jun Tham ccff2c404d [ffmpeg] Fix embedding subtitles (#9063)
Changed command line parameters for ffmpeg when embedding subtitles.
Changed to ‘-map 0:v -c:v copy -map 0:a -c:a copy’
2016-04-24 00:08:02 +08:00
50 changed files with 1438 additions and 679 deletions
+3 -3
View File
@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.07*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.07**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.13*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.13**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.07.07
[debug] youtube-dl version 2016.07.13
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
-3
View File
@@ -7,9 +7,6 @@ python:
- "3.4"
- "3.5"
sudo: false
install:
- bash ./devscripts/install_srelay.sh
- export PATH=$PATH:$(pwd)/tmp/srelay-0.4.8b6
script: nosetests test --verbose
notifications:
email:
+1
View File
@@ -177,3 +177,4 @@ Roman Tsiupa
Artur Krysiak
Jakub Adam Wieczorek
Aleksandar Topuzović
Nehal Patel
+1
View File
@@ -432,6 +432,7 @@ For example, with the following configuration file youtube-dl will always extrac
--no-mtime
--proxy 127.0.0.1:3128
-o ~/Movies/%(title)s.%(ext)s
# Lines starting with # are comments
```
Note that options in configuration file are just the same options aka switches used in regular command line calls thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`.
-4
View File
@@ -15,13 +15,9 @@ data = urllib.request.urlopen(URL).read()
with open('download.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
md5sum = hashlib.md5(data).hexdigest()
sha1sum = hashlib.sha1(data).hexdigest()
sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_URL@', URL)
template = template.replace('@PROGRAM_MD5SUM@', md5sum)
template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])
-8
View File
@@ -1,8 +0,0 @@
#!/bin/bash
mkdir -p tmp && cd tmp
wget -N http://downloads.sourceforge.net/project/socks-relay/socks-relay/srelay-0.4.8/srelay-0.4.8b6.tar.gz
tar zxvf srelay-0.4.8b6.tar.gz
cd srelay-0.4.8b6
./configure
make
+4
View File
@@ -224,6 +224,7 @@
- **Firstpost**
- **FiveTV**
- **Flickr**
- **Flipagram**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **Formula1**
@@ -553,6 +554,7 @@
- **RICE**
- **RingTV**
- **RockstarGames**
- **RoosterTeeth**
- **RottenTomatoes**
- **Roxwel**
- **RTBF**
@@ -566,6 +568,7 @@
- **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams
- **RTVNH**
- **Rudo**
- **RUHD**
- **RulePorn**
- **rutube**: Rutube videos
@@ -792,6 +795,7 @@
- **vine:user**
- **vk**: VK
- **vk:uservideos**: VK - User's Videos
- **vk:wallpost**
- **vlive**
- **Vodlocker**
- **VoiceRepublic**
+1
View File
@@ -88,6 +88,7 @@ class TestCompat(unittest.TestCase):
def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
self.assertEqual(compat_shlex_split('-option "one\ntwo" \n -flag'), ['-option', 'one\ntwo', '-flag'])
self.assertEqual(compat_shlex_split('-val 中文'), ['-val', '中文'])
def test_compat_etree_fromstring(self):
xml = '''
+24
View File
@@ -81,6 +81,7 @@ from youtube_dl.utils import (
cli_option,
cli_valueless_option,
cli_bool_option,
parse_codecs,
)
from youtube_dl.compat import (
compat_chr,
@@ -608,6 +609,29 @@ class TestUtil(unittest.TestCase):
limit_length('foo bar baz asd', 12).startswith('foo bar'))
self.assertTrue('...' in limit_length('foo bar baz asd', 12))
def test_parse_codecs(self):
self.assertEqual(parse_codecs(''), {})
self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
'vcodec': 'avc1.77.30',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.2'), {
'vcodec': 'none',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.5,avc1.42001e'), {
'vcodec': 'avc1.42001e',
'acodec': 'mp4a.40.5',
})
self.assertEqual(parse_codecs('avc3.640028'), {
'vcodec': 'avc3.640028',
'acodec': 'none',
})
self.assertEqual(parse_codecs(', h264,,newcodec,aac'), {
'vcodec': 'h264',
'acodec': 'aac',
})
def test_escape_rfc3986(self):
reserved = "!*'();:@&=+$,/?#[]"
unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~'
+8 -3
View File
@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import binascii
@@ -2594,15 +2595,19 @@ except ImportError: # Python < 3.3
return "'" + s.replace("'", "'\"'\"'") + "'"
if sys.version_info >= (2, 7, 3):
try:
args = shlex.split('中文')
assert (isinstance(args, list) and
isinstance(args[0], compat_str) and
args[0] == '中文')
compat_shlex_split = shlex.split
else:
except (AssertionError, UnicodeEncodeError):
# Working around shlex issue with unicode strings on some python 2
# versions (see http://bugs.python.org/issue1548891)
def compat_shlex_split(s, comments=False, posix=True):
if isinstance(s, compat_str):
s = s.encode('utf-8')
return shlex.split(s, comments, posix)
return list(map(lambda s: s.decode('utf-8'), shlex.split(s, comments, posix)))
def compat_ord(c):
+69 -41
View File
@@ -22,6 +22,7 @@ class AnimeOnDemandIE(InfoExtractor):
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand'
_TESTS = [{
# jap, OmU
'url': 'https://www.anime-on-demand.de/anime/161',
'info_dict': {
'id': '161',
@@ -30,17 +31,21 @@ class AnimeOnDemandIE(InfoExtractor):
},
'playlist_mincount': 4,
}, {
# Film wording is used instead of Episode
# Film wording is used instead of Episode, ger/jap, Dub/OmU
'url': 'https://www.anime-on-demand.de/anime/39',
'only_matching': True,
}, {
# Episodes without titles
# Episodes without titles, jap, OmU
'url': 'https://www.anime-on-demand.de/anime/162',
'only_matching': True,
}, {
# ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/169',
'only_matching': True,
}, {
# Full length film, non-series, ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/185',
'only_matching': True,
}]
def _login(self):
@@ -110,35 +115,12 @@ class AnimeOnDemandIE(InfoExtractor):
entries = []
for num, episode_html in enumerate(re.findall(
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1):
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
def extract_info(html, video_id, num=None):
title, description = [None] * 2
formats = []
for input_ in re.findall(
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html):
attributes = extract_attributes(input_)
playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'):
@@ -161,7 +143,7 @@ class AnimeOnDemandIE(InfoExtractor):
format_id_list.append(lang)
if kind:
format_id_list.append(kind)
if not format_id_list:
if not format_id_list and num is not None:
format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note)))
@@ -215,28 +197,74 @@ class AnimeOnDemandIE(InfoExtractor):
})
formats.extend(file_formats)
if formats:
self._sort_formats(formats)
return {
'title': title,
'description': description,
'formats': formats,
}
def extract_entries(html, video_id, common_info, num=None):
info = extract_info(html, video_id, num)
if info['formats']:
self._sort_formats(info['formats'])
f = common_info.copy()
f.update({
'title': title,
'description': description,
'formats': formats,
})
f.update(info)
entries.append(f)
# Extract teaser only when full episode is not available
if not formats:
# Extract teaser/trailer only when full episode is not available
if not info['formats']:
m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
episode_html)
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>(?P<kind>Teaser|Trailer)<',
html)
if m:
f = common_info.copy()
f.update({
'id': '%s-teaser' % f['id'],
'id': '%s-%s' % (f['id'], m.group('kind').lower()),
'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')),
})
entries.append(f)
def extract_episodes(html):
for num, episode_html in enumerate(re.findall(
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', html), 1):
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
extract_entries(episode_html, video_id, common_info)
def extract_film(html, video_id):
common_info = {
'id': anime_id,
'title': anime_title,
'description': anime_description,
}
extract_entries(html, video_id, common_info)
extract_episodes(webpage)
if not entries:
extract_film(webpage, anime_id)
return self.playlist_result(entries, anime_id, anime_title, anime_description)
+12 -4
View File
@@ -13,6 +13,7 @@ from ..utils import (
parse_duration,
unified_strdate,
xpath_text,
update_url_query,
)
from ..compat import compat_etree_fromstring
@@ -34,6 +35,7 @@ class ARDMediathekIE(InfoExtractor):
# m3u8 download
'skip_download': True,
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
@@ -44,6 +46,7 @@ class ARDMediathekIE(InfoExtractor):
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
'duration': 5252,
},
'skip': 'HTTP Error 404: Not Found',
}, {
# audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
@@ -55,6 +58,7 @@ class ARDMediathekIE(InfoExtractor):
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
'duration': 3240,
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'only_matching': True,
@@ -113,11 +117,14 @@ class ARDMediathekIE(InfoExtractor):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
video_id, preference=-1, f4m_id='hds', fatal=False))
update_url_query(stream_url, {
'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}),
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False))
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {
@@ -231,7 +238,8 @@ class ARDIE(InfoExtractor):
'title': 'Die Story im Ersten: Mission unter falscher Flagge',
'upload_date': '20140804',
'thumbnail': 're:^https?://.*\.jpg$',
}
},
'skip': 'HTTP Error 404: Not Found',
}
def _real_extract(self, url):
+27 -22
View File
@@ -44,6 +44,8 @@ class BBCCoUkIE(InfoExtractor):
_MEDIASELECTION_NS = 'http://bbc.co.uk/2008/mp/mediaselection'
_EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist'
# Unified Streaming Platform
_USP_RE = r'/([^/]+)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
_NAMESPACES = (
_MEDIASELECTION_NS,
@@ -55,12 +57,11 @@ class BBCCoUkIE(InfoExtractor):
'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'ext': 'mp4',
'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
},
'params': {
# rtmp download
'skip_download': True,
}
},
@@ -92,7 +93,7 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
'skip': 'this episode is not currently available',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/p026c7jt/tomorrows-worlds-the-unearthly-history-of-science-fiction-2-invasion',
@@ -107,7 +108,7 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
'skip': 'this episode is not currently available',
}, {
'url': 'http://www.bbc.co.uk/programmes/b04v20dw',
'info_dict': {
@@ -127,13 +128,12 @@ class BBCCoUkIE(InfoExtractor):
'note': 'Audio',
'info_dict': {
'id': 'p022h44j',
'ext': 'flv',
'ext': 'mp4',
'title': 'BBC Proms Music Guides, Rachmaninov: Symphonic Dances',
'description': "In this Proms Music Guide, Andrew McGregor looks at Rachmaninov's Symphonic Dances.",
'duration': 227,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
@@ -141,13 +141,12 @@ class BBCCoUkIE(InfoExtractor):
'note': 'Video',
'info_dict': {
'id': 'p025c103',
'ext': 'flv',
'ext': 'mp4',
'title': 'Reading and Leeds Festival, 2014, Rae Morris - Closer (Live on BBC Three)',
'description': 'Rae Morris performs Closer for BBC Three at Reading 2014',
'duration': 226,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
@@ -163,7 +162,7 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
'skip': 'this episode is not currently available',
}, {
'url': 'http://www.bbc.co.uk/iplayer/episode/b05zmgwn/royal-academy-summer-exhibition',
'info_dict': {
@@ -177,7 +176,7 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
'skip': 'this episode is not currently available',
}, {
# iptv-all mediaset fails with geolocation however there is no geo restriction
# for this programme at all
@@ -192,18 +191,17 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
'skip': 'Now it\'s really geo-restricted',
'skip': 'this episode is not currently available on BBC iPlayer Radio',
}, {
# compact player (https://github.com/rg3/youtube-dl/issues/8147)
'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
'info_dict': {
'id': 'p028bfkj',
'ext': 'flv',
'ext': 'mp4',
'title': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
'description': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
@@ -248,9 +246,15 @@ class BBCCoUkIE(InfoExtractor):
elif transfer_format == 'dash':
pass
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
is_unified_streaming = re.search(self._USP_RE, href)
if is_unified_streaming:
href = re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href)
m3u8_formats = self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=supplier, fatal=False))
m3u8_id=supplier, fatal=False)
if is_unified_streaming:
self._check_formats(m3u8_formats, programme_id)
formats.extend(m3u8_formats)
# Direct link
else:
formats.append({
@@ -305,13 +309,14 @@ class BBCCoUkIE(InfoExtractor):
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
if format.get('protocol') != 'm3u8_native':
format.update({
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
if service:
format['format_id'] = '%s_%s' % (service, format['format_id'])
formats.extend(conn_formats)
+24 -29
View File
@@ -2,11 +2,15 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
from ..utils import (
ExtractorError,
remove_end,
)
from .rudo import RudoIE
class BioBioChileTVIE(InfoExtractor):
_VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_VALID_URL = r'https?://(?:tv|www)\.biobiochile\.cl/(?:notas|noticias)/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_TESTS = [{
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
@@ -18,6 +22,7 @@ class BioBioChileTVIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Fernando Atria',
},
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, {
# different uploader layout
'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
@@ -32,6 +37,16 @@ class BioBioChileTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, {
'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
'info_dict': {
'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
'ext': 'mp4',
'uploader': '(none)',
'upload_date': '20160708',
'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
},
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
'only_matching': True,
@@ -45,42 +60,22 @@ class BioBioChileTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
rudo_url = RudoIE._extract_url(webpage)
if not rudo_url:
raise ExtractorError('No videos found')
title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
file_url = self._search_regex(
r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
webpage, 'file url', group='url')
base_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
group='url')
formats = self._extract_m3u8_formats(
'%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
f = {
'url': '%s%s' % (base_url, file_url),
'format_id': 'http',
'protocol': 'http',
'preference': 1,
}
if formats:
f_copy = formats[-1].copy()
f_copy.update(f)
f = f_copy
formats.append(f)
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>',
r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False)
return {
'_type': 'url_transparent',
'url': rudo_url,
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'formats': formats,
}
+2 -1
View File
@@ -26,6 +26,7 @@ from ..utils import (
unescapeHTML,
unsmuggle_url,
update_url_query,
clean_html,
)
@@ -620,7 +621,7 @@ class BrightcoveNewIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'description': json_data.get('description'),
'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000),
'timestamp': parse_iso8601(json_data.get('published_at')),
+90 -19
View File
@@ -44,6 +44,7 @@ from ..utils import (
sanitized_Request,
unescapeHTML,
unified_strdate,
unified_timestamp,
url_basename,
xpath_element,
xpath_text,
@@ -54,6 +55,8 @@ from ..utils import (
update_Request,
update_url_query,
parse_m3u8_attributes,
extract_attributes,
parse_codecs,
)
@@ -161,6 +164,7 @@ class InfoExtractor(object):
* "height" (optional, int)
* "resolution" (optional, string "{width}x{height"},
deprecated)
* "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image.
description: Full video description.
uploader: Full name of the video uploader.
@@ -803,15 +807,17 @@ class InfoExtractor(object):
return self._html_search_meta('twitter:player', html,
'twitter card player')
def _search_json_ld(self, html, video_id, **kwargs):
def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld = self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
html, 'JSON-LD', group='json_ld', **kwargs)
if not json_ld:
return {}
return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True))
return self._json_ld(
json_ld, video_id, fatal=kwargs.get('fatal', True),
expected_type=expected_type)
def _json_ld(self, json_ld, video_id, fatal=True):
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str):
json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
if not json_ld:
@@ -819,6 +825,8 @@ class InfoExtractor(object):
info = {}
if json_ld.get('@context') == 'http://schema.org':
item_type = json_ld.get('@type')
if expected_type is not None and expected_type != item_type:
return info
if item_type == 'TVEpisode':
info.update({
'episode': unescapeHTML(json_ld.get('name')),
@@ -837,6 +845,19 @@ class InfoExtractor(object):
'title': unescapeHTML(json_ld.get('headline')),
'description': unescapeHTML(json_ld.get('articleBody')),
})
elif item_type == 'VideoObject':
info.update({
'url': json_ld.get('contentUrl'),
'title': unescapeHTML(json_ld.get('name')),
'description': unescapeHTML(json_ld.get('description')),
'thumbnail': json_ld.get('thumbnailUrl'),
'duration': parse_duration(json_ld.get('duration')),
'timestamp': unified_timestamp(json_ld.get('uploadDate')),
'filesize': float_or_none(json_ld.get('contentSize')),
'tbr': int_or_none(json_ld.get('bitrate')),
'width': int_or_none(json_ld.get('width')),
'height': int_or_none(json_ld.get('height')),
})
return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod
@@ -1186,6 +1207,7 @@ class InfoExtractor(object):
'url': format_url(line.strip()),
'tbr': tbr,
'ext': ext,
'fps': float_or_none(last_info.get('FRAME-RATE')),
'protocol': entry_protocol,
'preference': preference,
}
@@ -1194,24 +1216,17 @@ class InfoExtractor(object):
width_str, height_str = resolution.split('x')
f['width'] = int(width_str)
f['height'] = int(height_str)
codecs = last_info.get('CODECS')
if codecs:
vcodec, acodec = [None] * 2
va_codecs = codecs.split(',')
if len(va_codecs) == 1:
# Audio only entries usually come with single codec and
# no resolution. For more robustness we also check it to
# be mp4 audio.
if not resolution and va_codecs[0].startswith('mp4a'):
vcodec, acodec = 'none', va_codecs[0]
else:
vcodec = va_codecs[0]
else:
vcodec, acodec = va_codecs[:2]
# Unified Streaming Platform
mobj = re.search(
r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', f['url'])
if mobj:
abr, vbr = mobj.groups()
abr, vbr = float_or_none(abr, 1000), float_or_none(vbr, 1000)
f.update({
'acodec': acodec,
'vcodec': vcodec,
'vbr': vbr,
'abr': abr,
})
f.update(parse_codecs(last_info.get('CODECS')))
if last_media is not None:
f['m3u8_media'] = last_media
last_media = None
@@ -1616,6 +1631,62 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
def _parse_html5_media_entries(self, base_url, webpage):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
def parse_content_type(content_type):
if not content_type:
return {}
ctr = re.search(r'(?P<mimetype>[^/]+/[^;]+)(?:;\s*codecs="?(?P<codecs>[^"]+))?', content_type)
if ctr:
mimetype, codecs = ctr.groups()
f = parse_codecs(codecs)
f['ext'] = mimetype2ext(mimetype)
return f
return {}
entries = []
for media_tag, media_type, media_content in re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage):
media_info = {
'formats': [],
'subtitles': {},
}
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
if src:
media_info['formats'].append({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['thumbnail'] = media_attributes.get('poster')
if media_content:
for source_tag in re.findall(r'<source[^>]+>', media_content):
source_attributes = extract_attributes(source_tag)
src = source_attributes.get('src')
if not src:
continue
f = parse_content_type(source_attributes.get('type'))
f.update({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['formats'].append(f)
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind == 'subtitles':
src = track_attributes.get('src')
if not src:
continue
lang = track_attributes.get('srclang') or track_attributes.get('lang') or track_attributes.get('label')
media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src),
})
if media_info['formats']:
entries.append(media_info)
return entries
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
+19 -50
View File
@@ -4,78 +4,47 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
int_or_none,
clean_html,
)
class DBTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:(?:lazyplayer|player)/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
_TESTS = [{
'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'md5': 'b89953ed25dacb6edb3ef6c6f430f8bc',
'md5': '2e24f67936517b143a234b4cadf792ec',
'info_dict': {
'id': '33100',
'id': '3649835190001',
'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'ext': 'mp4',
'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
'thumbnail': 're:https?://.*\.jpg$',
'timestamp': 1404039863.438,
'thumbnail': 're:https?://.*\.jpg',
'timestamp': 1404039863,
'upload_date': '20140629',
'duration': 69.544,
'view_count': int,
'categories': list,
}
'uploader_id': '1027729757001',
},
'add_ie': ['BrightcoveNew']
}, {
'url': 'http://dbtv.no/3649835190001',
'only_matching': True,
}, {
'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/vice/5000634109001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/filmtrailer/3359293614001',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
data = self._download_json(
'http://api.dbtv.no/discovery/%s' % video_id, display_id)
video = data['playlist'][0]
formats = [{
'url': f['URL'],
'vcodec': f.get('container'),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'vbr': float_or_none(f.get('rate'), 1000),
'filesize': int_or_none(f.get('size')),
} for f in video['renditions'] if 'URL' in f]
if not formats:
for url_key, format_id in [('URL', 'mp4'), ('HLSURL', 'hls')]:
if url_key in video:
formats.append({
'url': video[url_key],
'format_id': format_id,
})
self._sort_formats(formats)
video_id, display_id = re.match(self._VALID_URL, url).groups()
return {
'id': compat_str(video['id']),
'_type': 'url_transparent',
'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
'id': video_id,
'display_id': display_id,
'title': video['title'],
'description': clean_html(video['desc']),
'thumbnail': video.get('splash') or video.get('thumb'),
'timestamp': float_or_none(video.get('publishedAt'), 1000),
'duration': float_or_none(video.get('length'), 1000),
'view_count': int_or_none(video.get('views')),
'categories': video.get('tags'),
'formats': formats,
'ie_key': 'BrightcoveNew',
}
+37 -13
View File
@@ -6,12 +6,13 @@ import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
NO_DEFAULT,
)
class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
'md5': '4294cf98bc165f218aaa0b89e0fd8042',
'info_dict': {
@@ -22,24 +23,47 @@ class EllenTVIE(InfoExtractor):
'timestamp': 1428035648,
'upload_date': '20150403',
'uploader_id': 'batchUser',
}
}
},
}, {
# not available via http://widgets.ellentube.com/
'url': 'http://www.ellentv.com/videos/1-szkgu2m2/',
'info_dict': {
'id': '1_szkgu2m2',
'ext': 'flv',
'title': "Ellen's Amazingly Talented Audience",
'description': 'md5:86ff1e376ff0d717d7171590e273f0a5',
'timestamp': 1255140900,
'upload_date': '20091010',
'uploader_id': 'ellenkaltura@gmail.com',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://widgets.ellentube.com/videos/%s' % video_id,
video_id)
URLS = ('http://widgets.ellentube.com/videos/%s' % video_id, url)
partner_id = self._search_regex(
r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id')
for num, url_ in enumerate(URLS, 1):
webpage = self._download_webpage(
url_, video_id, fatal=num == len(URLS))
kaltura_id = self._search_regex(
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)",
r'data-kaltura-entry-id="([^"]+)'],
webpage, 'kaltura id')
default = NO_DEFAULT if num == len(URLS) else None
partner_id = self._search_regex(
r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id',
default=default)
kaltura_id = self._search_regex(
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)",
r'data-kaltura-entry-id="([^"]+)'],
webpage, 'kaltura id', default=default)
if partner_id and kaltura_id:
break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura')
+4
View File
@@ -256,6 +256,7 @@ from .fivemin import FiveMinIE
from .fivetv import FiveTVIE
from .fktv import FKTVIE
from .flickr import FlickrIE
from .flipagram import FlipagramIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .formula1 import Formula1IE
@@ -679,6 +680,7 @@ from .rice import RICEIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE
from .roosterteeth import RoosterTeethIE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
@@ -689,6 +691,7 @@ from .rtp import RTPIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
from .rtvnh import RTVNHIE
from .rudo import RudoIE
from .ruhd import RUHDIE
from .ruleporn import RulePornIE
from .rutube import (
@@ -986,6 +989,7 @@ from .viki import (
from .vk import (
VKIE,
VKUserVideosIE,
VKWallPostIE,
)
from .vlive import VLiveIE
from .vodlocker import VodlockerIE
+17 -4
View File
@@ -219,12 +219,25 @@ class FacebookIE(InfoExtractor):
BEFORE = '{swf.addParam(param[0], param[1]);});'
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER), webpage)
if m:
swf_params = m.group(1).replace('\\\\', '\\').replace('\\"', '"')
PATTERN = re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER)
for m in re.findall(PATTERN, webpage):
swf_params = m.replace('\\\\', '\\').replace('\\"', '"')
data = dict(json.loads(swf_params))
params_raw = compat_urllib_parse_unquote(data['params'])
video_data = json.loads(params_raw)['video_data']
video_data_candidate = json.loads(params_raw)['video_data']
for _, f in video_data_candidate.items():
if not f:
continue
if isinstance(f, dict):
f = [f]
if not isinstance(f, list):
continue
if f[0].get('video_id') == video_id:
video_data = video_data_candidate
break
if video_data:
break
def video_data_list2dict(video_data):
ret = {}
+115
View File
@@ -0,0 +1,115 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
try_get,
unified_timestamp,
)
class FlipagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?flipagram\.com/f/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://flipagram.com/f/nyvTSJMKId',
'md5': '888dcf08b7ea671381f00fab74692755',
'info_dict': {
'id': 'nyvTSJMKId',
'ext': 'mp4',
'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
'description': 'md5:d55e32edc55261cae96a41fa85ff630e',
'duration': 35.571,
'timestamp': 1461244995,
'upload_date': '20160421',
'uploader': 'kitty juria',
'uploader_id': 'sjuria101',
'creator': 'kitty juria',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
'comments': list,
'formats': 'mincount:2',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(
self._search_regex(
r'window\.reactH2O\s*=\s*({.+});', webpage, 'video data'),
video_id)
flipagram = video_data['flipagram']
video = flipagram['video']
json_ld = self._search_json_ld(webpage, video_id, default=False)
title = json_ld.get('title') or flipagram['captionText']
description = json_ld.get('description') or flipagram.get('captionText')
formats = [{
'url': video['url'],
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': int_or_none(video_data.get('size')),
}]
preview_url = try_get(
flipagram, lambda x: x['music']['track']['previewUrl'], compat_str)
if preview_url:
formats.append({
'url': preview_url,
'ext': 'm4a',
'vcodec': 'none',
})
self._sort_formats(formats)
counts = flipagram.get('counts', {})
user = flipagram.get('user', {})
video_data = flipagram.get('video', {})
thumbnails = [{
'url': self._proto_relative_url(cover['url']),
'width': int_or_none(cover.get('width')),
'height': int_or_none(cover.get('height')),
'filesize': int_or_none(cover.get('size')),
} for cover in flipagram.get('covers', []) if cover.get('url')]
# Note that this only retrieves comments that are initally loaded.
# For videos with large amounts of comments, most won't be retrieved.
comments = []
for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
text = comment.get('comment')
if not text or not isinstance(text, list):
continue
comments.append({
'author': comment.get('user', {}).get('name'),
'author_id': comment.get('user', {}).get('username'),
'id': comment.get('id'),
'text': text[0],
'timestamp': unified_timestamp(comment.get('created')),
})
return {
'id': video_id,
'title': title,
'description': description,
'duration': float_or_none(flipagram.get('duration'), 1000),
'thumbnails': thumbnails,
'timestamp': unified_timestamp(flipagram.get('iso8601Created')),
'uploader': user.get('name'),
'uploader_id': user.get('username'),
'creator': user.get('name'),
'view_count': int_or_none(counts.get('plays')),
'like_count': int_or_none(counts.get('likes')),
'repost_count': int_or_none(counts.get('reflips')),
'comment_count': int_or_none(counts.get('comments')),
'comments': comments,
'formats': formats,
}
+45
View File
@@ -1313,6 +1313,38 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Kaltura'],
},
{
# Non-standard Vimeo embed
'url': 'https://openclassrooms.com/courses/understanding-the-web',
'md5': '64d86f1c7d369afd9a78b38cbb88d80a',
'info_dict': {
'id': '148867247',
'ext': 'mp4',
'title': 'Understanding the web - Teaser',
'description': 'This is "Understanding the web - Teaser" by openclassrooms on Vimeo, the home for high quality videos and the people who love them.',
'upload_date': '20151214',
'uploader': 'OpenClassrooms',
'uploader_id': 'openclassrooms',
},
'add_ie': ['Vimeo'],
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
# 'url': 'https://flipagram.com/f/nyvTSJMKId',
# 'md5': '888dcf08b7ea671381f00fab74692755',
# 'info_dict': {
# 'id': 'nyvTSJMKId',
# 'ext': 'mp4',
# 'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
# 'description': '#love for cats.',
# 'timestamp': 1461244995,
# 'upload_date': '20160421',
# },
# 'params': {
# 'force_generic_extractor': True,
# },
# }
]
def report_following_redirect(self, new_url):
@@ -2157,6 +2189,19 @@ class GenericIE(InfoExtractor):
if embed_url:
return self.url_result(embed_url)
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default=None, expected_type='VideoObject')
if json_ld and json_ld.get('url'):
info_dict.update({
'title': video_title or info_dict['title'],
'description': video_description,
'thumbnail': video_thumbnail,
'age_limit': age_limit
})
info_dict.update(json_ld)
return info_dict
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True
+90 -45
View File
@@ -23,6 +23,7 @@ from ..utils import (
str_or_none,
url_basename,
urshift,
update_url_query,
)
@@ -89,6 +90,10 @@ class LeIE(InfoExtractor):
_loc3_ = self.ror(_loc3_, _loc2_ % 17)
return _loc3_
# reversed from http://jstatic.letvcdn.com/sdk/player.js
def get_mms_key(self, time):
return self.ror(time, 8) ^ 185025305
# see M3U8Encryption class in KLetvPlayer.swf
@staticmethod
def decrypt_m3u8(encrypted_data):
@@ -109,23 +114,7 @@ class LeIE(InfoExtractor):
return bytes(_loc7_)
def _real_extract(self, url):
media_id = self._match_id(url)
page = self._download_webpage(url, media_id)
params = {
'id': media_id,
'platid': 1,
'splatid': 101,
'format': 1,
'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.le.com'
}
play_json = self._download_json(
'http://api.le.com/mms/out/video/playJson',
media_id, 'Downloading playJson data', query=params,
headers=self.geo_verification_headers())
def _check_errors(self, play_json):
# Check for errors
playstatus = play_json['playstatus']
if playstatus['status'] == 0:
@@ -136,43 +125,99 @@ class LeIE(InfoExtractor):
msg = 'Generic error. flag = %d' % flag
raise ExtractorError(msg, expected=True)
playurl = play_json['playurl']
def _real_extract(self, url):
media_id = self._match_id(url)
page = self._download_webpage(url, media_id)
formats = ['350', '1000', '1300', '720p', '1080p']
dispatch = playurl['dispatch']
play_json_h5 = self._download_json(
'http://api.le.com/mms/out/video/playJsonH5',
media_id, 'Downloading html5 playJson data', query={
'id': media_id,
'platid': 3,
'splatid': 304,
'format': 1,
'tkey': self.get_mms_key(int(time.time())),
'domain': 'www.le.com',
'tss': 'no',
},
headers=self.geo_verification_headers())
self._check_errors(play_json_h5)
urls = []
for format_id in formats:
if format_id in dispatch:
media_url = playurl['domain'][0] + dispatch[format_id][0]
media_url += '&' + compat_urllib_parse_urlencode({
'm3v': 1,
play_json_flash = self._download_json(
'http://api.le.com/mms/out/video/playJson',
media_id, 'Downloading flash playJson data', query={
'id': media_id,
'platid': 1,
'splatid': 101,
'format': 1,
'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.le.com',
},
headers=self.geo_verification_headers())
self._check_errors(play_json_flash)
def get_h5_urls(media_url, format_id):
location = self._download_json(
media_url, media_id,
'Download JSON metadata for format %s' % format_id, query={
'format': 1,
'expect': 3,
'rateid': format_id,
})
'tss': 'no',
})['location']
nodes_data = self._download_json(
media_url, media_id,
'Download JSON metadata for format %s' % format_id)
return {
'http': update_url_query(location, {'tss': 'no'}),
'hls': update_url_query(location, {'tss': 'ios'}),
}
req = self._request_webpage(
nodes_data['nodelist'][0]['location'], media_id,
note='Downloading m3u8 information for format %s' % format_id)
def get_flash_urls(media_url, format_id):
media_url += '&' + compat_urllib_parse_urlencode({
'm3v': 1,
'format': 1,
'expect': 3,
'rateid': format_id,
})
m3u8_data = self.decrypt_m3u8(req.read())
nodes_data = self._download_json(
media_url, media_id,
'Download JSON metadata for format %s' % format_id)
url_info_dict = {
'url': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
'ext': determine_ext(dispatch[format_id][1]),
'format_id': format_id,
'protocol': 'm3u8',
}
req = self._request_webpage(
nodes_data['nodelist'][0]['location'], media_id,
note='Downloading m3u8 information for format %s' % format_id)
if format_id[-1:] == 'p':
url_info_dict['height'] = int_or_none(format_id[:-1])
m3u8_data = self.decrypt_m3u8(req.read())
urls.append(url_info_dict)
return {
'hls': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
}
extracted_formats = []
formats = []
for play_json, get_urls in ((play_json_h5, get_h5_urls), (play_json_flash, get_flash_urls)):
playurl = play_json['playurl']
play_domain = playurl['domain'][0]
for format_id, format_data in playurl.get('dispatch', []).items():
if format_id in extracted_formats:
continue
extracted_formats.append(format_id)
media_url = play_domain + format_data[0]
for protocol, format_url in get_urls(media_url, format_id).items():
f = {
'url': format_url,
'ext': determine_ext(format_data[1]),
'format_id': '%s-%s' % (protocol, format_id),
'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
'quality': int_or_none(format_id),
}
if format_id[-1:] == 'p':
f['height'] = int_or_none(format_id[:-1])
formats.append(f)
self._sort_formats(formats, ('height', 'quality', 'format_id'))
publish_time = parse_iso8601(self._html_search_regex(
r'发布时间&nbsp;([^<>]+) ', page, 'publish time', default=None),
@@ -181,7 +226,7 @@ class LeIE(InfoExtractor):
return {
'id': media_id,
'formats': urls,
'formats': formats,
'title': playurl['title'],
'thumbnail': playurl['pic'],
'description': description,
+1 -1
View File
@@ -100,7 +100,7 @@ class LyndaIE(LyndaBaseIE):
_TESTS = [{
'url': 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
'md5': 'ecfc6862da89489161fb9cd5f5a6fac1',
# md5 is unstable
'info_dict': {
'id': '114408',
'ext': 'mp4',
+2 -1
View File
@@ -26,7 +26,8 @@ class MGTVIE(InfoExtractor):
video_id = self._match_id(url)
api_data = self._download_json(
'http://v.api.mgtv.com/player/video', video_id,
query={'video_id': video_id})['data']
query={'video_id': video_id},
headers=self.geo_verification_headers())['data']
info = api_data['info']
formats = []
+45 -14
View File
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import random
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
xpath_text,
int_or_none,
@@ -18,13 +19,16 @@ class MioMioIE(InfoExtractor):
_TESTS = [{
# "type=video" in flashvars
'url': 'http://www.miomio.tv/watch/cc88912/',
'md5': '317a5f7f6b544ce8419b784ca8edae65',
'info_dict': {
'id': '88912',
'ext': 'flv',
'title': '【SKY】字幕 铠武昭和VS平成 假面骑士大战FEAT战队 魔星字幕组 字幕',
'duration': 5923,
},
'params': {
# The server provides broken file
'skip_download': True,
}
}, {
'url': 'http://www.miomio.tv/watch/cc184024/',
'info_dict': {
@@ -32,7 +36,7 @@ class MioMioIE(InfoExtractor):
'title': '《动漫同人插画绘制》',
},
'playlist_mincount': 86,
'skip': 'This video takes time too long for retrieving the URL',
'skip': 'Unable to load videos',
}, {
'url': 'http://www.miomio.tv/watch/cc173113/',
'info_dict': {
@@ -40,20 +44,23 @@ class MioMioIE(InfoExtractor):
'title': 'The New Macbook 2015 上手试玩与简评'
},
'playlist_mincount': 2,
'skip': 'Unable to load videos',
}, {
# new 'h5' player
'url': 'http://www.miomio.tv/watch/cc273295/',
'md5': '',
'info_dict': {
'id': '273295',
'ext': 'mp4',
'title': 'アウト×デラックス 20160526',
},
'params': {
# intermittent HTTP 500
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'description', webpage, 'title', fatal=True)
mioplayer_path = self._search_regex(
r'src="(/mioplayer/[^"]+)"', webpage, 'ref_path')
http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
def _extract_mioplayer(self, webpage, video_id, title, http_headers):
xml_config = self._search_regex(
r'flashvars="type=(?:sina|video)&amp;(.+?)&amp;',
webpage, 'xml config')
@@ -92,10 +99,34 @@ class MioMioIE(InfoExtractor):
'http_headers': http_headers,
})
return entries
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'description', webpage, 'title', fatal=True)
mioplayer_path = self._search_regex(
r'src="(/mioplayer(?:_h5)?/[^"]+)"', webpage, 'ref_path')
if '_h5' in mioplayer_path:
player_url = compat_urlparse.urljoin(url, mioplayer_path)
player_webpage = self._download_webpage(
player_url, video_id,
note='Downloading player webpage', headers={'Referer': url})
entries = self._parse_html5_media_entries(player_url, player_webpage)
http_headers = {'Referer': player_url}
else:
http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
entries = self._extract_mioplayer(webpage, video_id, title, http_headers)
if len(entries) == 1:
segment = entries[0]
segment['id'] = video_id
segment['title'] = title
segment['http_headers'] = http_headers
return segment
return {
+65 -50
View File
@@ -12,12 +12,69 @@ from ..utils import (
get_element_by_attribute,
int_or_none,
remove_start,
extract_attributes,
determine_ext,
)
class MiTeleIE(InfoExtractor):
class MiTeleBaseIE(InfoExtractor):
def _get_player_info(self, url, webpage):
player_data = extract_attributes(self._search_regex(
r'(?s)(<ms-video-player.+?</ms-video-player>)',
webpage, 'ms video player'))
video_id = player_data['data-media-id']
config_url = compat_urlparse.urljoin(url, player_data['data-config'])
config = self._download_json(
config_url, video_id, 'Downloading config JSON')
mmc_url = config['services']['mmc']
duration = None
formats = []
for m_url in (mmc_url, mmc_url.replace('/flash.json', '/html5.json')):
mmc = self._download_json(
m_url, video_id, 'Downloading mmc JSON')
if not duration:
duration = int_or_none(mmc.get('duration'))
for location in mmc['locations']:
gat = self._proto_relative_url(location.get('gat'), 'http:')
bas = location.get('bas')
loc = location.get('loc')
ogn = location.get('ogn')
if None in (gat, bas, loc, ogn):
continue
token_data = {
'bas': bas,
'icd': loc,
'ogn': ogn,
'sta': '0',
}
media = self._download_json(
'%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
video_id, 'Downloading %s JSON' % location['loc'])
file_ = media.get('file')
if not file_:
continue
ext = determine_ext(file_)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
file_, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'thumbnail': player_data.get('data-poster') or config.get('poster', {}).get('imageUrl'),
'duration': duration,
}
class MiTeleIE(MiTeleBaseIE):
IE_DESC = 'mitele.es'
_VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'
_VALID_URL = r'https?://www\.mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
_TESTS = [{
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
@@ -25,7 +82,7 @@ class MiTeleIE(InfoExtractor):
'info_dict': {
'id': '0NF1jJnxS1Wu3pHrmvFyw2',
'display_id': 'programa-144',
'ext': 'flv',
'ext': 'mp4',
'title': 'Tor, la web invisible',
'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
'series': 'Diario de',
@@ -40,7 +97,7 @@ class MiTeleIE(InfoExtractor):
'info_dict': {
'id': 'eLZSwoEd1S3pVyUm8lc6F',
'display_id': 'programa-226',
'ext': 'flv',
'ext': 'mp4',
'title': 'Cuarto Milenio - Temporada 6 - Programa 226',
'description': 'md5:50daf9fadefa4e62d9fc866d0c015701',
'series': 'Cuarto Milenio',
@@ -59,40 +116,7 @@ class MiTeleIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
config_url = self._search_regex(
r'data-config\s*=\s*"([^"]+)"', webpage, 'data config url')
config_url = compat_urlparse.urljoin(url, config_url)
config = self._download_json(
config_url, display_id, 'Downloading config JSON')
mmc = self._download_json(
config['services']['mmc'], display_id, 'Downloading mmc JSON')
formats = []
for location in mmc['locations']:
gat = self._proto_relative_url(location.get('gat'), 'http:')
bas = location.get('bas')
loc = location.get('loc')
ogn = location.get('ogn')
if None in (gat, bas, loc, ogn):
continue
token_data = {
'bas': bas,
'icd': loc,
'ogn': ogn,
'sta': '0',
}
media = self._download_json(
'%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
display_id, 'Downloading %s JSON' % location['loc'])
file_ = media.get('file')
if not file_:
continue
formats.extend(self._extract_f4m_formats(
file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
display_id, f4m_id=loc))
self._sort_formats(formats)
info = self._get_player_info(url, webpage)
title = self._search_regex(
r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>',
@@ -112,21 +136,12 @@ class MiTeleIE(InfoExtractor):
title = remove_start(self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), 'Ver online ')
video_id = self._search_regex(
r'data-media-id\s*=\s*"([^"]+)"', webpage,
'data media id', default=None) or display_id
thumbnail = config.get('poster', {}).get('imageUrl')
duration = int_or_none(mmc.get('duration'))
return {
'id': video_id,
info.update({
'display_id': display_id,
'title': title,
'description': get_element_by_attribute('class', 'text', webpage),
'series': series,
'season': season,
'episode': episode,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}
})
return info
+4 -1
View File
@@ -8,7 +8,7 @@ from ..utils import update_url_query
class NickIE(MTVServicesInfoExtractor):
IE_NAME = 'nick.com'
_VALID_URL = r'https?://(?:www\.)?nick\.com/videos/clip/(?P<id>[^/?#.]+)'
_VALID_URL = r'https?://(?:www\.)?nick(?:jr)?\.com/(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm'
_TESTS = [{
'url': 'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
@@ -52,6 +52,9 @@ class NickIE(MTVServicesInfoExtractor):
}
},
],
}, {
'url': 'http://www.nickjr.com/paw-patrol/videos/pups-save-a-goldrush-s3-ep302-full-episode/',
'only_matching': True,
}]
def _get_feed_query(self, uri):
+14 -3
View File
@@ -15,7 +15,7 @@ from ..utils import (
class PlayvidIE(InfoExtractor):
_VALID_URL = r'https?://www\.playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
_TEST = {
_TESTS = [{
'url': 'http://www.playvid.com/watch/RnmBNgtrrJu',
'md5': 'ffa2f6b2119af359f544388d8c01eb6c',
'info_dict': {
@@ -24,8 +24,19 @@ class PlayvidIE(InfoExtractor):
'title': 'md5:9256d01c6317e3f703848b5906880dc8',
'duration': 82,
'age_limit': 18,
}
}
},
'skip': 'Video removed due to ToS',
}, {
'url': 'http://www.playvid.com/watch/hwb0GpNkzgH',
'md5': '39d49df503ad7b8f23a4432cbf046477',
'info_dict': {
'id': 'hwb0GpNkzgH',
'ext': 'mp4',
'title': 'Ellen Euro Cutie Blond Takes a Sexy Survey Get Facial in The Park',
'age_limit': 18,
'thumbnail': 're:^https?://.*\.jpg$',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
+4
View File
@@ -33,6 +33,7 @@ class PolskieRadioIE(InfoExtractor):
'timestamp': 1456594200,
'upload_date': '20160227',
'duration': 2364,
'thumbnail': 're:^https?://static\.prsa\.pl/images/.*\.jpg$'
},
}],
}, {
@@ -68,6 +69,8 @@ class PolskieRadioIE(InfoExtractor):
r'(?s)<span[^>]+id="datetime2"[^>]*>(.+?)</span>',
webpage, 'timestamp', fatal=False))
thumbnail_url = self._og_search_thumbnail(webpage)
entries = []
media_urls = set()
@@ -87,6 +90,7 @@ class PolskieRadioIE(InfoExtractor):
'duration': int_or_none(media.get('length')),
'vcodec': 'none' if media.get('provider') == 'audio' else None,
'timestamp': timestamp,
'thumbnail': thumbnail_url
})
title = self._og_search_title(webpage).strip()
+31 -13
View File
@@ -12,6 +12,7 @@ from ..utils import (
unified_strdate,
xpath_element,
ExtractorError,
determine_protocol,
)
@@ -22,13 +23,13 @@ class RadioCanadaIE(InfoExtractor):
'url': 'http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272',
'info_dict': {
'id': '7184272',
'ext': 'flv',
'ext': 'mp4',
'title': 'Le parcours du tireur capté sur vidéo',
'description': 'Images des caméras de surveillance fournies par la GRC montrant le parcours du tireur d\'Ottawa',
'upload_date': '20141023',
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
}
@@ -36,11 +37,14 @@ class RadioCanadaIE(InfoExtractor):
def _real_extract(self, url):
app_code, video_id = re.match(self._VALID_URL, url).groups()
device_types = ['ipad', 'android']
if app_code != 'toutv':
device_types.append('flash')
formats = []
# TODO: extract m3u8 and f4m formats
# m3u8 formats can be extracted using ipad device_type return 403 error code when ffmpeg try to download segements
# TODO: extract f4m formats
# f4m formats can be extracted using flashhd device_type but they produce unplayable file
for device_type in ('flash',):
for device_type in device_types:
v_data = self._download_xml(
'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx',
video_id, note='Downloading %s XML' % device_type, query={
@@ -52,7 +56,7 @@ class RadioCanadaIE(InfoExtractor):
# paysJ391wsHjbOJwvCs26toz and bypasslock are used to bypass geo-restriction
'paysJ391wsHjbOJwvCs26toz': 'CA',
'bypasslock': 'NZt5K62gRqfc',
})
}, fatal=False)
v_url = xpath_text(v_data, 'url')
if not v_url:
continue
@@ -64,7 +68,8 @@ class RadioCanadaIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats(
v_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(v_url, video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_f4m_formats(
v_url, video_id, f4m_id='hds', fatal=False))
else:
ext = determine_ext(v_url)
bitrates = xpath_element(v_data, 'bitrates')
@@ -72,15 +77,28 @@ class RadioCanadaIE(InfoExtractor):
tbr = int_or_none(url_e.get('bitrate'))
if not tbr:
continue
f_url = re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url)
protocol = determine_protocol({'url': f_url})
formats.append({
'format_id': 'rtmp-%d' % tbr,
'url': re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url),
'ext': 'flv',
'protocol': 'rtmp',
'format_id': '%s-%d' % (protocol, tbr),
'url': f_url,
'ext': 'flv' if protocol == 'rtmp' else ext,
'protocol': protocol,
'width': int_or_none(url_e.get('width')),
'height': int_or_none(url_e.get('height')),
'tbr': tbr,
})
if protocol == 'rtsp':
base_url = self._search_regex(
r'rtsp://([^?]+)', f_url, 'base url', default=None)
if base_url:
base_url = 'http://' + base_url
formats.extend(self._extract_m3u8_formats(
base_url + '/playlist.m3u8', video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
formats.extend(self._extract_f4m_formats(
base_url + '/manifest.f4m', video_id,
f4m_id='hds', fatal=False))
self._sort_formats(formats)
metadata = self._download_xml(
@@ -115,13 +133,13 @@ class RadioCanadaAudioVideoIE(InfoExtractor):
'url': 'http://ici.radio-canada.ca/audio-video/media-7527184/barack-obama-au-vietnam',
'info_dict': {
'id': '7527184',
'ext': 'flv',
'ext': 'mp4',
'title': 'Barack Obama au Vietnam',
'description': 'Les États-Unis lèvent l\'embargo sur la vente d\'armes qui datait de la guerre du Vietnam',
'upload_date': '20160523',
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
}
+148
View File
@@ -0,0 +1,148 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
strip_or_none,
unescapeHTML,
urlencode_postdata,
)
class RoosterTeethIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/episode/(?P<id>[^/?#&]+)'
_LOGIN_URL = 'https://roosterteeth.com/login'
_NETRC_MACHINE = 'roosterteeth'
_TESTS = [{
'url': 'http://roosterteeth.com/episode/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'md5': 'e2bd7764732d785ef797700a2489f212',
'info_dict': {
'id': '26576',
'display_id': 'million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'ext': 'mp4',
'title': 'Million Dollars, But...: Million Dollars, But... The Game Announcement',
'description': 'md5:0cc3b21986d54ed815f5faeccd9a9ca5',
'thumbnail': 're:^https?://.*\.png$',
'series': 'Million Dollars, But...',
'episode': 'Million Dollars, But... The Game Announcement',
'comment_count': int,
},
}, {
'url': 'http://achievementhunter.roosterteeth.com/episode/off-topic-the-achievement-hunter-podcast-2016-i-didn-t-think-it-would-pass-31',
'only_matching': True,
}, {
'url': 'http://funhaus.roosterteeth.com/episode/funhaus-shorts-2016-austin-sucks-funhaus-shorts',
'only_matching': True,
}, {
'url': 'http://screwattack.roosterteeth.com/episode/death-battle-season-3-mewtwo-vs-shadow',
'only_matching': True,
}, {
'url': 'http://theknow.roosterteeth.com/episode/the-know-game-news-season-1-boring-steam-sales-are-better',
'only_matching': True,
}, {
# only available for FIRST members
'url': 'http://roosterteeth.com/episode/rt-docs-the-world-s-greatest-head-massage-the-world-s-greatest-head-massage-an-asmr-journey-part-one',
'only_matching': True,
}]
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
login_page = self._download_webpage(
self._LOGIN_URL, None,
note='Downloading login page',
errnote='Unable to download login page')
login_form = self._hidden_inputs(login_page)
login_form.update({
'username': username,
'password': password,
})
login_request = self._download_webpage(
self._LOGIN_URL, None,
note='Logging in as %s' % username,
data=urlencode_postdata(login_form),
headers={
'Referer': self._LOGIN_URL,
})
if not any(re.search(p, login_request) for p in (
r'href=["\']https?://(?:www\.)?roosterteeth\.com/logout"',
r'>Sign Out<')):
error = self._html_search_regex(
r'(?s)<div[^>]+class=(["\']).*?\balert-danger\b.*?\1[^>]*>(?:\s*<button[^>]*>.*?</button>)?(?P<error>.+?)</div>',
login_request, 'alert', default=None, group='error')
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
def _real_initialize(self):
self._login()
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
episode = strip_or_none(unescapeHTML(self._search_regex(
(r'videoTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
r'<title>(?P<title>[^<]+)</title>'), webpage, 'title',
default=None, group='title')))
title = strip_or_none(self._og_search_title(
webpage, default=None)) or episode
m3u8_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>http.+?\.m3u8.*?)\1',
webpage, 'm3u8 url', default=None, group='url')
if not m3u8_url:
if re.search(r'<div[^>]+class=["\']non-sponsor', webpage):
self.raise_login_required(
'%s is only available for FIRST members' % display_id)
if re.search(r'<div[^>]+class=["\']golive-gate', webpage):
self.raise_login_required('%s is not available yet' % display_id)
raise ExtractorError('Unable to extract m3u8 URL')
formats = self._extract_m3u8_formats(
m3u8_url, display_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
description = strip_or_none(self._og_search_description(webpage))
thumbnail = self._proto_relative_url(self._og_search_thumbnail(webpage))
series = self._search_regex(
(r'<h2>More ([^<]+)</h2>', r'<a[^>]+>See All ([^<]+) Videos<'),
webpage, 'series', fatal=False)
comment_count = int_or_none(self._search_regex(
r'>Comments \((\d+)\)<', webpage,
'comment count', fatal=False))
video_id = self._search_regex(
(r'containerId\s*=\s*["\']episode-(\d+)\1',
r'<div[^<]+id=["\']episode-(\d+)'), webpage,
'video id', default=display_id)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'series': series,
'episode': episode,
'comment_count': comment_count,
'formats': formats,
}
+53
View File
@@ -0,0 +1,53 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from ..utils import (
js_to_json,
get_element_by_class,
unified_strdate,
)
class RudoIE(JWPlatformBaseIE):
_VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
_TEST = {
'url': 'http://rudo.video/vod/oTzw0MGnyG',
'md5': '2a03a5b32dd90a04c83b6d391cf7b415',
'info_dict': {
'id': 'oTzw0MGnyG',
'ext': 'mp4',
'title': 'Comentario Tomás Mosciatti',
'upload_date': '20160617',
},
}
@classmethod
def _extract_url(self, webpage):
mobj = re.search(
'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, encoding='iso-8859-1')
jwplayer_data = self._parse_json(self._search_regex(
r'(?s)playerInstance\.setup\(({.+?})\)', webpage, 'jwplayer data'), video_id,
transform_source=lambda s: js_to_json(re.sub(r'encodeURI\([^)]+\)', '""', s)))
info_dict = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, m3u8_id='hls')
info_dict.update({
'title': self._og_search_title(webpage),
'upload_date': unified_strdate(get_element_by_class('date', webpage)),
})
return info_dict
+26 -49
View File
@@ -2,11 +2,11 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
str_or_none,
)
@@ -33,45 +33,27 @@ class ShahidIE(InfoExtractor):
'only_matching': True
}]
def _handle_error(self, response):
if not isinstance(response, dict):
return
error = response.get('error')
def _call_api(self, path, video_id, note):
data = self._download_json(
'http://api.shahid.net/api/v1_1/' + path, video_id, note, query={
'apiKey': 'sh@hid0nlin3',
'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
}).get('data', {})
error = data.get('error')
if error:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, '\n'.join(error.values())),
expected=True)
def _download_json(self, url, video_id, note='Downloading JSON metadata'):
response = super(ShahidIE, self)._download_json(url, video_id, note)['data']
self._handle_error(response)
return response
return data
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
api_vars = {
'id': video_id,
'type': 'player',
'url': 'http://api.shahid.net/api/v1_1',
'playerType': 'episode',
}
flashvars = self._search_regex(
r'var\s+flashvars\s*=\s*({[^}]+})', webpage, 'flashvars', default=None)
if flashvars:
for key in api_vars.keys():
value = self._search_regex(
r'\b%s\s*:\s*(?P<q>["\'])(?P<value>.+?)(?P=q)' % key,
flashvars, 'type', default=None, group='value')
if value:
api_vars[key] = value
player = self._download_json(
'https://shahid.mbc.net/arContent/getPlayerContent-param-.id-%s.type-%s.html'
% (video_id, api_vars['type']), video_id, 'Downloading player JSON')
player = self._call_api(
'Content/Episode/%s' % video_id,
video_id, 'Downloading player JSON')
if player.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
@@ -79,22 +61,11 @@ class ShahidIE(InfoExtractor):
formats = self._extract_m3u8_formats(player['url'], video_id, 'mp4')
self._sort_formats(formats)
video = self._download_json(
'%s/%s/%s?%s' % (
api_vars['url'], api_vars['playerType'], api_vars['id'],
compat_urllib_parse_urlencode({
'apiKey': 'sh@hid0nlin3',
'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
})),
video_id, 'Downloading video JSON')
video = video[api_vars['playerType']]
video = self._call_api(
'episode/%s' % video_id, video_id,
'Downloading video JSON')['episode']
title = video['title']
description = video.get('description')
thumbnail = video.get('thumbnailUrl')
duration = int_or_none(video.get('duration'))
timestamp = parse_iso8601(video.get('referenceDate'))
categories = [
category['name']
for category in video.get('genres', []) if 'name' in category]
@@ -102,10 +73,16 @@ class ShahidIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'description': video.get('description'),
'thumbnail': video.get('thumbnailUrl'),
'duration': int_or_none(video.get('duration')),
'timestamp': parse_iso8601(video.get('referenceDate')),
'categories': categories,
'series': video.get('showTitle') or video.get('showName'),
'season': video.get('seasonTitle'),
'season_number': int_or_none(video.get('seasonNumber')),
'season_id': str_or_none(video.get('seasonId')),
'episode_number': int_or_none(video.get('number')),
'episode_id': video_id,
'formats': formats,
}
+4 -2
View File
@@ -11,7 +11,7 @@ from ..utils import (
class SRMediathekIE(ARDMediathekIE):
IE_NAME = 'sr:mediathek'
IE_DESC = 'Saarländischer Rundfunk'
_VALID_URL = r'https?://sr-mediathek\.sr-online\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://sr-mediathek.sr-online.de/index.php?seite=7&id=28455',
@@ -35,7 +35,9 @@ class SRMediathekIE(ARDMediathekIE):
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest']
}, {
'url': 'http://sr-mediathek.de/index.php?seite=7&id=7480',
'only_matching': True,
}]
def _real_extract(self, url):
+26 -59
View File
@@ -1,50 +1,41 @@
# coding: utf-8
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
get_element_by_attribute,
parse_duration,
strip_jsonp,
)
from .mitele import MiTeleBaseIE
class TelecincoIE(InfoExtractor):
class TelecincoIE(MiTeleBaseIE):
IE_DESC = 'telecinco.es, cuatro.com and mediaset.es'
_VALID_URL = r'https?://www\.(?:telecinco\.es|cuatro\.com|mediaset\.es)/(?:[^/]+/)+(?P<id>.+?)\.html'
_TESTS = [{
'url': 'http://www.telecinco.es/robinfood/temporada-01/t01xp14/Bacalao-cocochas-pil-pil_0_1876350223.html',
'md5': '5cbef3ad5ef17bf0d21570332d140729',
'md5': '8d7b2d5f699ee2709d992a63d5cd1712',
'info_dict': {
'id': 'MDSVID20141015_0058',
'id': 'JEA5ijCnF6p5W08A1rNKn7',
'ext': 'mp4',
'title': 'Con Martín Berasategui, hacer un bacalao al ...',
'title': 'Bacalao con kokotxas al pil-pil',
'description': 'md5:1382dacd32dd4592d478cbdca458e5bb',
'duration': 662,
},
}, {
'url': 'http://www.cuatro.com/deportes/futbol/barcelona/Leo_Messi-Champions-Roma_2_2052780128.html',
'md5': '0a5b9f3cc8b074f50a0578f823a12694',
'md5': '284393e5387b3b947b77c613ef04749a',
'info_dict': {
'id': 'MDSVID20150916_0128',
'id': 'jn24Od1zGLG4XUZcnUnZB6',
'ext': 'mp4',
'title': '¿Quién es este ex futbolista con el que hablan ...',
'title': '¿Quién es este ex futbolista con el que hablan Leo Messi y Luis Suárez?',
'description': 'md5:a62ecb5f1934fc787107d7b9a2262805',
'duration': 79,
},
}, {
'url': 'http://www.mediaset.es/12meses/campanas/doylacara/conlatratanohaytrato/Ayudame-dar-cara-trata-trato_2_1986630220.html',
'md5': 'ad1bfaaba922dd4a295724b05b68f86a',
'md5': '749afab6ea5a136a8806855166ae46a2',
'info_dict': {
'id': 'MDSVID20150513_0220',
'id': 'aywerkD2Sv1vGNqq9b85Q2',
'ext': 'mp4',
'title': '#DOYLACARA. Con la trata no hay trato',
'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
'duration': 50,
},
}, {
@@ -56,40 +47,16 @@ class TelecincoIE(InfoExtractor):
}]
def _real_extract(self, url):
episode = self._match_id(url)
webpage = self._download_webpage(url, episode)
embed_data_json = self._search_regex(
r'(?s)MSV\.embedData\[.*?\]\s*=\s*({.*?});', webpage, 'embed data',
).replace('\'', '"')
embed_data = json.loads(embed_data_json)
domain = embed_data['mediaUrl']
if not domain.startswith('http'):
# only happens in telecinco.es videos
domain = 'http://' + domain
info_url = compat_urlparse.urljoin(
domain,
compat_urllib_parse_unquote(embed_data['flashvars']['host'])
)
info_el = self._download_xml(info_url, episode).find('./video/info')
video_link = info_el.find('videoUrl/link').text
token_query = compat_urllib_parse_urlencode({'id': video_link})
token_info = self._download_json(
embed_data['flashvars']['ov_tk'] + '?' + token_query,
episode,
transform_source=strip_jsonp
)
formats = self._extract_m3u8_formats(
token_info['tokenizedUrl'], episode, ext='mp4', entry_protocol='m3u8_native')
self._sort_formats(formats)
return {
'id': embed_data['videoId'],
'display_id': episode,
'title': info_el.find('title').text,
'formats': formats,
'description': get_element_by_attribute('class', 'text', webpage),
'thumbnail': info_el.find('thumb').text,
'duration': parse_duration(info_el.find('duration').text),
}
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._html_search_meta(
['og:title', 'twitter:title'], webpage, 'title')
info = self._get_player_info(url, webpage)
info.update({
'display_id': display_id,
'title': title,
'description': self._html_search_meta(
['og:description', 'twitter:description'],
webpage, 'title', fatal=False),
})
return info
+12 -16
View File
@@ -5,31 +5,27 @@ from .common import InfoExtractor
class TMZIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tmz\.com/videos/(?P<id>[^/]+)/?'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?tmz\.com/videos/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.tmz.com/videos/0_okj015ty/',
'md5': '791204e3bf790b1426cb2db0706184c0',
'md5': '4d22a51ef205b6c06395d8394f72d560',
'info_dict': {
'id': '0_okj015ty',
'url': 'http://tmz.vo.llnwd.net/o28/2014-03/13/0_okj015ty_0_rt8ro3si_2.mp4',
'ext': 'mp4',
'title': 'Kim Kardashian\'s Boobs Unlock a Mystery!',
'description': 'Did Kim Kardasain try to one-up Khloe by one-upping Kylie??? Or is she just showing off her amazing boobs?',
'thumbnail': r're:http://cdnbakmi\.kaltura\.com/.*thumbnail.*',
'timestamp': 1394747163,
'uploader_id': 'batchUser',
'upload_date': '20140313',
}
}
}, {
'url': 'http://www.tmz.com/videos/0-cegprt2p/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
return {
'id': video_id,
'url': self._html_search_meta('VideoURL', webpage, fatal=True),
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': self._html_search_meta('ThumbURL', webpage),
}
video_id = self._match_id(url).replace('-', '_')
return self.url_result('kaltura:591531:%s' % video_id, 'Kaltura', video_id)
class TMZArticleIE(InfoExtractor):
+19 -52
View File
@@ -1,74 +1,41 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
unified_strdate,
)
from ..utils import int_or_none
class TouTvIE(InfoExtractor):
IE_NAME = 'tou.tv'
_VALID_URL = r'https?://www\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+(?:/(?P<episode>S[0-9]+E[0-9]+)))'
_VALID_URL = r'https?://ici\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+/S[0-9]+E[0-9]+)'
_TEST = {
'url': 'http://www.tou.tv/30-vies/S04E41',
'url': 'http://ici.tou.tv/garfield-tout-court/S2015E17',
'info_dict': {
'id': '30-vies_S04E41',
'id': '122017',
'ext': 'mp4',
'title': '30 vies Saison 4 / Épisode 41',
'description': 'md5:da363002db82ccbe4dafeb9cab039b09',
'age_limit': 8,
'uploader': 'Groupe des Nouveaux Médias',
'duration': 1296,
'upload_date': '20131118',
'thumbnail': 'http://static.tou.tv/medias/images/2013-11-18_19_00_00_30VIES_0341_01_L.jpeg',
'title': 'Saison 2015 Épisode 17',
'description': 'La photo de famille 2',
'upload_date': '20100717',
},
'params': {
'skip_download': True, # Requires rtmpdump
# m3u8 download
'skip_download': True,
},
'skip': 'Only available in Canada'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
mediaId = self._search_regex(
r'"idMedia":\s*"([^"]+)"', webpage, 'media ID')
streams_url = 'http://release.theplatform.com/content.select?pid=' + mediaId
streams_doc = self._download_xml(
streams_url, video_id, note='Downloading stream list')
video_url = next(n.text
for n in streams_doc.findall('.//choice/url')
if '//ad.doubleclick' not in n.text)
if video_url.endswith('/Unavailable.flv'):
raise ExtractorError(
'Access to this video is blocked from outside of Canada',
expected=True)
duration_str = self._html_search_meta(
'video:duration', webpage, 'duration')
duration = int(duration_str) if duration_str else None
upload_date_str = self._html_search_meta(
'video:release_date', webpage, 'upload date')
upload_date = unified_strdate(upload_date_str) if upload_date_str else None
path = self._match_id(url)
metadata = self._download_json('http://ici.tou.tv/presentation/%s' % path, path)
video_id = metadata['IdMedia']
details = metadata['Details']
title = details['OriginalTitle']
return {
'_type': 'url_transparent',
'url': 'radiocanada:%s:%s' % (metadata.get('AppCode', 'toutv'), video_id),
'id': video_id,
'title': self._og_search_title(webpage),
'url': video_url,
'description': self._og_search_description(webpage),
'uploader': self._dc_search_uploader(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'age_limit': self._media_rating_search(webpage),
'duration': duration,
'upload_date': upload_date,
'ext': 'mp4',
'title': title,
'thumbnail': details.get('ImageUrl'),
'duration': int_or_none(details.get('LengthInSeconds')),
}
+64 -59
View File
@@ -8,43 +8,36 @@ from ..compat import compat_str
from ..utils import (
parse_iso8601,
qualities,
determine_ext,
update_url_query,
int_or_none,
)
class TVPlayIE(InfoExtractor):
IE_DESC = 'TV3Play and related services'
_VALID_URL = r'''(?x)https?://(?:www\.)?
(?:tvplay\.lv/parraides|
tv3play\.lt/programos|
play\.tv3\.lt/programos|
tv3play\.ee/sisu|
tv3play\.se/program|
tv6play\.se/program|
tv8play\.se/program|
tv10play\.se/program|
tv3play\.no/programmer|
viasat4play\.no/programmer|
tv6play\.no/programmer|
tv3play\.dk/programmer|
(?:tvplay(?:\.skaties)?\.lv/parraides|
(?:tv3play|play\.tv3)\.lt/programos|
tv3play(?:\.tv3)?\.ee/sisu|
tv(?:3|6|8|10)play\.se/program|
(?:(?:tv3play|viasat4play|tv6play)\.no|tv3play\.dk)/programmer|
play\.novatv\.bg/programi
)/[^/]+/(?P<id>\d+)
'''
_TESTS = [
{
'url': 'http://www.tvplay.lv/parraides/vinas-melo-labak/418113?autostart=true',
'md5': 'a1612fe0849455423ad8718fe049be21',
'info_dict': {
'id': '418113',
'ext': 'flv',
'ext': 'mp4',
'title': 'Kādi ir īri? - Viņas melo labāk',
'description': 'Baiba apsmej īrus, kādi tie ir un ko viņi dara.',
'duration': 25,
'timestamp': 1406097056,
'upload_date': '20140723',
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://play.tv3.lt/programos/moterys-meluoja-geriau/409229?autostart=true',
@@ -82,7 +75,7 @@ class TVPlayIE(InfoExtractor):
'url': 'http://www.tv3play.se/program/husraddarna/395385?autostart=true',
'info_dict': {
'id': '395385',
'ext': 'flv',
'ext': 'mp4',
'title': 'Husräddarna S02E07',
'description': 'md5:f210c6c89f42d4fc39faa551be813777',
'duration': 2574,
@@ -90,7 +83,6 @@ class TVPlayIE(InfoExtractor):
'upload_date': '20140520',
},
'params': {
# rtmp download
'skip_download': True,
},
},
@@ -98,7 +90,7 @@ class TVPlayIE(InfoExtractor):
'url': 'http://www.tv6play.se/program/den-sista-dokusapan/266636?autostart=true',
'info_dict': {
'id': '266636',
'ext': 'flv',
'ext': 'mp4',
'title': 'Den sista dokusåpan S01E08',
'description': 'md5:295be39c872520221b933830f660b110',
'duration': 1492,
@@ -107,7 +99,6 @@ class TVPlayIE(InfoExtractor):
'age_limit': 18,
},
'params': {
# rtmp download
'skip_download': True,
},
},
@@ -115,7 +106,7 @@ class TVPlayIE(InfoExtractor):
'url': 'http://www.tv8play.se/program/antikjakten/282756?autostart=true',
'info_dict': {
'id': '282756',
'ext': 'flv',
'ext': 'mp4',
'title': 'Antikjakten S01E10',
'description': 'md5:1b201169beabd97e20c5ad0ad67b13b8',
'duration': 2646,
@@ -123,7 +114,6 @@ class TVPlayIE(InfoExtractor):
'upload_date': '20120925',
},
'params': {
# rtmp download
'skip_download': True,
},
},
@@ -131,7 +121,7 @@ class TVPlayIE(InfoExtractor):
'url': 'http://www.tv3play.no/programmer/anna-anka-soker-assistent/230898?autostart=true',
'info_dict': {
'id': '230898',
'ext': 'flv',
'ext': 'mp4',
'title': 'Anna Anka søker assistent - Ep. 8',
'description': 'md5:f80916bf5bbe1c5f760d127f8dd71474',
'duration': 2656,
@@ -139,7 +129,6 @@ class TVPlayIE(InfoExtractor):
'upload_date': '20100628',
},
'params': {
# rtmp download
'skip_download': True,
},
},
@@ -147,7 +136,7 @@ class TVPlayIE(InfoExtractor):
'url': 'http://www.viasat4play.no/programmer/budbringerne/21873?autostart=true',
'info_dict': {
'id': '21873',
'ext': 'flv',
'ext': 'mp4',
'title': 'Budbringerne program 10',
'description': 'md5:4db78dc4ec8a85bb04fd322a3ee5092d',
'duration': 1297,
@@ -155,7 +144,6 @@ class TVPlayIE(InfoExtractor):
'upload_date': '20090929',
},
'params': {
# rtmp download
'skip_download': True,
},
},
@@ -163,7 +151,7 @@ class TVPlayIE(InfoExtractor):
'url': 'http://www.tv6play.no/programmer/hotelinspektor-alex-polizzi/361883?autostart=true',
'info_dict': {
'id': '361883',
'ext': 'flv',
'ext': 'mp4',
'title': 'Hotelinspektør Alex Polizzi - Ep. 10',
'description': 'md5:3ecf808db9ec96c862c8ecb3a7fdaf81',
'duration': 2594,
@@ -171,7 +159,6 @@ class TVPlayIE(InfoExtractor):
'upload_date': '20140224',
},
'params': {
# rtmp download
'skip_download': True,
},
},
@@ -191,6 +178,14 @@ class TVPlayIE(InfoExtractor):
'skip_download': True,
},
},
{
'url': 'http://tvplay.skaties.lv/parraides/vinas-melo-labak/418113?autostart=true',
'only_matching': True,
},
{
'url': 'http://tv3play.tv3.ee/sisu/kodu-keset-linna/238551?autostart=true',
'only_matching': True,
}
]
def _real_extract(self, url):
@@ -199,7 +194,9 @@ class TVPlayIE(InfoExtractor):
video = self._download_json(
'http://playapi.mtgx.tv/v1/videos/%s' % video_id, video_id, 'Downloading video JSON')
if video['is_geo_blocked']:
title = video['title']
if video.get('is_geo_blocked'):
self.report_warning(
'This content might not be available in your country due to copyright reasons')
@@ -208,42 +205,50 @@ class TVPlayIE(InfoExtractor):
quality = qualities(['hls', 'medium', 'high'])
formats = []
for format_id, video_url in streams['streams'].items():
for format_id, video_url in streams.get('streams', {}).items():
if not video_url or not isinstance(video_url, compat_str):
continue
fmt = {
'format_id': format_id,
'preference': quality(format_id),
}
if video_url.startswith('rtmp'):
m = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
if not m:
continue
fmt.update({
'ext': 'flv',
'url': m.group('url'),
'app': m.group('app'),
'play_path': m.group('playpath'),
})
elif video_url.endswith('.f4m'):
ext = determine_ext(video_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81', video_id))
continue
update_url_query(video_url, {
'hdcore': '3.5.0',
'plugin': 'aasp-3.5.0.151.81'
}), video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
fmt.update({
'url': video_url,
})
formats.append(fmt)
fmt = {
'format_id': format_id,
'quality': quality(format_id),
'ext': ext,
}
if video_url.startswith('rtmp'):
m = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
if not m:
continue
fmt.update({
'ext': 'flv',
'url': m.group('url'),
'app': m.group('app'),
'play_path': m.group('playpath'),
})
else:
fmt.update({
'url': video_url,
})
formats.append(fmt)
self._sort_formats(formats)
return {
'id': video_id,
'title': video['title'],
'description': video['description'],
'duration': video['duration'],
'timestamp': parse_iso8601(video['created_at']),
'view_count': video['views']['total'],
'age_limit': video.get('age_limit', 0),
'title': title,
'description': video.get('description'),
'duration': int_or_none(video.get('duration')),
'timestamp': parse_iso8601(video.get('created_at')),
'view_count': int_or_none(video.get('views', {}).get('total')),
'age_limit': int_or_none(video.get('age_limit', 0)),
'formats': formats,
}
+43 -6
View File
@@ -1,25 +1,62 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
determine_ext,
mimetype2ext,
)
class TweakersIE(InfoExtractor):
_VALID_URL = r'https?://tweakers\.net/video/(?P<id>\d+)'
_TEST = {
'url': 'https://tweakers.net/video/9926/new-nintendo-3ds-xl-op-alle-fronten-beter.html',
'md5': '3147e4ddad366f97476a93863e4557c8',
'md5': 'fe73e417c093a788e0160c4025f88b15',
'info_dict': {
'id': '9926',
'ext': 'mp4',
'title': 'New Nintendo 3DS XL - Op alle fronten beter',
'description': 'md5:f97324cc71e86e11c853f0763820e3ba',
'description': 'md5:3789b21fed9c0219e9bcaacd43fab280',
'thumbnail': 're:^https?://.*\.jpe?g$',
'duration': 386,
'uploader_id': 's7JeEm',
}
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
entries = self._extract_xspf_playlist(
'https://tweakers.net/video/s1playlist/%s/playlist.xspf' % playlist_id, playlist_id)
return self.playlist_result(entries, playlist_id)
video_id = self._match_id(url)
video_data = self._download_json(
'https://tweakers.net/video/s1playlist/%s/1920/1080/playlist.json' % video_id,
video_id)['items'][0]
title = video_data['title']
formats = []
for location in video_data.get('locations', {}).get('progressive', []):
format_id = location.get('label')
width = int_or_none(location.get('width'))
height = int_or_none(location.get('height'))
for source in location.get('sources', []):
source_url = source.get('src')
if not source_url:
continue
ext = mimetype2ext(source.get('type')) or determine_ext(source_url)
formats.append({
'format_id': format_id,
'url': source_url,
'width': width,
'height': height,
'ext': ext,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('poster'),
'duration': int_or_none(video_data.get('duration')),
'uploader_id': video_data.get('account'),
'formats': formats,
}
+8 -4
View File
@@ -9,8 +9,8 @@ from ..utils import (
class VidziIE(JWPlatformBaseIE):
_VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?P<id>\w+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://vidzi.tv/cghql9yq6emu.html',
'md5': '4f16c71ca0c8c8635ab6932b5f3f1660',
'info_dict': {
@@ -22,12 +22,16 @@ class VidziIE(JWPlatformBaseIE):
# m3u8 download
'skip_download': True,
},
}
}, {
'url': 'http://vidzi.tv/embed-4z2yb0rzphe9-600x338.html',
'skip_download': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'http://vidzi.tv/%s' % video_id, video_id)
title = self._html_search_regex(
r'(?s)<h2 class="video-title">(.*?)</h2>', webpage, 'title')
+5
View File
@@ -364,6 +364,11 @@ class VimeoIE(VimeoBaseInfoExtractor):
r'<embed[^>]+?src="((?:https?:)?//(?:www\.)?vimeo\.com/moogaloop\.swf.+?)"', webpage)
if mobj:
return mobj.group(1)
# Look more for non-standard embedded Vimeo player
mobj = re.search(
r'<video[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//(?:www\.)?vimeo\.com/[0-9]+)(?P=q1)', webpage)
if mobj:
return mobj.group('url')
def _verify_player_video_password(self, url, video_id):
password = self._downloader.params.get('videopassword')
+176 -48
View File
@@ -6,11 +6,18 @@ import json
import sys
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
clean_html,
ExtractorError,
get_element_by_class,
int_or_none,
orderedSet,
parse_duration,
remove_start,
str_to_int,
unescapeHTML,
unified_strdate,
@@ -20,7 +27,55 @@ from .vimeo import VimeoIE
from .pladform import PladformIE
class VKIE(InfoExtractor):
class VKBaseIE(InfoExtractor):
_NETRC_MACHINE = 'vk'
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
login_page, url_handle = self._download_webpage_handle(
'https://vk.com', None, 'Downloading login page')
login_form = self._hidden_inputs(login_page)
login_form.update({
'email': username.encode('cp1251'),
'pass': password.encode('cp1251'),
})
# https://new.vk.com/ serves two same remixlhk cookies in Set-Cookie header
# and expects the first one to be set rather than second (see
# https://github.com/rg3/youtube-dl/issues/9841#issuecomment-227871201).
# As of RFC6265 the newer one cookie should be set into cookie store
# what actually happens.
# We will workaround this VK issue by resetting the remixlhk cookie to
# the first one manually.
cookies = url_handle.headers.get('Set-Cookie')
if cookies:
if sys.version_info[0] >= 3:
cookies = cookies.encode('iso-8859-1')
cookies = cookies.decode('utf-8')
remixlhk = re.search(r'remixlhk=(.+?);.*?\bdomain=(.+?)(?:[,;]|$)', cookies)
if remixlhk:
value, domain = remixlhk.groups()
self._set_cookie(domain, 'remixlhk', value)
login_page = self._download_webpage(
'https://login.vk.com/?act=login', None,
note='Logging in as %s' % username,
data=urlencode_postdata(login_form))
if re.search(r'onLoginFailed', login_page):
raise ExtractorError(
'Unable to login, incorrect username and/or password', expected=True)
def _real_initialize(self):
self._login()
class VKIE(VKBaseIE):
IE_NAME = 'vk'
IE_DESC = 'VK'
_VALID_URL = r'''(?x)
@@ -38,8 +93,6 @@ class VKIE(InfoExtractor):
(?P<videoid>-?\d+_\d+)(?:.*\blist=(?P<list_id>[\da-f]+))?
)
'''
_NETRC_MACHINE = 'vk'
_TESTS = [
{
'url': 'http://vk.com/videos-77521?z=video-77521_162222515%2Fclub77521',
@@ -189,49 +242,6 @@ class VKIE(InfoExtractor):
}
]
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
login_page, url_handle = self._download_webpage_handle(
'https://vk.com', None, 'Downloading login page')
login_form = self._hidden_inputs(login_page)
login_form.update({
'email': username.encode('cp1251'),
'pass': password.encode('cp1251'),
})
# https://new.vk.com/ serves two same remixlhk cookies in Set-Cookie header
# and expects the first one to be set rather than second (see
# https://github.com/rg3/youtube-dl/issues/9841#issuecomment-227871201).
# As of RFC6265 the newer one cookie should be set into cookie store
# what actually happens.
# We will workaround this VK issue by resetting the remixlhk cookie to
# the first one manually.
cookies = url_handle.headers.get('Set-Cookie')
if sys.version_info[0] >= 3:
cookies = cookies.encode('iso-8859-1')
cookies = cookies.decode('utf-8')
remixlhk = re.search(r'remixlhk=(.+?);.*?\bdomain=(.+?)(?:[,;]|$)', cookies)
if remixlhk:
value, domain = remixlhk.groups()
self._set_cookie(domain, 'remixlhk', value)
login_page = self._download_webpage(
'https://login.vk.com/?act=login', None,
note='Logging in as %s' % username,
data=urlencode_postdata(login_form))
if re.search(r'onLoginFailed', login_page):
raise ExtractorError(
'Unable to login, incorrect username and/or password', expected=True)
def _real_initialize(self):
self._login()
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
@@ -355,7 +365,7 @@ class VKIE(InfoExtractor):
}
class VKUserVideosIE(InfoExtractor):
class VKUserVideosIE(VKBaseIE):
IE_NAME = 'vk:uservideos'
IE_DESC = "VK - User's Videos"
_VALID_URL = r'https?://(?:(?:m|new)\.)?vk\.com/videos(?P<id>-?[0-9]+)(?!\?.*\bz=video)(?:[/?#&]|$)'
@@ -396,3 +406,121 @@ class VKUserVideosIE(InfoExtractor):
webpage, 'title', default=page_id))
return self.playlist_result(entries, page_id, title)
class VKWallPostIE(VKBaseIE):
IE_NAME = 'vk:wallpost'
_VALID_URL = r'https?://(?:(?:(?:(?:m|new)\.)?vk\.com/(?:[^?]+\?.*\bw=)?wall(?P<id>-?\d+_\d+)))'
_TESTS = [{
# public page URL, audio playlist
'url': 'https://vk.com/bs.official?w=wall-23538238_35',
'info_dict': {
'id': '23538238_35',
'title': 'Black Shadow - Wall post 23538238_35',
'description': 'md5:3f84b9c4f9ef499731cf1ced9998cc0c',
},
'playlist': [{
'md5': '5ba93864ec5b85f7ce19a9af4af080f6',
'info_dict': {
'id': '135220665_111806521',
'ext': 'mp3',
'title': 'Black Shadow - Слепое Верование',
'duration': 370,
'uploader': 'Black Shadow',
'artist': 'Black Shadow',
'track': 'Слепое Верование',
},
}, {
'md5': '4cc7e804579122b17ea95af7834c9233',
'info_dict': {
'id': '135220665_111802303',
'ext': 'mp3',
'title': 'Black Shadow - Война - Негасимое Бездны Пламя!',
'duration': 423,
'uploader': 'Black Shadow',
'artist': 'Black Shadow',
'track': 'Война - Негасимое Бездны Пламя!',
},
'params': {
'skip_download': True,
},
}],
'skip': 'Requires vk account credentials',
}, {
# single YouTube embed, no leading -
'url': 'https://vk.com/wall85155021_6319',
'info_dict': {
'id': '85155021_6319',
'title': 'Sergey Gorbunov - Wall post 85155021_6319',
},
'playlist_count': 1,
'skip': 'Requires vk account credentials',
}, {
# wall page URL
'url': 'https://vk.com/wall-23538238_35',
'only_matching': True,
}, {
# mobile wall page URL
'url': 'https://m.vk.com/wall-23538238_35',
'only_matching': True,
}]
def _real_extract(self, url):
post_id = self._match_id(url)
wall_url = 'https://vk.com/wall%s' % post_id
post_id = remove_start(post_id, '-')
webpage = self._download_webpage(wall_url, post_id)
error = self._html_search_regex(
r'>Error</div>\s*<div[^>]+class=["\']body["\'][^>]*>([^<]+)',
webpage, 'error', default=None)
if error:
raise ExtractorError('VK said: %s' % error, expected=True)
description = clean_html(get_element_by_class('wall_post_text', webpage))
uploader = clean_html(get_element_by_class(
'fw_post_author', webpage)) or self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
entries = []
for audio in re.finditer(r'''(?sx)
<input[^>]+
id=(?P<q1>["\'])audio_info(?P<id>\d+_\d+).*?(?P=q1)[^>]+
value=(?P<q2>["\'])(?P<url>http.+?)(?P=q2)
.+?
</table>''', webpage):
audio_html = audio.group(0)
audio_id = audio.group('id')
duration = parse_duration(get_element_by_class('duration', audio_html))
track = self._html_search_regex(
r'<span[^>]+id=["\']title%s[^>]*>([^<]+)' % audio_id,
audio_html, 'title', default=None)
artist = self._html_search_regex(
r'>([^<]+)</a></b>\s*&ndash', audio_html,
'artist', default=None)
entries.append({
'id': audio_id,
'url': audio.group('url'),
'title': '%s - %s' % (artist, track) if artist and track else audio_id,
'thumbnail': thumbnail,
'duration': duration,
'uploader': uploader,
'artist': artist,
'track': track,
})
for video in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>/video(?:-?[\d_]+).*?)\1', webpage):
entries.append(self.url_result(
compat_urlparse.urljoin(url, video.group('url')), VKIE.ie_key()))
title = 'Wall post %s' % post_id
return self.playlist_result(
orderedSet(entries), post_id,
'%s - %s' % (uploader, title) if uploader else title,
description)
+15 -28
View File
@@ -9,7 +9,7 @@ from ..compat import (
from ..utils import (
ExtractorError,
parse_duration,
qualities,
remove_end,
)
@@ -22,7 +22,7 @@ class VuClipIE(InfoExtractor):
'id': '922692425',
'ext': '3gp',
'title': 'The Toy Soldiers - Hollywood Movie Trailer',
'duration': 180,
'duration': 177,
}
}
@@ -46,34 +46,21 @@ class VuClipIE(InfoExtractor):
'%s said: %s' % (self.IE_NAME, error_msg), expected=True)
# These clowns alternate between two page types
links_code = self._search_regex(
r'''(?xs)
(?:
<img\s+src="[^"]*/play.gif".*?>|
<!--\ player\ end\ -->\s*</div><!--\ thumb\ end-->
)
(.*?)
(?:
<a\s+href="fblike|<div\s+class="social">
)
''', webpage, 'links')
title = self._html_search_regex(
r'<title>(.*?)-\s*Vuclip</title>', webpage, 'title').strip()
video_url = self._search_regex(
r'<a[^>]+href="([^"]+)"[^>]*><img[^>]+src="[^"]*/play\.gif',
webpage, 'video URL', default=None)
if video_url:
formats = [{
'url': video_url,
}]
else:
formats = self._parse_html5_media_entries(url, webpage)[0]['formats']
quality_order = qualities(['Reg', 'Hi'])
formats = []
for url, q in re.findall(
r'<a\s+href="(?P<url>[^"]+)".*?>(?:<button[^>]*>)?(?P<q>[^<]+)(?:</button>)?</a>', links_code):
format_id = compat_urllib_parse_urlparse(url).scheme + '-' + q
formats.append({
'format_id': format_id,
'url': url,
'quality': quality_order(q),
})
self._sort_formats(formats)
title = remove_end(self._html_search_regex(
r'<title>(.*?)-\s*Vuclip</title>', webpage, 'title').strip(), ' - Video')
duration = parse_duration(self._search_regex(
r'\(([0-9:]+)\)</span>', webpage, 'duration', fatal=False))
duration = parse_duration(self._html_search_regex(
r'[(>]([0-9]+:[0-9]+)(?:<span|\))', webpage, 'duration', fatal=False))
return {
'id': video_id,
+1 -11
View File
@@ -9,7 +9,6 @@ from ..utils import (
ExtractorError,
unified_strdate,
HEADRequest,
float_or_none,
)
@@ -95,16 +94,7 @@ class WatIE(InfoExtractor):
m3u8_url.replace('ios.', 'web.').replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
for m3u8_format in m3u8_formats:
mobj = re.search(
r'audio.*?%3D(\d+)(?:-video.*?%3D(\d+))?', m3u8_format['url'])
if not mobj:
continue
abr, vbr = mobj.groups()
abr, vbr = float_or_none(abr, 1000), float_or_none(vbr, 1000)
m3u8_format.update({
'vbr': vbr,
'abr': abr,
})
vbr, abr = m3u8_format.get('vbr'), m3u8_format.get('abr')
if not vbr or not abr:
continue
f = m3u8_format.copy()
+31 -7
View File
@@ -137,7 +137,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
# Two-Factor
# TODO add SMS and phone call support - these require making a request and then prompting the user
if re.search(r'(?i)<form[^>]* id="challenge"', login_results) is not None:
if re.search(r'(?i)<form[^>]+id="challenge"', login_results) is not None:
tfa_code = self._get_tfa_info('2-step verification code')
if not tfa_code:
@@ -165,17 +165,17 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
if tfa_results is False:
return False
if re.search(r'(?i)<form[^>]* id="challenge"', tfa_results) is not None:
if re.search(r'(?i)<form[^>]+id="challenge"', tfa_results) is not None:
self._downloader.report_warning('Two-factor code expired or invalid. Please try again, or use a one-use backup code instead.')
return False
if re.search(r'(?i)<form[^>]* id="gaia_loginform"', tfa_results) is not None:
if re.search(r'(?i)<form[^>]+id="gaia_loginform"', tfa_results) is not None:
self._downloader.report_warning('unable to log in - did the page structure change?')
return False
if re.search(r'smsauth-interstitial-reviewsettings', tfa_results) is not None:
self._downloader.report_warning('Your Google account has a security notice. Please log in on your web browser, resolve the notice, and try again.')
return False
if re.search(r'(?i)<form[^>]* id="gaia_loginform"', login_results) is not None:
if re.search(r'(?i)<form[^>]+id="gaia_loginform"', login_results) is not None:
self._downloader.report_warning('unable to log in: bad username or password')
return False
return True
@@ -858,6 +858,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
{
'url': 'https://www.youtube.com/watch?feature=player_embedded&amp;amp;v=V36LpHqtcDY',
'only_matching': True,
},
{
# YouTube Red paid video (https://github.com/rg3/youtube-dl/issues/10059)
'url': 'https://www.youtube.com/watch?v=i1Ko8UG-Tdo',
'only_matching': True,
}
]
@@ -1978,10 +1983,13 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
return (False if YoutubePlaylistsIE.suitable(url) or YoutubeLiveIE.suitable(url)
else super(YoutubeChannelIE, cls).suitable(url))
def _build_template_url(self, url, channel_id):
return self._TEMPLATE_URL % channel_id
def _real_extract(self, url):
channel_id = self._match_id(url)
url = self._TEMPLATE_URL % channel_id
url = self._build_template_url(url, channel_id)
# Channel by page listing is restricted to 35 pages of 30 items, i.e. 1050 videos total (see #5778)
# Workaround by extracting as a playlist if managed to obtain channel playlist URL
@@ -2038,8 +2046,8 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
class YoutubeUserIE(YoutubeChannelIE):
IE_DESC = 'YouTube.com user videos (URL or "ytuser" keyword)'
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:user/|c/)?(?!(?:attribution_link|watch|results)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
_TEMPLATE_URL = 'https://www.youtube.com/user/%s/videos'
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
_TEMPLATE_URL = 'https://www.youtube.com/%s/%s/videos'
IE_NAME = 'youtube:user'
_TESTS = [{
@@ -2049,12 +2057,24 @@ class YoutubeUserIE(YoutubeChannelIE):
'id': 'UUfX55Sx5hEFjoC3cNs6mCUQ',
'title': 'Uploads from The Linux Foundation',
}
}, {
# Only available via https://www.youtube.com/c/12minuteathlete/videos
# but not https://www.youtube.com/user/12minuteathlete/videos
'url': 'https://www.youtube.com/c/12minuteathlete/videos',
'playlist_mincount': 249,
'info_dict': {
'id': 'UUVjM-zV6_opMDx7WYxnjZiQ',
'title': 'Uploads from 12 Minute Athlete',
}
}, {
'url': 'ytuser:phihag',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/c/gametrailers',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/gametrailers',
'only_matching': True,
}, {
# This channel is not available.
'url': 'https://www.youtube.com/user/kananishinoSMEJ/videos',
@@ -2071,6 +2091,10 @@ class YoutubeUserIE(YoutubeChannelIE):
else:
return super(YoutubeUserIE, cls).suitable(url)
def _build_template_url(self, url, channel_id):
mobj = re.match(self._VALID_URL, url)
return self._TEMPLATE_URL % (mobj.group('user') or 'user', mobj.group('id'))
class YoutubeLiveIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube.com live streams'
+7 -3
View File
@@ -26,7 +26,11 @@ def parseOpts(overrideArguments=None):
except IOError:
return default # silently skip if file is not present
try:
res = compat_shlex_split(optionf.read(), comments=True)
# FIXME: https://github.com/rg3/youtube-dl/commit/dfe5fa49aed02cf36ba9f743b11b0903554b5e56
contents = optionf.read()
if sys.version_info < (3,):
contents = contents.decode(preferredencoding())
res = compat_shlex_split(contents, comments=True)
finally:
optionf.close()
return res
@@ -812,11 +816,11 @@ def parseOpts(overrideArguments=None):
system_conf = []
user_conf = []
else:
system_conf = compat_conf(_readOptions('/etc/youtube-dl.conf'))
system_conf = _readOptions('/etc/youtube-dl.conf')
if '--ignore-config' in system_conf:
user_conf = []
else:
user_conf = compat_conf(_readUserConf())
user_conf = _readUserConf()
argv = system_conf + user_conf + command_line_conf
opts, args = parser.parse_args(argv)
+4 -2
View File
@@ -363,8 +363,10 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
input_files = [filename] + sub_filenames
opts = [
'-map', '0',
'-c', 'copy',
'-map', '0:v',
'-c:v', 'copy',
'-map', '0:a',
'-c:a', 'copy',
# Don't copy the existing subtitles, we may be running the
# postprocessor a second time
'-map', '-0:s',
+36
View File
@@ -2126,6 +2126,42 @@ def mimetype2ext(mt):
}.get(res, res)
def parse_codecs(codecs_str):
# http://tools.ietf.org/html/rfc6381
if not codecs_str:
return {}
splited_codecs = list(filter(None, map(
lambda str: str.strip(), codecs_str.strip().strip(',').split(','))))
vcodec, acodec = None, None
for full_codec in splited_codecs:
codec = full_codec.split('.')[0]
if codec in ('avc1', 'avc2', 'avc3', 'avc4', 'vp9', 'vp8', 'hev1', 'hev2', 'h263', 'h264', 'mp4v'):
if not vcodec:
vcodec = full_codec
elif codec in ('mp4a', 'opus', 'vorbis', 'mp3', 'aac'):
if not acodec:
acodec = full_codec
else:
write_string('WARNING: Unknown codec %s' % full_codec, sys.stderr)
if not vcodec and not acodec:
if len(splited_codecs) == 2:
return {
'vcodec': vcodec,
'acodec': acodec,
}
elif len(splited_codecs) == 1:
return {
'vcodec': 'none',
'acodec': vcodec,
}
else:
return {
'vcodec': vcodec or 'none',
'acodec': acodec or 'none',
}
return {}
def urlhandle_detect_ext(url_handle):
getheader = url_handle.headers.get
+1 -1
View File
@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.07.07'
__version__ = '2016.07.13'